Modelling the habitat suitability of cetaceans: Example of the sperm whale in the northwestern Mediterranean Sea

Emilie Praca (a,b,*), Alexandre Gannier (c), Krishna Das (b), Sophie Laran (a)

(a) Centre de Recherche sur les Cétacés, Marineland, 306 avenue Mozart, 06600 Antibes, France
(b) MARE Center, Laboratory for Oceanology, University of Liège, Sart Tilman, Bâtiment B6c, 4000 Liège, Belgium
(c) Groupe de Recherche sur les Cétacés, BP715, 06633 Antibes cedex, France

Article info

Article history: Received 7 April 2008. Received in revised form 1 November 2008. Accepted 6 November 2008. Available online 13 November 2008.

Keywords: Habitat modelling; Ecological niche factor analysis; Principal component analysis; Generalized linear model; Multivariate adaptive regression splines; Sperm whale; Northwestern Mediterranean Sea

Abstract

Cetaceans are mobile and spend long periods underwater. Because of this, modelling their habitat could be subject to a serious problem of false absence. Furthermore, extensive surveys at sea are time- and money-consuming, and presence–absence data are difficult to apply. This study compares the ability of two presence–absence and two presence-only habitat modelling methods and uses the example of the sperm whale (Physeter macrocephalus) in the northwestern Mediterranean Sea. The data consist of summer visual and acoustical detections of sperm whales, compiled between 1998 and 2005. Habitat maps were computed using topographical and hydrological eco-geographical variables. Four methods were compared: principal component analysis (PCA), ecological niche factor analysis (ENFA), generalized linear model (GLM) and multivariate adaptive regression splines (MARS). The evaluation of the models was achieved by calculating the receiver operating characteristic (ROC) of the models and their respective area under the curve (AUC).
Presence–absence methods (GLM, AUC = 0.70, and MARS, AUC = 0.79) presented better AUC than presence-only methods (PCA, AUC = 0.58, and ENFA, AUC = 0.66), but this difference was not statistically significant, except between the MARS and the PCA models. The four models showed an influence of both topographical and hydrological factors, but the resulting habitat suitability maps differed. The core habitat on the continental slope was well highlighted by the four models, while GLM and MARS maps also showed a suitable habitat in the offshore waters. Presence–absence methods are therefore recommended for modelling the habitat suitability of cetaceans, as they seem more accurate to highlight complex habitat. However, the use of presence-only techniques, in particular ENFA, could be very useful for a first model of the habitat range or when important surveys at sea are not possible.

© 2008 Elsevier Ltd. All rights reserved.

Deep-Sea Research I 56 (2009) 648–657. doi:10.1016/j.dsr.2008.11.001. Journal homepage: www.elsevier.com/locate/dsri
(*) Corresponding author at: Centre de Recherche sur les Cétacés, Marineland, 306 avenue Mozart, 06600 Antibes, France. Tel.: +33493335577; fax: +33493337691. E-mail address: emilie.praca@gmail.com (E. Praca).

1. Introduction

Habitat modelling increases the knowledge about the spatial distribution of a species and its relationship with environmental variables. Such information is of great interest for theoretical studies on ecological niches or for practical purposes such as defining and managing protected areas. Habitat modelling can, moreover, be used to predict the impact of climate changes on species spatial distribution. In recent years, such conservation and management considerations have gained in ecological importance. At the same time, computational capabilities have considerably improved, leading to an increase in the number of habitat modelling techniques, using various statistical methods such as multiple regression or multifactorial analyses.

The most used habitat modelling techniques (such as generalized linear models, GLMs) are based on presence–absence data (Guisan and Zimmermann, 2001; Redfern et al., 2006). 'True' absence data (when animals are actually absent) are not easy to collect for mobile or inconspicuous species. For example, Kelly (2000) cited in Hirzel et al. (2002) estimated that 34 visits to a site are needed to confirm the absence of a snake (Coronella austriaca). 'False' absence data, when animals are present but not detected, can significantly bias the analysis. As several cetacean species are able to spend long periods underwater and are very discreet at the surface, modelling their habitat with presence–absence methods may be subject to such biases, if absence data are not carefully considered. Moreover, collection of cetacean distribution data requires long and expensive surveys at sea.

This shortcoming can be avoided using presence-only methods such as principal component analysis (PCA) or ecological niche factor analysis (ENFA) (Hirzel et al., 2002). Because of the use of presence-only data, such methods tend to overestimate the area of suitable habitat. Indeed, presence-only methods seem to predict the potential distribution (fundamental niche), whereas presence–absence methods could reflect the present distribution (realized niche) of the species (Brotons et al., 2004; Zaniewski et al., 2002). Even though presence-only methods have limitations, they could be very useful for a first approach of habitat modelling for cetaceans.

The sperm whale (Physeter macrocephalus) is one of the eight common cetacean species inhabiting the northwestern Mediterranean Sea (NWMS, Fig. 1) (Duguy, 1991).
In this area, sperm whales are exposed to anthropogenic disturbances such as noise and ship collisions (with ferries or high-speed boats), net entanglement and pollution (Aguilar et al., 2002; Di Natale and Notarbartolo di Sciara, 1994; Notarbartolo di Sciara and Gordon, 1997). With the creation of a marine protected area, the International Sanctuary for Marine Mammals, it was interesting to model the critical habitat of the sperm whale within the framework of management and conservation.

In the NWMS, a typical sperm whale deep dive averages 45 min of underwater feeding followed by 9 min of resting at the surface (Aguilar et al., 2002; Drouot et al., 2004). Because the whales feed throughout the day/night cycle (Drouot et al., 2004; Watwood et al., 2006) and spend around 15% of the time at the surface, visual detection alone does not represent their spatial distribution well. We compensated for this shortcoming by using passive acoustic detection along the survey track. Indeed, the sperm whale emits regular clicks during its feeding dives. These clicks are produced during 80% of the dive time, allowing detection of individuals from several kilometres away when they are underwater (Watwood et al., 2006) and the use of both presence-only and presence–absence modelling methods. We therefore compared four methods to model the habitat suitability (HS) of the sperm whale: two well-established methods, PCA and GLM, and two more recent methods, ENFA and multivariate adaptive regression splines (MARS). We will discuss the statistical accuracy and the ecological meaning of the resulting models, in order to show the advantages and disadvantages of each technique for the habitat modelling of cetaceans.

2. Material and methods

2.1. Sampling surveys

From 1998 to 2005, summer surveys were conducted on a motor-sailing boat at a speed of 6 knots. In addition, during summer 2001, a motor boat was used for surveys at a speed of 11–12 knots (Fig. 1a) (Gannier, 2006).
The survey track was designed as random zigzags from the upper slope to nearby pelagic waters, and crossings from the French mainland to Corsica or to the Balearic Islands were performed when good meteorological conditions occurred during several days.

Fig. 1. Survey tracks (grey line) realized from 1998 to 2005 in the northwestern Mediterranean Sea and location of the observation sequences (black dots) of sperm whales (Physeter macrocephalus) (a); presence cells (black) and absence cells with a minimum of 5 km of survey effort (grey) of the sperm whale (b).

The protocol combined visual searching and systematic passive acoustic listening stations (see Gannier et al., 2002). In brief, the visual survey was conducted by three experienced observers, scanning continuously with the naked eye the frontal sector (−90° to +90°). The passive acoustic survey along the cruise track consisted of 1 min of listening every 2 nm (3.7 km) and used a dual-channel towed hydrophone (Magrec Ltd., Lifton, UK). Sperm whales were recognized by their typical signal composed of regular clicks (Teloni, 2005). The following parameters were recorded at each listening station or sighting location: sea state, position of the boat and of the animals (if a sighting occurred), visual conditions (index V, varying between 0 and 6 and depending on the wind speed in Beaufort, the sky cover and the sea state) (Gannier, 1997; Gannier et al., 2002), background acoustic noise (index U, varying from 1 to 5) and the bio-acoustic signal level (index SL, varying between 0 and 5) (Gordon et al., 1998, 2000). V, U and SL were estimated by experienced observers. The significance of the data was improved by removing observations with bad visual or acoustical conditions, i.e. when V < 4, U > 3 or SL < 2.

The data were merged into observation sequences, in ArcGIS 8.3 (ESRI Inc., Redlands, USA), in order to minimize autocorrelation in the analysis.
All successive acoustical or visual observations obtained with less than a 1-h time-lag (approximately 6 nm) were considered to be part of the same group (Gordon et al., 2000). One geographic position for each observation sequence was chosen either as the location of one central visual sighting or of the acoustic detection with the best SL.

2.2. Data treatment

A grid of 9 × 9 km cells covering the study area was created, in which both observation sequences and eco-geographical variables (EGVs) were implemented to construct presence–absence and EGV grid cells. This cell size was chosen in order to use chlorophyll concentrations, which were not available at higher resolution.

Presence cells were defined as cells where one or several observation sequences were located and had the value of 1. A homogeneous searching effort was not feasible in our large study area, because the survey of offshore areas requires extended periods of good weather. A weight for presence cells was therefore used to balance this discrepancy. It was computed for each presence cell as the total number of observation sequences obtained in the cell divided by the total number of kilometres of searching effort in that cell. This weight was used in the statistical software (see below) as an observation multiplier; for example, a cell with a weight of 3 is counted three times. Absence cells were defined as cells on the survey track where no detections were obtained and had the value of 0. The reliability of absence data was maximized by selecting only absence cells with a minimum of 5 km of searching effort. The presence–absence data set was randomly split into calibration and validation data sets, representing 70% and 30% of the data set, respectively.
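The presence weighting, the 5 km effort filter on absence cells and the random 70/30 split described above could be sketched as follows. This is a minimal Python illustration, not the authors' workflow: the cell records, the function names, the unit weight given to absence cells and the random seed are all hypothetical choices made for the example.

```python
import random

MIN_ABSENCE_EFFORT_KM = 5.0  # minimum effort for an absence cell, as stated in the text


def presence_weight(n_sequences, effort_km):
    """Weight of a presence cell: observation sequences per km of searching effort."""
    if effort_km <= 0:
        raise ValueError("effort_km must be positive")
    return n_sequences / effort_km


def build_dataset(cells):
    """Turn surveyed cells into presence (1) / absence (0) records.

    `cells` is a list of dicts with keys id, n_sequences, effort_km
    (a hypothetical layout chosen for this sketch).
    """
    dataset = []
    for cell in cells:
        if cell["n_sequences"] > 0:
            weight = presence_weight(cell["n_sequences"], cell["effort_km"])
            dataset.append({"id": cell["id"], "presence": 1, "weight": weight})
        elif cell["effort_km"] >= MIN_ABSENCE_EFFORT_KM:
            # Assumption: absence cells carry a unit weight.
            dataset.append({"id": cell["id"], "presence": 0, "weight": 1.0})
        # Absence cells with less than 5 km of effort are discarded.
    return dataset


def split_dataset(dataset, calibration_fraction=0.7, seed=0):
    """Randomly split records into calibration and validation sets."""
    shuffled = list(dataset)
    random.Random(seed).shuffle(shuffled)
    n_calibration = round(calibration_fraction * len(shuffled))
    return shuffled[:n_calibration], shuffled[n_calibration:]


# Invented example cells (not survey data).
cells = [
    {"id": "c01", "n_sequences": 3, "effort_km": 6.0},
    {"id": "c02", "n_sequences": 1, "effort_km": 10.0},
    {"id": "c03", "n_sequences": 0, "effort_km": 7.5},
    {"id": "c04", "n_sequences": 0, "effort_km": 3.0},
    {"id": "c05", "n_sequences": 2, "effort_km": 4.0},
    {"id": "c06", "n_sequences": 0, "effort_km": 12.0},
    {"id": "c07", "n_sequences": 0, "effort_km": 5.0},
    {"id": "c08", "n_sequences": 0, "effort_km": 4.9},
    {"id": "c09", "n_sequences": 1, "effort_km": 8.0},
    {"id": "c10", "n_sequences": 0, "effort_km": 9.0},
    {"id": "c11", "n_sequences": 0, "effort_km": 6.5},
    {"id": "c12", "n_sequences": 0, "effort_km": 20.0},
]

dataset = build_dataset(cells)
calibration, validation = split_dataset(dataset)
presence_only = [record for record in calibration if record["presence"] == 1]
```

In this sketch, `calibration` would feed a presence–absence method, while `presence_only` keeps just the presence cells of the same calibration set, as a presence-only method would require.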
GLM and MARS were performed with the total calibration data set, while only the presence cells of this data set were used for PCA and ENFA.

As information on the spatial distribution of sperm whale prey in the Mediterranean is scarce, hydrological and chlorophyll concentration data were used as proxies. However, sperm whales are not directly influenced by chlorophyll concentrations, as a gap occurs in trophic webs between primary production and cephalopods (Jaquet, 1996). The sperm whale summer distribution may then be influenced by the primary production situation during the phytoplankton bloom. Therefore, we modelled the summer distribution of the sperm whale using data from both summer and phytoplankton bloom periods.

EGVs were variables used in previous cetacean habitat modelling studies (e.g. Gregr and Trites, 2001; Hamazaki, 2002), related to topography, temperature, salinity and primary production. Monthly resolution was used for the hydrological and biological EGVs, in order to compute seasonal situations for the two following periods: the summer (June–August) and the phytoplankton bloom period (February–April). These seasonal maps were then averaged over all survey years, resulting in two seasonal maps for each EGV.

This use of multi-year average situations, instead of daily or weekly data, was needed because we compiled sperm whale presence data interannually in order to have sufficient data. This was also required by the ENFA, as this method compares the mean available habitat in the study area and the species habitat (see below), preventing the use of close to real-time data. The same multi-year averages were then used in all models in order to perform the comparison of methods.

Depth, slope and the distance to the 200-m contour, which has been shown to be more relevant than the coastline for teuthophagous species (Mangion and Gannier, 2002), were obtained from the GEBCO Digital Atlas (IOC-IHO-BODC, 2003).
Depth and slope were log-transformed in order to reduce their high variation range.

Sea surface temperature (SST) data were downloaded, depending on their availability, from the Pathfinder sensor (PO.DAAC) for 1998–2002 and from the MODIS sensor (OceanColor) for 2002–2005. The front detection maps were computed in Idrisi Andes (Clark Labs, Clark University, Worcester, USA) applying a Sobel filter on the SST. This filter highlights horizontal and vertical gradients and replaces the value of a central cell, in a matrix of 3 × 3 cells, by the magnitude of the gradient (here in °C), using the following coefficients:

\[
x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} c_1 & c_4 & c_7 \\ c_2 & c_5 & c_8 \\ c_3 & c_6 & c_9 \end{bmatrix}
\tag{1}
\]

\[
y = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}
\begin{bmatrix} c_1 & c_4 & c_7 \\ c_2 & c_5 & c_8 \\ c_3 & c_6 & c_9 \end{bmatrix}
\tag{2}
\]

The new value of the central cell $c_5$ is computed as

\[
c_5 = \sqrt{x^2 + y^2}
\tag{3}
\]

Chlorophyll concentrations were obtained from the SeaWiFS sensor website (OceanColor) for 1998–2005. Salinity data were obtained from the MEDAR/MEDATLAS II database (MODB). They were only available from 1998 until 2002, but were considered to be representative of average conditions of the study period.

In the Gulf of Lions, the Rhône river exports high quantities of nutrients and particles (Conan et al., 1998), which increase the turbidity. This phenomenon leads to an overestimation of chlorophyll concentrations in satellite data (> 0.8 mg m⁻³ even in summer) and the Rhône plume can be classified as turbid case 2 water (Antoine et al., 1996). Consequently, the area influenced by the Rhône plume was removed from our analysis.

2.3. Modelling methods

PCA and ENFA are both presence-only multifactorial analyses, transforming the set of EGVs into the same number of non-correlated factorial axes (Hirzel et al., 2002; Legendre and Legendre, 1998).
In PCA, the eigenvalues of the factorial axes are computed with the variance–covariance matrix of the EGV matrix (Hirzel et al., 2002; Legendre and Legendre, 1998), while ENFA introduces ecological significance in the computation of factorial axes (Hirzel et al., 2002). For this method, marginality (how much a species' habitat differs from the mean available conditions) is represented in the first factorial axis, and specialization (breadth of the habitat) is maximized in the subsequent axes. For both methods, the number of relevant axes was chosen using MacArthur's broken-stick method (Hirzel et al., 2006). Finally, HS maps were built with the median algorithm, which compares the position of each cell of the study area to the distribution of presence cells on the different factorial axes. A cell adjacent to the median of an axis would score 1, and a cell outside of the species distribution would score 0. All ENFA and PCA analyses were conducted using Biomapper 3.2 (Hirzel et al., 2006).

GLM and MARS are presence–absence methods (Friedman, 1991; McCullagh and Nelder, 1989), for which a logistic regression was used to relate the binary response variable (presence or absence of sperm whales) with the continuous EGVs:

\[
\operatorname{logit}[P(Y=1)] = \ln\!\left(\frac{P(Y=1)}{1-P(Y=1)}\right) = \beta_0 + \sum_i \beta_i x_i
\tag{4}
\]

\[
P(Y=1) = \frac{e^{\beta_0 + \sum_i \beta_i x_i}}{1 + e^{\beta_0 + \sum_i \beta_i x_i}}
\tag{5}
\]

where $P(Y=1)$ is the probability of presence varying between 0 and 1, $x_i$ is an EGV and $\beta_0$ is the intercept. In GLMs, $\beta_i$ is a scalar coefficient, leading to a linear relationship between the response variable and the EGVs (McCullagh and Nelder, 1989). An exhaustive estimation of the GLM models was computed, and the model with the lowest Akaike information criterion (AIC) was chosen as the most parsimonious (Tabachnick, 2000). In MARS, $\beta_i x_i$ is replaced by a piecewise basis function, composed of several linear segments with different slopes and breaking knots (Friedman, 1991).
Basis functions are defined in pairs:

\[
bf_i = a_i \max(0,\, t_i - x_i)
\tag{6}
\]

\[
bf_{i+1} = a_i \max(0,\, x_i - t_i)
\tag{7}
\]

where $bf_i$ and $bf_{i+1}$ are basis functions, $a_i$ is the slope of the linear segment, $t_i$ is the breaking knot and $x_i$ is an EGV. The value of $bf_i$ (Eq. (6)) is $a_i t_i$ when $x_i$ is 0, declines to 0 as $x_i$ approaches $t_i$ and remains at 0 when $x_i$ is greater than $t_i$. In contrast, $bf_{i+1}$ (Eq. (7)) takes the value $a_i (x_i - t_i)$ when $x_i$ is greater than $t_i$ and the value 0 otherwise. More than one knot (i.e. $bf$ pair) can be defined for each EGV, allowing the development of complex non-linear relationships. The full set of basis functions initially over-fits the data. The model is then simplified using a backward/forward stepwise cross-validation in order to identify the significant functions. The probability of presence equations of GLM and MARS models were computed in Statistica 8.0 (Statsoft Inc., Tulsa, USA) and imported in Idrisi Andes to compute HS maps of GLM and MARS models with the relevant EGVs.

2.4. Model validation and comparison

The statistical accuracies of the model predictions were evaluated by comparing the probabilities of presence highlighted by HS maps and the validation data set. A commonly used validation method is the confusion matrix, which cross-tabulates the observed and predicted presence and absence patterns (Fielding and Bell, 1997). It computes sensitivity as the fraction of presence cells correctly predicted as presence and specificity as the fraction of absence cells correctly predicted as absence. However, this method depends on a threshold between presence and absence, generally fixed to 0.5, which could introduce bias if this threshold is not optimal (Boyce et al., 2002). An alternative is the receiver operating characteristic (ROC) curves and their corresponding area under the curve (AUC) (Beck and Schultz, 1986).
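As a rough illustration of the confusion-matrix quantities described above, the following Python sketch computes sensitivity and specificity for a given presence–absence threshold. The observations and predicted probabilities are invented for the example, the function names are hypothetical, and treating a probability equal to the threshold as a predicted presence is an assumption of this sketch.

```python
def confusion_counts(observed, predicted_prob, threshold=0.5):
    """Cross-tabulate observed presence/absence against thresholded predictions.

    Returns (true_pos, false_pos, true_neg, false_neg).
    Assumption: a probability >= threshold counts as a predicted presence.
    """
    true_pos = false_pos = true_neg = false_neg = 0
    for obs, prob in zip(observed, predicted_prob):
        predicted = 1 if prob >= threshold else 0
        if obs == 1 and predicted == 1:
            true_pos += 1
        elif obs == 0 and predicted == 1:
            false_pos += 1
        elif obs == 0 and predicted == 0:
            true_neg += 1
        else:
            false_neg += 1
    return true_pos, false_pos, true_neg, false_neg


def sensitivity_specificity(observed, predicted_prob, threshold=0.5):
    """Sensitivity: presences predicted as presence.
    Specificity: absences predicted as absence."""
    true_pos, false_pos, true_neg, false_neg = confusion_counts(
        observed, predicted_prob, threshold
    )
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one observed presence and one observed absence")
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Invented validation cells: 1 = presence, 0 = absence.
observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted_prob = [0.90, 0.60, 0.40, 0.30, 0.55, 0.10, 0.45, 0.48]

at_default = sensitivity_specificity(observed, predicted_prob, threshold=0.5)
at_lower = sensitivity_specificity(observed, predicted_prob, threshold=0.45)
```

With these invented values, lowering the threshold from 0.5 to 0.45 raises sensitivity and lowers specificity, which illustrates why a single fixed threshold can bias the evaluation.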
The ROC method evaluates the proportion of correctly and incorrectly classified predictions over a continuous range of thresholds (Beck and Schultz, 1986; Boyce et al., 2002). ROC curves were obtained with Analyse-it (Analyse-it Software Ltd., Leeds, UK) for Microsoft Excel (Microsoft Corporation, Redmond, USA), plotting sensitivity vs. 1 − specificity pairs for each presence–absence threshold. A perfect model has an AUC of 1 and a random model an AUC of 0.5. The closer the AUC is to 1, the better the fit of the model. AUCs of the different models were compared to the AUC of a random model with a Z-test (Boyce et al., 2002; DeLong et al., 1988). Finally, the point minimizing the difference between sensitivity and specificity, i.e. where the number of cells wrongly predicted as absence and presence is minimal, was chosen as the threshold between absence and presence.

3. Results

The data set was composed of 14,259 km of effort covered during the survey period (1998–2005), with 187 observation sequences, gridded into 135 presence cells and 1025 absence cells. Selecting the absence cells with a minimum of 5 km of effort left 180 absence cells (Fig. 1b). Furthermore, considering the high correlation between salinity and SST of both seasons (r² > 0.75), models were tested only with the salinity or the SST of the summer period and the model with the best statistical validation was kept.

3.1. Principal component analysis

MacArthur's broken-stick method retained the first three factorial axes as relevant to the PCA model. It had an AUC of 0.58, not significantly different from the random model (Z-test, Z = 1.23, p = 0.11, Fig. 2a). The threshold between predicted presence and absence was 0.47, with a sensitivity of 59% and a specificity of 54.5%.

The first factorial axis seemed to highlight the influence of topography on the habitat of the sperm whale (Table 1).
It indicated the importance of depth and distance to the 200-m contour (coefficients of 0.82 and 0.91, respectively) and low slope (−0.73). In contrast, the second axis highlighted the influence of biological EGVs with important chlorophyll concentrations in summer (0.91) and during the phytoplankton bloom period (0.79). The third axis highlighted the influence of thermal fronts in both periods (0.80 for the summer period and 0.52 for the phytoplankton bloom period) (Table 1). The HS map of the PCA model (Fig. 4a) showed a core habitat on the continental slope in the Ligurian Sea and near the Balearic Islands, but also on the continental shelf close to Sardinia, and offshore between Corsica and the Spanish coast.

3.2. Ecological niche factor analysis

For the ENFA, MacArthur's broken-stick method retained the first five factorial axes. This model had an AUC of 0.66, significantly different from the random model (Z-test, Z = 2.33, p = 0.01, Fig. 2a). The presence–absence threshold was 0.45, with a sensitivity of 64.1% and a specificity of 63.6%.

The marginality factorial axis indicated a strong relationship for cells with steep slope (coefficient of 0.61) (Table 2). The first specialization axis highlighted the restriction of the species to the lower SST, lower distance to the 200-m contour (0.54) and lower chlorophyll concentrations for the phytoplankton bloom period (0.57). The last three axes showed the restriction of the species to waters with the lower frequencies of SST fronts (0.57), the higher chlorophyll concentrations in summer (0.56) and the steeper slopes (0.56). This method did not highlight the water depth as a significant variable (Table 2). The HS map of this method revealed a core habitat on the continental slope in almost the whole study area: near the Sardinian coast, near the French coast in the

Fig. 2. Receiver operating characteristic curves of the four modelling methods used: PCA and ENFA (a); GLM and MARS (b).
Table 1
Relevant axes (with their eigenvalues) and the EGV coefficients of the principal component analysis model.

EGVs                                  Axis 1 (0.31)   Axis 2 (0.23)   Axis 3 (0.15)
Depth (log)                                0.82           −0.16            0.08
Chlorophyll concentration summer           0.08            0.91            0.14
Chlorophyll concentration bloom           −0.50            0.79            0.09
Distance to the 200-m contour              0.91           −0.23            0.12
Salinity summer                           −0.37            0.29           −0.51
Slope (log)                               −0.73            0.42            0.02
Thermal front detection bloom             −0.15           −0.17            0.80
Thermal front detection summer            −0.18            0.24            0.52

Summer: summer period; bloom: phytoplankton bloom period.