
Identification of areas suitable for the sable antelope (Hippotragus niger niger) in South Africa with the use of a Species Distribution Model


Academic year: 2021


UNIVERSITÀ DI PISA

Department of Biology

Master's Degree in Conservation and Evolution

Identification of areas suitable for the sable antelope (Hippotragus niger niger) in South Africa with the use of a Species Distribution Model

Supervisors: Prof. Francesca Parrini, Prof. Filippo Barbanera

Candidate: Lucia Cenni


To my Mother


INDEX

INDEX OF TABLES AND FIGURES
Tables
Figures
INDEX OF ABBREVIATIONS
ABSTRACT
INTRODUCTION
Study species
Taxonomy
Geographical distribution
Conservation status and population
Physical description
Home range extent
Habitat and resource use
Behavioural traits
Species Distribution Modelling
General background
Theoretical background
Model Design
Model Evaluation
Applications of SDMs
General aim of the study
The need for the study
Specific questions related to objectives
MATERIALS AND METHODS
Study Area
Data collection
Sable antelope locations
Environmental data
MAXENT data analysis
RESULTS
Objective 1
Objective 2
Objective 3
DISCUSSION
Model performance
Habitat suitability
Environmental variables contribution
Drawbacks of the study and recommendations
Conclusions
ACKNOWLEDGEMENTS
CITED LITERATURE
VISITED WEBSITES


INDEX OF TABLES AND FIGURES

Tables

Introduction: Tables 1-2
Materials and methods: Tables 1-4
Results: Tables 1-2

Figures

Introduction: Figures 1-4
Materials and methods: Figure 1
Results: Figures 1-36


INDEX OF ABBREVIATIONS

ANN – Artificial Neural Network
AUC – Area Under the ROC Curve
BRT – Boosted Regression Tree
CITES – Convention on International Trade in Endangered Species of Wild Fauna and Flora
ESRI – Environmental Systems Research Institute
EWT – Endangered Wildlife Trust
GAM – Generalized Additive Model
GARP – Genetic Algorithm for Ruleset Prediction
GCS – Geographic Coordinate System
GCM – Global Climate Model
GIS – Geographic Information System
GLM – Generalized Linear Model
IUCN – International Union for the Conservation of Nature and Natural Resources
MAXENT – Maximum Entropy
MESS – Multidimensional Environmental Similarity Surface
RCP – Representative Concentration Pathway
ROC – Receiver Operating Characteristic
RSA – Republic of South Africa
SA – South Africa
SDM – Species Distribution Model
SOTER – Soil and Terrain Database
SPECIES – Spatial Estimator of Climate Impacts on the Envelope of Species
WGS – World Geodetic System


ABSTRACT

A wide range of factors threatens the survival of many mammals in southern Africa, and predicting the spatial distribution of a given species has therefore attracted increasing attention. For conservation management, however, it is essential to determine which environmental factors influence both the presence and the persistence of the species under investigation. Accordingly, in this thesis we applied Species Distribution Models (SDMs), specifically the Maximum Entropy (MAXENT) modelling technique, to build models capable of identifying the potential distribution range of the sable antelope, Hippotragus niger niger (Harris, 1838), in South Africa and of inferring the most important environmental predictors driving that distribution. Sable antelope occur in south-eastern Africa, including Zimbabwe, Botswana, Mozambique, Malawi and South Africa. Despite being listed as Least Concern by the International Union for the Conservation of Nature and Natural Resources (IUCN), the species is experiencing a rapid demographic decline in South Africa, the southernmost edge of its distribution range. Occurrence locations of the sable antelope, corresponding to the latitude-longitude coordinates of parks and reserves where the species occurs in South Africa, and environmental variables influencing its ecology were selected and included in the model. Both bioclimatic (e.g. mean annual temperature and annual precipitation) and non-climatic (e.g. vegetation group, dominant chemistry of the soil and land cover) environmental variables were considered. For the present, models relied on occurrence records from either the sable antelope's historical range only or South Africa as a whole. For the future, four climate change scenarios were taken into account: 2050 – Representative Concentration Pathway (RCP) 2.6; 2050 – RCP 8.5; 2070 – RCP 2.6 and 2070 – RCP 8.5.
For all models, the algorithm showed an excellent predictive ability to identify suitable conditions for the species and, as such, the areas with a significant probability of sable antelope occurrence. Vegetation Group, National Land Cover, Mean Annual Temperature, Minimum Temperature of the Coldest Month and Precipitation Seasonality turned out to be key predictors in the elaboration of the suitability maps for the sable antelope in South Africa. The distribution maps produced in this study for current and future times indicated that the north-eastern edge of the country hosts the land with the highest level of suitability for the species, which lies entirely within its historical range (e.g. Limpopo, Mpumalanga, North-West and Gauteng provinces). Nevertheless, the suitability maps elaborated for the future suggested a potential expansion of sable habitat towards central and western areas (e.g. Free State, Eastern Cape and Northern Cape, as well as Lesotho), indicating that Hippotragus niger niger could gain suitable habitat following the forthcoming climatic changes. A more specific survey targeting the ecological requirements for the survival of the sable antelope is recommended to confirm the suitability of the above-mentioned areas and to support long-term conservation plans for the species. In conclusion, this study illustrates the usefulness of distribution range modelling for drawing up conservation plans for declining antelope species in southern Africa.


INTRODUCTION

Study species

Taxonomy

Sable antelope (Hippotragus niger) are large ruminant grazers of the Bovidae family (Table 1). Four subspecies are recognized based on morphology: Hippotragus niger niger, which is the focal taxon of this study, Hippotragus niger kirkii, Hippotragus niger roosevelti and the isolated giant sable, Hippotragus niger variani, from Angola (IUCN 2008). Hippotragus niger niger was first described by Harris in 1838 in the Magaliesberg, North West Province (South Africa). With the exception of the giant sable, which is confined to Angola, the precise distribution of each subspecies is still uncertain. However, Pitra et al. (2002) found a certain degree of genetic subdivision among the four subspecies in their analyses of mitochondrial DNA Control Region and Cytochrome-b sequences. In particular, H. n. roosevelti seems to represent the Kenya and East Tanzania clade, H. n. kirkii the West Tanzania clade and H. n. niger the southern Africa one. If such a spatial genetic structure were confirmed, this information would be fundamental when planning any translocation. Pitra et al. (2002) corroborated the findings of Matthee and Robinson (1999), as they found a genetic barrier between the east African and southern African (Angola, Zambia and Malawi southwards) clades.

Table 1. Taxonomic classification of sable antelope.

Kingdom: Animalia
Phylum: Chordata
Class: Mammalia
Order: Cetartiodactyla
Family: Bovidae
Subfamily: Hippotraginae
Genus: Hippotragus
Species: Hippotragus niger
Subspecies: Hippotragus niger niger


Geographical distribution

Sable antelope generally occur in south-eastern African savannah woodlands, while in certain areas they are associated with the well-watered Miombo woodland (Estes 2013). The distribution range of this species is not continuous. The giant sable (Hippotragus niger variani), which is represented by an isolated population, occurs in northern Angola between the Cuanza and Luando rivers (Wilson and Hirst 1977; IUCN 2008). Southern Kenya and south-eastern Angola represent the northern limit of the sable antelope range, while the Magaliesberg (North-West Province, South Africa) and the Crocodile and Komati Rivers (South Africa) mark the south-western and south-eastern limits, respectively (Skinner and Chimimba 2005). In particular, the sable antelope is thought to be native to Angola, Botswana, Congo, Kenya, Malawi, Mozambique, Namibia, South Africa, Tanzania, Zambia and Zimbabwe, while it has been introduced into Swaziland (Skinner and Chimimba 2005; IUCN 2008).

Hippotragus niger kirkii, which is also known as the common sable, occurs from north of the Zambezi River through Zambia, eastern Democratic Republic of Congo and Malawi to south-western Tanzania. Hippotragus niger roosevelti, or eastern sable, occurs in the hinterlands of southern Kenya, in eastern Tanzania and in northern Mozambique. The southernmost subspecies, Hippotragus niger niger, also known as the black sable, ranges across north-eastern South Africa, through Zimbabwe and northern Botswana, reaching the southern edge of the distribution in the Kruger National Park (Wilson and Hirst 1977; Skinner and Chimimba 2005). This subspecies exhibits the largest distribution range, as it occurs in Zimbabwe, Botswana, South Africa, Mozambique and Malawi (Estes 1991; Wilson and Reeder 2005).

Within South Africa, sable antelope naturally occur in the lowveld of eastern Mpumalanga, northern Limpopo, in the North West and Gauteng provinces (Fig. 1). They have been patchily reintroduced into many areas both inside and outside their historical range. Some sub-populations can be found outside the natural range in the Northern, Western and Eastern Cape, Free State and KwaZulu-Natal Provinces (Parrini et al. 2016).


Figure 1. Natural distribution range of the sable antelope in South Africa. Although the natural range of the sable antelope includes only the north-eastern part of the country (the provinces of Limpopo, North West, Gauteng and Mpumalanga), some subpopulations are patchily distributed in other provinces because they have been introduced outside the species' historical range. Image provided by the Department of Environmental Affairs – Republic of South Africa (January 2015).

Conservation status and population

Hippotragus niger is included in the International Union for the Conservation of Nature and Natural Resources (IUCN) Red List as Least Concern, as the current population is estimated at ca. 75,000 individuals; of these, about half occur in and around protected areas and one-quarter on private land (East 1999; IUCN 2008). Given the species' aesthetic appeal and its high value as a trophy animal, the overall population trend is more or less stable, because any decrease in the free-living population is balanced by continued growth in numbers on private farms and conservancies. However, some subpopulations remain vulnerable. This is the case of the giant sable population in Angola, which is listed as Critically Endangered and included in Appendix I of the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) (IUCN 2008). In South Africa, sable antelope are listed as Vulnerable according to the regional IUCN Red List assessment, and thus face a high risk of extinction in the wild (Parrini et al. 2016).

In southern Africa sable numbers have shown an alarming decline in the past decades (Ogutu and Owen-Smith 2003). Between 1983 and 1995, El Niño caused recurrent droughts in southern Africa (Ogutu and Owen-Smith 2003); a severe scarcity of water was simultaneously observed in Hwange (Zimbabwe), Gonarezhou (Zimbabwe) and Kruger (South Africa) National Parks. Between 1990 and 2001, the sable population declined by ca. 25% in these areas (Ogutu and Owen-Smith 2003; Owen-Smith et al. 2005; Dunham 2012), with the main cause of this decrease being the very low level of rainfall in the region (Crosmary et al. 2015). In the same period, two species closely related to the sable antelope, the roan antelope (Hippotragus equinus) and the tsessebe (Damaliscus lunatus), experienced a similar decrease in number in the Kruger National Park because of recurrent droughts (Harrington et al. 1999; Grant and van der Walt 2000). It is likely that a reduction in rainfall and an increase in temperature, with a consequent increase in water evaporation, reduced the availability of green foliage, especially through the critical dry period (Ogutu and Owen-Smith 2003). Unfortunately, conditions in Africa are predicted to get progressively hotter and drier by 2080, with the proportion of arid and semi-arid lands likely to increase by 5-8% (Boko et al. 2007). Overall, this might lead to remarkable shifts in habitat availability and, as a consequence, to a reduction in the number of subpopulations (Parrini et al. 2016).

Sable antelope have been eliminated from large parts of their former range by bushmeat hunting and poaching (Skinner and Chimimba 2005; IUCN 2008). Hippotragus niger is one of the most important trophy species in this region (Lindsey et al. 2006; Parrini et al. 2016), and the price of a sable hunt can be as high as $23,000 per individual (Booth 2009). Furthermore, the number of intensive breeding locations on private properties, which use captive-bred populations for sale at game auctions or for trophy hunting, has increased markedly in recent years in South Africa (Parrini et al. 2016). Although these captive-bred subpopulations have little contact with the wild ones, this activity carries multiple risks, including increased inbreeding (Grobler and van der Bank 1994), weakened adaptive potential and reduced disease resistance (Parrini et al. 2016). Moreover, sable antelope are generally sold as a flagship species for safaris (Crosmary et al. 2013). With the highly managed populations on private farms and in breeding camps progressively losing both genetic variability and the behavioural adaptations that are necessary to survive in the wild, the risk is that these populations will not be viable for reintroduction into the wild.

Poor habitat management includes land fragmentation, habitat degradation caused by anthropogenic impact (livestock ranching, incorrect fire management) and habitat loss due to agriculture and an expanding human population (East 1999; Skinner and Chimimba 2005). For example, in the Kruger National Park, which has experienced a very substantial decrease in sable numbers since 1986, the decline of rare antelopes such as roan and sable seems to have been influenced by the provision of artificial water points in the northern plains, where none had previously been present (Ogutu and Owen-Smith 2003). The provision of artificial water points attracted zebra (Equus burchelli) and wildebeest (Connochaetes taurinus) to areas where they had previously occurred in low numbers. This increase in prey numbers led to an influx of lions (Panthera leo) into the northern areas, exposing vulnerable species such as sable to predation (Harrington et al. 1999; Owen-Smith and Mills 2006). However, the lack of recovery despite the closure of water points has recently been explained by a combination of reduced herd size (and thus increased vulnerability to predation) and an Allee effect (lowered probability of finding mates) (Owen-Smith et al. 2012). Therefore, a high density of artificial water points can lead to both inflated grazing competition and predation, which, in turn, can facilitate the decline of the sable population (Harrington et al. 1999; Grant and van der Walt 2000).

Sable are particularly susceptible to potential direct and indirect competition with other grazers (East 1999; Skinner and Chimimba 2005). Given that sable are a low-density taxon, they risk being outcompeted by more abundant species that depress grass height through their own grazing (Macandza et al. 2012). At the same time, an increase in the abundance of high-density species in the areas used by sable can attract more predators (Owen-Smith and Mills 2006). Sable have a high dietary tolerance, but the presence of competitors and predators might restrict them to areas with low availability of nutritious food types (Parrini et al. 2016).

As reported by Wilson and Hirst (1977) in their study carried out in the Matetsi area of Zimbabwe, the density of the sable antelope averages around 4 individuals/km². Because sable also live in small groups, the species is particularly susceptible to severe demographic changes, as any loss, including one due to stochastic events, can be catastrophic (Grant and van der Walt 2000). In fact, even a small change (10%) in adult female sable survival can have serious consequences for overall population growth (Capon et al. 2013), as has also been recorded in many other large herbivores and in birds (Gaillard et al. 2000; Saether and Bakke 2000).

Within South Africa, the total sexually mature adult free-ranging population (60-70% of the whole population) amounts to 409-857 individuals, depending on whether formally protected areas outside the natural distributional range are considered or not (e.g., Free State, Northern Cape, Western Cape and Eastern Cape regions: Parrini et al. 2016). However, captive-bred individuals from private game reserves and wildlife ranches represent a very important part of the sable population in South Africa, as they correspond to roughly 68% of the overall subpopulations (Parrini et al. 2016). Unfortunately, these subpopulations do not necessarily add conservation value, as it has been estimated that only 2-10% of these individuals may still be considered wild (i.e., not dependent on direct human intervention: Parrini et al. 2016). Altogether, the new estimate of the wild and free-ranging population in South Africa would be 818-1,346 mature individuals. Some subpopulations in South Africa have become locally extinct, such as those in Songimvelo Nature Reserve (Mpumalanga) and Madikwe Game Reserve, whereas others are performing well, such as those of Loskop Dam Nature Reserve, Kgaswane Mountain Reserve and, surprisingly (because they lie outside the species' distribution range), the Free State provincial nature reserves (Parrini et al. 2016).

Physical description

The sable antelope is one of the larger species of African antelopes. It is 2.3-2.56 m in length, with an average shoulder height of 1.35 m and a tail of about 50 cm with a tuft at the end. Body mass averages 180-270 kg, with bulls averaging 230 kg (Wilson and Hirst 1977; Stuart and Stuart 2013). Both sexes have divergent horns that arch backwards: their average length is 102 cm for bulls and 80 cm for cows (Wilson and Hirst 1977; Stuart and Stuart 2013). In this respect, Crosmary et al. (2013) showed that male sable horn length has declined by 6% in the last three decades because of trophy hunting. The same trend has been observed in male impalas (Aepyceros melampus), while greater kudus (Tragelaphus strepsiceros), surprisingly, showed an increase in horn length during the study period. Given that trophy fees do not increase with trophy size, hunters preferentially target the largest-horned males, imposing artificial selection towards smaller horns through time. Horn length is, indeed, a heritable character (Crosmary et al. 2013). This activity could favour individuals with smaller sexually selected traits (Coltman et al. 2003), thus reducing individual fitness (Hartl et al. 2003) as well as the genetic variability of sable populations (Scribner et al. 1989).

Another trait that differentiates male from female sable is the colour of the coat: adult bulls are shiny black, while cows and younger bulls are usually reddish brown above. However, in the southern populations females may also show a brown to black coat (Estes 1999). Both sexes have sharply contrasting white underparts and a mainly white face with black markings. Moreover, there is an erect, fairly long mane running from the top of the neck to just beyond the shoulders (Fig. 2) (Wilson and Hirst 1977; Stuart and Stuart 2013).


Home range extent

Annual home range sizes for sable antelope have been recorded in different areas (Wilson and Hirst 1977; Grobler 1981; Magome 1991; Parrini 2006; Magome et al. 2008). They vary between 2.4 km² (Rhodes Matopos National Park, Zimbabwe: Grobler 1981) and 196 km² (Kruger National Park: Henley 2005). Advances in technology, the use of different home range estimation techniques and different environmental conditions can explain the variation in home range extent. However, sable home ranges can also be heavily influenced by other factors, such as the local terrain, vegetation type, availability of water and intra-specific issues (Wilson and Hirst 1977).

Habitat and resource use

In contrast with the general pattern for African grazers described by East (1984), sable antelope are more prevalent on granitic and sandstone substrates than in regions underlain by basalt and gabbro, although the latter generate more fertile soils and, as a consequence, more nutrient-rich grasses (Grant and Scholes 2006; Chirima et al. 2013). Granitic rocks give rise to soils with a high sand content that are deficient in nutrients, especially in areas with heavy rainfall (Bell 1984), whereas soils in basaltic areas are usually richer in nutrients and more fertile (Venter et al. 2003).

Sable antelope can be referred to as an ecotonal species (woodland-grassland ecotone: Estes and Estes 1974). Sekulic (1981) identified three factors influencing the landscape choice of sable: density of trees and bushes (sable prefer open areas with sparsely distributed trees and bushes), occurrence of plateaus or hills (sable frequently use mid-slopes) and grass height (sable show a preference for medium-tall grass). As far as the density of trees and bushes is concerned, sable reportedly forage in open savannah woodlands, where they rely on thickets for shade and open valleys for grazing, whereas thick savannah woodlands are generally avoided (Wilson and Hirst 1977; Grobler 1981; Sekulic 1981; Magome 1991; Skinner and Chimimba 2005; Parrini 2006). In a recent study, Chirima et al. (2013) suggested that sable are most abundant in sour bushveld and mopane savannah woodland, whereas they are almost entirely absent from knob-thorn marula parkland. Sable antelope show seasonal habitat selection. Some studies carried out in the Okavango Delta Region (northern Botswana) point to a shift of grazing to upland grasslands in the wet season, possibly due to displacement by competing grazers (Hensman et al. 2012; Hensman et al. 2013). However, in many other cases it has been demonstrated that during the wet season sable seem to choose habitats such as miombo (Brachystegia) woodlands, the ecotone between woodlands and grasslands (Estes and Estes 1974) and vlei grassland (Parrini 2006). This choice is driven by the fact that in floodplain grasslands the floodwater subsides from the late dry season onwards, and for this reason these habitats hold grass species with a higher nutrient content. In the dry season sable select green marshes (Estes 1991), valley savannahs and thickets (Magome 1991) and open woodland areas (Parrini 2006).

Seasonality also affects the topographic habitat choice of sable antelope. Indeed, they tend to use bottomland areas during the dry season and slopes in the wet season (Estes and Estes 1974; Magome 1991; Parrini and Owen-Smith 2010). Estes and Estes (1974) suggested that the avoidance of slopes (and hence the selection of valley habitats) in the dry season can be ascribed to the occurrence of green leaves in valleys. In fact, topography affects grass quality through its influence on the distribution of nutrients. These move down with water along a slope gradient and accumulate in bottomland areas, thus allowing retention of green foliage and stimulating the build-up of structural carbohydrates that ultimately dilute the high nutrient concentration (Bell 1970; Scoones 1995; Scholes et al. 2003). A series of studies demonstrated that sable depend on grassy valleys referred to as “dambos”, which provide them with green forage through the dry season, especially following fires (Estes and Estes 1974; Grobler 1981; Magome et al. 2008; Parrini and Owen-Smith 2010). In fact, sable generally select a diet high in crude protein and low in fibre, often making use of recently burnt patches with new grass growth (Parrini 2006; Magome et al. 2008). The use of new green regrowth on burns allows sable to maintain high protein levels in their diet and to compensate for the low-quality food typical of the dry season (Magome 1991; Parrini 2006). As a plant matures, it gradually accumulates structural tissues, which impose digestive constraints on the foraging herbivore (Van Soest 1987). It is therefore more advantageous for a herbivore to concentrate its foraging on young plants and to avoid senescent material, as happens in burnt areas.

Grass height selection in herbivores is driven by muzzle structure (Owen-Smith 1982). Among grazers, some species favour short grass, aided by their relatively broad muzzles, while others with narrower muzzles depend on taller grass (Gordon and Illius 1988; Arsenault and Owen-Smith 2008). According to the results of a quantitative analysis of muzzle shape in ungulates performed by Gordon and Illius (1988), sable have relatively narrow mouth dimensions compared to other ruminants of similar body size. Consequently, they are able to display high levels of selectivity (Wilson and Hirst 1977; Grobler 1981; Magome 1991). Sable have been found to feed preferentially on fresh growth (Estes and Estes 1974; Sekulic 1981; Parrini and Owen-Smith 2010), spanning a height range of 4-40 cm (Grobler 1981; Parrini and Owen-Smith 2010).

Sable antelope generally select leafy, palatable grasses such as Chrysopogon serrulatus, Panicum maximum, Heteropogon contortus and Themeda triandra (Parrini 2006; Magome et al. 2008; Le Roux 2010; Macandza et al. 2012). During the wet season, most grasses of high forage value are readily accepted by sable, for instance Panicum maximum, Digitaria eriantha and Themeda triandra (Parrini 2006; Le Roux 2010; Macandza et al. 2012). In the dry season, by contrast, sable also accept grasses of low-to-moderate forage value, such as Hyperthelia dissoluta (Le Roux 2010) and Aristida meridionalis (Hensman et al. 2013), thus including brown material in their diet. Stems contain considerably higher levels of structural carbohydrates than leaves and are harder to digest, yet they can still be part of the sable diet during the most unfavourable season (Bell 1970; Hensman et al. 2013). Sable have also been observed to switch partly to browsing during the critical dry season, when sedges, which remain greenest, alone represent 10% of the diet of the species (Hensman et al. 2013).

Drinking water requirements affect the spatial distribution and habitat selection of grazers, especially during the dry season (Ogutu et al. 2010). However, there is contrasting evidence on the dependence of sable on water source availability. Older studies suggest that sable antelope are highly water dependent: they drink every day and never move farther than 2.5 km from a permanent water source (Skinner and Chimimba 2005). In contrast, more recent studies carried out in the Kruger National Park found that sable herds drink only every 2-3, and up to every 3-4, days and move up to 7 km from water sources, particularly during the dry season (Rahimi and Owen-Smith 2007; Cain et al. 2012). In this way sable avoid encountering other grazers and the predators associated with them.

Behavioural traits

Hippotragus niger is a gregarious species, typically occurring in herds of 15-30 individuals, which occupy fixed, spatially discrete home ranges (Wilson and Hirst 1977; Skinner and Chimimba 2005; Parrini 2006), even if larger groups occasionally come together (Stuart and Stuart 2013). Herds are usually composed of a dominant bull, adult cows, sub-adult and juvenile females, young bulls (younger than 24 months) and calves (Bothma and van Rooyen 2005). In every herd there is a strong social order, in which an alpha or dominant female leads the group to feeding areas or water points (Wilson and Hirst 1977; Skinner and Chimimba 2005). Generally, bulls establish territories overlapping those of nursery herds, which are composed of cows and young animals (Stuart and Stuart 2013). Young bulls leave the nursery herd when they reach 24 months of age to form small bachelor herds until they are old enough to establish their own territory (Grobler 1974).

The average longevity of sable antelope is 17-20 years (Stuart and Stuart 2013). The age of first reproduction differs markedly between male and female sable. Although both sexes reach sexual maturity at two years of age, young bulls are prevented from breeding by territorial adult males until they establish their own territory at about six years of age (Grobler 1980). Sable females, instead, can usually calve for the first time in their third year of life and can reproduce up to the age of ten in the wild (Wilson and Hirst 1977; Grobler 1980). Sable antelope are seasonal breeders, dropping their calves in January-March, although this varies according to the area. Gestation usually lasts 270 days; sable give birth to a single calf, and the newborn weighs 13-22 kg (Stuart and Stuart 2013). In suitable areas, sable populations can increase by up to 13% per annum, as it is common for each adult cow to become pregnant in the breeding season (Sekulic 1981; Parrini 2006; Capon et al. 2013).
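To give a sense of what a 13% annual increase implies, the short sketch below projects a population forward under constant geometric growth. It is an illustration only: the starting size of 100 animals, the ten-year horizon and the function names are assumptions, and real populations would not sustain the maximum rate indefinitely.

```python
import math


def project_population(initial_size, annual_rate, years):
    """Return the population size for each year under constant geometric growth."""
    return [initial_size * (1.0 + annual_rate) ** year for year in range(years + 1)]


def doubling_time(annual_rate):
    """Years needed for the population to double at a constant annual rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


# Hypothetical starting herd of 100 animals growing at the maximum reported rate.
trajectory = project_population(100, 0.13, 10)
years_to_double = doubling_time(0.13)

print(f"Population after 10 years: {trajectory[-1]:.0f}")
print(f"Doubling time: {years_to_double:.1f} years")
```

Under these assumptions the population doubles in a little under six years, which is why even modest changes in survival or fecundity can matter for small subpopulations.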


Species Distribution Modelling

General background

The relationship between a given species and its environment is a central issue in ecology. In the last three decades, interest in plant and animal Species Distribution Models (SDMs) has grown dramatically, as shown by the rapid rise in the scientific literature on this topic and by its application in many fields of biology (Guisan and Zimmermann 2000; Guisan and Thuiller 2005). The advantage of SDMs is that they make it possible to fill geographical gaps in the information about the distribution of species and, thanks to the increased availability of inexpensive and powerful computers, they are developing quickly (Phillips et al. 2006). Moreover, by integrating such modelling with Geographical Information System (GIS) technology, it is possible to extrapolate biological distributions across large regions, thereby providing detailed information for a wide range of environmental applications (Ferrier et al. 2002). In recent years SDMs have become increasingly important for predicting the distribution of species and for using the resulting information in ecology, biogeography, evolution and, more recently, conservation biology and climate change research (Guisan and Thuiller 2005).

Species distribution models are empirical models that relate field observations of species occurrence - presence or abundance - to environmental predictor variables, based on statistically or theoretically derived response surfaces (Guisan and Zimmermann 2000; Elith et al. 2006). In other words, they identify the environmental conditions suitable for a species and where suitable environments are distributed in space, in order to estimate the actual or potential geographic distribution of the species (Pearson 2007). SDMs have a high level of flexibility in fitting complex responses (Elith et al. 2006). These models predict the environmental suitability for a species as a function of the given environmental variables (Phillips et al. 2006; Pearson 2007). However, it is important to note that environmental suitability does not imply the occurrence of the species, because the latter can be absent from a very suitable habitat, e.g. for historical or biological reasons (Holt 2003). Similarly, a given taxon can be present in a very unsuitable environment, e.g. due to immigration processes (Pulliam 2000). Therefore it is important to keep in mind that models are not perfect; they attempt to predict a species distribution according to a set of environmental variables and occurrence record inputs. So, when a model produces its output, the results must be interpreted with care, taking into account the purpose of the study and the intended application of the model output. Moreover, models cannot substitute for the detailed, ongoing collection of field data on species distribution, demography, abundance and interactions; they can only complement such information (Guisan and Thuiller 2005).

Theoretical background

Environmental conditions suitable for the survival of a given species can be identified in two different ways. In fact, models can use either a mechanistic or a correlative approach.

Mechanistic models base predictions on real cause-effect relationships, incorporating physiologically limiting mechanisms in a species tolerance to environmental conditions (Guisan and Zimmermann 2000). These models stand out for the theoretical correctness of their response; given their nature, they cannot be incorrect (Pickett et al. 1994; Korzukhin et al. 1996). The drawback of this kind of model is that it requires detailed knowledge of the physiological responses of the species to environmental factors and, therefore, can only be applied to very well-known species (Pearson 2007). One case where mechanistic models have been successfully applied is the modelling of the distributions of North American tree species (Chuine and Beaubien 2001).

Models that use a correlative approach associate known occurrence records to a suite of environmental variables, which are expected to affect the physiology and probability of persistence of the species, to estimate environmental conditions that are suitable for that species. Basically, the observed distribution of a species provides information to the model about the environmental requirements for the species occurrence (Pearson 2007). In other words, such models assume direct deterministic relationships between species distribution and mapped environmental variables (Ferrier et al. 2002). Of course, correlative models have limitations. One important assumption of these models is that a species occurs in all suitable areas, while it is absent from all unsuitable ones (Pearson 2007).

However, these conditions are not always met, as a given species often faces other constraints that affect its distribution, such as biotic interactions or its own dispersal ability (Araújo and Pearson 2005). In these cases, correlative models could be less accurate than mechanistic models, being unable to cope with such additional constraints. Nevertheless, given that they provide valuable biogeographical information (Raxworthy et al. 2003; Bourg et al. 2005) and that spatial occurrence records are available for a large number of species, the vast majority of species distribution models are correlative (Pearson 2007). Moreover, if high predictive precision is required to model the distribution of biological entities on a large spatial scale under some environmental conditions, then static correlative models represent a valid and powerful approach (Guisan and Zimmermann 2000).

Both correlative and mechanistic models identify areas considered to be suitable for the species, yet they do not provide information on areas that are actually occupied (Fig. 3). To understand how a model output is generated, it is important to first review some concepts concerning the distribution of a species over a certain space. Pearson (2007) defines the geographical space of a species as the space where the species really occurs, i.e. the spatial locations commonly identified using x and y coordinates (Fig. 3). Species also occur in an environmental space, which is a conceptual space defined by the environmental variables to which the species responds (Fig. 3). Although models describe the suitability of areas within an environmental space, the results are then projected into geographical space to obtain a geographic area for the predicted presence of the species (Pearson 2007).

The concept of environmental space is linked to the concept of niche (Pearson 2007). The fundamental niche of a species is the “n-dimensional hypervolume” in environmental space where a species can maintain a viable population and persist (Hutchinson 1957; represented by the solid line in environmental space in Fig. 3). In this “n-dimensional hypervolume” each point corresponds to a state of the environment that is suitable for the species to occur. Hence, the fundamental niche represents the full range of abiotic conditions within which the species can persist (Pearson 2007). Unfortunately, in many cases species are not able to occupy all suitable sites, which means they are not present in their whole fundamental niche, e.g. because of competition with other species (Pearson 2007) (see the non-shaded area around label E in Fig. 3). The term realized niche is then adopted to identify the portion of the fundamental niche from which the species is not excluded due to biotic competition (Hutchinson 1957). The fundamental niche depends on climate, habitat and resource requirements, while the realized niche also includes relationships with competitors (Soberón 2007). However, competition is not the only reason why a species could be excluded from a portion of its fundamental niche. To emphasize this point, Pearson (2007) defined the occupied niche of a species (represented by the grey area in the environmental space in Fig. 3). The latter takes into account all constraints imposed on the actual distribution of a species, including geographical and historical ones, such as (I) the limited ability of the species to reach or re-occupy all suitable areas (e.g., due to geographical barriers to dispersal) (Holt 2003), (II) biotic interactions of all forms (competition, predation, parasitism and symbiosis), and (III) human modification of the environment (Phillips et al. 2006). Therefore, the occupied niche is smaller than the realized niche, which, in turn, is usually smaller than the fundamental niche.

One more aspect to take into account when dealing with niche concepts is that some studies aim to investigate only one part of the fundamental niche using a limited set of predictor variables. For example, in the investigation of potential impacts of future climate change, it is fairly common to focus only on how climate variables affect species distributions (Pearson 2007). A species niche defined only in terms of climate variables is referred to as the climatic niche (Pearson and Dawson 2003). When the climatic niche is mapped onto geographical space, it is termed the bioclimate envelope (Pearson and Dawson 2003).


Figure 3. Illustration of the hypothetical distribution of a species in geographical and environmental space. Environmental space is represented here for simplicity in only two dimensions (where e1 and e2 are two environmental factors). Crosses represent observed species occurrence records. Some areas of the distribution do not include known localities because the species may not have been detected there yet, such as area A in geographical space and the shaded area immediately around label D, which, when projected back to geographical space, corresponds to area 2. Some areas of the potential distribution may not be inhabited by the species even if environmentally suitable, for example because of biotic interactions (e.g. competitors) (the non-shaded area around label C), dispersal limitation (e.g. geographic barriers) (area B in geographical space) or because the species has been extirpated from the area (e.g. human modification of the landscape) (area 3). In environmental space the model may identify neither the occupied niche nor the fundamental niche of the species; rather, it identifies only that part of the niche defined by the observed records. When projected back into geographical space, the model will identify parts of the actual distribution and of the potential distribution (area 1 is the known distributional area). Image adapted from Pearson (2007).

In SDMs, the potential distribution of a species consists of those areas that satisfy the fundamental niche requirements of the species, projected onto geographical space (Anderson and Martinez-Meyer 2004; Phillips et al. 2004; Phillips et al. 2006; represented by the light blue solid line in geographical space in Fig. 3). It is important to stress that the fundamental niche - i.e., the potential distribution of a species - can only be derived by means of the mechanistic modelling approach (Kearney et al. 2008). As stated earlier, the different ecological niches are located on a gradient stretching from the fundamental to the occupied niche (Jiménez-Valverde et al. 2008). The exact position of a model output on this niche gradient depends on the species biology, the spatial resolution, the variables included in the model and the modelling approach (Soberón 2010). For example, static correlative predictive models, which are generally based on large empirical field data sets, are likely to predict only the realized niche (Guisan and Zimmermann 2000). As the observed distributions are already constrained by biotic interactions, which limit resources and dispersal ability of the species, correlative SDMs actually quantify the occupied niche (Guisan and Thuiller 2005). Similarly, the model identifies only some parts of the actual and potential distribution when projected back into geographical space. Therefore, SDMs are not able to predict the full extent of either actual or potential distribution ranges (Pearson 2007).

Model design

Since only a few species have been studied in detail in terms of their dynamic responses to environmental change, correlative models are applied in the majority of SDM studies (Woodward and Cramer 1996). In this section, the most important steps in the design of correlative SDMs will be discussed. Three main components are needed (Austin 2002): a data model (explaining the method by which presence localities have been collected), a statistical theory to identify the relationships between species occurrence records and the environmental variables, and a model based on ecological theory (see Theoretical background).

Data model

Correlative SDMs require two types of data input: known occurrence records (biological data) and a suite of variables that describe the environment in which the species is found (Pearson 2007). Data used for distribution modelling are usually stored in a GIS as a grid of cells and are termed raster data.

An important aspect to take into account when collecting data is the spatial scale at which the model will operate. The latter is a central and recurrent issue to be solved in SDMs (Pearson et al. 2004). Spatial scale has two components: extent, which refers to the size of the region over which the model is run, and resolution, which refers to the size of the cells of the grid (Guisan and Thuiller 2005). In SDMs the spatial resolution of biological and environmental data must necessarily be the same (Guisan and Zimmermann 2000). The choice of a proper spatial scale is essential because patterns observed on one scale may not be apparent on another scale (Guisan and Zimmermann 2000). For example, if only part of an important environmental gradient is sampled because of an inappropriate choice of the spatial scale (e.g., using political instead of natural boundaries), this can lead to an incorrect interpretation of the results (Thuiller et al. 2003). The choice of a scale is also related to the type of species considered, for example in terms of detectability and prevalence in the landscape (Guisan and Thuiller 2005). For sessile or very locally mobile organisms, a finer resolution usually provides better predictions (Brotons et al. 2004). For highly mobile organisms, various types of habitat might need to be included in each cell in order to fulfil their ecological requirements. Hence, in this case the use of a coarser resolution could be more appropriate (Jaberg and Guisan 2001; Guisan and Thuiller 2005).
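The relationship between fine and coarse resolution can be illustrated with a minimal sketch that aggregates a raster grid by averaging blocks of cells. The function name `coarsen`, the aggregation factor and the rainfall values are hypothetical and serve only as an illustration; they are not taken from the data used in this study.

```python
import numpy as np

def coarsen(grid, factor):
    """Aggregate a 2-D raster to a coarser resolution by averaging
    non-overlapping factor x factor blocks of cells."""
    rows, cols = grid.shape
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    # Split each axis into (number of blocks, cells per block).
    blocks = grid.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

# Hypothetical 4 x 4 rainfall raster (invented values).
fine = np.array([
    [10.0, 12.0, 30.0, 32.0],
    [14.0, 16.0, 34.0, 36.0],
    [50.0, 52.0, 70.0, 72.0],
    [54.0, 56.0, 74.0, 76.0],
])

# Each coarse cell now summarises four fine cells.
coarse = coarsen(fine, 2)
print(coarse)
```

Averaging is only one possible aggregation rule; for categorical layers such as land cover, a majority rule would be a more natural choice.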

Latitude-longitude occurrence data usually provide the information on species distribution (Margules and Austin 1994; Soberón et al. 2000; Phillips et al. 2006). These can be categorized as presence-only data, if only records of localities where the species has been recorded are available, and presence-absence data, if records of both presence and absence of the species are known (Ferrier et al. 2002; Guisan and Thuiller 2005; Pearson 2007). The inclusion of absence records in the model improves its performance (Brotons et al. 2004). In fact, most research on the development of distribution modelling techniques has focused on creating models using presence-absence data, typically collected only by rigorously designed surveys (Guisan and Zimmermann 2000; Ferrier et al. 2002; Elith et al. 2006). However, occurrence data for most species have been recorded without planned sampling schemes, so absences are not available (Ferrier et al. 2002; Elith et al. 2006; Phillips et al. 2006). This is, for example, the case for museum or herbarium collections, for records of rare species that are difficult to detect, and for taxa inhabiting remote regions (Anderson et al. 2002; Ferrier et al. 2002; Graham et al. 2004; Soberón and Peterson 2005). For this reason, the majority of SDMs are built on presence-only data. Both presence-absence and presence-only data can have biases associated with them (Pearson 2007). One such bias is false absences, when a species could not be detected although present or it was absent despite the occurrence of suitable environmental conditions (Pearson 2007). In this case, the model can register a record denoting unsuitable environmental conditions, even though the environment is suitable. Other sources of bias and error are incorrect species identification, inaccurate spatial referencing of records, sampling concentrated only in easily accessible locations, and errors in the transcription of the data (Graham et al. 2004; Phillips et al. 2004; Phillips et al. 2006). For example, if a sample is representative of the geographical space but not of the environmental space, the model fails to identify all the main environmental variables driving the species distribution (Phillips et al. 2006). Therefore, a sample should be representative of both the geographical and the environmental space in order to avoid spatial autocorrelation and to provide an accurate picture of the environments inhabited by the species (Phillips et al. 2006; Pearson 2007).

The variables used in a distribution model are unlikely to define all possible dimensions of the environmental space, representing only a subset of the factors that influence a species distribution (Pearson 2007). For this reason, the choice of the environmental characteristics to be included in the model is a critical step, since the selected features are assumed to represent all those that constrain the distribution of the species (Phillips et al. 2006; Pearson 2007). There is no lower or upper limit on the number of environmental factors that can be included in a distribution model. Some studies worked well with only three (Huntley et al. 1995), while others used up to 14 environmental variables (Phillips et al. 2006). However, when deciding on the number of variables to use in a model, it is important to consider the number of occurrence points available. Models with a large number of environmental factors tend to overfit small occurrence data sets, although they provide more accurate results for large ones (Phillips et al. 2004).

Environmental variables can be either biotic or abiotic (Heikkinen et al. 2007; Pearson 2007). Biotic factors refer to the biological constraints represented by intra- and inter-specific interactions (e.g. competition, predation, resource availability), while abiotic factors refer to environmental and physiological constraints (e.g. temperature, rainfall, elevation, soil type, metabolic rate, growth rate) (Austin et al. 1984; Guisan and Zimmermann 2000). The majority of models are built using abiotic environmental variables, although some also include biological ones. For example, Heikkinen et al. (2007) used the distribution of woodpecker species to predict the distribution of owls in Finland, since woodpeckers excavate cavities in trees that provide nesting sites for owls.

Statistical model

Algorithm choice

Once the data model is completed, the species occurrence records and the environmental variables are entered into an algorithm that detects complex non-linear relationships in a multi-dimensional environmental space to identify the environmental conditions associated with the species occurrence (Johnson and Omland 2004; Pearson 2007). A number of approaches are available to estimate the probability of the species occurrence or absence as a function of a set of environmental variables (Pearson 2007).

Different statistical models could yield different potential distribution predictions (Loiselle et al. 2003; Brotons et al. 2004; Segurado and Araújo 2004; Elith et al. 2006; Pearson et al. 2006; Pearson 2007). Elith et al. (2006) compared 16 modelling methods using 226 animal and plant species across six regions of the world. They found that there were differences in the performance of the models with some consistently outperforming others. Several factors may lead to differences among predictions: I) the ability of the model to identify complex relationships in the data, such as interactions among environmental variables; II) the inclusion of presence-absence or presence-only data (Brotons et al. 2004; Pearson et al. 2006); III) in the case of presence-only data, the inclusion of solely presence records, background data or pseudo-absences (Elith et al. 2006); IV) the use of a parametric or non-parametric algorithm (Segurado and Araújo 2004); V) the method by means of which the model extrapolates beyond the range of data used for its calibration (Pearson et al. 2006); VI) the incorporation of categorical environmental variables in the model (Pearson 2007). Moreover, models can also differ in the form of their output because some produce continuous predictions (a probability value ranging from 0 to 1), while others provide binary predictions (with 0 indicating unsuitable and 1 suitable environment) (Pearson 2007).

As previously mentioned, one key factor that differentiates the various algorithms is whether they require species absence data. Models based on group discrimination techniques operate by comparing sites where the species has been detected against those where the species is absent (e.g. Generalized Linear Models - GLM, Generalized Additive Models - GAM, Artificial Neural Network - ANN). Models based on profile techniques rely only on presence data (Stokland et al. 2011). Profile techniques comprise three types of presence-only methods (Pearson 2007): those relying solely on presence records (BIOCLIM and DOMAIN), which provide predictions without any reference to other samples from the study area; those using “background” environmental data for the entire study area (MAXENT and ENFA), which focus on how the environment where the species is known to occur relates to the environment across the rest of the area, the “background”; and those sampling “pseudo-absences” (a random selection of locations) from the study area, which assess differences between occurrence localities and a set of other localities used in place of real absence data (Pearson 2007).

Boosted Regression Trees (BRT) are a group discrimination technique that requires presence-absence data and draws its power from the combination of regression trees and boosting (Stokland et al. 2011). Regression trees are models that relate a response to predictors by recursive binary splits (Stokland et al. 2011). Boosting is a method for combining many simple models into models with improved predictive performance (Elith et al. 2006; Stokland et al. 2011). Hence, the final BRT model consists of numerous simple trees and can be interpreted as an additive regression model (Pearson 2007).

Statistical methods such as Generalized Linear Models (GLMs) and Generalized Additive Models (GAMs) are also commonly used for modelling with presence-absence data (Phillips et al. 2006). GAM is an extension of GLM, which, in turn, is an extension of ordinary linear regression. Linear regression fits linear functions relating a response (dependent) variable to one or more predictor (independent) variables, assuming the relationship between the response variable and each of the predictors to be a straight line (Elith et al. 2006). GLMs use logistic regression to model the relative probability of presence of the species in response to a gradient of environmental factors (Stokland et al. 2011). GAMs use the same general process as GLMs, except that the effect of each predictor variable is specified as a non-parametric smooth function estimated from the data (Elith et al. 2006). Both GLMs and GAMs are extensively used in species distribution modelling because of their strong statistical foundation and their ability to realistically model ecological relationships (Austin 2002). However, GAMs generally outperform GLMs (Pearce and Ferrier 2000; Araújo et al. 2005) and both GLMs and GAMs perform better than BRT models (Brotons et al. 2004).

Artificial Neural Network (ANN) models, which also require presence-absence data, provide projections that are generally more accurate than those of the majority of modelling methods (Araújo et al. 2005). ANNs are computer systems that have increasingly been employed in ecological studies as an alternative to more traditional statistical techniques (Lek and Guegan 1999). Inspired by the structure of the brain, ANNs consist of many processing elements (artificial neurons) that are interconnected to form a network (Pearson et al. 2004). ANNs are “trained” by repeatedly going through large numbers of known examples of the problem under consideration. By repeatedly adjusting the connections between processing elements, the difference between the network predictions and the known examples can be minimised (Pearson et al. 2004). The main advantages of ANNs are that they are able to identify non-linear responses to environmental variables and to incorporate multiple types of input variables, either categorical or continuous (Pearson et al. 2004). A drawback of these types of models is that it is not easy to identify the relative contribution of different input variables (Gevrey et al. 2003). Spatial Evaluation of Climate Impact on the Envelope of Species (SPECIES) is the best-known species distribution model belonging to the ANN group; it allows the characterization of bioclimatic envelopes based on inputs generated from climate and soil data (Pearson et al. 2004).

BIOCLIM and DOMAIN belong to the group of models that use presence-only data (Phillips et al. 2006). BIOCLIM derives values for indices that summarize the climatic conditions at locations where a target taxon has been recorded. Subsequently, a matching algorithm is run to identify areas with climatic conditions similar to those at the original record locations (Nix 1986). The identified areas are then plotted on a map to provide the potential distribution of the target species (Nix 1986). One of the major assumptions of BIOCLIM is that all climatic variables are independent. While this can be true at a continental and subcontinental scale, it is very unlikely at a finer spatial scale (Nix 1986) and may result in an overestimation of the species distribution (Pearce and Lindenmayer 1998). DOMAIN gives a predicted suitability index by computing the minimum distance in environmental space between any record of occurrence and new sites (Pearce and Lindenmayer 1998). While BIOCLIM is an envelope-style method, DOMAIN is a distance-based one (Elith et al. 2006).

Genetic Algorithm for Rule-set Prediction (GARP) models also use presence-only data to infer the distribution of a given species. GARP produces a binary prediction based on a collection of rules that rely on the similarity (suitable areas) or lack of similarity (unsuitable areas) between the environmental conditions at each site and those at recorded occurrence localities (Phillips et al. 2006). GARP has been used extensively in recent years to study diverse topics such as global warming (Thomas et al. 2004), infectious diseases (Peterson and Shaw 2003) and invasive species (Peterson and Robbins 2003).
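The difference between an envelope-style method and a distance-based one, as described above for BIOCLIM and DOMAIN, can be sketched in a few lines. This is a simplified illustration and not the implementation of either package: the function names, the percentile limits, the use of a range-standardised (Gower-type) distance and the example values of temperature and rainfall are all assumptions made for the sake of the example.

```python
import numpy as np

def bioclim_envelope(presence_env, sites_env, lower=5, upper=95):
    """Envelope-style rule: a site is suitable only if every variable
    lies within the chosen percentile limits of the presence records."""
    lo = np.percentile(presence_env, lower, axis=0)
    hi = np.percentile(presence_env, upper, axis=0)
    return np.all((sites_env >= lo) & (sites_env <= hi), axis=1)

def domain_similarity(presence_env, sites_env):
    """Distance-based rule: 1 minus the smallest range-standardised
    distance between each site and any presence record."""
    value_range = presence_env.max(axis=0) - presence_env.min(axis=0)
    # diff has shape (sites, presence records, variables).
    diff = np.abs(sites_env[:, None, :] - presence_env[None, :, :]) / value_range
    distance = diff.mean(axis=2)
    return 1.0 - distance.min(axis=1)

# Hypothetical presence records: (mean temperature in C, annual rainfall in mm).
presence_env = np.array([
    [20.0, 600.0], [22.0, 700.0], [24.0, 800.0], [21.0, 650.0], [23.0, 750.0],
])
# Two hypothetical candidate sites: one typical, one hot and dry.
sites_env = np.array([[22.0, 700.0], [30.0, 300.0]])

inside = bioclim_envelope(presence_env, sites_env, lower=0, upper=100)
similarity = domain_similarity(presence_env, sites_env)
print(inside, similarity)
```

The envelope rule returns a yes/no answer for each site, whereas the distance-based index is continuous and decreases as a site becomes less similar to the nearest presence record.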

The Maximum Entropy model (MAXENT) is a general-purpose machine learning method for making predictions or inferences from incomplete information (Pearson 2007). Its origins lie in statistical mechanics (Jaynes 1957), but it can be applied to different subjects, such as astronomy, portfolio optimization, image reconstruction, statistical physics and signal processing (Phillips et al. 2006). In an ecological context, MAXENT can be used as a general approach for modelling species distributions, and it is suitable for presence-only datasets (Phillips et al. 2006). MAXENT does not require absence data in order to model species distributions and relies on background environmental data for the entire study area (Pearson 2007). MAXENT estimates the probability distribution of a species across a study area based on the principle that the estimated distribution must agree with everything that is known or inferred from the environmental conditions where the species has been observed (Pearson 2007). MAXENT estimates a target probability distribution by finding the distribution of maximum entropy, that is, the most spread out (i.e. closest to uniform), subject to a set of constraints imposed by the information available regarding the observed distribution of the species and the environmental features across the study area (Jaynes 1957; Phillips et al. 2004; Phillips et al. 2006; Pearson 2007).

The applicability of the maximum entropy concept to species distribution models is supported by thermodynamic theories of ecological processes (Aoki 1989; Schneider and Kay 1994). The second law of thermodynamics states that in systems without any outside influence, processes move towards an increase in disorder, thus in a direction that maximises entropy (Feynman 2001). Therefore, in the absence of influences other than those included as constraints in the model, the geographic distribution of a species will tend towards the maximum entropy (Phillips et al. 2006). However, due to the nature of the models themselves, the species distribution provided by the model is only an approximation of the real one (π). This approximation is itself a probability distribution, called π′. The entropy of the probability distribution π′ is defined as (Phillips et al. 2006):

$$H(\pi') = -\sum_{x \in X} \pi'(x) \, \ln \pi'(x)$$

Equation 1

where:

H(π′) is the entropy of the probability distribution π′,

π′(x) is the probability that π′ assigns to the point x of the study area X,

x is any point (pixel) of the study area X; the occurrence records are a sample of these points.

π′ is defined over a finite set X, which represents the set of pixels of the study area. The individual elements of X are referred to as points, and the occurrence records of the target species are a sample of these points. The distribution π′ assigns a non-negative probability π′(x) to each point x in X, and these probabilities sum to 1. The environmental variables constraining the probability distribution π′ are called features, and each feature f_j takes a real value f_j(x) at each point x in X, the geographical space (Phillips et al. 2006). The entropy is non-negative and is at most the natural log of the number of elements in X (Phillips et al. 2004; Phillips et al. 2006). From Equation 1, it is evident that the entropy reaches its maximum value when all points x in X have equal probability (Shannon 1948). Looking for a system with the maximum level of entropy means looking for a system with the maximum level of uniformity, in which the objects contained in the system are disordered and completely interspersed instead of being clustered (Feynman 2001). Entropy is also a fundamental concept in information theory, a discipline that aims to quantify the amount of information with respect to its storage or transmission over an appropriate channel (Shannon 1948). In information theory, entropy is the quantity measuring the amount of data available or how much choice is involved in the selection of an event (Shannon 1948). Thus, a distribution with higher entropy involves more choices: it agrees with everything that is known, being controlled only by the well-founded constraints that are given to the model as input, while unfounded constraints do not affect the maximum entropy probability distribution π′, which avoids assuming anything that is unknown (Jaynes 1990; Phillips et al. 2006).
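The bounds on the entropy stated above can be checked numerically with a short sketch. The function name `entropy` and the two four-pixel distributions are hypothetical illustrations; the calculation itself follows Equation 1, with the usual convention that points of zero probability contribute nothing to the sum.

```python
import numpy as np

def entropy(p):
    """Entropy of a discrete probability distribution, as in Equation 1.
    Points with zero probability contribute nothing to the sum."""
    p = np.asarray(p, dtype=float)
    if np.any(p < 0) or not np.isclose(p.sum(), 1.0):
        raise ValueError("p must be non-negative and sum to 1")
    nonzero = p[p > 0]
    return float(-(nonzero * np.log(nonzero)).sum())

# Hypothetical study area of four pixels.
uniform = np.full(4, 0.25)                      # species equally likely everywhere
clustered = np.array([0.85, 0.05, 0.05, 0.05])  # species concentrated in one pixel

print(entropy(uniform), np.log(4))   # the uniform case reaches the upper bound ln(4)
print(entropy(clustered))            # a clustered distribution has lower entropy
```

A distribution concentrated entirely on a single pixel has entropy 0, the lower bound.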

One problem in the application of MAXENT to species distribution modelling is that predicted feature means will usually not be equal to the true empirical means; hence, the features associated with the probability distribution π′ will only approximate the real empirical means of the features associated with π (Phillips et al. 2006). This problem can be addressed by introducing a regularization parameter into the model, which is called lasso and indicated with l1 (Phillips et al. 2004). Regularization forces MAXENT to focus only on the most important features that affect the species distribution, weighting their relative importance based on input data (Williams 1995). As a consequence, the resulting models have fewer parameters and are less likely to overfit, in line with the principle of parsimony, or Occam’s Razor, according to which the simplest explanation of a phenomenon is usually the best (Phillips et al. 2006). Regularization is particularly important in the elaboration of species distribution models when the sample size is small (Phillips et al. 2004).

The algorithm used by MAXENT is deterministic, iterative and guaranteed to converge to the maximum entropy probability distribution (Della Pietra et al. 1997). The algorithm stops when a user-specified number of iterations has been performed or when the change in log loss between iterations falls below a specified value, which means that the model has found the probability distribution of maximum entropy (Phillips et al. 2006).
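The iterative nature of the fitting procedure and its two stopping rules can be illustrated with a minimal sketch. It is not the algorithm implemented in the MAXENT software: it assumes an exponential form for the distribution, uses plain gradient descent and omits regularization. The function names, the learning rate, the tolerance and the five-pixel example with a single rescaled feature are all hypothetical.

```python
import numpy as np

def exponential_distribution(features, weights):
    """Probability of each pixel under an exponential model of the features."""
    scores = features @ weights
    scores = scores - scores.max()   # improves numerical stability
    p = np.exp(scores)
    return p / p.sum()

def fit_maxent(features, presence_idx, learning_rate=0.5,
               max_iterations=5000, tolerance=1e-10):
    """Iteratively adjust the weights until the model feature means match
    the means observed at the presence pixels (no regularization)."""
    features = np.asarray(features, dtype=float)
    empirical_mean = features[presence_idx].mean(axis=0)
    weights = np.zeros(features.shape[1])
    previous_loss = np.inf
    for _ in range(max_iterations):          # stopping rule 1: iteration limit
        probs = exponential_distribution(features, weights)
        loss = -np.log(probs[presence_idx]).mean()
        if previous_loss - loss < tolerance:  # stopping rule 2: negligible change
            break
        previous_loss = loss
        # Gradient step: model feature means minus observed feature means.
        weights = weights - learning_rate * (probs @ features - empirical_mean)
    return exponential_distribution(features, weights), weights

# Hypothetical study area: five pixels, one environmental feature scaled to [0, 1].
features = np.array([[0.0], [0.25], [0.5], [0.75], [1.0]])
presence_idx = [2, 3, 4]   # pixels with occurrence records (observed mean 0.75)

probs, weights = fit_maxent(features, presence_idx)
print(probs, probs @ features)
```

In this toy example the fitted distribution assigns increasing probability to pixels with higher feature values, and its feature mean approaches the mean observed at the presence pixels.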

MAXENT has several advantages. First, it requires presence data only and makes use of both continuous and categorical data, while incorporating interactions among variables. It is based on efficient deterministic algorithms that are guaranteed to converge to the maximum entropy probability distribution. The probability distribution provided by the algorithm has a concise mathematical definition and, as such, it is easy to understand, analyse and interpret. The output is continuous, thus allowing fine distinctions to be made among the suitability of different areas; if a binary prediction is required, it can be produced by setting a threshold that separates suitable from unsuitable areas (Phillips et al. 2004; Phillips et al. 2006; Pearson 2007). Moreover, MAXENT can avoid overfitting by using 𝑙1 regularization. It is a generative approach, which has been demonstrated to perform better than discriminative techniques, especially when the sample size is small, and substantially better than many other SDMs, such as GLM, GAM and GARP (Phillips et al. 2004; Elith et al. 2006; Phillips et al. 2006). On the other hand, MAXENT has a few drawbacks: I) being recent, it is not as widely used as GLM and GAM; hence, only a few guidelines are available for its use; II) further study is required on how to set a proper regularization parameter for the model; III) it uses an exponential model for probabilities, which can lead to wrong predicted values for environmental conditions outside the range of those present in the study area, meaning that great care must be taken when extrapolating to another study area or to future or past climatic conditions; IV) it requires special-purpose software, as MAXENT is not available in standard statistical packages (Phillips et al. 2004; Phillips et al. 2006; Pearson 2007).
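Two of the points above, the conversion of the continuous output into a binary prediction and the risk of extrapolating beyond the sampled environmental range, can be sketched as follows. The suitability values, the presence pixels and the rainfall figures are hypothetical, and the threshold rule used here (the lowest suitability predicted at a presence record) is only one possible choice, assumed for illustration.

```python
def binarize(suitability, threshold):
    """Turn continuous suitability values into 1 (suitable) / 0 (unsuitable)."""
    return [1 if s >= threshold else 0 for s in suitability]

def outside_training_range(value, train_values):
    """True if value lies outside the range seen when the model was built."""
    return value < min(train_values) or value > max(train_values)

# Hypothetical continuous model output for six pixels.
suitability = [0.05, 0.12, 0.40, 0.55, 0.81, 0.93]
presence_idx = [3, 4, 5]  # pixels with occurrence records

# Assumed rule: lowest suitability predicted at any presence record.
threshold = min(suitability[i] for i in presence_idx)
binary_map = binarize(suitability, threshold)
print(threshold, binary_map)

# Hypothetical annual rainfall (mm) sampled in the study area.
rainfall_train = [450.0, 520.0, 610.0, 700.0]
new_site_rainfall = 1200.0
needs_caution = outside_training_range(new_site_rainfall, rainfall_train)
print(needs_caution)  # predictions for this site would be extrapolations
```

A different threshold would produce a different binary map from the same continuous output, so the rule should be chosen according to the aim of the model.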

Model evaluation

Once the modelling algorithm has been run, a map can be drawn to show the predicted distribution of the species. At this stage, it is particularly important to test the ability of the model to predict the known species distribution. This testing process is referred to as model evaluation (Pearson 2007). The evaluation can be undertaken in two main steps: first, the model needs to be calibrated in order to assess the level of agreement between predicted probabilities of occurrence and observed proportions of occupied sites; secondly, the discrimination capacity of the model needs to be calculated (Murphy and Winkler 1992). No single approach can be recommended for all modelling processes, and it is important to underline that the choice of the validation strategy depends on the aim of the model, the type of data available and the modelling method used (Pearson 2007).
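The two evaluation steps can be illustrated on a small set of surveyed sites. The sketch below assumes a simple binned comparison for calibration and uses the area under the ROC curve (AUC), computed as a rank statistic, as one common measure of discrimination; neither choice is prescribed by the text at this point, and the predicted probabilities and observations are hypothetical.

```python
def calibration_table(predicted, observed, n_bins=2):
    """Per bin: (mean predicted probability, observed proportion occupied, n)."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(predicted, observed):
        k = min(int(p * n_bins), n_bins - 1)  # p = 1.0 goes in the last bin
        bins[k].append((p, o))
    return [(sum(p for p, _ in b) / len(b),
             sum(o for _, o in b) / len(b),
             len(b))
            for b in bins if b]

def auc(predicted, observed):
    """Probability that an occupied site is ranked above an unoccupied one."""
    occupied = [p for p, o in zip(predicted, observed) if o == 1]
    empty = [p for p, o in zip(predicted, observed) if o == 0]
    if not occupied or not empty:
        raise ValueError("both occupied and unoccupied sites are needed")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in occupied for q in empty)
    return wins / (len(occupied) * len(empty))

# Hypothetical test sites: predicted probability and observed occupancy.
predicted = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
observed = [0, 0, 1, 0, 1, 0, 1, 1]

table = calibration_table(predicted, observed)
auc_value = auc(predicted, observed)
for mean_pred, prop_occupied, n in table:
    print(round(mean_pred, 2), round(prop_occupied, 2), n)
print(auc_value)
```

In this toy example the model is well calibrated, because the mean predicted probability in each bin matches the observed proportion of occupied sites, yet its discrimination is imperfect, which shows why the two properties are assessed separately.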

Calibration is applied to models in order to test their predictive accuracy. During the calibration process, the predicted probabilities of occurrence generated by the model are tested against actual observations of occurrence at a set of surveyed sites (Ferrier et al. 2002). This requires the choice of a set of data against which the model predictions can be tested (Araújo et al. 2005; Pearson 2007). The test data are those used in the calibration step to test the predictive performance of the model, while the calibration data, or training data, are those used to build the model (Pearson 2007). If possible, it is preferable to evaluate models with independent test data obtained from sites or periods other than those used to develop the model (Ferrier et al. 2002; Pearson 2007). For example, data collected
