Sampling bias in presence-only data used for species distribution modelling: theory and methods for detecting sample bias and its effects on models



This paper provides a theoretical understanding of sampling bias in presence-only data in the context of species distribution modelling. This understanding forms the basis for two integrated frameworks: one for detecting different kinds of sampling bias in presence-only data (the bias assessment framework) and one for assessing the potential effects of sampling bias on species distribution models (the bias effects framework). We exemplify the use of these frameworks by applying them to museum data for nine insect species in Norway, whose distributions along the two main bioclimatic gradients (related to oceanicity and temperature) are modelled using the MaxEnt method. Models of different complexity (obtained with two model selection procedures, representing spatial prediction and ecological response modelling purposes, respectively) were generated with two types of background data (uninformed and background-target-group [BTG]). The bias assessment framework compared observed and theoretical frequency-of-presence (FoP) curves, obtained separately for each combination of species and bioclimatic predictor, to identify potential sampling bias. The bias effects framework compared modelled response curves (predicted relative FoP curves) with the corresponding observed FoP curves for each combination of species and predictor. The extent to which the observed FoP curves deviated from the expected smooth, unimodal theoretical FoP curve varied considerably among the nine insect species. In most cases, differences between curves were interpreted as indications of sampling bias. Using BTG-type background data in many cases introduced strong sampling bias. The predicted relative FoP curves from MaxEnt were, in general, similar to the corresponding observed FoP curves.
This indicates that the main structure of the data sets was adequately summarised by the MaxEnt models (with the options and settings used), which in turn suggests that shortcomings of the input data, such as sampling bias or the omission of important predictors, may overshadow the effect of modelling method on the predictive performance of distribution models. The examples indicate that the two proposed frameworks are useful for identifying sampling bias in presence-only data and for choosing settings for distribution modelling options, such as the method used to extract background data points and the appropriate level of model complexity.
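The abstract does not spell out how an observed FoP curve is computed or how two curves are compared. The following minimal Python sketch rests on stated assumptions: the predictor is split into equal-width intervals; the observed FoP in an interval is taken to be the number of presence points divided by the number of presence plus background points in that interval; a Gaussian curve stands in for the smooth, unimodal theoretical FoP curve; and curves are compared by a root-mean-square difference after each is rescaled to a maximum of 1. The function names (`observed_fop`, `gaussian_response`, `relative`, `shape_deviation`) and the deviation measure are hypothetical illustrations, not the authors' procedure.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A curve is a list of (predictor value, FoP) pairs, one per non-empty interval.
Curve = List[Tuple[float, float]]


def observed_fop(presence: Sequence[float], background: Sequence[float],
                 n_bins: int = 10) -> Curve:
    """Observed FoP per equal-width interval along one predictor.

    Assumption: FoP = presences / (presences + background points) in the
    interval. Intervals containing no points are skipped.
    """
    values = list(presence) + list(background)
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("predictor values span no range")
    width = (hi - lo) / n_bins
    pres = [0] * n_bins
    total = [0] * n_bins

    def bin_of(x: float) -> int:
        # The maximum value would fall just past the last interval; clamp it.
        return min(int((x - lo) / width), n_bins - 1)

    for x in presence:
        pres[bin_of(x)] += 1
        total[bin_of(x)] += 1
    for x in background:
        total[bin_of(x)] += 1
    return [(lo + (i + 0.5) * width, pres[i] / total[i])
            for i in range(n_bins) if total[i] > 0]


def gaussian_response(optimum: float, tolerance: float) -> Callable[[float], float]:
    """Hypothetical smooth, unimodal theoretical FoP curve with peak 1."""
    return lambda x: math.exp(-((x - optimum) ** 2) / (2 * tolerance ** 2))


def relative(curve: Curve) -> Curve:
    """Rescale a curve so that its maximum is 1 (relative FoP)."""
    peak = max(y for _, y in curve)
    if peak <= 0:
        return list(curve)
    return [(x, y / peak) for x, y in curve]


def shape_deviation(observed: Curve, reference: Callable[[float], float]) -> float:
    """Root-mean-square difference between the observed curve and a reference
    curve evaluated at the same interval midpoints, after rescaling both."""
    obs = relative(observed)
    ref = relative([(x, reference(x)) for x, _ in observed])
    squared = [(yo - yr) ** 2 for (_, yo), (_, yr) in zip(obs, ref)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Synthetic data: evenly spread background along a gradient from 0 to 10,
    # and presences clustered around 5.
    background = [i / 10 for i in range(101)]
    presence = [4.2, 4.6, 4.9, 5.0, 5.1, 5.4, 5.8, 6.3]
    curve = observed_fop(presence, background, n_bins=10)
    print(shape_deviation(curve, gaussian_response(optimum=5.0, tolerance=1.0)))
```

Passing a fitted model's response function as `reference`, instead of `gaussian_response`, gives the analogous comparison of predicted relative FoP curves with observed FoP curves used in the bias effects framework. In either case a large deviation only flags a combination of species and predictor for closer inspection; it does not by itself demonstrate sampling bias.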

The Journal of Natural History Museum and University of Oslo