
Predictive performance of presence‐only species distribution models: a benchmark study with reproducible code.

Authors :
Valavi, Roozbeh
Guillera‐Arroita, Gurutzeta
Lahoz‐Monfort, José J.
Elith, Jane
Source :
Ecological Monographs. Feb 2022, Vol. 92 Issue 1, p1-27. 27p.
Publication Year :
2022

Abstract

Species distribution modeling (SDM) is widely used in ecology and conservation. Currently, the most widely available data for SDM are species presence‐only records (available through digital databases). There have been many studies comparing the performance of alternative algorithms for modeling presence‐only data. Among these, a 2006 paper from Elith and colleagues has been particularly influential in the field, partly because they used several novel methods (at the time) on a global data set that included independent presence–absence records for model evaluation. Since its publication, some of the algorithms have been further developed and new ones have emerged. In this paper, we explore patterns in predictive performance across methods by reanalyzing the same data set (225 species from six different regions) using updated modeling knowledge and practices. We apply well‐established methods such as generalized additive models and MaxEnt, alongside others that have received attention more recently, including regularized regressions, point‐process weighted regressions, random forests, XGBoost, support vector machines, and the ensemble modeling framework biomod. All the methods we use include background samples (a sample of environments in the landscape) for model fitting. We explore the impacts of using weights on the presence and background points in model fitting. We introduce new ways of evaluating models fitted to these data, using the area under the precision‐recall gain curve and focusing on the rank of results. We find that the way models are fitted matters. The top method was an ensemble of tuned individual models. In contrast, ensembles built using the biomod framework with default parameters performed no better than single moderately performing models.
Similarly, the second-best method was a random forest parameterized to deal with many background samples (in contrast to relatively few presence records), which substantially outperformed other random forest implementations. We find that, in general, nonparametric techniques with the capability of controlling for model complexity outperformed traditional regression methods, with MaxEnt and boosted regression trees still among the top-performing models. All the data and code, with working examples, are provided to make this study fully reproducible. [ABSTRACT FROM AUTHOR]
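The area under the precision‐recall gain curve mentioned in the abstract rescales precision and recall relative to the prevalence π of presences in the evaluation data, following the gain transformation of Flach and Kull (2015): precision gain = 1 − (π/(1 − π))·FP/TP and recall gain = 1 − (π/(1 − π))·FN/TP. The Python sketch below is illustrative only. It assumes binary presence (1) / absence (0) labels from independent evaluation data plus continuous model scores, and the function names `prg_points` and `auc_prg` are hypothetical. It is not the authors' code, and it simplifies published implementations: points with negative recall gain are dropped rather than interpolated to recall gain 0, so values may differ slightly from dedicated tools.

```python
from typing import List, Tuple


def prg_points(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (recall_gain, precision_gain) pairs, one per distinct score threshold.

    Hypothetical helper: labels are 1 = presence, 0 = absence.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("evaluation data need both presences (1) and absences (0)")
    pi = n_pos / len(labels)      # prevalence of presences in the evaluation data
    ratio = pi / (1.0 - pi)       # common factor in both gain formulas

    # Sweep thresholds from the highest score to the lowest.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = []
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all records tied at this score are counted.
        if i + 1 < len(pairs) and pairs[i + 1][0] == score:
            continue
        if tp == 0:
            continue  # gains are undefined until at least one presence is predicted
        fn = n_pos - tp
        recall_gain = 1.0 - ratio * fn / tp
        precision_gain = 1.0 - ratio * fp / tp
        points.append((recall_gain, precision_gain))
    return points


def auc_prg(labels: List[int], scores: List[float]) -> float:
    """Trapezoidal area under the PRG curve, restricted to recall gain >= 0 (simplified)."""
    pts = [p for p in prg_points(labels, scores) if p[0] >= 0.0]
    area = 0.0
    # Recall gain never decreases as the threshold is lowered, so widths are >= 0.
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


if __name__ == "__main__":
    # Toy evaluation set: two presences ranked above two absences.
    print(auc_prg([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))
```

With a perfect ranking, as in the toy example, the sketch returns 1.0; a model whose scores carry no ranking information (all scores tied) returns 0.0, the baseline of an always‐positive classifier in PRG space. Because per‐species values on this scale can then be compared across methods, ranking methods within each species is a natural next step.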

Details

Language :
English
ISSN :
00129615
Volume :
92
Issue :
1
Database :
Academic Search Index
Journal :
Ecological Monographs
Publication Type :
Academic Journal
Accession number :
155005035
Full Text :
https://doi.org/10.1002/ecm.1486