
Simulating multi-scale optimization and variable selection in species distribution modeling

Authors :
Samuel A. Cushman
Zaneta M. Kaszta
Patrick Burns
Christopher R. Hakkenberg
Patrick Jantz
David W. Macdonald
Jedediah F. Brodie
Mairin C.M. Deith
Scott Goetz
Source :
Ecological Informatics, Vol 83, Article 102832 (2024)
Publication Year :
2024
Publisher :
Elsevier, 2024.

Abstract

Species distribution modeling (SDM) is a fundamental tool in theoretical and applied ecology. However, relatively little is known about the performance of different approaches to scale optimization, model selection, and algorithmic prediction in the context of nonlinear, multi-scale and interactive relationships between environmental variables and species occurrence. Modelers often struggle to balance ecological relevance, model robustness, complexity, and overfitting. In this paper, we investigated several methods designed to optimize spatial scale and variable selection in SDMs, in each case evaluating model fitness, parsimony and predictive performance. We used a simulation approach to produce a large pool of alternative underlying habitat relationships that reflect a broad range of realistic habitat associations. We also compared several modeling algorithms, including logistic regression with a generalized linear model (GLM), Lasso and Elastic-Net Regularized GLMs (GLMNet), and random forest (RF), as well as alternative variable and scale selection methods. We found that GLM methods employing all-subsets dredge routines for variable selection were consistently the best predictors based on all criteria of our model performance assessment and across all attributes of the simulated underlying relationship, including nonlinearity and interaction. We had expected machine learning approaches, such as random forest, to perform better under these more complex forms of species-environment relationships. GLM using dredge variable selection was also the method that included the fewest spurious covariates and the most correct predictors as a proportion of all predictors. We found that univariate scaling was the most robust method of variable and scale selection, along with Minimal Redundancy Maximal Relevancy (MRMR), which performed equivalently.
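As an illustration of the univariate scaling idea named in the abstract, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that "univariate scaling" means fitting a single-covariate logistic GLM at each candidate scale and keeping the scale with the lowest AIC. The white-noise raster, the square moving-window smoother, the candidate radii, the effect size, and all function names are hypothetical choices made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def moving_mean(raster, radius):
    """Mean over each cell's (2*radius+1)^2 square window (reflect-padded edges)."""
    if radius == 0:
        return raster.copy()
    k = 2 * radius + 1
    kernel = np.ones(k) / k
    padded = np.pad(raster, radius, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def fit_logistic(X, y, n_iter=50):
    """Fit a logistic GLM by Newton-Raphson (IRLS); return (coefficients, AIC)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30))), 1e-12, 1 - 1e-12)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, 2 * X.shape[1] - 2 * loglik

def standardize(v):
    return (v - v.mean()) / v.std()

# Hypothetical landscape: a white-noise raster standing in for one environmental layer.
raster = rng.normal(size=(120, 120))
candidate_radii = [0, 1, 2, 4, 8, 16]
TRUE_RADIUS = 4  # assumed scale at which the simulated species responds

# Sample 1000 survey cells and extract the covariate at every candidate scale.
rows = rng.integers(0, 120, size=1000)
cols = rng.integers(0, 120, size=1000)
covariate = {r: standardize(moving_mean(raster, r)[rows, cols]) for r in candidate_radii}

# Simulate presence/absence from the covariate at the true scale only.
logit = -0.5 + 2.0 * covariate[TRUE_RADIUS]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# Univariate scale optimization: one single-covariate GLM per scale, keep the lowest AIC.
aic_by_radius = {r: fit_logistic(covariate[r], y)[1] for r in candidate_radii}
best_radius = min(aic_by_radius, key=aic_by_radius.get)
print("AIC by radius:", {r: round(a, 1) for r, a in aic_by_radius.items()})
print("selected radius:", best_radius)
```

In a real analysis the covariate would come from focal statistics on actual environmental rasters, and the selected scale for each variable would then feed a multivariate model.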
The simulation experiment presented here provides a robust assessment of simulated multi-species distribution model performance, complexity and fidelity. By simulating a large range of potential habitat relationships with varying spatial scale, effect sizes, linearity, and interactions, we comprehensively evaluated model performance across gradients of complexity of the underlying relationships and violations of classical statistical assumptions. This study provides a valuable assessment and a broader example of the power and utility of controlled simulation experiments in habitat relationships and other ecological spatial predictive modeling.
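The kind of controlled simulation described above can be sketched as follows. This is an illustrative example under stated assumptions, not the study's code: it simulates occurrence from a hypothetical relationship with one nonlinear (quadratic) term and one interaction, adds spurious covariates, and then ranks every subset of candidate terms by AIC, in the spirit of the all-subsets "dredge" selection mentioned in the abstract. It does not reproduce any particular R routine, and unlike some all-subsets tools it does not enforce marginality (an interaction may enter without its main effects). All names, coefficients, and sample sizes are invented for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

def logistic_aic(X, y, n_iter=50):
    """Fit a logistic GLM by Newton-Raphson (IRLS) and return its AIC."""
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30))), 1e-12, 1 - 1e-12)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return 2 * X.shape[1] - 2 * loglik

# Hypothetical covariates: x1..x3 drive occurrence, x4 and x5 are spurious.
n = 1500
x1, x2, x3, x4, x5 = rng.normal(size=(5, n))
terms = {
    "x1": x1, "x1_sq": x1**2,           # unimodal (nonlinear) response to x1
    "x2": x2, "x3": x3, "x2_x3": x2 * x3,  # interaction between x2 and x3
    "x4": x4, "x5": x5,                 # spurious candidates
}
true_terms = {"x1", "x1_sq", "x2_x3"}

# Known "truth" used to simulate presence/absence (assumed coefficients).
logit = 0.3 + 1.0 * x1 - 1.0 * x1**2 + 1.2 * x2 * x3
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# All-subsets search: fit every non-empty combination of terms and rank by AIC.
ranked = []
for k in range(1, len(terms) + 1):
    for subset in itertools.combinations(terms, k):
        X = np.column_stack([terms[t] for t in subset])
        ranked.append((logistic_aic(X, y), subset))
ranked.sort()

best_aic, best_terms = ranked[0]
spurious_included = set(best_terms) - true_terms
print("best model terms:", best_terms)
print("spurious terms in best model:", sorted(spurious_included))
```

Because the generating relationship is known, the selected model can be scored directly on how many true terms it recovers and how many spurious ones it admits, which is the kind of fidelity assessment a simulation experiment makes possible.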

Details

Language :
English
ISSN :
15749541
Volume :
83
Article number :
102832
Database :
Directory of Open Access Journals
Journal :
Ecological Informatics
Publication Type :
Academic Journal
Accession number :
edsdoj.7093467999ae46dc91786eb88bf3baa3
Document Type :
article
Full Text :
https://doi.org/10.1016/j.ecoinf.2024.102832