1. Foundation for unbiased cross-validation of spatio-temporal models for species distribution modeling
- Author
-
Koldasbayeva, Diana and Zaytsev, Alexey
- Subjects
Statistics - Applications ,Computer Science - Machine Learning - Abstract
Species Distribution Models (SDMs) often suffer from spatial autocorrelation (SAC), leading to biased performance estimates. We tested cross-validation (CV) strategies - random splits, spatial blocking with varied distances, environmental (ENV) clustering, and a novel spatio-temporal method - under two proposed training schemes: LAST FOLD, widely used in spatial CV at the cost of data loss, and RETRAIN, which maximizes data usage but risks reintroducing SAC. LAST FOLD consistently yielded lower errors and stronger correlations. Spatial blocking at an optimal distance (SP 422) and ENV performed best, achieving Spearman and Pearson correlations of 0.485 and 0.548, respectively, although ENV may be unsuitable for long-term forecasts involving major environmental shifts. A spatio-temporal approach yielded modest benefits in our moderately variable dataset, but may excel with stronger temporal changes. These findings highlight the need to align CV approaches with the spatial and temporal structure of SDM data, ensuring rigorous validation and reliable predictive outcomes.
- Published
- 2025