1. Prevalence affects the evaluation of discrimination capacity in presence-absence species distribution models.
- Author
Jiménez-Valverde, Alberto
- Subjects
SPECIES distribution, RECEIVER operating characteristic curves, SAMPLE size (Statistics)
- Abstract
The aim of this study is to understand how prevalence (the ratio of instances of presence to total sample size) affects the estimation of three discrimination indexes commonly used in distribution modelling: the area under the receiver operating characteristic curve (AUC), the value of sensitivity at the threshold where sensitivity equals specificity (Se*), and the maximum value of the Youden index or true skill statistic (Y). For four sample size levels, samples of suitability scores for the instances of presence and absence with varying prevalences were simulated from known distributions so that the true values of the discrimination indexes were known, and the three indexes were empirically estimated (AUCest, Se*est, Yest). AUCest and Se*est are unbiased estimators, and the greatest precision is achieved with a balanced prevalence. As sample size increases, there is a larger prevalence interval around 0.5 in which precision is more or less stable. As a rule of thumb, in the case of n ≤ 100, at least ten observations of the rare state (either instances of presence or absence) should be considered, whereas the safety prevalence interval [0.01, 0.99] should be used for higher sample sizes. The lower the true discriminative power of the models, the higher the negative effect of prevalence. Yest is positively biased, and bias and precision become worse towards low and high prevalences. Highly unbalanced prevalences increase the imprecision in estimating the discrimination capacity of the models. Y is not recommended as a discrimination measure since it provides overoptimistic results. [ABSTRACT FROM AUTHOR]
- Published
- 2021
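The three indexes named in the abstract can be illustrated with a minimal sketch. It is not the author's code: the abstract only says that scores were simulated "from known distributions", so the unit-variance normal distributions, the `shift` parameter, and every function name below are assumptions made for illustration. Under that equal-variance normal assumption the true AUC is Φ(shift/√2), which gives a reference value for the empirical estimate.

```python
import math
import numpy as np

def true_auc(shift):
    """True AUC when presences ~ N(shift, 1) and absences ~ N(0, 1) (assumed setup)."""
    return 0.5 * (1.0 + math.erf(shift / 2.0))  # equals Phi(shift / sqrt(2))

def auc_est(pres, absn):
    """Empirical AUC: share of presence/absence pairs ranked correctly (ties count 1/2)."""
    diff = pres[:, None] - absn[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def threshold_indexes(pres, absn):
    """Return (Se*_est, Y_est), scanning every observed score as a candidate threshold."""
    thresholds = np.unique(np.concatenate([pres, absn]))
    se = (pres[:, None] >= thresholds[None, :]).mean(axis=0)  # sensitivity per threshold
    sp = (absn[:, None] < thresholds[None, :]).mean(axis=0)   # specificity per threshold
    k = np.argmin(np.abs(se - sp))       # threshold where Se is closest to Sp
    return se[k], np.max(se + sp - 1.0)  # Se* estimate, maximum Youden index

def simulate(n, prevalence, shift=1.0, seed=0):
    """Draw one sample of size n at the given prevalence and estimate the three indexes."""
    n_pres = int(round(n * prevalence))
    if n_pres == 0 or n_pres == n:
        raise ValueError("both presences and absences are needed")
    rng = np.random.default_rng(seed)
    pres = rng.normal(shift, 1.0, n_pres)     # suitability scores, presences
    absn = rng.normal(0.0, 1.0, n - n_pres)   # suitability scores, absences
    se_star, y = threshold_indexes(pres, absn)
    return auc_est(pres, absn), se_star, y

if __name__ == "__main__":
    print("true AUC:", round(true_auc(1.0), 3))
    for prevalence in (0.05, 0.1, 0.5, 0.9, 0.95):
        auc, se_star, y = simulate(n=200, prevalence=prevalence)
        print(prevalence, round(auc, 3), round(se_star, 3), round(y, 3))
```

Repeating `simulate` over many seeds at each prevalence and comparing the mean and spread of the estimates with the true values is one way to examine bias and precision in the manner the abstract describes.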