1. Feature importance: Opening a soil-transmitted helminth machine learning model via SHAP.
- Author
-
Scavuzzo CM, Scavuzzo JM, Campero MN, Anegagrie M, Aramendia AA, Benito A, and Periago V
- Abstract
In the field of landscape epidemiology, the contribution of machine learning (ML) to modeling of epidemiological risk scenarios presents itself as a good alternative. This study aims to break with the "black box" paradigm that underlies the application of automatic learning techniques by using SHAP to determine the contribution of each variable in ML models applied to geospatial health, using the prevalence of hookworms, intestinal parasites, in Ethiopia, where they are widely distributed; the country bears the third-highest burden of hookworm in Sub-Saharan Africa. XGBoost software was used, a very popular ML model, to fit and analyze the data. The Python SHAP library was used to understand the importance in the trained model, of the variables for predictions. The description of the contribution of these variables on a particular prediction was obtained, using different types of plot methods. The results show that the ML models are superior to the classical statistical models; not only demonstrating similar results but also explaining, by using the SHAP package, the influence and interactions between the variables in the generated models. This analysis provides information to help understand the epidemiological problem presented and provides a tool for similar studies., Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (© 2022 The Authors.)
- Published
- 2022
- Full Text
- View/download PDF