1. Analyzing Unimproved Drinking Water Sources and Their Determinants Using Supervised Machine Learning: Evidence from the Somaliland Demographic Health Survey 2020.
- Author
Ismail, Hibak M.; Muse, Abdisalam Hassan; Hassan, Mukhtar Abdi; Muse, Yahye Hassan; and Nadarajah, Saralees
- Subjects
MACHINE learning, SUPERVISED learning, SUPPORT vector machines, K-nearest neighbor classification, DRINKING water
- Abstract
Access to clean and safe drinking water is a fundamental human right. Despite global efforts, including the UN's "Water for Life" program, a significant portion of the population in developing countries, including Somaliland, continues to rely on unimproved water sources. These unimproved sources contribute to poor health outcomes, particularly for children. This study aimed to investigate the factors associated with the use of unimproved drinking water sources in Somaliland by employing supervised machine learning models to predict patterns and determinants based on data from the 2020 Somaliland Demographic and Health Survey (SHDS). Secondary data from SHDS 2020 were used, encompassing 8384 households across Somaliland. A multilevel logistic regression model was applied to analyze the individual- and community-level factors influencing the use of unimproved water sources. In addition, machine learning models, including logistic regression, decision tree, random forest, support vector machine (SVM), and K-nearest neighbor (KNN), were compared in terms of accuracy, sensitivity, specificity, and other metrics using cross-validation techniques. This study uses supervised machine learning models to analyze unimproved drinking water sources in Somaliland, providing data-driven insights into the complex determinants of water access. This enhances predictive accuracy and informs targeted interventions, offering a robust framework for addressing water-related public health issues in Somaliland. The analysis identified key determinants of unimproved water source usage, including socioeconomic status, education, region, and household characteristics. The random forest model performed the best with an accuracy of 93.57% and an area under the curve (AUC) score of 98%. Decision tree and KNN also exhibited strong performance, while SVM had the lowest predictive accuracy. 
This study highlights the role of socioeconomic and community factors in determining access to clean drinking water in Somaliland. Factors such as age, education, gender, household wealth, media access, urban or rural residence, poverty level, and literacy level significantly influenced access. Local policies and resource availability also contribute to variations in access. These findings suggest that targeted interventions aimed at improving education, infrastructure, and community water management practices can significantly reduce reliance on unimproved water sources and improve overall public health. [ABSTRACT FROM AUTHOR]
- Published
2024
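The model comparison summarized in the abstract (logistic regression, decision tree, random forest, SVM, and KNN scored by accuracy, sensitivity, specificity, and AUC under cross-validation) can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the SHDS 2020 household records, and the hyperparameters, fold count, and the `evaluate` helper are hypothetical choices made only for illustration. The scores it prints say nothing about the results reported in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for household-level features (assumption: NOT SHDS data).
# Label 1 plays the role of "uses an unimproved water source".
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)

# The five model families named in the abstract; settings are illustrative only.
# Distance- and margin-based models are wrapped with a scaler.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Stratified 5-fold cross-validation keeps the class balance in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def evaluate(model, X, y, cv):
    """Hypothetical helper: cross-validated metrics for one classifier."""
    # Out-of-fold predictions: each sample is predicted by a model
    # that did not see it during training.
    y_pred = cross_val_predict(model, X, y, cv=cv)
    tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()
    # Mean ROC AUC across the same folds.
    auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    return {
        "accuracy": float((tp + tn) / (tp + tn + fp + fn)),
        "sensitivity": float(tp / (tp + fn)),  # true positive rate
        "specificity": float(tn / (tn + fp)),  # true negative rate
        "auc": float(auc),
    }


results = {name: evaluate(model, X, y, cv) for name, model in models.items()}
for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Computing sensitivity and specificity from pooled out-of-fold predictions, rather than averaging per-fold values, is one of several reasonable conventions; the abstract does not state which one the authors used.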