STI/HIV risk prediction model development—A novel use of public data to forecast STIs/HIV risk for men who have sex with men
- Authors
Xiaopeng Ji, Zhaohui Tang, Sonya R. Osborne, Thi Phuoc Van Nguyen, Amy B. Mullens, Judith A. Dean, and Yan Li
- Subjects
human immunodeficiency virus, sexually transmissible infections, artificial intelligence, machine learning, risk prediction, Public aspects of medicine, RA1-1270
- Abstract
A novel automatic framework is proposed for global risk prediction of sexually transmissible infections (STIs) and HIV. Four machine learning methods, namely Gradient Boosting Machine (GBM), Random Forest (RF), XGBoost, and an ensemble learning model combining GBM, RF, and XGBoost (GBM-RF-XGBoost), are applied to and evaluated on data from the Demographic and Health Surveys Program (DHSP), with thirteen features ultimately selected as the most predictive. Classification and generalization experiments are conducted to assess the accuracy, F1-score, precision, and area under the curve (AUC) of these four algorithms. Two solutions for imbalanced data are also applied to reduce bias and improve classification performance. The experimental results show that the Random Forest algorithm yields the best HIV prediction, with the highest accuracy and AUC (0.99 and 0.99, respectively). STI prediction performs best when the Synthetic Minority Oversampling Technique (SMOTE) is applied (accuracy = 0.99, AUC = 0.99), outperforming the state-of-the-art baselines. Two possible factors that may affect classification and generalization performance are further analyzed. This automatic classification model helps improve the convenience and reduce the cost of HIV testing.
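The abstract credits SMOTE with the best STI results but does not describe how it was implemented. As a rough illustration only, the following dependency-free Python sketch shows the standard SMOTE interpolation step: each synthetic minority-class record is placed at a random point on the line segment between a minority-class record and one of its k nearest minority-class neighbours. The function names `smote` and `balance`, the default `k=5`, the fixed `seed`, and the toy two-feature data are hypothetical choices, not the authors' settings. The sketch also assumes numeric, already-encoded features and a binary outcome label.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper: assumes numeric, already-encoded features.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # at most n - 1 other minority points exist
    # For each minority sample, store the indices of its k nearest
    # minority-class neighbours (the sample itself is excluded).
    neighbours: List[List[int]] = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: _euclidean(x, minority[j]))
        neighbours.append(others[:k])
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # random minority sample
        j = rng.choice(neighbours[i])     # one of its k nearest neighbours
        gap = rng.random()                # uniform in [0, 1)
        base, nb = minority[i], minority[j]
        # New point on the segment from base towards the neighbour.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def balance(X: Sequence[Sequence[float]], y: Sequence[int],
            minority_label: int = 1, k: int = 5,
            seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Oversample the minority class of a binary dataset to a 1:1 ratio."""
    minority = [list(x) for x, label in zip(X, y) if label == minority_label]
    n_majority = sum(1 for label in y if label != minority_label)
    needed = max(0, n_majority - len(minority))
    new = smote(minority, needed, k=k, seed=seed) if needed else []
    X_out = [list(x) for x in X] + new
    y_out = list(y) + [minority_label] * len(new)
    return X_out, y_out


if __name__ == "__main__":
    # Made-up toy data: two minority cases (label 1), four majority cases (0).
    X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [1.2, 1.1]]
    y = [1, 1, 0, 0, 0, 0]
    X_bal, y_bal = balance(X, y, minority_label=1, k=1)
    print(y_bal.count(1), y_bal.count(0))  # 4 4
```

In a typical workflow the oversampling is applied to the training split only, so the held-out data keeps its real class balance. Library implementations such as the `SMOTE` class in imbalanced-learn provide the same idea with more options.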
- Published
2025