Identifying cardiovascular disease risk in the U.S. population using environmental volatile organic compounds exposure: A machine learning predictive model based on the SHAP methodology

Authors :
Qingan Fu
Yanze Wu
Min Zhu
Yunlei Xia
Qingyun Yu
Zhekang Liu
Xiaowei Ma
Renqiang Yang
Source :
Ecotoxicology and Environmental Safety, Vol 286, Article 117210 (2024)
Publication Year :
2024
Publisher :
Elsevier, 2024.

Abstract

Background: Cardiovascular disease (CVD) remains a leading cause of mortality globally. Environmental pollutants, specifically volatile organic compounds (VOCs), have been identified as significant risk factors. This study aims to develop a machine learning (ML) model that predicts CVD risk from VOC exposure and demographic data, using SHapley Additive exPlanations (SHAP) for interpretability.

Methods: We used data from the National Health and Nutrition Examination Survey (NHANES) from 2011 to 2018, comprising 5098 participants. VOC exposure was assessed through 15 urinary metabolite metrics. The dataset was split into a training set (70%) and a test set (30%). Six ML models were developed: Random Forest (RF), Light Gradient Boosting Machine (LightGBM), Decision Tree (DT), Extreme Gradient Boosting (XGBoost), Multi-Layer Perceptron (MLP), and Support Vector Machines (SVM). Model performance was evaluated using the Area Under the Receiver Operating Characteristic Curve (AUROC), accuracy, balanced accuracy, F1 score, J-index, kappa, Matthews correlation coefficient (MCC), positive predictive value (PPV), negative predictive value (NPV), sensitivity (sens), and specificity (spec). SHAP was applied to interpret the best-performing model.

Results: The RF model exhibited the highest predictive performance, with an AUROC of 0.8143. SHAP analysis identified age and ATCA (2-aminothiazoline-4-carboxylic acid) as the most significant predictors, with ATCA showing a protective effect against CVD, particularly in older adults and those with hypertension. The study found a significant interaction between ATCA levels and age: the protective effect of ATCA was more pronounced in older individuals, which the authors attribute to the increased oxidative stress and inflammatory responses associated with aging. E-value analysis suggested robustness to unmeasured confounders.

Conclusions: This study is the first to use VOC exposure data to construct an ML model for predicting CVD risk. The findings highlight the potential of combining environmental exposure data with demographic information to enhance CVD risk prediction, supporting the development of personalized prevention and intervention strategies.
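
The threshold-based metrics listed in the Methods all follow from a 2x2 confusion matrix, and a short sketch may make their definitions concrete. This is an illustrative sketch using standard textbook definitions, not the authors' code: the function name `binary_metrics` and the example counts are hypothetical, J-index is assumed to mean Youden's J (sens + spec - 1), and AUROC is left out because it needs predicted scores rather than counts.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Threshold-based metrics from confusion-matrix counts.

    Hypothetical helper for illustration; assumes every denominator
    is non-zero (both classes present and both labels predicted).
    """
    n = tp + fp + tn + fn
    sens = tp / (tp + fn)   # sensitivity (recall, true positive rate)
    spec = tn / (tn + fp)   # specificity (true negative rate)
    ppv = tp / (tp + fp)    # positive predictive value (precision)
    npv = tn / (tn + fn)    # negative predictive value
    acc = (tp + tn) / n
    # Agreement expected by chance, used by Cohen's kappa.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": acc,
        "balanced_accuracy": (sens + spec) / 2,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "j_index": sens + spec - 1,   # Youden's J (assumed meaning)
        "kappa": (acc - p_e) / (1 - p_e),
        "mcc": (tp * tn - fp * fn) / mcc_den,
        "ppv": ppv,
        "npv": npv,
        "sens": sens,
        "spec": spec,
    }


if __name__ == "__main__":
    # Made-up counts, not results from the study.
    for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
        print(f"{name}: {value:.4f}")
```

Balanced accuracy, J-index, kappa, and MCC all correct for class imbalance in different ways, which matters when CVD cases are a minority of the sample.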

Details

Language :
English
ISSN :
0147-6513
Volume :
286
Article number :
117210
Database :
Directory of Open Access Journals
Journal :
Ecotoxicology and Environmental Safety
Publication Type :
Academic Journal
Accession number :
edsdoj.7f7d633661c44812bdea0e836d846b07
Document Type :
article
Full Text :
https://doi.org/10.1016/j.ecoenv.2024.117210