
Machine Learning Approaches to Predict Alcohol Consumption from Biomarkers in the UK Biobank.

Authors :
Hassan MF
Gentry AE
Prom-Wormley EC
Peterson RE
Webb BT
Source :
MedRxiv : the preprint server for health sciences [medRxiv] 2024 Dec 24. Date of Electronic Publication: 2024 Dec 24.
Publication Year :
2024

Abstract

Background: Measuring and estimating alcohol consumption (AC) is important for individual health, public health, and societal benefit. While self-report and diagnostic interviews are commonly used, incorporating biologically based indices can offer a complementary approach.

Methods: We evaluate machine learning (ML) based predictions of AC using blood- and urine-derived biomarkers. This research has been conducted using the UK Biobank (UKB) Resource. In addition to predicting the number of alcoholic Drinks Per Week (DPW), four other related phenotypes were predicted for performance comparison. Five ML models were assessed: LASSO, Ridge regression, Gradient Boosting Machines (GBM), Model Boosting (MBOOST), and Extreme Gradient Boosting (XGBOOST).

Results: All five ML methods achieved moderate prediction of DPW (r² = 0.304-0.356), with biomarkers significantly increasing prediction above that obtained using only known covariates and liver enzymes (r² = 0.105). XGBOOST achieved the best prediction performance (r² = 0.356, MAE = 5.214) at the expense of greater model complexity and training resources than the other ML methods. All ML models were able to accurately predict whether subjects were heavy drinkers (DPW > 8 for women and DPW > 15 for men) and produced explainable models that highlighted the role of biomarkers in predicting DPW. While phenotype correlations were similar across methods, XGBOOST produced similar heritability estimates for observed (h² = 0.064) and predicted (h² = 0.077) DPW. The estimated genetic correlation between observed and predicted DPW was 0.877.

Conclusions: Predicting AC from biological measures using ML provides an opportunity to identify individuals at increased risk of heavy AC, thereby offering a complementary avenue for risk assessment beyond self-report, screening instruments, or structured interviews, which have some known biases. In addition, explainable AI tools identified a constellation of biomarkers associated with AC.
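
The quantities reported in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The heavy-drinking thresholds (DPW > 8 for women, DPW > 15 for men) come from the abstract; the function names, the sex labels, and the toy numbers are hypothetical, and reading r² as the squared Pearson correlation between observed and predicted DPW is an assumption, since the abstract does not define it.

```python
# Illustrative sketch only; names and toy data are assumptions, not from the paper.

# Heavy-drinking thresholds quoted in the abstract (strictly greater than).
HEAVY_DPW_THRESHOLD = {"female": 8, "male": 15}


def is_heavy_drinker(dpw, sex):
    """Return True if drinks per week exceed the sex-specific threshold."""
    return dpw > HEAVY_DPW_THRESHOLD[sex]


def squared_pearson_r(observed, predicted):
    """Squared Pearson correlation between observed and predicted values.

    Assumption: this is one common reading of r^2; the abstract does not
    say which definition was used.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return (cov * cov) / (var_o * var_p)


def mean_absolute_error(observed, predicted):
    """Mean absolute error (MAE), in the same units as DPW."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical toy data: observed and model-predicted drinks per week.
observed_dpw = [0.0, 4.0, 10.0, 20.0]
predicted_dpw = [2.0, 5.0, 9.0, 16.0]
sexes = ["female", "male", "female", "male"]

# Classify each subject from the predicted DPW, as a screening-style use.
predicted_heavy = [is_heavy_drinker(d, s) for d, s in zip(predicted_dpw, sexes)]

print("r^2:", squared_pearson_r(observed_dpw, predicted_dpw))
print("MAE:", mean_absolute_error(observed_dpw, predicted_dpw))
print("predicted heavy drinkers:", predicted_heavy)
```

Applying the threshold to predicted rather than self-reported DPW mirrors the use case described in the conclusions, where a biomarker-based estimate complements self-report.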

Details

Language :
English
Database :
MEDLINE
Journal :
MedRxiv : the preprint server for health sciences
Publication Type :
Academic Journal
Accession number :
39763569
Full Text :
https://doi.org/10.1101/2024.12.22.24319486