1. Comparison of predicting cardiovascular disease hospitalization using individual, ZIP code-derived, and machine learning model-predicted educational attainment in New York City.
- Author
- Takkavatakarn K, Dai Y, Hsun Wen H, Kauffman J, Charney A, Coca SG, Nadkarni GN, and Chan L
- Subjects
- Humans, Retrospective Studies, New York City epidemiology, Educational Status, Hospitalization, Machine Learning, Cardiovascular Diseases epidemiology
- Abstract
Background: Area-level social determinants of health (SDOH) based on patients' ZIP codes or census tracts have been commonly used in research instead of individual SDOHs. To our knowledge, whether machine learning (ML) could be used to derive individual SDOH measures, specifically individual educational attainment, is unknown.
Methods: This is a retrospective study using data from the Mount Sinai BioMe Biobank. We included participants who completed a validated questionnaire on educational attainment and had home addresses in New York City. ZIP code-level education was derived from the American Community Survey matched for the participant's gender and race/ethnicity. We tested several algorithms to predict individual educational attainment from routinely collected clinical and demographic data. To evaluate how using different measures of educational attainment impacts model performance, we developed three distinct models for predicting cardiovascular disease (CVD) hospitalization. Educational attainment was entered into the models as survey-derived, ZIP code-derived, or ML-predicted.
Results: A total of 20,805 participants met inclusion criteria. Concordance between survey and ZIP code-derived education was 47%, while the concordance between survey and ML model-predicted education was 67%. A total of 13,715 patients from the cohort were included in our CVD hospitalization prediction models, of whom 1,538 (11.2%) had a history of CVD hospitalization. The AUROC of the model predicting CVD hospitalization using survey-derived education was significantly higher than that of the model using ZIP code-level education (0.77 versus 0.72; p < 0.001) and the model using ML model-predicted education (0.77 versus 0.75; p < 0.001). The AUROC for the model using ML model-predicted education was also significantly higher than that using ZIP code-level education (p = 0.003).
Conclusion: The concordance of survey and ZIP code-level educational attainment in NYC was low. As expected, the model utilizing survey-derived education achieved the highest performance. The model incorporating our ML model-predicted education outperformed the model relying on ZIP code-derived education. Implementing ML techniques can improve the accuracy of SDOH data and consequently increase the predictive performance of outcome models.
Competing Interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: S.G.C. has salary support from NIH grants R01DK115562, UO1DK106962, R01HL085757, R01DK112258, R01DK115562, R01DK126477 and UH3DK114920. S.G.C. reports personal income and equity and stock options from Renalytix and pulse Data; he also reports personal income from Axon Therapeutics, Bayer, Boehringer-Ingelheim, CHF Solutions, ProKidney, Vifor, and Takeda. G.N.N. has received consulting fees from AstraZeneca, Reata, BioVie, and GLG Consulting; has received financial compensation as a scientific board member and advisor to RenalytixAI; and owns equity in RenalytixAI and Pensieve Health as a cofounder. D.M.C. has received financial compensation for consulting or service on clinical trials committees from Eli Lilly/Boehringer Ingelheim, Janssen, Astra Zeneca, Allena Pharmaceuticals, Fresenius, Amgen, Gilead, Novo Nordisk, GSK, Medtronic, Merck, Amgen and CSL Behring and receives research funding from Medtronic for clinical trial support, Gilead, NovoNordisk, and Amgen, as well as expert witness fees related to proton pump inhibitors. He also serves as an editor for CJASN. L.C. has received consulting fees from Vifor Pharma, honorarium from Fresenius Medical Care, and is supported in part by K23DK124645. All remaining authors have declared no conflicts of interest. This does not alter our adherence to PLOS ONE policies on sharing data and materials.
(Copyright: © 2024 Takkavatakarn et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
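Illustrative note: the two metrics reported in the abstract, concordance between education measures and AUROC of an outcome model, can be sketched in a few lines. This is a minimal sketch on invented toy data, not the authors' code. The function names, the education category labels, and all values below are assumptions made for illustration, and the significance test used in the study to compare AUROCs is not described in the abstract and is not reproduced here.

```python
from typing import Sequence


def concordance(reference: Sequence[str], candidate: Sequence[str]) -> float:
    """Fraction of participants whose two education labels agree exactly."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(r == c for r, c in zip(reference, candidate))
    return matches / len(reference)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: category names and values are invented.
survey = ["college", "high_school", "college", "less_than_hs"]
zip_derived = ["college", "college", "high_school", "less_than_hs"]
zip_concordance = concordance(survey, zip_derived)

# Hypothetical outcome labels (1 = CVD hospitalization) and model scores.
cvd_hospitalized = [1, 0, 1, 0]
predicted_risk = [0.9, 0.4, 0.35, 0.2]
model_auroc = auroc(cvd_hospitalized, predicted_risk)

print(f"concordance = {zip_concordance:.2f}, AUROC = {model_auroc:.2f}")
```

On this toy data two of four labels agree, and three of the four positive/negative pairs are ranked correctly. The pairwise AUROC shown here is quadratic in cohort size; for a cohort of the size reported above, a rank-based library implementation would be the practical choice.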
- Published
- 2024