An omics-based machine learning approach to predict diabetes progression: a RHAPSODY study
- Author
Slieker, Roderick C. (author), Münch, Magnus (author), Donnelly, Louise A. (author), Bouland, G.A. (author), Dragan, Iulian (author), Kuznetsov, Dmitry (author), Elders, Petra J.M. (author), Rutter, Guy A. (author), and Ibberson, Mark (author)
- Abstract
Aims/hypothesis: People with type 2 diabetes are heterogeneous in their disease trajectory, with some progressing to insulin initiation more quickly than others. Although classical biomarkers such as age, HbA1c and diabetes duration are associated with glycaemic progression, it is unclear how well such variables predict insulin initiation or requirement, and whether newly identified markers add predictive value.
Methods: In two prospective cohort studies within IMI-RHAPSODY, we investigated whether clinical variables and three types of molecular markers (metabolites, lipids, proteins) can predict time to insulin requirement using different machine learning approaches (lasso, ridge, GRridge, random forest). Clinical variables included age, sex, HbA1c, HDL-cholesterol and C-peptide. Models were run with unpenalised clinical variables (i.e. always included in the model without weights), with penalised clinical variables, or without clinical variables. Models were developed in one cohort and applied to the second cohort. Model performance was evaluated using Harrell's C statistic.
Results: Of the 585 individuals in the Hoorn Diabetes Care System (DCS) cohort, 69 required insulin during follow-up (1.0–11.4 years); of the 571 individuals in the Genetics of Diabetes Audit and Research in Tayside Scotland (GoDARTS) cohort, 175 required insulin during follow-up (0.3–11.8 years). Overall, clinical variables and proteins were selected most often across the different models, followed by metabolites. The most frequently selected clinical variables were HbA1c (18 of the 36 models, 50%), age (15 models, 41.2%) and C-peptide (15 models, 41.2%). Base models including only clinical variables (age, sex, BMI, HbA1c) performed moderately in both the DCS discovery cohort (C statistic 0.71 [95% CI 0.64, 0.79]) and the GoDARTS replication cohort (C 0.71 [95% CI 0.69, 0.75]). A more extensive model including HD…
Pattern Recognition and Bioinformatics
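Harrell's C statistic, used in the abstract to evaluate the models, is the fraction of usable pairs of individuals that a model ranks correctly: among pairs where the person with the shorter follow-up actually had the event (here, insulin requirement), the pair is concordant if that person also received the higher risk score. The sketch below is a minimal pure-Python illustration, not the study's code. It assumes right-censored data and that a higher risk score means earlier expected insulin requirement; the function name `harrell_c` and the toy data are hypothetical, and pairs with tied follow-up times are simply skipped rather than handled as some full implementations do.

```python
from itertools import combinations
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[bool], risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored time-to-event data (sketch).

    times:  observed follow-up time (time to event or to censoring)
    events: True if the event was observed, False if the individual was censored
    risk:   model risk score; higher means an earlier expected event
    """
    if not (len(times) == len(events) == len(risk)):
        raise ValueError("times, events and risk must have equal length")
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Usable only if the earlier time is an observed event; tied times are
        # skipped in this simplified sketch (assumption, not a standard rule).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0  # earlier event was given the higher risk
        elif risk[a] == risk[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical toy data: follow-up in years, event flags, model risk scores.
    times = [2.0, 5.0, 3.0, 8.0]
    events = [True, False, True, False]
    risk = [0.9, 0.7, 0.6, 0.1]
    print(harrell_c(times, events, risk))  # 4 of 5 usable pairs concordant
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported C of 0.71 means the base models ordered roughly 71% of usable pairs correctly. Confidence intervals such as those in the abstract would require an additional step (for example bootstrapping), which this sketch does not include.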
- Published
- 2024