Machine learning-based prediction of diabetic patients using blood routine data.

Authors :
Li, Honghao
Su, Dongqing
Zhang, Xinpeng
He, Yuanyuan
Luo, Xu
Xiong, Yuqiang
Zou, Min
Wei, Huiyan
Wen, Shaoran
Xi, Qilemuge
Zuo, Yongchun
Yang, Lei
Source :
Methods. Sep 2024, Vol. 229, pp. 156-162. 7 p.
Publication Year :
2024

Abstract

• A computational framework was proposed to predict diabetes from blood routine data collected from hospital and health-center settings.
• The contributions of different blood routine indicators to diabetes prediction were identified by the framework.
• A nomogram was constructed to assess the influence of blood routine indicators on prediction outcomes.

Diabetes is one of the most prevalent chronic diseases globally. Conventional diagnostic methods are often not applied until individuals show noticeable symptoms of the condition. This study aimed to address this gap by collecting comprehensive datasets, including 1000 instances of blood routine data from diabetes patients and an equivalent dataset from healthy individuals. To differentiate diabetes patients from their healthy counterparts, a computational framework was established, encompassing eXtreme Gradient Boosting (XGBoost), random forest, support vector machine, and elastic net algorithms. The XGBoost model was the most effective, with an area under the receiver operating characteristic curve (AUC) of 99.90% in the training set and 98.51% in the testing set. In external validation, the model achieved an overall accuracy of 81.54%. The probability generated by the model serves as a risk score for diabetes susceptibility. The model was interpreted with the Shapley additive explanations (SHAP) algorithm, which identified pivotal indicators such as mean corpuscular hemoglobin concentration (MCHC), lymphocyte ratio (LY%), standard deviation of red blood cell distribution width (RDW-SD), and mean corpuscular hemoglobin (MCH). This enhances our understanding of the predictive mechanisms underlying diabetes.
To facilitate the application in clinical and real-life settings, a nomogram was created based on the logistic regression algorithm, which can provide a preliminary assessment of the likelihood of an individual having diabetes. Overall, this research contributes valuable insights into the predictive modeling of diabetes, offering potential applications in clinical practice for more effective and timely diagnoses. [ABSTRACT FROM AUTHOR]
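
As a rough illustration of how a logistic-regression nomogram of the kind mentioned in the abstract can work, the sketch below converts indicator values into points and then maps the total points back to a probability. It is not the authors' nomogram: the indicator names (`indicator_a`, `indicator_b`), the intercept, the coefficients, and the axis ranges are all hypothetical placeholders chosen only to make the arithmetic visible. The scaling convention (the indicator with the largest effect spans 0 to 100 points) is a common one and is assumed here, not confirmed by the record.

```python
import math

# HYPOTHETICAL values for illustration only; none of these come from the paper.
INTERCEPT = -1.0
COEFS = {"indicator_a": 0.08, "indicator_b": -0.5}
RANGES = {"indicator_a": (0.0, 50.0), "indicator_b": (0.0, 4.0)}  # axis (min, max)


def effect_span(name):
    """Largest change in log-odds this indicator can cause across its axis."""
    low, high = RANGES[name]
    return abs(COEFS[name]) * (high - low)


# Assumed convention: the most influential indicator spans 0..100 points.
POINTS_PER_LOGIT = 100.0 / max(effect_span(name) for name in COEFS)


def reference_value(name):
    """Axis end where the indicator contributes the least to the log-odds."""
    low, high = RANGES[name]
    return low if COEFS[name] >= 0 else high


def points(name, value):
    """Points for one indicator: its log-odds contribution above the reference."""
    return COEFS[name] * (value - reference_value(name)) * POINTS_PER_LOGIT


def probability_from_points(total_points):
    """Map total points back to a probability through the logistic function."""
    baseline = INTERCEPT + sum(COEFS[n] * reference_value(n) for n in COEFS)
    logit = baseline + total_points / POINTS_PER_LOGIT
    return 1.0 / (1.0 + math.exp(-logit))


def direct_probability(values):
    """Plain logistic-regression probability, used to check the nomogram."""
    logit = INTERCEPT + sum(COEFS[n] * values[n] for n in COEFS)
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical individual, again purely illustrative.
patient = {"indicator_a": 25.0, "indicator_b": 1.0}
total = sum(points(name, value) for name, value in patient.items())
print(f"total points: {total:.1f}")
print(f"probability from points: {probability_from_points(total):.4f}")
print(f"probability from model:  {direct_probability(patient):.4f}")
```

The two printed probabilities agree because the point scale is only a linear rescaling of the log-odds. A nomogram lets the same calculation be read off printed axes without a computer.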

Details

Language :
English
ISSN :
1046-2023
Volume :
229
Database :
Academic Search Index
Journal :
Methods
Publication Type :
Academic Journal
Accession number :
178735406
Full Text :
https://doi.org/10.1016/j.ymeth.2024.07.001