
Using Hypothesis-led Machine Learning and Hierarchical Cluster Analysis to Predict Incident Dementia Based on Patterns of Disease in Longitudinal Health Records (Preprint)

Authors :
Shih-Tsung Huang
Tsung-Hsien Tsai
Pei-Jung Chen
Li-Ning Peng
Fei-Yuan Hsiao
Liang-Kung Chen
Publication Year :
2022
Publisher :
JMIR Publications Inc., 2022.

Abstract

BACKGROUND Dementia development is a complex process in which the occurrence and sequential relationships of different diseases or conditions may form specific patterns leading to incident dementia.

OBJECTIVE This study aimed to identify patterns of disease or symptom clusters and their sequences preceding incident dementia, using a novel approach incorporating machine learning methods, so that at-risk patterns can be targeted by preventive interventions.

METHODS Using Taiwan’s National Health Insurance Research Database (NHIRD), data from 15,700 older people with dementia and 15,700 nondementia controls matched on age, sex, and index year were retrieved for analysis and split into a training dataset (67%) and a testing dataset (33%). To capture specific hierarchical disease triplet clusters preceding dementia, we designed a machine learning study algorithm with four steps: (1) data preprocessing, (2) disease pathway selection, (3) model construction and optimization, and (4) data visualization.

RESULTS Among the 15,700 older people identified with dementia, 10,466 and 5,234 were randomly assigned to the training and testing datasets, respectively, and 6,215 hierarchical disease triplet clusters with positive correlations with dementia onset were identified. We subsequently generated 19,438 features to construct prediction models; the best-performing model was a support vector machine (SVM) with the by-group Lasso regression method (total corresponding features=2,513; accuracy=0.615; sensitivity=0.607; specificity=0.622; positive predictive value [PPV]=0.612; negative predictive value [NPV]=0.619; area under the curve [AUC]=0.639). In total, the current study captured 49 hierarchical disease triplet clusters related to dementia development. The most characteristic patterns leading to incident dementia started with cardiovascular conditions (mainly hypertension), cerebrovascular disease, mobility disorders, or infections, followed by neuropsychiatric conditions.

CONCLUSIONS Dementia development in the real world is an intricate process involving various diseases or conditions, their co-occurrence, and their sequential relationships. Using a machine learning approach, we identified 49 hierarchical disease triplet clusters with leading roles (cardio- or cerebrovascular disease) and supporting roles (mental conditions, locomotion difficulties, infections, and nonspecific neurological conditions) in dementia development. Further studies using data from other countries are needed to validate the prediction algorithms for dementia development and to support comprehensive strategies for preventing or caring for dementia in the real world.
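
The performance measures reported in the abstract (accuracy, sensitivity, specificity, PPV, and NPV) are all derived from the four cells of a binary confusion matrix. The following is a minimal Python sketch of those standard definitions, not the authors' code. The class name `ConfusionCounts`, the function `classification_metrics`, and the counts in the example are hypothetical and are not taken from the study. AUC is omitted because it requires per-subject prediction scores rather than counts.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical container)."""
    tp: int  # dementia cases predicted as dementia
    fp: int  # controls predicted as dementia
    tn: int  # controls predicted as nondementia
    fn: int  # dementia cases predicted as nondementia


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return the standard threshold-based metrics for a binary classifier."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
    }


# Illustrative counts only (assumed, not from the study's testing dataset).
example = ConfusionCounts(tp=60, fp=40 - 10, tn=70, fn=40)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a matched design with equal numbers of cases and controls, accuracy equals the mean of sensitivity and specificity, which is a quick consistency check on reported figures.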

Details

Database :
OpenAIRE
Accession number :
edsair.doi...........86218fffbe1853435a7609a284ab4ad3
Full Text :
https://doi.org/10.2196/preprints.41858