Blinded Clinical Evaluation for Dementia of Alzheimer’s Type Classification Using FDG-PET: A Comparison Between Feature-Engineered and Non-Feature-Engineered Machine Learning Methods
- Author
Karteek Popuri, Stephan Probst, Evangeline Yee, Da Ma, Jane Stocks, Lisanne M. Jenkins, Guillaume Chaussé, Lei Wang, and Mirza Faisal Beg
- Subjects
0301 basic medicine, Neuroimaging, Machine learning, Convolutional neural network, 03 medical and health sciences, 0302 clinical medicine, Alzheimer Disease, Fluorodeoxyglucose F18, Humans, Medicine, Dementia, Cognitive Dysfunction, Generalizability theory, General Neuroscience, Brain, General Medicine, Clinical Practice, Psychiatry and Mental health, Clinical Psychology, 030104 developmental biology, Feature (computer vision), Positron-Emission Tomography, Clinical diagnosis, Neural Networks, Computer, Artificial intelligence, Radiopharmaceuticals, Geriatrics and Gerontology, Clinical evaluation, 030217 neurology & neurosurgery
- Abstract
Background: Advanced machine learning methods can aid in identifying dementia risk using neuroimaging-derived features, including those from FDG-PET. However, to translate these methods and test their usefulness in clinical practice, it is crucial to validate them independently on real clinical samples, a step that has yet to be properly addressed in the current literature.
Objective: In this paper, we present our efforts toward such clinical translation by evaluating and comparing two machine learning methods for discriminating between dementia of Alzheimer’s type (DAT) and non-DAT controls.
Methods: FDG-PET-based dementia scores were generated for an independent clinical sample whose clinical diagnoses were blinded to the algorithm designers. A feature-engineered approach (a multi-kernel probability classifier) and a non-feature-engineered approach (a 3D convolutional neural network) were analyzed. Both classifiers were pre-trained on cognitively normal subjects and subjects with DAT, and each provided a probabilistic dementia score for the previously unseen clinical data. The performance of the algorithms was compared against ground-truth dementia ratings assessed by experienced nuclear medicine physicians.
Results: Blinded clinical evaluation of both classifiers showed good separation between cognitively normal subjects and patients diagnosed with DAT. The non-feature-engineered dementia score showed higher sensitivity among subjects for whom the two machine learning models agreed on the diagnosis (consensus cases), while the feature-engineered approach showed higher specificity in non-consensus cases.
Conclusion: In this study, we demonstrated a blinded evaluation, using data from an independent clinical sample, of the performance of DAT classification models in a clinical setting. Our results showed good generalizability for both machine learning approaches, marking an important step toward translating pre-trained machine learning models into clinical practice.
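The consensus versus non-consensus comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each subject has one probabilistic dementia score per model in [0, 1], a binary DAT/non-DAT physician label, and a decision threshold of 0.5. The `Subject` class and the `sens_spec` and `evaluate` functions are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    """One clinical subject (hypothetical structure for illustration)."""
    score_fe: float   # feature-engineered (multi-kernel) dementia score, assumed in [0, 1]
    score_nfe: float  # non-feature-engineered (3D CNN) dementia score, assumed in [0, 1]
    is_dat: bool      # physician-assigned ground truth: True if DAT


def sens_spec(preds, truths):
    """Return (sensitivity, specificity); NaN when a class is absent."""
    pairs = list(zip(preds, truths))
    tp = sum(p and t for p, t in pairs)
    fn = sum((not p) and t for p, t in pairs)
    tn = sum((not p) and (not t) for p, t in pairs)
    fp = sum(p and (not t) for p, t in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def evaluate(subjects, threshold=0.5):
    """Score both models separately on consensus and non-consensus subjects.

    A subject counts as a consensus case when both models make the same
    DAT/non-DAT call at the (assumed) threshold.
    """
    results = {}
    for group, want_agreement in (("consensus", True), ("non-consensus", False)):
        members = [
            s for s in subjects
            if ((s.score_fe >= threshold) == (s.score_nfe >= threshold)) == want_agreement
        ]
        truths = [s.is_dat for s in members]
        for model, attr in (("fe", "score_fe"), ("nfe", "score_nfe")):
            preds = [getattr(s, attr) >= threshold for s in members]
            results[(group, model)] = sens_spec(preds, truths)
    return results
```

For example, `evaluate(subjects)[("non-consensus", "fe")]` returns the feature-engineered model's sensitivity and specificity on the subjects where the two models disagree. Any scores passed in here would be illustrative toy values, not data from the study.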
- Published
- 2021