The reliability of a deep learning model in external memory clinic MRI data: A multi‐cohort study: New imaging methods.
- Source :
- Alzheimer's & Dementia: The Journal of the Alzheimer's Association; Dec 2020, Supplement S11, Vol. 16, Issue 11, p1-2, 2p
- Publication Year :
- 2020
Abstract
- Background: Deep learning (DL) has produced impressive results in numerous domains in recent years, including medical image analysis. Training DL models to good performance requires large data sets. Since medical data can be difficult to acquire, most studies rely on public research cohorts, which often have harmonized scanning protocols and strict exclusion criteria. Such data are not representative of a clinical setting. In this study, we investigated the performance of a DL model on out‐of‐distribution data from multiple memory clinics and research cohorts.
- Method: We trained multiple versions of AVRA, a DL model that predicts visual ratings on Scheltens' medial temporal atrophy (MTA) scale (Mårtensson et al., 2019). Each version was trained on a different combination of data, starting with only harmonized MRI data from public research cohorts and then increasing image heterogeneity in the training set by adding external memory clinic data. We assessed performance on multiple test sets by comparing AVRA's MTA ratings with those of an experienced radiologist, who rated all images in this study. Data came from the Alzheimer's Disease Neuroimaging Initiative (ADNI), AddNeuroMed, and 13 European memory clinics in the E‐DLB consortium.
- Results: Models trained only on research cohorts generalized well to new data acquired with protocols similar to those of the training data (weighted kappa κw between 0.70 and 0.72), but worse to memory clinic data with more image variability (κw between 0.34 and 0.66). This was most prominent in one specific memory clinic, where the DL model systematically predicted MTA scores that were too low. When data from a wider range of scanners and protocols were included during training, agreement with the radiologist's ratings in the external memory clinics increased (κw between 0.51 and 0.71).
- Conclusion: In this study, we showed that increasing the heterogeneity of the training data improves generalization to out‐of‐distribution data. Our findings suggest that the reliability of a DL model should be assessed in multiple cohorts, and that DL-based software needs to be rigorously evaluated before being certified for deployment in clinics.
- References: Mårtensson, G. et al. (2019) 'AVRA: Automatic Visual Ratings of Atrophy from MRI images using Recurrent Convolutional Neural Networks', NeuroImage: Clinical. Elsevier, 23(March), p. 101872. [ABSTRACT FROM AUTHOR]
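Agreement between AVRA and the radiologist is reported as weighted kappa (κw). Unlike plain accuracy, weighted kappa corrects for chance agreement and penalizes a disagreement more the further apart the two MTA grades are. The sketch below computes Cohen's weighted kappa for two raters. It makes some assumptions that do not come from the abstract. Ratings are taken to be integer MTA grades from 0 to 4. The abstract does not say whether linear or quadratic weights were used, so both are offered. `weighted_kappa` and its parameters are hypothetical names for illustration, not AVRA's code or the authors' evaluation pipeline.

```python
def weighted_kappa(rater_a, rater_b, num_categories=5, weights="linear"):
    """Cohen's weighted kappa between two raters on an ordinal 0..num_categories-1 scale.

    Hypothetical helper for illustration (assumes integer MTA grades 0-4 by default);
    not part of AVRA and not the authors' evaluation code.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty sequences of equal length")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    k = num_categories
    if any(not 0 <= r < k for r in list(rater_a) + list(rater_b)):
        raise ValueError("ratings must be integers in 0..num_categories-1")
    n = len(rater_a)

    # Observed joint distribution of (rater_a, rater_b) grades, as proportions.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n

    # Marginal distribution of each rater's grades.
    marg_a = [sum(observed[i]) for i in range(k)]
    marg_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement_weight(i, j):
        # 0 on the diagonal (exact agreement), growing with the distance between grades.
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    # Weighted disagreement actually observed vs. expected if the raters were independent.
    obs_dis = sum(disagreement_weight(i, j) * observed[i][j]
                  for i in range(k) for j in range(k))
    exp_dis = sum(disagreement_weight(i, j) * marg_a[i] * marg_b[j]
                  for i in range(k) for j in range(k))
    if exp_dis == 0:
        raise ValueError("kappa is undefined when both raters give every image the same grade")
    return 1.0 - obs_dis / exp_dis
```

For example, `weighted_kappa([0, 1, 2], [0, 2, 2])` gives 2/3 with linear weights and 0.8 with `weights="quadratic"`. These ratings are made up for illustration and are not data from the study.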
Details
- Language :
- English
- ISSN :
- 1552-5260
- Volume :
- 16
- Issue :
- 11
- Database :
- Supplemental Index
- Journal :
- Alzheimer's & Dementia: The Journal of the Alzheimer's Association
- Publication Type :
- Academic Journal
- Accession number :
- 147466979
- Full Text :
- https://doi.org/10.1002/alz.042969