The reliability of a deep learning model in external memory clinic MRI data: A multi‐cohort study: New imaging methods.

Authors :
Mårtensson, Gustav
Ferreira, Daniel
Granberg, Tobias
Cavallin, Lena
Oppedal, Ketil
Padovani, Alessandro
Rektorova, Irena
Bonanni, Laura
Pardini, Matteo
Kramberger, Milica G.
Taylor, John‐Paul
Hort, Jakub
Snædal, Jón
Kulisevsky, Jaime
Blanc, Frédéric
Antonini, Angelo
Mecocci, Patrizia
Vellas, Bruno
Tsolaki, Magda
Kloszewska, Iwona
Source :
Alzheimer's & Dementia: The Journal of the Alzheimer's Association; Dec2020 Supplement S11, Vol. 16 Issue 11, p1-2, 2p
Publication Year :
2020

Abstract

Background: Deep learning (DL) has produced impressive results in numerous domains in recent years, including medical image analysis. Training DL models requires large data sets to yield good performance. Since medical data can be difficult to acquire, most studies rely on public research cohorts, which often have harmonized scanning protocols and strict exclusion criteria. Such data are not representative of a clinical setting. In this study, we investigated the performance of a DL model on out‐of‐distribution data from multiple memory clinics and research cohorts.

Method: We trained multiple versions of AVRA, a DL model trained to predict visual ratings on Scheltens' medial temporal atrophy (MTA) scale (Mårtensson et al., 2019). The versions were trained on different combinations of training data, starting with only harmonized MRI data from public research cohorts and then increasing image heterogeneity in the training set by adding external memory clinic data. We assessed performance in multiple test sets by comparing AVRA's MTA ratings with those of an experienced radiologist, who rated all images in this study. Data came from the Alzheimer's Disease Neuroimaging Initiative (ADNI), AddNeuroMed, and images from 13 European memory clinics in the E‐DLB consortium.

Results: Models trained only on research cohorts generalized well to new data acquired with protocols similar to those of the training data (weighted kappa κw between 0.70 and 0.72), but worse to memory clinic data with more image variability (κw between 0.34 and 0.66). This was most prominent in one specific memory clinic, where the DL model systematically underestimated MTA scores. When data from a wider range of scanners and protocols were included during training, agreement with the radiologist's ratings in external memory clinics increased (κw between 0.51 and 0.71).

Conclusion: In this study we showed that increasing heterogeneity in training data improves generalization to out‐of‐distribution data. Our findings suggest that the reliability of a DL model should be assessed in multiple cohorts, and that DL‐based software needs to be rigorously evaluated before being certified for deployment in clinics.

References: Mårtensson, G. et al. (2019) 'AVRA: Automatic Visual Ratings of Atrophy from MRI images using Recurrent Convolutional Neural Networks', NeuroImage: Clinical. Elsevier, 23(March), p. 101872. [ABSTRACT FROM AUTHOR]
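The agreement figures reported above are weighted Cohen's kappa values (κw) between AVRA and the radiologist. As a minimal sketch of how such a value can be computed, assuming integer MTA scores from 0 to 4, the hypothetical function `weighted_kappa` below implements Cohen's weighted kappa as κw = 1 − Σ w·O / Σ w·E, where O holds the observed joint rating counts and E the counts expected by chance from each rater's marginals. The abstract does not state whether linear or quadratic weights were used, so the weighting scheme is an assumed parameter here, not the study's confirmed choice.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, num_categories=5, weighting="linear"):
    """Cohen's weighted kappa for two raters on an ordinal 0..num_categories-1 scale.

    Assumptions (not taken from the abstract): scores are integers 0-4 for the
    MTA scale, and "linear" vs "quadratic" weighting is left as a parameter.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    k = num_categories
    if any(not 0 <= s < k for s in list(rater_a) + list(rater_b)):
        raise ValueError("scores must lie in 0..num_categories-1")
    n = len(rater_a)

    # Observed joint counts O[i][j]: rater A gave i, rater B gave j.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    # Marginal counts per rater, used for the chance-expected counts.
    hist_a = Counter(rater_a)
    hist_b = Counter(rater_b)

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for maximal disagreement.
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            w = weight(i, j)
            observed_disagreement += w * observed[i][j]
            # Expected count E[i][j] = row_i * col_j / n under independence.
            expected_disagreement += w * hist_a[i] * hist_b[j] / n

    if expected_disagreement == 0:
        # Both raters used one identical category throughout: kappa is undefined.
        return float("nan")
    return 1.0 - observed_disagreement / expected_disagreement


# Usage with made-up scores (illustrative only, not study data).
model_scores = [0, 1, 2, 2, 3, 4, 1, 0]
radiologist_scores = [0, 1, 2, 3, 3, 4, 2, 0]
print(round(weighted_kappa(model_scores, radiologist_scores), 3))
```

Weighting disagreements by distance reflects that the MTA scale is ordinal: a model that rates 3 when the radiologist rates 4 is penalized less than one that rates 0, which unweighted kappa would treat identically.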

Details

Language :
English
ISSN :
15525260
Volume :
16
Issue :
11
Database :
Supplemental Index
Journal :
Alzheimer's & Dementia: The Journal of the Alzheimer's Association
Publication Type :
Academic Journal
Accession number :
147466979
Full Text :
https://doi.org/10.1002/alz.042969