1. Expert Variability and Deep Learning Performance in Spinal Cord Lesion Segmentation for Multiple Sclerosis Patients
- Author
- Walsh, Ricky; Meurée, Cédric; Kerbrat, Anne; Masson, Arthur; Hussein, Burhan Rashid; Gaubert, Malo; Galassi, Francesca; Combès, Benoit
- Affiliations: Neuroimagerie: méthodes et applications (EMPENN); Institut National de la Santé et de la Recherche Médicale (INSERM); Inria Rennes – Bretagne Atlantique; Institut National de Recherche en Informatique et en Automatique (Inria); SIGNAL, IMAGE ET LANGAGE (IRISA-D6); Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA); Université de Rennes (UR); Institut National des Sciences Appliquées - Rennes (INSA Rennes); Institut National des Sciences Appliquées (INSA); Université de Bretagne Sud (UBS); École normale supérieure - Rennes (ENS Rennes); CentraleSupélec; Centre National de la Recherche Scientifique (CNRS); IMT Atlantique; Institut Mines-Télécom [Paris] (IMT); Service de Neurologie [Rennes] = Neurology [Rennes], CHU Pontchaillou [Rennes]; Centre d'Investigation Clinique [Rennes] (CIC); Hôpital Pontchaillou; Service de Neuroradiologie [Rennes]
- Funding (ANR grants): ANR-20-THIA-0018, AI4SDA, IA pour l'analyse de données sémantiques (2020); ANR-21-RHUS-0014, Primus, Transforming the care of patients with Multiple Sclerosis using a multidimensional data-driven clinical decision support system (2021); ANR-10-COHO-0002, OFSEP, Observatoire Français de la Sclérose en Plaques (2010)
- Subjects
Multiple sclerosis, spinal cord, magnetic resonance imaging, lesion segmentation, inter-rater variability, intra-rater variability, deep learning, automated segmentation, [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], [INFO.INFO-TI] Computer Science [cs]/Image Processing [eess.IV], [INFO.INFO-IM] Computer Science [cs]/Medical Imaging
- Abstract
Accepted at 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS). © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other work.
Multiple sclerosis (MS) patients often present with lesions in spinal cord magnetic resonance (MR) volumes. However, accurately detecting these lesions is challenging and prone to inter- and intra-rater variability. Deep learning-based methods have the potential to aid clinicians in detecting and segmenting MS lesions, but can also be affected by rater variability. This study assesses the inter- and intra-rater variability in manual segmentation of spinal cord lesions, and evaluates the raters and a state-of-the-art nnU-Net model against a ground truth (GT) segmentation produced by a senior expert. Four experts each segmented twelve spinal cord MR volumes from six patients twice, with the two sessions two weeks apart. Considerable inter- and intra-rater variability was observed, with the total number of detected lesions ranging from 28 to 60, depending on the rater. Moreover, the segmented volumes of individual lesions varied substantially between raters. All raters and the model achieved high precision when evaluated against the senior expert GT, but sensitivity was notably lower. These results motivate the need for more sensitive automated methods to aid clinicians in lesion detection, and suggest that inter-rater variability should be taken into account when training and evaluating automated methods.
- Published
- 2023