Uncovering the structure of clinical EEG signals with self-supervised learning
- Author
Hubert Banville, Alexandre Gramfort, Omar Chehab, Aapo Hyvärinen, Denis-Alexander Engemann (Parietal team, Inria Saclay - Île-de-France / CEA NeuroSpin / Université Paris-Saclay; University of Helsinki)
- Subjects
Signal Processing (eess.SP), Machine Learning (cs.LG), Machine Learning (stat.ML), Quantitative Methods (q-bio.QM), Neurons and Cognition (q-bio.NC), Electroencephalography, Self-supervised learning, Representation learning, Feature learning, Supervised learning, Deep learning, Sleep staging, Pathology detection, Clinical neuroscience, Biomedical Engineering
- Abstract
Objective. Supervised learning paradigms are often limited by the amount of labeled data that is available. This phenomenon is particularly problematic in clinically relevant data, such as electroencephalography (EEG), where labeling can be costly in terms of specialized expertise and human processing time. Consequently, deep learning architectures designed to learn on EEG data have yielded relatively shallow models, with performances at best similar to those of traditional feature-based approaches. However, in most situations, unlabeled data is available in abundance. By extracting information from this unlabeled data, it might be possible to reach competitive performance with deep neural networks despite limited access to labels. Approach. We investigated self-supervised learning (SSL), a promising technique for discovering structure in unlabeled data, to learn representations of EEG signals. Specifically, we explored two tasks based on temporal context prediction as well as contrastive predictive coding on two clinically relevant problems: EEG-based sleep staging and pathology detection. We conducted experiments on two large public datasets with thousands of recordings and performed baseline comparisons with purely supervised and hand-engineered approaches. Main results. Linear classifiers trained on SSL-learned features consistently outperformed purely supervised deep neural networks in low-labeled-data regimes while reaching competitive performance when all labels were available. In addition, the embeddings learned with each method revealed clear latent structures related to physiological and clinical phenomena, such as age effects. Significance. We demonstrate the benefit of self-supervised learning approaches on EEG data. Our results suggest that SSL may pave the way to a wider use of deep learning models on EEG data.
- Comments
- 32 pages, 9 figures
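The abstract mentions pretext tasks based on temporal context prediction. As an illustration only, the sketch below shows one such task, relative positioning, in a minimal PyTorch form: an encoder is trained without labels to predict whether two EEG windows were sampled close together in time. The encoder architecture, window length, and thresholds (SmallEEGEncoder, tau_pos, tau_neg) are assumptions made for this example, not the authors' implementation.

```python
# Minimal, illustrative sketch (not the authors' code) of a "relative
# positioning" pretext task: an encoder learns, without any labels, to decide
# whether two EEG windows were recorded close together in time. All names,
# window lengths and thresholds below are assumptions made for the example.

import torch
import torch.nn as nn


class SmallEEGEncoder(nn.Module):
    """Maps a (channels, time) EEG window to a fixed-size embedding."""

    def __init__(self, n_channels=2, emb_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 8, kernel_size=25, stride=3), nn.ReLU(),
            nn.Conv1d(8, 16, kernel_size=25, stride=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(16, emb_dim),
        )

    def forward(self, x):
        return self.net(x)


def sample_rp_pairs(recording, win, tau_pos, tau_neg, n_pairs):
    """Sample balanced (anchor, other, label) window pairs from one recording.

    label = 1 if the two windows start within tau_pos samples of each other,
    label = 0 if they start at least tau_neg samples apart.
    """
    _, n_times = recording.shape
    anchors, others, labels = [], [], []
    while len(labels) < n_pairs:
        i = torch.randint(0, n_times - win, (1,)).item()
        want_pos = len(labels) % 2 == 0  # alternate positive / negative pairs
        if want_pos:
            j = i + torch.randint(-tau_pos, tau_pos + 1, (1,)).item()
        else:
            offset = torch.randint(tau_neg, n_times, (1,)).item()
            j = i + offset if torch.rand(1).item() < 0.5 else i - offset
        if j < 0 or j > n_times - win:
            continue  # second window fell outside the recording: resample
        anchors.append(recording[:, i:i + win])
        others.append(recording[:, j:j + win])
        labels.append(1.0 if want_pos else 0.0)
    return torch.stack(anchors), torch.stack(others), torch.tensor(labels)


# Toy self-supervised training loop on synthetic data, for illustration only.
sfreq, win = 100, 30 * 100                      # 30 s windows at 100 Hz
recording = torch.randn(2, 8 * 3600 * sfreq)    # fake 8 h, 2-channel EEG
encoder = SmallEEGEncoder()
contrast = nn.Linear(100, 1)                    # scores |h_anchor - h_other|
params = list(encoder.parameters()) + list(contrast.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(10):
    xa, xb, y = sample_rp_pairs(recording, win,
                                tau_pos=60 * sfreq, tau_neg=15 * 60 * sfreq,
                                n_pairs=64)
    logits = contrast(torch.abs(encoder(xa) - encoder(xb))).squeeze(-1)
    loss = loss_fn(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# After pretraining, embeddings from the frozen encoder would be fed to a
# simple linear classifier (e.g. logistic regression) for sleep staging or
# pathology detection, mirroring the evaluation protocol described above.
```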
- Published
- 2021