Real-time Control of a DNN-based Articulatory Synthesizer for Silent Speech Conversion: a pilot study
- Author
Thomas Hueber, Christophe Savariaux, Laurent Girin, Blaise Yvert, and Florent Bocquelet
Affiliations: Grenoble Images Parole Signal Automatique (GIPSA-lab: GIPSA-CRISSP, GIPSA-DPC, GIPSA-Services), Université Stendhal - Grenoble 3, Université Pierre Mendès France - Grenoble 2 (UPMF), Université Joseph Fourier - Grenoble 1 (UJF), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Centre National de la Recherche Scientifique (CNRS); Institut de Neurosciences cognitives et intégratives d'Aquitaine (INCIA), Université Bordeaux Segalen - Bordeaux 2, Université Sciences et Technologies - Bordeaux 1, SFR Bordeaux Neurosciences, CNRS
- Subjects
Computer science; articulatory speech synthesis; speech recognition; deep neural networks; real-time control system; EMA; silent speech; networking & telecommunications; engineering and technology; speech-language pathology & audiology; medical and health sciences; electrical engineering, electronic engineering, information engineering; other medical science; [STAT.ML] Statistics [stat]/Machine Learning [stat.ML]; [INFO.INFO-SD] Computer Science [cs]/Sound [cs.SD]; [SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing
- Abstract
This article presents a pilot study on the real-time control of an articulatory synthesizer based on a deep neural network (DNN), in the context of a silent speech interface. The underlying hypothesis is that a silent speaker could benefit from real-time audio feedback to regulate his/her own production. In this study, we use 3D electromagnetic articulography (EMA) to capture speech articulation, a DNN to convert EMA features to spectral trajectories in real time, and a standard vocoder excited by white noise for audio synthesis. As recent literature on silent speech has shown, the articulatory-to-acoustic modeling process must be adapted to account for possible inconsistencies between the initial training phase and practical usage conditions. In this study, we focus on different sensor setups across sessions (for the same speaker). Model adaptation is performed by cascading another neural network with the DNN used for articulatory-to-acoustic mapping. The intelligibility of the speech signal synthesized in real time is evaluated with both objective and perceptual measurements.
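The conversion and adaptation pipeline summarized in the abstract can be illustrated with a short sketch. The Python/NumPy code below is a minimal illustration under stated assumptions, not the authors' implementation: the feature dimensions (18 EMA coordinates, 25 spectral coefficients), the network sizes, and the random stand-in weights are all hypothetical, and placing the cascaded adaptation network at the input of the frozen mapping DNN is an assumption of this sketch; the white-noise-excited vocoder stage is omitted.

```python
import numpy as np

# Hypothetical dimensions: 3D EMA coordinates for one frame as input,
# spectral coefficients as output (both assumed, not from the paper).
EMA_DIM = 18      # e.g. 6 sensors x 3 coordinates
SPEC_DIM = 25     # e.g. 25 spectral coefficients per frame
HIDDEN = 128

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Random weights and zero biases for one fully connected layer."""
    return rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out)), np.zeros(n_out)

def mlp_forward(x, layers):
    """Forward pass through a stack of (W, b) layers: tanh hidden units,
    linear output layer."""
    h = x
    for i, (W, b) in enumerate(layers):
        h = h @ W + b
        if i < len(layers) - 1:
            h = np.tanh(h)
    return h

# Pretrained articulatory-to-acoustic DNN (random stand-in weights here;
# in the study they would be learned on the initial EMA/audio session).
mapping_dnn = [init_layer(EMA_DIM, HIDDEN),
               init_layer(HIDDEN, HIDDEN),
               init_layer(HIDDEN, SPEC_DIM)]

# Cascaded adaptation network: a small extra network compensating for a
# different sensor setup in a new session. Placing it in front of the
# frozen mapping DNN is an assumption of this sketch.
adapt_net = [init_layer(EMA_DIM, EMA_DIM),
             init_layer(EMA_DIM, EMA_DIM)]

def convert_frame(ema_frame):
    """Convert one EMA frame to spectral coefficients: adaptation network
    first, then the pretrained mapping DNN."""
    adapted = mlp_forward(ema_frame, adapt_net)
    return mlp_forward(adapted, mapping_dnn)

# Example: a stream of EMA frames converted frame by frame; the resulting
# spectral trajectories would then drive a vocoder excited by white noise.
for _ in range(5):
    ema_frame = rng.normal(size=EMA_DIM)
    spectral = convert_frame(ema_frame)
    print(spectral.shape)  # (25,)
```

In such a cascaded arrangement, only the small adaptation network would need to be re-estimated for a new sensor setup, while the pretrained articulatory-to-acoustic DNN stays fixed.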
- Published
- 2015