Glottal Flow Synthesis for Whisper-to-Speech Conversion

Authors :: Ian McLoughlin
Olivier Perrotin
GIPSA - Cognitive Robotics, Interactive Systems, & Speech Processing (GIPSA-CRISSP)
GIPSA Pôle Parole et Cognition (GIPSA-PPC)
Grenoble Images Parole Signal Automatique (GIPSA-lab)
Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)
Université Grenoble Alpes (UGA)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)
Université Grenoble Alpes (UGA)-Grenoble Images Parole Signal Automatique (GIPSA-lab)
Université Grenoble Alpes (UGA)
University of Kent [Canterbury]
This work was supported by the French National Research Agency in the framework of the 'Investissements d’avenir' programs ANR-15-IDEX-02 and ANR-11-LABX-0025-01.
ANR-15-IDEX-0002,UGA,IDEX UGA(2015)
ANR-11-LABX-0025,PERSYVAL-lab,Systèmes et Algorithmes Pervasifs au confluent des mondes physique et numérique(2011)
Source :: IEEE/ACM Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2020, 28, pp.889-900. ⟨10.1109/TASLP.2020.2971417⟩
Publication Year :: 2020
Publisher :: Institute of Electrical and Electronics Engineers (IEEE), 2020.
Abstract: Whisper-to-speech conversion is motivated by laryngeal disorders, in which malfunction of the vocal folds leads to loss of voicing. Many patients with laryngeal disorders can still produce functional whispers, since these are characterised by the absence of vocal fold vibration. Whispers therefore constitute a common ground for speech rehabilitation across many kinds of laryngeal disorder. Whisper-to-speech conversion involves recreating natural-sounding speech from recorded whispers, and is a non-invasive and non-surgical rehabilitation that can maintain a natural method of speaking, unlike the existing methods of rehabilitation. This paper proposes a new rule-based method for whisper-to-speech conversion that replaces the noisy whisper sound source with a synthesised speech-like harmonic source, while maintaining the vocal tract component unaltered. In particular, a novel glottal source generator is developed in which whisper information is used to parameterise the excitation through a high-quality glottis model. Evaluation of the system against the standard pulse train excitation method reveals significantly improved performance. Since our method is glottis-based, it is potentially compatible with the many existing vocal tract component adaptation systems.