Learning robust speech representation with an articulatory-regularized variational autoencoder
- Source :
- Proceedings of Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Aug 2021, Brno, Czech Republic
- Publication Year :
- 2021
Abstract
- It is increasingly considered that human speech perception and production both rely on articulatory representations. In this paper, we investigate whether this type of representation could improve the performance of a deep generative model (here a variational autoencoder) trained to encode and decode acoustic speech features. First, we develop an articulatory model able to associate articulatory parameters describing the jaw, tongue, lips and velum configurations with vocal tract shapes and spectral features. Then, we incorporate these articulatory parameters into a variational autoencoder applied to spectral features by using a regularization technique that constrains part of the latent space to represent articulatory trajectories. We show that this articulatory constraint improves model training by decreasing time to convergence and reconstruction loss at convergence, and yields better performance in a speech denoising task.
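
The regularization idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `articulatory_vae_loss`, the squared-error penalty, and the weights `beta` and `lam` are all assumptions made for illustration. The sketch assumes a per-frame loss made of a reconstruction term, the usual KL term of a variational autoencoder, and a penalty that pulls the first few latent means toward the articulatory parameters of the same frame.

```python
import math

def articulatory_vae_loss(x, x_hat, mu, log_var, artic, beta=1.0, lam=1.0):
    """Hypothetical per-frame loss for an articulatory-regularized VAE.

    x, x_hat : input and reconstructed spectral feature vectors
    mu, log_var : mean and log-variance of the approximate posterior
    artic : articulatory parameters (jaw, tongue, lips, velum) for the frame
    beta, lam : assumed weights for the KL and articulatory terms
    """
    # Reconstruction term: squared error between input and decoded features.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    # KL divergence between N(mu, diag(exp(log_var))) and a standard normal prior.
    kl = -0.5 * sum(1.0 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mu, log_var))
    # Articulatory penalty (assumed form): only the first len(artic) latent
    # dimensions are constrained; the remaining dimensions stay free.
    penalty = sum((m - a) ** 2 for m, a in zip(mu[:len(artic)], artic))
    return recon + beta * kl + lam * penalty
```

In this sketch, setting `lam=0.0` recovers an unregularized variational autoencoder loss, which gives a baseline for the kind of comparison the abstract describes.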
- Subjects :
- FOS: Computer and information sciences
Sound (cs.SD)
Speech production
Speech perception
Computer science
Quantitative Biology::Tissues and Organs
Speech recognition
Physics::Medical Physics
02 engineering and technology
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
Computer Science - Sound
030507 speech-language pathology & audiology
03 medical and health sciences
representation learning
Audio and Speech Processing (eess.AS)
FOS: Electrical engineering, electronic engineering, information engineering
0202 electrical engineering, electronic engineering, information engineering
variational autoencoder
Representation (mathematics)
Computer Science - Computation and Language
020206 networking & telecommunications
Autoencoder
Speech enhancement
Generative model
Computer Science::Graphics
Computer Science::Sound
[INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD]
articulatory model
0305 other medical science
Computation and Language (cs.CL)
Feature learning
Vocal tract
Electrical Engineering and Systems Science - Audio and Speech Processing
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- Proceedings of Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Aug 2021, Brno, Czech Republic
- Accession number :
- edsair.doi.dedup.....803fc115688f2ca8ffbb2f8b90c7739f