
Learning and controlling the source-filter representation of speech with a variational autoencoder

Authors :
Sadok, Samir
Leglaive, Simon
Girin, Laurent
Alameda-Pineda, Xavier
Séguier, Renaud
Source :
Speech Communication, vol. 148, 2023
Publication Year :
2022

Abstract

Understanding and controlling latent representations in deep generative models is a challenging yet important problem for analyzing, transforming and generating various types of data. In speech processing, inspired by the anatomical mechanisms of phonation, the source-filter model considers that speech signals are produced from a few independent and physically meaningful continuous latent factors, among which the fundamental frequency $f_0$ and the formants are of primary importance. In this work, we start from a variational autoencoder (VAE) trained in an unsupervised manner on a large dataset of unlabeled natural speech signals, and we show that the source-filter model of speech production naturally arises as orthogonal subspaces of the VAE latent space. Using only a few seconds of labeled speech signals generated with an artificial speech synthesizer, we propose a method to identify the latent subspaces encoding $f_0$ and the first three formant frequencies, show that these subspaces are orthogonal, and, based on this orthogonality, develop a method to accurately and independently control the source-filter speech factors within the latent subspaces. Without requiring additional information such as text or human-labeled data, this results in a deep generative model of speech spectrograms that is conditioned on $f_0$ and the formant frequencies, and which is applied to the transformation of speech signals. Finally, we also propose a robust $f_0$ estimation method that exploits the projection of a speech signal onto the learned latent subspace associated with $f_0$.

Comment: 23 pages, 7 figures, companion website: https://samsad35.github.io/site-sfvae/
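To make the idea of controlling one speech factor inside an orthogonal latent subspace more concrete, the following is a minimal sketch in Python/NumPy. It is not the authors' code: the latent dimensionality, the PCA-based identification of the $f_0$ subspace, and the linear regression from $f_0$ to subspace coordinates are assumptions chosen for illustration; in the paper the subspaces are identified from a few seconds of labeled synthesizer speech and the edited latent codes are passed through the trained VAE decoder.

# Conceptual sketch (not the authors' implementation): illustrates
# (1) identifying a low-dimensional latent subspace that co-varies with f0
#     from encoder outputs of labeled synthetic speech, and
# (2) editing a latent vector only within that subspace, leaving the
#     orthogonal complement (e.g. formant-related directions) untouched.
# Dimensions, the PCA step and the affine f0-to-coordinates map are assumptions.

import numpy as np

latent_dim = 16      # assumed VAE latent dimensionality
subspace_dim = 3     # assumed dimension of the f0 subspace

rng = np.random.default_rng(0)

# Placeholder for VAE latent codes of synthesizer speech where only f0 varies,
# shape (num_frames, latent_dim); in practice these come from the encoder.
z_f0_sweep = rng.normal(size=(500, latent_dim))
f0_labels = np.linspace(100.0, 300.0, 500)           # f0 labels in Hz

# 1. Identify the f0 subspace with PCA on the mean-removed latent codes.
z_mean = z_f0_sweep.mean(axis=0)
_, _, vt = np.linalg.svd(z_f0_sweep - z_mean, full_matrices=False)
U = vt[:subspace_dim].T                               # orthonormal basis (latent_dim, subspace_dim)

# 2. Fit a simple affine map from f0 to coordinates within that subspace.
coords = (z_f0_sweep - z_mean) @ U                    # (num_frames, subspace_dim)
A = np.vstack([f0_labels, np.ones_like(f0_labels)]).T
W, *_ = np.linalg.lstsq(A, coords, rcond=None)        # least-squares regression f0 -> coords

def set_f0(z, target_f0):
    """Replace the component of z lying in the f0 subspace with the one
    predicted for target_f0; the orthogonal complement is preserved."""
    new_coords = np.array([target_f0, 1.0]) @ W       # target subspace coordinates
    z_perp = z - ((z - z_mean) @ U) @ U.T             # remove current f0 component
    return z_perp + new_coords @ U.T

# Example: move an arbitrary latent frame to 220 Hz before decoding it.
z_frame = rng.normal(size=latent_dim)
z_edited = set_f0(z_frame, target_f0=220.0)
print(z_edited.shape)                                  # (16,)

Because the basis U is orthonormal, the edit changes only the projection of the latent code onto the identified subspace; if the formant subspaces are orthogonal to it, as the paper shows, they are left unaffected, which is what enables independent control of the source and filter factors.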

Details

Database :
arXiv
Journal :
Speech Communication, vol. 148, 2023
Publication Type :
Report
Accession number :
edsarx.2204.07075
Document Type :
Working Paper
Full Text :
https://doi.org/10.1016/j.specom.2023.02.005