1. Meta Learning Text-to-Speech Synthesis in over 7000 Languages
- Author
- Lux, Florian; Meyer, Sarina; Behringer, Lyonel; Zalkow, Frank; Do, Phat; Coler, Matt; Habets, Emanuël A. P.; and Vu, Ngoc Thang
- Subjects
Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
In this work, we take on the challenging task of building a single text-to-speech synthesis system that is capable of generating speech in over 7000 languages, many of which lack sufficient data for traditional TTS development. By leveraging a novel integration of massively multilingual pretraining and meta learning to approximate language representations, our approach enables zero-shot speech synthesis in languages without any available data. We validate our system's performance through objective measures and human evaluation across a diverse linguistic landscape. By releasing our code and models publicly, we aim to empower communities with limited linguistic resources and foster further innovation in the field of speech technology.
Comment: Accepted at Interspeech 2024
- Published
- 2024