1. Speech Technology for Unwritten Languages
- Author
Mark Hasegawa-Johnson, Lucas Ondel, Elin Larsen, Shruti Palaskar, Liming Wang, Sebastian Stüker, Francesco Ciannella, Markus Müller, Odette Scharenborg, Rachid Riad, Florian Metze, Pierre Godard, Laurent Besacier, Mingxing Du, Alan W. Black, Danny Merkx, Emmanuel Dupoux, Philip Arthur, and Graham Neubig
- Affiliations
Delft University of Technology (TU Delft); Groupe d'Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole (GETALP), Laboratoire d'Informatique de Grenoble (LIG), Université Grenoble Alpes (UGA), CNRS, Grenoble INP; Carnegie Mellon University (CMU); Department of Computer Science, University of Illinois; Institute for Anthropomatics, Karlsruhe Institute of Technology (KIT); Traitement du Langage Parlé (TLP), LIMSI, CNRS, Université Paris-Saclay; Brno University of Technology (BUT); Johns Hopkins University (JHU); Laboratoire de sciences cognitives et psycholinguistique (LSCP) and Apprentissage machine et développement cognitif (CoML), Département d'Études Cognitives, ENS Paris, EHESS, CNRS, PSL, Inria Paris; Radboud University, Nijmegen
- Funding
The work reported here was started at JSALT 2017 at CMU, Pittsburgh, and was supported by JHU and CMU via grants from Google, Microsoft, Amazon, Facebook, and Apple. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by NSF grant OCI-1053575; specifically, it used the Bridges system, supported by NSF award ACI-1445606, at the Pittsburgh Supercomputing Center (PSC). OS was partially supported by a Vidi grant from NWO (276-89-003) and partially by a Delft Technology Fellowship from Delft University of Technology. PG, MM, and SS were funded by the French ANR and the German DFG under grant ANR-14-CE35-0002 (BULB project). MD, EL, RR, and ED were funded by the European Research Council (ERC-2011-AdG-295810 BOOTPHON), ANR-10-LABX-0087 IEC, and ANR-10-IDEX-0001-02 PSL, with additional support from ANR-19-P3IA-0001 (PRAIRIE), ANR-17-EURE-0017 (FrontCog), ANR-10-IDEX-0001 (PSL), and ANR-19-P3IA-0003 (MIAI @ Grenoble Alpes).
- Subjects
Acoustics and Ultrasonics, Computer Science (miscellaneous), Computational Mathematics, Electrical and Electronic Engineering, Computer science, Speech technology, Speech processing, Speech synthesis, Semantics, Linguistics, Language & Communication, Language & Speech Technology, automatic speech recognition, unsupervised learning, image retrieval, Task analysis, Utterance, Everyday life, Meaning (linguistics), [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]
- Abstract
Speech technology plays an important role in our everyday life. Speech is used, among other applications, for human-computer interaction, for instance in information retrieval and online shopping. For an unwritten language, however, speech technology is difficult to build, because it cannot be assembled from the standard combination of pre-trained speech-to-text and text-to-speech subsystems. The research presented in this paper takes first steps towards speech technology for unwritten languages. Specifically, the aims of this work were 1) to learn speech-to-meaning representations without using text as an intermediate representation, and 2) to test whether the learned representations suffice to regenerate speech, to generate translated text, or to retrieve images that depict the meaning of an utterance in an unwritten language. The results suggest that building systems that go directly from speech to meaning and from meaning to speech, bypassing the need for text, is possible.
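To make the speech-to-meaning idea concrete, the sketch below illustrates one of the evaluation settings mentioned in the abstract: an utterance and the image it describes are mapped into a shared embedding space, with no text anywhere in the pipeline, so that images can be retrieved from speech alone. This is a minimal illustrative PyTorch sketch, not the authors' actual system; the encoder architectures, the feature dimensions (`feat_dim`, `img_dim`, `embed_dim`), and the triplet margin are assumptions made for the example.

```python
# Minimal sketch of text-free speech-image retrieval: a speech encoder and an
# image-feature projection are trained with a triplet ranking loss so that an
# utterance and its matching image land close together in a shared space.
# All layer sizes, names, and the margin are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeechEncoder(nn.Module):
    """Maps a sequence of acoustic features (e.g. MFCCs) to one embedding."""
    def __init__(self, feat_dim=39, hidden=256, embed_dim=512):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, embed_dim)

    def forward(self, feats):                 # feats: (batch, time, feat_dim)
        out, _ = self.rnn(feats)
        pooled = out.mean(dim=1)              # average over time
        return F.normalize(self.proj(pooled), dim=-1)

class ImageEncoder(nn.Module):
    """Projects precomputed image features (e.g. CNN outputs) into the same space."""
    def __init__(self, img_dim=2048, embed_dim=512):
        super().__init__()
        self.proj = nn.Linear(img_dim, embed_dim)

    def forward(self, img_feats):             # img_feats: (batch, img_dim)
        return F.normalize(self.proj(img_feats), dim=-1)

def triplet_ranking_loss(speech_emb, image_emb, margin=0.2):
    """Pull matching speech/image pairs together, push mismatched pairs apart."""
    scores = speech_emb @ image_emb.t()       # (batch, batch) cosine similarities
    pos = scores.diag().unsqueeze(1)          # similarity of the correct pair
    cost_s = (margin + scores - pos).clamp(min=0)      # wrong image per utterance
    cost_i = (margin + scores - pos.t()).clamp(min=0)  # wrong utterance per image
    mask = torch.eye(scores.size(0), dtype=torch.bool)
    cost_s[mask] = 0
    cost_i[mask] = 0
    return cost_s.mean() + cost_i.mean()
```

At retrieval time, every image in a collection would be embedded once with the image encoder, a spoken query would be embedded with the speech encoder, and images would be ranked by cosine similarity to the utterance embedding; no transcript of the unwritten language is needed at any step.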
- Published
- 2020