1. STOP: A Dataset for Spoken Task Oriented Semantic Parsing
- Authors
- Tomasello, Paden; Shrivastava, Akshat; Lazar, Daniel; Hsu, Po-Chun; Le, Duc; Sagar, Adithya; Elkahky, Ali; Copet, Jade; Hsu, Wei-Ning; Adi, Yossi; Algayres, Robin; Nguyen, Tu Anh; Dupoux, Emmanuel; Zettlemoyer, Luke; Mohamed, Abdelrahman
- Affiliations
- Meta AI; Laboratoire de sciences cognitives et psycholinguistique (LSCP) and Apprentissage machine et développement cognitif (CoML), Département d'Études Cognitives, École normale supérieure - Paris (ENS-PSL), Université Paris sciences et lettres (PSL), École des hautes études en sciences sociales (EHESS), Centre National de la Recherche Scientifique (CNRS); Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria); University of Tampere [Finland]; Dalhousie University [Halifax]; Department of Computer Science & Engineering (CSE), University of Washington [Seattle]
- Subjects
FOS: Computer and information sciences; FOS: Electrical engineering, electronic engineering, information engineering; Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS); [SCCO.LING] Cognitive science/Linguistics; Spoken language understanding; Domain adaptation; Assistant
- Abstract
End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model. It promises to improve the performance of assistant systems by leveraging acoustic information that is lost in the intermediate textual representation and by preventing cascading errors from Automatic Speech Recognition (ASR). Further, having one unified model has efficiency advantages when deploying assistant systems on-device. However, the limited number of public audio datasets with semantic parse labels hinders research progress in this area. In this paper, we release the Spoken Task-Oriented semantic Parsing (STOP) dataset, the largest and most complex publicly available SLU dataset. Additionally, we define low-resource splits to establish a benchmark for improving SLU when limited labeled data is available. Furthermore, in addition to the human-recorded audio, we release a TTS-generated version to benchmark low-resource domain adaptation of end-to-end SLU systems. Initial experiments show end-to-end SLU models performing slightly worse than their cascaded counterparts, which we hope encourages future work in this direction.
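For readers unfamiliar with the task, the sketch below contrasts the two pipelines discussed in the abstract: a cascaded system (ASR followed by a text-based semantic parser, where recognition errors propagate into the parse) and an end-to-end model that maps audio directly to a TOP-style bracketed parse of the kind used by STOP/TOPv2. This is a minimal illustration only; the class and function names (`ASRModel`, `TextParser`, `E2EParser`, `cascaded_slu`, `end_to_end_slu`) are hypothetical placeholders, not an API released with the dataset.

```python
# Minimal sketch of cascaded vs. end-to-end SLU. All interfaces below are
# hypothetical placeholders for illustration, not code shipped with STOP.

from typing import Protocol


class ASRModel(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class TextParser(Protocol):
    def parse(self, text: str) -> str: ...


class E2EParser(Protocol):
    def parse_audio(self, audio: bytes) -> str: ...


def cascaded_slu(audio: bytes, asr: ASRModel, parser: TextParser) -> str:
    """Two-stage pipeline: ASR errors propagate into the downstream parser."""
    transcript = asr.transcribe(audio)  # e.g. "set an alarm for 7 am"
    # TOP-style parse, e.g. "[IN:CREATE_ALARM [SL:DATE_TIME for 7 am ] ]"
    return parser.parse(transcript)


def end_to_end_slu(audio: bytes, model: E2EParser) -> str:
    """Single model maps audio straight to the parse; no intermediate text."""
    return model.parse_audio(audio)
```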
- Published
- 2023