Speech-Language Pre-Training for End-to-End Spoken Language Understanding
- Source :
- ICASSP
- Publication Year :
- 2021
- Publisher :
- IEEE, 2021.
Abstract
- End-to-end (E2E) spoken language understanding (SLU) can infer semantics directly from the speech signal without cascading an automatic speech recognizer (ASR) with a natural language understanding (NLU) module. However, paired utterance recordings and corresponding semantics may not always be available or sufficient to train an E2E SLU model in a real production environment. In this paper, we propose to unify a well-optimized E2E ASR encoder (speech) and a pre-trained language model encoder (language) into a transformer decoder. The unified speech-language pre-trained model (SLP) is continually enhanced on limited labeled data from a target domain by using a conditional masked language model (MLM) objective, and can thus effectively generate a sequence of intent, slot type, and slot value for a given input utterance at inference time. The experimental results on two public corpora show that our approach to E2E SLU is superior to the conventional cascaded method. It also outperforms the present state-of-the-art approaches to E2E SLU while using much less paired data.
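As a rough illustration of the output format and training objective the abstract mentions, the following Python sketch flattens an intent and its slots into one target sequence and then masks part of it, as a masked language model objective would. The serialization order, the `[MASK]` token, the 50% mask ratio, and the example intent and slot names are all assumptions made for illustration; the record does not specify the paper's actual scheme, and the speech conditioning is omitted entirely.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, not taken from the paper


def linearize(intent, slots):
    """Flatten an intent and (slot_type, slot_value) pairs into one token list."""
    tokens = [intent]
    for slot_type, slot_value in slots:
        tokens.append(slot_type)
        tokens.extend(slot_value.split())
    return tokens


def mask_targets(tokens, mask_ratio, rng):
    """Mask a random subset of target tokens.

    Returns the masked input sequence and a label list that holds the
    original token at masked positions and None elsewhere.
    """
    n_mask = max(1, round(len(tokens) * mask_ratio))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    inputs = [MASK if i in positions else t for i, t in enumerate(tokens)]
    labels = [t if i in positions else None for i, t in enumerate(tokens)]
    return inputs, labels


# Hypothetical example frame: intent "set_alarm" with two slots.
target = linearize("set_alarm", [("time", "seven am"), ("day", "monday")])
inputs, labels = mask_targets(target, mask_ratio=0.5, rng=random.Random(0))
```

In a conditional setup, a decoder would predict the masked tokens from both the unmasked tokens and the encoded speech; only the target-side masking is sketched here.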
- Subjects :
- FOS: Computer and information sciences
Sound (cs.SD)
Computer Science - Computation and Language
Computer science
Semantics (computer science)
Speech recognition
Natural language understanding
Computer Science - Sound
Speech enhancement
Audio and Speech Processing (eess.AS)
FOS: Electrical engineering, electronic engineering, information engineering
Language model
Computation and Language (cs.CL)
Encoder
Utterance
Natural language
Electrical Engineering and Systems Science - Audio and Speech Processing
Spoken language
Details
- Database :
- OpenAIRE
- Journal :
- ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Accession number :
- edsair.doi.dedup.....8a025624a0899c5688218366a3439a07