The Pipeline System of ASR and NLU with MLM-based Data Augmentation toward STOP Low-resource Challenge
- Publication Year :
- 2023
- Publisher :
- arXiv, 2023.
Abstract
- This paper describes our system for the low-resource domain adaptation track (Track 3) of the Spoken Language Understanding Grand Challenge, which is part of the ICASSP Signal Processing Grand Challenge 2023. For this track, we adopt a pipeline approach of ASR and NLU. For ASR, we fine-tune Whisper for each domain with upsampling. For NLU, we fine-tune BART on all the Track 3 data and then on the low-resource domain data. We apply masked LM (MLM)-based data augmentation, where some of the input tokens and their corresponding target labels are replaced using an MLM. We also apply a retrieval-based approach, where the model input is augmented with similar training samples. As a result, we achieved exact match (EM) accuracy of 63.3/75.0 (average: 69.15) for the reminder/weather domains and won first place in the challenge.
Comment: To appear at ICASSP2023
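The retrieval-based step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the retriever, the similarity measure, or the input format, so everything below is an assumption made for illustration: the token-overlap (Jaccard) similarity, the function names `jaccard` and `augment_input`, the ` | ` separator, and the toy utterance/parse pairs are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two utterances.

    Assumed stand-in metric; the paper's actual retriever is not described here.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def augment_input(
    query: str,
    train: List[Tuple[str, str]],
    k: int = 2,
    sep: str = " | ",
) -> str:
    """Append the k most similar (utterance, parse) training pairs to the query.

    The concatenated string would then be fed to the NLU model as its input.
    """
    ranked = sorted(train, key=lambda ex: jaccard(query, ex[0]), reverse=True)
    retrieved = [f"{utt} => {parse}" for utt, parse in ranked[:k]]
    return sep.join([query] + retrieved)


# Hypothetical toy training pairs (not from the challenge data).
train = [
    ("remind me to call mom", "[IN:CREATE_REMINDER [SL:TODO call mom ] ]"),
    ("what is the weather today", "[IN:GET_WEATHER [SL:DATE_TIME today ] ]"),
    ("remind me to buy milk", "[IN:CREATE_REMINDER [SL:TODO buy milk ] ]"),
]

augmented = augment_input("remind me to call dad", train, k=1)
print(augmented)
```

In this toy setup the query shares the most tokens with the first reminder example, so that pair is the one appended. A real system would more likely use a learned or BM25-style retriever, which is a design choice this sketch does not attempt to reproduce.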
- Subjects :
- FOS: Computer and information sciences
Sound (cs.SD)
Computer Science - Computation and Language
Audio and Speech Processing (eess.AS)
FOS: Electrical engineering, electronic engineering, information engineering
Computation and Language (cs.CL)
Computer Science - Sound
Electrical Engineering and Systems Science - Audio and Speech Processing
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....eb97ff0cb15d3a3b092b1dcadf724e06
- Full Text :
- https://doi.org/10.48550/arxiv.2305.01194