1. Low-Resource Language Discrimination toward Chinese Dialects with Transfer Learning and Data Augmentation
- Author
- Yan Keyu, Ma Yong, Dan Yangjie, Xu Fan, and Wang Mingwen
- Subjects
- General Computer Science, Computer science, Low resource, Task (project management), Annotation, Resource (project management), Artificial intelligence, Transfer of learning, Language discrimination, Natural language processing
- Abstract
- Chinese dialect discrimination is a challenging natural language processing task because annotated resources are scarce. In this article, we develop a novel Chinese dialect discrimination framework with transfer learning and data augmentation (CDDTLDA) to overcome this shortage. Specifically, we first train a source-side automatic speech recognition (ASR) model on a relatively large Chinese dialect corpus. We then apply a simple but effective data augmentation method (speed, pitch, and noise perturbation) to the target-side low-resource Chinese dialect data and fine-tune a separate target-side ASR model based on the source-side ASR model. Meanwhile, a self-attention mechanism captures the potential common semantic features shared by the source-side and target-side ASR models. Finally, we extract the hidden semantic representation from the target-side ASR model to perform Chinese dialect discrimination. Extensive experimental results demonstrate that our model significantly outperforms state-of-the-art methods on two benchmark Chinese dialect corpora.
- Published
- 2021