A Semi-Supervised Complementary Joint Training Approach for Low-Resource Speech Recognition
- Source :
- IEEE/ACM Transactions on Audio, Speech, and Language Processing; 2023, Vol. 31, Issue 1, pp. 3908-3921, 14 p.
- Publication Year :
- 2023
Abstract
- Both unpaired speech and unpaired text have been shown to be beneficial for low-resource automatic speech recognition (ASR); however, in the literature they have been used either separately, for pre-training, self-training, and language model (LM) training, or jointly, for designing hybrid models. In this work, we leverage both unpaired speech and text to train a general ASR model, using them in the form of data pairs by generating the missing parts prior to model training. We propose to train a model alternately on the prepared speech-PseudoLabel and SynthesizedAudio-text pairs, and reveal their complementary property in both acoustic and linguistic features. The proposed method is thus called complementary joint training (CJT). Building on basic CJT, we then propose label masking for pseudo-labels and parallel layers for synthesized audio in a re-training stage, to further cope with deviations from real data; this variant is termed CJT++. In addition, the proposed CJT is extended to the scenario with zero paired data by using an iterative CJT procedure to train the seed ASR model. Experimental results on Libri-light show the efficacy of joint training as well as of the two second-round training strategies, and validate the superiority over recent models, particularly in extremely low-resource cases.
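- The core of the abstract's alternating scheme can be illustrated with a minimal scheduling sketch. This is an assumption-laden illustration, not the authors' code: the function name `cjt_schedule`, the pair representations, and the strict even/odd alternation are all hypothetical simplifications of the alternating training described above.

```python
# Hypothetical sketch of the alternating batch schedule in complementary
# joint training (CJT): updates alternate between real-speech/pseudo-label
# pairs and synthesized-audio/real-text pairs. Names and structure are
# illustrative assumptions, not the paper's implementation.

def cjt_schedule(speech_pseudo_pairs, synth_text_pairs, num_steps):
    """Build an alternating training schedule over the two pair types.

    speech_pseudo_pairs: batches of (real speech, generated pseudo-label)
    synth_text_pairs:    batches of (synthesized audio, real text)
    Returns a list of (pair_type, batch) tuples, cycling through each pool.
    """
    schedule = []
    for step in range(num_steps):
        if step % 2 == 0:
            # Even steps train on real speech with pseudo-labels.
            batch = speech_pseudo_pairs[(step // 2) % len(speech_pseudo_pairs)]
            schedule.append(("speech+pseudo_label", batch))
        else:
            # Odd steps train on synthesized audio with real text.
            batch = synth_text_pairs[(step // 2) % len(synth_text_pairs)]
            schedule.append(("synth_audio+text", batch))
    return schedule
```

- In an actual training loop, each scheduled batch would drive one gradient update of the shared ASR model, so the acoustic realism of real speech and the linguistic fidelity of real text complement each other across steps.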
Details
- Language :
- English
- ISSN :
- 2329-9290
- Volume :
- 31
- Issue :
- 1
- Database :
- Supplemental Index
- Journal :
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- Publication Type :
- Periodical
- Accession number :
- ejs64350241
- Full Text :
- https://doi.org/10.1109/TASLP.2023.3313434