
A Semi-Supervised Complementary Joint Training Approach for Low-Resource Speech Recognition

Authors :
Du, Ye-Qian
Zhang, Jie
Fang, Xin
Wu, Ming-Hui
Yang, Zhou-Wang
Source :
IEEE/ACM Transactions on Audio, Speech, and Language Processing; 2023, Vol. 31, Issue 1, pp. 3908-3921, 14p
Publication Year :
2023

Abstract

Both unpaired speech and unpaired text have been shown to benefit low-resource automatic speech recognition (ASR); however, in the literature they were either used separately for pre-training, self-training, and language model (LM) training, or used jointly to design hybrid models. In this work, we leverage both unpaired speech and text to train a general ASR model, using them in the form of data pairs by generating the missing parts prior to model training. We propose to train a model alternately on the prepared speech-PseudoLabel and SynthesizedAudio-text pairs and reveal the complementary property in both acoustic and linguistic features; the proposed method is thus called complementary joint training (CJT). Building on the basic CJT, label masking for pseudo-labels and parallel layers for synthesized audio are then proposed for re-training to further cope with deviations from real data, termed CJT++. In addition, the proposed CJT is extended to the zero-paired-data scenario by considering an iterative CJT for training the seed ASR model. Experimental results on Libri-light show the efficacy of the joint training as well as the two second-round training strategies, and validate the superiority over recent models, particularly in extremely low-resource cases.
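The core idea the abstract describes is alternating updates between the two prepared pair types. The following is a minimal illustrative sketch of such a batch-level alternating schedule; the function name, batch representation, and strict even/odd alternation are assumptions for illustration, not the paper's actual training loop or model.

```python
from itertools import cycle

def cjt_schedule(speech_pseudo_batches, synth_audio_batches, num_steps):
    """Sketch of complementary joint training (CJT) scheduling:
    alternate one update on (real speech, pseudo-label) pairs with one
    update on (synthesized audio, real text) pairs.

    Hypothetical helper; the paper's loss functions and model updates
    are not shown here.
    """
    sp = cycle(speech_pseudo_batches)   # real speech paired with pseudo-labels
    sa = cycle(synth_audio_batches)     # synthesized audio paired with real text
    schedule = []
    for step in range(num_steps):
        if step % 2 == 0:
            schedule.append(("speech+pseudo_label", next(sp)))
        else:
            schedule.append(("synth_audio+text", next(sa)))
    return schedule

# Example: two batches of each pair type, four alternating steps.
steps = cjt_schedule(["sp0", "sp1"], ["tx0", "tx1"], num_steps=4)
```

In an actual training loop, each scheduled batch would drive one gradient step on the shared ASR model, so acoustic and linguistic supervision interleave throughout training rather than being applied in separate stages.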

Details

Language :
English
ISSN :
2329-9290
Volume :
31
Issue :
1
Database :
Supplemental Index
Journal :
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Type :
Periodical
Accession number :
ejs64350241
Full Text :
https://doi.org/10.1109/TASLP.2023.3313434