Training Speech Recognition Model with Speech Synthesis and Text Discriminator.
- Source :
- Journal of Information Science & Engineering; Mar2024, Vol. 40 Issue 2, p359-373, 15p
- Publication Year :
- 2024
Abstract
- In this paper, we incrementally build neural-network-based automatic speech recognition (ASR) systems to improve performance. First, we add an adversarial text discriminator module that trains the speech recognition model to correct typos in its recognition results. Experiments show that this ASR system achieves a character error rate (CER) of 12.3% and a word error rate (WER) of 31.4%. Second, we insert a pre-trained speech synthesis (text-to-speech, TTS) module into the ASR model. When we exploit the pre-trained TTS in ASR training, the CER and WER are reduced from 12.6% and 31.7% to 10.8% and 24.4%, respectively, demonstrating that a pre-trained TTS can improve ASR. Finally, we include both the pre-trained TTS and the text discriminator in ASR training. The performance of this ASR system improves further, reaching a CER of 9.9% and a WER of 22.7%. On the Formosa Speech Recognition Challenge task using Taibun Hàn-jī transcription, the proposed method also achieves a better CER than a system based on a hybrid DNN-HMM chain model. [ABSTRACT FROM AUTHOR]
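The abstract reports its results as CER and WER. As a reading aid, here is a minimal sketch of how these two metrics are conventionally computed: the Levenshtein edit distance between the reference and the hypothesis, divided by the reference length, counted over characters for CER and over whitespace-separated words for WER. The function names (`edit_distance`, `error_rate`, `cer`, `wer`) are hypothetical, and ignoring spaces when computing CER is an assumption. The record does not say how the authors tokenized or scored their transcripts.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and deletions
    needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference tokens to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def cer(ref, hyp):
    """Character error rate; spaces are ignored (an assumption of this sketch)."""
    return error_rate(list(ref.replace(" ", "")), list(hyp.replace(" ", "")))


def wer(ref, hyp):
    """Word error rate over whitespace-separated tokens."""
    return error_rate(ref.split(), hyp.split())
```

With this definition, a single substituted character in a three-word reference counts as one word error out of three but only one character error out of all reference characters, which is why a system's WER is typically higher than its CER, as in the figures above.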
- Subjects :
- AUTOMATIC speech recognition
SPEECH perception
SPEECH synthesis
ERROR rates
Details
- Language :
- English
- ISSN :
- 10162364
- Volume :
- 40
- Issue :
- 2
- Database :
- Supplemental Index
- Journal :
- Journal of Information Science & Engineering
- Publication Type :
- Academic Journal
- Accession number :
- 176158380
- Full Text :
- https://doi.org/10.6688/JISE.202403_40(2).0010