51. Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition
- Author
Zhang, Shiliang, Lei, Ming, and Yan, Zhijie
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Neural and Evolutionary Computing, Computer Science - Sound
- Abstract
Connectionist Temporal Classification (CTC) based end-to-end speech recognition systems usually need to incorporate an external language model through WFST-based decoding to achieve promising results. This is even more essential for Mandarin speech recognition, where homophones cause many substitution errors. The linguistic information introduced by the language model helps to resolve these substitution errors. In this work, we propose a transformer-based spelling correction model that automatically corrects errors, especially the substitution errors, made by a CTC-based Mandarin speech recognition system. Specifically, we investigate using the recognition results generated by CTC-based systems as input and the ground-truth transcriptions as output to train a transformer with an encoder-decoder architecture, much as in machine translation. Results on a 20,000-hour Mandarin speech recognition task show that the proposed spelling correction model achieves a CER of 3.41%, a relative improvement of 22.9% and 53.2% over the baseline CTC-based systems decoded with and without a language model, respectively.
Comment: 6 pages, 5 figures
- Published
- 2019
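The abstract reports its results as character error rate (CER) and relative improvement. As a reader's aid, the following is a minimal sketch of how these two quantities are conventionally computed. It is not taken from the paper: the function names `edit_distance`, `cer`, and `relative_improvement` are hypothetical, and the example sentence pair is invented to illustrate a homophone substitution.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (here: strings of characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_improvement(baseline_cer, new_cer):
    """Fraction of the baseline error removed by the new system."""
    return (baseline_cer - new_cer) / baseline_cer


# Invented example (an assumption, not data from the paper):
# the hypothesis replaces one character with a homophone.
reference = "今天天气很好"
hypothesis = "今天天汽很好"
print(cer(reference, hypothesis))
```

Under this definition, a relative improvement of 22.9% means the corrected system removes 22.9% of the baseline system's character errors, not that the CER drops by 22.9 percentage points.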