Contextual Biasing for End-to-End Chinese ASR

Authors :: Kai Zhang
Qiuxia Zhang
Chung-Che Wang
Jyh-Shing Roger Jang
Source :: IEEE Access, Vol 12, Pp 92960-92975 (2024)
Publication Year :: 2024
Publisher :: IEEE, 2024.
Abstract :: End-to-end speech recognition is more robust than conventional methods and improves recognition accuracy across diverse contexts. However, because it lacks an independent language model, it struggles to recognize vocabulary absent from the training data, which degrades the recognition of domain-specific terms. Adapting to different scenarios therefore requires steering recognition toward specific domains. Based on the CATSLU dataset, this study constructs two Chinese contextual biasing tasks, one targeting proper nouns and one targeting mixed-domain sentences. It also explores four contextual biasing methods applied at different stages of the speech recognition process: pre-recognition, within the model, decoding, and post-processing. Experimental results indicate that all four biasing methods improved, to some extent, the model's recognition performance within specific domains.
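
As a rough illustration of what biasing after decoding can look like, the sketch below rescores an n-best list by rewarding hypotheses that contain a phrase from a bias list. It is a minimal sketch under stated assumptions, not the method used in the paper: the function name `rescore_nbest`, the fixed `bonus` value, the log-probability-style scores, and the example sentences are all hypothetical.

```python
from typing import List, Tuple


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    bias_phrases: List[str],
    bonus: float = 2.0,  # hypothetical weight; would be tuned in practice
) -> List[Tuple[str, float]]:
    """Add a fixed bonus to a hypothesis score for each bias phrase it contains,
    then return the hypotheses sorted from best to worst."""
    rescored = []
    for text, score in nbest:
        # Count how many distinct bias phrases occur in this hypothesis.
        hits = sum(1 for phrase in bias_phrases if phrase in text)
        rescored.append((text, score + bonus * hits))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Hypothetical n-best list: (hypothesis text, log-probability-like score).
nbest = [("导航到中山北路", -4.1), ("导航到中山公园", -4.6)]

# With the bias phrase present, the second hypothesis overtakes the first.
best_text = rescore_nbest(nbest, ["中山公园"])[0][0]
```

A fixed per-phrase bonus is the simplest possible choice; it leaves the acoustic model untouched, which is why this kind of biasing is cheap to apply but cannot recover a term that never appears in any hypothesis.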