1. End-to-end speech recognition method based on Conformer.
- Author
胡从刚, 申艺翔, 孙永奇, and 赵思聪
- Abstract
The acoustic input network of the Conformer encoder extracts insufficient speech information from FBank features and lacks channel feature information. To address these problems, this paper proposes an end-to-end speech recognition method based on RepVGG-SE-Conformer. First, the proposed model uses the multi-branch structure of RepVGG to strengthen speech information extraction, and uses structural re-parameterization to fuse the multiple branches into a single branch, which reduces computational complexity and speeds up inference. Second, a channel attention mechanism based on the squeeze-and-excitation network compensates for the missing channel feature information and improves recognition accuracy. Finally, experiments on the public Aishell-1 dataset show that the proposed method reduces the character error rate by 10.67% compared with Conformer, which supports the advantage of the method. In addition, the proposed RepVGG-SE acoustic input network generalizes well in end-to-end settings and can effectively improve the overall performance of speech recognition models based on Transformer variants. [ABSTRACT FROM AUTHOR]
- Published
- 2024
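- Illustrative sketch
The structural re-parameterization mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single channel, stride 1, zero padding, and no batch normalization, and every name in it (`conv2d_same`, `multi_branch`, `fuse_kernels`) is hypothetical. It only shows why a 3x3 branch, a 1x1 branch, and an identity branch can be folded into one 3x3 kernel that gives the same output.

```python
# Sketch only: single channel, stride 1, zero padding, no batch normalization.
# All function names are hypothetical and not taken from the paper.

def conv2d_same(x, kernel):
    """Cross-correlate 2D list `x` with a 3x3 `kernel`, zero-padded to keep the size."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    ii, jj = i + di - 1, j + dj - 1
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[di][dj] * x[ii][jj]
            out[i][j] = acc
    return out

def multi_branch(x, k3, k1):
    """Training-time form: 3x3 branch + 1x1 branch (scalar k1) + identity branch."""
    y3 = conv2d_same(x, k3)
    return [[y3[i][j] + k1 * x[i][j] + x[i][j] for j in range(len(x[0]))]
            for i in range(len(x))]

def fuse_kernels(k3, k1):
    """Inference-time form: fold the 1x1 and identity branches into the 3x3 center tap."""
    fused = [row[:] for row in k3]  # copy so the original kernel is not modified
    fused[1][1] += k1 + 1.0         # 1x1 weight plus identity (weight 1) at the center
    return fused

x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k3 = [[0.5, -1.0, 0.25], [0.0, 2.0, -0.5], [1.0, 0.75, -0.25]]
k1 = 0.5

y_branches = multi_branch(x, k3, k1)
y_fused = conv2d_same(x, fuse_kernels(k3, k1))

# Largest element-wise difference between the two forms.
max_diff = max(abs(a - b)
               for row_a, row_b in zip(y_branches, y_fused)
               for a, b in zip(row_a, row_b))
```

Because convolution is linear, the three branches sum into one kernel, so the fused form needs a single convolution at inference time. A full RepVGG block would also fold each branch's batch-normalization statistics into the kernel and bias, which this sketch leaves out.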