Lightweight Chinese Speech Recognition Combined with Transformer (结合 Transformer 的轻量化中文语音识别)
- Author
- 沈逸文 and 孙俊
- Subjects
- *SPEECH perception, *MATRIX decomposition, *ERROR rates, *LOW-rank matrices, *SPEED
- Abstract
Recently, deep neural network models have become a hot research topic in the field of speech recognition. However, deep neural networks rely on a huge number of parameters and large computational overhead, and the excessively large model size also increases the difficulty of deploying them on edge devices. Aiming at these problems, this paper proposed a lightweight speech recognition model based on Transformer, LM-Transformer. First, the method used depthwise separable convolution to extract feature information. Second, it constructed two half-step feed-forward layers in the Macaron-Net style and introduced low-rank matrix factorization to realize model compression. Finally, it used a sparse attention mechanism to improve the training speed and decoding speed of the model. The model was tested on the Aishell-1 and aidatang_200zh datasets. The experimental results show that, compared with Open-Transformer, the word error rate and real-time factor of LM-Transformer decrease by 19.8% and 32.1%, respectively. [ABSTRACT FROM AUTHOR]
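The record contains no code, but the compression ideas named in the abstract are easy to illustrate. Below is a minimal PyTorch sketch of a Macaron-Net style block: two half-step feed-forward layers whose projections are factorized into low-rank matrix products, wrapped around self-attention and a depthwise separable convolution. The class and function names (LowRankLinear, half_step_ffn, MacaronBlock), the dimensions, and the rank are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn


class LowRankLinear(nn.Module):
    """Replaces a d_in x d_out weight with two factors (d_in x r)(r x d_out);
    when r << min(d_in, d_out) this cuts parameters and multiply-adds."""
    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)  # first low-rank factor
        self.up = nn.Linear(rank, d_out)               # second low-rank factor

    def forward(self, x):
        return self.up(self.down(x))


def half_step_ffn(d_model: int, d_ff: int, rank: int) -> nn.Sequential:
    # Position-wise feed-forward layer with both projections factorized.
    return nn.Sequential(
        LowRankLinear(d_model, d_ff, rank),
        nn.ReLU(),
        LowRankLinear(d_ff, d_model, rank),
    )


class MacaronBlock(nn.Module):
    """Macaron-Net layout: half-step FFN -> self-attention ->
    depthwise separable convolution -> half-step FFN, with residuals."""
    def __init__(self, d_model=256, n_heads=4, d_ff=1024, rank=64, kernel=15):
        super().__init__()
        self.ffn1 = half_step_ffn(d_model, d_ff, rank)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Depthwise separable convolution = per-channel (depthwise) conv
        # followed by a 1x1 (pointwise) conv that mixes channels.
        self.depthwise = nn.Conv1d(d_model, d_model, kernel,
                                   padding=kernel // 2, groups=d_model)
        self.pointwise = nn.Conv1d(d_model, d_model, 1)
        self.ffn2 = half_step_ffn(d_model, d_ff, rank)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                      # x: (batch, frames, d_model)
        x = x + 0.5 * self.ffn1(x)             # first half-step FFN
        attn_out, _ = self.attn(x, x, x)
        x = x + attn_out
        conv = self.pointwise(self.depthwise(x.transpose(1, 2)))
        x = x + conv.transpose(1, 2)
        x = x + 0.5 * self.ffn2(x)             # second half-step FFN
        return self.norm(x)


block = MacaronBlock()
feats = torch.randn(8, 100, 256)               # (batch, frames, features)
print(block(feats).shape)                      # torch.Size([8, 100, 256])
```

With the assumed sizes, one dense 256x1024 FFN projection holds 262,144 weights, while the rank-64 factorization holds 256x64 + 64x1024 = 81,920, roughly a 3x reduction per projection.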
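The abstract also credits a sparse attention mechanism for the speed gains but does not say which variant. A common and simple option is top-k sparsification of the score matrix, sketched below under that assumption; topk_sparse_attention and the topk parameter are hypothetical names, not the paper's.

```python
import torch


def topk_sparse_attention(q, k, v, topk: int = 32):
    """Scaled dot-product attention that keeps only the top-k scores per
    query before the softmax and masks the rest. One simple sparse-attention
    variant; the paper's exact mechanism is not specified in this record."""
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    keep = min(topk, scores.shape[-1])
    kth = scores.topk(keep, dim=-1).values[..., -1:]   # k-th largest per query
    scores = scores.masked_fill(scores < kth, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v


q = k = v = torch.randn(2, 100, 64)            # (batch, frames, head_dim)
out = topk_sparse_attention(q, k, v, topk=16)
print(out.shape)                               # torch.Size([2, 100, 64])
```

Note that this naive form still computes the full score matrix; the decoding speedup the paper reports would come from a sparsity pattern that a dedicated kernel can exploit to skip the masked positions.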
- Published
- 2023