Lightweight Chinese Speech Recognition Combined with Transformer (结合 Transformer 的轻量化中文语音识别)
- Author
- 沈逸文 and 孙俊
- Subjects
- *SPEECH perception, *MATRIX decomposition, *ERROR rates, *LOW-rank matrices, *SPEED
- Abstract
Recently, deep neural network models have become a hot research topic in the field of speech recognition. However, deep neural networks rely on a huge number of parameters and large computational overhead, and the excessively large model size also increases the difficulty of deploying them on edge devices. Aiming at these problems, this paper proposed a lightweight speech recognition model based on Transformer, LM-Transformer. First, the method used depthwise separable convolution to extract feature information. Second, it constructed two half-step feed-forward layers in the Macaron-Net style and introduced low-rank matrix factorization to realize model compression. Finally, it used a sparse attention mechanism to improve the training speed and decoding speed of the model. The model was tested on the Aishell-1 and aidatang_200zh datasets. The experimental results show that, compared with Open-Transformer, the word error rate and real-time factor of LM-Transformer decrease by 19.8% and 32.1%, respectively. [ABSTRACT FROM AUTHOR]
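The record contains no code, but the compression ideas named in the abstract are easy to illustrate. Below is a minimal PyTorch sketch of a Macaron-Net style block: two half-step feed-forward layers whose projections are factorized into low-rank matrix products, wrapped around self-attention and a depthwise separable convolution. The class and function names (LowRankLinear, half_step_ffn, MacaronBlock), the dimensions, and the rank are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn


class LowRankLinear(nn.Module):
    """Replaces a d_in x d_out weight with two factors (d_in x r)(r x d_out);
    when r << min(d_in, d_out) this cuts parameters and multiply-adds."""
    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)  # first low-rank factor
        self.up = nn.Linear(rank, d_out)               # second low-rank factor

    def forward(self, x):
        return self.up(self.down(x))


def half_step_ffn(d_model: int, d_ff: int, rank: int) -> nn.Sequential:
    # Position-wise feed-forward layer with both projections factorized.
    return nn.Sequential(
        LowRankLinear(d_model, d_ff, rank),
        nn.ReLU(),
        LowRankLinear(d_ff, d_model, rank),
    )


class MacaronBlock(nn.Module):
    """Macaron-Net layout: half-step FFN -> self-attention ->
    depthwise separable convolution -> half-step FFN, with residuals."""
    def __init__(self, d_model=256, n_heads=4, d_ff=1024, rank=64, kernel=15):
        super().__init__()
        self.ffn1 = half_step_ffn(d_model, d_ff, rank)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Depthwise separable convolution = per-channel (depthwise) conv
        # followed by a 1x1 (pointwise) conv that mixes channels.
        self.depthwise = nn.Conv1d(d_model, d_model, kernel,
                                   padding=kernel // 2, groups=d_model)
        self.pointwise = nn.Conv1d(d_model, d_model, 1)
        self.ffn2 = half_step_ffn(d_model, d_ff, rank)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                      # x: (batch, frames, d_model)
        x = x + 0.5 * self.ffn1(x)             # first half-step FFN
        attn_out, _ = self.attn(x, x, x)
        x = x + attn_out
        conv = self.pointwise(self.depthwise(x.transpose(1, 2)))
        x = x + conv.transpose(1, 2)
        x = x + 0.5 * self.ffn2(x)             # second half-step FFN
        return self.norm(x)


block = MacaronBlock()
feats = torch.randn(8, 100, 256)               # (batch, frames, features)
print(block(feats).shape)                      # torch.Size([8, 100, 256])
```

With the assumed sizes, one dense 256x1024 FFN projection holds 262,144 weights, while the rank-64 factorization holds 256x64 + 64x1024 = 81,920, roughly a 3x reduction per projection.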
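The abstract also credits a sparse attention mechanism for the speed gains but does not say which variant. A common and simple option is top-k sparsification of the score matrix, sketched below under that assumption; topk_sparse_attention and the topk parameter are hypothetical names, not the paper's.

```python
import torch


def topk_sparse_attention(q, k, v, topk: int = 32):
    """Scaled dot-product attention that keeps only the top-k scores per
    query before the softmax and masks the rest. One simple sparse-attention
    variant; the paper's exact mechanism is not specified in this record."""
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    keep = min(topk, scores.shape[-1])
    kth = scores.topk(keep, dim=-1).values[..., -1:]   # k-th largest per query
    scores = scores.masked_fill(scores < kth, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v


q = k = v = torch.randn(2, 100, 64)            # (batch, frames, head_dim)
out = topk_sparse_attention(q, k, v, topk=16)
print(out.shape)                               # torch.Size([2, 100, 64])
```

Note that this naive form still computes the full score matrix; the decoding speedup the paper reports would come from a sparsity pattern that a dedicated kernel can exploit to skip the masked positions.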
- Published
- 2023