Descriptor: "Speech recognition" / Language: chinese / Publication Year Range: Last 10 years / Search Limiters: Full Text - Searchworks@Jio Institute Digital Library Search Results

1. 基于 Transformer的多编码器端到端语音识别.

Author: 鹿江飞,孙占全
Abstract: local feature information at shallow layers. To solve this problem, this study proposes a method using multiple encoders to improve the ability of speech feature extraction. An additional convolutional encoder branch is added to strengthen the capture of local feature information, make up for the neglect of local feature information in shallow Transformer, and effectively realize the integration of global and local dependencies of andio feature se quences. In other words, a multi-encoder model hased on Transformer is proposed. Experiments on the open source Chinese Mandarin data set Aishell 1 show that without an external language model, the proposed Transformer-based multi-encoder model has a relative reduction of 4.00% in character error rate when compared with the Transformer model. On the internal non public Shanghainese dialect data set, the performance improve ment of the proposed model is more obvious, and the character error rate is reduced by 48.24% from 19.92% to 10. 31%. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

2. Research of Speech Recognition Algorithm Based on Time Delay Neural Network and Its Application in Rail Transit

Author: LIU Yue, LIN Jun, LUO Xiao, CHU Wei, and LIU Ren
Subjects: voice interaction, speech recognition, deep learning, time-delay neural network, rail transit, Control engineering systems. Automatic machinery (General), TJ212-225, Technology
Abstract: Speech recognition technology is a key component of the intelligent voice interaction system. In recent years, this technology has developed rapidly and is widely used in automotive electronics, consumer electronics, medical and other fields. This paper analyzes the development status of speech technology in the field of automotive field, introduces the development process of this technology in detail, focuses on the demand and application of speech recognition technology in the field of rail transit.Aiming at the problem that the display interactive content of different models of rail transit is changeable and needs to be customized and developed, the research on speech recognition technology based on time-delay neural network is carried out and a speech recognition system for train display based on intelligent computing platform is developed. The system can flexibly build models according to the interaction requirements of different display platforms and the processing performance of intelligent hardware platforms, and optimize the interaction keywords on the basis of the general model to improve the recognition rate. Finally, the test is completed in the voice interaction system of autonomous-rail rapid tram display in different scenes and noisy conditions and the word recognition rate reaches more than 85%. It also lays the foundation for the application of intelligent voice technology in other rail transit vehicles.
Published: 2022
Full Text: View/download PDF

3. Head Fusion:A Method to Improve Accuracy and Robustness of Speech Emotion Recognition

Author: XU Ming-ke, ZHANG Fan
Subjects: speech emotion recognition, attention mechanism, convolutional neural network, noisy speech, speech recognition, Computer software, QA76.75-76.765, Technology (General), T1-995
Abstract: Speech emotion recognition(SER) refers to the use of machines to recognize the emotions of a speaker from speech.SER is an important part of human-computer interaction(HCI).But there are still many problems in SER research,e.g.,the lack of high-quality data,insufficient model accuracy,little research under noisy environments.In this paper,we propose a method called Head Fusion based on the multi-head attention mechanism to improve the accuracy of SER.We implemente an attention-based convolutional neural network(ACNN) model and conduct experiments on the interactive emotional dyadic motion capture(IEMOCAP) data set.The accuracy is improved to 76.18% (weighted accuracy,WA) and 76.36% (unweighted accuracy,UA).To the best of our knowledge,compared with the state-of-the-art result on this dataset(76.4% of WA and 70.1% of WA),we achieve a UA improvement of about 6% absolute while achieving a similar WA.Furthermore,We conduct empirical experiments by injecting speech data with 50 types of common noises.We inject the noises by altering the noise intensity,time-shifting the noises,and mixing different noise types,to identify their varied impacts on the SER accuracy and verify the robustness of our model.This work will also help researchers and engineers properly add their training data by using speech data with the appropriate types of noises to alleviate the problem of insufficient high-quality data.
Published: 2022
Full Text: View/download PDF

4. Study on Keyword Search Framework Based on End-to-End Automatic Speech Recognition

Author: YANG Run-yan, CHENG Gao-feng, LIU Jian
Subjects: keyword search, speech recognition, end-to-end, frame-synchronous alignment, Computer software, QA76.75-76.765, Technology (General), T1-995
Abstract: In the past decade,end-to-end automatic speech recognition (ASR) frameworks have developed rapidly.End-to-end ASR has shown not only very different characteristics from traditional ASR based on hidden Markov models (HMMs),but also advanced performances.Thus,end-to-end ASR is being more and more popular and has become another major type of ASR frameworks.A keyword search (KWS) framework based on end-to-end ASR and frame-synchronous alignment is proposed for solving the problem that end-to-end ASR cannot provide accurate keyword timestamps and confidence scores,and experimental verification on a Vietnamese dataset is made.First,utterances are decoded by an end-to-end Uyghur ASR system,obtaining N-best hypotheses.Next,a dynamic programming-based alignment algorithm is implemented on each of these ASR hypotheses and per-frame phoneme probabilities,which are provided by a phoneme classifier jointly trained with the ASR model,to compute time stamps and confidence scores for each word in N-best hypotheses.Then,final KWS result is obtained by detecting keywords within N-best hypotheses and removing duplicated keyword occurrences according to time stamps and confident scores.Experimental results on a Vietnamese conversational telephone speech dataset show that the proposed KWS system achieves an F1 score of 77.6%,which is relatively 7.8% higher than the F1 score of the traditional HMM-based KWS system.The proposed system also provides reliable keyword confidence scores.
Published: 2022
Full Text: View/download PDF

5. Latest Development of Multilingual Speech Recognition Acoustic Model Modeling Methods

Author: CHENG Gao-feng, YAN Yong-hong
Subjects: multilingual, speech recognition, acoustic model, Computer software, QA76.75-76.765, Technology (General), T1-995
Abstract: With the rapid development of multimedia and communication technology,the amount of multilingual speech data on the Internet is increasing.Speech recognition technology is the core for media analysis and processing.How to quickly expand from a few major languages such as Chinese and English to more languages has become a prominent issue yet to be overcome in order to improve multilingual processing capabilities.This article summarizes the latest progress in the field of acoustic model modeling,and discusses breakthroughs needed by traditional speech recognition technology in the course of moving from single language to multi-languages.The latest end-to-end speech recognition technology was exploited to construct a keyword spotting system,and the system achieves favorable performance.The approach is detailed as follows:1)multi-lingual hierarchical and structured acoustic model modeling method;2)multilingual acoustic modeling based on language classification information;3)end-to-end keyword spotting based on frame-synchronous alignments.
Published: 2022
Full Text: View/download PDF

6. Survey of Speaker Adaptation Methods in Speech Recognition

Author: ZHU Fangyuan, MA Zhiqiang, CHEN Yan, ZHANG Xiaoxu, WANG Hongbin, BAO Caijilahu
Subjects: speech recognition, speaker adaptation (sa), neural network, Electronic computers. Computer science, QA75.5-76.95
Abstract: Speech is one of the ways of human-computer interaction, and speech recognition technology is an important part of artificial intelligence. In recent years, the application of neural network technology in the field of speech recognition has developed rapidly, and it has become the mainstream acoustic modeling technology in the field of speech recognition. However, there is a difference between target speaker's voice and training data in the test conditions, which leads to the problem of model incompatibility. Therefore, the speaker adaptation (SA) method is to solve the mismatch problem caused by the speaker difference, and the research on the speaker adaptation method has become a popular direction in the field of speech recognition. Compared with the speaker adaptation method in the traditional speech recognition system, the self-adaptation in the speech recognition system using neural network has the characteristics of huge model parameters and relatively small amount of data. Therefore, the speaker adaptation method in the neural network-based speech recognition system becomes a challenge. Firstly, this paper reviews the development history of the speaker adaptation method and the various problems encountered in the research of the neural network-based speaker adaptation method. Secondly, the speaker adaptation method is divided into the speaker adaptation method based on feature domain and the speaker adaptation method based on model domain. It also introduces the corresponding principles and improvement methods, and finally points out the pro-blems that still exist in the speaker adaptation method in speech recognition and the future development direction.
Published: 2021
Full Text: View/download PDF

7. Research Status and Prospect of Transformer in Speech Recognition

Author: ZHANG Xiaoxu, MA Zhiqiang, LIU Zhiqiang, ZHU Fangyuan, WANG Chunyu
Subjects: transformer, deep learning, end-to-end, speech recognition, Electronic computers. Computer science, QA75.5-76.95
Abstract: As a new deep learning algorithm framework, Transformer has attracted more and more researchers?? attention and has become a current research hotspot. Inspired by humans focusing on important things only, the self-attention mechanism in the Transformer model mainly learns important information in the input sequence. For speech recogni-tion tasks, the focus is to transcribe the information of the input speech sequence into the corresponding language text. The past practice was to combine acoustic models, pronunciation dictionaries, and language models into a speech recognition system to achieve speech recognition tasks, while Transformer can integrate them into a single neural network to form an end-to-end speech recognition system, which solves the issues such as forced alignment and multi-module training of the traditional speech recognition system. Therefore, it is very necessary to discuss the problems of Transformer in speech recognition tasks. In this paper, the structure of the Transformer model is first introduced. Besides, the problems confronted by speech recognition are analyzed with respect to input speech sequence, deep model architecture, and model inference. Then the methods to solve the obstacles within the three aspects afore mentioned are outlined and summarized. Finally, the future application and direction of Transformer in speech recognition are concluded and prospected.
Published: 2021
Full Text: View/download PDF

8. Transformer在语音识别任务中的研究现状与展望.

Author: 张晓旭, 马志强, 刘志强, 朱方圆, and 王春喻
Abstract: Copyright of Journal of Frontiers of Computer Science & Technology is the property of Beijing Journal of Computer Engineering & Applications Journal Co Ltd. and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2021
Full Text: View/download PDF

9. Domain word recognition enhancement method in speech recognition

Author: Yang Wei, Zhang Caijun, and Ma Yongbo
Subjects: speech recognition, HCLG, Electronics, TK7800-8360
Abstract: Aiming at the problem that the national network customer telephone voice recognition has poor recognition of core words in specific fields, this paper proposes a method based on HCLG domain weight enhancement and domain word correction, which can add domain words in real time and quickly, to dynamically optimize the language model and improve speech recognition. The model and algorithm are optimized in the various fields of the telephone voice consultation, maintenance, complaints, etc. of the State Grid Customer Service Center. The speech recognition results have been greatly improved.
Published: 2019
Full Text: View/download PDF

10. 人机交互安全攻防综述.

Author: 葛凯强 and 陈铁明
Abstract: Copyright of Telecommunications Science is the property of Beijing Xintong Media Co., Ltd. and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2019
Full Text: View/download PDF

11. 基于对抗训练策略的语言模型数据增强技术.

Author: 张一珂, 张鹏远, and 颜永红
Abstract: Copyright of Acta Automatica Sinica is the property of Chinese Academy of Sciences, Institute of Automation and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2018
Full Text: View/download PDF

12. 语音识别中卷积神经网络优化算法.

Author: 刘长征 and 张磊
Abstract: Copyright of Journal of Harbin University of Science & Technology is the property of Journal of Harbin University of Science & Technology and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2016
Full Text: View/download PDF

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

12 results on '"Speech recognition"'

1. 基于 Transformer的多编码器端到端语音识别.

2. Research of Speech Recognition Algorithm Based on Time Delay Neural Network and Its Application in Rail Transit

3. Head Fusion:A Method to Improve Accuracy and Robustness of Speech Emotion Recognition

4. Study on Keyword Search Framework Based on End-to-End Automatic Speech Recognition

5. Latest Development of Multilingual Speech Recognition Acoustic Model Modeling Methods

6. Survey of Speaker Adaptation Methods in Speech Recognition

7. Research Status and Prospect of Transformer in Speech Recognition

8. Transformer在语音识别任务中的研究现状与展望.

9. Domain word recognition enhancement method in speech recognition

10. 人机交互安全攻防综述.

11. 基于对抗训练策略的语言模型数据增强技术.

12. 语音识别中卷积神经网络优化算法.

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Publication Type

Journal

Database

Publisher

12 results on '"Speech recognition"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources