18 results for "Tan, Zheng-Hua"
Search Results
2. On Training Targets and Activation Functions for Deep Representation Learning in Text-Dependent Speaker Verification.
- Author
Sarkar, Achintya Kumar and Tan, Zheng-Hua
- Subjects
ARTIFICIAL neural networks, DEEP learning, DATABASES, ERROR rates, SUPERVISED learning, AUTOMATIC speech recognition
- Abstract
Deep representation learning has gained significant momentum in advancing text-dependent speaker verification (TD-SV) systems. When designing deep neural networks (DNN) for extracting bottleneck (BN) features, the key considerations include training targets, activation functions, and loss functions. In this paper, we systematically study the impact of these choices on the performance of TD-SV. For training targets, we consider speaker identity, time-contrastive learning (TCL), and auto-regressive prediction coding, with the first being supervised and the last two being self-supervised. Furthermore, we study a range of loss functions when speaker identity is used as the training target. With regard to activation functions, we study the widely used sigmoid function, rectified linear unit (ReLU), and Gaussian error linear unit (GELU). We experimentally show that GELU is able to reduce the error rates of TD-SV significantly compared to sigmoid, irrespective of the training target. Among the three training targets, TCL performs the best. Among the various loss functions, cross-entropy, joint-softmax, and focal loss functions outperform the others. Finally, the score-level fusion of different systems is also able to reduce the error rates. To evaluate the representation learning methods, experiments are conducted on the RedDots 2016 challenge database consisting of short utterances for TD-SV systems based on classic Gaussian mixture model-universal background model (GMM-UBM) and i-vector methods. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
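The activation functions compared in the abstract above can be written out directly. This is a minimal numpy sketch, not the authors' code; the tanh form of GELU is a standard approximation and not necessarily the exact variant used in the paper.

```python
import numpy as np

def sigmoid(x):
    # classic logistic activation
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # rectified linear unit: zero for negative inputs
    return np.maximum(0.0, x)

def gelu(x):
    # tanh approximation of the Gaussian error linear unit
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.linspace(-3.0, 3.0, 7)
print(np.round(gelu(x), 3))
```

Unlike ReLU, GELU is smooth and non-zero for small negative inputs, which is one common intuition for the error-rate gains the abstract reports over sigmoid.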
3. rVAD: An unsupervised segment-based robust voice activity detection method.
- Author
Tan, Zheng-Hua, Sarkar, Achintya kr., and Dehak, Najim
- Subjects
VOICE analysis software, VOICE frequency, AUTOMATIC speech recognition, ROBUST control, INTONATION (Phonetics), VOICEPRINTS
- Abstract
• Proposed an unsupervised segment-based method for robust voice activity detection.
• Proposed modified rVAD that uses computationally fast spectral flatness calculation.
• Evaluated rVAD in terms of VAD performance using RATS and Aurora-2 databases.
• Evaluated rVAD in terms of speaker verification performance using RedDots 2016.
• rVAD showed favorable performance on various difficult tasks over existing methods.
This paper presents an unsupervised segment-based method for robust voice activity detection (rVAD). The method consists of two passes of denoising followed by a voice activity detection (VAD) stage. In the first pass, high-energy segments in a speech signal are detected using an a posteriori signal-to-noise ratio (SNR) weighted energy difference; if no pitch is detected within a segment, the segment is considered a high-energy noise segment and set to zero. In the second pass, the speech signal is denoised by a speech enhancement method, for which several methods are explored. Next, neighbouring frames with pitch are grouped together to form pitch segments, and based on speech statistics, the pitch segments are further extended from both ends in order to include both voiced and unvoiced sounds and likely non-speech parts as well. Finally, the a posteriori SNR weighted energy difference is applied to the extended pitch segments of the denoised speech signal for detecting voice activity. We evaluate the VAD performance of the proposed method using two databases, RATS and Aurora-2, which contain a large variety of noise conditions. The rVAD method is further evaluated, in terms of speaker verification performance, on the RedDots 2016 challenge database and its noise-corrupted versions. Experimental results show that rVAD compares favourably with a number of existing methods. In addition, we present a modified version of rVAD in which computationally intensive pitch extraction is replaced by computationally efficient spectral flatness calculation. The modified version significantly reduces the computational complexity at the cost of moderately inferior VAD performance, which is an advantage when processing a large amount of data and running on low-resource devices. The source code of rVAD is made publicly available. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
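The a posteriori SNR weighted energy difference at the core of the rVAD abstract above can be sketched in a few lines. This is an assumed simplification for illustration: the noise-floor estimate (mean of the lowest-energy frames), the `noise_frac` parameter and the thresholding rule are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def snr_weighted_energy_difference(frame_energies, noise_frac=0.1):
    e = np.asarray(frame_energies, dtype=float)
    # crude noise-floor estimate from the lowest-energy frames (assumption)
    n_noise = max(1, int(len(e) * noise_frac))
    noise_energy = np.mean(np.sort(e)[:n_noise]) + 1e-12
    post_snr = e / noise_energy            # a posteriori SNR per frame
    diff = np.abs(np.diff(e))              # energy difference between neighbours
    return diff * np.sqrt(post_snr[1:])    # weight differences by frame SNR

def high_energy_segments(frame_energies, threshold):
    # True where a likely speech/high-energy onset or offset occurs
    return snr_weighted_energy_difference(frame_energies) > threshold

energies = [0.1, 0.1, 5.0, 6.0, 0.1, 0.1]
print(high_energy_segments(energies, threshold=1.0))
```

In the paper, segments flagged this way are kept only if pitch is detected within them; otherwise they are treated as high-energy noise and zeroed.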
4. Guided spectrogram filtering for speech dereverberation.
- Author
Zheng, Chengshi, Tan, Zheng-Hua, Peng, Renhua, and Li, Xiaodong
- Subjects
ACOUSTIC vibrations, AUDITORY pathways, SOUND reverberation, SPEECH processing systems, AUTOMATIC speech recognition
- Abstract
Guided filtering is a computationally efficient and powerful technique used in image processing applications such as edge-preserving smoothing, detail enhancement and single-image dehazing. In this paper, we propose a novel single-channel speech dereverberation method using guided spectrogram filtering, treating a speech spectrogram as an image. The proposed method requires neither room acoustic parameter estimation nor late reverberant spectral variance estimation. Objective test results show the validity of the guided spectrogram filtering method for speech dereverberation. Compared with state-of-the-art speech dereverberation methods, the proposed method performs better in terms of perceptual evaluation of speech quality (PESQ), speech-to-reverberation modulation energy ratio (SRMR) and short-time objective intelligibility (STOI) in most cases. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
5. A perceptually motivated LP residual estimator in noisy and reverberant environments.
- Author
Peng, Renhua, Tan, Zheng-Hua, Li, Xiaodong, and Zheng, Chengshi
- Subjects
AUTOMATIC speech recognition, ADDITIVE white Gaussian noise, SINGULAR value decomposition, PERFORMANCE evaluation, SIGNAL filtering
- Abstract
Both reverberation and additive noise can degrade the quality of recorded speech and thus should be suppressed simultaneously. Previous studies have shown that the generalized singular value decomposition (GSVD) is capable of suppressing additive noise effectively, but it is not often applied to speech dereverberation, since reverberation is considered to be convolutive as well as colored noise. Recently, we revealed that late reverberation is also an additive and relatively white interference component in the linear prediction (LP) residual domain. To suppress both late reverberation and additive noise, we have proposed an optimal filter for LP residual estimation (LPRE) based on a constrained minimum mean square error (CMMSE) criterion using GSVD in single-channel speech enhancement, referred to as CMMSE-GSVD-LPRE. Experimental results have shown better performance of CMMSE-GSVD-LPRE than spectral subtraction methods, but some residual noise and reverberation components remain audible and annoying. To solve this problem, this paper incorporates the masking properties of the human auditory system in the LP residual domain to further suppress these residual noise and reverberation components while reducing speech distortion at the same time. Various simulation experiments are conducted, and the results show an improved performance of the proposed algorithm. Experimental results with speech recorded in noisy and reverberant environments further confirm the effectiveness of the proposed algorithm in real-world environments. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
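The LP residual domain in which the abstract above operates is easy to construct. This is an assumed simplification: LP coefficients are fitted by least squares over the whole signal (rather than frame-wise Levinson-Durbin), and the order is illustrative.

```python
import numpy as np

def lp_residual(x, order=8):
    x = np.asarray(x, dtype=float)
    # delayed-sample matrix: predict x[n] from x[n-1] .. x[n-order]
    rows = [x[i:i + order][::-1] for i in range(len(x) - order)]
    A = np.array(rows)
    y = x[order:]
    a, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ a          # prediction-error (LP residual) signal
    return residual, a

t = np.arange(200)
x = np.sin(0.1 * np.pi * t)       # highly predictable signal
res, coeffs = lp_residual(x)
print(float(np.max(np.abs(res))))  # near zero for a pure sinusoid
```

A pure sinusoid is almost perfectly predictable, so its residual is essentially zero; the paper's point is that late reverberation survives this whitening as an additive, relatively white component that GSVD-based filtering can then attack.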
6. Incorporating pass-phrase dependent background models for text-dependent speaker verification.
- Author
Sarkar, Achintya Kumar and Tan, Zheng-Hua
- Subjects
PHRASE structure grammar, ORATORS, LIKELIHOOD ratio tests, HIDDEN Markov models, AUTOMATIC speech recognition
- Abstract
In this paper, we propose pass-phrase dependent background models (PBMs) for text-dependent (TD) speaker verification (SV) to integrate the pass-phrase identification process into the conventional TD-SV system, where a PBM is derived from a text-independent background model through adaptation using the utterances of a particular pass-phrase. During training, pass-phrase specific target speaker models are derived from the particular PBM using the training data for the respective target model. During testing, the best PBM is first selected for the test utterance in the maximum likelihood (ML) sense, and the selected PBM is then used for the log likelihood ratio (LLR) calculation with respect to the claimant model. The proposed method incorporates the pass-phrase identification step into the LLR calculation, which is not considered in conventional standalone TD-SV systems. The performance of the proposed method is compared to conventional text-independent background model based TD-SV systems using either the Gaussian mixture model (GMM)-universal background model (UBM), hidden Markov model (HMM)-UBM or i-vector paradigm. In addition, we consider two approaches to building PBMs: speaker-independent and speaker-dependent. We show that the proposed method significantly reduces the error rates of text-dependent speaker verification for the non-target types target-wrong and impostor-wrong, while maintaining TD-SV performance comparable to the conventional system when impostors speak a correct utterance. Experiments are conducted on the RedDots challenge and the RSR2015 databases, which consist of short utterances. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
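The scoring scheme in the abstract above, ML selection of a PBM followed by an LLR against the claimant model, can be sketched with toy models. Single diagonal Gaussians stand in for the paper's GMM/HMM models, and all data here is synthetic; none of the names below come from the paper.

```python
import numpy as np

def diag_gauss_loglik(X, mean, var):
    # total log-likelihood of the frames in X under a diagonal Gaussian
    return float(np.sum(-0.5 * (np.log(2.0 * np.pi * var) + (X - mean) ** 2 / var)))

def llr_with_pbm(X, target, pbms):
    # select the best PBM in the ML sense, then score against it
    best = max(pbms, key=lambda m: diag_gauss_loglik(X, m['mean'], m['var']))
    return diag_gauss_loglik(X, target['mean'], target['var']) - \
           diag_gauss_loglik(X, best['mean'], best['var'])

rng = np.random.default_rng(0)
pbm_a = {'mean': np.zeros(2), 'var': np.ones(2)}        # pass-phrase A background
pbm_b = {'mean': 5.0 * np.ones(2), 'var': np.ones(2)}   # pass-phrase B background
target = {'mean': np.ones(2), 'var': np.ones(2)}        # speaker model adapted from pbm_a
X = rng.normal(1.0, 1.0, size=(50, 2))                  # genuine-trial frames
score = llr_with_pbm(X, target, [pbm_a, pbm_b])
print(score > 0)                                        # genuine trial: positive LLR
```

Because the PBM is chosen per test utterance, the denominator of the LLR already encodes a pass-phrase decision, which is the integration the paper argues for.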
7. Speech Recognition on Mobile Devices
- Author
Tan, Zheng-Hua and Lindberg, Børge
- Subjects
automatic speech recognition, text entry, mobile device
- Abstract
The enthusiasm for deploying automatic speech recognition (ASR) on mobile devices is driven both by remarkable advances in ASR technology and by the demand for efficient user interfaces on such devices as mobile phones and personal digital assistants (PDAs). This chapter presents an overview of ASR in the mobile context, covering motivations, challenges, fundamental techniques and applications. Three ASR architectures are introduced: embedded speech recognition, distributed speech recognition and network speech recognition. Their pros and cons and implementation issues are discussed. Applications within command and control, text entry and search are presented, with an emphasis on mobile text entry.
- Published
- 2010
8. Variable Frame Rate Analysis for Automatic Speech Recognition
- Author
Tan, Zheng-Hua
- Subjects
Automatic speech recognition, Variable frame rate analysis
- Published
- 2007
9. Joint variable frame rate and length analysis for speech recognition under adverse conditions.
- Author
Tan, Zheng-Hua and Kraljevski, Ivan
- Subjects
AUTOMATIC speech recognition, SIGNAL-to-noise ratio, ROBUST control, DIGITAL signal processing, SPEECH processing systems
- Abstract
This paper presents a method that combines variable frame length and rate analysis for speech recognition in noisy environments, together with an investigation of the effect of different frame lengths on speech recognition performance. The method adopts frame selection using an a posteriori signal-to-noise ratio (SNR) weighted energy distance and increases the length of the selected frames according to the number of non-selected preceding frames. It assigns a higher frame rate and a normal frame length to rapidly changing, high-SNR regions of a speech signal, and a lower frame rate and an increased frame length to steady or low-SNR regions. The speech recognition results show that the proposed variable frame rate and length method outperforms fixed frame rate and length analysis, as well as standalone variable frame rate analysis, in terms of noise robustness. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
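The frame-selection rule described in the abstract above can be sketched as an accumulator: the SNR-weighted energy distance is summed frame by frame, and a frame is emitted when the accumulator crosses a threshold, so fast-changing, high-SNR regions yield more frames than steady or low-SNR regions. The noise-floor estimate and threshold here are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def select_frames(energies, threshold):
    e = np.asarray(energies, dtype=float)
    noise = np.min(e) + 1e-12              # crude noise-floor estimate (assumption)
    acc, selected = 0.0, []
    for i in range(1, len(e)):
        # a posteriori SNR weighted energy distance to the previous frame
        acc += abs(e[i] - e[i - 1]) * np.sqrt(e[i] / noise)
        if acc >= threshold:
            selected.append(i)             # keep this frame
            acc = 0.0                      # reset the accumulator
    return selected

energies = [1.0, 1.0, 1.0, 9.0, 16.0, 1.0, 1.0, 1.0]
print(select_frames(energies, threshold=5.0))   # frames kept around the burst
```

The paper's length half of the method would then widen the analysis window of a selected frame in proportion to how many preceding frames were skipped.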
10. A Joint Approach for Single-Channel Speaker Identification and Speech Separation.
- Author
Mowlaee, Pejman, Saeidi, Rahim, Christensen, Mads Græsbøll, Tan, Zheng-Hua, Kinnunen, Tomi, Franti, Pasi, and Jensen, Søren Holdt
- Subjects
AUTOMATIC speech recognition, SPEECH processing systems, MATHEMATICAL models, HIDDEN Markov models, SPEECH coding, PARAMETER estimation, SIGNAL-to-noise ratio, ALGORITHMS
- Abstract
In this paper, we present a novel system for joint speaker identification and speech separation. For speaker identification, a single-channel speaker identification algorithm is proposed which provides an estimate of the signal-to-signal ratio (SSR) as a by-product. For speech separation, we propose a sinusoidal model-based algorithm. The speech separation algorithm consists of a double-talk/single-talk detector followed by a minimum mean square error estimator of sinusoidal parameters for finding optimal codevectors from pre-trained speaker codebooks. In evaluating the proposed system, we start from a situation where we have prior information of codebook indices, speaker identities and SSR level, and then, by relaxing these assumptions one by one, we demonstrate the efficiency of the proposed fully blind system. In contrast to previous studies that mostly focus on automatic speech recognition (ASR) accuracy, here we report objective and subjective results as well. The results show that the proposed system performs as well as the best of the state-of-the-art in terms of perceived quality, while its speaker identification and ASR results are generally lower. It outperforms the state-of-the-art in terms of intelligibility, showing that the ASR results are not conclusive. The proposed method achieves, on average, 52.3% ASR accuracy, 41.2 points in MUSHRA and 85.9% in speech intelligibility. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
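The sinusoidal model underlying the separation stage in the abstract above represents a frame by a small set of (amplitude, frequency, phase) triples. A minimal sketch: pick the K strongest DFT bins and resynthesize. The paper instead fits such parameters via an MMSE search over pre-trained speaker codebooks; peak picking here is only an illustration, and the scaling ignores the DC/Nyquist special cases.

```python
import numpy as np

def sinusoidal_params(frame, k=3):
    spec = np.fft.rfft(frame)
    idx = np.argsort(np.abs(spec))[-k:]        # K strongest bins
    amps = 2.0 * np.abs(spec[idx]) / len(frame)
    freqs = idx / len(frame)                   # cycles per sample
    phases = np.angle(spec[idx])
    return amps, freqs, phases

def resynthesize(amps, freqs, phases, n):
    # sum of cosines defined by the extracted parameter triples
    t = np.arange(n)
    return sum(a * np.cos(2.0 * np.pi * f * t + p)
               for a, f, p in zip(amps, freqs, phases))

n = 256
frame = np.cos(2.0 * np.pi * 10.0 / n * np.arange(n))
amps, freqs, phases = sinusoidal_params(frame, k=1)
rec = resynthesize(amps, freqs, phases, n)
print(float(np.max(np.abs(rec - frame))))      # near zero for a single sinusoid
```

For mixed speech, each speaker's codebook constrains which parameter sets are plausible, which is what lets the MMSE estimator attribute sinusoids to speakers.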
11. Automatic speech recognition over error-prone wireless networks
- Author
Tan, Zheng-Hua, Dalsgaard, Paul, and Lindberg, Børge
- Subjects
AUTOMATIC speech recognition, COMPUTER input-output equipment, SPEECH perception, WIRELESS communications
- Abstract
The past decade has witnessed a growing interest in deploying automatic speech recognition (ASR) in communication networks. Networks such as wireless networks present a number of challenges due to, e.g., bandwidth constraints and transmission errors. The introduction of distributed speech recognition (DSR) largely eliminates the bandwidth limitations, and the presence of transmission errors becomes the key robustness issue. This paper reviews the techniques that have been developed for ASR robustness against transmission errors. In the paper, a model of network degradations and robustness techniques is presented. These techniques are classified into three categories: error detection, error recovery and error concealment (EC). A one-frame error detection scheme is described and compared with a frame-pair scheme. As opposed to vector-level techniques, a technique for error detection and EC at the sub-vector level is presented. A number of error recovery techniques, such as forward error correction and interleaving, are discussed, in addition to a review of both feature-reconstruction and ASR-decoder based EC techniques. To enable the comparison of some of these techniques, evaluation has been conducted on the basis of the same speech database and channel. Special attention is given to the unique characteristics of DSR as compared to streaming audio, e.g. voice-over-IP. Additionally, a technique for adapting ASR to the varying quality of networks is presented. The frame error rate is used here to adjust the discrimination threshold with the goal of optimising out-of-vocabulary detection. This paper concludes with a discussion of the applicability of different techniques based on the channel characteristics and the system requirements. [Copyright Elsevier]
- Published
- 2005
- Full Text
- View/download PDF
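One simple feature-reconstruction EC strategy of the kind reviewed in the abstract above is to interpolate lost feature vectors between the nearest correctly received neighbours. This sketch is an illustration of that idea only, not a specific scheme from the paper; the frame data is synthetic.

```python
import numpy as np

def conceal(features, lost):
    f = np.array(features, dtype=float)
    lost_set = set(lost)
    good = [i for i in range(len(f)) if i not in lost_set]
    for i in lost:
        prev = max((g for g in good if g < i), default=None)
        nxt = min((g for g in good if g > i), default=None)
        if prev is None:
            f[i] = f[nxt]                  # leading loss: repeat next good frame
        elif nxt is None:
            f[i] = f[prev]                 # trailing loss: repeat previous good frame
        else:
            w = (i - prev) / (nxt - prev)  # linear interpolation weight
            f[i] = (1.0 - w) * f[prev] + w * f[nxt]
    return f

feats = [[0.0], [0.0], [0.0], [3.0]]       # one-dimensional "feature" vectors
repaired = conceal(feats, lost=[1, 2])
print(repaired.ravel())
```

Repetition and interpolation are the cheapest end of the EC spectrum; the ASR-decoder-based techniques the paper reviews instead weight unreliable frames inside the recogniser.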
12. Speech Recognition in Mobile Phones
- Author
Varga, Imre, Kiss, Imre, Singh, Sameer, editor, Tan, Zheng-Hua, and Lindberg, Børge
- Published
- 2008
- Full Text
- View/download PDF
13. Speech Recognition Over IP Networks
- Author
Kim, Hong Kook, Singh, Sameer, editor, Tan, Zheng-Hua, and Lindberg, Børge
- Published
- 2008
- Full Text
- View/download PDF
14. Error Concealment
- Author
Haeb-Umbach, Reinhold, Ion, Valentin, Singh, Sameer, editor, Tan, Zheng-Hua, and Lindberg, Børge
- Published
- 2008
- Full Text
- View/download PDF
15. Fixed-Point Arithmetic
- Author
Bocchieri, Enrico, Singh, Sameer, editor, Tan, Zheng-Hua, and Lindberg, Børge
- Published
- 2008
- Full Text
- View/download PDF
16. Speech Recognition Over Mobile Networks
- Author
Kim, Hong Kook, Rose, Richard C., Singh, Sameer, editor, Tan, Zheng-Hua, and Lindberg, Børge
- Published
- 2008
- Full Text
- View/download PDF
17. Speech Coding and Packet Loss Effects on Speech and Speaker Recognition
- Author
Besacier, Laurent, Singh, Sameer, editor, Tan, Zheng-Hua, and Lindberg, Børge
- Published
- 2008
- Full Text
- View/download PDF
18. Automatic speech recognition on mobile devices and over communication networks
- Author
Tan, Zheng-Hua and Lindberg, Børge
- Subjects
Communication networks, Automatic speech recognition, Mobile devices
- Published
- 2008