Author: "Audhkhasi, Kartik" / Topic: statistics - machine learning - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Audhkhasi, Kartik"' showing total 5 results

Start Over Author "Audhkhasi, Kartik" Topic statistics - machine learning

5 results on '"Audhkhasi, Kartik"'

1. Robust Knowledge Distillation from RNN-T Models With Noisy Training Labels Using Full-Sum Loss

Author: Zeineldeen, Mohammad, Audhkhasi, Kartik, Baskar, Murali Karthick, and Ramabhadran, Bhuvana
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, Statistics - Machine Learning
Abstract: This work studies knowledge distillation (KD) and addresses its constraints for recurrent neural network transducer (RNN-T) models. In hard distillation, a teacher model transcribes large amounts of unlabelled speech to train a student model. Soft distillation is another popular KD method that distills the output logits of the teacher model. Due to the nature of RNN-T alignments, applying soft distillation between RNN-T architectures having different posterior distributions is challenging. In addition, bad teachers having high word-error-rate (WER) reduce the efficacy of KD. We investigate how to effectively distill knowledge from variable quality ASR teachers, which has not been studied before to the best of our knowledge. We show that a sequence-level KD, full-sum distillation, outperforms other distillation methods for RNN-T models, especially for bad teachers. We also propose a variant of full-sum distillation that distills the sequence discriminative knowledge of the teacher leading to further improvement in WER. We conduct experiments on public datasets namely SpeechStew and LibriSpeech, and on in-house production data., Comment: Accepted at ICASSP 2023
Published: 2023

2. Building competitive direct acoustics-to-word models for English conversational speech recognition

Author: Audhkhasi, Kartik, Kingsbury, Brian, Ramabhadran, Bhuvana, Saon, George, and Picheny, Michael
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Neural and Evolutionary Computing, Statistics - Machine Learning
Abstract: Direct acoustics-to-word (A2W) models in the end-to-end paradigm have received increasing attention compared to conventional sub-word based automatic speech recognition models using phones, characters, or context-dependent hidden Markov model states. This is because A2W models recognize words from speech without any decoder, pronunciation lexicon, or externally-trained language model, making training and decoding with such models simple. Prior work has shown that A2W models require orders of magnitude more training data in order to perform comparably to conventional models. Our work also showed this accuracy gap when using the English Switchboard-Fisher data set. This paper describes a recipe to train an A2W model that closes this gap and is at-par with state-of-the-art sub-word based models. We achieve a word error rate of 8.8%/13.9% on the Hub5-2000 Switchboard/CallHome test sets without any decoder or language model. We find that model initialization, training data order, and regularization have the most impact on the A2W model performance. Next, we present a joint word-character A2W model that learns to first spell the word and then recognize it. This model provides a rich output to the user instead of simple word hypotheses, making it especially useful in the case of words unseen or rarely-seen during training., Comment: Submitted to IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018
Published: 2017

3. Direct Acoustics-to-Word Models for English Conversational Speech Recognition

Author: Audhkhasi, Kartik, Ramabhadran, Bhuvana, Saon, George, Picheny, Michael, and Nahamoo, David
Subjects: Computer Science - Computation and Language, Computer Science - Neural and Evolutionary Computing, Statistics - Machine Learning
Abstract: Recent work on end-to-end automatic speech recognition (ASR) has shown that the connectionist temporal classification (CTC) loss can be used to convert acoustics to phone or character sequences. Such systems are used with a dictionary and separately-trained Language Model (LM) to produce word sequences. However, they are not truly end-to-end in the sense of mapping acoustics directly to words without an intermediate phone representation. In this paper, we present the first results employing direct acoustics-to-word CTC models on two well-known public benchmark tasks: Switchboard and CallHome. These models do not require an LM or even a decoder at run-time and hence recognize speech with minimal complexity. However, due to the large number of word output units, CTC word models require orders of magnitude more data to train reliably compared to traditional systems. We present some techniques to mitigate this issue. Our CTC word model achieves a word error rate of 13.0%/18.8% on the Hub5-2000 Switchboard/CallHome test sets without any LM or decoder compared with 9.6%/16.0% for phone-based CTC with a 4-gram LM. We also present rescoring results on CTC word model lattices to quantify the performance benefits of a LM, and contrast the performance of word and phone CTC models., Comment: Submitted to Interspeech-2017
Published: 2017

4. Invariant Representations for Noisy Speech Recognition

Author: Serdyuk, Dmitriy, Audhkhasi, Kartik, Brakel, Philémon, Ramabhadran, Bhuvana, Thomas, Samuel, and Bengio, Yoshua
Subjects: Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Learning, Computer Science - Sound, Statistics - Machine Learning
Abstract: Modern automatic speech recognition (ASR) systems need to be robust under acoustic variability arising from environmental, speaker, channel, and recording conditions. Ensuring such robustness to variability is a challenge in modern day neural network-based ASR systems, especially when all types of variability are not seen during training. We attempt to address this problem by encouraging the neural network acoustic model to learn invariant feature representations. We use ideas from recent research on image generation using Generative Adversarial Networks and domain adaptation ideas extending adversarial gradient-based training. A recent work from Ganin et al. proposes to use adversarial training for image domain adaptation by using an intermediate representation from the main target classification network to deteriorate the domain classifier performance through a separate neural network. Our work focuses on investigating neural architectures which produce representations invariant to noise conditions for ASR. We evaluate the proposed architecture on the Aurora-4 task, a popular benchmark for noise robust ASR. We show that our method generalizes better than the standard multi-condition training especially when only a few noise categories are seen during training., Comment: 5 pages, 1 figure, 1 table, NIPS workshop on end-to-end speech recognition
Published: 2016

5. Generalized Ambiguity Decomposition for Understanding Ensemble Diversity

Author: Audhkhasi, Kartik, Sethy, Abhinav, Ramabhadran, Bhuvana, and Narayanan, Shrikanth S.
Subjects: Statistics - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Learning, I.5
Abstract: Diversity or complementarity of experts in ensemble pattern recognition and information processing systems is widely-observed by researchers to be crucial for achieving performance improvement upon fusion. Understanding this link between ensemble diversity and fusion performance is thus an important research question. However, prior works have theoretically characterized ensemble diversity and have linked it with ensemble performance in very restricted settings. We present a generalized ambiguity decomposition (GAD) theorem as a broad framework for answering these questions. The GAD theorem applies to a generic convex ensemble of experts for any arbitrary twice-differentiable loss function. It shows that the ensemble performance approximately decomposes into a difference of the average expert performance and the diversity of the ensemble. It thus provides a theoretical explanation for the empirically-observed benefit of fusing outputs from diverse classifiers and regressors. It also provides a loss function-dependent, ensemble-dependent, and data-dependent definition of diversity. We present extensions of this decomposition to common regression and classification loss functions, and report a simulation-based analysis of the diversity term and the accuracy of the decomposition. We finally present experiments on standard pattern recognition data sets which indicate the accuracy of the decomposition for real-world classification and regression problems., Comment: 32 pages, 10 figures
Published: 2013

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

5 results on '"Audhkhasi, Kartik"'

1. Robust Knowledge Distillation from RNN-T Models With Noisy Training Labels Using Full-Sum Loss

2. Building competitive direct acoustics-to-word models for English conversational speech recognition

3. Direct Acoustics-to-Word Models for English Conversational Speech Recognition

4. Invariant Representations for Noisy Speech Recognition

5. Generalized Ambiguity Decomposition for Understanding Ensemble Diversity

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Database

5 results on '"Audhkhasi, Kartik"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources