Author: "Daniel Povey" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Daniel Povey"' showing total 367 results

151. Boosted MMI for model and feature-space discriminative training.

Author: Daniel Povey, Dimitri Kanevsky, Brian Kingsbury, Bhuvana Ramabhadran, George Saon, and Karthik Visweswariah
Published: 2008
Full Text: View/download PDF

152. Quick fmllr for speaker adaptation in speech recognition.

Author: Balakrishnan Varadarajan, Daniel Povey, and Selina M. Chu
Published: 2008
Full Text: View/download PDF

153. Universal background model based speech recognition.

Author: Daniel Povey, Selina M. Chu, and Balakrishnan Varadarajan
Published: 2008
Full Text: View/download PDF

154. Evaluation of Proposed Modifications to MPE for Large Scale Discriminative Training.

Author: Daniel Povey and Brian Kingsbury
Published: 2007
Full Text: View/download PDF

155. The Impact of ASR on Speech-to-Speech Translation Performance.

Author: Ruhi Sarikaya, Bowen Zhou, Daniel Povey, Mohamed Afify, and Yuqing Gao
Published: 2007
Full Text: View/download PDF

156. The IBM 2006 Gale Arabic ASR System.

Author: Hagen Soltau, George Saon, Brian Kingsbury, Hong-Kwang Jeff Kuo, Lidia Mangu, Daniel Povey, and Geoffrey Zweig
Published: 2007
Full Text: View/download PDF

157. The IBM Rich Transcription Spring 2006 Speech-to-Text System for Lecture Meetings.

Author: Jing Huang 0019, Martin Westphal, Stanley F. Chen, Olivier Siohan, Daniel Povey, Vit Libal, Alvaro Soneiro, Henrik Schulz 0001, Thomas Ross, and Gerasimos Potamianos
Published: 2006
Full Text: View/download PDF

158. Morpheme-Based Language Modeling for Arabic Lvcsr.

Author: Ghinwa F. Choueiter, Daniel Povey, Stanley F. Chen, and Geoffrey Zweig
Published: 2006
Full Text: View/download PDF

159. Secondary Classification for GMM Based Speaker Recognition.

Author: Jason W. Pelecanos, Daniel Povey, and Ganesh N. Ramaswamy
Published: 2006
Full Text: View/download PDF

160. Automated Quality Monitoring in the Call Center with ASR and Maximum Entropy.

Author: Geoffrey Zweig, Olivier Siohan, George Saon, Bhuvana Ramabhadran, Daniel Povey, Lidia Mangu, and Brian Kingsbury
Published: 2006
Full Text: View/download PDF

161. Improvements to fMPE for discriminative training of features.

Author: Daniel Povey
Published: 2005
Full Text: View/download PDF

162. Discriminatively trained features using fMPE for multi-stream audio-visual speech recognition.

Author: Jing Huang 0019 and Daniel Povey
Published: 2005
Full Text: View/download PDF

163. Anatomy of an extremely fast LVCSR decoder.

Author: George Saon, Daniel Povey, and Geoffrey Zweig
Published: 2005
Full Text: View/download PDF

164. fMPE: Discriminatively Trained Features for Speech Recognition.

Author: Daniel Povey, Brian Kingsbury, Lidia Mangu, George Saon, Hagen Soltau, and Geoffrey Zweig
Published: 2005
Full Text: View/download PDF

165. The IBM 2004 Conversational Telephony System for Rich Transcription.

Author: Hagen Soltau, Brian Kingsbury, Lidia Mangu, Daniel Povey, George Saon, and Geoffrey Zweig
Published: 2005
Full Text: View/download PDF

166. Phone duration modeling for LVCSR.

Author: Daniel Povey
Published: 2004
Full Text: View/download PDF

167. Feature space Gaussianization.

Author: George Saon, Satya Dharanipragada, and Daniel Povey
Published: 2004
Full Text: View/download PDF

168. MMI-MAP and MPE-MAP for acoustic model adaptation.

Author: Daniel Povey, Mark J. F. Gales, Do Yeong Kim, and Philip C. Woodland
Published: 2003
Full Text: View/download PDF

169. Discriminative Training for HMM-Based Offline Handwritten Character Recognition.

Author: Roongroj Nopsuwanchai and Daniel Povey
Published: 2003
Full Text: View/download PDF

170. Discriminative map for acoustic model adaptation.

Author: Daniel Povey, Philip C. Woodland, and Mark J. F. Gales
Published: 2003
Full Text: View/download PDF

171. Porting: SwitchBoard to the VoiceMail task.

Author: Mark J. F. Gales, Yuan Dong, Daniel Povey, and Philip C. Woodland
Published: 2003
Full Text: View/download PDF

172. Minimum Phone Error and I-smoothing for improved discriminative training.

Author: Daniel Povey and Philip C. Woodland
Published: 2002
Full Text: View/download PDF

173. LET-Decoder: A WFST-Based Lazy-Evaluation Token-Group Decoder With Exact Lattice Generation

Author: Mahsa Yarmohammadi, Daniel Povey, Hang Lv, Li Ke, Lei Xie, Yiming Wang, and Sanjeev Khudanpur
Subjects: Computer science, Applied Mathematics, Frame (networking), 020206 networking & telecommunications, 02 engineering and technology, Security token, Token passing, Signal Processing, 0202 electrical engineering, electronic engineering, information engineering, Overhead (computing), Electrical and Electronic Engineering, Lazy evaluation, Hidden Markov model, Algorithm, Word (computer architecture), Decoding methods
Abstract: We propose a novel lazy-evaluation token-group decoding algorithm with on-the-fly composition of weighted finite-state transducers (WFSTs) for large vocabulary continuous speech recognition. In the standard on-the-fly composition decoder, a base WFST and one or more incremental WFSTs are composed during decoding, and then a token-passing algorithm is employed to generate the lattice on the composed search space, resulting in substantial computation overhead. To improve speed, the proposed algorithm adopts 1) a token-group method, which groups tokens with the same state in the base WFST on each frame and limits the capacity of the group, and 2) a lazy-evaluation method, which does not expand a token group and its source token groups until it processes a word label during decoding. Experiments show that the proposed decoder runs up to 3 times faster than the standard on-the-fly composition decoder.
Published: 2021
Full Text: View/download PDF

174. Translations of the Callhome Egyptian Arabic corpus for conversational speech translation.

Author: Gaurav Kumar, Yuan Cao 0007, Ryan Cotterell, Chris Callison-Burch, Daniel Povey, and Sanjeev Khudanpur
Published: 2014

175. New features in the CU-HTK system for transcription of conversational telephone speech.

Author: Thomas Hain, Philip C. Woodland, Gunnar Evermann, and Daniel Povey
Published: 2001
Full Text: View/download PDF

176. Improved discriminative training techniques for large vocabulary continuous speech recognition.

Author: Daniel Povey and Philip C. Woodland
Published: 2001
Full Text: View/download PDF

177. Krylov Subspace Descent for Deep Learning.

Author: Oriol Vinyals and Daniel Povey
Published: 2012

178. A basis representation of constrained MLLR transforms for robust adaptation.

Author: Daniel Povey and Kaisheng Yao
Published: 2012
Full Text: View/download PDF

179. Minimum Bayes Risk decoding and system combination based on a recursion for edit distance.

Author: Haihua Xu, Daniel Povey, Lidia Mangu, and Jie Zhu 0006
Published: 2011
Full Text: View/download PDF

180. The subspace Gaussian mixture model - A structured model for speech recognition.

Author: Daniel Povey, Lukás Burget, Mohit Agarwal 0005, Pinar Akyazi, Kai Feng, Arnab Ghoshal, Ondrej Glembek, Nagendra K. Goel, Martin Karafiát, Ariya Rastrow, Richard C. Rose, Petr Schwarz, and Samuel Thomas 0001
Published: 2011
Full Text: View/download PDF

181. Frame discrimination training for HMMs for large vocabulary speech recognition.

Author: Daniel Povey and Philip C. Woodland
Published: 1999
Full Text: View/download PDF

182. Advances in Arabic Speech Transcription at IBM Under the DARPA GALE Program.

Author: Hagen Soltau, George Saon, Brian Kingsbury, Hong-Kwang Jeff Kuo, Lidia Mangu, Daniel Povey, and Ahmad Emami
Published: 2009
Full Text: View/download PDF

183. Parallel training of Deep Neural Networks with Natural Gradient and Parameter Averaging.

Author: Daniel Povey, Xiaohui Zhang 0007, and Sanjeev Khudanpur
Published: 2015

184. MUSAN: A Music, Speech, and Noise Corpus.

Author: David Snyder, Guoguo Chen, and Daniel Povey
Published: 2015

185. Advances in speech transcription at IBM under the DARPA EARS program.

Author: Stanley F. Chen, Brian Kingsbury, Lidia Mangu, Daniel Povey, George Saon, Hagen Soltau, and Geoffrey Zweig
Published: 2006
Full Text: View/download PDF

186. Automatic transcription of conversational telephone speech.

Author: Thomas Hain, Philip C. Woodland, Gunnar Evermann, Mark J. F. Gales, Xunying Liu, Gareth L. Moore, Daniel Povey, and Lan Wang
Published: 2005
Full Text: View/download PDF

187. Large scale discriminative training of hidden Markov models for speech recognition.

Author: Philip C. Woodland and Daniel Povey
Published: 2002
Full Text: View/download PDF

188. SPAM and full covariance for speech recognition.

Author: Daniel Povey
Published: 2006
Full Text: View/download PDF

189. Feature and model space speaker adaptation with full covariance Gaussians.

Author: Daniel Povey and George Saon
Published: 2006
Full Text: View/download PDF

190. Automated Quality Monitoring for Call Centers using Speech and NLP Technologies.

Author: Geoffrey Zweig, Olivier Siohan, George Saon, Bhuvana Ramabhadran, Daniel Povey, Lidia Mangu, and Brian Kingsbury
Published: 2006

191. A Parallelizable Lattice Rescoring Strategy with Neural Language Models

Author: Daniel Povey, Sanjeev Khudanpur, and Li Ke
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Signal processing, Computer Science - Computation and Language, Parallelizable manifold, Computer science, High Energy Physics::Lattice, Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing), Lattice expansion, Computer Science - Sound, Audio and Speech Processing (eess.AS), Lattice (order), Path (graph theory), FOS: Electrical engineering, electronic engineering, information engineering, Signal processing algorithms, Language model, Computation and Language (cs.CL), Algorithm, Decoding methods, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: This paper proposes a parallel computation strategy and a posterior-based lattice expansion algorithm for efficient lattice rescoring with neural language models (LMs) for automatic speech recognition. First, lattices from first-pass decoding are expanded by the proposed posterior-based lattice expansion algorithm. Second, each expanded lattice is converted into a minimal list of hypotheses that covers every arc. Each hypothesis is constrained to be the best path for at least one arc it includes. For each lattice, the neural LM scores of the minimal list are computed in parallel and are then integrated back to the lattice in the rescoring stage. Experiments on the Switchboard dataset show that the proposed rescoring strategy obtains comparable recognition performance and generates more compact lattices than a competitive baseline method. Furthermore, the parallel rescoring method offers more flexibility by simplifying the integration of PyTorch-trained neural LMs for lattice rescoring with Kaldi. To appear at ICASSP 2021. 5 pages, 1 figure.
Published: 2021
Full Text: View/download PDF

192. speechocean762: An Open-Source Non-native English Speech Corpus For Pronunciation Assessment

Author: Zhiyong Yan, Daniel Povey, Yongqing Wang, Zhiwen Zhang, Qiong Song, Yujun Wang, Huang Yukai, Junbo Zhang, and Ke Li
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Computation and Language, business.industry, Computer science, Speech corpus, Pronunciation, computer.software_genre, ComputingMethodologies_ARTIFICIALINTELLIGENCE, Computer Science - Sound, Native english, Open source, Workflow, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Baseline system, Artificial intelligence, business, Computation and Language (cs.CL), computer, Natural language processing, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: This paper introduces a new open-source speech corpus named "speechocean762", designed for pronunciation assessment and consisting of 5000 English utterances from 250 non-native speakers, half of whom are children. Five experts annotated each utterance at the sentence, word, and phoneme levels. A baseline system is released as open source to illustrate the phoneme-level pronunciation assessment workflow on this corpus. The corpus may be used freely for commercial and non-commercial purposes. It is available for free download from OpenSLR, and the corresponding baseline system is published in the Kaldi speech recognition toolkit. Accepted at INTERSPEECH 2021.
Published: 2021

193. DOVER-Lap: A Method for Combining Overlap-Aware Diarization Outputs

Author: Andreas Stolcke, Sanjeev Khudanpur, Shinji Watanabe, Leibny Paola Garcia-Perera, Daniel Povey, Desh Raj, and Zili Huang
Subjects: FOS: Computer and information sciences, Beamforming, Sound (cs.SD), Majority rule, Voice activity detection, Computer science, Speech recognition, Region proposal, Approximation algorithm, 020206 networking & telecommunications, 02 engineering and technology, Computer Science - Sound, Speaker diarisation, 030507 speech-language pathology & audiology, 03 medical and health sciences, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, 0202 electrical engineering, electronic engineering, information engineering, 0305 other medical science, Natural language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Several advances have been made recently towards handling overlapping speech for speaker diarization. Since speech and natural language tasks often benefit from ensemble techniques, we propose an algorithm for combining outputs from such diarization systems through majority voting. Our method, DOVER-Lap, is inspired by the recently proposed DOVER algorithm, but is designed to handle overlapping segments in diarization outputs. We also modify the pair-wise incremental label mapping strategy used in DOVER, and propose an approximation algorithm based on weighted k-partite graph matching, which performs this mapping using a global cost tensor. We demonstrate the strength of our method by combining outputs from diverse systems -- clustering-based, region proposal networks, and target-speaker voice activity detection -- on AMI and LibriCSS datasets, where it consistently outperforms the single best system. Additionally, we show that DOVER-Lap can be used for late fusion in multichannel diarization, and that it compares favorably with early fusion methods like beamforming. Comment: Accepted to IEEE SLT 2021.
Published: 2021
Full Text: View/download PDF

194. An Asynchronous WFST-Based Decoder For Automatic Speech Recognition

Author: Hang Lv, Zhehuai Chen, Lei Xie, Daniel Povey, Sanjeev Khudanpur, and Hainan Xu
Subjects: FOS: Computer and information sciences, Signal processing, Sound (cs.SD), Computer science, Speech recognition, Computation, Process (computing), Data_CODINGANDINFORMATIONTHEORY, Computer Science - Sound, Asynchronous communication, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Overhead (computing), Language model, Pruning (decision trees), Decoding methods, Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science::Information Theory
Abstract: We introduce an asynchronous dynamic decoder, which adopts an efficient A* algorithm to incorporate big language models into one-pass decoding for large vocabulary continuous speech recognition. Unlike standard one-pass decoding with an on-the-fly composition decoder, which might induce a significant computation overhead, the asynchronous dynamic decoder has a novel design with two fronts, one performing "exploration" and the other "backfill". The computation of the two fronts alternates in the decoding process, resulting in more effective pruning than standard one-pass decoding with an on-the-fly composition decoder. Experiments show that the proposed decoder works notably faster than standard one-pass decoding with an on-the-fly composition decoder, and that the acceleration becomes more pronounced as data complexity increases. Comment: 5 pages, 5 figures, ICASSP.
Published: 2021
Full Text: View/download PDF

195. Wake Word Detection with Streaming Transformers

Author: Daniel Povey, Hang Lv, Lei Xie, Sanjeev Khudanpur, and Yiming Wang
Subjects: FOS: Computer and information sciences, Sequence, Sound (cs.SD), Computer Science - Computation and Language, Artificial neural network, Computer science, Computer Science - Sound, Power (physics), Constant false alarm rate, Convolution, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Time complexity, Algorithm, Computation and Language (cs.CL), Word (computer architecture), Electrical Engineering and Systems Science - Audio and Speech Processing, Transformer (machine learning model)
Abstract: Modern wake word detection systems usually rely on neural networks for acoustic modeling. Transformers have recently shown superior performance over LSTM and convolutional networks in various sequence modeling tasks thanks to their better temporal modeling power. However, it is not clear whether this advantage still holds for short-range temporal modeling like wake word detection. Besides, the vanilla Transformer is not directly applicable to the task due to its non-streaming nature and its quadratic time and space complexity. In this paper we explore the performance of several variants of chunk-wise streaming Transformers tailored for wake word detection in a recently proposed LF-MMI system, including looking ahead to the next chunk, gradient stopping, different positional embedding methods, and adding same-layer dependency between chunks. Our experiments on the Mobvoi wake word dataset demonstrate that our proposed Transformer model outperforms the baseline convolution network by 25% on average in false rejection rate at the same false alarm rate with a comparable model size, while still maintaining linear complexity w.r.t. the sequence length. Comment: Accepted at IEEE ICASSP 2021. 5 pages, 3 figures.
Published: 2021
Full Text: View/download PDF

196. GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

Author: Yujun Wang, Wei Zou, Guoguo Chen, Shuaijiang Zhao, Guan-Bo Wang, Mingjie Jin, Yongqing Wang, Wei-Qiang Zhang, Jiayu Du, Shuzhou Chai, Daniel Povey, Zhiyong Yan, Jan Trmal, Shinji Watanabe, Xuchen Yao, Sanjeev Khudanpur, Zhao You, Dan Su, Junbo Zhang, Chao Weng, and Xiangang Li
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Computation and Language, Computer science, Speech recognition, Word error rate, Filter (signal processing), Variety (linguistics), Pipeline (software), Computer Science - Sound, Multi domain, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Segmentation, Transcription (software), Computation and Language (cs.CL), Sentence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10,000 hours of high quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training. Around 40,000 hours of transcribed audio is first collected from audiobooks, podcasts and YouTube, covering both read and spontaneous speaking styles, and a variety of topics, such as arts, science, sports, etc. A new forced alignment and segmentation pipeline is proposed to create sentence segments suitable for speech recognition training, and to filter out segments with low-quality transcription. For system training, GigaSpeech provides five subsets of different sizes, 10h, 250h, 1000h, 2500h, and 10000h. For our 10,000-hour XL training subset, we cap the word error rate at 4% during the filtering/validation stage, and for all our other smaller training subsets, we cap it at 0%. The DEV and TEST evaluation sets, on the other hand, are re-processed by professional human transcribers to ensure high transcription quality. Baseline systems are provided for popular speech recognition toolkits, namely Athena, ESPnet, Kaldi and Pika.
Published: 2021
Full Text: View/download PDF

197. Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition Systems

Author: Daniel Povey, Hervé Bourlard, Banriskhem K. Khonglah, Sibo Tong, Petr Motlicek, and Srikanth Madikeri
Subjects: Lattice (module), Computer science, Speech recognition, Mutual information
Published: 2020
Full Text: View/download PDF

198. An Alternative to MFCCs for ASR

Author: Sanjeev Khudanpur, Daniel Povey, Hynek Hermansky, Hossein Hadian, and Pegah Ghahramani
Subjects: Computer science, Speech recognition
Published: 2020
Full Text: View/download PDF

199. Neural Language Modeling With Implicit Cache Pointers

Author: Daniel Povey, Ke Li, and Sanjeev Khudanpur
Subjects: Recurrent neural network, Dependency (UML), Perplexity, Audio and Speech Processing (eess.AS), Computer science, Pointer (computer programming), Speech recognition, FOS: Electrical engineering, electronic engineering, information engineering, Treebank, Cache, Language model, Layer (object-oriented design), Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: A cache-inspired approach is proposed for neural language models (LMs) to improve long-range dependency and better predict rare words from long contexts. This approach is a simpler alternative to the attention-based pointer mechanism that enables neural LMs to reproduce words from recent history. Without using attention or a mixture structure, the method only involves appending extra tokens that represent words in history to the output layer of a neural LM and modifying the training supervision accordingly. A memory-augmentation unit is introduced to learn words that are particularly likely to repeat. We experiment with both recurrent neural network- and Transformer-based LMs. Perplexity evaluation on Penn Treebank and WikiText-2 shows that the proposed model outperforms both an LSTM and an LSTM with an attention-based pointer mechanism and is more effective on rare words. N-best rescoring experiments on Switchboard indicate that it benefits both very rare and frequent words. However, it is challenging for the proposed model, as well as for two other models with attention-based pointer mechanisms, to obtain good overall WER reductions. To appear at Interspeech 2020.
Published: 2020

200. Efficient MDI Adaptation for n-gram Language Models

Author: Ashish Arora, Ke Li, Ruizhe Huang, Sanjeev Khudanpur, and Daniel Povey
Subjects: FOS: Computer and information sciences, Vocabulary, Perplexity, Computer Science - Computation and Language, Computational complexity theory, Computer science, Principle of maximum entropy, media_common.quotation_subject, Word error rate, n-gram, Scalability, Language model, Algorithm, Computation and Language (cs.CL), media_common
Abstract: This paper presents an efficient algorithm for n-gram language model adaptation under the minimum discrimination information (MDI) principle, where an out-of-domain language model is adapted to satisfy the constraints of marginal probabilities of the in-domain data. The challenge for MDI language model adaptation is its computational complexity. By taking advantage of the backoff structure of the n-gram model and the idea of the hierarchical training method, originally proposed for maximum entropy (ME) language models, we show that MDI adaptation can be computed in time linear in the size of the inputs in each iteration. The complexity remains the same as for ME models, although MDI is more general than ME. This makes MDI adaptation practical for large corpora and vocabularies. Experimental results confirm the scalability of our algorithm on very large datasets, while MDI adaptation gets slightly worse perplexity but better word error rate results compared to simple linear interpolation. To appear in INTERSPEECH 2020. Appendix A of this full version will be filled soon.
Published: 2020
