251. Advances in Arabic Speech Transcription at IBM Under the DARPA GALE Program
- Author
Ahmad Emami, Hong-Kwang Kuo, Hagen Soltau, Lidia Mangu, George Saon, Brian Kingsbury, and Daniel Povey
- Subjects
Acoustics and Ultrasonics, Artificial neural network, Machine translation, Computer science, Speech recognition, Decision tree, Word error rate, Data modeling, Linguistic Data Consortium, Unsupervised learning, Artificial intelligence, Language model, Electrical and Electronic Engineering, Natural language processing
- Abstract
This paper describes the Arabic broadcast transcription system fielded by IBM in the GALE Phase 2.5 machine translation evaluation. Key advances include the use of additional training data from the Linguistic Data Consortium (LDC), use of a very large vocabulary comprising 737 K words and 2.5 M pronunciation variants, automatic vowelization using flat-start training, cross-adaptation between unvowelized and vowelized acoustic models, and rescoring with a neural-network language model. The resulting system achieves word error rates below 10% on Arabic broadcasts. Very large scale experiments with unsupervised training demonstrate that the utility of unsupervised data depends on the amount of supervised data available. While unsupervised training improves system performance when a limited amount (135 h) of supervised data is available, these gains disappear when a greater amount (848 h) of supervised data is used, even with a very large (7069 h) corpus of unsupervised data. We also describe a method for modeling Arabic dialects that avoids the problem of data sparseness entailed by dialect-specific acoustic models via the use of non-phonetic, dialect questions in the decision trees. We show how this method can be used with a statically compiled decoding graph by partitioning the decision trees into a static component and a dynamic component, with the dynamic component being replaced by a mapping that is evaluated at run-time.
- Published
2009