Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition

Authors :: Liang Lu
Eric Sun
Rui Zhao
Naoyuki Kanda
Xie Chen
Jinyu Li
Zhong Meng
Yifan Gong
Yashesh Gaur
Sarangarajan Parthasarathy
Source :: SLT
Publication Year :: 2020
Abstract: Integrating external language models (LMs) remains a challenging task for end-to-end (E2E) automatic speech recognition (ASR), which has no clear division between acoustic and language models. In this work, we propose an internal LM estimation (ILME) method that enables more effective integration of an external LM with all pre-existing E2E models without additional model training, including the most popular recurrent neural network transducer (RNN-T) and attention-based encoder-decoder (AED) models. Trained with audio-transcript pairs, an E2E model implicitly learns an internal LM that characterizes the training data in the source domain. With ILME, the internal LM scores of an E2E model are estimated and subtracted from the log-linear interpolation between the scores of the E2E model and the external LM. The internal LM scores are approximated as the output of the E2E model when its acoustic components are eliminated. ILME can alleviate the domain mismatch between training and testing, or improve multi-domain E2E ASR. In experiments with RNN-T and AED models trained on 30K hours of data, ILME achieves relative word error rate reductions of up to 15.5% and 6.8% over Shallow Fusion on the out-of-domain LibriSpeech and in-domain Microsoft production test sets, respectively.
8 pages, 2 figures, SLT 2021
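As a rough illustration of the scoring rule described in the abstract, the Python sketch below rescores a small n-best list with the fused score log P_E2E(Y|X) + lam_ext * log P_ExtLM(Y) - lam_ilm * log P_ILM(Y). It assumes the per-token log-probabilities from the E2E model, the external LM, and the estimated internal LM (the E2E model run with its acoustic input removed) are already available. The names `Hypothesis`, `shallow_fusion_score`, `ilme_score`, and `rescore`, the weights `lam_ext` and `lam_ilm`, and all numbers are hypothetical; scoring complete hypotheses instead of fusing inside beam search is also a simplification.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Hypothesis:
    """One n-best hypothesis with per-token log-probabilities (hypothetical layout)."""
    text: str
    e2e_logp: Sequence[float]     # log P_E2E(y_t | y_<t, X) from the E2E model
    ext_lm_logp: Sequence[float]  # log P_ExtLM(y_t | y_<t) from the external (target-domain) LM
    ilm_logp: Sequence[float]     # estimated log P_ILM(y_t | y_<t), acoustic input removed


def shallow_fusion_score(h: Hypothesis, lam_ext: float) -> float:
    """Shallow Fusion: log-linear interpolation of E2E and external LM scores."""
    return sum(h.e2e_logp) + lam_ext * sum(h.ext_lm_logp)


def ilme_score(h: Hypothesis, lam_ext: float, lam_ilm: float) -> float:
    """ILME-style fusion: Shallow Fusion score minus the weighted internal LM score."""
    return shallow_fusion_score(h, lam_ext) - lam_ilm * sum(h.ilm_logp)


def rescore(nbest: List[Hypothesis], lam_ext: float, lam_ilm: float) -> Hypothesis:
    """Return the hypothesis with the highest fused score (lam_ilm=0 reduces to Shallow Fusion)."""
    return max(nbest, key=lambda h: ilme_score(h, lam_ext, lam_ilm))


# Toy, made-up scores: "A" is favored by the source-domain internal LM,
# "B" is favored by the external LM.
nbest = [
    Hypothesis("A", e2e_logp=[-0.5, -0.5], ext_lm_logp=[-2.0, -2.0], ilm_logp=[-0.5, -0.5]),
    Hypothesis("B", e2e_logp=[-0.6, -0.6], ext_lm_logp=[-1.9, -1.9], ilm_logp=[-1.5, -1.5]),
]

print(rescore(nbest, lam_ext=0.5, lam_ilm=0.0).text)  # Shallow Fusion: prints "A"
print(rescore(nbest, lam_ext=0.5, lam_ilm=0.3).text)  # with ILM subtraction: prints "B"
```

With these toy scores, Shallow Fusion picks A (-3.0 vs. -3.1), while subtracting the internal LM score flips the choice to B (-2.2 vs. -2.7), because B's low internal LM score no longer counts against it. In practice both weights would typically be tuned on development data.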

Subjects :: FOS: Computer and information sciences
Sound (cs.SD)
Computer Science - Machine Learning
Computer science
Speech recognition
Word error rate
02 engineering and technology
Computer Science - Sound
Machine Learning (cs.LG)
Domain (software engineering)
030507 speech-language pathology & audiology
03 medical and health sciences
End-to-end principle
Audio and Speech Processing (eess.AS)
FOS: Electrical engineering, electronic engineering, information engineering
0202 electrical engineering, electronic engineering, information engineering
Computer Science - Computation and Language
020206 networking & telecommunications
Task (computing)
Recurrent neural network
Task analysis
Language model
0305 other medical science
Computation and Language (cs.CL)
Interpolation
Electrical Engineering and Systems Science - Audio and Speech Processing

Details

Language :: English
Database :: OpenAIRE
Journal :: SLT
Accession number :: edsair.doi.dedup.....2824fbed9b02fe7bb5e0cf5a2fc1a484
