1. CRIM and LIUM approaches for Multi-Genre Broadcast Media Transcription
- Authors
Yannick Estève, Paul Deléglise, Vishwa Gupta, Sylvain Meignier, Gilles Boulianne, and Anthony Rousseau
- Subjects
Artificial neural network, Computer science, Speech recognition, Speaker diarisation, [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL], Baseline system, Trigram, Language model, Artificial intelligence, Transcription (software), Change detection, Decoding methods, Natural language processing
- Abstract
The Multi-Genre Broadcast (MGB) Challenge at ASRU 2015 is a controlled evaluation of speech recognition, speaker diarization, and lightly supervised alignment using BBC TV recordings. CRIM and LIUM participated in the speech recognition task of the challenge with a joint submission. This paper presents CRIM's and LIUM's contributions. Each team made different choices in developing its ASR system. The aim was to compare and evaluate different approaches to diarization and acoustic modeling, and to obtain complementary ASR systems for effective merging. CRIM's main contributions are a training scenario similar to multilingual training, used to estimate the deep neural network (DNN) acoustic models with most of the data; a pruned trigram language model for decoding; and a genre-dependent quadgram (4-gram) language model for rescoring the lattices produced by decoding. For LIUM, the focus was on fast decoding with high accuracy. The final word error rates (WERs) after merging show that it is possible to obtain reasonable WERs with automatically aligned files. The final global WER of 25.1% corresponds to an absolute WER reduction of about 20% compared with the ASR baseline system provided by the organizers.
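The abstract reports results as WER and as an absolute reduction. The following is a minimal sketch, assuming the standard definition of WER: substitutions, deletions, and insertions from a word-level Levenshtein alignment, divided by the number of reference words. It also shows how an absolute reduction differs from a relative one. The function names are hypothetical. Official challenge scoring typically also applies text normalization, which this sketch omits. The 45.1% baseline used in the usage lines is only an approximation back-computed from the abstract's "about 20% absolute" and 25.1% figures. It is not a reported value.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein (edit distance) alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1] / len(ref)


def wer_reduction(baseline: float, system: float) -> tuple[float, float]:
    """Return (absolute, relative) WER reduction for WERs given in percent."""
    absolute = baseline - system    # percentage points
    relative = absolute / baseline  # fraction of the baseline WER
    return absolute, relative


if __name__ == "__main__":
    # One deletion against six reference words: 1/6, about 0.167.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Approximate baseline implied by the abstract (hypothetical, not reported).
    absolute, relative = wer_reduction(45.1, 25.1)
    print(f"absolute: {absolute:.1f} points, relative: {relative:.1%}")
```

An absolute reduction of about 20 points from a baseline near 45% would correspond to a relative reduction of roughly 44%. This is why the abstract's "absolute" qualifier matters when comparing against results reported as relative improvements.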
- Published
- 2015