Active learning for low-resource speech recognition: Impact of selection size and language modeling data
- Source :
- ICASSP
- Publication Year :
- 2017
- Publisher :
- IEEE, 2017.
Abstract
- Active learning aims to reduce the time and cost of developing speech recognition systems by selecting for transcription highly informative subsets from large pools of audio data. Previous evaluations at OpenKWS and IARPA BABEL have investigated data selection for low-resource languages in very constrained scenarios with 2-hour data selections given a 1-hour seed set. We expand on this to investigate what happens with larger selections and fewer constraints on language modeling data. Our results, on four languages from the final BABEL OP3 period, show that active learning is helpful at larger selections with consistent gains up to 14 hours. We also find that the impact of additional language model data is orthogonal to the impact of the active learning selection criteria.
- Subjects :
- Computer science
- Low resource
- Speech recognition
- Machine learning
- Data modeling
- Active learning
- Language model
- Artificial intelligence
- Transcription (software)
- 010501 environmental sciences
- 01 natural sciences
- 0105 earth and related environmental sciences
- 030507 speech-language pathology & audiology
- 03 medical and health sciences
- 0305 other medical science
Details
- Database :
- OpenAIRE
- Journal :
- 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Accession number :
- edsair.doi...........5241d9d4e551cd151b1d589580cd0525