Keyword Search Based on Unsupervised Pre-Trained Acoustic Models
- Source :
- International Journal of Asian Language Processing; September 2021, Vol. 31 Issue: 3-4
- Publication Year :
- 2021
Abstract
- Speech keyword search (KWS) is the task of automatically detecting specified keywords in continuous speech. Detecting a single keyword can be regarded as the task of speech keyword wake-up. For many practical applications of such small-vocabulary speech recognition tasks, building a full large-vocabulary speech recognition system is costly and unnecessary. For speech keyword search, insufficient data resources remain the main challenge. Speech pre-training has become an effective technique and has shown its superiority across a variety of tasks. The key idea is to learn effective representations from a large amount of unlabeled data, improving performance when labeled data for the downstream task are limited. This research combines unsupervised pre-training with keyword search based on the Keyword-Filler model, thereby introducing unsupervised pre-training into speech keyword search. The pre-trained model architecture selected is Wav2vec2.0, including XLSR. The results show that training on features extracted by the pre-trained model outperforms the baseline. Under low-resource conditions, the baseline performance drops significantly, whereas the performance of the tuned pre-trained model does not decrease and even increases slightly in some intervals. This indicates that the pre-trained model can be tuned to achieve better performance with very little data, demonstrating the advantage and practical value of keyword search based on unsupervised pre-training.
Details
- Language :
- English
- ISSN :
- 2717-5545 and 2424-791X
- Volume :
- 31
- Issue :
- 3-4
- Database :
- Supplemental Index
- Journal :
- International Journal of Asian Language Processing
- Publication Type :
- Periodical
- Accession number :
- ejs60623768
- Full Text :
- https://doi.org/10.1142/S2717554522500059