Discriminative Keyword Spotting for limited-data applications
- Source :
- Speech Communication. 99:1-11
- Publication Year :
- 2018
- Publisher :
- Elsevier BV, 2018.
Abstract
- Mobile devices are widely used around the world, frequently by people speaking local languages or dialects that are not well documented. For these languages, it might not be beneficial for commercial companies to develop Automatic Speech Recognition (ASR) systems, so users of these languages cannot utilize the voice activation features (often based on Keyword Spotting, KWS) of their devices. Standard KWS methods aim to statistically model the generation process of the speech signal, requiring hours of recorded and transcribed speech for training, and are therefore not adequate for limited-data scenarios. In this paper we propose a new KWS method, suitable for limited-data scenarios, which can be easily applied by developers. The proposed method uses a new histogram representation for words, obtained with respect to a pre-trained Gaussian Mixture Model (GMM). Sentences are represented by fixed-length global feature vectors, extracted from the response curves obtained by a word classifier. Word and sentence classifiers are trained using a discriminative approach, which is typically robust to training-set size. The dataset for training the GMM is easy to obtain, since no annotation is required. We compared the proposed system to a Hidden Markov Model (HMM) based system, trained under the same low-data-resource conditions as ours, and to a state-of-the-art ASR system, trained either under the limited-data scenario or on many hours of recorded speech. In the limited-data situation, our system performs better than both benchmarks in all experiments except for clean speech of children (CSLU dataset), where it performs as well as the HMM. Since the ASR benchmark performs poorly without enough training data, we also trained it without limiting the available data. In this case the ASR benchmark performs better when tested on speech of adults (TED-LIUM dataset of TED lectures) for all noise conditions, and our system performs better when tested on speech of children with low to moderate SNR values. The results demonstrate the advantages of the proposed system, and the conditions under which it performs better.
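The abstract does not say how the histogram representation is computed, so the following is only a minimal sketch of one plausible reading: each acoustic frame is softly assigned to the components of a pre-trained diagonal-covariance GMM, and the per-frame posteriors are averaged into a fixed-length histogram for the word. The function names, the toy GMM parameters, and the soft-assignment choice are all assumptions for illustration, not the authors' method.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_posteriors(x, weights, means, variances):
    """Posterior probability of each GMM component given one frame x."""
    log_joint = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(log_joint)  # subtract the max for numerical stability
    unnorm = [math.exp(lj - peak) for lj in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def word_histogram(frames, weights, means, variances):
    """Average per-frame posteriors into one histogram (assumed scheme)."""
    if not frames:
        raise ValueError("at least one frame is required")
    hist = [0.0] * len(weights)
    for x in frames:
        for k, p in enumerate(gmm_posteriors(x, weights, means, variances)):
            hist[k] += p
    return [h / len(frames) for h in hist]


# Toy, hypothetical GMM with two components in a 2-D feature space.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

# Three made-up frames: two near the first component, one near the second.
frames = [[0.1, -0.2], [0.3, 0.0], [4.9, 5.2]]
hist = word_histogram(frames, weights, means, variances)
print(hist)
```

The histogram length equals the number of GMM components regardless of how many frames the word spans, which is what makes it usable as a fixed-length input to a discriminative classifier.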
- Subjects :
- 0209 industrial biotechnology
Linguistics and Language
Computer science
Communication
Feature vector
Speech recognition
02 engineering and technology
Mixture model
Language and Linguistics
Computer Science Applications
020901 industrial engineering & automation
Discriminative model
Modeling and Simulation
Histogram
Keyword spotting
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Computer Vision and Pattern Recognition
Hidden Markov model
Classifier (UML)
Software
Sentence
Details
- ISSN :
- 01676393
- Volume :
- 99
- Database :
- OpenAIRE
- Journal :
- Speech Communication
- Accession number :
- edsair.doi...........32612050d9d4b81a9c5243d1fc2ba94a
- Full Text :
- https://doi.org/10.1016/j.specom.2018.02.003