8 results for "Shiv Naga Prasad Vitaladevuni"
Search Results
2. Low-Bit Quantization and Quantization-Aware Training for Small-Footprint Keyword Spotting
- Author
-
Chris Beauchene, Yuriy Mishchenko, Oleg Rybakov, Shiv Naga Prasad Vitaladevuni, Spyros Matsoukas, Ming Sun, and Yusuf Goren
- Subjects
Artificial neural network, Computer science, Low bit, Quantization (signal processing), Small footprint, Keyword spotting, Algorithm - Abstract
In this paper, we investigate novel quantization approaches to reduce the memory and computational footprint of deep neural network (DNN) based keyword spotters (KWS). We propose a new method for offline and online KWS quantization, which we call dynamic quantization: we quantize DNN weight matrices column-wise, using each column's exact individual min-max range, and quantize the DNN layers' inputs and outputs for every input audio frame individually, using the exact min-max range of each input and output vector. We further apply a new quantization-aware training approach that allows us to incorporate quantization errors into the KWS model during training. Together, these approaches significantly improve the performance of KWS at 4-bit and 8-bit quantized precision, achieving end-to-end accuracy close to that of full-precision models while reducing the models' on-device memory footprint by up to 80%.
- Published
- 2019
- Full Text
- View/download PDF
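The column-wise min-max weight quantization described in the abstract can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' implementation; the 8-bit setting, function names, and round-trip check are assumptions:

```python
import numpy as np

def quantize_columns(W, n_bits=8):
    """Quantize each weight column with its own exact min-max range."""
    qmax = 2 ** n_bits - 1
    w_min = W.min(axis=0, keepdims=True)
    w_max = W.max(axis=0, keepdims=True)
    # Avoid division by zero for constant columns.
    scale = np.where(w_max > w_min, (w_max - w_min) / qmax, 1.0)
    q = np.round((W - w_min) / scale).astype(np.uint8 if n_bits <= 8 else np.uint16)
    return q, scale, w_min

def dequantize(q, scale, w_min):
    """Reconstruct approximate float weights from quantized values."""
    return q.astype(np.float32) * scale + w_min

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32)).astype(np.float32)
q, scale, w_min = quantize_columns(W, n_bits=8)
W_hat = dequantize(q, scale, w_min)
# Per-column rounding error is bounded by half a quantization step.
max_err = float(np.abs(W - W_hat).max())
```

The same per-vector min-max idea would apply to each frame's layer inputs and outputs at inference time, which is what makes the scheme "dynamic."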
3. Sub-band Convolutional Neural Networks for Small-footprint Spoken Term Classification
- Author
-
Shiv Naga Prasad Vitaladevuni, Yixin Gao, Chao Wang, Ming Sun, and Chieh-Chi Kao
- Subjects
Computer science, Speech recognition, Small footprint, Convolutional neural network, Keyword spotting, Audio and Speech Processing - Abstract
This paper proposes a Sub-band Convolutional Neural Network for spoken term classification. Convolutional neural networks (CNNs) have proven to be very effective in acoustic applications such as spoken term classification, keyword spotting, speaker identification, and acoustic event detection. Unlike applications in computer vision, the spatial invariance property of 2D convolutional kernels does not fit acoustic applications well, since the meaning of a specific 2D kernel varies greatly along the feature axis of an input feature map. We propose a sub-band CNN architecture that applies different convolutional kernels to each feature sub-band, which makes the overall computation more efficient. Experimental results show that the computational efficiency brought by the sub-band CNN is more beneficial for small-footprint models. Compared to a baseline full-band CNN for spoken term classification on the publicly available Speech Commands dataset, the proposed sub-band CNN architecture reduces computation by 39.7% on commands classification and by 49.3% on digits classification, with accuracy maintained. Comment: Accepted by Interspeech 2019.
- Published
- 2019
- Full Text
- View/download PDF
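The per-sub-band kernel idea can be illustrated with a toy NumPy sketch: split the feature map along the frequency axis and apply a distinct kernel to each band. All sizes and names here are hypothetical, not the paper's architecture:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_valid(x, k):
    """Single-channel 'valid' 2D correlation."""
    windows = sliding_window_view(x, k.shape)      # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum('ijkl,kl->ij', windows, k)

def sub_band_conv(feat, kernels):
    """Apply a distinct kernel to each equal-height frequency sub-band."""
    bands = np.array_split(feat, len(kernels), axis=0)  # split along the feature axis
    return [conv2d_valid(b, k) for b, k in zip(bands, kernels)]

rng = np.random.default_rng(1)
feat = rng.standard_normal((40, 100))                   # e.g. 40 mel bins x 100 frames
kernels = [rng.standard_normal((3, 3)) for _ in range(4)]  # one 3x3 kernel per sub-band
outs = sub_band_conv(feat, kernels)                     # four (8, 98) band outputs
```

Because each kernel only convolves over its own band, the frequency extent each kernel must cover shrinks, which is one intuition behind the reported computation savings.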
4. An Empirical Study of Cross-Lingual Transfer Learning Techniques for Small-Footprint Keyword Spotting
- Author
-
Nikko Strom, Spyros Matsoukas, Ming Sun, Andreas Schwarz, Minhua Wu, and Shiv Naga Prasad Vitaladevuni
- Subjects
Empirical research, Artificial neural network, Computer science, Test set, Speech recognition, Keyword spotting, Transfer of learning, Hidden Markov model - Abstract
This paper presents our work on building a small-footprint keyword spotting system for a resource-limited language, which requires low CPU, memory, and latency. Our keyword spotting system consists of a deep neural network (DNN) and a hidden Markov model (HMM), forming a hybrid DNN-HMM decoder. We investigate different transfer learning techniques to leverage knowledge and data from a resource-abundant source language to improve keyword DNN training for a target language with limited in-domain data. The approaches employed in this paper include: training a DNN on source language data to initialize the target language DNN training; mixing data from the source and target languages in a multi-task DNN training setup; using logits computed by a DNN trained on the source language data to regularize keyword DNN training in the target language; and combinations of these techniques. Given different amounts of target language training data, our experimental results show that these transfer learning techniques successfully improve keyword spotting performance for the target language, as measured by the area under the curve (AUC) of DNN-HMM decoding detection error tradeoff (DET) curves on a large in-house far-field test set.
- Published
- 2017
- Full Text
- View/download PDF
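One of the listed techniques, regularizing target-language training with source-language logits, can be sketched as a combined loss. The L2 logit penalty and the weight `alpha` are assumptions for illustration; the abstract does not specify the exact form:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def regularized_loss(target_logits, labels, source_logits, alpha=0.5):
    """Cross-entropy on target-language labels plus a penalty tying the
    target DNN's logits to those of the source-language DNN."""
    p = softmax(target_logits)
    n = len(labels)
    ce = -np.log(p[np.arange(n), labels] + 1e-12).mean()
    reg = np.mean((target_logits - source_logits) ** 2)
    return ce + alpha * reg

# Toy example: two frames, two classes.
target_logits = np.array([[2.0, 0.0], [0.0, 2.0]])
labels = np.array([0, 1])
source_logits = np.zeros((2, 2))
loss = regularized_loss(target_logits, labels, source_logits, alpha=0.5)
```

With `alpha = 0` the loss reduces to plain cross-entropy, recovering ordinary target-language training.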
5. Max-Pooling Loss Training of Long Short-Term Memory Networks for Small-Footprint Keyword Spotting
- Author
-
Nikko Strom, Arindam Mandal, Anirudh Raju, Shiv Naga Prasad Vitaladevuni, Geng-Shen Fu, Spyros Matsoukas, Sankaran Panchapagesan, George Tucker, and Ming Sun
- Subjects
Computation and Language, Machine Learning, Artificial neural network, Computer science, Speech recognition, Initialization, Keyword spotting, Latency, Hidden Markov model, Smoothing - Abstract
We propose a max-pooling based loss function for training Long Short-Term Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low CPU, memory, and latency requirements. The max-pooling loss training can be further guided by initializing with a cross-entropy loss trained network. A posterior smoothing based evaluation approach is employed to measure keyword spotting performance. Our experimental results show that LSTM models trained using cross-entropy loss or max-pooling loss outperform a cross-entropy loss trained baseline feed-forward Deep Neural Network (DNN). In addition, a randomly initialized LSTM trained with max-pooling loss performs better than a cross-entropy loss trained LSTM. Finally, the max-pooling loss trained LSTM initialized with a cross-entropy pre-trained network shows the best performance, yielding a 67.6% relative reduction in the Area Under the Curve (AUC) measure compared to the baseline feed-forward DNN.
- Published
- 2017
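The max-pooling loss idea, applying cross-entropy only to the frame with the highest keyword posterior, can be sketched in a frame-posterior formulation. This is a minimal illustration; function and variable names are hypothetical:

```python
import numpy as np

def max_pooling_loss(frame_logits, keyword_idx):
    """Cross-entropy on the single frame whose keyword posterior is highest.

    frame_logits: (T, C) logits over C classes for T frames of a keyword segment.
    Returns the loss value and the selected frame index."""
    z = frame_logits - frame_logits.max(axis=1, keepdims=True)
    post = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)  # (T, C) frame posteriors
    best = int(np.argmax(post[:, keyword_idx]))              # frame to back-propagate through
    return -np.log(post[best, keyword_idx] + 1e-12), best

# Toy segment: 5 frames, 2 classes; frame 3 is most confident for the keyword.
frame_logits = np.zeros((5, 2))
frame_logits[3, 1] = 5.0
loss, best = max_pooling_loss(frame_logits, keyword_idx=1)
```

During training, gradients would flow only through the selected frame, so the network is pushed to produce one strong peak per keyword occurrence rather than high posteriors on every frame.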
6. Multi-Task Learning and Weighted Cross-Entropy for DNN-Based Keyword Spotting
- Author
-
Shiv Naga Prasad Vitaladevuni, Ming Sun, Aparna Khare, Sankaran Panchapagesan, Spyros Matsoukas, Arindam Mandal, and Bjorn Hoffmeister
- Subjects
Cross entropy, Computer science, Speech recognition, Keyword spotting, Multi-task learning - Published
- 2016
- Full Text
- View/download PDF
7. Model Compression Applied to Small-Footprint Keyword Spotting
- Author
-
Ming Sun, Sankaran Panchapagesan, Shiv Naga Prasad Vitaladevuni, George Tucker, Minhua Wu, and Geng-Shen Fu
- Subjects
Model compression ,Computer science ,Keyword spotting ,Small footprint ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,02 engineering and technology ,Data mining ,computer.software_genre ,010301 acoustics ,01 natural sciences ,computer - Published
- 2016
- Full Text
- View/download PDF
8. Model Shrinking for Embedded Keyword Spotting
- Author
-
Varun K. Nagaraja, Bjorn Hoffmeister, Shiv Naga Prasad Vitaladevuni, and Ming Sun
- Subjects
Support vector machine ,Computer science ,business.industry ,Keyword spotting ,Pattern recognition ,Feature selection ,Artificial intelligence ,business ,Classifier (UML) - Abstract
In this paper, we present two approaches to improving the computational efficiency of a keyword spotting system running on a resource-constrained device. This embedded keyword spotting system detects a pre-specified keyword in real time at low CPU and memory cost. Our system is a two-stage cascade. The first stage extracts keyword hypotheses from input audio streams. After the first stage is triggered, hand-crafted features are extracted from the keyword hypothesis and fed to a support vector machine (SVM) classifier in the second stage. This paper focuses on improving the computational efficiency of the second-stage SVM classifier. More specifically, we select a subset of feature dimensions and merge support vectors to shrink the SVM classifier, while maintaining keyword spotting performance. Experimental results indicate that we can remove more than 36% of the non-discriminative SVM features and reduce the number of support vectors by more than 60% without significant performance degradation, resulting in more than 15% relative reduction in CPU utilization.
- Published
- 2015
- Full Text
- View/download PDF
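The two ideas, feature-dimension selection and support-vector reduction, can be illustrated with a toy NumPy sketch. The magnitude-based selection rule and the greedy pairwise merging below are stand-ins for whatever criteria the paper actually uses, and all names are hypothetical:

```python
import numpy as np

def select_features(support_vectors, dual_coef, keep_frac=0.64):
    """Rank feature dimensions by |w| of the equivalent linear weight vector
    and keep the most discriminative ones (illustrative criterion only)."""
    w = dual_coef @ support_vectors                    # (n_features,)
    order = np.argsort(-np.abs(w))
    k = max(1, int(keep_frac * support_vectors.shape[1]))
    return np.sort(order[:k])

def merge_support_vectors(support_vectors, dual_coef, n_keep):
    """Greedily merge the closest pair of support vectors into their
    coefficient-weighted average until only n_keep remain."""
    sv = support_vectors.copy()
    a = dual_coef.astype(float).copy()
    while len(sv) > n_keep:
        d = np.linalg.norm(sv[:, None] - sv[None, :], axis=2)
        np.fill_diagonal(d, np.inf)
        i, j = np.unravel_index(np.argmin(d), d.shape)
        tot = a[i] + a[j]
        merged = (a[i] * sv[i] + a[j] * sv[j]) / tot if tot != 0 else 0.5 * (sv[i] + sv[j])
        keep = [t for t in range(len(sv)) if t not in (i, j)]
        sv = np.vstack([sv[keep], merged])
        a = np.append(a[keep], tot)
    return sv, a

rng = np.random.default_rng(2)
sv = rng.standard_normal((10, 5))      # 10 support vectors, 5 feature dimensions
a = rng.standard_normal(10)            # dual coefficients
idx = select_features(sv, a, keep_frac=0.6)
sv2, a2 = merge_support_vectors(sv, a, n_keep=4)
```

Both steps shrink the dot-product work the second-stage classifier must do per keyword hypothesis, which is where the reported CPU savings come from.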
Discovery Service for Jio Institute Digital Library