Author: "Shiv Naga Prasad Vitaladevuni" / Publisher: ieee - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Shiv Naga Prasad Vitaladevuni"' showing total 12 results

Start Over Author "Shiv Naga Prasad Vitaladevuni" Publisher ieee

12 results on '"Shiv Naga Prasad Vitaladevuni"'

1. Towards Data-Efficient Modeling for Wake Word Spotting

Author: Shiv Naga Prasad Vitaladevuni, Yuriy Mishchenko, Anish Shah, Yixin Gao, and Spyros Matsoukas
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Machine Learning, Computer science, Speech recognition, 020206 networking & telecommunications, Speech corpus, 02 engineering and technology, Semi-supervised learning, Wake, Spotting, Computer Science - Sound, Machine Learning (cs.LG), Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Wake word (WW) spotting is challenging in far-field not only because of the interference in signal transmission but also the complexity in acoustic environment. Traditional WW model training requires large amount of in-domain WW-specific data with substantial human annotations. This prevents the model building in the situation of lacking such data. In this paper we present data-efficient solutions to address the challenges in WW modeling, such as domain-mismatch, noisy conditions, limited annotation, etc. Our proposed system is composed of a multi-condition training pipeline with stratified data augmentation, which improves the model robustness to a variety of predefined acoustic conditions, together with a semi-supervised learning pipeline to extract the WW and adversarial examples from untranscribed speech corpus. Starting from only 10 hours of domain-mismatched WW audio, we are able to enlarge and enrich the training dataset by 20-100 times to capture the complexity in acoustic environments. Our experiments on real user data show that the proposed solutions can achieve comparable performance of a production-grade model by saving 97% of the amount of WW-specific data to collect and 86% of the bandwidth for annotation.
Published: 2020
Full Text: View/download PDF

2. Low-Bit Quantization and Quantization-Aware Training for Small-Footprint Keyword Spotting

Author: Chris Beauchene, Yuriy Mishchenko, Oleg Rybakov, Shiv Naga Prasad Vitaladevuni, Spyros Matsoukas, Ming Sun, and Yusuf Goren
Subjects: Artificial neural network, Computer science, Low bit, Quantization (signal processing), Small footprint, 020208 electrical & electronic engineering, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Quantization (physics), Keyword spotting, 0202 electrical engineering, electronic engineering, information engineering, Algorithm, 0105 earth and related environmental sciences
Abstract: In this paper, we investigate novel quantization approaches to reduce memory and computational footprint of deep neural network (DNN) based keyword spotters (KWS). We propose a new method for KWS offline and online quantization, which we call dynamic quantization, where we quantize DNN weight matrices column-wise, using each column's exact individual min-max range, and the DNN layers' inputs and outputs are quantized for every input audio frame individually, using the exact min-max range of each input and output vector. We further apply a new quantization-aware training approach that allows us to incorporate quantization errors into KWS model during training. Together, these approaches allow us to significantly improve the performance of KWS in 4-bit and 8-bit quantized precision, achieving the end-to-end accuracy close to that of full precision models while reducing the models' on-device memory footprint by up to 80%.
Published: 2019
Full Text: View/download PDF

3. Monophone-Based Background Modeling for Two-Stage On-Device Wake Word Detection

Author: Ming Sun, Bjorn Hoffmeister, Minhua Wu, Sankaran Panchapagesan, Arindam Mandal, Ryan Paul Thomas, Jiacheng Gu, and Shiv Naga Prasad Vitaladevuni
Subjects: Artificial neural network, Computer science, Speech recognition, Feature extraction, 020206 networking & telecommunications, 02 engineering and technology, Rejection rate, Speaker recognition, Constant false alarm rate, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, False alarm, Hidden Markov model, Decoding methods
Abstract: Accurate on-device wake word detection is crucial to products with far-field voice control such as the Amazon Echo. It is quite challenging to build a wake word system with both low False Reject Rate (FRR) and low False Alarm Rate (FAR) in real scenarios where there are various types of background speech, music or noise, especially when computational resources on the device is limited. In this paper, we introduce a two-stage wake word system based on Deep Neural Network (DNN) acoustic modeling, propose a new way to model the non-keyword background events using monophone-based units and present how richer information can be extracted from those monophone units for final wake word detection. Under the new system, we could get around 16% relative reduction in FRR when fixing the false alarm level, and about 37% relative reduction in FAR on the other hand if we maintain the miss rate. For the 2nd stage classifier itself, it is able to reduce the false alarm rate relatively by about 67% on top of 1st stage hypothesis with very few computational resources.
Published: 2018
Full Text: View/download PDF

4. An Empirical Study of Cross-Lingual Transfer Learning Techniques for Small-Footprint Keyword Spotting

Author: Nikko Strom, Spyros Matsoukas, Ming Sun, Andreas Schwarz, Minhua Wu, and Shiv Naga Prasad Vitaladevuni
Subjects: Empirical research, Artificial neural network, Computer science, 020204 information systems, Test set, Speech recognition, Keyword spotting, 0202 electrical engineering, electronic engineering, information engineering, Leverage (statistics), 020201 artificial intelligence & image processing, 02 engineering and technology, Transfer of learning, Hidden Markov model
Abstract: This paper presents our work on building a small-footprint keyword spotting system for a resource-limited language, which requires low CPU, memory and latency. Our keyword spotting system consists of deep neural network (DNN) and hidden Markov model (HMM), which is a hybrid DNN-HMM decoder. We investigate different transfer learning techniques to leverage knowledge and data from a resource-abundant source language to improve the keyword DNN training for a target language which has limited in-domain data. The approaches employed in this paper include training a DNN using source language data to initialize the target language DNN training, mixing data from source and target languages together in a multi-task DNN training setup, using logits computed from a DNN trained on the source language data to regularize the keyword DNN training in the target language, as well as combinations of these techniques. Given different amounts of target language training data, our experimental results show that these transfer learning techniques successfully improve keyword spotting performance for the target language, measured by the area under the curve (AUC) of DNN-HMM decoding detection error tradeoff (DET) curves using a large in-house far-field test set.
Published: 2017
Full Text: View/download PDF

5. Model Shrinking for Embedded Keyword Spotting

Author: Varun K. Nagaraja, Bjorn Hoffmeister, Shiv Naga Prasad Vitaladevuni, and Ming Sun
Subjects: Support vector machine, Computer science, business.industry, Keyword spotting, Pattern recognition, Feature selection, Artificial intelligence, business, Classifier (UML)
Abstract: In this paper we present two approaches to improve computational efficiency of a keyword spotting system running on a resource constrained device. This embedded keyword spotting system detects a pre-specified keyword in real time at low cost of CPU and memory. Our system is a two stage cascade. The first stage extracts keyword hypotheses from input audio streams. After the first stage is triggered, hand-crafted features are extracted from the keyword hypothesis and fed to a support vector machine (SVM) classifier on the second stage. This paper focuses on improving the computational efficiency of the second stage SVM classifier. More specifically, select a subset of feature dimensions and merge the SVM classifier to a smaller size, while maintaining the keyword spotting performance. Experimental results indicate that we can remove more than 36% of the non-discriminative SVM features, and reduce the number of support vectors by more than 60% without significant performance degradation. This results in more than 15% relative reduction in CPU utilization.
Published: 2015
Full Text: View/download PDF

6. Robust EEG emotion classification using segment level decision fusion

Author: Rohit Prasad, Viktor Rozgic, and Shiv Naga Prasad Vitaladevuni
Subjects: medicine.diagnostic_test, business.industry, Computer science, Speech recognition, Emotion classification, Feature vector, Feature extraction, Contrast (statistics), Pattern recognition, Electroencephalography, k-nearest neighbors algorithm, Arousal, ComputingMethodologies_PATTERNRECOGNITION, Binary classification, medicine, Artificial intelligence, business
Abstract: In this paper we address single-trial binary classification of emotion dimensions (arousal, valence, dominance and liking) using electroencephalogram (EEG) signals that represent responses to audio-visual stimuli. We propose an innovative three step solution to this problem: (1) in contrast to the typical feature extraction on the response-level, we represent the EEG signal as a sequence of overlapping segments and extract feature vectors on the segment level; (2) transform segment level features to the response level features using projections based on a novel non-parametric nearest neighbor model; and (3) perform classification on the obtained response-level features. We demonstrate the efficacy of our approach by performing binary classification of emotion dimensions on DEAP (Dataset for Emotion Analysis using electroencephalogram, Physiological and Video Signals) and report state-of-the-art classification accuracies for all emotional dimensions.
Published: 2013
Full Text: View/download PDF

7. Efficient Orthogonal Matching Pursuit using sparse random projections for scene and video classification

Author: Pradeep Natarajan, Shiv Naga Prasad Vitaladevuni, Prem Natarajan, and Rohit Prasad
Subjects: Contextual image classification, Hadamard transform, Robustness (computer science), business.industry, Random projection, Pattern recognition, Artificial intelligence, business, Matching pursuit, Projection (linear algebra), Locality-sensitive hashing, Mathematics, Sparse matrix
Abstract: Sparse projection has been shown to be highly effective in several domains, including image denoising and scene / object classification. However, practical application to large scale problems such as video analysis requires efficient versions of sparse projection algorithms such as Orthogonal Matching Pursuit (OMP). In particular, random projection based locality sensitive hashing (LSH) has been proposed for OMP. In this paper, we propose a novel technique called Comparison Hadamard random projection (CHRP) for further improving the efficiency of LSH within OMP. CHRP combines two techniques:(1) The Fast Johnson-Lindenstrauss Transform (FJLT) which uses a randomized Hadamard transform and sparse projection matrix for LSH, and (2) Achlioptas' random projection that uses only addition and comparison operations. Our approach provides the robustness of FJLT while completely avoiding multiplications. We empirically validate CHRP's efficacy by performing a suite of experiments for image denoising, scene classification, and video categorization. Our experiments indicate that CHRP significantly speeds-up OMP with negligible loss in classification accuracy.
Published: 2011
Full Text: View/download PDF

8. Graph Clustering-Based Ensemble Method for Handwritten Text Line Segmentation

Author: Prem Natarajan, Huaigu Cao, Shiv Naga Prasad Vitaladevuni, Vasant Manohar, and Rohit Prasad
Subjects: Connected component, Computer science, Segmentation-based object categorization, business.industry, Scale-space segmentation, Pattern recognition, Graph theory, Image segmentation, Graph, Handwriting recognition, Graph (abstract data type), Segmentation, Artificial intelligence, business, Cluster analysis, Clustering coefficient
Abstract: Handwritten text line segmentation on real-world data presents significant challenges that cannot be overcome by any single technique. Given the diversity of approaches and the recent advances in ensemble-based combination for pattern recognition problems, it is possible to improve the segmentation performance by combining the outputs from different line finding methods. In this paper, we propose a novel graph clustering-based approach to combine the output of an ensemble of text line segmentation algorithms. A weighted undirected graph is constructed with nodes corresponding to connected components and edge connecting pairs of connected components. Text line segmentation is then posed as the problem of minimum cost partitioning of the nodes in the graph such that each cluster corresponds to a unique line in the document image. Experimental results on a challenging Arabic field dataset using the ensemble method shows a relative gain of 18% in the F1 score over the best individual method within the ensemble.
Published: 2011
Full Text: View/download PDF

9. Contour-based joint clustering of multiple segmentations

Author: Ronen Basri, Shiv Naga Prasad Vitaladevuni, and Daniel Glasner
Subjects: Matching (graph theory), Linear programming, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Score, Boundary (topology), Pattern recognition, Image segmentation, Linear programming relaxation, Computer Science::Computer Vision and Pattern Recognition, Segmentation, Computer vision, Artificial intelligence, Cluster analysis, business, Mathematics
Abstract: We present an unsupervised, shape-based method for joint clustering of multiple image segmentations. Given two or more closely-related images, such as nearby frames in a video sequence or images of the same scene taken under different lighting conditions, our method generates a joint segmentation of the images. We introduce a novel contour-based representation that allows us to cast the shape-based joint clustering problem as a quadratic semi-assignment problem. Our score function is additive. We use complex-valued affinities to assess the quality of matching the edge elements at the exterior bounding contour of clusters, while ignoring the contributions of elements that fall in the interior of the clusters. We further combine this contour-based score with region information and use a linear programming relaxation to solve for the joint clusters. We evaluate our approach on the occlusion boundary data-set of Stein et al.
Published: 2011
Full Text: View/download PDF

10. Co-clustering of image segments using convex optimization applied to EM neuronal reconstruction

Author: Shiv Naga Prasad Vitaladevuni and Ronen Basri
Subjects: Biclustering, Mathematical optimization, Linear programming, Convex optimization, Image segmentation, Quadratic programming, Iterative reconstruction, Function (mathematics), Cluster analysis, Mathematics
Abstract: This paper addresses the problem of jointly clustering two segmentations of closely correlated images. We focus in particular on the application of reconstructing neuronal structures in over-segmented electron microscopy images. We formulate the problem of co-clustering as a quadratic semi-assignment problem and investigate convex relaxations using semidefinite and linear programming. We further introduce a linear programming method with manageable number of constraints and present an approach for learning the cost function. Our method increases computational efficiency by orders of magnitude while maintaining accuracy, automatically finds the optimal number of clusters, and empirically tends to produce binary assignment solutions. We illustrate our approach in simulations and in experiments with real EM data.
Published: 2010
Full Text: View/download PDF

11. Increasing depth resolution of electron microscopy of neural circuits using sparse tomographic reconstruction

Author: Ashok Veeraraghavan, Graham Knott, Shan Xu, Shiv Naga Prasad Vitaladevuni, Richard D. Fetter, Dmitri B. Chklovskii, Alexander Genkin, Harald F. Hess, Marco Cantoni, and Lou Scheffer
Subjects: Tomographic reconstruction, medicine.diagnostic_test, business.industry, Computer science, Resolution (electron density), Computed tomography, Basis function, Iterative reconstruction, Brain tissue, law.invention, Compressed sensing, law, medicine, Computer vision, Tomography, Artificial intelligence, Electron microscope, business, Image resolution, Throughput (business)
Abstract: Future progress in neuroscience hinges on reconstruction of neuronal circuits to the level of individual synapses. Because of the specifics of neuronal architecture, imaging must be done with very high resolution and throughput. While Electron Microscopy (EM) achieves the required resolution in the transverse directions, its depth resolution is a severe limitation. Computed tomography (CT) may be used in conjunction with electron microscopy to improve the depth resolution, but this severely limits the throughput since several tens or hundreds of EM images need to be acquired. Here, we exploit recent advances in signal processing to obtain high depth resolution EM images computationally. First, we show that the brain tissue can be represented as sparse linear combination of local basis functions that are thin membrane-like structures oriented in various directions. We then develop reconstruction techniques inspired by compressive sensing that can reconstruct the brain tissue from very few (typically 5) tomographic views of each section. This enables tracing of neuronal connections across layers and, hence, high throughput reconstruction of neural circuits to the level of individual synapses.
Published: 2010
Full Text: View/download PDF

12. Action recognition using ballistic dynamics

Author: Larry S. Davis, Vili Kellokumpu, and Shiv Naga Prasad Vitaladevuni
Subjects: business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Image segmentation, Task (project management), Image (mathematics), Acceleration, Dynamics (music), Gesture recognition, Feature (machine learning), Action recognition, Computer vision, Artificial intelligence, business
Abstract: We present a Bayesian framework for action recognition through ballistic dynamics. Psycho-kinesiological studies indicate that ballistic movements form the natural units for human movement planning. The framework leads to an efficient and robust algorithm for temporally segmenting videos into atomic movements. Individual movements are annotated with person-centric morphological labels called ballistic verbs. This is tested on a dataset of interactive movements, achieving high recognition rates. The approach is also applied on a gesture recognition task, improving a previously reported recognition rate from 84% to 92%. Consideration of ballistic dynamics enhances the performance of the popular Motion History Image feature. We also illustrate the approachpsilas general utility on real-world videos. Experiments indicate that the method is robust to view, style and appearance variations.
Published: 2008
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

12 results on '"Shiv Naga Prasad Vitaladevuni"'

1. Towards Data-Efficient Modeling for Wake Word Spotting

2. Low-Bit Quantization and Quantization-Aware Training for Small-Footprint Keyword Spotting

3. Monophone-Based Background Modeling for Two-Stage On-Device Wake Word Detection

4. An Empirical Study of Cross-Lingual Transfer Learning Techniques for Small-Footprint Keyword Spotting

5. Model Shrinking for Embedded Keyword Spotting

6. Robust EEG emotion classification using segment level decision fusion

7. Efficient Orthogonal Matching Pursuit using sparse random projections for scene and video classification

8. Graph Clustering-Based Ensemble Method for Handwritten Text Line Segmentation

9. Contour-based joint clustering of multiple segmentations

10. Co-clustering of image segments using convex optimization applied to EM neuronal reconstruction

11. Increasing depth resolution of electron microscopy of neural circuits using sparse tomographic reconstruction

12. Action recognition using ballistic dynamics

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Journal

Database

12 results on '"Shiv Naga Prasad Vitaladevuni"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources