45 results on '"Gaussian mixture models (GMMs)"'
Search Results
2. Unsupervised Machine Learning Methods to Estimate a Health Indicator for Condition Monitoring Using Acoustic and Vibration Signals: A Comparison Based on a Toy Data Set from a Coffee Vending Machine
- Author
-
Tefera, Yonas, Meire, Maarten, Luca, Stijn, Karsmakers, Peter, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Gama, Joao, editor, Pashami, Sepideh, editor, Bifet, Albert, editor, Sayed-Mouchaweh, Moamar, editor, Fröning, Holger, editor, Pernkopf, Franz, editor, Schiele, Gregor, editor, and Blott, Michaela, editor
- Published
- 2020
- Full Text
- View/download PDF
3. Distributionally robust chance-constrained optimization with Gaussian mixture ambiguity set.
- Author
-
Kammammettu, Sanjula, Yang, Shu-Bo, and Li, Zukui
- Subjects
-
*AMBIGUITY, *ROBUST optimization, *GAUSSIAN mixture models, *DISTRIBUTION (Probability theory), *ROBUST programming, *STATISTICS
- Abstract
Conventional chance-constrained programming methods suffer from the inexactness of the estimated probability distribution of the underlying uncertainty from data. To this end, a distributionally robust approach to the problem allows for a level of ambiguity considered around a reference distribution. In this work, we propose a novel formulation for the distributionally robust chance-constrained programming problem using an ambiguity set constructed from a variant of optimal transport distance that was developed for Gaussian Mixture Models. We show that for multimodal process uncertainty, our proposed method provides an effective way to incorporate statistical moment information into the ambiguity set construction step, thus leading to improved optimal solutions. We illustrate the performance of our method on a numerical example as well as a chemical process case study. We show that our proposed methodology leverages the multimodal characteristics from the uncertainty data to give superior performance over the traditional Wasserstein distance-based method.
• Ambiguity set constructed from optimal transport between Gaussian Mixture Models. • Hedging against the right family of candidate distributions to avoid unnecessary conservatism. • A tractable distributionally robust chance constrained optimization formulation. • Applicability to more general types of uncertain constraints. • Better objective-constraint satisfaction trade-off performance than classical Wasserstein DRCCP model. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. Seaweed biomass as a sustainable resource for synthesis of ZnO nanoparticles using Sargassum wightii ethanol extract and their environmental and biomedical applications through Gaussian mixture model.
- Author
-
Bai, Yu, Cao, Yan, Sun, Yiding, Alfaiz, Faiz Abdulaziz, Garalleh, Hakim A.L., El-Shamy, E.F., Almujibah, Hamad, Ali, Elimam, and Assilzadeh, Hamid
- Subjects
-
*GAUSSIAN mixture models, *BACTERIAL cell membranes, *SARGASSUM, *NANOPARTICLES, *ETHANOL, *STREPTOCOCCUS mutans, *ZINC oxide
- Abstract
Zinc oxide nanoparticles (ZnO) possess unique features that make them common across different industries. Nevertheless, traditional methods of synthesizing ZnO-NPs are associated with health and environmental risks due to harmful chemicals. The biosynthesis of zinc oxide nanoparticles was achieved using the hot water extract of Sargassum wightii (SW), which serves as a reducing agent. This extract is mixed with zinc precursors, initiating a bio-reduction process. UV–vis, FTIR, XRD, Raman, DLS, SEM, EDX, TEM imaging, and XPS analysis are used. The novelty of this research lies in utilizing a bio-reduction process involving hot water extract of SW to synthesize zinc oxide nanoparticles, providing a safer and eco-friendly alternative to traditional chemical methods. Here, the zinc oxide nanoparticles produced through the biosynthesis process effectively addressed oral infections (Streptococcus mutans) due to their ability to disrupt the integrity of bacterial cell membranes, interfere with cellular processes, and inhibit the growth and proliferation of bacteria responsible for oral infections. Gaussian Mixture Models (GMMs) uncover intricate patterns within medical data, enabling enhanced diagnostics, treatment personalization, and patient outcomes. This study aims to apply Gaussian Mixture Models (GMMs) to medical data for subpopulation identification and disease subtyping, contributing to personalized treatment strategies and improved patient care. With a dataset comprising 300 samples, the application of GMM showed lower BIC and AIC values (2500, 3200), a high Silhouette Score (0.65 on a scale from −1 to 1) reflecting well-defined clusters, and Calinski-Harabasz (120) and Davies-Bouldin (0.45) indices. These metrics collectively underscored the model's success in revealing distinct patterns within the data. ZnO-nanocoated aligners were effective against Streptococcus mutans, with the maximum antibacterial effect observed for 2 days and lasting for 7 days.
• Enhanced Green Synthesis: Sargassum wightii-Derived ZnO NPs. • Extensive ZnO NPs Characterization: In-depth Analysis of Properties. • Broad-Spectrum Antibacterial: ZnO NPs Active Against Various Strains. • ZnO NPs' Anticancer Efficacy: Demonstrated Action on MCF-7 Cells. • Sustainable and Versatile: Eco-Conscious Synthesis and Varied Uses. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. An Improved Speaker Identification System Using Automatic Split-Merge Incremental Learning (A-SMILE) of Gaussian Mixture Models
- Author
-
Bouziane, Ayoub, Kharroubi, Jamal, Zarghili, Arsalane, Kacprzyk, Janusz, Series editor, Pal, Nikhil R., Advisory editor, Bello Perez, Rafael, Advisory editor, Corchado, Emilio S., Advisory editor, Hagras, Hani, Advisory editor, Kóczy, László T., Advisory editor, Kreinovich, Vladik, Advisory editor, Lin, Chin-Teng, Advisory editor, Lu, Jie, Advisory editor, Melin, Patricia, Advisory editor, Nedjah, Nadia, Advisory editor, Nguyen, Ngoc Thanh, Advisory editor, Wang, Jun, Advisory editor, Silhavy, Radek, editor, Senkerik, Roman, editor, Kominkova Oplatkova, Zuzana, editor, Prokopova, Zdenka, editor, and Silhavy, Petr, editor
- Published
- 2017
- Full Text
- View/download PDF
6. Mongolian Speech Recognition Based on Deep Neural Networks
- Author
-
Zhang, Hui, Bao, Feilong, Gao, Guanglai, Goebel, Randy, Series editor, Tanaka, Yuzuru, Series editor, Wahlster, Wolfgang, Series editor, Sun, Maosong, editor, Liu, Zhiyuan, editor, Zhang, Min, editor, and Liu, Yang, editor
- Published
- 2015
- Full Text
- View/download PDF
7. Multiview Active Learning Optimization Based on Genetic Algorithm and Gaussian Mixture Models for Hyperspectral Data.
- Author
-
Jamshidpour, Nasehe, Safari, Abdolreza, and Homayouni, Saeid
- Abstract
In this letter, we propose a novel optimal view generation framework based on the genetic algorithm (GA) and Gaussian mixture models (GMMs) to improve multiview active learning (MV-AL). AL methods enlarge training data sets by iteratively selecting the most informative samples in order to improve the classification performance. By using multiple views to build multiple classifiers, the information content of each unlabeled sample can be more accurately estimated. The MV-AL methods are inherently more suitable for high-dimensional data such as hyperspectral images. This hybrid framework simultaneously constructs the optimal number of diverse and sufficient views. The proposed algorithm has two main steps. In the first step, by applying a cluster distortion function-based GMM, the actual number of available independent views is determined. In the next step, a hybrid GA approach selects the optimal combination of views using two different criteria. The experiments were conducted on two benchmark hyperspectral data sets, namely, Kennedy Space Center (KSC) and Indian Pines AVIRIS. The results demonstrated an increase in diversity and sufficiency of the views compared to the traditional view generation methods. Furthermore, the performance of MV-AL has also been significantly improved. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
8. Exploration of Properly Combined Audiovisual Representation with the Entropy Measure in Audiovisual Speech Recognition.
- Author
-
Vakhshiteh, Fatemeh and Almasganj, Farshad
- Subjects
-
*SPEECH perception, *AUTOMATIC speech recognition, *GAUSSIAN mixture models, *HIDDEN Markov models, *ENTROPY (Information theory), *ERROR rates
- Abstract
Deep belief networks (DBNs) have shown impressive improvements over Gaussian mixture models when employed inside hidden Markov model (HMM)-based automatic speech recognition systems. In this study, the benefits of using DBNs in audiovisual speech recognition systems are investigated. First, the DBN-HMMs are explored in speech recognition and lip-reading tasks, separately. Next, the challenge of appropriately integrating the audio and visual information is studied; for this purpose, the application of the fused feature in an audiovisual (AV) DBN-HMM based speech recognition task is studied. With regard to the integration of information, those layers that together provide generalities and details, so that overall they complement one another, are selected. A modified technique is proposed based on the entropy of different layers of the used DBNs, to measure the amount of information. The best audio layer representation is found to have the highest entropy, with the highest power of providing information details in the fusion scheme. In contrast, the best visual layer representation is found to have the lowest entropy, which could best provide sufficient generalities. Over the CUAVE database, on an English digit recognition task, the conducted experiments show that the AV DBN-HMM, with the proposed feature fusion method, can reduce phone error rate by as much as 4% and 1.5%, and word error rate by about 3.49% and 1.89%, over the baseline conventional HMM and audio DBN-HMM, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
9. A Genre-Independent Chord Transcription System from Audio Using GMM-Based HMMs
- Author
-
Wu, Hao, Su, Dan, Wang, Yifang, Wu, Xihong, SAE-China, FISITA, Farag, Aly A., editor, Yang, Jian, editor, and Jiao, Feng, editor
- Published
- 2014
- Full Text
- View/download PDF
10. Introducing Non-linear Analysis into Sustained Speech Characterization to Improve Sleep Apnea Detection
- Author
-
Blanco, Jose Luis, Hernández, Luis A., Fernández, Rubén, Ramos, Daniel, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Travieso-González, Carlos M., editor, and Alonso-Hernández, Jesús B., editor
- Published
- 2011
- Full Text
- View/download PDF
11. Guaranteed Bounds on the Kullback–Leibler Divergence of Univariate Mixtures.
- Author
-
Nielsen, Frank and Sun, Ke
- Subjects
GAUSSIAN mixture models, SIGNAL processing
- Abstract
The Kullback–Leibler (KL) divergence between two mixture models is a fundamental primitive in many signal processing tasks. Since the KL divergence of mixtures does not admit a closed-form formula, it is in practice either estimated using costly Monte-Carlo stochastic integration or approximated. We present a fast and generic method that builds algorithmically closed-form lower and upper bounds on the entropy, the cross-entropy and the KL divergence of univariate mixtures. We illustrate the versatile method by reporting on our experiments for approximating the KL divergence between Gaussian mixture models. [ABSTRACT FROM PUBLISHER]
- Published
- 2016
- Full Text
- View/download PDF
12. Likelihood-based feature relevance for figure-ground segmentation in images and videos.
- Author
-
Allili, Mohand Saïd and Ziou, Djemel
- Subjects
-
*IMAGE segmentation, *LIKELIHOOD ratio tests, *VIDEO processing, *FEATURE extraction, *DISTRIBUTION (Probability theory)
- Abstract
We propose an efficient method for image/video figure-ground segmentation using feature relevance (FR) and active contours. Given a set of positive and negative examples of a specific foreground (an object of interest (OOI) in an image or a tracked object in a video), we first learn the foreground distribution model and its characteristic features that best discriminate it from its contextual background. For this goal, an objective function based on feature likelihood ratio is proposed for supervised FR computation. FR is then incorporated in foreground segmentation of new images and videos using level sets and energy minimization. We show the effectiveness of our approach on several examples of image/video figure-ground segmentation. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
13. IITKGP-MLILSC speech database for language identification.
- Author
-
Maity, Sudhamay, Kumar Vuppala, Anil, Rao, K. Sreenivasa, and Nandi, Dipanjan
- Abstract
In this paper, we introduce a speech database consisting of 27 Indian languages for analyzing language-specific information present in speech. In the context of Indian languages, systematic analysis of various speech features and classification models for automatic language identification has not been performed, because of the lack of a proper speech corpus covering the majority of the Indian languages. With this motivation, we have initiated the task of developing a multilingual speech corpus in Indian languages. In this paper, spectral features are explored for investigating the presence of language-specific information. Mel-frequency cepstral coefficients (MFCCs) and linear predictive cepstral coefficients (LPCCs) are used for representing the spectral information. Gaussian mixture models (GMMs) are developed to capture the language-specific information present in spectral features. The performance of the language identification system is analyzed for speaker-dependent and speaker-independent cases. The recognition performance is observed to be 96% and 45%, respectively, for speaker-dependent and speaker-independent environments. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
14. Extracting Primary Objects by Video Co-Segmentation.
- Author
-
Lou, Zhongyu and Gevers, Theo
- Abstract
Video object segmentation is a challenging problem. Without human annotation or other prior information, it is hard to select a meaningful primary object from a single video, so extracting the primary object across videos is a more promising approach. However, existing algorithms consider the problem as foreground/background segmentation. Therefore, we propose an algorithm that learns the model of the primary object by representing the frames/videos as a graphical model. The probabilistic graphical model is built across a set of videos based on an object proposal algorithm. Our approach considers appearance, spatial, and temporal consistency of the primary objects. A new dataset is created to evaluate the proposed method and to compare it to the state-of-the-art on video object co-segmentation. The experiments show that our method obtains state-of-the-art results, outperforming other algorithms by 1.5% (pixel accuracy) on the MOViCS dataset and 9.6% (pixel accuracy) on the new dataset. [ABSTRACT FROM PUBLISHER]
- Published
- 2014
- Full Text
- View/download PDF
15. Online Discriminative Kernel Density Estimator With Gaussian Kernels.
- Author
-
Kristan, Matej and Leonardis, Ales
- Abstract
We propose a new method for a supervised online estimation of probabilistic discriminative models for classification tasks. The method estimates the class distributions from a stream of data in the form of Gaussian mixture models (GMMs). The reconstructive updates of the distributions are based on the recently proposed online kernel density estimator (oKDE). We maintain the number of components in the model low by compressing the GMMs from time to time. We propose a new cost function that measures loss of interclass discrimination during compression, thus guiding the compression toward simpler models that still retain discriminative properties. The resulting classifier thus independently updates the GMM of each class, but these GMMs interact during their compression through the proposed cost function. We call the proposed method the online discriminative kernel density estimator (odKDE). We compare the odKDE to oKDE, batch state-of-the-art kernel density estimators (KDEs), and batch/incremental support vector machines (SVM) on the publicly available datasets. The odKDE achieves comparable classification performance to that of best batch KDEs and SVM, while allowing online adaptation from large datasets, and produces models of lower complexity than the oKDE. [ABSTRACT FROM PUBLISHER]
- Published
- 2014
- Full Text
- View/download PDF
16. A Framework for Estimating Driver Decisions Near Intersections.
- Author
-
Gadepally, Vijay, Krishnamurthy, Ashok, and Ozguner, Umit
- Abstract
We present a framework for the estimation of driver behavior at intersections, with applications to autonomous driving and vehicle safety. The framework is based on modeling the driver behavior and vehicle dynamics as a hybrid-state system (HSS), with driver decisions being modeled as a discrete-state system and the vehicle dynamics modeled as a continuous-state system. The proposed estimation method uses observable parameters to track the instantaneous continuous state and estimates the most likely behavior of a driver given these observations. This paper describes a framework that encompasses the hybrid structure of vehicle–driver coupling and uses hidden Markov models (HMMs) to estimate driver behavior from filtered continuous observations. Such a method is suitable for scenarios that involve unknown decisions of other vehicles, such as lane changes or intersection access. Such a framework requires extensive data collection, and the authors describe the procedure used in collecting and analyzing vehicle driving data. For illustration, the proposed hybrid architecture and driver behavior estimation techniques are trained and tested near intersections with exemplary results provided. Comparison is made between the proposed framework, simple classifiers, and naturalistic driver estimation. Obtained results show promise for using the HSS–HMM framework. [ABSTRACT FROM PUBLISHER]
- Published
- 2014
- Full Text
- View/download PDF
17. Improving Automatic Detection of Obstructive Sleep Apnea Through Nonlinear Analysis of Sustained Speech.
- Author
-
Blanco, José Luis, Hernández, Luis A., Fernández, Rubén, and Ramos, Daniel
- Abstract
We present a novel approach for the detection of severe obstructive sleep apnea (OSA) based on patients’ voices introducing nonlinear measures to describe sustained speech dynamics. Nonlinear features were combined with state-of-the-art speech recognition systems using statistical modeling techniques (Gaussian mixture models, GMMs) over cepstral parameterization (MFCC) for both continuous and sustained speech. Tests were performed on a database including speech records from both severe OSA and control speakers. A 10 % relative reduction in classification error was obtained for sustained speech when combining MFCC-GMM and nonlinear features, and 33 % when fusing nonlinear features with both sustained and continuous MFCC-GMM. Accuracy reached 88.5 % allowing the system to be used in OSA early detection. Tests showed that nonlinear features and MFCCs are lightly correlated on sustained speech, but uncorrelated on continuous speech. Results also suggest the existence of nonlinear effects in OSA patients’ voices, which should be found in continuous speech. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
18. Rotor Bar Fault Monitoring Method Based on Analysis of Air-Gap Torques of Induction Motors.
- Author
-
da Silva, Aderiano M., Povinelli, and Demerdash
- Abstract
A robust method to monitor the operating conditions of induction motors is presented. This method utilizes the data analysis of the air-gap torque profile in conjunction with a Bayesian classifier to determine the operating condition of an induction motor as either healthy or faulty. This method is trained offline with datasets generated either from an induction motor modeled by a time-stepping finite-element (TSFE) method or experimental data. This method can effectively monitor the operating conditions of induction motors that are different in frame/class, ratings, or design from the motor used in the training stage. Such differences can include the level of load torque and operating frequency. This is due to a novel air-gap torque normalization method introduced here, which leads to a motor fault classification process independent of these parameters and with no need for prior information about the motor being monitored. The experimental results given in this paper validate the robustness and efficacy of this method. Additionally, this method relies exclusively on data analysis of motor terminal operating voltages and currents, without relying on complex motor modeling or internal performance parameters not readily available. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
19. Gaussian-Mixture-Model-Based Spatial Neighborhood Relationships for Pixel Labeling Problem.
- Author
-
Nguyen, Thanh Minh and Wu, Q. M. Jonathan
- Subjects
-
*GAUSSIAN processes, *PIXELS, *MARKOV random fields, *PARAMETER estimation, *IMAGE segmentation
- Abstract
In this paper, we present a new algorithm for pixel labeling and image segmentation based on the standard Gaussian mixture model (GMM). Unlike the standard GMM where pixels themselves are considered independent of each other and the spatial relationship between neighboring pixels is not taken into account, the proposed method incorporates this spatial relationship into the standard GMM. Moreover, the proposed model requires fewer parameters compared with the models based on Markov random fields. In order to estimate model parameters from observations, instead of utilizing an expectation–maximization algorithm, we employ gradient method to minimize a higher bound on the data negative log-likelihood. The performance of the proposed model is compared with methods based on both standard GMM and Markov random fields, demonstrating the robustness, accuracy, and effectiveness of our method. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
20. Gaussian Mixture Model-based Quantization of Line Spectral Frequencies for Adaptive Multirate Speech Codec.
- Author
-
Tadić, Tihomir and Petrinović, Davor
- Subjects
GAUSSIAN processes, GEOMETRIC quantization, MATHEMATICAL models, COMPUTER storage devices, CODING theory, ALGORITHMS
- Abstract
In this paper, we investigate the use of a Gaussian Mixture Model (GMM)-based quantizer for quantization of the Line Spectral Frequencies (LSFs) in the Adaptive Multi-Rate (AMR) speech codec. We estimate the parametric GMM model of the probability density function (pdf) for the prediction error (residual) of mean-removed LSF parameters that are used in the AMR codec for speech spectral envelope representation. The studied GMM-based quantizer is based on transform coding using Karhunen-Loève transform (KLT) and transform domain scalar quantizers (SQ) individually designed for each Gaussian mixture. We have investigated the applicability of such a quantization scheme in the existing AMR codec by solely replacing the AMR LSF quantization algorithm segment. The main novelty in this paper lies in applying and adapting the entropy constrained (EC) coding for fixed-rate scalar quantization of transformed residuals thereby allowing for better adaptation to the local statistics of the source. We study and evaluate the compression efficiency, computational complexity and memory requirements of the proposed algorithm. Experimental results show that the GMM-based EC quantizer provides better rate/distortion performance than the quantization schemes used in the referent AMR codec by saving up to 7.32 bits/frame at much lower rate-independent computational complexity and memory requirements. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
21. Automatic Detection of Pathological Voices Using Complexity Measures, Noise Parameters, and Mel-Cepstral Coefficients.
- Author
-
Arias-Londoño, Julián D., Godino-Llorente, Juan I., Sáenz-Lechón, Nicolás, Osma-Ruiz, Víctor, and Castellanos-Domínguez, Germán
- Subjects
-
*DATA mining, *STATISTICAL correlation, *SUPPORT vector machines, *GAUSSIAN processes, *MATHEMATICAL models, *NOISE control, *CEPSTRUM analysis (Mechanics), *POWER spectra
- Abstract
This paper proposes a new approach to improve the amount of information extracted from the speech aiming to increase the accuracy of a system developed for the automatic detection of pathological voices. The paper addresses the discrimination capabilities of 11 features extracted using nonlinear analysis of time series. Two of these features are based on conventional nonlinear statistics (largest Lyapunov exponent and correlation dimension), two are based on recurrence and fractal-scaling analysis, and the remaining are based on different estimations of the entropy. Moreover, this paper uses a strategy based on combining classifiers for fusing the nonlinear analysis with the information provided by classic parameterization approaches found in the literature (noise parameters and mel-frequency cepstral coefficients). The classification was carried out in two steps using, first, a generative and, later, a discriminative approach. Combining both classifiers, the best accuracy obtained is 98.23% ± 0.001. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
22. Voice Conversion Based on Weighted Frequency Warping.
- Author
-
Erro, Daniel, Moreno, Asunción, and Bonafonte, Antonio
- Subjects
HUMAN voice, VOICE analysis, TRANSMUTATION (Linguistics), SPEECH, SIGNAL processing
- Abstract
Any modification applied to speech signals has an impact on their perceptual quality. In particular, voice conversion to modify a source voice so that it is perceived as a specific target voice involves prosodic and spectral transformations that produce significant quality degradation. Choosing among the current voice conversion methods represents a trade-off between the similarity of the converted voice to the target voice and the quality of the resulting converted speech, both rated by listeners. This paper presents a new voice conversion method termed Weighted Frequency Warping that has a good balance between similarity and quality. This method uses a time-varying piecewise-linear frequency warping function and an energy correction filter, and it combines typical probabilistic techniques and frequency warping transformations. Compared to standard probabilistic systems, Weighted Frequency Warping results in a significant increase in quality scores, whereas the conversion scores remain almost unaltered. This paper carefully discusses the theoretical aspects of the method and the details of its implementation, and the results of an international evaluation of the new system are also included. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
23. On the Use of Anti-Word Models for Audio Music Annotation and Retrieval.
- Author
-
Zhi-Sheng Chen and Jyh-Shing Roger Jang
- Subjects
GAUSSIAN processes, DATABASES, SEMANTICS, SPEECH processing systems, COMPUTATIONAL linguistics
- Abstract
Query-by-semantic-description (QBSD) is a natural way for searching/annotating music in a large database. To improve QBSD, we propose the use of anti-words for each annotation word based on the concept of supervised multiclass labeling (SML). More specifically, words that are highly associated with the opposite semantic meaning of a word constitute its anti-word set. By modeling both a word and its anti-word set, our annotation system can achieve 31.1% of equal mean per-word precision and recall, while the original SML model achieves 27.8%. Moreover, by constructing the models of the anti-word explicitly, the performance is also significantly improved for the retrieval system, especially when the query keyword is the antonym of an existing annotation word. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
24. Bayesian Data Fusion of Multiview Synthetic Aperture Sonar Imagery for Seabed Classification.
- Author
-
Williams, David P.
- Subjects
-
*BAYESIAN analysis, *PROBABILITY theory, *SYNTHETIC apertures, *IMAGING systems, *SONAR, *GAUSSIAN processes, *DISTRIBUTION (Probability theory), *OCEAN bottom
- Abstract
A Bayesian data fusion approach for seabed classification using multiview synthetic aperture sonar (SAS) imagery is proposed. The principled approach exploits all available information and results in probabilistic predictions. Each data point, corresponding to a unique 10 m × 10 m area of seabed, is represented by a vector of wavelet-based features. For each seabed type, the distribution of these features is then modeled by a unique Gaussian mixture model. When multiple views of the same data point (i.e., area of seabed) are available, the views are combined via a joint likelihood calculation. The end result of this Bayesian formulation is the posterior probability that a given data point belongs to each seabed type. It is also shown how these posterior probabilities can be exploited in a form of entropy-based active learning to determine the most useful additional data to acquire. Experimental results of the proposed multiview classification framework are shown on a large data set of real, multiview SAS imagery spanning more than 2 km² of seabed. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
25. Gaussian Mixture Kalman Predictive Coding of Line Spectral Frequencies.
- Author
-
Subasingha, Shaminda, Murthi, Manohar N., and Andersen, Søren Vang
- Subjects
STOCHASTIC processes, GEOMETRIC quantization, ESTIMATION theory, STOCHASTIC convergence, SPEECH perception, SPEECH
- Abstract
Gaussian mixture model (GMM)-based predictive coding of line spectral frequencies (LSFs) has gained wide acceptance. In such coders, each mixture of a GMM can be interpreted as defining a linear predictive transform coder. In this paper, we use Kalman filtering principles to model each of these linear predictive transform coders to present GMM Kalman predictive coding. In particular, we show how suitable modeling of quantization noise leads to an adaptive a posteriori GMM that defines a signal-adaptive predictive coder that provides improved coding of LSFs in comparison with the baseline recursive GMM predictive coder. Moreover, we show how running the GMM Kalman predictive coders to convergence can be used to design a stationary GMM Kalman predictive coding system which again provides improved coding of LSFs but now with only a modest increase in run-time complexity over the baseline. In packet loss conditions, this stationary GMM Kalman predictive coder provides much better performance than the recursive GMM predictive coder, and in fact has comparable mean performance to a memoryless GMM coder. Finally, we illustrate how one can utilize Kalman filtering principles to design a postfilter which enhances decoded vectors from a recursive GMM predictive coder without any modifications to the encoding process. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
26. Automatic Classification of Bird Species From Their Sounds Using Two-Dimensional Cepstral Coefficients.
- Author
-
Chang-Hsing Lee, Chin-Chuan Han, and Ching-Chien Chuang
- Subjects
BIRDS ,SPECIES ,SOUND recordings ,ALGORITHMS ,SOUNDS ,EXPERIMENTS - Abstract
This paper presents a method for automatic classification of birds into different species based on the audio recordings of their sounds. Each individual syllable segmented from continuous recordings is regarded as the basic recognition unit. To represent the temporal variations as well as sharp transitions within a syllable, a feature set derived from static and dynamic two-dimensional Mel-frequency cepstral coefficients is calculated for the classification of each syllable. Since a bird might generate several types of sounds with variant characteristics, a number of representative prototype vectors are used to model different syllables of identical bird species. For each bird species, a model selection method is developed to determine the optimal mode between Gaussian mixture models (GMM) and vector quantization (VQ) when the amount of training data is different for each species. In addition, a component number selection algorithm is employed to find the most appropriate number of components of GMM or the cluster number of VQ for each species. The mean vectors of GMM or the cluster centroids of VQ will form the prototype vectors of a certain bird species. In the experiments, the best classification accuracy is 84.06% for the classification of 28 bird species. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
27. Gaussian Mixture Modeling by Exploiting the Mahalanobis Distance.
- Author
-
Ververidis, Dimitrios and Kotropoulos, Constantine
- Subjects
- *
EXPECTATION-maximization algorithms , *RANDOM noise theory , *STATISTICS , *ALGORITHMS , *ALGEBRA - Abstract
In this paper, the expectation-maximization (EM) algorithm for Gaussian mixture modeling is improved via three statistical tests. The first test is a multivariate normality criterion based on the Mahalanobis distance of a sample measurement vector from a certain Gaussian component center. The first test is used in order to derive a decision whether to split a component into another two or not. The second test is a central tendency criterion based on the observation that multivariate kurtosis becomes large if the component to be split is a mixture of two or more underlying Gaussian sources with common centers. If the common center hypothesis is true, the component is split into two new components and their centers are initialized by the center of the (old) component candidate for splitting. Otherwise, the splitting is accomplished by a discriminant derived by the third test. This test is based on marginal cumulative distribution functions. Experimental results are presented against seven other EM variants both on artificially generated data-sets and real ones. The experimental results demonstrate that the proposed EM variant has an increased capability to find the underlying model, while maintaining a low execution time. [ABSTRACT FROM PUBLISHER]
- Published
- 2008
- Full Text
- View/download PDF
28. Object Trajectory-Based Activity Classification and Recognition Using Hidden Markov Models.
- Author
-
Bashir, Faisal I., Khokhar, Ashfaq A., and Schonfeld, Dan
- Subjects
- *
IMAGING systems , *MARKOV processes , *INFORMATION processing , *IMAGE processing , *ALGORITHMS , *OPTICAL detectors - Abstract
Motion trajectories provide rich spatiotemporal information about an object's activity. This paper presents novel classification algorithms for recognizing object activity using object motion trajectory. In the proposed classification system, trajectories are segmented at points of change in curvature, and the subtrajectories are represented by their principal component analysis (PCA) coefficients. We first present a framework to robustly estimate the multivariate probability density function based on PCA coefficients of the subtrajectories using Gaussian mixture models (GMMs). We show that GMM-based modeling alone cannot capture the temporal relations and ordering between underlying entities. To address this issue, we use hidden Markov models (HMMs) with a data-driven design in terms of number of states and topology (e.g., left-right versus ergodic). Experiments using a database of over 5700 complex trajectories (obtained from UCI-KDD data archives and Columbia University Multimedia Group) subdivided into 85 different classes demonstrate the superiority of our proposed HMM-based scheme using PCA coefficients of subtrajectories in comparison with other techniques in the literature. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
29. Cepstral Vector Normalization Based on Stereo Data for Robust Speech Recognition.
- Author
-
Buera, Luis, Lleida, Eduardo, Miguel, Antonio, Ortega, Alfonso, and Saz, Óscar
- Subjects
SPEECH perception ,MATHEMATICAL transformations ,NOISE ,PHONEME (Linguistics) ,ACOUSTIC models ,SPEECH - Abstract
In this paper, a set of feature vector normalization methods based on the minimum mean square error (MMSE) criterion and stereo data is presented. They include multi-environment model-based linear normalization (MEMLIN), polynomial MEMLIN (P-MEMLIN), multi-environment model-based histogram normalization (MEMHIN), and phoneme-dependent MEMLIN (PD-MEMLIN). Those methods model clean and noisy feature vector spaces using Gaussian mixture models (GMMs). The objective of the methods is to learn a transformation between clean and noisy feature vectors associated with each pair of clean and noisy model Gaussians. The direct approach to learn the transformation is by using stereo data; that is, noisy feature vectors and the corresponding clean feature vectors. In this paper, however, a nonstereo data based training procedure is presented. The transformations can be modeled just like a bias vector (MEMLIN), or by using a first-order polynomial (P-MEMLIN) or a nonlinear function based on histogram equalization (MEMHIN). Further improvements are obtained by using phoneme-dependent bias vector transformation (PD-MEMLIN). In PD-MEMLIN, the clean and noisy feature vector spaces are split into several phonemes, and each of them is modeled as a GMM. Those methods achieve significant word error rate improvements over others that are based on similar targets. The experimental results using the SpeechDat Car database show an average improvement in word error rate greater than 68% in all cases compared to the baseline when using the original clean acoustic models, and up to 83% when training acoustic models on the new normalized feature space. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
30. Significance of the Modified Group Delay Feature in Speech Recognition.
- Author
-
Hegde, Rajesh M., Murthy, Hema A., and Rao Gadde, Venkata Ramana
- Subjects
SPEECH perception ,FOURIER transforms ,SPECTRUM analysis ,RESONANCE ,SPEECH processing systems - Abstract
Spectral representation of speech is complete when both the Fourier transform magnitude and phase spectra are specified. In conventional speech recognition systems, features are generally derived from the short-time magnitude spectrum. Although the importance of Fourier transform phase in speech perception has been realized, few attempts have been made to extract features from it. This is primarily because the resonances of the speech signal which manifest as transitions in the phase spectrum are completely masked by the wrapping of the phase spectrum. Hence, an alternative to processing the Fourier transform phase, for extracting speech features, is to process the group delay function which can be directly computed from the speech signal. The group delay function has been used in earlier efforts, to extract pitch and formant information from the speech signal. In all these efforts, no attempt was made to extract features from the speech signal and use them for speech recognition applications. This is primarily because the group delay function fails to capture the short-time spectral structure of speech owing to zeros that are close to the unit circle in the z-plane and also due to pitch periodicity effects. In this paper, the group delay function is modified to overcome these effects. Cepstral features are extracted from the modified group delay function and are called the modified group delay feature (MODGDF). The MODGDF is used for three speech recognition tasks namely, speaker, language, and continuous-speech recognition. Based on the results of feature and performance evaluation, the significance of the MODGDF as a new feature for speech recognition is discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
31. Support Vector Machines Using GMM Supervectors for Speaker Verification.
- Author
-
Campbell, W. M., Sturim, D. E., and Reynolds, D. A.
- Subjects
GAUSSIAN distribution ,SPEECH ,LECTURERS ,DISTRIBUTION (Probability theory) ,LANGUAGE & languages - Abstract
Gaussian mixture models (GMMs) have proven extremely successful for text-independent speaker recognition. The standard training method for GMM models is to use MAP adaptation of the means of the mixture components based on speech from a target speaker. Recent methods in compensation for speaker and channel variability have proposed the idea of stacking the means of the GMM model to form a GMM mean supervector. We examine the idea of using the GMM supervector in a support vector machine (SVM) classifier. We propose two new SVM kernels based on distance metrics between GMM models. We show that these SVM kernels produce excellent classification accuracy in a NIST speaker recognition evaluation task. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
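The supervector idea in the speaker-verification abstract above can be illustrated with a short sketch. It is not the paper's implementation: it assumes all speaker models are mean-adapted from one common background GMM, so they share mixture weights and diagonal covariances, and it shows one plausible kernel of the form K(a, b) = Σ_i w_i · m_i^a · Σ_i^{-1} · m_i^b, which is an inner product between scaled mean supervectors. The function names and toy numbers are hypothetical.

```python
def mean_supervector(means):
    """Stack the component mean vectors of a GMM into one flat vector."""
    return [value for mean in means for value in mean]


def supervector_linear_kernel(means_a, means_b, weights, variances):
    """Weighted, variance-normalized inner product between two sets of means.

    means_a, means_b: per-component mean vectors of two adapted GMMs.
    weights: shared mixture weights (assumed identical for both models).
    variances: shared diagonal covariances, one vector per component
        (assumed identical for both models).
    """
    total = 0.0
    for mean_a, mean_b, weight, variance in zip(means_a, means_b, weights, variances):
        total += weight * sum(
            a * b / v for a, b, v in zip(mean_a, mean_b, variance)
        )
    return total


# Hypothetical two-component, two-dimensional models.
means_a = [[1.0, 2.0], [0.0, 1.0]]
means_b = [[2.0, 0.0], [1.0, 1.0]]
weights = [0.5, 0.5]
variances = [[1.0, 4.0], [1.0, 1.0]]

supervector_a = mean_supervector(means_a)
kernel_ab = supervector_linear_kernel(means_a, means_b, weights, variances)
kernel_ba = supervector_linear_kernel(means_b, means_a, weights, variances)
```

Because the kernel is linear in the scaled supervectors, it could be precomputed per utterance and passed to any SVM implementation that accepts a custom or precomputed kernel.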
32. User Authentication via Adapted Statistical Models of Face Images.
- Author
-
Cardinaux, Fabien, Sanderson, Conrad, and Bengio, Samy
- Subjects
- *
FACE perception , *MARKOV processes , *AUTHENTICATION (Law) , *DATABASES , *LIGHTING , *VISUAL perception - Abstract
It has been previously demonstrated that systems based on local features and relatively complex statistical models, namely, one-dimensional (1-D) hidden Markov models (HMMs) and pseudo-two-dimensional (2-D) HMMs, are suitable for face recognition. Recently, a simpler statistical model, namely, the Gaussian mixture model (GMM), was also shown to perform well. In much of the literature devoted to these models, the experiments were performed with controlled images (manual face localization, controlled lighting, background, pose, etc). However, a practical recognition system has to be robust to more challenging conditions. In this article we evaluate, on the relatively difficult BANCA database, the performance, robustness and complexity of GMM and HMM-based approaches, using both manual and automatic face localization. We extend the GMM approach through the use of local features with embedded positional information, increasing performance without sacrificing its low complexity. Furthermore, we show that the traditionally used maximum likelihood (ML) training approach has problems estimating robust model parameters when there are only a few training images available. Considerably more precise models can be obtained through the use of Maximum a posteriori probability (MAP) training. We also show that face recognition techniques which obtain good performance on manually located faces do not necessarily obtain good performance on automatically located faces, indicating that recognition techniques must be designed from the ground up to handle imperfect localization. Finally, we show that while the pseudo-2-D HMM approach has the best overall performance, authentication time on current hardware makes it impractical. The best tradeoff in terms of authentication time, robustness and discrimination performance is achieved by the extended GMM approach. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
33. Multiple Description Coding Based on Gaussian Mixture Models.
- Author
-
Samuelsson, Jonas and Plasberg, Jan H.
- Subjects
DATA compression ,SPECTRAL analysis (Phonetics) ,GEOMETRIC quantization ,MATHEMATICAL optimization ,ALGORITHMS - Abstract
An algorithm for multiple description coding (MDC) based on Gaussian mixture models (GMMs) is presented. Based on the parameters of the GMM, the algorithm combines MDC scalar quantizers, yielding a source-optimized vector MDC system. The performance is evaluated on a speech spectrum source in terms of mean-squared error and log spectral distortion. It is demonstrated experimentally that the proposed system outperforms single description coding and repetition coding over a wide range of channel failure probabilities. The proposed algorithm has a complexity that is linear in rate and dimension while retaining a near optimal vector quantizer point density. [ABSTRACT FROM AUTHOR]
- Published
- 2005
- Full Text
- View/download PDF
34. Comparison of approximation methods to Kullback–Leibler divergence between Gaussian mixture models for satellite image retrieval
- Author
-
Shiyong Cui and Foody, Giles
- Subjects
Photogrammetrie und Bildanalyse ,Mathematical optimization ,Kullback–Leibler divergence ,0211 other engineering and technologies ,02 engineering and technology ,Mixture model ,Inequalities in information theory ,Statistics::Computation ,Differential entropy ,Kullback-Leibler Divergence ,0202 electrical engineering, electronic engineering, information engineering ,Earth and Planetary Sciences (miscellaneous) ,020201 artificial intelligence & image processing ,Jensen–Shannon divergence ,Total correlation ,Gaussian Mixture Models (GMMs) ,Electrical and Electronic Engineering ,Image retrieval ,Divergence (statistics) ,Algorithm ,021101 geological & geomatics engineering ,Mathematics - Abstract
As a probabilistic distance between two probability density functions, Kullback–Leibler divergence is widely used in many applications, such as image retrieval and change detection. Unfortunately, for some models, e.g., Gaussian Mixture Models (GMMs), Kullback–Leibler divergence is not analytically tractable. One has to resort to approximation methods. A number of methods have been proposed to address this issue. In this article, we compare seven methods, namely Monte Carlo method, matched bound approximation, product of Gaussians, variational method, unscented transformation, Gaussian approximation and min-Gaussian approximation, for approximating the Kullback–Leibler divergence between two Gaussian mixture models for satellite image retrieval. Two experiments using two public data sets have been performed. The comparison is carried out in terms of retrieval accuracy and computational time.
- Published
- 2016
- Full Text
- View/download PDF
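Of the seven approximations named in the Kullback–Leibler abstract above, the Monte Carlo method is the simplest to sketch: draw samples from the first mixture and average the log-density ratio. The code below is a minimal one-dimensional illustration under stated assumptions (univariate components, a fixed random seed, and mixtures given as hypothetical `(weights, means, stds)` tuples); it is not the authors' experimental code.

```python
import math
import random


def gmm_pdf(x, weights, means, stds):
    """Density of a one-dimensional Gaussian mixture at x."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )


def gmm_sample(rng, weights, means, stds):
    """Draw one sample: pick a component by weight, then sample from it."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], stds[k])


def kl_monte_carlo(f, g, n_samples=10000, seed=0):
    """Estimate KL(f || g) as the mean of log f(x) - log g(x) with x ~ f.

    f, g: tuples (weights, means, stds) describing 1-D mixtures.
    The estimate is unbiased but noisy; its error shrinks roughly as
    1 / sqrt(n_samples).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = gmm_sample(rng, *f)
        total += math.log(gmm_pdf(x, *f)) - math.log(gmm_pdf(x, *g))
    return total / n_samples


# Hypothetical mixtures used only for illustration.
mixture_f = ([0.6, 0.4], [0.0, 3.0], [1.0, 0.5])
mixture_g = ([0.5, 0.5], [0.5, 2.5], [1.0, 1.0])
estimate = kl_monte_carlo(mixture_f, mixture_g)
```

The trade-off the abstract alludes to shows up directly here: accuracy improves with `n_samples`, but so does computation time, which is why closed-form approximations are attractive for retrieval over large image collections.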
35. Human Action Recognition from Body-Part Directional Velocity using Hidden Markov Models
- Author
-
Anthony Fleury, Sébastien Ambellouis, Sid Ahmed Walid Talha, Centre for Digital Systems (CERI SN), Ecole nationale supérieure Mines-Télécom Lille Douai (IMT Lille Douai), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT), Institut Mines-Télécom [Paris] (IMT), Université de Lille, Laboratoire Électronique Ondes et Signaux pour les Transports (IFSTTAR/COSYS/LEOST), Institut Français des Sciences et Technologies des Transports, de l'Aménagement et des Réseaux (IFSTTAR)-PRES Université Lille Nord de France, Centre for Digital Systems (CERI SN - IMT Nord Europe), and Ecole nationale supérieure Mines-Télécom Lille Douai (IMT Nord Europe)
- Subjects
Sequence ,Computer science ,business.industry ,Carry (arithmetic) ,RGB-D Sensor ,Hidden Markov models (HMMs) ,Feature extraction ,020206 networking & telecommunications ,Pattern recognition ,02 engineering and technology ,Skeleton (category theory) ,Human Action Recognition ,16. Peace & justice ,Mixture model ,0202 electrical engineering, electronic engineering, information engineering ,Action recognition ,020201 artificial intelligence & image processing ,Artificial intelligence ,Hidden Markov model ,business ,Gaussian mixture models (GMMs) ,[SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing - Abstract
This paper introduces a novel approach for early recognition of human actions using 3D skeleton joints extracted from 3D depth data. We propose a novel, frame-by-frame and real-time descriptor called Body-part Directional Velocity (BDV) calculated by considering the algebraic velocity produced by different body-parts. A real-time Hidden Markov Models algorithm with Gaussian Mixture Models state-output distributions is used to carry out the classification. We show that our method outperforms various state-of-the-art skeleton-based human action recognition approaches on MSRAction3D and Florence3D datasets. We also proved the suitability of our approach for early human action recognition by deducing the decision from a partial analysis of the sequence.
- Published
- 2017
- Full Text
- View/download PDF
36. Graphical Password-Based User Authentication with Free-Form Doodles
- Author
-
Marcos Martinez-Diaz, Javier Galbally, Julian Fierrez, UAM. Departamento de Tecnología Electrónica y de las Comunicaciones, and Análisis y Tratamiento de Voz y Señales Biométricas (ING EPS-002)
- Subjects
Dynamic time warping ,Mobile security ,Computer Networks and Communications ,Computer science ,Speech recognition ,Feature extraction ,0211 other engineering and technologies ,Human Factors and Ergonomics ,02 engineering and technology ,law.invention ,Gesture recognition ,Graphical passwords ,Touchscreen ,Artificial Intelligence ,law ,0202 electrical engineering, electronic engineering, information engineering ,Pattern matching ,Gaussian mixture models (GMMs) ,Selection algorithm ,Password ,021110 strategic, defence & security studies ,Authentication ,Telecomunicaciones ,business.industry ,Pattern recognition ,Mixture model ,Dynamic time warping (DTW) ,Computer Science Applications ,Human-Computer Interaction ,Control and Systems Engineering ,Signal Processing ,020201 artificial intelligence & image processing ,Artificial intelligence ,business - Abstract
M. Martinez-Diaz, J. Fierrez and J. Galbally, "Graphical Password-Based User Authentication With Free-Form Doodles," in IEEE Transactions on Human-Machine Systems, vol. 46, no. 4, pp. 607-614, Aug. 2016. doi: 10.1109/THMS.2015.2504101. User authentication using simple gestures is now common in portable devices. In this work, authentication with free-form sketches is studied. Verification systems using dynamic time warping and Gaussian mixture models are proposed, based on dynamic signature verification approaches. The most discriminant features are studied using the sequential forward floating selection algorithm. The effects of the time lapse between capture sessions and the impact of the training set size are also studied. Development and validation experiments are performed using the DooDB database, which contains passwords from 100 users captured on a smartphone touchscreen. Equal error rates between 3% and 8% are obtained against random forgeries and between 21% and 22% against skilled forgeries. High variability between capture sessions increases the error rates. This work was supported by projects Contexts (S2009/TIC-1485) from CAM, Bio-Shield (TEC2012-34881) from Spanish MINECO, and BEAT (FP7-SEC-284989) from EU.
- Published
- 2016
37. The ICSI RT-09 Speaker Diarization System
- Author
-
Oriol Vinyals, Gerald Friedland, Adam Janin, Luke Gottlieb, Marijn Huijbregts, Mary Tai Knox, David Imseng, and Xavier Anguera Miro
- Subjects
Voice activity detection ,Acoustics and Ultrasonics ,Computer science ,Speech recognition ,020206 networking & telecommunications ,02 engineering and technology ,BATS (Topic and Speaker Tracking Broadcast Archives) ,Speech processing ,Data modeling ,Speaker diarisation ,machine learning ,0202 electrical engineering, electronic engineering, information engineering ,NIST ,speaker diarization ,020201 artificial intelligence & image processing ,Mel-frequency cepstrum ,Language & Speech Technology ,Electrical and Electronic Engineering ,Transcription (software) ,Gaussian mixture models (GMMs) ,Hidden Markov model - Abstract
The speaker diarization system developed at the International Computer Science Institute (ICSI) has played a prominent role in the speaker diarization community, and many researchers in the rich transcription community have adopted methods and techniques developed for the ICSI speaker diarization engine. Although there have been many related publications over the years, previous articles only presented changes and improvements rather than a description of the full system. Attempting to replicate the ICSI speaker diarization system as a complete entity would require an extensive literature review, and might ultimately fail due to component description version mismatches. This paper therefore presents the first full conceptual description of the ICSI speaker diarization system as presented to the National Institute of Standards and Technology Rich Transcription 2009 (NIST RT-09) evaluation, which consists of online and offline subsystems, multi-stream and single-stream implementations, and audio and audio-visual approaches. Some of the components, such as the online system, have not been previously described. The paper also includes all necessary preprocessing steps, such as Wiener filtering, speech activity detection and beamforming.
- Published
- 2012
- Full Text
- View/download PDF
38. Lip Animation Synthesis: a Unified Framework for Speaking and Laughing Virtual Agent
- Author
-
Ding, Yu, Pelachaud, Catherine, Multimédia (MM), Laboratoire Traitement et Communication de l'Information (LTCI), Institut Mines-Télécom [Paris] (IMT)-Télécom Paris-Institut Mines-Télécom [Paris] (IMT)-Télécom Paris, Département Traitement du Signal et des Images (TSI), Télécom ParisTech-Centre National de la Recherche Scientifique (CNRS), and HAL, TelecomParis
- Subjects
[INFO.INFO-MM] Computer Science [cs]/Multimedia [cs.MM] ,speech ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,lip animation ,hidden Markov models (HMMs) ,[INFO.INFO-MM]Computer Science [cs]/Multimedia [cs.MM] ,[INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG] ,GeneralLiterature_MISCELLANEOUS ,interactive virtual agent ,speech to animation ,ComputingMethodologies_PATTERNRECOGNITION ,[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG] ,[MATH.MATH-ST]Mathematics [math]/Statistics [math.ST] ,laughter ,[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] ,[INFO.INFO-HC] Computer Science [cs]/Human-Computer Interaction [cs.HC] ,Gaussian mixture models (GMMs) ,[MATH.MATH-ST] Mathematics [math]/Statistics [math.ST] ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
This paper proposes a unified statistical framework to synthesize speaking and laughing lip animations for virtual agents in real time. Our lip animation synthesis model takes as input the decomposition of a spoken text into phonemes as well as their duration. Our model can be used with synthesized speech. First, Gaussian mixture models (GMMs), called lip shape GMMs, are used to model the relationship between phoneme duration and lip shape from human motion capture data; then an interpolation function is learnt from human motion capture data, which is based on hidden Markov models (HMMs), called HMMs interpolation. In the synthesis step, lip shape GMMs are used to infer a first lip shape stream from the inputs; then this lip shape stream is smoothed by the learnt HMMs interpolation, to obtain the synthesized lip animation. The effectiveness of the proposed framework is confirmed in the objective evaluation.
- Published
- 2015
39. Real Time Context-Independent Phone Recognition Using a Simplified Statistical Training Algorithm
- Author
-
Lachhab, Othman, Di Martino, Joseph, Ibn Elhaj, El Hassan, Hammouch, Ahmed, Analysis, perception and recognition of speech (PAROLE), INRIA Lorraine, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS), Ecole Normale Supérieure de l'Enseignement Technique [Rabat] (ENSET), Université Mohammed V, Institut National des Postes et Télécommunications [Rabat] (INPT), and Université Mohammed V de Rabat [Agdal] (UM5)
- Subjects
Continuous Speech Recognition ,[INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing ,Viterbi ,Continuous Density Hidden Markov Models (CDHMMs) ,Gaussian Mixture Models (GMMs) ,Simplified Statistical Training Algorithm ,[SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing ,Real Time Automatic Speech Recognition (ASR) System - Abstract
In this paper we present our own real time speaker-independent continuous phone recognition system (Spirit) using Context-Independent Continuous Density HMMs (CI-CDHMMs) modeled by Gaussian Mixture Models (GMMs). All the parameters of our system are estimated directly from data by using an improved Viterbi alignment process instead of the classical Baum-Welch estimation procedure. Generally, in the literature the Viterbi training algorithm is used as a pretreatment to initialize HMM models that will most often be re-estimated by using complex re-estimation formulas. In order to evaluate and compare the performance of our system with other previous works, we use the TIMIT database. The test duration of our recognition system for each sentence ranges from 2 seconds (for short sentences) to 12 seconds (for long sentences). We get, by combining the 64 possible phones into 39 phonetic classes, a phone recognition correct rate of 71.06% and an accuracy rate of 65.25%. These results compare favorably with previously published works.
- Published
- 2012
40. Analyzing training dependencies and posterior fusion in discriminant classification of apnea patients based on sustained and connected speech
- Author
-
Blanco, José Luis, Fernández Pozo, Rubén, Toledano, Doroteo T., Caminero, F. Javier, López Gonzalo, Eduardo, UAM. Departamento de Tecnología Electrónica y de las Comunicaciones, and Análisis y Tratamiento de Voz y Señales Biométricas (ING EPS-002)
- Subjects
background model (BM) ,Telecomunicaciones ,classifier fusion ,Robótica e Informática Industrial ,gaussian mixture models (GMMs) ,obstructive sleep apnea (OSA) - Abstract
Proceedings of Interspeech 2011, Florence (Italy). We present a novel approach using both sustained vowels and connected speech, to detect obstructive sleep apnea (OSA) cases within a homogeneous group of speakers. The proposed scheme is based on state-of-the-art GMM-based classifiers, and acknowledges specifically the way in which acoustic models are trained on standard databases, as well as the complexity of the resulting models and their adaptation to specific data. Our experimental database contains a suitable number of utterances and sustained speech from healthy (i.e., control) and OSA Spanish speakers. Finally, a 25.1% relative reduction in classification error is achieved when fusing continuous and sustained speech classifiers. The activities described in this paper were funded by the Spanish Ministry of Science and Innovation as part of the TEC2009-14719-C02-02 (PriorSpeech) project.
- Published
- 2011
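41. Kvantizacija frekvencija spektralnih linija u adaptivnom koderu govornog signala s više brzina prijenosa primjenom modela s Gaussovim mješavinama [Quantization of line spectral frequencies in an adaptive multi-rate speech coder using Gaussian mixture models]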
- Author
-
Tadić, Tihomir and Petrinović, Davor
- Subjects
Karhunen-Loève transformacija (KLT) ,TECHNICAL SCIENCES. Computing. Data Processing ,Computer science and technology. Computing. Data processing ,speech coding ,adaptivni koder govornog signala s više brzina prijenosa (AMR) ,model s Gaussovim mješavinama ,Karhunen-Loeve transformacija (KLT) ,frekvencije spektralnih linija (LSF) ,kodiranje govornog signala ,transformacijsko kodiranje ,vektorska kvantizacija (VQ) ,skalarna kvantizacija s ograničenom entropijom (ECSQ) ,vector quantization (VQ) ,TEHNIČKE ZNANOSTI. Računarstvo. Obradba informacija ,udc:004(043.2) ,entropy constrained scalar quantizer (ECSQ) ,line spectral frequency (LSF) ,Adaptive Multi-Rate (AMR) ,Gaussian mixture models (GMMs) ,Računalna znanost i tehnologija. Računalstvo. Obrada podataka ,transform coding ,Karhunen-Loève transform (KLT) - Abstract
In this thesis, the use of a Gaussian Mixture Model (GMM) based quantizer for quantization of Line Spectral Frequencies (LSFs) in the Adaptive Multi-Rate (AMR) speech codec is investigated. A parametric GMM model is estimated, modeling the probability density function (pdf) of the prediction error (residual) of mean-removed LSF parameters that are used in the AMR codec for speech spectral envelope representation. The studied GMM vector quantizer is based on transform coding using Karhunen-Loève transform (KLT) and transform domain scalar quantizers (SQ), individually designed for each Gaussian mixture. The applicability of such a quantization scheme in the existing AMR codec has been investigated by solely replacing the AMR LSF quantization algorithm segment. The main novelty in this thesis lies in applying and adapting the entropy constrained (EC) coding for fixed-rate scalar quantization of transformed residuals, thereby allowing for better adaptation to the local statistics of the source. The compression efficiency, computational complexity and memory requirements of the proposed algorithm are studied and evaluated. Experimental results show that the GMM-based EC quantizer provides better rate/distortion performance than the quantization schemes used in the reference AMR codec by saving up to 7.33 bits/frame at much lower rate-independent computational complexity and memory requirements.
- Published
- 2010
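As a rough illustration of the transform-coding step described in the abstract above (a KLT followed by scalar quantization of the transformed residual), the following Python sketch handles a single Gaussian component. The function name `klt_quantize`, the uniform step size, and the toy covariance are assumptions made for illustration; the thesis designs its scalar quantizers per component under an entropy constraint, which this sketch does not attempt to reproduce.

```python
import numpy as np

def klt_quantize(x, mean, cov, step):
    """Transform-code one vector with a KLT and a uniform scalar quantizer."""
    # KLT basis: orthonormal eigenvectors of the component covariance.
    _, basis = np.linalg.eigh(cov)
    # Remove the component mean and decorrelate the residual.
    coeffs = basis.T @ (x - mean)
    # Uniform scalar quantization of each transform coefficient (assumed design).
    indices = np.round(coeffs / step).astype(int)
    # Dequantize and apply the inverse KLT to reconstruct the vector.
    x_hat = mean + basis @ (indices * step)
    return indices, x_hat

# Hypothetical 3-dimensional example with a random positive-definite covariance.
rng = np.random.default_rng(0)
mean = np.array([1.0, -2.0, 0.5])
a = rng.normal(size=(3, 3))
cov = a @ a.T + 0.1 * np.eye(3)
x = rng.multivariate_normal(mean, cov)
indices, x_hat = klt_quantize(x, mean, cov, step=0.05)
```

Because the KLT basis is orthonormal, the reconstruction error in the original domain has the same norm as the quantization error in the transform domain, so it is bounded by `step / 2` per coefficient.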
42. Driver Recognition Using Gaussian Mixture Models and Decision Fusion Techniques
- Author
-
M. Taner Eskil, Kristin S. Benli, Remzi Duzagac, Işık Üniversitesi, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü, Işık University, Faculty of Engineering, Department of Computer Engineering, Benli, Kristin Surpuhi, Düzağaç, Remzi, and Eskil, Mustafa Taner
- Subjects
Biometrics ,Computer science ,Speech recognition ,Posterior probability ,Classifier fusions ,Posterior probabilities ,Vehicle ,Blind signal separation ,Error Rate (ER) ,Gaussian Mixture Model ,Communication channels (information theory) ,Vehicle speeds ,Gaussian mixture models (GMMs) ,Image segmentation ,Classifiers ,Learning systems ,Accelerator pedals ,business.industry ,Cognitive neuroscience of visual object recognition ,Pattern recognition ,Object recognition ,Steering wheel ,Mixture model ,Automobile drivers ,Recognition ,Fusion methods ,ComputingMethodologies_PATTERNRECOGNITION ,Decision Fusion ,Mixtures ,Driving behaviors ,Blind source separation ,Trellis codes ,Artificial intelligence ,business ,Classifier (UML) - Abstract
In this paper we present our research in driver recognition. The goal of this study is to investigate the performance of different classifier fusion techniques in a driver recognition scenario. We use solely driving behavior signals, such as brake and accelerator pedal pressure, engine RPM, vehicle speed, and steering wheel angle, to identify drivers. We modeled each driver using Gaussian Mixture Models, obtained posterior probabilities of identities, and combined these scores using different fixed and trainable (adaptive) fusion methods. We observed error rates as low as 0.35% in recognition of 100 drivers using trainable combiners. We conclude that the fusion of multi-modal classifier results is very successful in biometric recognition of a person in a car setting. Publisher's Version Q4 WOS:000264556900088
- Published
- 2008
- Full Text
- View/download PDF
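The pipeline outlined in the abstract above (one GMM per driver and signal, posterior probabilities of identity, then score fusion) might look roughly like the following Python sketch. It shows only two common fixed combiners, the sum and product rules; the trainable combiners reported in the paper are not reproduced. All model parameters, signal names, and function names here are hypothetical, and equal prior probabilities over drivers are assumed.

```python
import numpy as np

def diag_gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    # Per-component log N(x | mean_k, diag(var_k)), summed over feature dimensions.
    log_norm = -0.5 * np.sum(
        np.log(2 * np.pi * variances) + (x - means) ** 2 / variances, axis=1
    )
    # Log of the weighted sum over components, computed stably.
    return np.logaddexp.reduce(np.log(weights) + log_norm)

def posteriors(logliks):
    """Posterior over drivers from per-driver log-likelihoods (equal priors assumed)."""
    p = np.exp(logliks - np.max(logliks))
    return p / p.sum()

def fuse(posterior_list, rule="sum"):
    """Fixed fusion of per-signal posteriors: 'sum' (mean) or 'product' rule."""
    stacked = np.vstack(posterior_list)
    combined = stacked.mean(axis=0) if rule == "sum" else stacked.prod(axis=0)
    return combined / combined.sum()

# Hypothetical toy setup: 2 drivers, 2 signals, one 1-D feature per signal.
# Each model is (weights, means, variances) of a diagonal-covariance GMM.
models = {
    "pedal": [
        (np.array([1.0]), np.array([[0.0]]), np.array([[1.0]])),  # driver 0
        (np.array([1.0]), np.array([[3.0]]), np.array([[1.0]])),  # driver 1
    ],
    "steering": [
        (np.array([0.5, 0.5]), np.array([[-1.0], [1.0]]), np.array([[1.0], [1.0]])),
        (np.array([1.0]), np.array([[4.0]]), np.array([[1.0]])),
    ],
}
observations = {"pedal": np.array([0.2]), "steering": np.array([0.5])}

per_signal = [
    posteriors(np.array([diag_gmm_loglik(observations[s], *gmm) for gmm in models[s]]))
    for s in models
]
fused = fuse(per_signal, rule="sum")
recognized_driver = int(np.argmax(fused))
```

In this toy case both signals lie close to the driver 0 models, so either fixed rule selects driver 0.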
43. Significance of the modified group delay feature in speech recognition
- Author
-
Venkata Ramana Rao Gadde, Rajesh M. Hegde, and Hema A. Murthy
- Subjects
Speech perception ,Acoustics and Ultrasonics ,Speech recognition ,Spurious signal noise ,Communication channels (information theory) ,Wavelet transforms ,Continuous speech recognition ,Computational grammars ,Cepstrum ,Hidden Markov models ,Electrical and Electronic Engineering ,Gaussian mixture models (GMMs) ,Robustness ,Mathematics ,Group delay and phase delay ,Voice activity detection ,business.industry ,Hidden Markov models (HMMs) ,Pattern recognition ,Object recognition ,Speech processing ,Linear predictive coding ,Speaker recognition ,Phase spectrum ,Fourier transforms ,Formant ,Phase transitions ,Class separability ,Feature selection ,Feature extraction ,Speech analysis ,Trellis codes ,Artificial intelligence ,business ,Group delay function - Abstract
Spectral representation of speech is complete when both the Fourier transform magnitude and phase spectra are specified. In conventional speech recognition systems, features are generally derived from the short-time magnitude spectrum. Although the importance of Fourier transform phase in speech perception has been realized, few attempts have been made to extract features from it. This is primarily because the resonances of the speech signal which manifest as transitions in the phase spectrum are completely masked by the wrapping of the phase spectrum. Hence, an alternative to processing the Fourier transform phase, for extracting speech features, is to process the group delay function which can be directly computed from the speech signal. The group delay function has been used in earlier efforts, to extract pitch and formant information from the speech signal. In all these efforts, no attempt was made to extract features from the speech signal and use them for speech recognition applications. This is primarily because the group delay function fails to capture the short-time spectral structure of speech owing to zeros that are close to the unit circle in the z-plane and also due to pitch periodicity effects. In this paper, the group delay function is modified to overcome these effects. Cepstral features are extracted from the modified group delay function and are called the modified group delay feature (MODGDF). The MODGDF is used for three speech recognition tasks namely, speaker, language, and continuous-speech recognition. Based on the results of feature and performance evaluation, the significance of the MODGDF as a new feature for speech recognition is discussed. © 2006 IEEE.
- Published
- 2007
- Full Text
- View/download PDF
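The abstract above notes that the group delay function can be computed directly from the signal, without unwrapping the phase. A minimal Python sketch of the standard (unmodified) group delay follows, using the identity τ(ω) = (X_R Y_R + X_I Y_I) / |X|², where X is the Fourier transform of x[n] and Y is the Fourier transform of n·x[n]. The function name and the small `eps` guard are assumptions; the paper's modified group delay function and the MODGDF cepstral features are not implemented here.

```python
import numpy as np

def group_delay(x, n_fft=None, eps=1e-12):
    """Standard group delay (in samples) of a finite signal, without phase unwrapping."""
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)        # transform of x[n]
    Y = np.fft.rfft(n * x, n_fft)    # transform of n * x[n]
    # tau = Re(Y / X); eps guards against division by zero at spectral nulls.
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)

# Sanity check on a pure delay: an impulse at sample 5 has group delay 5 everywhere.
impulse = np.zeros(16)
impulse[5] = 1.0
tau = group_delay(impulse)
```

The division by |X|² is what makes the function behave badly near zeros close to the unit circle, which is the problem the paper's modification addresses.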
44. Towards Reliable Stochastic Data-Driven Models Applied to the Energy Saving in Buildings
- Author
-
Ridi, Antonio, Zarkadis, Nikos, Bovet, Gérôme, Morel, Nicolas, Hennebert, Jean, Scartezzini, Jean-Louis, Institut des Technologies de l'Information et de la Communication (iTIC), Ecole d'ingénieurs et d'architectes Fribourg, Université de Fribourg, Albert-Ludwigs-Universität Freiburg, Ecole Polytechnique Fédérale de Lausanne (EPFL), Laboratoire Traitement et Communication de l'Information (LTCI), and Télécom ParisTech-Institut Mines-Télécom [Paris] (IMT)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Hidden Markov Models (HMMs) ,[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS] ,State-based modeling ,Gaussian Mixture Models (GMMs) ,[INFO.INFO-MO]Computer Science [cs]/Modeling and Simulation - Abstract
We aim at the elaboration of Information Systems able to optimize energy consumption in buildings while preserving human comfort. Our focus is on the use of state-based stochastic modeling applied to temporal signals acquired from heterogeneous sources such as distributed sensors, weather web services, calendar information and user-triggered events. Our general scientific objectives are: (1) global instead of local optimization of building automation sub-systems (heating, ventilation, cooling, solar shadings, electric lightings), (2) generalization to unseen building configuration or usage through self-learning data-driven algorithms and (3) inclusion of stochastic state-based modeling to better cope with seasonal and building activity patterns. We leverage state-based models such as Hidden Markov Models (HMMs) to capture the spatial (states) and temporal (sequence of states) characteristics of the signals. We envision several application layers as per the intrinsic nature of the signals to be modeled. We also envision room-level systems able to leverage a set of distributed sensors (temperature, presence, electricity consumption, etc.). A typical example of a room-level system is one that infers room occupancy information or activities done in the rooms as a function of time. Finally, building-level systems can be composed to infer global usage and to propose optimization strategies for the building as a whole. In our approach, each layer may be fed by the output of the previous layers. More specifically, in this paper we report on the design, conception and validation of several machine learning applications. We present three different applications of state-based modeling. In the first case we report on the identification of consumer appliances through an analysis of their electric loads. In the second case we perform the activity recognition task, representing human activities through state-based models. The third case concerns season prediction using building data, building characteristic parameters and meteorological data.
45. Tuning-Robust Initialization Methods for Speaker Diarization
- Author
-
David Imseng and Gerald Friedland
- Subjects
Acoustics and Ultrasonics ,Computer science ,Speech recognition ,Word error rate ,Initialization ,long-term acoustic features ,01 natural sciences ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Bayesian information criterion ,Electrical and Electronic Engineering ,Gaussian mixture models (GMMs) ,business.industry ,010401 analytical chemistry ,Pattern recognition ,Mixture model ,Speaker recognition ,Speech processing ,0104 chemical sciences ,Speaker diarisation ,machine learning ,NIST ,speaker diarization ,Artificial intelligence ,0305 other medical science ,business - Abstract
This paper investigates a typical speaker diarization system regarding its robustness against initialization parameter variation and presents a method to reduce manual tuning of these values significantly. The behavior of an agglomerative hierarchical clustering system is studied to determine which initialization parameters impact accuracy most. We show that the accuracy of typical systems is indeed very sensitive to the values chosen for the initialization parameters and factors such as the duration of speech in the recording. We then present a solution that reduces the sensitivity of the initialization values and therefore reduces the need for manual tuning significantly while at the same time increasing the accuracy of the system. For short meetings extracted from the previous (2006, 2007, and 2009) National Institute of Standards and Technology (NIST) Rich Transcription (RT) evaluation data, the decrease of the diarization error rate is up to 50% relative. The approach consists of a novel initialization parameter estimation method for speaker diarization that uses agglomerative clustering with Bayesian information criterion (BIC) and Gaussian mixture models (GMMs) of frame-based cepstral features (MFCCs). The estimation method balances the relationship between the optimal value of the seconds of speech data per Gaussian and the duration of the speech data and is combined with a novel nonuniform initialization method. This approach results in a system that performs better than the current ICSI baseline engine on datasets of the NIST RT evaluations of the years 2006, 2007, and 2009.
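To make the role of the Bayesian information criterion in agglomerative clustering more concrete, the following Python sketch computes the classic ΔBIC score for deciding whether two segments of feature frames should be merged. It is a simplification: each segment is modeled by a single full-covariance Gaussian rather than the GMMs used in the system described above, and the function name, penalty weight, and synthetic data are assumptions for illustration only.

```python
import numpy as np

def delta_bic(seg_a, seg_b, penalty_weight=1.0):
    """Delta-BIC between two segments (frames x dims), one Gaussian per segment.

    Positive values favor keeping the segments separate; negative values favor merging.
    """
    n_a, n_b = len(seg_a), len(seg_b)
    n = n_a + n_b
    d = seg_a.shape[1]

    def logdet(seg):
        # Log-determinant of the maximum-likelihood covariance estimate.
        cov = np.cov(seg, rowvar=False, bias=True)
        return np.linalg.slogdet(np.atleast_2d(cov))[1]

    both = np.vstack([seg_a, seg_b])
    # Log-likelihood gain of two separate Gaussians over one pooled Gaussian.
    gain = 0.5 * (n * logdet(both) - n_a * logdet(seg_a) - n_b * logdet(seg_b))
    # Penalty for the extra mean and covariance parameters of the second model.
    penalty = 0.5 * penalty_weight * (d + d * (d + 1) / 2) * np.log(n)
    return gain - penalty

# Synthetic 2-D "features": two segments from one source, one from a shifted source.
rng = np.random.default_rng(0)
same_1 = rng.normal(0.0, 1.0, size=(500, 2))
same_2 = rng.normal(0.0, 1.0, size=(500, 2))
other = rng.normal(4.0, 1.0, size=(500, 2))

score_same = delta_bic(same_1, same_2)    # expected to be negative: merge
score_other = delta_bic(same_1, other)    # expected to be positive: keep apart
```

In an agglomerative scheme, the pair of clusters with the most merge-favoring score is merged repeatedly until no pair favors merging.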