254 results for "Shigeru Katagiri"
Search Results
152. Automatic loss smoothness determination for minimum classification error training
- Author
- Jun'ichi Tokuno, Shigeru Katagiri, Miho Ohsaki, Tsukasa Ohashi, and Hideyuki Watanabe
- Subjects
Smoothness (probability theory), Computer science, Maximum likelihood, Probability of error, Pattern recognition (psychology), Training, Pattern recognition, Artificial intelligence, Function (mathematics)
- Abstract
The loss function smoothness embedded in the Minimum Classification Error (MCE) formalization increases the number of virtual training samples that lead to the optimal, minimum classification error status over unseen testing samples as well as the given training samples. However, no rational method has yet been developed for finding the smoothness that corresponds to this optimal status. To alleviate this problem, we propose in this paper a new MCE training method that incorporates loss smoothness control based on the Parzen estimation of the classification error probability. Experiments clearly demonstrate the high utility of our proposed method.
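As a concrete illustration of the smoothed loss the abstract refers to, here is a minimal sketch (not code from the paper; `d` is the usual MCE misclassification measure and `alpha` the inverse smoothness):

```python
import math

def smoothed_loss(d, alpha):
    """Sigmoid approximation of the 0-1 error count.

    d     : misclassification measure (d > 0 means a classification error)
    alpha : inverse smoothness; alpha -> infinity recovers the hard step.
    """
    return 1.0 / (1.0 + math.exp(-alpha * d))

# With a small alpha, samples near the decision boundary (d close to 0)
# contribute partial losses, which acts like adding virtual training samples.
print(smoothed_loss(0.0, 1.0))   # 0.5
print(smoothed_loss(5.0, 10.0))  # ~1.0: clearly misclassified
```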
- Published
- 2011
- Full Text
- View/download PDF
153. Automatic loss smoothness determination for Large Geometric Margin Minimum Classification Error training
- Author
- Hideyuki Watanabe, Tsukasa Ohashi, Miho Ohsaki, Jun'ichi Tokuno, and Shigeru Katagiri
- Subjects
Robustness (computer science), Sample space, Measurement uncertainty, Pattern recognition, Artificial intelligence, Mathematics
- Abstract
A Parzen-estimation-based smoothness determination method for the smooth classification error count loss was successfully applied to the early Minimum Classification Error (MCE) training that used a functional-margin-based misclassification measure. In this study, we apply this loss smoothness determination method to the recent MCE framework that uses a geometric-margin-based misclassification measure, and experimentally demonstrate its high utility. Furthermore, we theoretically clarify how the loss smoothness set in the one-dimensional geometric-margin-based misclassification measure space produces virtual samples, which are expected to increase training robustness to unseen samples, in a sample space that is usually high-dimensional.
- Published
- 2011
- Full Text
- View/download PDF
154. Minimum classification error training with automatic setting of loss smoothness
- Author
- Jun'ichi Tokuno, Shigeru Katagiri, Tsukasa Ohashi, Miho Ohsaki, and Hideyuki Watanabe
- Subjects
Robustness (computer science), Estimation theory, Maximum likelihood, Probability of error, Rational method, Virtual training, Pattern recognition, Artificial intelligence, Maximum likelihood sequence estimation, Mathematics
- Abstract
The loss function smoothness embedded in the Minimum Classification Error formalization increases the number of virtual training samples, enables high robustness to unseen samples, and well approximates the ultimate, minimum classification error probability status. However, a rational method for controlling smoothness has not yet been developed. To alleviate this long-standing problem, we propose a new method that automatically sets the loss function smoothness through Parzen kernel (window) width estimation with a cross-validation maximum likelihood method. Experiments clearly show our proposed method's high utility.
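The cross-validation maximum likelihood idea can be sketched in a few lines: choose the Parzen kernel width that maximizes the leave-one-out likelihood of the data (a generic 1-D sketch with made-up numbers, not the paper's procedure):

```python
import math

def loo_log_likelihood(samples, h):
    """Leave-one-out log-likelihood of a 1-D Gaussian-kernel Parzen
    estimate with width h (a stand-in for the loss-smoothness parameter)."""
    n = len(samples)
    total = 0.0
    for i, x in enumerate(samples):
        dens = sum(
            math.exp(-0.5 * ((x - y) / h) ** 2) / (h * math.sqrt(2 * math.pi))
            for j, y in enumerate(samples) if j != i
        ) / (n - 1)
        total += math.log(dens)
    return total

data = [-1.2, -0.7, -0.1, 0.3, 0.8, 1.5]
# Pick the width that maximises the held-out likelihood:
best_h = max([0.1, 0.3, 0.5, 1.0, 2.0], key=lambda h: loo_log_likelihood(data, h))
```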
- Published
- 2011
- Full Text
- View/download PDF
155. Re-evaluation of LVQ-HMM hybrid algorithm
- Author
- Erik McDermott, Hitoshi Iwamida, and Shigeru Katagiri
- Subjects
Learning vector quantization, Acoustics and Ultrasonics, Computer science, Speech recognition, Pattern recognition, Artificial intelligence, Hidden Markov model, Hybrid algorithm
- Published
- 1993
- Full Text
- View/download PDF
156. A minimum-distortion segmentation/LVQ hybrid algorithm for speech recognition
- Author
- Shigeru Katagiri and Andrew Duchon
- Subjects
Learning vector quantization, Acoustics and Ultrasonics, Artificial neural network, Computer science, Speech recognition, Distance classifier, Pattern recognition, Viterbi algorithm, Discriminative model, Segmentation, Artificial intelligence, Hidden Markov model, Classifier (UML)
- Abstract
The quick and simple training of learning vector quantization (LVQ) can produce classification power at least as high as that of a powerful but complex classifier based on artificial neural networks. However, LVQ is a discriminative training algorithm for a distance classifier handling static (fixed-dimensional) patterns. Thus, an innovative process is required to apply this algorithm to dynamic (variable-duration) speech patterns. To meet this requirement, an HMM/LVQ hybrid algorithm was proposed which integrated HMM (Viterbi) segmentation with LVQ classification. However, this algorithm, using all the possible HMM models for segmentation, produces an enormous number of training tokens, making it difficult to apply to large-scale continuous speech recognition tasks. In this light, we present a new minimum-distortion segmentation (MDS)/discriminative classification hybrid algorithm. The MDS algorithm produces one segmentation, and this is used in place of the many HMM segmentations. To make a proper comparison between the two methods we used the same LVQ formulation as our discriminative classifier. For clarity, we refer to this proposed algorithm as an MDS/LVQ hybrid algorithm. Results on the E-set task show that MDS/LVQ, with its significantly reduced training, can achieve discriminative power at least as high as HMM/LVQ.
- Published
- 1993
- Full Text
- View/download PDF
157. Minimum Error Classification with geometric margin control
- Author
- Erik McDermott, Miho Ohsaki, Kouta Yamada, Shigeru Katagiri, Shinji Watanabe, Atsushi Nakamura, and Hideyuki Watanabe
- Subjects
Support vector machine, Discriminant, Discriminant function analysis, Computer science, Robustness (computer science), Pattern recognition, Artificial intelligence, Hidden Markov model
- Abstract
Minimum Classification Error (MCE) training, which can be used to achieve minimum-error classification of various types of patterns, has attracted a great deal of attention. However, the conventional MCE framework has no practical optimization procedure for increasing classification robustness comparable to the geometric margin maximization of Support Vector Machines (SVMs). To realize high robustness in a wide range of classification tasks, we derive the geometric margin for a general class of discriminant functions and develop a new MCE training method that increases the geometric margin value. We also experimentally demonstrate the effectiveness of our new method using prototype-based classifiers.
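For a linear discriminant the distinction the abstract draws is easy to state: the functional margin is y*f(x), while the geometric margin divides by the gradient norm ||w|| and is therefore a true distance to the decision surface. A minimal sketch (toy numbers, not from the paper):

```python
import math

def functional_margin(w, b, x, y):
    """y * f(x) for a linear discriminant f(x) = w.x + b, with y in {-1, +1}."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

def geometric_margin(w, b, x, y):
    """Signed Euclidean distance to the decision surface: y*f(x)/||w||.
    Unlike the functional margin, it is invariant to rescaling (w, b)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return functional_margin(w, b, x, y) / norm

w, b, x, y = [3.0, 4.0], 1.0, [1.0, 1.0], 1
print(functional_margin(w, b, x, y))            # 8.0
print(geometric_margin(w, b, x, y))             # 1.6
# Rescaling (w, b) changes the functional margin but not the geometric one:
print(geometric_margin([6.0, 8.0], 2.0, x, y))  # 1.6
```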
- Published
- 2010
- Full Text
- View/download PDF
158. Discriminative training
- Author
- Biing-Hwang Juang and Shigeru Katagiri
- Subjects
Acoustics and Ultrasonics
- Published
- 1992
- Full Text
- View/download PDF
159. Discriminative learning for minimum error classification (pattern recognition)
- Author
- Shigeru Katagiri and Biing-Hwang Juang
- Subjects
Computer science, Pattern recognition, Bayes classifier, Speaker recognition, Machine learning, Signal Processing, Feature (machine learning), Artificial intelligence, Electrical and Electronic Engineering, Classifier (UML), Discriminative learning
- Abstract
A formulation is proposed for minimum-error classification, in which the misclassification probability is to be minimized based on a given set of training samples. A fundamental technique is given for designing a classifier that approaches the objective of minimum classification error more directly than traditional methods. The method is contrasted with several traditional classifier designs in typical experiments to demonstrate the superiority of the new learning formulation, and it can be applied to other classifier structures as well. Experimental results pertaining to a speech recognition task are provided to show the effectiveness of the technique.
- Published
- 1992
- Full Text
- View/download PDF
160. GPD training of dynamic programming-based speech recognizers
- Author
- Shigeru Katagiri and Takashi Komori
- Subjects
Learning vector quantization, Dynamic time warping, Acoustics and Ultrasonics, Artificial neural network, Computer science, Speech recognition, Probabilistic logic, Machine learning, Dynamic programming, Discriminative model, Artificial intelligence, Descent (mathematics)
- Abstract
Although many pattern classifiers based on artificial neural networks have been vigorously studied, they are still inadequate from the viewpoint of classifying dynamic (variable- and unspecified-duration) speech patterns. To cope with this problem, the generalized probabilistic descent method (GPD) has recently been proposed. GPD not only allows one to train a discriminative system to classify dynamic patterns, but also possesses a remarkable advantage, namely a guarantee of learning optimality (in the sense of a probabilistic descent search). A practical implementation of this theory, however, remains to be evaluated. In this light, we focus on evaluating GPD in designing a widely used speech recognizer based on dynamic time warping (DTW) distance measurement. We also show that the design algorithm appraised in this paper can be considered a new version of learning vector quantization incorporating dynamic programming. Experimental evaluation results in tasks of classifying syllables and phonemes clearly demonstrate GPD's superiority.
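The dynamic-programming distance at the core of such DTW-based recognizers can be sketched as follows (a toy 1-D version with an absolute-difference local cost; real recognizers use frame-level spectral distances):

```python
def dtw(a, b):
    """Dynamic-time-warping distance between two 1-D sequences,
    using a symmetric local path and absolute-difference local cost."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of insertion, deletion, and match transitions:
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

print(dtw([1, 2, 3], [1, 2, 2, 3]))  # 0.0: warping absorbs the repeated frame
```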
- Published
- 1992
- Full Text
- View/download PDF
161. Analysis of Subsequence Time-Series Clustering Based on Moving Average
- Author
- Masakazu Nakase, Miho Ohsaki, and Shigeru Katagiri
- Subjects
Longest common subsequence problem, Similarity (network science), Moving average, Sliding window protocol, Subsequence, k-means clustering, Pattern recognition, Artificial intelligence, Longest increasing subsequence, Cluster analysis, Mathematics
- Abstract
Subsequence time-series clustering (STSC), which consists of subsequence cutout with a sliding window followed by k-means clustering, has been commonly used in time-series data mining. However, it has been pointed out that STSC always generates moderate sinusoidal patterns independently of the input. To address this problem, we theoretically explain and empirically confirm the similarity between STSC and the moving average. The present analysis is consistent with, and simpler than, one of the most important analyses of STSC. We also question pattern extraction in the time domain and discuss another solution.
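The moving-average connection can be seen directly: the component-wise mean of all sliding-window subsequences, towards which every k-means centre is pulled, is itself a moving average of the series. A small sketch (toy data, not the paper's derivation):

```python
def subsequences(series, w):
    """All length-w subsequences cut out with a sliding window (stride 1)."""
    return [series[i:i + w] for i in range(len(series) - w + 1)]

def mean_pattern(subs):
    """Component-wise mean over all subsequences: the 'grand centroid'
    that every k-means cluster centre is pulled towards."""
    n = len(subs)
    return [sum(s[j] for s in subs) / n for j in range(len(subs[0]))]

series = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
w = 3
pat = mean_pattern(subsequences(series, w))
# Each component of pat is a length-(len(series)-w+1) moving average of the
# series, shifted by one sample: exactly the smoothing the paper analyses.
print(abs(pat[0] - sum(series[0:6]) / 6.0) < 1e-12)  # True
```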
- Published
- 2009
- Full Text
- View/download PDF
162. A unified view for discriminative objective functions based on negative exponential of difference measure between strings
- Author
- Shigeru Katagiri, Shinji Watanabe, Erik McDermott, and Atsushi Nakamura
- Subjects
Discriminative model, Joint probability distribution, Component (UML), Pattern recognition (psychology), Pattern recognition, Function (mathematics), Artificial intelligence, Mutual information, Measure (mathematics), Mathematics, Weighting
- Abstract
This paper presents a novel unified view of a wide variety of objective functions suitable for discriminative training applied to sequential pattern recognition problems, such as automatic speech recognition. Focusing on a central component of conventional objective functions, the sum of modified joint probabilities of observations and strings, the analysis generalizes these objective functions by weighting each term in the sum by an important function: the negative exponential of a difference measure between strings. The interesting and valuable results of this investigation are highlighted in a comprehensive relationship chart that covers all of the common approaches (Maximum Mutual Information, Minimum Classification Error, and Minimum Phone/Word Error), as well as corresponding novel generalizations and modifications of those approaches.
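As a rough illustration of the weighted sum described above (my notation, not necessarily the symbols used in the paper), the family of objectives can be written as

```latex
% X_r: r-th training observation sequence; S_r: its reference string;
% D(S, S_r): a string difference measure (e.g., phone/word error count);
% psi: a weighting constant; f: an outer link function; Lambda: model parameters.
\mathcal{F}(\Lambda) = \sum_{r} f\!\left( \sum_{S} p_{\Lambda}(X_r, S)\, e^{-\psi\, D(S, S_r)} \right)
```

Different choices of f, D, and psi then recover MMI-, MCE-, and MPE/MWE-style criteria as special cases, which is the kind of relationship the paper's chart organizes.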
- Published
- 2009
- Full Text
- View/download PDF
163. Re-evaluation of the stereo reproduction method using six speakers for media space development
- Author
- A. Tsutsui, Shigeru Katagiri, K. Noguchi, M. Miyoshi, A. Kunikami, Miho Ohsaki, and M. Sugawara
- Subjects
Computer science, Speech recognition, Media space, Sound recording and reproduction, Stereophonic sound, Precedence effect, Active listening, Loudspeaker, Sound pressure, Audio signal processing
- Abstract
The stereo sound reproduction method using two speakers, positioned to the right and left, is the current standard of audio systems. A user can experience a virtual sound image from the speakers when he or she is midway between them, thus enjoying a rich acoustic scene. However, this sound image can shift significantly and is often fixed to one of the speaker locations due to several audiological phenomena such as the precedence effect. The conventional stereo approach is thus clearly not adequate for a video conference site or a modern computer-supported media space, where an object such as a speaking person is expected to be reproduced acoustically and visually in the correct direction. To overcome this problem, we elaborate a sound reproduction method that uses six speakers even for two-channel stereo reproduction. Through systematic experiments, we show that the method has a certain effect of widening the usable listening area but that, on the other hand, it is not sufficient even for use at a traditional video conference site. Extensive analyses of the relation between the time delay and the listening test results suggest that effective sound pressure control should be integrated with the time delay control, both in the sound collection stage and in the sound reproduction stage.
- Published
- 2008
- Full Text
- View/download PDF
164. ATR Japanese speech database as a tool of speech recognition and synthesis
- Author
- Kazuya Takeda, Shigeru Katagiri, Hisao Kuwabara, Kiyohiro Shikano, Yoshinori Sagisaka, and Akira Kurematsu
- Subjects
Audio mining, Linguistics and Language, Computer science, Speech recognition, Chinese speech synthesis, Speech synthesis, Language and Linguistics, Speech analytics, Database, Communication, Speech technology, Acoustic model, Speech corpus, VoxForge, Computer Science Applications, Modeling and Simulation, Computer Vision and Pattern Recognition, Artificial intelligence, Software, Natural language processing
- Abstract
A large-scale Japanese speech database has been described. The database basically consists of (1) a word speech database, (2) a continuous speech database, (3) a database for a large number of speakers, and (4) a database for speech synthesis. Multiple transcriptions have been made in five different layers from simple phonemic descriptions to fine acoustic-phonetic transcriptions. The database has been used to develop algorithms in speech recognition and synthesis studies and to find acoustic, phonetic and linguistic evidence that will serve as basic data for speech technologies.
- Published
- 1990
- Full Text
- View/download PDF
165. A hybrid speech recognition system using HMMs with an LVQ-trained codebook
- Author
- Shigeru Katagiri, Hitoshi Iwamida, Erik McDermott, and Yoh'ichi Tohkura
- Subjects
Learning vector quantization, Acoustics and Ultrasonics, Computer science, Speech recognition, Codebook, k-means clustering, Pattern recognition, k-nearest neighbors algorithm, Discriminant, Artificial intelligence, Hidden Markov model, Word (computer architecture), Sentence
- Abstract
A new speech recognition system using the neurally inspired Learning Vector Quantization (LVQ) to train HMM codebooks is described. Both LVQ and HMMs are stochastic algorithms holding considerable promise for speech recognition. In particular, LVQ is a vector quantizer with very powerful classification ability. HMMs, on the other hand, have the advantage that phone models can easily be concatenated to produce long utterance models, such as word or sentence models. The new algorithm described here combines the advantages inherent in each of these two algorithms. Instead of using a conventional, K-means generated codebook in the HMMs, the new system uses LVQ to adapt the codebook reference vectors so as to minimize the number of errors these reference vectors make when used for nearest neighbor classification of training vectors. The LVQ codebook can then provide the HMMs with high classification power at the phonemic level. As the results of phoneme recognition experiments using a large vocabulary database of 5,240 common Japanese words uttered in isolation by a male speaker, it was confirmed that the high discriminant ability of LVQ could be integrated into an HMM architecture easily extendible to longer utterance models, such as word or sentence models.
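An LVQ2.1-style codebook update of the kind used to sharpen reference vectors can be sketched as follows (a generic textbook form with toy data, not the paper's exact recipe):

```python
def lvq2_update(x, codebook, labels, target, lr=0.1, window=0.3):
    """One LVQ2.1-style update step (sketch with toy data below).

    If the two nearest reference vectors belong to different classes,
    one of them carries the `target` label, and x falls inside the window
    around the midplane, move the correct vector towards x and the
    incorrect one away from x.
    """
    d = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in codebook]
    i, j = sorted(range(len(codebook)), key=d.__getitem__)[:2]
    in_window = min(d[i] / d[j], d[j] / d[i]) > (1 - window) / (1 + window)
    if labels[i] != labels[j] and target in (labels[i], labels[j]) and in_window:
        corr, wrong = (i, j) if labels[i] == target else (j, i)
        codebook[corr] = [c + lr * (xi - c) for xi, c in zip(x, codebook[corr])]
        codebook[wrong] = [c - lr * (xi - c) for xi, c in zip(x, codebook[wrong])]
    return codebook

cb = [[0.0, 0.0], [1.0, 0.0]]
lvq2_update([0.55, 0.0], cb, ["a", "b"], target="a")
print(cb)  # correct prototype pulled towards x, wrong one pushed away
```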
- Published
- 1990
- Full Text
- View/download PDF
166. Minimum Classification Error for Large Scale Speech Recognition Tasks using Weighted Finite State Transducers
- Author
- Erik McDermott and Shigeru Katagiri
- Subjects
Discriminative model, Computer science, Speech recognition, String (computer science), Decision tree, Pattern recognition, Scale (descriptive set theory), Artificial intelligence, Word (computer architecture), Task (project management)
- Abstract
This article describes recent results obtained for two challenging large-vocabulary speech recognition tasks using the minimum classification error (MCE) approach to discriminative training. Weighted finite state transducers (WFSTs) are used throughout to represent correct and competing string candidates. The primary task examined is a 22K-word, real-world, telephone-based name recognition task. Lattice-derived WFSTs were used successfully to speed up the MCE training procedure. The results of this difficult task follow the classic picture of discriminative training: small acoustic models trained with MCE outperform much larger baseline models trained with maximum likelihood, and MCE training substantially improves the performance of the larger models as well. We also present preliminary results on the 30K-word Corpus of Spontaneous Japanese (CSJ) lecture speech transcription task, with a training set of 190 hours of audio.
- Published
- 2006
- Full Text
- View/download PDF
167. Discriminative Subspace Method for Minimum Error Pattern Recognition
- Author
- Shigeru Katagiri and H. Watanabe
- Subjects
Discriminative model, Robustness (computer science), Speech recognition, Vector quantization, Pattern analysis, Pattern recognition, Artificial intelligence, Subspace topology, Mathematics
- Published
- 2005
- Full Text
- View/download PDF
168. Smoothed language model incorporation for efficient time-synchronous beam search decoding in LVCSR
- Author
- Erik McDermott, Shigeru Katagiri, and Daniel Willett
- Subjects
Vocabulary, Computer science, Speech recognition, Viterbi algorithm, Markov model, Cache language model, Beam search, Language model, Hidden Markov model, Smoothing
- Abstract
For performing the decoding search in large vocabulary continuous speech recognition (LVCSR) with hidden Markov models (HMM) and statistical language models, the most straightforward and popular approach is the time-synchronous beam search procedure. A drawback of this approach is that the time-asynchrony of the language model weight application during search leads to performance degradations. This is particularly so when performing the search with a tight pruning beam. This study presents a method for smoothing the language model within the recognition network. The optimization goal is the smearing of transition probabilities from HMM state to HMM state in favor of a more time-synchronous language model weight application. In addition, state-based language model look-ahead is proposed and evaluated. Both language model smoothing techniques lead to a remarkable improvement in accuracy-to-run-time ratio, while their combined application yields only limited improvements.
- Published
- 2005
- Full Text
- View/download PDF
169. Model selection for mixture of gaussian based spectral modelling
- Author
- Yasuhiro Minami, Hiroko Kato, Atsushi Nakamura, Shigeru Katagiri, and Parham Zolfaghari
- Subjects
Kullback–Leibler divergence, Gaussian, Model selection, Short-time Fourier transform, Pattern recognition, Mixture model, Gaussian filter, Artificial intelligence, Gaussian process, Algorithm, Mathematics, Parametric statistics
- Abstract
In this paper, we describe a parametric mixture model for modelling the resonant characteristics of the vocal tract. We propose a mixture-of-Gaussians (MoG) spectral modelling scheme that enables model selection, with the goals of easing the correspondence between the resonant characteristics of the vocal tract and the parametric Gaussians and of representing a spectrum with an appropriate number of parameters. Noting that a relatively small class of Gaussian densities can approximate a large class of distributions, we systematically reduce the number of Gaussians and re-approximate the densities in the MoG spectral model. The Kullback-Leibler (KL) distance between the densities in the mixture was found to allow optimal ML-MoG solutions to the spectra. A fitness measure based on KL information provides a figure for estimating the model order in representing formant-like features. The mixture model was fitted to a normalised smooth spectrum obtained by filtering the short-time Fourier transform in time and frequency with a pitch-adaptive Gaussian filter, which removes all source information from the spectra. By subjectively evaluating the quality of the analysed and synthesised speech using this parametrisation scheme, we show considerable improvement over ML with this Gaussian reduction scheme, specifically when using a lower number of Gaussians in the mixture.
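The KL distance used for such Gaussian-reduction decisions has a closed form between two Gaussians; a 1-D sketch (illustrative values, not from the paper):

```python
import math

def kl_gauss(mu1, var1, mu2, var2):
    """Closed-form KL divergence KL(N1 || N2) between 1-D Gaussians,
    the kind of distance usable for deciding which mixture components
    to merge when reducing the number of Gaussians."""
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

print(kl_gauss(0.0, 1.0, 0.0, 1.0))  # 0.0: identical components
# Nearly identical components have small KL, so they are good merge candidates:
print(kl_gauss(0.0, 1.0, 0.1, 1.0))  # ~0.005
```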
- Published
- 2005
- Full Text
- View/download PDF
170. A new method of Cepstrum analysis by using comb lifter
- Author
- Shigeru Katagiri, Ken'iti Kido, and G. Ooyama
- Subjects
Signal processing, Formant, Speech recognition, Cepstrum, High pitch, Pitch frequency, Pole–zero plot, Pitch detection algorithm, Transfer function, Mathematics
- Abstract
This paper describes the use of a comb lifter in cepstrum analysis, which is useful for the extraction of formant frequencies. Good results are obtained with the Han lifter in the case of low pitch frequency, but in the case of high pitch frequency the separation of two formants is sometimes impossible. The comb lifter is presented to overcome this difficulty: the length of the lifter is fixed so as to separate the closest formants, and the peaks in the cepstrum due to the periodicity of speech are suppressed by the comb lifter. Successful results have been obtained in experiments using two types of comb lifter. A new type of adaptive lifter is also presented, which does not need pitch detection and is useful in the case of connected speech.
- Published
- 2005
- Full Text
- View/download PDF
171. A new kernel-based formalization of minimum error pattern recognition
- Author
- Shigeru Katagiri and Erik McDermott
- Subjects
Computer science, Kernel (statistics), Pattern recognition (psychology), Pattern recognition, Artificial intelligence
- Published
- 2005
- Full Text
- View/download PDF
172. A theoretical analysis of speech recognition based on feature trajectory models
- Author
- Erik McDermott, Atsushi Nakamura, Shigeru Katagiri, and Yasuhiro Minami
- Subjects
Computer science, Speech recognition, Pattern recognition, Kalman filter, Speaker recognition, Dynamics (music), Feature (machine learning), Trajectory, Artificial intelligence, Hidden Markov model
- Abstract
In previous work, we proposed a new speech recognition technique that generates a smooth speech trajectory from hidden Markov models (HMMs) by maximizing likelihood subject to the constraints that exist between static and dynamic speech features. This paper presents a theoretical analysis of this method. We show that the approach used to generate the smoothed trajectory is equivalent to a Kalman filter. This result demonstrates that there is a strong relationship between the dynamics of delta features (and delta-delta features) in HMM-based speech recognition and Kalman filter dynamics.
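The constrained ML trajectory described above has a standard closed form in trajectory-HMM notation (a hedged sketch; the symbols below are the common ones, not necessarily the paper's): with stacked static-plus-dynamic features o = Wc, state-path mean vector mu, and covariance Sigma,

```latex
% Maximising the likelihood N(Wc; mu, Sigma) over the static trajectory c gives
\hat{c} = \left( W^{\top} \Sigma^{-1} W \right)^{-1} W^{\top} \Sigma^{-1} \mu
```

This weighted-least-squares form is what admits the Kalman-filter interpretation: the delta-feature constraints in W play the role of the state-transition dynamics.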
- Published
- 2004
- Full Text
- View/download PDF
173. Bayesian modelling of the speech spectrum using mixture of Gaussians
- Author
- Shigeru Katagiri, Atsushi Nakamura, Parham Zolfaghari, and Shinji Watanabe
- Subjects
Iterative method, Gaussian, Speech coding, Pattern recognition, Speech processing, Mixture model, Spectral envelope, Histogram, Artificial intelligence, Gaussian process, Mathematics
- Abstract
This paper presents a method for modelling the speech spectral envelope using a mixture of Gaussians (MoG). A novel variational Bayesian (VB) framework for Gaussian mixture modelling of a histogram enables the derivation of an objective function that can be used to simultaneously optimise both model parameter distributions and model structure. A histogram representation of the STRAIGHT spectral envelope, which is free of glottal excitation information, is used for parameterisation with this MoG model. This results in a parameterisation scheme that purely models the vocal tract resonant characteristics. Maximum likelihood (ML) and VB solutions of the mixture model on histogram data are found using an iterative algorithm. A comparison between ML-MoG and VB-MoG spectral modelling is carried out using spectral distortion measures and mean opinion scores (MOS). The main advantages of VB-MoG highlighted in this paper include better modelling using fewer Gaussians in the mixture, resulting in better correspondence between Gaussians and formant-like peaks, and an objective measure of the number of Gaussians required to best fit the spectral envelope.
- Published
- 2004
- Full Text
- View/download PDF
174. A new formalization of minimum classification error using a Parzen estimate of classification chance
- Author
- Shigeru Katagiri and Erik McDermott
- Subjects
Estimation theory, Generalization, Kernel density estimation, Pattern recognition, Machine learning, Discriminative model, Kernel (statistics), Pattern recognition (psychology), Artificial intelligence, Random variable, Smoothing, Mathematics
- Abstract
In a previous work, we showed that the minimum classification error (MCE) criterion function commonly used for discriminative design of pattern recognition systems is equivalent to a Parzen-window-based estimate of the theoretical classification risk. In this analysis, each training token is mapped to the center of a Parzen kernel in the domain of a suitably defined random variable; the kernels are then summed and integrated over the domain of incorrect classifications, yielding the risk estimate. Here, we deepen this approach by applying Parzen estimation at an earlier stage of the overall definition of classification risk. Specifically, the new analysis uses all incorrect categories, not just the single best incorrect category, in deriving a "correctness" function that is a simple multiple integral of a Parzen kernel over the region of correct classifications. The width of the Parzen kernel determines how many competing categories to use in optimizing the resulting overall risk estimate. This analysis uses the classic Parzen estimation method to support the notion that using multiple competing categories in discriminative training is a type of smoothing that enhances generalization to unseen data.
- Published
- 2003
- Full Text
- View/download PDF
175. Recognition method with parametric trajectory generated from mixture distribution HMMs
- Author
- Shigeru Katagiri, Atsushi Nakamura, Yasuhiro Minami, and Erik McDermott
- Subjects
Iterative Viterbi decoding, Computer science, Maximum likelihood, Gaussian, Speech recognition, Word error rate, Pattern recognition, Viterbi algorithm, Viterbi decoder, Cepstrum, Mixture distribution, Artificial intelligence, Hidden Markov model, Soft output Viterbi algorithm, Parametric statistics
- Abstract
We have proposed a new speech recognition technique that generates a speech trajectory from HMMs by maximizing the likelihood of the trajectory, while accounting for the relation between the cepstrum and the dynamic cepstrum coefficients. This method has the major advantage that the relation, which is ignored in conventional speech recognition, is directly used in the speech recognition phase. This paper describes an extension of the method for dealing with HMMs whose distributions are mixture Gaussian distributions. The method chooses the sequence of Gaussian distributions by selecting the best Gaussian distribution in the state during Viterbi decoding. Speaker-independent speech recognition experiments were carried out. The proposed method obtained an 18.2% reduction in error rate for the task, proving that the proposed method is effective even for Gaussian mixture HMMs.
- Published
- 2003
- Full Text
- View/download PDF
176. Pervasive unsupervised adaptation for lecture speech transcription
- Author
- Erik McDermott, Shigeru Katagiri, Yasuhiro Minami, D. Willett, and Thomas Niesler
- Subjects
Computer science, Speech recognition, Word error rate, Acoustic model, Speech corpus, Pronunciation, Lexicon, Factored language model, Unsupervised learning, Language model, Artificial intelligence, Adaptation (computer science), Natural language processing
- Abstract
Unsupervised adaptation has evolved as a popular approach for tuning the acoustic models of speaker-independent speech recognition systems to specific speakers, speaker groups or channel conditions while making use of only untranscribed data. This study focuses on procedures for unsupervised adaptation of other probabilistic models that are involved in state-of-the-art speech recognizers and on the joint adaptation of multiple knowledge sources. In particular, we outline and evaluate approaches for adapting both the language model and the pronunciation model (lexicon) without supervision. Initial experiments on off-line lecture speech transcription achieved small but promising word error rate improvements with each approach applied separately. The experimental results on the joint application of acoustic, language and pronunciation model adaptation indicate that the individually achievable performance improvements are additive.
- Published
- 2003
- Full Text
- View/download PDF
177. Speech Pattern Recognition using Neural Networks
- Author
-
Shigeru Katagiri
- Subjects
Artificial neural network ,Time delay neural network ,business.industry ,Computer science ,Deep learning ,Pattern recognition (psychology) ,Acoustic model ,Pattern recognition ,Neocognitron ,Artificial intelligence ,business - Published
- 2003
- Full Text
- View/download PDF
178. Separation of an overlapped signal using speech production models
- Author
-
Shigeru Katagiri, Hideyuki Watanabe, and Satoru Fujita
- Subjects
Speech production ,Signal processing ,Computer science ,Speech recognition ,Vowel ,Information processing ,Construct (python library) ,Loudspeaker ,Speech processing ,Signal - Abstract
We propose a novel approach to separating an overlapped signal using speech production models. The speech production models are used as top-down constraints for solving a one-channel signal separation, which is a one-to-many mapping problem. The goal of our approach is to not only realize high-accuracy separation systems but to also construct a computational model of human auditory mechanisms based on perception-production interaction. The utility of the proposed method is demonstrated in a task of separating a mixture of two-speaker vowel utterances using primitive linear speech production models.
- Published
- 2003
- Full Text
- View/download PDF
179. Shift-invariant, multi-category phoneme recognition using Kohonen's LVQ2
- Author
-
Erik McDermott and Shigeru Katagiri
- Subjects
Self-organizing map ,Neural gas ,Artificial neural network ,Computer science ,business.industry ,Time delay neural network ,Speech recognition ,Neocognitron ,Pattern recognition ,Speaker recognition ,Bayes' theorem ,ComputingMethodologies_PATTERNRECOGNITION ,Recurrent neural network ,Feature (machine learning) ,Artificial intelligence ,business - Abstract
The authors describe a shift-tolerant neural network architecture for phoneme recognition. The system is based on LVQ2, an algorithm which pays close attention to approximating the optimal Bayes decision boundary in a discrimination task. Recognition performances in the 98-99% correct range were obtained for LVQ2 networks aimed at speaker-dependent recognition of phonemes in small but ambiguous Japanese phonemic classes. A correct recognition rate of 97.7% was achieved by a single, larger LVQ2 network covering all Japanese consonants. These recognition results are at least as high as those obtained in the time delay neural network system and suggest that LVQ2 could be the basis for a successful speech recognition system.
- Published
- 2003
- Full Text
- View/download PDF
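The LVQ2 rule described above can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation; the learning rate, window width, and all variable names are assumptions. The update fires only when the two nearest reference vectors belong to different classes and the sample falls inside Kohonen's window around their midplane:

```python
import numpy as np

def lvq2_update(x, codebook, labels, y, lr=0.05, window=0.3):
    """One LVQ2-style update. codebook: (M, D) reference vectors,
    labels: (M,) class of each reference, y: true class of sample x."""
    d = np.linalg.norm(codebook - x, axis=1)
    order = np.argsort(d)
    i, j = order[0], order[1]           # two nearest references
    # update only if the nearest pair straddles a class boundary
    if labels[i] == labels[j]:
        return codebook
    # Kohonen's window test: x must lie near the midplane of the pair
    s = (1.0 - window) / (1.0 + window)
    if min(d[i] / max(d[j], 1e-12), d[j] / max(d[i], 1e-12)) < s:
        return codebook
    cb = codebook.copy()
    if labels[i] == y:                  # nearest is correct: pull it in, push rival out
        cb[i] += lr * (x - cb[i])
        cb[j] -= lr * (x - cb[j])
    else:                               # nearest is wrong: symmetric case
        cb[i] -= lr * (x - cb[i])
        cb[j] += lr * (x - cb[j])
    return cb
```

The window test is what distinguishes LVQ2 from plain LVQ: only samples near the current decision boundary move the references, which is why the algorithm approximates the Bayes boundary rather than the class means.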
180. A new algorithm for representing acoustic feature dynamics
- Author
-
M. Yokota, Shigeru Katagiri, and Erik McDermott
- Subjects
Linde–Buzo–Gray algorithm ,Learning vector quantization ,Computer science ,business.industry ,Feature vector ,Vector quantization ,Codebook ,Pattern recognition ,Reduction (complexity) ,Feature (machine learning) ,Node (circuits) ,Artificial intelligence ,business ,Algorithm - Abstract
The goal of this algorithm is to reduce learning time in a multireference phoneme-recognition system based on learning vector quantization (LVQ). The algorithm has, within the system, two kinds of vectors: a codebook vector and a reference node vector. An ordering procedure, similar to self-organizing feature maps, is used for the codebook design, and LVQ is used to adapt reference node vectors. The algorithm is divided into four steps: (1) codebook design, (2) mapping from the phoneme vector into the node vector, (3) reduction of the number of reference node vectors, and (4) adaptation of the reference node vectors. In particular, the mapping translates the high-dimensional phoneme vector into a trajectory on a two-dimensional plane; the acoustic feature dynamics can thus be visualized. Geometrical distance on the plane is used in the number reduction and adaptation of the reference node vectors. The learning time is thereby reduced considerably. Experiments using Japanese voiced plosives have shown that the algorithm can considerably speed up learning and still maintain a high recognition rate of 97%.
- Published
- 2003
- Full Text
- View/download PDF
181. Construction of a large-scale Japanese speech database and its management system
- Author
-
T. Watanabe, Yoshinori Sagisaka, Hisao Kuwabara, Shigeru Katagiri, Kazuya Takeda, and S. Morikawa
- Subjects
SIMPLE (military communications protocol) ,Database ,Computer science ,business.industry ,Speech recognition ,Speech corpus ,Speech synthesis ,computer.software_genre ,ComputingMethodologies_PATTERNRECOGNITION ,Transcription (linguistics) ,Management system ,Information system ,Artificial intelligence ,Scale (map) ,business ,computer ,Natural language processing - Abstract
A large-scale Japanese speech database is described. It consists of (1) an isolated-word speech database and (2) a continuous-speech database. To facilitate usage for speech research, multiple transcriptions have been made in five different layers from a simple phonemic description to fine acoustic-phonetic transcriptions. A database management system has also been developed to establish an effective link between speech data and label data for easy access to both.
- Published
- 2003
- Full Text
- View/download PDF
182. A new learning algorithm for minimizing spotting errors
- Author
-
Takashi Komori and Shigeru Katagiri
- Subjects
Scheme (programming language) ,Artificial neural network ,business.industry ,Computer science ,Speech recognition ,Pattern recognition ,Spotting ,Task (project management) ,Algorithm design ,Artificial intelligence ,business ,Design methods ,Hidden Markov model ,Algorithm ,computer ,Word (computer architecture) ,computer.programming_language - Abstract
A new learning algorithm, called minimum spotting error formalization (MSPE), is proposed for designing a high performance word spotting system. An overall spotting system, comprising word models and decision thresholds, primarily needs to be optimized to minimize all spotting errors; the word models and the thresholds should no longer be separately and heuristically designed. MSPE features a rigorous framework for reducing a spotting error objective in a practical, gradient search-based design scheme. Experimental results in a Japanese consonant spotting task clearly demonstrate the usefulness of the proposed method.
- Published
- 2002
- Full Text
- View/download PDF
183. Discriminative feature extraction for speech recognition
- Author
-
A. Biem, Biing-Hwang Juang, and Shigeru Katagiri
- Subjects
business.industry ,Computer science ,Discriminative feature extraction ,Speech recognition ,Feature extraction ,Probabilistic logic ,Pattern recognition ,k-nearest neighbors algorithm ,Extractor ,ComputingMethodologies_PATTERNRECOGNITION ,Cepstrum ,Systems design ,Artificial intelligence ,business ,Classifier (UML) - Abstract
A novel approach to pattern recognition, called discriminative feature extraction (DFE), is introduced as a way to interactively handle the input data with a given classifier. The entire recognizer, consisting of the feature extractor as well as the classifier, is trained with the minimum classification error/generalized probabilistic descent learning algorithm. Both the philosophy and implementation examples of this approach are described. DFE realizes a significant departure from conventional approaches, providing a comprehensive base for the entire system design. By way of example, an automatic scaling process is described, and experimental results for designing a cepstrum representation for vowel recognition are presented.
- Published
- 2002
- Full Text
- View/download PDF
184. A telephone-based directory assistance system adaptively trained using minimum classification error/generalized probabilistic descent
- Author
-
E.A. Woudenberg, Erik McDermott, and Shigeru Katagiri
- Subjects
Training set ,Artificial neural network ,business.industry ,Computer science ,Speech recognition ,Probabilistic logic ,Artificial intelligence ,business ,Hidden Markov model ,Machine learning ,computer.software_genre ,computer ,Directory assistance - Abstract
The minimum classification error/generalized probabilistic descent (MCE/GPD) framework has been applied to several recognizer frameworks, such as hidden Markov models, prototype based systems, and systems based on artificial neural networks. However, to our knowledge, the MCE/GPD framework has not yet been applied to a working online speech recognition system in a realistic application environment. We describe the application of MCE/GPD to a telephone-based multi-speaker speech recognition system that accepts spoken Japanese names and forwards calls to any of up to 400 staff members. Points of interest include the automatic collection and labeling of new training data and the use of MCE/GPD training to improve recognizer performance.
- Published
- 2002
- Full Text
- View/download PDF
185. Discriminative feature extraction application to filter bank design
- Author
-
A. Biem, Shigeru Katagiri, and Erik McDermott
- Subjects
Artificial neural network ,business.industry ,Computer science ,Speech recognition ,Feature extraction ,Probabilistic logic ,Pattern recognition ,Filter bank ,Backpropagation ,ComputingMethodologies_PATTERNRECOGNITION ,Word recognition ,Artificial intelligence ,business ,Classifier (UML) ,Directory assistance - Abstract
This paper investigates the design of a filter bank model by the discriminative feature extraction method (DFE). A filter bank-based feature extractor is optimized with the classifier's parameters for the minimization of the errors occurring at the back-end classification process. The framework of the minimum classification error/generalized probabilistic descent method (MCE/GPD) is used as the basis for optimization. The method is first tested in a vowel recognition task. Analysis of the process shows how DFE extracts those parts of the spectrum that are relevant to discrimination. Then the method is applied to a multi-speaker word recognition system intended to act as a telephone directory assistance operator.
- Published
- 2002
- Full Text
- View/download PDF
186. Minimum error training for speech recognition
- Author
-
Shigeru Katagiri and Erik McDermott
- Subjects
Learning vector quantization ,Computer science ,business.industry ,Speech recognition ,Vector quantization ,Word error rate ,Pattern recognition ,Spotting ,Margin classifier ,Feedforward neural network ,Artificial intelligence ,Hidden Markov model ,business ,Classifier (UML) - Abstract
In recent years several research groups have investigated the use of a new framework for minimizing the error rate of a classifier. The key idea is to define a smooth, differentiable loss function that incorporates all adaptable classifier parameters and that approximates the (non-smooth) actual performance error rate. This framework is applicable to a variety of classifier structures, including feedforward neural networks, learning vector quantization classifiers, and hidden Markov models. Here we describe a particular application in which a relatively simple distance-based classifier is trained to minimize errors in speech recognition tasks. The loss function is defined so as to reflect errors at the level of the final, grammar-driven recognition output. We show how the loss function can be made to reflect not just correctness/incorrectness at the string level, but also, for instance, a word spotting loss between the recognized string and the correct string. Thus, minimization of this loss can explicitly optimize the word spotting rate.
- Published
- 2002
- Full Text
- View/download PDF
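The smooth loss idea in the abstract above can be sketched for a simple distance-based classifier. This is a minimal illustration, not the paper's system: the sigmoid smoothness, learning rate, softmax sharpness, and the numerical gradient are all assumptions chosen for clarity.

```python
import numpy as np

def misclassification_measure(x, prototypes, label, psi=4.0):
    """d(x) > 0 means x is misclassified; rivals combined by a smooth max."""
    d2 = np.sum((prototypes - x) ** 2, axis=1)      # squared distance to each class prototype
    g = -d2                                         # discriminant: larger is better
    rivals = np.delete(g, label)
    g_rival = np.log(np.mean(np.exp(psi * rivals))) / psi   # log-sum-exp smooth max
    return -g[label] + g_rival

def mce_loss(d, alpha=1.0):
    """Smooth, differentiable surrogate of the 0-1 classification error."""
    return 1.0 / (1.0 + np.exp(-alpha * d))

def gpd_step(x, prototypes, label, lr=0.05, alpha=1.0):
    """One probabilistic-descent update, via a numerical gradient for clarity."""
    grad = np.zeros_like(prototypes)
    eps = 1e-5
    base = mce_loss(misclassification_measure(x, prototypes, label), alpha)
    for i in range(prototypes.shape[0]):
        for j in range(prototypes.shape[1]):
            p = prototypes.copy()
            p[i, j] += eps
            grad[i, j] = (mce_loss(misclassification_measure(x, p, label), alpha) - base) / eps
    return prototypes - lr * grad
```

Because the sigmoid saturates, samples far from the decision boundary contribute little gradient; training effort concentrates on the boundary region, which is what makes the smoothed count behave like the actual error rate.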
187. Filter bank design based on discriminative feature extraction
- Author
-
A. Biem and Shigeru Katagiri
- Subjects
business.industry ,Computer science ,Feature extraction ,Probabilistic logic ,Pattern recognition ,Filter (signal processing) ,Speech processing ,Filter bank ,ComputingMethodologies_PATTERNRECOGNITION ,Band-pass filter ,Computer Science::Sound ,Artificial intelligence ,business ,Classifier (UML) - Abstract
A filter bank model, which achieves minimum error, is investigated in this paper. A bank-of-filter feature extractor module is comprehensively optimized with the classifier's parameters for minimization of the errors occurring at the back-end classifier. The method has been applied to readjusting Mel-scale and Bark-scale based filter banks for the Japanese vowel recognition task, the framework being provided by the minimum classification error (MCE)/generalized probabilistic descent (GPD) method. The results show suggestive phenomena underlying the accuracy of the proposed approach.
- Published
- 2002
- Full Text
- View/download PDF
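The filter-bank parameters that DFE adjusts in the entry above can be made concrete with a small sketch. This is illustrative only: the triangular filter shape, the parameter names, and the choice of FFT-bin units are assumptions, not the paper's configuration. Center frequencies and bandwidths are the free parameters a DFE-style training loop would move via the classification-error gradient:

```python
import numpy as np

def triangular_filterbank(centers, bandwidths, n_fft_bins):
    """Build triangular filters (one per row) over FFT bin indices.
    centers and bandwidths are the adjustable parameters."""
    bins = np.arange(n_fft_bins, dtype=float)
    fb = np.zeros((len(centers), n_fft_bins))
    for k, (c, w) in enumerate(zip(centers, bandwidths)):
        fb[k] = np.maximum(0.0, 1.0 - np.abs(bins - c) / w)
    return fb

def logfbank(power_spectrum, fb, floor=1e-10):
    """Log filter-bank energies: the features fed to the back-end classifier."""
    return np.log(np.maximum(fb @ power_spectrum, floor))
```

A conventional Mel- or Bark-scale bank fixes the centers and bandwidths once; the DFE approach instead treats them as trainable, so the spectrum regions that matter for discrimination get finer coverage.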
188. Minimum error classification of keyword-sequences
- Author
-
Shigeru Katagiri and Takashi Komori
- Subjects
Computer science ,business.industry ,Speech recognition ,Keyword spotting ,Process (computing) ,Contrast (statistics) ,Pattern recognition ,Artificial intelligence ,business - Abstract
A novel spotter design method, i.e., minimum error classification of keyword-sequences (MECK), is proposed. In contrast with conventional approaches, the proposed method directly aims at reducing errors of classifying keyword-sequences (strings of prescribed keyword categories) through a mathematically proven, GPD-based optimization process. Experiments in Japanese keyword spotting tasks clearly demonstrate the utility of a MECK-trained, prototype-based spotter.
- Published
- 2002
- Full Text
- View/download PDF
189. Discriminative multi-layer feed-forward networks
- Author
-
Shigeru Katagiri, Biing-Hwang Juang, and Chin-Hui Lee
- Subjects
business.industry ,Computer science ,Time delay neural network ,Probabilistic logic ,Pattern recognition ,Perceptron ,computer.software_genre ,Probabilistic neural network ,Discriminative model ,Search algorithm ,Multilayer perceptron ,Artificial intelligence ,Data mining ,business ,Classifier (UML) ,computer - Abstract
The authors propose a new family of multi-layer, feed-forward network (FFN) architectures. This framework allows examination of several feed-forward networks, including the well-known multi-layer perceptron (MLP) network, the likelihood network (LNET) and the distance network (DNET), in a unified manner. They then introduce a novel formulation which embeds network parameters into a functional form of the classifier design objective so that the network's parameters can be adjusted by gradient search algorithms, such as the generalized probabilistic descent (GPD) method. They evaluate several discriminative three-layer networks by performing a pattern classification task. They demonstrate that the performance of a network can be significantly improved when discriminative formulations are incorporated into the design of the pattern classification networks.
- Published
- 2002
- Full Text
- View/download PDF
190. New discriminative training algorithms based on the generalized probabilistic descent method
- Author
-
Biing-Hwang Juang, Shigeru Katagiri, and C.-H. Lee
- Subjects
Normalization (statistics) ,Dynamic time warping ,Artificial neural network ,Computer science ,business.industry ,Probabilistic logic ,Pattern recognition ,Discriminative model ,Computer Science::Sound ,Embedding ,Artificial intelligence ,Hidden Markov model ,business ,Algorithm ,Classifier (UML) - Abstract
The authors developed a generalized probabilistic descent (GPD) method by extending the classical theory on adaptive training by Amari (1967). Their generalization makes it possible to treat dynamic patterns (of a variable duration or dimension) such as speech as well as static patterns (of a fixed duration or dimension), for pattern classification problems. The key ideas of GPD formulations include the embedding of time normalization and the incorporation of smooth classification error functions into the gradient search optimization objectives. As a result, a family of new discriminative training algorithms can be rigorously formulated for various kinds of classifier frameworks, including the popular dynamic time warping (DTW) and hidden Markov model (HMM). Experimental results are also provided to show the superiority of this new family of GPD-based, adaptive training algorithms for speech recognition.
- Published
- 2002
- Full Text
- View/download PDF
191. A minimum error approach to speech and pattern recognition
- Author
-
Shigeru Katagiri and Biing-Hwang Juang
- Subjects
Error function ,Estimation theory ,business.industry ,Search algorithm ,Computer science ,Classification rule ,Feature (machine learning) ,Bayes error rate ,Word error rate ,Pattern recognition ,Artificial intelligence ,Linear discriminant analysis ,business - Abstract
The authors present a new formulation of the pattern recognition problem, aimed at achieving a minimum error rate classification. The classical discriminant analysis methodology is blended with the classification rule (traditionally expressed in an operational form) in a new functional form and is used as the design objective criterion to be optimized by numerical search algorithms. The new formulation results in a smooth error function which approximates the empirical error rate for the design sample set arbitrarily closely. The authors have applied the minimum error formulation to several recognition tasks and demonstrated the advantages of the proposed method. In a speech recognition experiment involving the English E-set vocabulary, it was demonstrated that the proposed minimum error method achieves the best recognition performance. It is concluded that the proposed learning method and formulation provide a solid analytical ground for the long-standing minimum error classifier design problem.
- Published
- 2002
- Full Text
- View/download PDF
192. A new HMM/LVQ hybrid algorithm for speech recognition
- Author
-
Chin-Hui Lee and Shigeru Katagiri
- Subjects
Normalization (statistics) ,Vocabulary ,Acoustics and Ultrasonics ,Computer science ,Speech recognition ,media_common.quotation_subject ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Markov process ,symbols.namesake ,Arts and Humanities (miscellaneous) ,Discriminative model ,Hidden Markov model ,media_common ,Learning vector quantization ,Artificial neural network ,business.industry ,Codebook ,Probabilistic logic ,Vector quantization ,Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing) ,Pattern recognition ,ComputingMethodologies_PATTERNRECOGNITION ,Computer Science::Sound ,symbols ,Adaptive learning ,Artificial intelligence ,business ,Classifier (UML) - Abstract
A new HMM/LVQ hybrid algorithm for speech recognition is proposed. The motivations are: (1) to expand the capability of learning vector quantization (LVQ) for handling dynamic speech patterns and (2) to improve the performance of an HMM‐based system. It is shown that, by combining both the discriminative power of LVQ and the capability of modeling temporal variations of speech of an HMM into a hybrid algorithm, the performance of the original HMM‐based speech recognition algorithm is significantly improved. The proposed recognition algorithm uses HMM to segment speech utterances and then adopts a novel classifier in place of the conventional HMM likelihood comparison for recognition. Since the parameters of the classifier can be estimated through adaptive learning rules, the discriminative power of the recognizer is greatly enhanced. Any learnable classifier, such as an artificial neural network (ANN), can be used for the discriminative classifier. By way of example, an LVQ‐based multicategory classifier is used in this study. The LVQ codebook is obtained through a probabilistic descent method using segmented and normalized speech tokens as training samples. The evaluation was conducted using a multispeaker, isolated English E‐set letter database. The average word accuracy for the original HMM‐based system was 61.7%. When the LVQ classifier was incorporated into the hybrid algorithm, the word accuracy increased to 81.3%.
- Published
- 2002
- Full Text
- View/download PDF
193. A hybrid speech recognition system using HMMs with an LVQ-trained codebook
- Author
-
Hitoshi Iwamida, Erik McDermott, and Shigeru Katagiri
- Subjects
Linde–Buzo–Gray algorithm ,Learning vector quantization ,Vocabulary ,business.industry ,Computer science ,Quantization (signal processing) ,media_common.quotation_subject ,Speech recognition ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Codebook ,Vector quantization ,Markov process ,Pattern recognition ,Speaker recognition ,Markov model ,symbols.namesake ,ComputingMethodologies_PATTERNRECOGNITION ,symbols ,Artificial intelligence ,business ,Hidden Markov model ,Signature recognition ,Utterance ,media_common - Abstract
A speech recognition system using the neurally inspired learning vector quantization (LVQ) to train hidden Markov model (HMM) codebooks is described. Both LVQ and HMMs are stochastic algorithms holding considerable promise for speech recognition. In particular, LVQ is a vector quantizer with very powerful classification ability. HMMs, on the other hand, have the advantage that phone models can easily be concatenated to produce long utterance models, such as word or sentence models. The algorithm described combines the advantages inherent in each of these two algorithms. As the result of phoneme recognition experiments using a large vocabulary database of 5240 common Japanese words uttered in isolation by a male speaker, it is confirmed that the high discriminant ability of LVQ could be integrated into an HMM architecture easily extendible to longer utterance models.
- Published
- 2002
- Full Text
- View/download PDF
194. Sound monitoring based on the generalized probabilistic descent method
- Author
-
S. Tanaka, H. Watanabe, Shigeru Katagiri, and Y. Matsumoto
- Subjects
Sound (medical instrument) ,Discriminative model ,Artificial neural network ,Computer science ,business.industry ,Speech recognition ,Pattern recognition (psychology) ,Probabilistic logic ,Pattern recognition ,Artificial intelligence ,business ,Descent (mathematics) ,Task (project management) - Abstract
We propose a method for sound monitoring, which enables one to selectively detect unexpected irregular sounds and ignore the other sounds, i.e., regular sounds. The proposed method is based on the generalized probabilistic descent (GPD) method, which was originally developed as a general concept for the discriminative design of pattern recognizers, and is referred to as minimum detection error (MDE) training. The formulation and implementation of MDE training are described in detail, and its utility is demonstrated in a task of detecting irregular events; more specifically, sounds due to the mis-operation of a tool in a noisy environment.
- Published
- 2002
- Full Text
- View/download PDF
195. Minimum detection error training for acoustic signal monitoring
- Author
-
Y. Matsumoto, Shigeru Katagiri, and H. Watanabe
- Subjects
Noise ,Artificial neural network ,Discriminative model ,Computer science ,Speech recognition ,Training (meteorology) ,Probabilistic logic ,ComputerSystemsOrganization_SPECIAL-PURPOSEANDAPPLICATION-BASEDSYSTEMS ,Signal monitoring ,Descent (mathematics) - Abstract
In this paper we propose a novel approach to the detection of acoustic irregular signals using minimum detection error (MDE) training. The MDE training is based on the generalized probabilistic descent method, which was originally developed as a general concept for a discriminative pattern recognizer design. We demonstrate its fundamental utility by experiments in which several acoustic events are detected in a noisy environment.
- Published
- 2002
- Full Text
- View/download PDF
196. HMM speech recognizer based on discriminative metric design
- Author
-
Shigeru Katagiri and H. Watanabe
- Subjects
business.industry ,Computer science ,Speech recognition ,Gaussian ,Feature extraction ,Pattern recognition ,Speaker recognition ,Speech processing ,symbols.namesake ,ComputingMethodologies_PATTERNRECOGNITION ,Discriminative model ,Robustness (computer science) ,Word recognition ,Feature (machine learning) ,symbols ,Artificial intelligence ,business ,Hidden Markov model ,Gaussian process - Abstract
We apply discriminative metric design (DMD), the general methodology of discriminative class-feature design, to a speech recognizer using a hidden Markov model (HMM) classification. This implementation enables one to represent the salient feature of each acoustic unit that is essential for recognition decision, and accordingly enhances robustness against irrelevant pattern variations. We demonstrate its high utility in experiments on speaker-dependent Japanese word recognition using linear feature extractors and mixture Gaussian HMMs. Furthermore, we summarize several other proposed design methods related to our DMD and show that they are special implementations of the DMD concept.
- Published
- 2002
- Full Text
- View/download PDF
197. Efficient normalization based upon GPD [generalized probabilistic descent]
- Author
-
E.A. Woudenberg, A. Biem, Erik McDermott, and Shigeru Katagiri
- Subjects
Normalization (statistics) ,Artificial neural network ,business.industry ,Computer science ,Feature extraction ,Probabilistic logic ,Pattern recognition ,Machine learning ,computer.software_genre ,Speech processing ,Adaptive filter ,Discriminative model ,Artificial intelligence ,business ,Classifier (UML) ,computer - Abstract
We propose a simple but powerful method for normalizing various sources of mismatch between training and testing conditions in speech recognizers, based on a training methodology called the generalized probabilistic descent method (GPD). In this new framework, a gradient based method is used to adapt the parameters of the feature extraction process in order to minimize the distortion between new speech data and existing classifier models, while most conventional normalization/adaptation methods attempt to adapt classification parameters. The GPD was proposed as a general discriminative training method for pattern recognizers such as neural networks. Until now, it has been used only for classifier design, sometimes in combination with the design of a non-adaptive feature extractor. This paper, in contrast, studies the adaptive training benefits of GPD in the framework of normalizing the feature extractor to a new pattern environment. Experiments which use this technique to improve Japanese vowel classification were conducted and demonstrate the ability to reduce error rates by as much as 40%.
- Published
- 2002
- Full Text
- View/download PDF
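The idea above, adapting the front end while the classifier stays fixed, can be sketched for the simplest possible normalizer, an affine transform of the features. This is a toy illustration under stated assumptions (a prototype classifier, a sigmoid loss, a numerical gradient); none of these names or constants come from the paper:

```python
import numpy as np

def classifier_loss(z, prototypes, label, alpha=1.0):
    """Fixed back-end classifier: smooth loss on normalized features z."""
    d2 = np.sum((prototypes - z) ** 2, axis=1)
    g = -d2
    rivals = np.delete(g, label)
    d = -g[label] + np.max(rivals)          # misclassification measure
    return 1.0 / (1.0 + np.exp(-alpha * d))

def adapt_normalizer(x, prototypes, label, a, b, lr=0.1):
    """One gradient step on the normalizer parameters (scale a, bias b).
    Only (a, b) move; the classifier prototypes stay fixed."""
    eps = 1e-5
    def loss(a_, b_):
        return classifier_loss(a_ * x + b_, prototypes, label)
    base = loss(a, b)
    ga = (loss(a + eps, b) - base) / eps
    gb = (loss(a, b + eps) - base) / eps
    return a - lr * ga, b - lr * gb
```

Iterating this step on mismatched data (for example, a channel-shifted utterance) drives the transformed features back toward the regions the fixed models expect, which is the normalization effect the entry describes.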
198. Cepstrum-based filter-bank design using discriminative feature extraction training at various levels
- Author
-
A. Biem and Shigeru Katagiri
- Subjects
business.industry ,Iterative method ,Computer science ,Estimation theory ,Speech recognition ,Bandwidth (signal processing) ,Feature extraction ,Pattern recognition ,Filter bank ,Speech processing ,Band-pass filter ,Computer Science::Sound ,Cepstrum ,Artificial intelligence ,Center frequency ,business - Abstract
This paper investigates the realization of optimal filter bank-based cepstral parameters. The framework is the discriminative feature extraction method (DFE) which iteratively estimates the filter-bank parameters according to the errors that the system makes. Various parameters of the filter-bank, such as center frequency, bandwidth, and gain are optimized using a string-level optimization and a frame-level optimization scheme. Application to vowel and noisy telephone speech recognition tasks shows that the DFE method realizes a more robust classifier by appropriate feature extraction.
- Published
- 2002
- Full Text
- View/download PDF
199. A novel approach to pattern recognition based on discriminative metric design
- Author
-
T. Yamaguchi, H. Watanabe, and Shigeru Katagiri
- Subjects
Learning vector quantization ,business.industry ,Probabilistic logic ,Pattern recognition ,Machine learning ,computer.software_genre ,ComputingMethodologies_PATTERNRECOGNITION ,Quadratic equation ,Discriminant ,Discriminative model ,Artificial intelligence ,Hidden Markov model ,business ,computer ,Subspace topology ,Signature recognition ,Mathematics - Abstract
This paper proposes a novel approach, named discriminative metric design (DMD), to pattern recognition. DMD optimizes the whole metrics of discriminant functions with the minimum classification error/generalized probabilistic descent method (MCE/GPD) such that the intrinsic features of each pattern class can be represented efficiently. The resulting metrics lead accordingly to robust recognizers. DMD is quite general. Several existing methods, such as learning vector quantization, subspace method, discriminative feature extraction, radial-basis function network, and the continuous hidden Markov model, are defined as its special cases. Among the many possibilities, this paper specifically elaborates the DMD formulation for recognizing fixed dimensional patterns using quadratic discriminant functions, and clearly demonstrates its utility in a speaker-independent Japanese vowel recognition task.
- Published
- 2002
- Full Text
- View/download PDF
200. Discriminative metric design for pattern recognition
- Author
-
T. Yamaguchi, Shigeru Katagiri, and H. Watanabe
- Subjects
Learning vector quantization ,business.industry ,Speech recognition ,Feature extraction ,Probabilistic logic ,Vector quantization ,Pattern recognition ,Discriminative model ,Discriminant ,Robustness (computer science) ,Artificial intelligence ,business ,Hidden Markov model ,Mathematics - Abstract
This paper proposes a new approach, named discriminative metric design (DMD), to pattern recognition. DMD optimizes discriminant functions with the minimum classification error/generalized probabilistic descent method (MCE/GPD) such that intrinsic features of each pattern class can be represented efficiently. Resulting metrics accordingly lead to robust recognizers. The DMD is quite general. Several existing methods, such as learning vector quantization and the continuous hidden Markov model, are defined as its special cases. The paper specially elaborates the DMD formulation for the quadratic discriminant function, and clearly demonstrates its utility in a speaker-independent Japanese vowel recognition task.
- Published
- 2002
- Full Text
- View/download PDF