48 results for "Tsuneo Nitta"
Search Results
2. New Grapheme Generation Rules for Two-Stage Model-Based Grapheme-to-Phoneme Conversion
- Author
- Seng Kheang, Tsuneo Nitta, Kouichi Katsurada, and Yurie Iribe
- Subjects
- Consonant, Computer science, Speech recognition, Grapheme, Speech synthesis, Pronunciation, Vowel, Artificial intelligence, Document retrieval, Word (computer architecture), Natural language processing
- Abstract
The precise conversion of arbitrary text into its corresponding phoneme sequence (grapheme-to-phoneme or G2P conversion) is implemented in speech synthesis and recognition, pronunciation learning software, spoken term detection and spoken document retrieval systems. Because the quality of this module plays an important role in the performance of such systems and many problems regarding G2P conversion have been reported, we propose a novel two-stage model-based approach, which is implemented using an existing weighted finite-state transducer-based G2P conversion framework, to improve the performance of the G2P conversion model. The first-stage model is built for automatic conversion of words to phonemes, while the second-stage model utilizes the input graphemes and output phonemes obtained from the first stage to determine the best final output phoneme sequence. Additionally, we designed new grapheme generation rules, which enable extra detail for the vowel and consonant graphemes appearing within a word. When compared with previous approaches, the evaluation results indicate that our approach using rules focusing on the vowel graphemes slightly improved the accuracy of the out-of-vocabulary dataset and consistently increased the accuracy of the in-vocabulary dataset.
- Published
- 2014
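The two-stage decision described in the abstract can be sketched in a few lines. This is a toy illustration of the control flow only, not the authors' WFST-based implementation; the grapheme and correction tables below are invented for demonstration:

```python
# Toy sketch of two-stage G2P: stage 1 converts each grapheme to a
# phoneme without context; stage 2 re-decides each phoneme from the
# input grapheme plus the stage-1 phonemes around it.
STAGE1 = {"p": "P", "h": "HH", "o": "OW", "n": "N", "e": "EH"}

def first_stage(word):
    """Context-free grapheme -> phoneme lookup."""
    return [STAGE1.get(g, "?") for g in word]

# (grapheme, previous stage-1 phoneme, next stage-1 phoneme) -> fix.
STAGE2 = {
    ("p", "#", "HH"): "F",   # "ph" digraph realised as F
    ("h", "P", "OW"): "-",   # the h of "ph" is silent
    ("e", "N", "#"): "-",    # silent final e
}

def second_stage(word, phones):
    """Re-decide each phoneme using grapheme + phoneme context."""
    out = []
    for i, g in enumerate(word):
        prev = phones[i - 1] if i > 0 else "#"
        nxt = phones[i + 1] if i + 1 < len(phones) else "#"
        out.append(STAGE2.get((g, prev, nxt), phones[i]))
    return [p for p in out if p != "-"]

phones = second_stage("phone", first_stage("phone"))
print(phones)   # ['F', 'OW', 'N']
```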
3. Task Estimation Using Latent Semantic Analysis of Visual Scenes and Spoken Words
- Author
- Tsuneo Nitta, Masashi Kimura, Shinta Sawada, Yurie Iribe, and Kouichi Katsurada
- Subjects
- Estimation, Thesaurus (information retrieval), Modality (human–computer interaction), Computer science, Latent semantic analysis, Speech recognition, Linear subspace, Task (project management), Image (mathematics), Identification (information), Artificial intelligence, Natural language processing
- Abstract
In this paper, we propose a task estimation method based on multiple subspaces extracted from multimodal information: image objects in visual scenes and spoken words in dialogue appearing in the same task. The multiple subspaces are obtained using latent semantic analysis (LSA). In the proposed method, a task vector composed of spoken words and the frequencies of image-object appearances is extracted first, and then the similarities between the input task vector and the reference subspaces of different tasks are compared. Experiments are conducted on the identification of game tasks. The experimental results show that the proposed method with multimodal information outperforms methods in which only the single modality of image or spoken dialogue is applied. The proposed method achieves accurate performance even when less spoken dialogue is available.
- Published
- 2014
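The subspace comparison sketched in the abstract can be illustrated with a few lines of linear algebra. This is a hedged sketch: the tiny frequency matrices, the subspace rank, and the projection-length similarity are all assumptions for demonstration, not the paper's data or exact scoring:

```python
import numpy as np

# Each task's reference vectors (word / image-object frequencies) are
# reduced to a low-rank subspace with SVD (the LSA step); an input
# task vector is assigned to the task whose subspace it projects onto
# most strongly.  The frequency matrices below are fabricated.

def task_subspace(samples, rank=1):
    """Columns of U spanning the top-`rank` subspace of a task."""
    u, _, _ = np.linalg.svd(np.asarray(samples, float).T, full_matrices=False)
    return u[:, :rank]

def similarity(vec, basis):
    """Length of the projection of `vec` onto the subspace."""
    v = np.asarray(vec, float)
    return float(np.linalg.norm(basis.T @ v) / np.linalg.norm(v))

# Two hypothetical tasks over a 4-dim feature space
# (e.g. counts of two spoken words + two image objects).
task_a = task_subspace([[3, 1, 4, 0], [4, 0, 5, 1]])
task_b = task_subspace([[0, 5, 1, 4], [1, 4, 0, 5]])

query = [3, 1, 5, 0]   # resembles the task-A samples
best = max([("A", task_a), ("B", task_b)],
           key=lambda t: similarity(query, t[1]))[0]
print(best)   # A
```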
4. Solving the Phoneme Conflict in Grapheme-to-Phoneme Conversion Using a Two-Stage Neural Network-Based Approach
- Author
- Seng Kheang, Tsuneo Nitta, Kouichi Katsurada, and Yurie Iribe
- Subjects
- Artificial neural network, Computer science, Speech recognition, American English, Phonetic transcription, Grapheme, Context (language use), Speech synthesis, Pronunciation, Artificial intelligence, Word (computer architecture), Natural language processing
- Abstract
To achieve high-quality output from speech synthesis systems, data-driven grapheme-to-phoneme (G2P) conversion is usually used to generate the phonetic transcription of out-of-vocabulary (OOV) words. To improve the performance of G2P conversion, this paper deals with the problem of conflicting phonemes, where an input grapheme can, in the same context, produce many possible output phonemes at the same time. To this end, we propose a two-stage neural network-based approach that converts the input text to phoneme sequences in the first stage and then predicts each output phoneme in the second stage using the phonemic information obtained. The first-stage neural network is fundamentally implemented as a many-to-many mapping model for automatic conversion of words to phoneme sequences, while the second stage uses a combination of the obtained phoneme sequences to predict the output phoneme corresponding to each input grapheme in a given word. We evaluate the performance of this approach using the American English pronunciation dictionary known as the auto-aligned CMUDict corpus [1]. In terms of phoneme and word accuracy on the OOV words, in comparison with several baseline approaches, the evaluation results show that our proposed approach improves on the previous one-stage neural network-based approach for G2P conversion. Comparison with another existing approach indicates that ours provides higher phoneme accuracy but lower word accuracy on a general dataset, and slightly higher phoneme and word accuracy on a selection of words consisting of more than one phoneme.
- Published
- 2014
5. Six-Layered Model for Multimodal Interaction Systems
- Author
- Kouichi Katsurada, Kazuyuki Ashimura, Masahiro Araki, and Tsuneo Nitta
- Subjects
- Computer architecture, Computer science, Container (abstract data type), Task control, Layered model, Granularity, Layer (object-oriented design), Application layer, Multimodal interaction
- Abstract
We have proposed a six-layered model for multimodal interaction (MMI) systems as an Information Technology Standards Commission of Japan (ITSCJ) standard. It specifies an architecture of an MMI system composed of six layers: application layer, task control layer, a-modal dialogue control, a-modal ⇔ multimodal conversion, modality-dependent layer, and input–output devices. The standard defines the role of each layer in an MMI system, its granularity, and the events transferred between the layers. The EMMA format is employed as the container of the input results. In this chapter, we introduce the outline of the proposed model and show its practical implementation as a Web-based MMI system.
- Published
- 2016
6. Phoneme Recognition Based on AF-HMMs with Optimal Parameter Set
- Author
- Yurie Iribe, Narpendyah W. Ariwardhani, Tsuneo Nitta, Masashi Kimura, and Kouichi Katsurada
- Subjects
- Set (abstract data type), Computer science, Phoneme recognition, Speech recognition, Feature (machine learning), Word error rate, Pattern recognition, Artificial intelligence, Hidden Markov model
- Published
- 2012
7. Generation of CG Animations Based on Articulatory Features for Pronunciation Training
- Author
- Tsuneo Nitta, Kouichi Katsurada, Takuro Mori, and Yurie Iribe
- Subjects
- Computer science, Speech recognition, Training, Artificial intelligence, Pronunciation, Computer animation, Natural language processing
- Published
- 2012
8. Learning Lexicons from Spoken Utterances Based on Statistical Model Selection
- Author
- Ryo Taguchi, Tsuneo Nitta, Mikio Nakano, Naoto Iwahashi, Takashi Nose, and Kotaro Funakoshi
- Subjects
- Computer science, Model selection, Speech recognition, Acoustic model, Statistical model, Language acquisition, Object (computer science), Lexicon, Unsupervised learning, Artificial intelligence, Utterance, Natural language processing
- Abstract
This paper proposes a method for the unsupervised learning of lexicons from pairs of a spoken utterance and an object as its meaning, without any a priori linguistic knowledge other than a phoneme acoustic model. In order to obtain a lexicon, a statistical model of the joint probability of a spoken utterance and an object is learned based on the minimum description length principle. This model consists of a list of word phoneme sequences and three statistical models: the phoneme acoustic model, a word-bigram model, and a word meaning model. Experimental results show that the method can acquire acoustically, grammatically, and semantically appropriate words with about 85% phoneme accuracy.
- Published
- 2010
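The minimum-description-length selection mentioned above can be illustrated with a toy two-part MDL score. The candidate statistics (parameter counts, log-likelihoods) are invented; only the shape of the trade-off between model cost and data fit is the point:

```python
import math

# Two-part MDL: total description length = model cost + data cost.
# The lexicon with the shortest total description is kept.

def description_length(num_params, log_likelihood, n_samples):
    # (k/2) log n model cost, minus the data log-likelihood.
    return 0.5 * num_params * math.log(n_samples) - log_likelihood

candidates = {          # lexicon size -> (num_params, log_likelihood)
    5:  (50,  -1300.0),
    10: (100, -1100.0),   # big fit gain: worth the extra parameters
    20: (200, -1090.0),   # tiny fit gain: penalised by model cost
}

n = 500   # number of utterance-object pairs
best = min(candidates, key=lambda k: description_length(*candidates[k], n))
print(best)   # 10
```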
9. Distinctive Phonetic Feature (DPF) Extraction Based on MLNs and Inhibition/Enhancement Network
- Author
- Tsuneo Nitta, Mohammad Nurul Huda, and Hiroaki Kawashima
- Subjects
- Phrase, Artificial neural network, Computer science, Speech recognition, Feature extraction, Phonetics, Context (language use), Pattern recognition, Speech processing, Classifier (linguistics), Feature (machine learning), Artificial intelligence, Hidden Markov model, Sentence
- Abstract
This paper describes a distinctive phonetic feature (DPF) extraction method for use in a phoneme recognition system; our method has a low computation cost. This method comprises three stages. The first stage uses two multilayer neural networks (MLNs): MLN_LF-DPF, which maps continuous acoustic features, or local features (LFs), onto discrete DPF features, and MLN_Dyn, which constrains the DPF context at the phoneme boundaries. The second stage incorporates inhibition/enhancement (In/En) functionalities to discriminate whether the DPF dynamic patterns of trajectories are convex or concave, where convex patterns are enhanced and concave patterns are inhibited. The third stage decorrelates the DPF vectors using the Gram-Schmidt orthogonalization procedure before feeding them into a hidden Markov model (HMM)-based classifier. In an experiment on Japanese Newspaper Article Sentences (JNAS) utterances, the proposed feature extractor, which incorporates two MLNs and an In/En network, was found to provide a higher phoneme correct rate with fewer mixture components in the HMMs.
- Published
- 2009
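One plausible reading of the convex/concave discrimination can be sketched with a discrete second derivative. The curvature test and the gain value are assumptions for illustration; the paper implements this step as a trained In/En network, not a fixed rule:

```python
import numpy as np

# Scan a DPF trajectory, boost samples where local curvature indicates
# a convex pattern (a peak) and attenuate samples where it indicates a
# concave pattern (a dip).

def in_en(traj, gain=0.5):
    x = np.asarray(traj, float)
    y = x.copy()
    for i in range(1, len(x) - 1):
        curv = x[i - 1] - 2.0 * x[i] + x[i + 1]   # discrete 2nd derivative
        if curv < 0:        # convex (peak): enhance
            y[i] = x[i] * (1.0 + gain)
        elif curv > 0:      # concave (dip): inhibit
            y[i] = x[i] * (1.0 - gain)
    return y

traj = [0.1, 0.9, 0.2, 0.8, 0.1]
out = in_en(traj)
print(out)   # peaks at indices 1 and 3 boosted, dip at index 2 reduced
```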
10. Canonicalization of Feature Parameters for Robust Speech Recognition Based on Distinctive Phonetic Feature (DPF) Vectors
- Author
- Kouichi Katsurada, Tsuneo Nitta, Mohammad Nurul Huda, Muhammad Ghulam, and Takashi Fukuda
- Subjects
- Computer science, Speech recognition, Gaussian, Feature extraction, Wiener filter, Pattern recognition, Speech processing, Robustness (computer science), Canonicalization, Mel-frequency cepstrum, Artificial intelligence, Hidden Markov model, Coarticulation, Gaussian process
- Abstract
This paper describes a robust automatic speech recognition (ASR) system with less computation. Acoustic models of a hidden Markov model (HMM)-based classifier include various types of hidden factors, such as speaker-specific characteristics, coarticulation, and the acoustic environment. If there exists a canonicalization process that can recover the margin of acoustic likelihoods between correct phonemes and other ones degraded by hidden factors, the robustness of ASR systems can be improved. In this paper, we introduce a canonicalization method that is composed of multiple distinctive phonetic feature (DPF) extractors, each corresponding to the canonicalization of one hidden factor, and a DPF selector that selects an optimum DPF vector as the input of the HMM-based classifier. The proposed method resolves gender factors and speaker variability, and eliminates noise factors, by applying canonicalization based on the DPF extractors and two-stage Wiener filtering. In the experiment on AURORA-2J, the proposed method provides higher word accuracy under clean training and a significant improvement of word accuracy at low signal-to-noise ratios (SNR) under multi-condition training, compared to a standard ASR system with mel-frequency cepstral coefficient (MFCC) parameters. Moreover, the proposed method requires only two-fifths as many Gaussian mixture components and less memory to achieve accurate ASR.
- Published
- 2008
11. A Method for Keyword Extraction Using Retrieval Information from Students in Lectures
- Author
- Kouichi Katsurada, Shuji Shinohara, Hiroaki Kawashima, Yurie Iribe, and Tsuneo Nitta
- Subjects
- Information retrieval, Computer science, Keyword extraction, Human–computer information retrieval, Artificial intelligence, Natural language processing
- Abstract
Recently, e-learning systems for self-learning with various types of retrieval functions have been developed. This paper describes a method for keyword extraction using retrieval information collected from many students through the retrieval functions. Firstly, we show that (1) teachers tend to consider technical terms as important, while students unfamiliar with the technical terms tend to retrieve those terms, and therefore (2) there is a clear correlation between the keywords extracted by the teachers and the retrieval words of the students. Secondly, we propose a method utilizing retrieval information from the students for keyword extraction, and show that the method achieves considerably better performance than a method extracting keywords using only lecture information.
- Published
- 2007
12. A Model of Belief Formation Based on Causality and Application to N-armed Bandit Problem
- Author
- Tsuneo Nitta, Ryo Taguchi, Shuji Shinohara, and Kouichi Katsurada
- Subjects
- Causality, Causal induction, Computer science, Belief formation, Artificial intelligence
- Published
- 2007
13. Efficient Learning of Word Meanings by Agents Using Biases Observed in Language Development of Children
- Author
- Masashi Kimura, Kouichi Katsurada, Shuji Shinohara, Satoshi Kodama, Yurie Iribe, Ryo Taguchi, and Tsuneo Nitta
- Subjects
- Spoken word, Computer science, Object (grammar), Conditional probability distribution, Symbol grounding, Feature (machine learning), Probability distribution, Artificial intelligence, Natural language processing
- Abstract
Recently, studies on the learning of word meanings by agents have begun. In these studies, a human shows objects to an agent and utters words such as "red" or "box". The agent finds out which object feature is represented by each spoken word. In our method, the agent first learns a probability distribution p(x) and a conditional probability distribution p(x|w), where x is an object feature and w is a word. If a word w does not represent a feature x, p(x) and p(x|w) will be almost the same distribution, because x is independent of w. This fact enables the agent to use the distance between p(x) and p(x|w) when inferring which feature the word represents. Previous works also employ similar stochastic approaches to detect the feature; however, such approaches need a lot of examples to learn the correct distributions.
- Published
- 2007
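The distance test described in the abstract can be sketched directly. KL divergence and the threshold are our choices for illustration; the abstract only states that a distance between p(x) and p(x|w) is used. The toy histograms over a discretised colour feature are invented:

```python
import math

# If a word w does not name feature x, p(x|w) stays close to p(x);
# a large divergence signals that w is grounded in x.

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p_x     = [0.25, 0.25, 0.25, 0.25]   # colour feature over all objects
p_x_red = [0.85, 0.05, 0.05, 0.05]   # colour given the word "red"
p_x_box = [0.24, 0.26, 0.25, 0.25]   # colour given the word "box"

# "red" moves the colour distribution a lot; "box" barely moves it,
# so only "red" is taken to describe the colour feature.
grounded = {w: kl(p, p_x) > 0.1
            for w, p in [("red", p_x_red), ("box", p_x_box)]}
print(grounded)   # {'red': True, 'box': False}
```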
14. On Autonomous Coordination of Learning Biases by an Agent with a Vocabulary Learning Mechanism
- Author
- Kouichi Katsurada, Tsuneo Nitta, Takashi Hashimoto, Shuji Shinohara, and Ryo Taguchi
- Subjects
- Error-driven learning, Human–computer interaction, Computer science, Artificial intelligence, Vocabulary learning, Natural language processing
- Published
- 2007
15. Bilinear map of filter-bank outputs for DNN-based speech recognition
- Author
- Tetsunori Kobayashi, Kenshiro Ueda, Tetsuji Ogawa, Tsuneo Nitta, and Kouichi Katsurada
- Subjects
- Computer science, Speech recognition, Tensor, Feature extraction, Pattern recognition, Artificial intelligence, Bilinear map, Filter bank
- Published
- 2015
16. Interaction Builder: A Rapid Prototyping Tool for Developing Web-Based MMI Applications
- Author
- Hiroaki Adachi, Kunitoshi Sato, Hirobumi Yamada, Tsuneo Nitta, and Kouichi Katsurada
- Subjects
- Rapid prototyping, Computer science, Speech synthesis, Mode (computer interface), Human–computer interaction, Web application, Web navigation
- Abstract
We have developed Interaction Builder (IB), a rapid prototyping tool for constructing web-based Multi-Modal Interaction (MMI) applications. The goal of IB is to make it easy to develop MMI applications with speech recognition, life-like agents, speech synthesis, web browsing, etc. For this purpose, IB supports the following interface and functions: (1) a GUI for implementing MMI systems without knowing the details of MMI and the MMI description language, (2) functionality for handling synchronized multimodal inputs/outputs, and (3) a test-run mode for run-time testing. The results of evaluation tests showed that the application development cycle using IB was significantly shortened compared with development using a text editor, both for MMI description language experts and for beginners.
- Published
- 2005
17. Effect of frequency weighting on MLP-based speaker canonicalization
- Author
- Tetsunori Kobayashi, Motoi Omachi, Tsuneo Nitta, Yuichi Kubota, and Tetsuji Ogawa
- Subjects
- Computer science, Speech recognition, Multilayer perceptron, Feature extraction, Canonicalization, Pattern recognition, Artificial intelligence, Frequency weighting
- Published
- 2014
18. Introducing articulatory anchor-point to ANN training for corrective learning of pronunciation
- Author
- Tsuneo Nitta, Ryoko Hayashi, Yurie Iribe, Kouichi Katsurada, Silasak Manosavanh, and Chunyue Zhu
- Subjects
- Computer science, Speech recognition, Animation, Pronunciation, Artificial intelligence, Articulation (phonetics), Articulatory gestures, Vocal tract, Computer animation, Natural language processing, Gesture
- Abstract
We describe computer-assisted pronunciation training (CAPT) through the visualization of articulatory gestures from a learner's speech. Typical CAPT systems cannot indicate how the learner can correct his/her articulation. The proposed system enables learners to study how to correct their pronunciation by comparing a wrongly pronounced gesture with the correctly pronounced gesture. In this system, a multi-layer neural network (MLN) is used to convert the learner's speech into vocal tract coordinates, using Magnetic Resonance Imaging data. An animation is then generated from the vocal tract coordinate values. Moreover, we improved the animations by introducing a per-phoneme anchor-point into MLN training. In our experiments, the new system generated accurate CG animations even from English speech by Japanese speakers.
- Published
- 2013
19. Improvement of animated articulatory gesture extracted from speech for pronunciation training
- Author
- Chunyue Zhu, Silasak Manosavan, Yurie Iribe, Ryoko Hayashi, Kouichi Katsurada, and Tsuneo Nitta
- Subjects
- Computer science, Place of articulation, Speech recognition, Animation, Pronunciation, Gesture recognition, Artificial intelligence, Articulation (phonetics), Articulatory gestures, Vocal tract, Natural language processing, Computer animation
- Abstract
Computer-assisted pronunciation training (CAPT) has been introduced into language education in recent years. CAPT scores the learner's pronunciation quality and points out wrong phonemes by using speech recognition technology. However, although the learner can thus realize that his/her speech differs from the teacher's, the learner still cannot control the articulation organs to pronounce correctly, and cannot understand precisely how to correct the wrong articulatory gestures. We indicate these differences by visualizing the learner's wrong pronunciation movements and the correct pronunciation movements with CG animation. We propose a system for generating animated pronunciation by automatically estimating the learner's pronunciation movements from his/her speech. The proposed system maps speech to the coordinate values needed to generate the animations by using multilayer perceptron neural networks (MLP), and uses MRI data to generate smooth animated pronunciations. Additionally, we verify through experimental evaluation whether the vocal tract area and articulatory features are suitable as characteristics of pronunciation movement.
- Published
- 2012
20. Learning Physically Grounded Lexicons from Spoken Utterances
- Author
- Mikio Nakano, Ryo Taguchi, Kotaro Funakoshi, Takashi Nose, Naoto Iwahashi, and Tsuneo Nitta
- Subjects
- Computer science, Robot, Artificial intelligence, Object (philosophy), Natural language processing, Utterance, Word (computer architecture)
- Abstract
Service robots must understand correspondence relationships between things in the real world and words in order to communicate with humans. For example, to understand the utterance, "Bring me an apple," the robot requires knowledge about the relationship between the word "apple" and visual features of the apple, such as color and shape. Robots perceive object features with physical sensors. However, developers of service robots cannot describe all knowledge in advance because such robots may be used in situations other than those the developers assumed. In particular, household robots have many opportunities to encounter unknown objects. Therefore, it is preferable that robots automatically learn physically grounded lexicons, which consist of phoneme sequences and meanings of words, through interactions with users.
- Published
- 2012
21. A speaker-independent word recognition based on HMM using orthogonalized phonetic segment codebook
- Author
- Tsuneo Nitta and Hiroshi Matsuura
- Subjects
- Computer science, Quantization (signal processing), Speech recognition, Codebook, Vector quantization, Pattern recognition, Word recognition, Artificial intelligence, Hidden Markov model, Eigenvalues and eigenvectors, Smoothing, Subspace method
- Abstract
Matrix quantization (MQ) is a method that directly quantizes the spectrum-time pattern. However, its quantization error is relatively large compared to vector quantization (VQ), since the dimensionality is high and the codebook captures less of the pattern variation. From this viewpoint, this paper introduces an acoustic/phonetic structure called the phonetic segment as the unit of MQ. Statistical matrix quantization (SMQ) is applied to the calculation of the error measure, employing the subspace method, a statistical pattern recognition technique. The purpose of SMQ is to account effectively for the pattern variation by constructing an orthogonalized phonetic segment codebook based on the eigenvector set representing the pattern variation of each phonetic segment. The training of an HMM using the phonetic segment code sequence is also considered. K-best learning is proposed, in which the first through K-th phonetic segment sequences are handled equally. Even though K-best learning is much simpler than VQ, it has equal or better output-probability smoothing power, and can suppress the effect of errors in the conversion of speech into the phonetic segment code sequence. Using the SMQ/HMM + K-best learning method, a high speaker-independent word recognition performance of 96.0 percent is obtained on a 100-word data set containing similar word pairs uttered by 10 unknown speakers.
- Published
- 1994
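The subspace-method scoring behind the orthogonalized codebook can be sketched as follows: each segment class keeps a few eigenvectors of its training patterns, and the similarity of an input pattern is the cumulative squared projection onto that eigenvector set. The toy patterns, dimensionality, and rank are fabricated for illustration:

```python
import numpy as np

def segment_subspace(patterns, rank=2):
    """Top-`rank` eigenvectors of the class correlation matrix."""
    x = np.asarray(patterns, float)
    corr = x.T @ x / len(x)
    vals, vecs = np.linalg.eigh(corr)
    return vecs[:, np.argsort(vals)[::-1][:rank]]

def similarity(pattern, basis):
    """Cumulative squared projection of a normalised input pattern."""
    v = np.asarray(pattern, float)
    v = v / np.linalg.norm(v)
    return float(np.sum((basis.T @ v) ** 2))

rng = np.random.default_rng(0)
base_a = np.array([1.0, 0.0, 1.0, 0.0])   # prototype of segment class A
base_b = np.array([0.0, 1.0, 0.0, 1.0])   # prototype of segment class B
seg_a = segment_subspace([base_a + 0.1 * rng.standard_normal(4) for _ in range(20)])
seg_b = segment_subspace([base_b + 0.1 * rng.standard_normal(4) for _ in range(20)])

test = base_a + 0.1 * rng.standard_normal(4)   # noisy class-A pattern
label = "A" if similarity(test, seg_a) > similarity(test, seg_b) else "B"
print(label)   # A
```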
22. Generating animated pronunciation from speech through articulatory feature extraction
- Author
- Tsuneo Nitta, Silasak Manosavanh, Kouichi Katsurada, Chunyue Zhu, Ryoko Hayashi, and Yurie Iribe
- Subjects
- Computer science, Feature extraction, Artificial intelligence, Pronunciation, Natural language processing
- Published
- 2011
23. Evaluation of fast spoken term detection using a suffix array
- Author
- Kouichi Katsurada, Tsuneo Nitta, Shigeki Teshima, Yurie Iribe, and Shinta Sawada
- Subjects
- Computer science, Speech recognition, Suffix array, Artificial intelligence, Natural language processing, Term (time)
- Published
- 2011
24. Studies on the Establishment of a Field 'Farming-Type System' in Fertilizing-Seeding and Management of Carrot Cultivation
- Author
- Toshihiko Ibuki, Kazunari Tsuchiya, Hideo Tozawa, Takahiro Adachi, Masaaki Ikeda, and Tsuneo Nitta
- Subjects
- Tractor, Crop, Agronomy, Agriculture, Yield, Operation time, Seeding, Fertilizer
- Abstract
We established a new system of fertilizing-seeding and subsequent management of carrot cultivation, which is labor-saving, low-cost, and adaptable to large-scale field farming management in Hokkaido. The results obtained are as follows: 1. The system developed can be used for three processes: 1) fertilizing-seeding, 2) weeding, and 3) split dressing and molding in carrot cultivation. With this system, the traditional 8 processes (total operation time: 181.1 h/ha) were reduced to 3 processes (7.4 h/ha). Fertilizing and seeding were done by the newly developed trial machine. 2. The trial machine for fertilizing and seeding is adaptable to 4-row cultivation with a row width of 65 cm, and a tractor with a power of 60 PS or more is required for pulling it. By using this machine, the traditional 4-process management is reduced to one process, and the seeding performance is 3-4 ha per day. The belt-type device with an intra-row spacing of 5 cm (2 rows at 10 cm in width) was the most adequate seed metering device. By using this device with a control plate for the fertilizing band, fertilizer can be applied zonally: 95% of the fertilizer could be applied to the soil within 20 cm in width at 15-20 cm in depth. Therefore, remarkable fertilizer saving can be expected compared to the traditional system. 3. The same machines commonly used on main-crop farms can be used for spraying agricultural chemicals and for split dressing and molding. 4. The level of growth and yield of carrots in the new system is equal to or greater than that of the traditional system.
- Published
- 1993
25. Speaker independent speech recognition based on neural networks of each category with embedded eigenvectors
- Author
- Tsuneo Nitta, Hiroshi Matsuura, and Yasuyuki Masai
- Subjects
- Artificial neural network, Computer science, Speech recognition, Word error rate, Pattern recognition, Similarity measure, Speaker recognition, Word recognition, Artificial intelligence, Projection, Subspace method, Word (computer architecture)
- Abstract
This paper describes a speaker-independent word recognition algorithm that is based on four-layer neural networks with embedded eigenvectors. Eigenvectors from the subspace method (SM) are used as weights for the first hidden layer. The similarity measure given by SM is calculated by cumulative summation of the projection components of an input pattern onto a set of eigenvectors. In contrast, our new method evaluates each projection component to achieve better performance than SM. We propose the subspace training (SST) algorithm with SM and the decision-controlled back-propagation training (DCBPT) algorithm to improve recognition performance and to reduce training times. Training and recognition experiments were performed using a 26-word vocabulary consisting of train station names. The error rate was 1.3% using SM and was reduced to 0.7% using the combination of neural networks and SM.
- Published
- 1993
26. Facial Expression Mimicking System
- Author
- Kouichi Katsurada, Tsuneo Nitta, Ryuichi Fukui, and Yurie Iribe
- Subjects
- Facial expression, Computer science, Pattern recognition, Facial recognition system, Active appearance model, Face (geometry), Computer vision, Artificial intelligence
- Abstract
We propose a facial expression mimicking system that copies the facial expression of one person onto the image of another. The system uses the active appearance model (AAM), a commonly used model in the field of facial expression processing. AAM comprises parameters representing facial shape, brightness, and illumination environment; therefore, in addition to the facial expression elements, the model parameters express other elements such as individuality and the direction of the face. In order to extract the facial expression elements from the compositional parameters of AAM, we applied principal component analysis (PCA) to the AAM parameter values collected over changes in facial expression. The obtained facial expression model is applied to the facial expression mimicking system, and the experiment shows its effectiveness for mimicking.
- Published
- 2010
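The PCA step on AAM parameters can be illustrated on synthetic data. The 3-dimensional "AAM parameters" below are fabricated, with one dimension driven by an expression variable while identity and pose are held fixed, so the leading principal component recovers the expression axis:

```python
import numpy as np

# Parameter vectors collected while only the expression changes:
# dimension 0 follows an expression variable, the rest is small noise.
rng = np.random.default_rng(1)
expr = np.linspace(-1.0, 1.0, 40)                 # smile ... frown
params = np.column_stack([3.0 * expr,             # expression-driven dim
                          0.05 * rng.standard_normal(40),
                          0.05 * rng.standard_normal(40)])

# PCA via SVD of the mean-centred parameter matrix.
centered = params - params.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
axis = vt[0]                                      # 1st principal component

# The leading component is dominated by the expression dimension.
print(abs(axis[0]))   # close to 1.0
```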
27. Fast keyword detection using suffix array
- Author
- Tsuneo Nitta, Kouichi Katsurada, and Shigeki Teshima
- Subjects
- Compressed suffix array, Computer science, Generalized suffix tree, Suffix array, Pattern recognition, Data structure, Set (abstract data type), Search engine, Search algorithm, Artificial intelligence, FM-index
- Abstract
In this paper, we propose a technique for detecting keywords quickly in a very large speech database without using a large memory space. To accelerate searches and save memory, we use a suffix array as the data structure and apply phoneme-based DP-matching. To avoid an exponential increase in processing time with the length of the keyword, a long keyword is divided into short sub-keywords. Moreover, an iterative lengthening search algorithm is used to rapidly output accurate search results. The experimental results show that it takes less than 100 ms to detect the first set of search results from a 10,000-h virtual speech database.
- Published
- 2009
28. Learning Communicative Meanings of Utterances by Robots
- Author
-
Tsuneo Nitta, Ryo Taguchi, and Naoto Iwahashi
- Subjects
Interrogative word ,Communication ,business.industry ,Computer science ,Robot ,Graphical model ,Language acquisition ,business ,Human–robot interaction - Abstract
This paper describes a computational mechanism that enables a robot to return suitable utterances to a human or perform actions by learning the meanings of interrogative words, such as "what" and "which." Previous studies of language acquisition by robots have proposed methods to learn words, such as "box" and "blue," that indicate objects or events in the world. However, the robots could not learn and understand interrogative words by those methods because the words do not directly indicate objects or events. The meanings of those words are grounded in communication and stimulate specific responses by a listener. These are called communicative meanings. Our proposed method learns the relationship between human utterances and robot responses that have communicative meanings on the basis of a graphical model of the human-robot interaction.
- Published
- 2009
29. Distinctive phonetic feature (DPF) based phone segmentation using hybrid neural networks
- Author
-
Junsei Horikawa, Muhammad Ghulam, Mohammad Nurul Huda, and Tsuneo Nitta
- Subjects
Neural gas ,Artificial neural network ,Computer science ,Time delay neural network ,business.industry ,Speech recognition ,Feature extraction ,Speech synthesis ,Pattern recognition ,Speech processing ,computer.software_genre ,Recurrent neural network ,Feature (machine learning) ,Artificial intelligence ,Hidden Markov model ,business ,computer ,Utterance - Abstract
Segmentation of speech into its corresponding phones has become a very important issue in many speech processing areas such as speech recognition, speech analysis, speech synthesis, and speech databases. In this paper, for accurate segmentation in speech recognition applications, we introduce Distinctive Phonetic Feature (DPF) based feature extraction using a two-stage neural network (NN) system consisting of a recurrent neural network (RNN) in the first stage and a multi-layer neural network (MLN) in the second stage. The RNN maps continuous acoustic features, Local Features (LFs), onto discrete DPF patterns, while the MLN constrains DPF context, or dynamics, in an utterance. The experiments are carried out using JNAS (Japanese Newspaper Article Sentences) continuous utterances that contain vowels and consonants. The proposed DPF-based feature extractor provides good segmentation and a high recognition rate with a reduced mixture set of HMMs (Hidden Markov Models) by resolving the co-articulation effect.
- Published
- 2007
30. Pitch-Synchronous ZCPA (PS-ZCPA)-Based Feature Extraction with Auditory Masking
- Author
-
Muhammad Ghulam, Takashi Fukuda, Tsuneo Nitta, and Junsei Horikawa
- Subjects
Masking (art) ,Auditory masking ,business.industry ,Computer science ,Histogram ,Speech recognition ,Detector ,Feature extraction ,Pattern recognition ,Artificial intelligence ,business - Abstract
A pitch-synchronous (PS) auditory feature extraction method based on ZCPA (zero-crossings peak-amplitudes) has been proposed (Ghulam, M. et al., Proc. ICSLP04, 2004) and was shown to be more robust than the conventional ZCPA (Kim, D.S. et al., IEEE Trans. Speech Audio Process., vol.7, no.1, p.55-69, 1999). We examine the effect of auditory masking, both simultaneous and temporal, in the PS-ZCPA method. We also vary the number of histogram bins in order to find the optimum parameters of the proposed method. Experimental results demonstrate the improved performance of the PS-ZCPA method achieved by embedding auditory masking into it; for example, with both masking methods embedded, the performance increases from 69.92% without masking to 73.71%, while an increased number of histogram bins yielded little improvement.
- Published
- 2006
31. A Pitch-Synchronous Peak-Amplitude Based Feature Extraction Method for Noise Robust ASR
- Author
-
Tsuneo Nitta, Muhammad Ghulam, and Junsei Horikawa
- Subjects
Auditory masking ,business.industry ,Auditory event ,Computer science ,Noise reduction ,Speech recognition ,Feature extraction ,Wiener filter ,Pattern recognition ,Intelligibility (communication) ,Speech enhancement ,Background noise ,Noise ,symbols.namesake ,symbols ,Artificial intelligence ,Mel-frequency cepstrum ,business - Abstract
In this paper, we propose a novel pitch-synchronous auditory-based feature extraction method for robust automatic speech recognition (ASR). A pitch-synchronous zero-crossing peak-amplitude (PS-ZCPA)-based feature extraction method was proposed previously [1,2] and showed improved performance, except when modulation enhancement was integrated together with Wiener filter (WF)-based noise reduction and auditory masking [3]. However, since a zero-crossing is not an auditory event, we propose a new pitch-synchronous peak-amplitude (PS-PA)-based method to make the feature extractor of ASR more auditory-like. We also examine the effects of WF-based noise reduction, modulation enhancement, and auditory masking in the proposed PS-PA method using the Aurora-2J database. The experimental results showed the superiority of the proposed method over the PS-ZCPA method, and the problem caused by reconstructing zero-crossings from the modulated envelope was eliminated. The highest relative performance gain over MFCC, 67.33%, was achieved using the PS-PA method together with WF-based noise reduction, modulation enhancement, and auditory masking.
- Published
- 2006
32. Self-learning System Using Lecture Information and Biological Data
- Author
-
Tsuneo Nitta, Kyoichi Matsuura, Shuji Shinohara, Yurie Iribe, and Kouichi Katsurada
- Subjects
Biological data ,Multimedia ,business.industry ,Computer science ,Perspective (graphical) ,Knowledge engineering ,Information system ,The Internet ,Student learning ,business ,computer.software_genre ,computer ,Field (computer science) - Abstract
One of today’s hot topics in the field of education is the learning support system. With the progress of networks and multimedia technologies, various types of web-based training (WBT) systems are being developed for distance- and self-learning. Most of the current learning support systems synchronously reproduce lecture resources such as videos, slides, and digital-ink notes written by the teacher. However, from the perspective of support for student learning, these systems provide only keyword retrieval. This paper describes a more efficient learning support system that we developed by introducing lecture information and student arousal levels extracted from biological data. We also demonstrate the effectiveness of the proposed system through a preliminary experiment.
- Published
- 2006
33. Dialog Strategy Acquisition and Its Evaluation for Efficient Learning of Word Meanings by Agents
- Author
-
Tsuneo Nitta, Kouichi Katsurada, and Ryo Taguchi
- Subjects
Facial expression ,business.industry ,Mechanism (biology) ,computer.software_genre ,ComputingMethodologies_ARTIFICIALINTELLIGENCE ,Comprehension ,Word meaning ,Artificial intelligence ,Dialog box ,Psychology ,business ,computer ,Word (computer architecture) ,Natural language processing - Abstract
In word meaning acquisition through interactions between humans and agents, the efficiency of the learning depends largely on the dialog strategies the agents have. This paper describes the automatic acquisition of dialog strategies through interaction between two agents. In the experiments, two agents infer each other's comprehension level from their facial expressions and utterances to acquire efficient strategies. Q-learning is applied as the strategy acquisition mechanism. First, experiments are carried out through interaction between a mother agent, who knows all the word meanings, and a child agent with no initial word meanings. The experimental results showed that the mother agent acquires a teaching strategy, while the child agent acquires an asking strategy. Next, interaction experiments between a human and an agent are conducted to evaluate the acquired strategies. The results showed the effectiveness of both the teaching and the asking strategies.
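A single Q-learning update of the kind applied here can be sketched as below; the states, actions, and rewards are illustrative placeholders, not the paper's actual interaction model:

```python
# Hypothetical dialog states/actions; only the Q-learning rule is the point.
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (assumed values)

def q_update(q, state, action, reward, next_state, actions):
    # Standard Q-learning: move Q(s,a) toward r + gamma * max_a' Q(s',a')
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

q = {}
actions = ["teach", "ask"]
# e.g. the mother agent is rewarded when teaching resolves the child's confusion
q_update(q, "child_confused", "teach", 1.0, "child_ok", actions)
q_update(q, "child_confused", "ask", 0.0, "child_confused", actions)
```

After enough interactions, the greedy policy over `q` is the acquired strategy: the mother prefers "teach" in states where it was rewarded, the child prefers "ask".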
- Published
- 2006
34. A rapid prototyping tool for constructing web-based MMI applications
- Author
-
Kunitoshi Sato, Hiroaki Adachi, Hirobumi Yamada, Tsuneo Nitta, and Kouichi Katsurada
- Subjects
Rapid prototyping ,business.industry ,Computer science ,Web application ,Software engineering ,business - Published
- 2005
35. Free-field measurements for a loudspeaker system in a normal room--Using digital signal processing techniques
- Author
-
M. Tanaka and Tsuneo Nitta
- Subjects
Physics ,Frequency response ,Anechoic chamber ,Moving average ,business.industry ,Acoustics ,Speech recognition ,Cepstrum ,Loudspeaker ,business ,Free field ,Digital signal processing ,Audio frequency - Abstract
In this paper, free-field measurements (frequency responses and directional characteristics) in a normal room are described. Two methods for reducing reflected waves are investigated: the Comb Lifter method, which filters in the cepstrum domain, and the Moving Average method, which uses a variable number of averaging points. Approximate free-field responses over the full audio frequency band are obtained with reduced reflections from the floor, walls, and ceiling.
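The Comb Lifter idea can be illustrated with a short cepstral-liftering sketch; the impulse response, echo delay, and lifter cutoff below are synthetic assumptions, not measured data:

```python
import numpy as np

n = 512
t = np.arange(n)
direct = np.exp(-t / 40.0)          # toy direct-path impulse response
echo = np.zeros(n)
echo[100:] = 0.3 * direct[:-100]    # one reflection, delayed 100 samples
response = direct + echo

# A delayed reflection produces comb ripple in the log spectrum, which maps
# to a peak at the delay quefrency in the cepstrum, where it can be removed.
spectrum = np.fft.rfft(response)
log_mag = np.log(np.abs(spectrum) + 1e-12)
cepstrum = np.fft.irfft(log_mag, n)

liftered = cepstrum.copy()
liftered[80:n - 80] = 0.0           # zero quefrencies around the echo delay
clean_log_mag = np.fft.rfft(liftered).real
```

`clean_log_mag` approximates the reflection-free log magnitude response; in practice the cutoff must sit below the shortest reflection delay in the room.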
- Published
- 2005
36. Speaker-adaptive connected syllable recognition based on the multiple similarity method
- Author
-
Yoichi Takebayashi, S. Hirai, Tsuneo Nitta, Hiroyuki Tsuboi, and Hiroshi Matsuura
- Subjects
Consonant ,Vocabulary ,business.industry ,Computer science ,Speech recognition ,media_common.quotation_subject ,Frame (networking) ,Pattern recognition ,Speaker recognition ,Similarity (network science) ,Vowel ,Pattern recognition (psychology) ,Artificial intelligence ,Pattern matching ,Syllable ,business ,media_common - Abstract
A new method for accurately recognizing Japanese connected syllables is presented. The method employs both continuous pattern matching and speaker adaptation, based on the Multiple Similarity (MS) method. In the pattern matching, similarity value calculation and acoustic labeling are carried out continuously on a frame-by-frame basis and are utilized to achieve reliable segmentation and recognition of connected syllables. The speaker adaptation depends on modification of the reference patterns, whose initial versions are speaker-independent. The vowel and consonant reference patterns are produced from the covariance matrices using the K-L expansion. Recognition experiments were carried out on 101 Japanese syllables spoken by ten males at a speed of 3 to 4 syllables per second. The score was 91.4% for 9230 syllables in 4400 Japanese phrases.
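The Multiple Similarity score itself can be sketched as a subspace projection; the training patterns below are synthetic, and the number of K-L axes and the eigenvalue weighting are assumptions:

```python
import numpy as np

# Synthetic training patterns for one category: 200 samples x 16 features
rng = np.random.default_rng(1)
samples = rng.normal(size=(200, 16))

# K-L expansion: eigenvectors of the category's covariance matrix
cov = np.cov(samples, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1][:4]          # keep the 4 largest axes
axes = eigvecs[:, order].T                     # (4, 16) orthonormal axes
weights = eigvals[order] / eigvals[order].max()

def multiple_similarity(x):
    # Weighted energy of the projection onto the category subspace
    proj = axes @ x
    return float(np.sum(weights * proj**2) / np.dot(x, x))

score = multiple_similarity(samples[0])
```

Recognition picks the category whose subspace yields the highest score, and speaker adaptation corresponds to re-deriving the axes from adapted reference patterns.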
- Published
- 2005
37. A noise-robust feature extraction method based on pitch-synchronous ZCPA for ASR
- Author
-
Junsei Horikawa, Tsuneo Nitta, Takashi Fukuda, and Muhammad Ghulam
- Subjects
Noise ,business.industry ,Computer science ,Speech recognition ,Feature extraction ,Subtraction ,Pattern recognition ,Artificial intelligence ,business - Abstract
In this paper, we propose a novel feature extraction method based on the auditory nervous system for robust automatic speech recognition (ASR). In the proposed method, a pitch-synchronous mechanism is embedded in ZCPA (Zero-Crossings Peak-Amplitudes), which has previously been shown to outperform conventional features in the presence of noise. A noise-robust, non-delayed pitch determination algorithm (PDA) is also developed. In the experiments, the proposed pitch-synchronous ZCPA (PS-ZCPA) proved more robust than the original ZCPA method. Moreover, a simple noise subtraction (NS) method is integrated into the proposed method, and the performance was evaluated using the Aurora-2J database. The experimental results showed the superiority of the PS-ZCPA method with NS over the PS-ZCPA method without NS.
- Published
- 2004
38. Galatea: Open-Source Software for Developing Anthropomorphic Spoken Dialog Agents
- Author
-
Takao Kobayashi, Akinobu Lee, Shigeo Morishima, Takehito Utsuro, Shigeki Sagayama, Yoichi Yamashita, Takuya Nishimoto, Satoshi Nakamura, Katsunobu Itou, Shin-ichi Kawamoto, Nobuaki Minematsu, Keikichi Hirose, Atsushi Yamada, Tsuneo Nitta, Hiroshi Shimodaira, Keiichi Tokuda, Tatsuo Yotsukura, Yasuharu Den, and Atsuhiko Kai
- Subjects
Unix ,Computer science ,business.industry ,Interface (computing) ,Speech recognition ,Speech synthesis ,computer.software_genre ,Software ,Human–computer interaction ,Virtual machine ,Dialog box ,Dialog system ,business ,computer ,Computer facial animation - Abstract
Galatea is a software toolkit to develop a human-like spoken dialog agent. In order to easily integrate the modules of different characteristics including speech recognizer, speech synthesizer, facial animation synthesizer, and dialog controller, each module is modeled as a virtual machine having a simple common interface and connected to each other through a broker (communication manager). Galatea employs model-based speech and facial animation synthesizers whose model parameters are adapted easily to those for an existing person if his or her training data is given. The software toolkit that runs on both UNIX/Linux and Windows operating systems will be publicly available in the middle of 2003 [7, 6].
- Published
- 2004
39. Distinctive phonetic feature extraction for robust speech recognition
- Author
-
Tsuneo Nitta, Takashi Fukuda, and W. Yamamoto
- Subjects
Artificial neural network ,Computer science ,business.industry ,Speech recognition ,Feature extraction ,Pattern recognition ,White noise ,Noise ,Robustness (computer science) ,Word recognition ,Artificial intelligence ,Mel-frequency cepstrum ,business ,Articulatory gestures - Abstract
The paper describes an attempt to extract distinctive phonetic features (DPFs), which represent articulatory gestures in linguistic theory, by using a multilayer neural network (MLN), and to apply the DPFs to noise-robust speech recognition. In the DPF extraction stage, after converting a speech signal to acoustic features composed of local features (LFs), an MLN with 33 output units, corresponding to context-dependent DPFs of 11 DPFs, 11 preceding-context DPFs, and 11 following-context DPFs, maps the LFs to DPFs. The proposed DPF parameters without MFCC (Mel-frequency cepstral coefficients) were first evaluated against a standard parameter set of MFCC and dynamic features on a word recognition task using clean speech; the result showed the same performance as the standard set. The noise robustness of these parameters was then tested with four types of additive noise, and the proposed DPF parameters outperformed the standard set for all but one additive noise type.
- Published
- 2003
40. A large vocabulary word recognition system based on syllable recognition and nonlinear word matching
- Author
-
Hiroshi Kanazawa, S. Hirai, Hiroshi Matsuura, Yoichi Takebayashi, Tsuneo Nitta, and Hiroyuki Tsuboi
- Subjects
Vocabulary ,Matching (statistics) ,Computer science ,business.industry ,Speech recognition ,media_common.quotation_subject ,computer.software_genre ,Task (project management) ,Nonlinear system ,Pattern recognition (psychology) ,Word recognition ,Artificial intelligence ,Syllable ,business ,computer ,Natural language processing ,Word (computer architecture) ,media_common - Abstract
A practical, speaker-adaptive Japanese large-vocabulary recognition system has been developed. The system has two notable features. The first is low-cost hardware that realizes precise syllable recognition and high-speed speaker adaptation using the Karhunen-Loeve expansion. The second is nonlinear word matching, which deals with syllable addition and deletion and reduces the restrictions on acceptable utterances. The system has been applied to data entry systems for input of train station names, family names, and given names, and also to a map data search task in which place names were verified. Recognition experiments have been carried out on a 2000-word vocabulary. The word recognition rate was 93.7% for 1000 utterances by five male speakers.
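Word matching that tolerates syllable addition and deletion can be sketched with edit-distance DP over syllable sequences; the lexicon entries and syllabifications below are made-up examples, not the system's data:

```python
def syllable_distance(recognized, reference):
    # Edit-distance DP over syllable sequences: insertions and deletions
    # absorb spurious or dropped syllables from the recognizer.
    m, n = len(recognized), len(reference)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if recognized[i - 1] == reference[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # spurious syllable inserted
                          d[i][j - 1] + 1,       # syllable dropped
                          d[i - 1][j - 1] + cost)
    return d[m][n]

lexicon = {"to-o-kyo-o": "Tokyo", "kyo-o-to": "Kyoto"}
recognized = ["kyo", "to"]  # one syllable missing from the true word
best = min(lexicon, key=lambda w: syllable_distance(recognized, w.split("-")))
```

Real systems weight the DP costs by acoustic confidence rather than using unit costs, but the tolerance to addition/deletion works the same way.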
- Published
- 2003
41. Word-spotting based on inter-word and intra-word diphone models
- Author
-
H. Matsu'ura, Tsuneo Nitta, Yasuyuki Masai, and Shinichi Tanaka
- Subjects
Computer science ,business.industry ,Speech recognition ,Logogen model ,Word error rate ,Diphone ,Speaker recognition ,computer.software_genre ,Speech processing ,Word recognition ,Artificial intelligence ,business ,Hidden Markov model ,computer ,Word (computer architecture) ,Natural language processing - Abstract
The authors propose a precise but simple inter-word diphone model (IDM) for word-spotting based on SMQ/HMM. They have applied ordinary diphone models to a speaker-independent, large-vocabulary word recognition unit. However, because users are apt to add words and/or extraneous speech, accuracy degrades due to the mismatch of models at word-boundaries. The IDM represents a transition from the preceding phonemes to a word or from a word to the succeeding phonemes. An experiment showed that the IDMs reduce error rates by about 5% for speech containing unknown words and extraneous speech. The experiment also showed that the proposed method ensured performance good enough for the practical use of a large-vocabulary isolated-word recognition system.
- Published
- 2002
42. A multimodal, keyword-based spoken dialogue system-MultiksDial
- Author
-
Hiroshi Matsuura, Tsuneo Nitta, Hiroyuki Kamio, Shinichi Tanaka, Yasuyuki Masai, and J. Iwasaki
- Subjects
Multimedia ,InformationSystems_INFORMATIONINTERFACESANDPRESENTATION(e.g.,HCI) ,Computer science ,business.industry ,Speech synthesis ,Usability ,Software prototyping ,computer.software_genre ,Multimodal interaction ,Human–computer interaction ,User interface ,Graphics ,business ,computer - Abstract
In this paper, a multimodal, keyword-based spoken dialog system ("MultiksDial") is described. The system provides multiple input channels, spontaneous speech and designation by touch, as well as multiple output channels, graphics and voice responses by text-to-speech. The system also provides three types of sensors to detect the user's actions and to plan interactive strategies. A word spotter handles real-time word spotting over a medium-sized vocabulary. The multimodal interaction mechanism is evaluated on a directory guidance system. The experimental results showed usability advantages over an ordinary touch-screen system; multimodal input is especially effective for novice users, and cooperative guidance using sensors, graphics, and speech also helped novice users. A multimodal UI development tool has been developed for rapid prototyping of MultiksDial.
- Published
- 2002
43. Confidence scoring for accurate HMM-based word recognition by using SM-based monophone score normalization
- Author
-
Tsuneo Nitta, Takaharu Sato, Takashi Fukuda, and Muhammad Ghulam
- Subjects
Normalization (statistics) ,Vocabulary ,Computer science ,business.industry ,Speech recognition ,media_common.quotation_subject ,Feature extraction ,Pattern recognition ,Word recognition ,Artificial intelligence ,Hidden Markov model ,business ,Classifier (UML) ,Utterance ,media_common - Abstract
In this paper, we propose a novel confidence scoring method that is applied to the N-best hypotheses output from an HMM-based classifier. In the first pass of the proposed method, the HMM-based classifier with monophone models outputs the N-best hypotheses and the boundaries of all the monophones in the hypotheses. In the second pass, an SM (subspace method)-based verifier tests the hypotheses by comparing confidence scores. We discuss how to convert a monophone similarity score of the SM into a likelihood score, how to normalize the variations of acoustic quality in an utterance, and how to combine an HMM-based word-level likelihood with an SM-based monophone-level likelihood. In experiments on speaker-independent word recognition, the proposed confidence scoring method significantly improves the correct word recognition rate from 95.3%, obtained by the standard HMM classifier, to 98.0%.
- Published
- 2002
44. Web-based lecture system using slide sharing for classroom questions and answers
- Author
-
Yurie Iribe, Hiroaki Nagaoka, Tsuneo Nitta, and Katsurada Kouichi
- Subjects
Questions and answers ,Text chat ,Ajax ,Multimedia ,business.industry ,Computer science ,media_common.quotation_subject ,computer.software_genre ,Field (computer science) ,Presentation ,ComputingMilieux_COMPUTERSANDEDUCATION ,Web application ,The Internet ,Learning Management ,business ,computer ,media_common ,computer.programming_language - Abstract
One of today's hottest topics in the field of education is the effectiveness of Learning Management Systems, which have introduced text chat and bulletin boards into the classroom. However, these systems do not support visual explanation, such as drawing diagrams or numbers on the presentation slides. We developed a classroom lecture system that encourages teacher-to-student and student-to-student communication by sharing slides with notes drawn using a digital pen. The teacher and students can confirm an explanation by sharing these slides. As a result, enhanced understanding is achieved through classroom questions and answers using both text and visual explanation.
- Published
- 2010
45. Representing dynamic features of phonetic segment in an orthogonalized codebook of HMM based speech recognition system
- Author
-
Tsuneo Nitta, Yasuyuki Masai, J. Iwasaki, and Hiroshi Matsuura
- Subjects
Vocabulary ,Fuzzy rule ,Computer science ,business.industry ,Quantization (signal processing) ,Speech recognition ,media_common.quotation_subject ,Speech coding ,Codebook ,Pattern recognition ,Artificial intelligence ,Hidden Markov model ,business ,media_common - Abstract
The authors propose a matrix quantization (MQ) algorithm named statistical MQ (SMQ), which uses an orthogonalized phonetic segment codebook. SMQ effectively incorporates the pattern variations of each phonetic segment into the orthogonalized phonetic segment codebook and transforms input speech into a sequence of phonetic symbols drawn from about 700 types of phonetic segments. The authors also propose a simple SMQ-HMM training algorithm, called equally counted K-based learning, in which each phonetic event observed within the best K is equally counted in a model and the output probabilities are smoothed without fuzzy rules. The proposed algorithm has been tested on a 546-word vocabulary data set uttered by 10 unknown speakers, using a real-time recognition system, and achieved a high recognition rate of 96.5%.
- Published
- 1992
46. Pattern recognition system and method using neural network
- Author
-
Tsuneo Nitta
- Subjects
Acoustics and Ultrasonics ,Basis (linear algebra) ,Artificial neural network ,business.industry ,Computation ,Pattern recognition ,Nonlinear system ,Section (category theory) ,Arts and Humanities (miscellaneous) ,Discriminant function analysis ,Product (mathematics) ,Artificial intelligence ,business ,Unit (ring theory) ,Mathematics - Abstract
An inner product computing unit computes inner products of an input pattern whose category is unknown, and orthogonalized dictionary sets of a plurality of reference patterns whose categories are known. A nonlinear converting unit nonlinearly converts the inner products in accordance with a positive-negative symmetrical nonlinear function. A neural network unit or a statistical discriminant function computing unit performs predetermined computations of the nonlinearly converted values on the basis of preset coefficients in units of categories using a neural network or a statistical discriminant function. A determining section compares values calculated in units of categories using the preset coefficients with each other to discriminate a category to which the input pattern belongs.
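The claimed pipeline, inner products against an orthogonalized dictionary, a positive-negative symmetric nonlinearity, then a per-category weighted sum as the discriminant, can be sketched as below; the dictionaries and coefficients are random stand-ins for trained ones, and squaring is used as one example of a symmetric nonlinearity:

```python
import numpy as np

rng = np.random.default_rng(2)
# Orthogonalized dictionary per category: 3 orthonormal axes in 6-d space
dictionaries = {c: np.linalg.qr(rng.normal(size=(6, 3)))[0].T for c in ("A", "B")}
# Preset per-category coefficients (would be learned in the patent's setup)
weights = {c: rng.uniform(0.5, 1.0, size=3) for c in ("A", "B")}

def category_score(x, c):
    products = dictionaries[c] @ x   # inner products with dictionary axes
    converted = products**2          # positive-negative symmetric nonlinearity
    return float(weights[c] @ converted)

x = rng.normal(size=6)               # input pattern of unknown category
best = max(("A", "B"), key=lambda c: category_score(x, c))
```

The symmetric nonlinearity makes the score invariant to the sign of the projection, so `x` and `-x` receive identical scores, which is the property the squaring step provides.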
- Published
- 1994
47. Speech recognition apparatus and method utilizing an orthogonalized dictionary
- Author
-
Tsuneo Nitta
- Subjects
Acoustics and Ultrasonics ,Arts and Humanities (miscellaneous) ,business.industry ,Computer science ,Section (typography) ,Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing) ,Pattern recognition ,Filter (signal processing) ,Artificial intelligence ,Differential (infinitesimal) ,Base (topology) ,business ,Smoothing - Abstract
An orthogonalizing time filter section is arranged in place of a Gram Schmidt orthogonalizing section. The orthogonalizing time filter section is constituted by a plurality of filters for performing smoothing processing and differential processing. The orthogonalizing time filter section obtains an average pattern of acquired learning patterns, and smoothes the average pattern along the time base to obtain a dictionary of a first axis. The section differentiates the average pattern along the time base to obtain a dictionary of a second axis. The above processing is repeated for each category, thus generating an orthogonalized dictionary.
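A minimal sketch of the two-axis construction, with toy learning patterns standing in for real ones and an assumed 5-point smoothing kernel:

```python
import numpy as np

# Toy learning patterns for one category: 20 utterances x 32 time frames
rng = np.random.default_rng(3)
learning_patterns = rng.normal(size=(20, 32))

average = learning_patterns.mean(axis=0)

# Axis 1: the average pattern smoothed along the time base
kernel = np.ones(5) / 5.0
axis1 = np.convolve(average, kernel, mode="same")

# Axis 2: the average pattern differentiated along the time base
axis2 = np.gradient(average)

# Normalize both axes to unit length as dictionary entries
axis1 /= np.linalg.norm(axis1)
axis2 /= np.linalg.norm(axis2)
```

Repeating this per category yields the orthogonalized dictionary without running a Gram-Schmidt step explicitly, since smoothing and differentiation produce near-orthogonal axes by construction.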
- Published
- 1993
48. Phoneme information extracting apparatus
- Author
-
Hideki Kasuya and Tsuneo Nitta
- Subjects
Acoustics and Ultrasonics ,business.industry ,Maximum correlation ,Value (computer science) ,Spectral density ,Pattern recognition ,Sound power ,Correlation ,Arts and Humanities (miscellaneous) ,Artificial intelligence ,business ,Selection (genetic algorithm) ,Mathematics ,Electronic circuit - Abstract
A phoneme information extracting apparatus includes correlation data generators for successively generating correlation data representing the correlation between the acoustic power spectrum data corresponding to input voice and power spectrum data of various reference phonemes, selection circuits for successively transferring these correlation data when they detect that three or more successive correlation data have values greater than a predetermined value, maximum data hold circuits for holding the maximum correlation data among the correlation data transferred from the respective selection circuits, and a phoneme determination circuit for determining the optimum phoneme by detecting one of the data hold circuits that is holding the maximum correlation data among the correlation data held in the data hold circuits.
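The three-successive-frames selection rule can be sketched in software; the frame correlations, threshold, and phoneme labels below are hypothetical:

```python
# Toy version of the claimed logic: a phoneme's correlation data pass the
# selection stage only when three or more successive frames exceed a
# threshold; the maximum correlation among passing runs picks the phoneme.
def best_phoneme(frames, threshold=0.6, min_run=3):
    best = {}
    for phoneme in frames[0]:
        run = []
        for frame in frames:
            if frame[phoneme] > threshold:
                run.append(frame[phoneme])
            else:
                if len(run) >= min_run:
                    best[phoneme] = max(best.get(phoneme, 0.0), max(run))
                run = []
        if len(run) >= min_run:     # run reaching the final frame
            best[phoneme] = max(best.get(phoneme, 0.0), max(run))
    return max(best, key=best.get) if best else None

frames = [{"a": 0.7, "i": 0.2}, {"a": 0.8, "i": 0.3},
          {"a": 0.9, "i": 0.1}, {"a": 0.4, "i": 0.7}]
result = best_phoneme(frames)       # only "a" has a 3-frame run above threshold
```

The run-length requirement is what suppresses single-frame correlation spikes, mirroring the hardware hold-and-compare circuits in the apparatus.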
- Published
- 1987