17 results for "Tsuneo Nitta"
Search Results
2. New Grapheme Generation Rules for Two-Stage Model-Based Grapheme-to-Phoneme Conversion
- Author
- Seng Kheang, Tsuneo Nitta, Kouichi Katsurada, and Yurie Iribe
- Subjects
Consonant, Information Systems and Management, General Computer Science, Computer science, Speech recognition, Grapheme, Speech synthesis, Information technology, Pronunciation, Software, Vowel, Telecommunication, Artificial intelligence, Electrical and Electronic Engineering, Document retrieval, Word (computer architecture), Natural language processing
- Abstract
The precise conversion of arbitrary text into its corresponding phoneme sequence (grapheme-to-phoneme, or G2P, conversion) is used in speech synthesis and recognition, pronunciation learning software, spoken term detection, and spoken document retrieval systems. Because the quality of this module plays an important role in the performance of such systems, and many problems regarding G2P conversion have been reported, we propose a novel two-stage model-based approach, implemented using an existing weighted finite-state transducer-based G2P conversion framework, to improve the performance of the G2P conversion model. The first-stage model is built for the automatic conversion of words to phonemes, while the second-stage model uses the input graphemes and the output phonemes obtained from the first stage to determine the best final output phoneme sequence. Additionally, we designed new grapheme generation rules that add extra detail to the vowel and consonant graphemes appearing within a word. When compared with previous approaches, the evaluation results indicate that our approach using rules focusing on the vowel graphemes slightly improves accuracy on the out-of-vocabulary dataset and consistently increases accuracy on the in-vocabulary dataset.
- Published
- 2014
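The vowel-focused grapheme generation rules above can be illustrated with a toy segmenter. This is a minimal sketch: the positional tagging rule below is invented for illustration and is not the paper's actual rule set.

```python
# Toy illustration of context-sensitive grapheme generation:
# vowel graphemes are tagged with their position in the word so that
# a G2P model can distinguish e.g. word-final from word-medial vowels.
VOWELS = set("aeiou")

def generate_graphemes(word):
    """Split a word into grapheme units, adding positional detail
    to vowel graphemes (an illustrative rule, not the paper's set)."""
    units = []
    for i, ch in enumerate(word.lower()):
        if ch in VOWELS:
            if i == 0:
                units.append(ch + "_initial")
            elif i == len(word) - 1:
                units.append(ch + "_final")
            else:
                units.append(ch + "_medial")
        else:
            units.append(ch)
    return units

print(generate_graphemes("phoneme"))
# ['p', 'h', 'o_medial', 'n', 'e_medial', 'm', 'e_final']
```

The extra tags let a downstream model learn, for instance, that a word-final "e" is often silent while a medial "e" is not.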
3. Task Estimation Using Latent Semantic Analysis of Visual Scenes and Spoken Words
- Author
- Tsuneo Nitta, Masashi Kimura, Shinta Sawada, Yurie Iribe, and Kouichi Katsurada
- Subjects
Estimation, Thesaurus (information retrieval), Modality (human–computer interaction), Computer Networks and Communications, Computer science, Latent semantic analysis, Applied Mathematics, Speech recognition, General Physics and Astronomy, Linear subspace, Task (project management), Image (mathematics), Identification (information), Signal Processing, Artificial intelligence, Electrical and Electronic Engineering, Natural language processing
- Abstract
In this paper, we propose a task estimation method based on multiple subspaces extracted from multimodal information: image objects in visual scenes and spoken words in dialogue appearing in the same task. The multiple subspaces are obtained by using latent semantic analysis (LSA). In the proposed method, a task vector composed of spoken words and the frequencies of image-object appearances is extracted first, and then the similarities between the input task vector and the reference subspaces of different tasks are compared. Experiments are conducted on the identification of game tasks. The experimental results show that the proposed method with multimodal information outperforms methods in which only the single modality of image or spoken dialogue is applied. The proposed method achieves accurate performance even when only limited spoken dialogue is available.
- Published
- 2014
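The comparison step described above can be sketched as follows. This is a simplified illustration: the LSA/SVD subspace projection is omitted, and the vocabulary, tasks, and counts are invented.

```python
import math

# Simplified sketch of the comparison step: a task vector combines
# spoken-word counts with image-object appearance counts, and is matched
# against per-task reference vectors by cosine similarity.
# (The paper projects onto LSA subspaces first; that SVD step is
# omitted here, and the vocabulary/tasks below are invented.)
VOCAB = ["card", "dice", "move", "board", "piece", "shuffle"]

def task_vector(word_counts, object_counts):
    # Concatenate the spoken-word modality and the image-object modality.
    return [word_counts.get(w, 0) for w in VOCAB] + object_counts

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def estimate_task(query, references):
    # Pick the reference task whose vector is most similar to the query.
    return max(references, key=lambda name: cosine(query, references[name]))

references = {
    "card_game":  task_vector({"card": 5, "shuffle": 3}, [4, 0]),
    "board_game": task_vector({"move": 4, "board": 5, "piece": 3}, [0, 6]),
}
query = task_vector({"card": 2, "shuffle": 1}, [3, 0])
print(estimate_task(query, references))  # card_game under these toy counts
```

Concatenating the two modalities is what lets visual evidence compensate when little dialogue is available, mirroring the paper's finding.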
4. Solving the Phoneme Conflict in Grapheme-to-Phoneme Conversion Using a Two-Stage Neural Network-Based Approach
- Author
- Seng Kheang, Tsuneo Nitta, Kouichi Katsurada, and Yurie Iribe
- Subjects
Artificial neural network, Computer science, Speech recognition, American English, Phonetic transcription, Grapheme, Context (language use), Speech synthesis, Pronunciation, Artificial Intelligence, Hardware and Architecture, Computer Vision and Pattern Recognition, Artificial intelligence, Electrical and Electronic Engineering, Software, Word (computer architecture), Natural language processing
- Abstract
To achieve high-quality output in speech synthesis systems, data-driven grapheme-to-phoneme (G2P) conversion is usually used to generate the phonetic transcription of out-of-vocabulary (OOV) words. To improve the performance of G2P conversion, this paper addresses the problem of conflicting phonemes, where an input grapheme can, in the same context, produce many possible output phonemes. To this end, we propose a two-stage neural network-based approach that converts the input text to phoneme sequences in the first stage and then predicts each output phoneme in the second stage using the phonemic information obtained. The first-stage neural network is implemented as a many-to-many mapping model for the automatic conversion of words to phoneme sequences, while the second stage uses a combination of the obtained phoneme sequences to predict the output phoneme corresponding to each input grapheme in a given word. We evaluate the performance of this approach using the American English pronunciation dictionary known as the auto-aligned CMUDict corpus [1]. In terms of phoneme and word accuracy on OOV words, in comparison with several baseline approaches, the evaluation results show that our approach improves on the previous one-stage neural network-based approach for G2P conversion. Comparison with another existing approach indicates that our approach provides higher phoneme accuracy but lower word accuracy on a general dataset, and slightly higher phoneme and word accuracy on a selection of words consisting of more than one phoneme.
- Published
- 2014
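The phoneme-conflict idea above can be illustrated with a toy example in which lookup tables stand in for the two neural networks. All mappings below are invented for illustration and are not drawn from CMUDict.

```python
# Toy illustration of the two-stage idea, with lookup tables standing in
# for the two neural networks (the grapheme-to-phoneme mappings below
# are invented, not CMUDict entries).
# Stage 1: each grapheme proposes candidate phonemes (the "conflict").
STAGE1 = {
    "a": ["AE", "EY"],   # 'a' conflicts: "cat" vs "cake"
    "c": ["K"],
    "t": ["T"],
    "k": ["K"],
    "e": ["_"],          # silent word-final 'e'
}

def stage2(word):
    """Resolve each conflict using stage-1 information about the rest of
    the word (a crude stand-in for the second-stage network)."""
    cands = [STAGE1[g] for g in word]
    out = []
    for i, options in enumerate(cands):
        if len(options) == 1:
            out.append(options[0])
        else:
            # If a silent 'e' ends the word, prefer the "long" vowel EY.
            out.append("EY" if word[i + 1:].endswith("e") else "AE")
    return [p for p in out if p != "_"]

print(stage2("cat"))   # ['K', 'AE', 'T']
print(stage2("cake"))  # ['K', 'EY', 'K']
```

The point of the second stage is exactly this: the first stage alone cannot choose between AE and EY for "a", but the phonemic context it produces makes the choice decidable.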
5. Generation of CG Animations Based on Articulatory Features for Pronunciation Training
- Author
- Tsuneo Nitta, Kouichi Katsurada, Takuro Mori, and Yurie Iribe
- Subjects
Computer science, Speech recognition, Training (meteorology), Artificial intelligence, Pronunciation, Computer animation, Natural language processing
- Published
- 2012
6. Learning Lexicons from Spoken Utterances Based on Statistical Model Selection
- Author
- Ryo Taguchi, Tsuneo Nitta, Mikio Nakano, Naoto Iwahashi, Takashi Nose, and Kotaro Funakoshi
- Subjects
Computer science, Model selection, Speech recognition, Acoustic model, Statistical model, Language acquisition, Object (computer science), Lexicon, Artificial Intelligence, Unsupervised learning, Artificial intelligence, Software, Utterance, Natural language processing
- Abstract
This paper proposes a method for the unsupervised learning of lexicons from pairs of a spoken utterance and an object representing its meaning, without any a priori linguistic knowledge other than a phoneme acoustic model. To obtain a lexicon, a statistical model of the joint probability of a spoken utterance and an object is learned based on the minimum description length (MDL) principle. This model consists of a list of word phoneme sequences and three statistical models: the phoneme acoustic model, a word-bigram model, and a word meaning model. Experimental results show that the method can acquire acoustically, grammatically, and semantically appropriate words with about 85% phoneme accuracy.
- Published
- 2010
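The minimum description length principle behind this model selection can be sketched with a toy segmentation example. This is a minimal illustration: the two cost terms below crudely approximate the paper's acoustic, bigram, and meaning models, and all data and constants are invented.

```python
import math

# Toy MDL sketch: choose the lexicon that minimizes
#   DL = (bits to describe the lexicon) + (bits to encode the data with it).
# The real system combines a phoneme acoustic model, a word-bigram model,
# and a word meaning model; both cost terms here are crude stand-ins.
utterances = ["redbox", "redball", "bluebox"]

def description_length(lexicon, data):
    # Model cost: ~5 bits per phoneme in the word list (invented constant).
    model_bits = 5 * sum(len(w) for w in lexicon)
    # Data cost: greedily segment each utterance into lexicon words and
    # charge -log2 P(word) under a uniform unigram over the lexicon.
    data_bits = 0.0
    for utt in data:
        n_words, rest = 0, utt
        while rest:
            for w in sorted(lexicon, key=len, reverse=True):
                if rest.startswith(w):
                    rest = rest[len(w):]
                    n_words += 1
                    break
            else:
                return float("inf")  # this lexicon cannot segment the data
        data_bits += n_words * math.log2(len(lexicon))
    return model_bits + data_bits

whole = {"redbox", "redball", "bluebox"}   # memorize whole utterances
words = {"red", "blue", "box", "ball"}     # reuse sub-word units
print(description_length(whole, utterances))
print(description_length(words, utterances))  # smaller: sub-words win
```

Memorizing whole utterances makes the data cheap to encode but the model expensive; reusable sub-word units shrink the model enough to win overall, which is the intuition behind MDL-driven word discovery.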
7. A Method for Keyword Extraction Using Retrieval Information from Students in Lectures
- Author
- Kouichi Katsurada, Shuji Shinohara, Hiroaki Kawashima, Yurie Iribe, and Tsuneo Nitta
- Subjects
Information retrieval, Computer science, Keyword extraction, Artificial Intelligence, Human–computer information retrieval, Artificial intelligence, Software, Natural language processing
- Abstract
Recently, e-learning systems for self-learning with various types of retrieval functions have been developed. This paper describes a method for keyword extraction using retrieval information collected from many students through those retrieval functions. Firstly, we show that (1) teachers tend to consider technical terms important, while students unfamiliar with those technical terms tend to retrieve them, and therefore (2) there is a clear correlation between the keywords extracted by teachers and the retrieval words used by students. Secondly, we propose a method utilizing retrieval information from students for keyword extraction, and show that the method achieves considerably better performance than a method extracting keywords using only lecture information.
- Published
- 2007
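The correlation observed above suggests boosting lecture terms by how often students retrieve them. Below is a minimal sketch of that idea; the weighting scheme and data are invented, not the paper's formula.

```python
from collections import Counter

# Toy sketch: score each lecture term by its frequency in the lecture
# plus a boost for how often students retrieved it, capturing the
# observed correlation between retrieval words and teacher keywords.
# (The additive weighting and the boost factor are invented.)
def extract_keywords(lecture_terms, retrieval_log, top_k=3, boost=2.0):
    tf = Counter(lecture_terms)
    queries = Counter(retrieval_log)
    scores = {t: tf[t] + boost * queries.get(t, 0) for t in tf}
    return [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])][:top_k]

lecture = ["entropy", "entropy", "entropy", "channel", "coding"]
retrievals = ["channel", "channel", "coding"]   # terms students looked up
print(extract_keywords(lecture, retrievals))
# ['channel', 'entropy', 'coding']
```

"channel" appears only once in the lecture, but the students' retrievals promote it above the frequent but already-understood "entropy".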
8. Efficient Learning of Word Meanings by Agents Using Biases Observed in Language Development of Children
- Author
- Masashi Kimura, Kouichi Katsurada, Shuji Shinohara, Satoshi Kodama, Yurie Iribe, Ryo Taguchi, and Tsuneo Nitta
- Subjects
Spoken word, Distribution (number theory), Computer science, Object (grammar), Conditional probability distribution, Symbol grounding, Artificial Intelligence, Feature (machine learning), Probability distribution, Artificial intelligence, Software, Natural language processing, Word (group theory)
- Abstract
Recently, studies on the learning of word meanings by agents have begun. In these studies, a human shows objects to an agent and utters words such as ``red'' or ``box''. The agent identifies the object feature represented by each spoken word. In our method, the agent first learns the probability distribution p(x) and the conditional probability distribution p(x|w), where x is an object feature and w is a word. If a word w does not represent a feature x, p(x) and p(x|w) will be almost the same distribution, because x is independent of w. This fact enables the agent to use the distance between p(x) and p(x|w) when inferring which feature a word represents. Previous works also employ similar stochastic approaches to detect the feature; however, such approaches need many examples to learn the correct distributions.
- Published
- 2007
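The distance idea above can be sketched in one dimension with Gaussian fits and the closed-form Gaussian KL divergence. The data below is synthetic and the paper's features are richer than a single scalar; this only illustrates the p(x)-versus-p(x|w) comparison.

```python
import math, random

# Sketch of the core idea: if word w is independent of feature x, then
# p(x|w) is close to p(x), so a near-zero distribution distance means
# "w does not describe x". Here x is 1-D, both distributions are fit as
# Gaussians, and the distance is the closed-form Gaussian KL divergence.
def fit_gaussian(xs):
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, v

def kl_gaussian(p, q):
    """KL( N(m1,v1) || N(m2,v2) ) in nats."""
    (m1, v1), (m2, v2) = p, q
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1)

random.seed(0)
all_red   = [random.gauss(0.9, 0.05) for _ in range(200)]  # hue of red objects
all_other = [random.gauss(0.4, 0.20) for _ in range(200)]  # hue of the rest

p_x     = fit_gaussian(all_red + all_other)                       # p(x)
p_x_red = fit_gaussian(all_red)                                   # p(x | "red")
p_x_box = fit_gaussian(random.sample(all_red + all_other, 200))   # p(x | "box")

# "red" shifts the hue distribution strongly; "box" barely moves it,
# so the agent infers that "red" (not "box") describes the hue feature.
print(kl_gaussian(p_x_red, p_x), kl_gaussian(p_x_box, p_x))
```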
9. On Autonomous Coordination of Learning Biases by an Agent with a Vocabulary Learning Mechanism
- Author
- Kouichi Katsurada, Tsuneo Nitta, Takashi Hashimoto, Shuji Shinohara, and Ryo Taguchi
- Subjects
Error-driven learning, Artificial Intelligence, Human–computer interaction, Computer science, Artificial intelligence, Vocabulary learning, Software, Natural language processing, Mechanism (sociology)
- Published
- 2007
10. Introducing articulatory anchor-point to ANN training for corrective learning of pronunciation
- Author
- Tsuneo Nitta, Ryoko Hayashi, Yurie Iribe, Kouichi Katsurada, Silasak Manosavanh, and Chunyue Zhu
- Subjects
Computer science, Speech recognition, Animation, Pronunciation, Artificial intelligence, Articulation (phonetics), Articulatory gestures, Vocal tract, Computer animation, Natural language processing, Gesture
- Abstract
In this paper, we describe computer-assisted pronunciation training (CAPT) through the visualization of articulatory gestures derived from the learner's speech. Typical CAPT systems cannot indicate how the learner should correct his/her articulation. The proposed system enables learners to study how to correct their pronunciation by comparing a wrongly pronounced gesture with a correctly pronounced one. In this system, a multi-layer neural network (MLN) is used to convert the learner's speech into vocal tract coordinates derived from Magnetic Resonance Imaging data, and an animation is then generated from these coordinate values. Moreover, we improved the animations by introducing a per-phoneme anchor-point into MLN training. In our experiments, the new system generated accurate CG animations even from English speech produced by Japanese learners.
- Published
- 2013
11. Improvement of animated articulatory gesture extracted from speech for pronunciation training
- Author
- Chunyue Zhu, Silasak Manosavan, Yurie Iribe, Ryoko Hayashi, Kouichi Katsurada, and Tsuneo Nitta
- Subjects
Computer science, Place of articulation, Speech recognition, Animation, Pronunciation, Gesture recognition, Artificial intelligence, Articulation (phonetics), Articulatory gestures, Vocal tract, Natural language processing, Computer animation
- Abstract
Computer-assisted pronunciation training (CAPT) has been introduced for language education in recent years. CAPT scores the learner's pronunciation quality and points out wrong phonemes by using speech recognition technology. However, although the learner can thus realize that his/her speech differs from the teacher's, the learner still cannot control the articulation organs to pronounce correctly, because the learner cannot see precisely how to correct the wrong articulatory gestures. We indicate these differences by visualizing a learner's wrong pronunciation movements alongside the correct pronunciation movements with CG animation. We propose a system for generating animated pronunciation by automatically estimating a learner's pronunciation movements from his/her speech. The proposed system maps speech to the coordinate values needed to generate the animations by using a multilayer perceptron (MLP) neural network, and uses MRI data to generate smooth animated pronunciations. Additionally, we verify through experimental evaluation whether the vocal tract area and articulatory features are suitable as characteristics of pronunciation movement.
- Published
- 2012
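The mapping stage described above can be sketched as a minimal MLP forward pass. The weights here are random (untrained) and all dimensions are invented; the real system is trained against MRI-derived coordinates.

```python
import math, random

# Minimal sketch of the mapping stage: an MLP takes a frame of acoustic
# features and outputs 2-D coordinates of vocal-tract control points for
# the animation. (Untrained random weights; invented dimensions.)
random.seed(1)

N_FEATS, N_HIDDEN, N_POINTS = 12, 8, 5   # 5 control points -> 10 outputs

def layer(n_in, n_out):
    return [[random.uniform(-0.5, 0.5) for _ in range(n_in)]
            for _ in range(n_out)]

W1, W2 = layer(N_FEATS, N_HIDDEN), layer(N_HIDDEN, 2 * N_POINTS)

def forward(features):
    # Hidden layer with tanh nonlinearity, linear output layer.
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)))
              for row in W1]
    out = [sum(w * h for w, h in zip(row, hidden)) for row in W2]
    # Pair up outputs as (x, y) coordinates, one per control point.
    return list(zip(out[0::2], out[1::2]))

frame = [random.uniform(-1, 1) for _ in range(N_FEATS)]
coords = forward(frame)
print(len(coords))   # 5 control points per speech frame
```

Running this frame by frame over an utterance yields a coordinate trajectory, which is what the animation renderer interpolates into smooth articulator motion.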
12. Learning Physically Grounded Lexicons from Spoken Utterances
- Author
- Mikio Nakano, Ryo Taguchi, Kotaro Funakoshi, Takashi Nose, Naoto Iwahashi, and Tsuneo Nitta
- Subjects
Service (systems architecture), Computer science, Order (business), Robot, Artificial intelligence, Object (philosophy), Natural language processing, Utterance, Word (computer architecture)
- Abstract
Service robots must understand correspondence relationships between things in the real world and words in order to communicate with humans. For example, to understand the utterance, "Bring me an apple," the robot requires knowledge about the relationship between the word "apple" and visual features of the apple, such as color and shape. Robots perceive object features with physical sensors. However, developers of service robots cannot describe all knowledge in advance because such robots may be used in situations other than those the developers assumed. In particular, household robots have many opportunities to encounter unknown objects. Therefore, it is preferable that robots automatically learn physically grounded lexicons, which consist of phoneme sequences and meanings of words, through interactions with users.
- Published
- 2012
13. Generating animated pronunciation from speech through articulatory feature extraction
- Author
- Tsuneo Nitta, Silasak Manosavanh, Kouichi Katsurada, Chunyue Zhu, Ryoko Hayashi, and Yurie Iribe
- Subjects
Thesaurus (information retrieval), Computer science, Feature extraction, Artificial intelligence, Pronunciation, Natural language processing
- Published
- 2011
14. Evaluation of fast spoken term detection using a suffix array
- Author
- Kouichi Katsurada, Tsuneo Nitta, Shigeki Teshima, Yurie Iribe, and Shinta Sawada
- Subjects
Computer science, Speech recognition, Suffix array, Artificial intelligence, Natural language processing, Term (time)
- Published
- 2011
15. Dialog Strategy Acquisition and Its Evaluation for Efficient Learning of Word Meanings by Agents
- Author
- Tsuneo Nitta, Kouichi Katsurada, and Ryo Taguchi
- Subjects
Facial expression, Mechanism (biology), Comprehension, Word meaning, Artificial intelligence, Dialog box, Psychology, Word (computer architecture), Natural language processing
- Abstract
In word meaning acquisition through interactions between humans and agents, the efficiency of learning depends largely on the dialog strategies the agents use. This paper describes the automatic acquisition of dialog strategies through interaction between two agents. In the experiments, the two agents infer each other's comprehension level from the other's facial expressions and utterances in order to acquire efficient strategies, with Q-learning applied as the strategy acquisition mechanism. Firstly, experiments are carried out on the interaction between a mother agent, who knows all the word meanings, and a child agent with no initial word meanings. The experimental results show that the mother agent acquires a teaching strategy, while the child agent acquires an asking strategy. Next, experiments on interaction between a human and an agent are conducted to evaluate the acquired strategies. The results show the effectiveness of both the teaching and asking strategies.
- Published
- 2006
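The Q-learning strategy acquisition above can be sketched with a tiny invented state/action space. The comprehension dynamics and rewards below are assumptions for illustration, not the paper's environment.

```python
import random

# Tiny Q-learning sketch of strategy acquisition (states, actions, and
# reward dynamics are invented): the mother agent observes the child's
# comprehension level and learns whether to "teach" a word or "quiz".
random.seed(2)
STATES = ["child_confused", "child_confident"]
ACTIONS = ["teach", "quiz"]
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def reward(state, action):
    # Invented dynamics: teaching helps a confused child,
    # quizzing is only useful once the child is confident.
    if state == "child_confused":
        return 1.0 if action == "teach" else -1.0
    return 1.0 if action == "quiz" else 0.0

def step(state):
    # Epsilon-greedy action selection, then a standard Q-learning update.
    a = (random.choice(ACTIONS) if random.random() < EPS
         else max(ACTIONS, key=lambda x: Q[(state, x)]))
    r = reward(state, a)
    nxt = random.choice(STATES)
    Q[(state, a)] += ALPHA * (r + GAMMA * max(Q[(nxt, b)] for b in ACTIONS)
                              - Q[(state, a)])
    return nxt

s = "child_confused"
for _ in range(2000):
    s = step(s)

best = {st: max(ACTIONS, key=lambda act: Q[(st, act)]) for st in STATES}
print(best)   # the learned policy: teach when confused, quiz when confident
```

Even this toy setup shows the paper's qualitative result: the reward structure alone drives the agent toward a teaching strategy in low-comprehension states.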
16. A large vocabulary word recognition system based on syllable recognition and nonlinear word matching
- Author
- Hiroshi Kanazawa, S. Hirai, Hiroshi Matsuura, Yoichi Takebayashi, Tsuneo Nitta, and Hiroyuki Tsuboi
- Subjects
Vocabulary, Matching (statistics), Computer science, Speech recognition, Task (project management), Nonlinear system, Pattern recognition (psychology), Word recognition, Artificial intelligence, Syllable, Natural language processing, Word (computer architecture)
- Abstract
A practical, speaker-adaptive Japanese large-vocabulary recognition system has been developed. The system has two notable features. The first is low-cost hardware that realizes precise syllable recognition and high-speed speaker adaptation using the Karhunen-Loeve expansion. The second is nonlinear word matching, which handles syllable insertion and deletion and reduces the restrictions on acceptable utterances. The system has been applied to data entry tasks such as the input of train station names, family names, and given names, and also to a map data search task in which place names were verified. Recognition experiments have been carried out on a 2000-word vocabulary; the word recognition rate was 93.7% for 1000 utterances by five male speakers.
- Published
- 2003
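The nonlinear word matching above can be sketched as dynamic-programming alignment over syllable sequences that tolerates insertions and deletions. The costs and syllabifications below are invented for illustration.

```python
# Sketch of nonlinear word matching as DP alignment over syllable
# sequences, tolerating recognizer insertions and deletions
# (the costs and syllabifications below are invented).
def match_cost(recognized, reference, sub=1.0, ins=0.7, dele=0.7):
    n, m = len(recognized), len(reference)
    # d[i][j]: min cost of aligning the first i recognized syllables
    # with the first j reference syllables.
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * ins
    for j in range(1, m + 1):
        d[0][j] = j * dele
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = 0.0 if recognized[i - 1] == reference[j - 1] else sub
            d[i][j] = min(d[i - 1][j - 1] + same,   # match / substitution
                          d[i - 1][j] + ins,        # inserted syllable
                          d[i][j - 1] + dele)       # deleted syllable
    return d[n][m]

def recognize(recognized, vocabulary):
    return min(vocabulary, key=lambda w: match_cost(recognized, vocabulary[w]))

vocabulary = {                       # station names, syllabified
    "shinagawa": ["shi", "na", "ga", "wa"],
    "kanagawa":  ["ka", "na", "ga", "wa"],
    "shinjuku":  ["shi", "n", "ju", "ku"],
}
# The recognizer dropped one syllable, yet the right word still wins:
print(recognize(["shi", "ga", "wa"], vocabulary))  # shinagawa
```

Charging a finite cost for insertions and deletions, rather than rejecting misaligned hypotheses outright, is what relaxes the restriction on acceptable utterances.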
17. Word-spotting based on inter-word and intra-word diphone models
- Author
- H. Matsu'ura, Tsuneo Nitta, Yasuyuki Masai, and Shinichi Tanaka
- Subjects
Computer science, Speech recognition, Logogen model, Word error rate, Diphone, Speaker recognition, Speech processing, Word recognition, Artificial intelligence, Hidden Markov model, Word (computer architecture), Natural language processing
- Abstract
The authors propose a precise but simple inter-word diphone model (IDM) for word-spotting based on SMQ/HMM. They previously applied ordinary diphone models to a speaker-independent, large-vocabulary word recognition unit; however, because users are apt to add words and/or extraneous speech, accuracy degrades due to the mismatch of models at word boundaries. The IDM represents a transition from the preceding phonemes to a word, or from a word to the succeeding phonemes. An experiment showed that the IDMs reduce error rates by about 5% for speech containing unknown words and extraneous speech. The experiment also showed that the proposed method provides performance good enough for the practical use of a large-vocabulary isolated-word recognition system.
- Published
- 2002
Discovery Service for Jio Institute Digital Library