Author: "Tohru Nagano" / Publisher: ieee - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Tohru Nagano"' showing total 4 results

1. Data Augmentation Based on Vowel Stretch for Improving Children's Speech Recognition

Author: Gakuto Kurata, Takashi Fukuda, Tohru Nagano, and Masayuki Suzuki
Subjects: Speech disfluency, Computer science, Vowel, Speech recognition, Prolongation, Acoustic model, Perturbation method, Spontaneous speech
Abstract: Prolongation is a speech disfluency that lengthens some portions of speech utterances. It is frequently observed in children's spontaneous speech, while it is rare in read speech. To make acoustic models more robust to children's spontaneous speech, collecting a large amount of children's speech data containing prolongation is usually required, which is impractical in many cases. To tackle this problem, we propose a novel data augmentation method that virtually generates additional data by simulating prolongation. The method inserts pseudo frames into specific positions of speech utterances to simulate prolongation. The acoustic features of the inserted frames are calculated from the original frames on both sides. This is based on our analysis that many vowels are actually stretched in children's spontaneous speech. Our proposed procedure can generate partially stretched utterances with low computational cost, unlike a conventional speed or tempo perturbation method that extends and shrinks entire utterances at a uniform rate. The effectiveness of the proposed method was confirmed in acoustic model adaptation experiments, in which our proposed method focusing on vowel stretch showed consistent improvement compared with conventional speed and tempo perturbation approaches.
Published: 2019
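
Illustrative note: the frame-insertion step described in the abstract above can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation. The function name `stretch_frames`, the choice of insertion positions, and the use of a plain average of the two neighbouring frames are all assumptions; the abstract only states that the inserted frames are calculated from the original frames on both sides.

```python
def stretch_frames(features, positions):
    """Insert one pseudo frame after each frame index listed in `positions`.

    features: list of frames, each frame a list of floats (acoustic features).
    positions: frame indices i; a pseudo frame is placed between frame i and
        frame i + 1. In practice these would be vowel regions (assumption).
    The pseudo frame is the element-wise mean of its two neighbours, which is
    an assumed interpolation rule, not one confirmed by the abstract.
    """
    wanted = set(positions)
    stretched = []
    for i, frame in enumerate(features):
        stretched.append(list(frame))
        # Only insert when a right-hand neighbour exists.
        if i in wanted and i + 1 < len(features):
            right = features[i + 1]
            stretched.append([(a + b) / 2.0 for a, b in zip(frame, right)])
    return stretched


# Hypothetical 3-frame utterance with 2-dimensional features.
feats = [[0.0, 0.0], [2.0, 4.0], [4.0, 8.0]]
result = stretch_frames(feats, positions=[0, 1])
```

Unlike speed or tempo perturbation, only the selected positions grow; the rest of the utterance keeps its original frames.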

2. Improvements to N-gram Language Model Using Text Generated from Neural Language Model

Author: Tohru Nagano, Nobuyasu Itoh, Masayuki Suzuki, Gakuto Kurata, and Samuel Thomas
Subjects: Speech-language pathology & audiology, Medical and health sciences, n-gram, Recurrent neural network, Computer science, Speech recognition, Electrical engineering, electronic engineering, information engineering, Artificial intelligence & image processing, Engineering and technology, Language model, Other medical science, Domain (software engineering)
Abstract: Although neural language models have emerged, n-gram language models are still used for many speech recognition tasks. This paper proposes four methods to improve n-gram language models using text generated from a recurrent neural network language model (RNNLM). First, we use multiple RNNLMs from different domains instead of a single RNNLM. The final n-gram language model is obtained by interpolating the n-gram models generated for each domain. Second, we use subwords instead of words for the RNNLM to reduce the out-of-vocabulary rate. Third, we generate text templates using an RNNLM for template-based data augmentation for named entities. Fourth, we use both a forward RNNLM and a backward RNNLM to generate text. We found that these four methods improved speech recognition performance by up to 4% relative in various tasks.
Published: 2019
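
Illustrative note: the interpolation of per-domain n-gram models mentioned in the abstract above can be sketched roughly as follows. This is a toy sketch that assumes simple linear interpolation over probability tables; the function name `interpolate`, the domain names, the weights, and all probabilities are hypothetical, and real toolkits also handle smoothing and backoff, which is omitted here.

```python
def interpolate(models, weights):
    """Linearly interpolate n-gram probability tables.

    models: list of dicts mapping (history, word) -> probability, where
        history is a tuple of preceding words.
    weights: one non-negative weight per model; they must sum to 1.
    """
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    mixed = {}
    for model, weight in zip(models, weights):
        for key, prob in model.items():
            mixed[key] = mixed.get(key, 0.0) + weight * prob
    return mixed


# Hypothetical bigram tables estimated from text generated per domain.
domain_a = {(("thank",), "you"): 0.8, (("thank",), "goodness"): 0.2}
domain_b = {(("thank",), "you"): 0.5, (("thank",), "goodness"): 0.5}
final_lm = interpolate([domain_a, domain_b], [0.7, 0.3])
```

Because the weights sum to 1, the interpolated probabilities for a given history still sum to 1 when each input table does.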

3. Speech recognition robust against speech overlapping in monaural recordings of telephone conversations

Author: Masayuki Suzuki, Ryuki Tachibana, Gakuto Kurata, and Tohru Nagano
Subjects: Speech-language pathology & audiology, Medical and health sciences, Range (music), Computer science, Speech recognition, Electrical engineering, electronic engineering, information engineering, Artificial intelligence & image processing, Conversation, Engineering and technology, Monaural, Other medical science
Abstract: Monaural (single-channel) recording is sometimes used for telephone conversations in call centers. Generally speaking, the accuracy of automatic speech recognition on a monaural recording is worse than on a multi-channel recording of the same conversation in which each speaker's voice is recorded separately. The major reason is that the recognition system fails not only at the overlapping segments where the voices of multiple speakers overlap, but also at the neighboring segments surrounding the overlapping segments. In this paper, we tackle this problem by using a combination of garbage modeling and noise-robust monaural acoustic modeling. Our proposed method trains the models by making use of multi-channel recordings and transcripts, which are easier to prepare than monaural recordings and transcripts. We present experimental results in which the proposed methods reduced the error rates by approximately 3% relative to the baseline methods for both the GMM-HMM and CNN-HMM cases. Because the proposed method is quite simple, it is easy to deploy to a wide range of ASR systems for monaural speech transcription.
Published: 2016
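
Illustrative note: the abstracts in this list report gains as relative figures. A minimal sketch of how a relative error-rate reduction is computed is shown below. The function name and the error rates are hypothetical values chosen for illustration; they are not results from the paper.

```python
def relative_reduction(baseline_error, new_error):
    """Return the relative error reduction as a fraction of the baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical word error rates in percent, not taken from the paper.
baseline_wer = 20.0
improved_wer = 19.4
reduction = relative_reduction(baseline_wer, improved_wer)
print(f"relative reduction: {reduction:.1%}")
```

With these made-up numbers, an absolute drop of 0.6 points corresponds to a 3% relative reduction, which is why relative and absolute figures should not be compared directly.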

4. Improving phoneme and accent estimation by leveraging a dictionary for a stochastic TTS front-end

Author: Nobuyasu Itoh, Ryuki Tachibana, Masafumi Nishimura, and Tohru Nagano
Subjects: Vocabulary, Computer science, Speech recognition, Speech synthesis, Speech processing, Class (biology), Task (project management), Front and back ends, Stress (linguistics), Artificial intelligence, Natural language processing, Word (computer architecture)
Abstract: Determining the correct phonemes and pitch accents is important for creating natural Japanese speech. We implemented a TTS front-end system based on an n-gram model. However, the vocabulary of the word n-gram model is limited to the words found in the training corpus, and collecting a very large training corpus is not an easy task. In this paper, we propose using an additional class n-gram model to incorporate not only the words found in the training corpus but also the words found in the dictionary, to further improve the accuracy. In our experiments, our proposed model improves the accuracy of accent estimation by 16.9% relative and the accuracy of phoneme estimation by 21.6% relative, compared to the word n-gram model.
Published: 2008
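
Illustrative note: a class n-gram model, as mentioned in the abstract above, is commonly factored as P(word | history) = P(class | history) * P(word | class). The sketch below shows that standard factorization only. How the authors define classes, assign dictionary words to them, and combine the result with the word n-gram model is not stated in the abstract, so the function name, class label, words, and probabilities are all hypothetical.

```python
def class_ngram_prob(word, history, word_to_class, class_given_history,
                     word_given_class):
    """Score a word with the standard class n-gram factorization.

    history: tuple of preceding tokens.
    word_to_class: dict mapping each word to its class label.
    class_given_history: dict mapping (history, class) -> probability.
    word_given_class: dict mapping (class, word) -> probability.
    Unknown entries fall back to probability 0.0 in this toy version.
    """
    cls = word_to_class.get(word)
    if cls is None:
        return 0.0
    p_class = class_given_history.get((history, cls), 0.0)
    p_word = word_given_class.get((cls, word), 0.0)
    return p_class * p_word


# Hypothetical tables: "Osaka" comes only from a dictionary, yet it receives
# probability mass through its class even if the corpus never contained it.
word_to_class = {"Tokyo": "PLACE", "Osaka": "PLACE"}
class_given_history = {(("to",), "PLACE"): 0.4}
word_given_class = {("PLACE", "Tokyo"): 0.5, ("PLACE", "Osaka"): 0.5}
p_osaka = class_ngram_prob("Osaka", ("to",), word_to_class,
                           class_given_history, word_given_class)
```

The design point is that class membership lets dictionary-only words share statistics learned from corpus words of the same class.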
