Author: "Kida, Yusuke" - Searchworks@Jio Institute Digital Library Search Results

Your search for '"Kida, Yusuke"' returned 40 results.

1. Mask-CTC-based Encoder Pre-training for Streaming End-to-End Speech Recognition

Author: Zhao, Huaibo, Higuchi, Yosuke, Kida, Yusuke, Ogawa, Tetsuji, and Kobayashi, Tetsunori
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Achieving high accuracy with low latency has always been a challenge in streaming end-to-end automatic speech recognition (ASR) systems. By attending to more future context, a streaming ASR model achieves higher accuracy but incurs larger latency, which hurts streaming performance. In the Mask-CTC framework, an encoder network is trained to learn a feature representation that anticipates long-term context, which is desirable for streaming ASR. Mask-CTC-based encoder pre-training has been shown to be beneficial in achieving low latency and high accuracy for triggered attention-based ASR. However, the effectiveness of this method has not been demonstrated for various model architectures, nor has it been verified that the encoder has the expected look-ahead capability to reduce latency. This study, therefore, examines the effectiveness of Mask-CTC-based pre-training for models with different architectures, such as Transformer-Transducer and contextual block streaming ASR. We also discuss the effect of the proposed pre-training method on obtaining accurate output spike timing. Comment: Accepted to EUSIPCO 2023
Published: 2023

2. Neural Diarization with Non-autoregressive Intermediate Attractors

Author: Fujita, Yusuke, Komatsu, Tatsuya, Scheibler, Robin, Kida, Yusuke, and Ogawa, Tetsuji
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Sound
Abstract: End-to-end neural diarization (EEND) with encoder-decoder-based attractors (EDA) is a promising method to handle the whole speaker diarization problem simultaneously with a single neural network. While the EEND model can produce all frame-level speaker labels simultaneously, it disregards output label dependency. In this work, we propose a novel EEND model that introduces label dependency between frames. The proposed method generates non-autoregressive intermediate attractors to produce speaker labels at the lower layers and conditions the subsequent layers on these labels. While the proposed model works in a non-autoregressive manner, the speaker labels are refined by referring to the whole sequence of intermediate labels. Experiments on the two-speaker CALLHOME dataset show that the intermediate labels with the proposed non-autoregressive intermediate attractors boost diarization performance. The proposed method with a deeper network benefits more from the intermediate labels, resulting in better performance and training throughput than EEND-EDA. Comment: ICASSP 2023
Published: 2023

3. Conversation-oriented ASR with multi-look-ahead CBS architecture

Author: Zhao, Huaibo, Fujie, Shinya, Ogawa, Tetsuji, Sakuma, Jin, Kida, Yusuke, and Kobayashi, Tetsunori
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: During conversations, humans are capable of inferring the speaker's intention at any point in the speech so as to prepare their next action promptly. Such an ability is also key for conversational systems to achieve rhythmic and natural conversation. To do so, the automatic speech recognition (ASR) used to transcribe the speech in real time must achieve high accuracy without delay. In streaming ASR, high accuracy is obtained by attending to look-ahead frames, which increases delay. To tackle this trade-off, we propose a multiple-latency streaming ASR that achieves high accuracy with zero look-ahead. The proposed system contains two encoders that operate in parallel: a primary encoder generates accurate outputs using look-ahead frames, and an auxiliary encoder recognizes the look-ahead portion of the primary encoder without look-ahead. The proposed system is built on the contextual block streaming (CBS) architecture, which leverages block processing and has a high affinity for the multiple-latency architecture. Various methods for architecting the system are also studied, including shifting the network to act as different encoders and generating both encoders' outputs in a single encoding pass. Comment: Submitted to ICASSP 2023
Published: 2022

4. Tourist Guidance Robot Based on HyperCLOVA

Author: Yamazaki, Takato, Yoshikawa, Katsumasa, Kawamoto, Toshiki, Ohagi, Masaya, Mizumoto, Tomoya, Ichimura, Shuta, Kida, Yusuke, and Sato, Toshinori
Subjects: Computer Science - Computation and Language
Abstract: This paper describes our system submitted to the Dialogue Robot Competition 2022. Our proposed system combines rule-based and generation-based dialog systems. The system uses HyperCLOVA, a Japanese foundation model, not only for response generation but also for summarization, information search, etc. We also used our own speech recognition system, fine-tuned for this dialog task. As a result, our system ranked second in the preliminary round and advanced to the finals. Comment: This paper is part of the proceedings of the Dialogue Robot Competition 2022
Published: 2022

5. InterAug: Augmenting Noisy Intermediate Predictions for CTC-based ASR

Author: Nakagome, Yu, Komatsu, Tatsuya, Fujita, Yusuke, Ichimura, Shuta, and Kida, Yusuke
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: This paper proposes InterAug, a novel training method for CTC-based ASR that uses augmented intermediate representations for conditioning. The proposed method exploits the conditioning framework of self-conditioned CTC to train robust models by conditioning on "noisy" intermediate predictions. During training, intermediate predictions are changed to incorrect ones and fed into the next layer for conditioning. The subsequent layers are trained to correct the incorrect intermediate predictions with the intermediate losses. By repeating the augmentation and the correction, iterative refinement, which generally requires a special decoder, can be realized with the audio encoder alone. To produce noisy intermediate predictions, we also introduce new augmentations, intermediate feature space augmentation and intermediate token space augmentation, which are designed to simulate typical errors. Combining the proposed InterAug framework with the new augmentations allows explicit training of robust audio encoders. In experiments using augmentations that simulate deletion, insertion, and substitution errors, we confirmed that the trained model acquires robustness to each error type, boosting the speech recognition performance of the strong self-conditioned CTC baseline. Comment: This paper was submitted to INTERSPEECH 2022
Published: 2022

6. Better Intermediates Improve CTC Inference

Author: Komatsu, Tatsuya, Fujita, Yusuke, Lee, Jaesong, Lee, Lukas, Watanabe, Shinji, and Kida, Yusuke
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: This paper proposes a method for improved CTC inference with searched intermediates and multi-pass conditioning. The paper first formulates self-conditioned CTC as a probabilistic model with an intermediate prediction as a latent representation and provides a tractable conditioning framework. We then propose two new conditioning methods based on this formulation: (1) searched intermediate conditioning, which refines intermediate predictions with beam search, and (2) multi-pass conditioning, which uses the predictions of the previous inference to condition the next inference. These new approaches enable better conditioning than the original self-conditioned CTC during inference and improve the final performance. Experiments on the LibriSpeech dataset show maximum relative performance improvements of 3% and 12% on the test-clean and test-other sets, respectively, compared to the original self-conditioned CTC. Comment: 5 pages, submitted to INTERSPEECH 2022
Published: 2022

7. Alternate Intermediate Conditioning with Syllable-level and Character-level Targets for Japanese ASR

Author: Fujita, Yusuke, Komatsu, Tatsuya, and Kida, Yusuke
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: End-to-end automatic speech recognition directly maps input speech to characters. However, the mapping can be problematic when several different pronunciations should be mapped to one character or when one pronunciation is shared among many different characters. Japanese ASR suffers the most from such many-to-one and one-to-many mapping problems because of Japanese kanji characters. To alleviate these problems, we introduce explicit interaction between characters and syllables using self-conditioned connectionist temporal classification (CTC), in which the upper layers are "self-conditioned" on the intermediate predictions from the lower layers. The proposed method uses character-level and syllable-level intermediate predictions as conditioning features to deal with the mutual dependency between characters and syllables. Experimental results on the Corpus of Spontaneous Japanese show that the proposed method outperformed the conventional multi-task and self-conditioned CTC methods. Comment: SLT 2022
Published: 2022
Full Text: View/download PDF

8. Flame acceleration and detonation initiation around a T-shaped bifurcation

Author: Honda, Tomoaki, Ogawa, Syotaro, Kida, Yusuke, Kim, Wookyung, Johzaki, Tomoyuki, Yatsufusa, Tomoaki, and Endo, Takuma
Published: 2024
Full Text: View/download PDF

9. Label-Synchronous Speech-to-Text Alignment for ASR Using Forward and Backward Transformers

Author: Kida, Yusuke, Komatsu, Tatsuya, and Togami, Masahito
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Machine Learning, Computer Science - Sound
Abstract: This paper proposes a novel label-synchronous speech-to-text alignment technique for automatic speech recognition (ASR). Speech-to-text alignment is the problem of splitting long audio recordings with unaligned transcripts into utterance-wise pairs of speech and text. Unlike conventional methods based on frame-synchronous prediction, the proposed method redefines speech-to-text alignment as a label-synchronous text mapping problem. This enables accurate alignment that benefits from the strong inference ability of state-of-the-art attention-based encoder-decoder models, which cannot be applied to the conventional methods. Two different Transformer models, named the forward Transformer and the backward Transformer, are used to estimate the initial and final tokens, respectively, of a given speech segment based on end-of-sentence prediction with teacher forcing. Experiments using the Corpus of Spontaneous Japanese (CSJ) demonstrate that the proposed method provides accurate utterance-wise alignment that matches the manually annotated alignment with as little as 0.2% error. It is also confirmed that a Transformer-based hybrid CTC/Attention ASR model using the aligned speech and text pairs as additional training data reduces character error rates by up to 59.0% relative, which is significantly better than the 39.0% reduction achieved by a conventional alignment method based on a connectionist temporal classification model. Comment: Submitted to INTERSPEECH 2021
Published: 2021

10. Voice Activity Detection: Merging Source and Filter-based Information

Author: Drugman, Thomas, Stylianou, Yannis, Kida, Yusuke, and Akamine, Masami
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Voice Activity Detection (VAD) refers to the problem of distinguishing speech segments from background noise. Numerous approaches have been proposed for this purpose. Some are based on features derived from the power spectral density; others exploit the periodicity of the signal. The goal of this paper is to investigate the joint use of source and filter-based features. Interestingly, a mutual information-based assessment shows superior discrimination power for the source-related features, especially the proposed ones. The features are then fed to an artificial neural network-based classifier trained on a multi-condition database. Two strategies are proposed to merge source and filter information: feature fusion and decision fusion. Our experiments indicate an absolute reduction of 3% in the equal error rate when using decision fusion. The final proposed system is compared to four state-of-the-art methods on 150 minutes of data recorded in real environments. Thanks to the robustness of its source-related features, its multi-condition training, and its efficient information fusion, the proposed system yields a substantial increase in accuracy over the best state-of-the-art VAD across all conditions (24% absolute on average).
Published: 2019
Full Text: View/download PDF

11. Speaker Selective Beamformer with Keyword Mask Estimation

Author: Kida, Yusuke, Tran, Dung, Omachi, Motoi, Taniguchi, Toru, and Fujita, Yuya
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Machine Learning, Computer Science - Sound
Abstract: This paper addresses the problem of automatic speech recognition (ASR) of a target speaker in the presence of background speech. The novelty of our approach is that we focus on a wake-up keyword, which is commonly used to activate ASR systems such as smart speakers. The proposed method first uses a DNN-based mask estimator to separate the mixture signal into the keyword signal uttered by the target speaker and the remaining background speech. The separated signals are then used to calculate a beamforming filter that enhances the subsequent utterances from the target speaker. Experimental evaluations show that the trained DNN-based mask can selectively separate the keyword and background speech from the mixture signal. The effectiveness of the proposed method is also verified with Japanese ASR experiments, and we confirm that the proposed method significantly improves character error rates for both simulated and real recorded test sets. Comment: Accepted to SLT 2018
Published: 2018

12. Mask-CTC-Based Encoder Pre-Training for Streaming End-to-End Speech Recognition

Author: Zhao, Huaibo, Higuchi, Yosuke, Kida, Yusuke, Ogawa, Tetsuji, and Kobayashi, Tetsunori
Published: 2023
Full Text: View/download PDF

13. Target Vocabulary Recognition Based on Multi-Task Learning with Decomposed Teacher Sequences

Author: Ito, Aoi, Komatsu, Tatsuya, Fujita, Yusuke, and Kida, Yusuke
Published: 2023
Full Text: View/download PDF

14. Neural Diarization with Non-Autoregressive Intermediate Attractors

Author: Fujita, Yusuke, Komatsu, Tatsuya, Scheibler, Robin, Kida, Yusuke, and Ogawa, Tetsuji
Published: 2023
Full Text: View/download PDF

15. Conversation-Oriented ASR with Multi-Look-Ahead CBS Architecture

Author: Zhao, Huaibo, Fujie, Shinya, Ogawa, Tetsuji, Sakuma, Jin, Kida, Yusuke, and Kobayashi, Tetsunori
Published: 2023
Full Text: View/download PDF

16. Alternate Intermediate Conditioning with Syllable-Level and Character-Level Targets for Japanese ASR

Author: Fujita, Yusuke, Komatsu, Tatsuya, and Kida, Yusuke
Published: 2023
Full Text: View/download PDF

17. Perception of the neighborhood environment and self-rated health: A multilevel analysis of the Nagoya neighborhood environment and health study

Author: Kida, Yusuke and Sung, Woncheol
Published: 2021

18. InterAug: Augmenting Noisy Intermediate Predictions for CTC-based ASR

Author: Nakagome, Yu, Komatsu, Tatsuya, Fujita, Yusuke, Ichimura, Shuta, and Kida, Yusuke
Published: 2022
Full Text: View/download PDF

19. Better Intermediates Improve CTC Inference

Author: Komatsu, Tatsuya, Fujita, Yusuke, Lee, Jaesong, Lee, Lukas, Watanabe, Shinji, and Kida, Yusuke
Published: 2022
Full Text: View/download PDF

20. “Neighborhood Effects” and Cities in Japan

Author: Kawano, Eiji, Kida, Yusuke, and Harada, Ken
Published: 2022
Full Text: View/download PDF

21. The Structure and Actors of "Linear" (Maglev) Developmentalism II (original title: リニア開発主義の構造と主体Ⅱ)

Author: Kida, Yusuke
Published: 2022
Full Text: View/download PDF

22. The determinants of the disparity of social network in the city: A community-level study in Nagoya

Author: Kida, Yusuke, Sung, Woncheol, and Kawamura, Noriyuki
Published: 2019

23. Simultaneous Detection and Localization of a Wake-Up Word Using Multi-Task Learning of the Duration and Endpoint

Author: Maekaku, Takashi, Kida, Yusuke, and Sugiyama, Akihiko
Published: 2019
Full Text: View/download PDF

24. Speaker Selective Beamformer with Keyword Mask Estimation

Author: Kida, Yusuke, Tran, Dung, Omachi, Motoi, Taniguchi, Toru, and Fujita, Yuya
Published: 2018
Full Text: View/download PDF

25. Small angle neutron scattering studies on structural inhomogeneities in polymer gels: irradiation cross-linked gels vs chemically cross-linked gels

Author: Norisuye, Tomohisa, Masui, Naoki, Kida, Yusuke, Ikuta, Daigo, Kokufuta, Etsuo, Ito, Shoji, Panyukov, Sergei, and Shibayama, Mitsuhiro
Published: 2002
Full Text: View/download PDF

26. The location of Mn (MnO: 2.0 wt%) in fluorapatite from Lavra da Golconda, near Governador Valadares, Minas Gerais, Brazil

Author: Arima, Hiroshi, Kida, Yusuke, Mikouchi, Takashi, and Sugiyama, Kazumasa
Published: 2018
Full Text: View/download PDF

27. Fluidization of the Regime in Contemporary Urban Politics: A Political Sociological Study of Nagoya City

Author: Kida, Yusuke
Published: 2014

28. Voice Activity Detection: Merging Source and Filter-based Information

Author: Drugman, Thomas, Stylianou, Yannis, Kida, Yusuke, and Akamine, Masami
Published: 2016
Full Text: View/download PDF

29. Rearranging the Urban Regime

Author: Kida, Yusuke
Published: 2016
Full Text: View/download PDF

30. Electoral Base of Reformist Leader in Modern Japanese City

Author: Kida, Yusuke
Published: 2013
Full Text: View/download PDF

31. Apparatus, Method and Computer Program Product for Feature Extraction

Author: Kida, Yusuke and Masuko, Takashi
Published: 2012
Full Text: View/download PDF

32. Using duration and pitch for Mandarin digit string recognition

Author: Zhao, Rui, Kida, Yusuke, Yan, Xiang, Ding, Pei, and He, Lei
Published: 2010
Full Text: View/download PDF

33. Robust F0 estimation based on log-time scale autocorrelation and its application to Mandarin tone recognition

Author: Kida, Yusuke, Sakai, Masaru, Masuko, Takashi, and Kawamura, Akinori
Published: 2009
Full Text: View/download PDF

34. Evaluation of voice activity detection by combining multiple features with weight adaptation

Author: Kida, Yusuke and Kawahara, Tatsuya
Published: 2006
Full Text: View/download PDF

35. 2A1-C14 Accuracy Evaluation of Visual Odometry Method from Stereo Image Sequence

Author: Ozawa, Risa, Kida, Yusuke, Kagami, Satoshi, and Mizoguchi, Hiroshi
Published: 2006
Full Text: View/download PDF

36. Voice activity detection based on optimally weighted combination of multiple features

Author: Kida, Yusuke and Kawahara, Tatsuya
Published: 2005
Full Text: View/download PDF

37. 2A1-N-097 High-Speed 3D Environment Reconstruction using Visual Odometry from Sequential Depth Maps (Robot Vision 1, Mega-Integration in Robotics and Mechatronics to Assist Our Daily Lives)

Author: Kida, Yusuke, Takaoka, Yutaka, Kagami, Satoshi, Mizoguchi, Hiroshi, and Kanade, Takeo
Published: 2005
Full Text: View/download PDF

38. 2P1-S-049 Footstep Planning using 3D Map Reconstructed by Visual Odometry (Humanoid 4, Mega-Integration in Robotics and Mechatronics to Assist Our Daily Lives)

Author: Ozawa, Risa, Takaoka, Yutaka, Kida, Yusuke, Chestnutt, Joel, Kuffner, James, Nishiwaki, Koichi, Kagami, Satoshi, Mizoguchi, Hiroshi, and Inoue, Hirochika
Published: 2005
Full Text: View/download PDF

39. Studies on Two Types of Built-in Inhomogeneities for Polymer Gels: Frozen Segmental Concentration Fluctuations and Spatial Distribution of Cross-Links

Author: Norisuye, Tomohisa, Kida, Yusuke, Masui, Naoki, Tran-Cong-Miyata, Qui, Maekawa, Yasunari, Yoshida, Masaru, and Shibayama, Mitsuhiro
Published: 2003
Full Text: View/download PDF

40. Proton NMR study of the lowest-hydrogen-content molybdenum bronze H0.26MoO3

Author: Kunitomo, Masakazu, Kida, Yusuke, Etoh, Rina, Kohmoto, Toshiro, Fukuda, Yukio, Eda, Kazuo, and Sotani, Noriyuki
Published: 2001
Full Text: View/download PDF