Author: "Jaejin Cho" / Publisher: ieee - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Jaejin Cho"' showing total 2 results

1. Improving Reconstruction Loss Based Speaker Embedding in Unsupervised and Semi-Supervised Scenarios

Author: Jesús Villalba, Piotr Zelasko, Jaejin Cho, and Najim Dehak
Subjects: Correlation, Sequence, Training set, Computer science, Speech recognition, Sampling (statistics), Embedding, Spectrogram, Decoding methods, Identity (music)
Abstract: Text-to-speech (TTS) models trained to minimize the spectrogram reconstruction loss can learn speaker embeddings without explicit speaker identity supervision, unlike x-vector speaker identification (SID) systems. This way of learning speaker embeddings can be useful in unsupervised or semi-supervised scenarios where none, or only some, of the training data have speaker labels. Thus, in this paper, we evaluate speaker embeddings learned by training the spectrogram prediction network under unsupervised and semi-supervised scenarios. We experimented with different data sampling strategies. The best one was sampling two different segments from the same utterance, namely A and B, where the spectrogram of B is predicted given the B phone sequence and the speaker embedding extracted from A. This method improved EER by 3.4% relative, compared to using the same utterance for both A and B without segmenting. In the unsupervised scenario, the best speaker embedding outperformed i-vectors, the state-of-the-art unsupervised speaker embedding, in speaker verification by 12.9% relative in EER. We observed a high correlation between reconstruction loss and speaker embedding quality. In the semi-supervised scenario, having more unlabeled data in training led to better performance in speaker verification. Adding 5314 unlabeled speakers to 800 labeled speakers improved EER by 10.8% relative.
Published: 2021

2. Language Model Integration Based on Memory Control for Sequence to Sequence Speech Recognition

Author: Najim Dehak, Hirofumi Inaguma, Takaaki Hori, Murali Karthick Baskar, Shinji Watanabe, Jesús Villalba, and Jaejin Cho
Subjects: FOS: Computer and information sciences, Scheme (programming language), Sound (cs.SD), Computer science, Speech recognition, Inference, Computer Science - Sound, Set (abstract data type), Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Language model, State (computer science), Transfer of learning, Decoding methods, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In this paper, we explore several new schemes to train a seq2seq model to integrate a pre-trained LM. Our proposed fusion methods focus on the memory cell state and the hidden state in the seq2seq decoder long short-term memory (LSTM), and, unlike in prior studies, the memory cell state is updated by the LM. This means the memory retained by the main seq2seq model would be adjusted by the external LM. These fusion methods have several variants depending on the architecture of this memory cell update and the use of the memory cell and hidden states, which directly affects the final label inference. We performed experiments to show the effectiveness of the proposed methods in a mono-lingual ASR setup on the Librispeech corpus and in a transfer learning setup from a multilingual ASR (MLASR) base model to a low-resourced language. On Librispeech, our best model, with multi-level decoding, improved WER by 3.7% and 2.4% on test clean and test other, respectively, relative to the shallow fusion baseline. In transfer learning from an MLASR base model to the IARPA Babel Swahili model, the best scheme improved the transferred model on the eval set by 9.9% in CER and 9.8% in WER relative to the 2-stage transfer baseline.
Comment: 4 pages, 1 figure, 5 tables, submitted to ICASSP 2019
Published: 2019
