End-to-End Image-to-Speech Generation for Untranscribed Unknown Languages
- Authors
Johanes Effendi, Sakriani Sakti, and Satoshi Nakamura
- Subjects
Image-to-speech, image captioning, self-supervised speech representation, vector-quantized variational autoencoder, untranscribed unknown language, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Describing orally what we see is a simple task we perform in our daily lives. In the natural language processing field, however, this simple task must be bridged by a textual modality that helps the system generalize over the various objects in an image and the various pronunciations in speech utterances. In this study, we propose an end-to-end Image2Speech system that needs no textual information in its training. We use a vector-quantized variational autoencoder (VQ-VAE) to learn a discrete representation of a speech caption in an unsupervised manner, and these discrete labels are then used by an image-captioning model. This self-supervised speech representation enables the Image2Speech model to be trained with a minimum amount of paired image-speech data while still maintaining the quality of the speech captions. Our experimental results on a multi-speaker natural speech dataset demonstrate that our proposed text-free Image2Speech system performs close to one trained with textual information. Furthermore, our approach also outperforms the most recent phoneme-based and grounding-based Image2Speech frameworks.
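The core mechanism the abstract describes, replacing text with discrete VQ-VAE codes, can be illustrated with a minimal sketch. This is not the authors' implementation; the codebook size, embedding dimension, and frame count below are illustrative assumptions, and the codebook here is random rather than learned:

```python
import numpy as np

# Minimal sketch of VQ-VAE vector quantization: each encoder output frame
# is snapped to its nearest codebook entry, and the resulting discrete
# indices act as pseudo-text units for the image-captioning model.
rng = np.random.default_rng(0)

K, D = 8, 4                         # codebook size and embedding dim (assumed)
codebook = rng.normal(size=(K, D))  # in practice, learned jointly with the encoder
frames = rng.normal(size=(10, D))   # encoder outputs for 10 speech frames

# Squared Euclidean distance ||z - e_k||^2 for every frame/codeword pair.
dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
indices = dists.argmin(axis=1)      # discrete "caption" token IDs, shape (10,)
quantized = codebook[indices]       # embeddings passed on to the decoder
```

During training, the non-differentiable `argmin` is typically bypassed with the straight-through estimator (copying decoder gradients back to the encoder), which is what lets the representation be learned without any transcriptions.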
- Published
2021