Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control

Authors :
Watanabe, Aya
Takamichi, Shinnosuke
Saito, Yuki
Nakata, Wataru
Xin, Detai
Saruwatari, Hiroshi
Publication Year :
2023

Abstract

In text-to-speech, controlling voice characteristics is important in achieving various-purpose speech synthesis. Considering the success of text-conditioned generation, such as text-to-image, free-form text instruction should be useful for intuitive and complicated control of voice characteristics. A sufficiently large corpus of high-quality and diverse voice samples with corresponding free-form descriptions can advance such control research. However, neither an open corpus nor a scalable method is currently available. To this end, we develop Coco-Nut, a new corpus including diverse Japanese utterances, along with text transcriptions and free-form voice characteristics descriptions. Our methodology to construct this corpus consists of 1) automatic collection of voice-related audio data from the Internet, 2) quality assurance, and 3) manual annotation using crowdsourcing. Additionally, we benchmark our corpus on the prompt embedding model trained by contrastive speech-text learning.

Comment: Submitted to ASRU2023

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2309.13509
Document Type :
Working Paper