1. Phoneme Recognition in Korean Singing Voices Using Self-Supervised English Speech Representations.
- Author
- Wu, Wenqin and Lee, Joonwhoan
- Subjects
- HUMAN voice, SPEECH, KOREAN language, ERROR rates, ENGLISH language, SPEECH perception, AUTOMATIC speech recognition
- Abstract
Large labeled datasets for deep learning-based phoneme recognition in singing voices are difficult to obtain. Singing voices also pose inherent challenges compared to speech because of their distinct variations in pitch, duration, and intensity. This paper proposes a detour method to work around this data scarcity and applies it to the recognition of Korean phonemes in singing voices. The method started by pre-training HuBERT, a self-supervised speech representation model, on a large-scale English corpus. The model was then adapted to the Korean speech domain with a relatively small Korean corpus, in which the Korean phonemes were interpreted as similar English ones. Finally, the speech-adapted model was trained again on a very small Korean singing voice corpus for speech–singing adaptation. This final adaptation used melodic supervision, which exploits pitch information to improve performance. Performance was evaluated with multi-level error rates based on Word Error Rate (WER). HuBERT-based transfer learning for adaptation improved the phoneme-level error rate of Korean speech by as much as 31.19%. On singing voices, melodic supervision improved the rate by 0.55%. The significant improvement in speech recognition underscores the considerable potential of a model equipped with general human voice representations captured from the English corpus, which can improve phoneme recognition with less target speech data. Moreover, the musical variation in singing voices is beneficial for phoneme recognition in singing voices. The proposed method could be applied to phoneme recognition in other languages that have fewer speech and singing voice corpora. [ABSTRACT FROM AUTHOR]
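The abstract reports error rates "based on Word Error Rate (WER)" but does not spell out the computation. As a minimal sketch, assuming the phoneme-level rate follows the standard WER formula (substitutions plus deletions plus insertions, divided by the reference length) applied to phoneme tokens instead of words, it could be computed as below. The function names `edit_distance` and `error_rate` and the romanized toy phoneme labels are illustrative assumptions, not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: ref token missing from hyp
                curr[j - 1] + 1,     # insertion: extra token in hyp
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """WER-style rate: edit distance divided by the reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Toy example with hypothetical romanized phoneme labels.
reference = ["s", "a", "r", "a", "ng"]
hypothesis = ["s", "a", "l", "a", "ng"]  # one substituted phoneme
print(error_rate(reference, hypothesis))
```

The same function applies at any level the tokens are defined on (phonemes, syllables, or words), which is one plausible reading of "multi-level" here. Note that a rate defined this way can exceed 1.0 when the hypothesis contains many insertions.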
- Published
- 2024