Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training.

Authors :
Ahmad, Hawraz A.
Rashid, Tarik A.
Source :
Algorithms. Jul 2024, Vol. 17 Issue 7, p292. 19p.
Publication Year :
2024

Abstract

Recent advancements in text-to-speech (TTS) models have aimed to streamline the two-stage process into a single-stage training approach. However, many single-stage models still lag behind in audio quality, particularly when handling Kurdish text and speech. There is a critical need to enhance text-to-speech conversion for the Kurdish language, particularly for the Sorani dialect, which has been relatively neglected and is underrepresented in recent text-to-speech advancements. This study introduces an end-to-end TTS model for efficiently generating high-quality Kurdish audio. The proposed method leverages a variational autoencoder (VAE) that is pre-trained for audio waveform reconstruction and is augmented by adversarial training. This involves aligning the prior distribution established by the pre-trained encoder with the posterior distribution of the text encoder within latent variables. Additionally, a stochastic duration predictor is incorporated to imbue synthesized Kurdish speech with diverse rhythms. By aligning latent distributions and integrating the stochastic duration predictor, the proposed method facilitates the real-time generation of natural Kurdish speech audio, offering flexibility in pitch and rhythm. Empirical evaluation via the mean opinion score (MOS) on a custom dataset confirms the superior performance of our approach (MOS of 3.94) compared with that of a one-stage system and other two-stage systems, as assessed through a subjective human evaluation. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1999-4893
Volume :
17
Issue :
7
Database :
Academic Search Index
Journal :
Algorithms
Publication Type :
Academic Journal
Accession number :
178696588
Full Text :
https://doi.org/10.3390/a17070292