
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training.

Authors :: Ahmad, Hawraz A.
Rashid, Tarik A.
Source :: Algorithms. Jul2024, Vol. 17 Issue 7, p292. 19p.
Publication Year :: 2024
Abstract: Recent advancements in text-to-speech (TTS) models have aimed to streamline the two-stage process into a single-stage training approach. However, many single-stage models still lag behind in audio quality, particularly when handling Kurdish text and speech. There is a critical need to enhance text-to-speech conversion for the Kurdish language, particularly for the Sorani dialect, which has been relatively neglected and is underrepresented in recent text-to-speech advancements. This study introduces an end-to-end TTS model for efficiently generating high-quality Kurdish audio. The proposed method leverages a variational autoencoder (VAE) that is pre-trained for audio waveform reconstruction and is augmented by adversarial training. This involves aligning the prior distribution established by the pre-trained encoder with the posterior distribution of the text encoder within latent variables. Additionally, a stochastic duration predictor is incorporated to imbue synthesized Kurdish speech with diverse rhythms. By aligning latent distributions and integrating the stochastic duration predictor, the proposed method facilitates the real-time generation of natural Kurdish speech audio, offering flexibility in pitches and rhythms. Empirical evaluation via the mean opinion score (MOS) on a custom dataset confirms the superior performance of our approach (MOS of 3.94) compared with that of a one-stage system and other two-staged systems as assessed through a subjective human evaluation. [ABSTRACT FROM AUTHOR]

Subjects :: *DEEP learning
*TRANSFORMER models
*LATENT variables
*BASIC needs
*RHYTHM
*SPEECH synthesis

Details

Language :: English
ISSN :: 19994893
Volume :: 17
Issue :: 7
Database :: Academic Search Index
Journal :: Algorithms
Publication Type :: Academic Journal
Accession number :: 178696588
Full Text :: https://doi.org/10.3390/a17070292


