Current advances and algorithmic solutions in speech generation.

Authors :
Oralbekova, Dina
Mamyrbayev, Orken
Kassymova, Dinara
Othman, Mohamed
Source :
Vibroengineering Procedia. Apr 2024, Vol. 54, p. 160-166. 7 p.
Publication Year :
2024

Abstract

Text-to-Speech (TTS) technology, which aims to reproduce a natural human voice from text, is in increasing demand in natural language processing. Key criteria for evaluating the quality of synthesized speech include its clarity and naturalness, which largely depend on how accurately the acoustic model of the speech generation system models intonation. This paper presents fundamental methods for building the acoustic model: concatenative and parametric speech synthesis, speech synthesis based on hidden Markov models, and deep learning approaches such as end-to-end models. It also discusses metrics for evaluating the quality of synthesized voice and gives brief overviews of modern text-to-speech architectures, such as WaveNet, Tacotron, and Deep Voice, which apply deep learning and achieve quality ratings close to those of professionally recorded speech. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
2345-0533
Volume :
54
Database :
Academic Search Index
Journal :
Vibroengineering Procedia
Publication Type :
Academic Journal
Accession number :
176731893
Full Text :
https://doi.org/10.21595/vp.2024.23940