Automatic pause marking for speech synthesis

Authors :
S. R. M. Prasanna
Nagaraj Adiga
Bidisha Sharma
Sanasam Ranbir Singh
Loitongbam Gyanendro Singh
Source :
TENCON 2017 - 2017 IEEE Region 10 Conference.
Publication Year :
2017
Publisher :
IEEE, 2017.

Abstract

Accurate detection of pause boundaries plays a major role in segmenting a speech corpus and in improving the quality of speech synthesis. Pause modelling requires pause tags in the training sentences. Manual pause tagging is accurate, but it is time-consuming and pauses can be missed through human error. In this work, an automatic approach for marking pauses in the training corpus is proposed. During the training phase, an explicit pause (PAU) tag is added after every word to represent a possible pause. Models for all phones, including PAU, are then trained and re-alignment is performed. During re-alignment, each PAU boundary is corrected using three speech-specific features, namely modulation spectrum energy, spectral peak energy, and strength of excitation. The proposed approach gives better results than manual pause marking and takes less time. It also improves the overall segmentation accuracy. The tagged label files are used to develop a text-to-speech synthesis system using a Hidden Markov Model based speech synthesis framework. Subjective evaluation is performed for the various approaches used in tagging pauses. The experimental evaluation shows that accurate pause marking is an important factor in improving the quality of synthesized speech in terms of naturalness and intelligibility.
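
The first step described in the abstract, adding a PAU tag after every word before the phone models are trained, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `insert_pause_tags`, the toy lexicon, and the phone symbols are assumptions made for illustration. Only the tag name PAU comes from the abstract. The later steps (model training, re-alignment, and boundary correction with the three features) are not shown.

```python
from typing import Dict, List

PAU = "PAU"  # pause tag name as used in the abstract


def insert_pause_tags(words: List[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Build a phone sequence with a PAU tag appended after every word.

    `lexicon` is a hypothetical word-to-phones mapping; a real system
    would take it from its own pronunciation dictionary.
    """
    phones: List[str] = []
    for word in words:
        phones.extend(lexicon[word])  # phones of the current word
        phones.append(PAU)            # candidate pause after the word
    return phones


# Toy lexicon with made-up phone symbols, for illustration only.
toy_lexicon = {
    "hello": ["hh", "ah", "l", "ow"],
    "world": ["w", "er", "l", "d"],
}

tagged = insert_pause_tags(["hello", "world"], toy_lexicon)
print(" ".join(tagged))
```

Every word boundary is treated here as a candidate pause. According to the abstract, it is the subsequent re-alignment stage that corrects each PAU boundary.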

Details

Database :
OpenAIRE
Journal :
TENCON 2017 - 2017 IEEE Region 10 Conference
Accession number :
edsair.doi...........26252eff14f99e472450f43dfc5780ec
Full Text :
https://doi.org/10.1109/tencon.2017.8228148