Statistical model training technique based on speaker clustering approach for HMM-based speech synthesis.

Authors :
Ijima, Yusuke
Miyazaki, Noboru
Mizuno, Hideyuki
Sakauchi, Sumitaka
Source :
Speech Communication. Jul 2015, Vol. 71, p50-61. 12p.
Publication Year :
2015

Abstract

This paper proposes an average voice model training technique based on a speaker clustering approach to generate synthetic speech with enhanced similarity to the target speakers’ speech. A novel point of the proposed technique is the use of the speaker characteristics (called “speaker class”), which are obtained from unsupervised clustering, as the additional contextual factor for the average voice based speech synthesis. In the model training process, first, speaker clustering is performed for all speakers used for model training to obtain the speaker class for each speaker. The average voice model with multiple speaker characteristics is trained by using the obtained speaker class. For the speaker adaptation and speech parameter generation, the speaker class of the target speaker is estimated on the basis of the Euclidean distance between the centroids of each cluster and the target speaker’s feature. The use of the estimated speaker class makes it possible to utilize the model parameters that have speaker characteristics similar to those of the target speaker for speaker adaptation and speech parameter generation. The results of objective and subjective experiments indicated the proposed technique can synthesize speech with improved similarity and naturalness. [ABSTRACT FROM AUTHOR]
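The speaker-class estimation step described in the abstract, which picks the cluster whose centroid is closest in Euclidean distance to the target speaker's feature, can be illustrated with a minimal sketch. The function name `estimate_speaker_class`, the two-dimensional feature vectors, and the centroid values are hypothetical and chosen only for illustration; the abstract does not specify the feature representation or the clustering algorithm, so this is not the authors' implementation.

```python
import math


def estimate_speaker_class(target_feature, centroids):
    """Return the index of the centroid nearest to target_feature.

    Nearness is measured by Euclidean distance, as the abstract describes.
    Both arguments are assumed to be plain sequences of floats with the
    same dimensionality (an assumption made for this sketch).
    """
    if not centroids:
        raise ValueError("centroids must not be empty")
    # Distance from the target speaker's feature to every cluster centroid.
    distances = [math.dist(target_feature, c) for c in centroids]
    # The estimated speaker class is the index of the smallest distance.
    return min(range(len(centroids)), key=distances.__getitem__)


# Hypothetical centroids, as if obtained from unsupervised speaker clustering.
centroids = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]

# Hypothetical feature vector for the target speaker.
target = (4.0, 6.0)

speaker_class = estimate_speaker_class(target, centroids)
print(speaker_class)
```

In the setting the abstract describes, the resulting class index would then serve as the additional contextual factor when selecting model parameters for speaker adaptation and speech parameter generation.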

Details

Language :
English
ISSN :
01676393
Volume :
71
Database :
Academic Search Index
Journal :
Speech Communication
Publication Type :
Academic Journal
Accession number :
102696142
Full Text :
https://doi.org/10.1016/j.specom.2015.04.003