A real-time French text-to-speech system generating high-quality synthetic speech

Authors :
Christel Sorin
F. Marty
L. Le Faucheur
D. Larreur
F. Emerard
Eric Moulines
J.L. Le Saint Milon
Francis Charpentier
Source :
ICASSP
Publication Year :
2002
Publisher :
IEEE, 2002.

Abstract

The main features of the CNET diphone-based text-to-speech system for the French language are described. The linguistic analysis works in three steps. First, a morphosyntactic analysis module assigns a grammatical value to each word in the text and transcribes it phonetically. A second module parses the text into hierarchical syntactico-prosodic groups. Finally, prosodic patterns are automatically assigned to each word by queries to a database of prosodic events. The phonetic and prosodic information serves as commands to the synthesis component. The synthesis component is based on diphone concatenation. A time-domain formulation of the pitch-synchronous overlap-add scheme (TD-PSOLA) is used to modify the speech prosody and to concatenate diphone waveforms. It is combined with a low bit-rate speech decoder to reduce the memory requirement for storing the diphone inventory. The system runs in real time on a PC equipped with a TMS320C25 DSP board and provides notably improved sound quality and naturalness in comparison to commercially available systems.
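
To illustrate the pitch-synchronous overlap-add idea named in the abstract, the following is a minimal sketch of time-domain pitch modification. It is not the CNET implementation. It assumes a constant pitch period with analysis pitch marks at every multiple of that period, whereas a real system would estimate the marks from the speech signal. The names `td_psola`, `hann`, `period`, and `factor` are hypothetical and chosen only for this example. Diphone concatenation and the low bit-rate decoder are not covered.

```python
import math


def hann(n):
    # Symmetric Hann window of length n (n >= 2).
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def td_psola(signal, period, factor):
    """Change pitch by `factor` (> 1 raises it) while keeping the duration.

    Assumption: the pitch period is constant and known, so the analysis
    pitch marks sit at multiples of `period`.
    """
    new_period = max(1, round(period / factor))
    window = hann(2 * period + 1)  # two-period window centred on a mark
    out = [0.0] * len(signal)
    norm = [0.0] * len(signal)  # accumulated window weight per sample
    marks = list(range(period, len(signal) - period, period))

    t = period  # current synthesis pitch mark
    while t < len(signal) - period:
        # Pick the analysis mark closest to this synthesis mark.
        m = min(marks, key=lambda a: abs(a - t))
        # Overlap-add the windowed two-period segment at the new position.
        for j in range(-period, period + 1):
            w = window[j + period]
            out[t + j] += w * signal[m + j]
            norm[t + j] += w
        t += new_period

    # Divide by the summed window weight so overlapping segments do not
    # change the overall amplitude.
    return [o / n if n > 1e-8 else 0.0 for o, n in zip(out, norm)]


# Usage: an impulse train with one pulse every 100 samples, pitch doubled.
pulses = [1.0 if i % 100 == 0 else 0.0 for i in range(1000)]
shifted = td_psola(pulses, period=100, factor=2.0)
peaks = [i for i, v in enumerate(shifted) if v > 0.1]
```

In this sketch, segments are reused when the pitch is raised and skipped when it is lowered, which is what keeps the duration unchanged. Samples near the start and end that no window covers are left at zero.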

Details

Database :
OpenAIRE
Journal :
International Conference on Acoustics, Speech, and Signal Processing
Accession number :
edsair.doi...........8abdfb63df675013846ce34d8f039b62
Full Text :
https://doi.org/10.1109/icassp.1990.115650