A real-time French text-to-speech system generating high-quality synthetic speech

Authors :: Christel Sorin
F. Marty
L. Le Faucheur
D. Larreur
F. Emerard
Eric Moulines
J.L. Le Saint Milon
Francis Charpentier
Source :: ICASSP
Publication Year :: 2002
Publisher :: IEEE, 2002.
Abstract :: The main features of the CNET diphone-based text-to-speech system for the French language are described. The linguistic analysis works in three steps. First, a morphosyntactic analysis module assigns a grammatical value to each word in the text and transcribes it phonetically. A second module parses the text into hierarchical syntactico-prosodic groups. Finally, prosodic patterns are automatically assigned to each word by queries to a database of prosodic events. The phonetic and prosodic information serves as commands to the synthesis component. The synthesis component is based on diphone concatenation. A time-domain formulation of the pitch-synchronous overlap-add scheme (TD-PSOLA) is used to modify the speech prosody and to concatenate diphone waveforms. It is combined with a low bit-rate speech decoder to reduce the memory requirement for storing the diphone inventory. The system runs in real time on a PC equipped with a TMS320C25 DSP board and provides notably improved sound quality and naturalness in comparison to commercially available systems.
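The pitch-synchronous overlap-add idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the CNET implementation: it assumes the analysis pitch marks are already known, picks the nearest analysis grain for each synthesis mark (a simplification of the usual mark mapping), and ignores the diphone inventory and the low bit-rate decoder. The function names `hann` and `td_psola` and the parameter `pitch_factor` are hypothetical, chosen only for illustration.

```python
import math

def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def td_psola(signal, marks, pitch_factor):
    """Scale pitch by pitch_factor, keeping duration roughly constant.

    signal: list of float samples
    marks:  sorted integer pitch-mark positions (assumed given, at least two)
    """
    out = [0.0] * len(signal)
    norm = [0.0] * len(signal)
    t = float(marks[0])  # position of the current synthesis mark
    while t <= marks[-1]:
        # The nearest analysis mark supplies the grain for this synthesis mark.
        k = min(range(len(marks)), key=lambda i: abs(marks[i] - t))
        # Local pitch period: distance to a neighbouring analysis mark.
        if k + 1 < len(marks):
            period = marks[k + 1] - marks[k]
        else:
            period = marks[k] - marks[k - 1]
        win = hann(2 * period + 1)  # two-period window centred on the mark
        centre = int(round(t))
        for j, w in enumerate(win):
            src = marks[k] - period + j
            dst = centre - period + j
            if 0 <= src < len(signal) and 0 <= dst < len(out):
                out[dst] += w * signal[src]
                norm[dst] += w
        t += period / pitch_factor  # closer synthesis marks raise the pitch
    # Divide by the summed window weight to undo the overlap gain.
    return [o / n if n > 1e-8 else 0.0 for o, n in zip(out, norm)]
```

With `pitch_factor = 1.0` the synthesis marks coincide with the analysis marks and the input is reproduced between the first and last mark; values above 1 place the grains closer together, which raises the perceived pitch without shortening the signal.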

Subjects :: business.industry
Computer science
Speech recognition
Concatenation
Speech corpus
Speech synthesis
Diphone
computer.software_genre
Speech processing
ComputingMethodologies_PATTERNRECOGNITION
MBROLA
Artificial intelligence
Sound quality
business
Prosody
computer
Natural language processing
Word (computer architecture)

Database :: OpenAIRE
Journal :: International Conference on Acoustics, Speech, and Signal Processing
Accession number :: edsair.doi...........8abdfb63df675013846ce34d8f039b62
Full Text :: https://doi.org/10.1109/icassp.1990.115650
