Footprint reduction of Concatenative Text-To-Speech synthesizers using polynomial temporal decomposition

Authors :: David Malah
Tamar Shoham
Slava Shechtman
Source :: 2010 4th International Symposium on Communications, Control and Signal Processing (ISCCSP).
Publication Year :: 2010
Publisher :: IEEE, 2010.
Abstract: High-quality, low-footprint Concatenative Text-To-Speech (CTTS) synthesizers pose a persistent challenge in the field of speech processing. The spectral parameters representing the short speech segments used in the concatenation process account for a large portion of the required memory. In this paper we propose to use a vectorial form of Polynomial Temporal Decomposition, combined with jointly optimal segmentation and polynomial order selection, to reduce the storage required for the spectral amplitude parameters by 50% while preserving the perceptual quality of the synthesized speech.

Subjects :: Polynomial
Computer science
Speech recognition
Concatenation
Speech synthesis
Pattern recognition
Speech processing
Wavelet packet decomposition
Reduction (complexity)
Mel-frequency cepstrum
Artificial intelligence
Hidden Markov model

Database :: OpenAIRE
Journal :: 2010 4th International Symposium on Communications, Control and Signal Processing (ISCCSP)
Accession number :: edsair.doi...........58f244f63884239acfee907b2e4141f9
Full Text :: https://doi.org/10.1109/isccsp.2010.5463316
