Corpus design for expressive speech: impact of the utterance length
- Publication Year :
- 2020
- Publisher :
- HAL CCSD, 2020.
Abstract
- International audience; The voice corpus plays a crucial role in the quality of synthetic speech generation, especially under a length constraint. Creating a new voice is costly, and selecting the recording script for an expressive TTS task is generally treated as an optimization problem whose goal is a rich yet parsimonious corpus. To vocalize a given book using a TTS system, we investigate four script selection approaches. Based on preliminary observations, we propose simply selecting the shortest utterances of the book, and we compare this method with state-of-the-art ones on two books with different utterance lengths and styles, using two kinds of concatenation-based TTS systems. The study of the TTS costs indicates that selecting the shortest utterances could result in better synthetic quality, which is confirmed by a perceptual test. An investigation of the usual corpus design criteria from the literature, such as unit coverage or distribution similarity of units, shows that they are not pertinent metrics in the framework of this study.
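As a rough illustration of the shortest-utterance selection described in the abstract, the sketch below sorts utterances by length and keeps them until a length budget is reached. The function name `select_shortest`, the use of character count as the length measure, and the `budget` parameter are assumptions made for illustration; this record does not say how the authors measure length or define the constraint.

```python
def select_shortest(utterances, budget):
    """Greedily keep the shortest utterances whose total length fits the budget.

    Length is measured in characters here, which is an assumption;
    phonemes or audio duration would be plausible alternatives.
    """
    selected = []
    total = 0
    # sorted() is stable, so equally long utterances keep their book order.
    for utt in sorted(utterances, key=len):
        if total + len(utt) > budget:
            break  # every remaining utterance is at least as long
        selected.append(utt)
        total += len(utt)
    return selected


# Hypothetical toy "book" of four utterances.
book = ["Yes.", "He left the house at dawn.", "Why?", "Nobody answered."]
print(select_shortest(book, budget=25))
```

Stopping at the first utterance that does not fit is safe because the list is sorted: anything after it is at least as long and would not fit either.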
- Subjects :
- Optimization problem
Computer science
media_common.quotation_subject
Concatenation
Speech synthesis
02 engineering and technology
computer.software_genre
Task (project management)
0202 electrical engineering, electronic engineering, information engineering
Selection (linguistics)
Quality (business)
[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC]
media_common
business.industry
020206 networking & telecommunications
voice corpus design
utterance length
Constraint (information theory)
ComputingMethodologies_PATTERNRECOGNITION
020201 artificial intelligence & image processing
Artificial intelligence
business
Text to speech
computer
Utterance
Natural language processing
Details
- Language :
- English
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....a75ec0ee9ea87787f9846292c990ff44