An investigation into the adaptability of a diffusion-based TTS model
- Publication Year :
- 2023
Abstract
- Given the recent success of diffusion in producing natural-sounding synthetic speech, we investigate how diffusion can be used in speaker adaptive TTS. Taking cues from more traditional adaptation approaches, we show that adaptation can be included in a diffusion pipeline using conditional layer normalization with a step embedding. However, we show experimentally that, whilst the approach has merit, such adaptation alone cannot approach the performance of Transformer-based techniques. In a second experiment, we show that diffusion can be optimally combined with Transformer, with the latter taking the bulk of the adaptation load and the former contributing to improved naturalness.
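
As an illustration of the mechanism named in the abstract, the following is a minimal NumPy sketch of conditional layer normalization whose scale and bias are predicted from a speaker embedding together with a diffusion step embedding. It is not the authors' implementation: the class and function names, the dimensions, the sinusoidal form of the step embedding, and the choice to concatenate speaker and step embeddings are all assumptions made for this example.

```python
import numpy as np

def sinusoidal_step_embedding(t, dim):
    """Sinusoidal embedding of an integer diffusion step t (dim assumed even).
    The sinusoidal form is an assumption, not taken from the paper."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

class ConditionalLayerNorm:
    """Layer norm whose per-channel scale and bias come from a condition vector."""

    def __init__(self, hidden_dim, cond_dim, seed=0, eps=1e-5):
        rng = np.random.default_rng(seed)
        self.eps = eps
        # Hypothetical linear projections from the condition to scale and bias.
        # In adaptation, only parameters like these would typically be tuned.
        self.w_scale = rng.normal(0.0, 0.02, (cond_dim, hidden_dim))
        self.w_bias = rng.normal(0.0, 0.02, (cond_dim, hidden_dim))

    def __call__(self, x, cond):
        # x: (frames, hidden_dim), cond: (cond_dim,)
        mean = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        x_norm = (x - mean) / np.sqrt(var + self.eps)
        scale = 1.0 + cond @ self.w_scale  # starts close to identity scaling
        bias = cond @ self.w_bias
        return x_norm * scale + bias

def demo():
    hidden_dim, spk_dim, step_dim = 8, 4, 6
    rng = np.random.default_rng(1)
    x = rng.normal(size=(5, hidden_dim))   # stand-in for hidden features
    speaker = rng.normal(size=spk_dim)     # stand-in for a speaker embedding
    # Assumption: speaker and step embeddings are simply concatenated.
    cond = np.concatenate([speaker, sinusoidal_step_embedding(10, step_dim)])
    cln = ConditionalLayerNorm(hidden_dim, spk_dim + step_dim)
    return cln(x, cond)
```

Calling `demo()` returns an array with the same shape as the input features. Because the condition includes the step embedding, the same speaker yields different scale and bias values at different diffusion steps.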
Details
- Database :
- arXiv
- Publication Type :
- Report
- Accession number :
- edsarx.2303.01849
- Document Type :
- Working Paper