Speaker-adaptive-trainable Boltzmann machine and its application to non-parallel voice conversion

Authors :
Toru Nakashika
Yasuhiro Minami
Source :
EURASIP Journal on Audio, Speech, and Music Processing, Vol 2017, Iss 1, Pp 1-10 (2017)
Publication Year :
2017
Publisher :
SpringerOpen, 2017.

Abstract

In this paper, we present a voice conversion (VC) method that does not use any parallel data while training the model. Voice conversion is a technique in which only the speaker-specific information in the source speech is converted while the phonological information is kept unchanged. Most existing VC methods rely on parallel data, that is, pairs of speech data from the source and target speakers uttering the same sentences. However, the use of parallel data in training causes several problems: (1) the training data is limited to the pre-defined sentences, (2) the trained model can only be applied to the speaker pair used in training, and (3) alignment mismatches may occur. Although it is generally preferable in VC not to use parallel data, non-parallel training is considered difficult. In our approach, we realize non-parallel training based on speaker-adaptive training (SAT). Speech signals are represented using a probabilistic model based on the Boltzmann machine that explicitly defines phonological information and speaker-related information. Speaker-independent (SI) and speaker-dependent (SD) parameters are trained simultaneously using SAT. In the conversion stage, a given speech signal is decomposed into phonological and speaker-related information, the speaker-related information is replaced with that of the desired speaker, and the voice-converted speech is obtained by combining the two. Our experimental results showed that our approach outperformed the conventional non-parallel approach in terms of both objective and subjective criteria.
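The conversion stage summarized above (decompose a frame using the source speaker's parameters, swap in the target speaker's parameters, recombine) can be illustrated with a toy model. This is a hypothetical sketch, not the authors' model: the class name `ToySpeakerAdaptiveRBM`, the Gaussian-Bernoulli RBM form with unit-variance visible units, and the adaptation rule W_s = A_s W, b_s = A_s b + d_s are assumptions chosen only for illustration. SAT training is omitted entirely; the parameters are set by hand.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: M[i][j]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product m @ v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class ToySpeakerAdaptiveRBM:
    """Hypothetical Gaussian-Bernoulli RBM with per-speaker linear adaptation.

    Speaker-independent (SI) parameters: W (D x H), b (D), c (H).
    Speaker-dependent (SD) parameters for speaker s: A_s (D x D), d_s (D).
    ASSUMED adaptation rule (illustrative only): W_s = A_s W, b_s = A_s b + d_s.
    """

    def __init__(self, W: Matrix, b: Vector, c: Vector):
        self.W, self.b, self.c = W, b, c
        self.speakers: Dict[str, Tuple[Matrix, Vector]] = {}

    def add_speaker(self, name: str, A: Matrix, d: Vector) -> None:
        self.speakers[name] = (A, d)

    def _adapted(self, name: str) -> Tuple[Matrix, Vector]:
        A, d = self.speakers[name]
        W_s = matmul(A, self.W)
        b_s = [ab + di for ab, di in zip(matvec(A, self.b), d)]
        return W_s, b_s

    def encode(self, x: Vector, speaker: str) -> Vector:
        """E[h | x, s]: a speaker-normalized latent code for one frame."""
        W_s, _ = self._adapted(speaker)
        act = matvec(transpose(W_s), x)
        return [sigmoid(cj + aj) for cj, aj in zip(self.c, act)]

    def decode(self, h: Vector, speaker: str) -> Vector:
        """E[x | h, s] = b_s + W_s h for unit-variance Gaussian visibles."""
        W_s, b_s = self._adapted(speaker)
        return [bi + wi for bi, wi in zip(b_s, matvec(W_s, h))]

    def convert(self, x: Vector, source: str, target: str) -> Vector:
        # Decompose with the source speaker's SD parameters,
        # then recombine the latent code with the target speaker's.
        return self.decode(self.encode(x, source), target)


if __name__ == "__main__":
    I2 = [[1.0, 0.0], [0.0, 1.0]]
    model = ToySpeakerAdaptiveRBM(
        W=[[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]], b=[0.0, 0.0], c=[0.0, 0.0, 0.0]
    )
    model.add_speaker("src", I2, [0.0, 0.0])
    model.add_speaker("tgt", I2, [1.0, -1.0])
    print(model.convert([0.2, -0.1], "src", "tgt"))
```

The design keeps the latent code h shared across speakers while all speaker identity lives in (A_s, d_s), which mirrors the SI/SD split described in the abstract; in this toy setup, changing only the target's offset d_s shifts the converted frame by exactly that offset.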

Details

Language :
English
ISSN :
1687-4722
Volume :
2017
Issue :
1
Database :
Directory of Open Access Journals
Journal :
EURASIP Journal on Audio, Speech, and Music Processing
Publication Type :
Academic Journal
Accession number :
edsdoj.0d5fd99fdb6c473faa1308fe95508165
Document Type :
article
Full Text :
https://doi.org/10.1186/s13636-017-0112-6