Sequence-to-Sequence Emotional Voice Conversion With Strength Control

Authors :
Heejin Choi
Minsoo Hahn
Source :
IEEE Access, Vol 9, Pp 42674-42687 (2021)
Publication Year :
2021
Publisher :
IEEE, 2021.

Abstract

This paper proposes an improved emotional voice conversion (EVC) method with controllable emotional strength and duration. EVC methods without duration mapping generate emotional speech whose duration is identical to that of the neutral input speech. In reality, even the same sentence is spoken with a different speed and rhythm depending on the emotion. To address this, the proposed method adopts a sequence-to-sequence network with an attention module, which enables the network to learn which part of the neutral input sequence should be attended to for each part of the emotional output sequence. In addition, to capture the multi-attribute nature of emotional variation, an emotion encoder is designed to transform acoustic features into emotion embedding vectors. By aggregating the emotion embedding vectors for each emotion, a representative vector for the target emotion is obtained and weighted to reflect emotion strength. By introducing a speaker encoder, the proposed method preserves speaker identity even after emotion conversion. Objective and subjective evaluation results confirm that the proposed method outperforms previous works. In particular, emotion strength control is achieved successfully.
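
The strength-control idea in the abstract (aggregate the emotion embeddings for one emotion into a representative vector, then weight it) can be illustrated with a minimal sketch. The abstract does not state the aggregation or weighting functions, so the element-wise mean and the scalar multiplication below are assumptions, as are the function names `representative_vector` and `apply_strength` and the toy 3-dimensional embeddings.

```python
from statistics import fmean


def representative_vector(embeddings):
    """Aggregate per-utterance emotion embeddings into one vector.

    Assumption: aggregation is an element-wise mean; the abstract
    only says the embeddings are "aggregated".
    """
    if not embeddings:
        raise ValueError("need at least one embedding")
    # zip(*embeddings) groups the values of each dimension together.
    return [fmean(dim) for dim in zip(*embeddings)]


def apply_strength(vector, strength):
    """Weight the representative vector by a scalar strength.

    Assumption: weighting is plain scalar multiplication.
    """
    return [strength * value for value in vector]


# Toy embeddings for a single target emotion (hypothetical values).
happy_embeddings = [[1.0, 0.0, 2.0], [3.0, 2.0, 2.0]]
happy_vector = representative_vector(happy_embeddings)
half_strength = apply_strength(happy_vector, 0.5)
```

In this sketch, a strength of 1.0 leaves the representative vector unchanged and smaller values scale it toward zero; how the paper maps strength values to perceived emotion intensity is not described in the abstract.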

Details

Language :
English
ISSN :
21693536
Volume :
9
Database :
Directory of Open Access Journals
Journal :
IEEE Access
Publication Type :
Academic Journal
Accession number :
edsdoj.1eb9aa8ea3ee4c1ca29f2f2597ef2517
Document Type :
article
Full Text :
https://doi.org/10.1109/ACCESS.2021.3065460