
Affect-insensitive speaker recognition systems via emotional speech clustering using prosodic features.

Authors :
Li, Dongdong
Yuan, Yubo
Wu, Zhaohui
Yang, Yingchun
Source :
Neural Computing & Applications; Feb2015, Vol. 26 Issue 2, p473-484, 12p
Publication Year :
2015

Abstract

Voice-based biometric security systems involving only neutral speech have achieved promising performance. However, speakers are very likely to fail recognition when the test data exhibit multiple emotions. This paper addresses the mismatch of emotional states between training and testing speech. We discuss different modeling strategies that incorporate the emotions (affects) of speakers into the training stage of a Mandarin-based speaker recognition system and propose an alternative approach that optimizes the use of the limited affective speech. The training speech is partitioned and clustered by the trends of its prosodic variations, and multiple models are built from the clustered speech for a given speaker. The prosodic differences are characterized by a combination of features that describe changes in the fundamental frequency and energy contours. The experiments were carried out on the Mandarin Affective Speech Corpus. The results show a relative improvement of 73.37 % in recognition rate over traditional speaker verification, and a relative improvement of 63.53 % over structural-training-based systems. [ABSTRACT FROM AUTHOR]
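The core idea of the abstract, clustering a speaker's utterances by the trends of their F0 (pitch) and energy contours so that a separate model can be trained per cluster, can be sketched as follows. This is a minimal illustration, not the paper's method: the exact feature set and clustering algorithm are not given in this record, so the trend features (contour mean and slope) and the deterministic k-means variant below are assumptions for demonstration only.

```python
import numpy as np

def prosodic_trend_features(f0, energy):
    """Summarize the trend of an utterance's prosody.

    f0, energy: 1-D per-frame arrays. Returns [F0 mean, F0 slope,
    energy mean, energy slope]. Illustrative features only; the
    paper's actual feature combination is not specified here.
    """
    t = np.arange(len(f0))
    f0_slope = np.polyfit(t, f0, 1)[0]        # linear trend of pitch
    en_slope = np.polyfit(t, energy, 1)[0]    # linear trend of energy
    return np.array([f0.mean(), f0_slope, energy.mean(), en_slope])

def kmeans(X, k, iters=50):
    """Minimal k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(axis=-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy data: five utterances with rising pitch/energy, five with falling.
rng = np.random.default_rng(1)
rising = [prosodic_trend_features(
              100 + 0.5 * np.arange(100) + rng.normal(0, 1, 100),
              60 + 0.1 * np.arange(100) + rng.normal(0, 0.5, 100))
          for _ in range(5)]
falling = [prosodic_trend_features(
               220 - 0.5 * np.arange(100) + rng.normal(0, 1, 100),
               70 - 0.1 * np.arange(100) + rng.normal(0, 0.5, 100))
           for _ in range(5)]
X = np.vstack(rising + falling)

# Cluster the utterances; each cluster would then train its own model.
labels = kmeans(X, k=2)
```

In a full system, each cluster's utterances would train a separate speaker model (e.g., a GMM per cluster), and a test utterance would be scored against all of a speaker's cluster models.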

Details

Language :
English
ISSN :
09410643
Volume :
26
Issue :
2
Database :
Complementary Index
Journal :
Neural Computing & Applications
Publication Type :
Academic Journal
Accession number :
100630332
Full Text :
https://doi.org/10.1007/s00521-014-1708-8