Multi-resolution time frequency feature and complementary combination for short utterance speaker recognition.

Authors :
Li, Zhi-Yi
Zhang, Wei-Qiang
Liu, Jia
Source :
Multimedia Tools & Applications; Feb2015, Vol. 74 Issue 3, p937-953, 17p
Publication Year :
2015

Abstract

A human speaker recognition expert often examines the speech spectrogram at multiple scales, especially under short-utterance conditions. Inspired by this practice, this paper proposes a novel multi-resolution time-frequency (MRTF) feature extraction method, which performs a 2-dimensional discrete cosine transform (DCT) at multiple scales on the time-frequency spectrogram matrix and then selects and combines the multi-scale transformed elements into the final feature. Compared with traditional Mel-frequency cepstral coefficient (MFCC) feature extraction, the proposed method makes better use of multi-resolution time-frequency information. We also propose three strategies for the complementary combination of MFCC and MRTF: at the feature level, at the i-vector level, and at the score level. Comparing their performance, we find that the best results are obtained by combination at the i-vector level. On the three NIST 2008 Speaker Recognition Evaluation datasets, the proposed method improves performance more for short utterances than for long utterances. After combination, we achieve an EER of 11.32% and a MinDCF of 0.054 in the 10sec-10sec trials on the male dataset, an absolute 3% improvement in EER over the best reported result in this field. [ABSTRACT FROM AUTHOR]
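
The multi-scale 2-D DCT idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the window lengths, the coefficient selection rule, or the framing, so the names `mrtf_sketch`, `scales`, `keep`, and `hop`, the choice of low-order coefficients, and the edge padding are all assumptions made for illustration only.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] /= np.sqrt(2.0)
    return m

def dct2(block):
    """2-D DCT-II of a (freq, time) block, applied along both axes."""
    n_freq, n_time = block.shape
    return dct_matrix(n_freq) @ block @ dct_matrix(n_time).T

def mrtf_sketch(spec, scales=(8, 16, 32), keep=4, hop=4):
    """Hypothetical multi-resolution feature from a (n_freq, n_frames) spectrogram.

    Assumptions (not taken from the paper): each scale is a time window
    length in frames, windows share a common center, and the low-order
    keep x keep DCT coefficients are the ones selected and concatenated.
    """
    n_freq, n_frames = spec.shape
    half = max(scales) // 2
    # Repeat edge frames so every window fits around every center frame.
    padded = np.pad(spec, ((0, 0), (half, half)), mode="edge")
    feats = []
    for center in range(0, n_frames, hop):
        parts = []
        for width in scales:
            start = center + half - width // 2
            block = padded[:, start:start + width]
            parts.append(dct2(block)[:keep, :keep].ravel())
        feats.append(np.concatenate(parts))
    return np.array(feats)

# Usage on a random stand-in for a log-spectrogram (20 bands, 40 frames).
spec = np.random.default_rng(0).standard_normal((20, 40))
feats = mrtf_sketch(spec)
```

Each row of `feats` concatenates the selected coefficients from every scale for one center frame, so short windows contribute fine temporal detail and long windows contribute coarser context. How the published method actually selects and combines elements may differ from this sketch.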

Details

Language :
English
ISSN :
13807501
Volume :
74
Issue :
3
Database :
Complementary Index
Journal :
Multimedia Tools & Applications
Publication Type :
Academic Journal
Accession number :
100953258
Full Text :
https://doi.org/10.1007/s11042-013-1705-4