Back to Search
Start Over
Multi-resolution time frequency feature and complementary combination for short utterance speaker recognition.
- Source :
- Multimedia Tools & Applications; Feb2015, Vol. 74 Issue 3, p937-953, 17p
- Publication Year :
- 2015
-
Abstract
- A human speaker recognition expert often observes the speech spectrogram in multiple different scales for speaker recognition, especially under the short utterance condition. Inspired by this action, this paper proposes a novel multi-resolution time frequency feature (MRTF) extraction method, which is obtained by performing a 2-Dimensional discrete cosine transform (DCT) in multi-scale on the time frequency spectrogram matrix and then selecting and combining to the final multi-scaled transformed elements. Compared to the traditional Mel-Frequency Cepstral Coefficient (MFCC) feature extraction, the proposed method can make better use of multi-resolution temporal-frequency information. Beyond this, we also proposed three complementary combination strategies of MFCC and MRTF: in feature level, in i-vector level and in score level. Comparing their performance. We found the best results are obtained by combination in i-vector level. In the three NIST 2008 Speaker Recognition Evaluation datasets, the proposed method is the most effective for improving the performance under short utterance than under long utterance. And after the combination, we can achieve an EER of 11.32 % and MinDCF of 0.054 in the 10sec-10sec trials on the male dataset, which is an absolute 3 % improvement of EER than the best reported result in this field. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 13807501
- Volume :
- 74
- Issue :
- 3
- Database :
- Complementary Index
- Journal :
- Multimedia Tools & Applications
- Publication Type :
- Academic Journal
- Accession number :
- 100953258
- Full Text :
- https://doi.org/10.1007/s11042-013-1705-4