Spectro-temporal modulation energy based mask for robust speaker identification.

Authors :: Chi, Tai-Shih
Lin, Ting-Han
Hsu, Chung-Chien
Source :: Journal of the Acoustical Society of America; May 2012, Vol. 131 Issue 5, pp. EL368-EL374, 7 pp.
Publication Year :: 2012
Abstract :: Spectro-temporal modulations of speech encode speech structures and speaker characteristics. An algorithm that distinguishes speech from non-speech based on spectro-temporal modulation energies is proposed and evaluated in robust text-independent closed-set speaker identification simulations using the TIMIT and GRID corpora. Simulation results show that the proposed method produces much higher speaker identification rates in all signal-to-noise ratio (SNR) conditions than the baseline system using mel-frequency cepstral coefficients. In addition, the proposed method outperforms the system that uses auditory-based nonnegative tensor cepstral coefficients [Q. Wu and L. Zhang, 'Auditory sparse representation for robust speaker recognition based on tensor structure,' EURASIP J. Audio, Speech, Music Process. 2008, 578612 (2008)] in low SNR (≤ 10 dB) conditions. [ABSTRACT FROM AUTHOR]

Subjects :: LECTURERS
SOUND waves
ACOUSTIC dispersion
ACOUSTICAL engineering
