Fusion of Heterogeneous Speaker Recognition Systems in the STBU Submission for the NIST Speaker Recognition Evaluation 2006

Authors :
Frantisek Grezl
Martin Karafiat
Jan Cernocky
Ondrej Glembek
Petr Schwarz
Pavel Matejka
Niko Brümmer
D.A. van Leeuwen
Albert Strasheim
Lukas Burget
TNO Defensie en Veiligheid
Source :
IEEE Transactions on Audio, Speech, and Language Processing. 15:2072-2084
Publication Year :
2007
Publisher :
Institute of Electrical and Electronics Engineers (IEEE), 2007.

Abstract

This paper describes and discusses the "STBU" speaker recognition system, which performed well in the NIST Speaker Recognition Evaluation 2006 (SRE). STBU is a consortium of four partners: Spescom DataVoice (Stellenbosch, South Africa), TNO (Soesterberg, The Netherlands), BUT (Brno, Czech Republic), and the University of Stellenbosch (Stellenbosch, South Africa). The STBU system was a combination of three main kinds of subsystems: 1) GMM, with short-time Mel frequency cepstral coefficient (MFCC) or perceptual linear prediction (PLP) features, 2) Gaussian mixture model-support vector machine (GMM-SVM), using GMM mean supervectors as input to an SVM, and 3) maximum-likelihood linear regression-support vector machine (MLLR-SVM), using MLLR speaker adaptation coefficients derived from an English large vocabulary continuous speech recognition (LVCSR) system. All subsystems made use of supervector subspace channel compensation methods, either eigenchannel adaptation or nuisance attribute projection. We document the design and performance of all subsystems, as well as their fusion and calibration via logistic regression. Finally, we also present a cross-site fusion that was done with several additional systems from other NIST SRE-2006 participants. © 2006 IEEE.
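
As an illustration of what "fusion and calibration via logistic regression" can mean in practice, the following is a minimal sketch, not the authors' implementation. It assumes each trial has one score per subsystem and learns a weight per subsystem plus an offset, so that the weighted sum behaves like a log-odds score. The function names (`train_fusion`, `fuse`), the plain gradient-descent training, and the synthetic scores for three hypothetical subsystems are all assumptions made for this example.

```python
import numpy as np


def train_fusion(scores, labels, lr=0.1, n_iter=2000):
    """Fit linear fusion weights w and offset b by logistic regression.

    scores: (n_trials, n_subsystems) array of subsystem scores
    labels: (n_trials,) array, 1 for target trials, 0 for non-target trials
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n_trials, n_sys = scores.shape
    w = np.zeros(n_sys)
    b = 0.0
    for _ in range(n_iter):
        fused = scores @ w + b                # fused score, read as log-odds
        p = 1.0 / (1.0 + np.exp(-fused))      # predicted target probability
        err = p - labels                      # gradient of cross-entropy w.r.t. fused
        w -= lr * (scores.T @ err) / n_trials
        b -= lr * err.mean()
    return w, b


def fuse(scores, w, b):
    """Apply trained fusion weights to new subsystem scores."""
    return np.asarray(scores, dtype=float) @ w + b


# Synthetic data (an assumption for this sketch): three hypothetical
# subsystems whose scores are shifted up for targets and down for non-targets.
rng = np.random.default_rng(0)
n = 500
labels = np.concatenate([np.ones(n), np.zeros(n)])
shifts = np.array([1.0, 1.5, 0.8])
scores = rng.normal(size=(2 * n, 3)) + np.outer(2 * labels - 1, shifts)

w, b = train_fusion(scores, labels)
fused = fuse(scores, w, b)

# Thresholding the fused log-odds at 0 gives a hard accept/reject decision.
accuracy = np.mean((fused > 0) == (labels == 1))
print("weights:", w, "offset:", b, "accuracy:", accuracy)
```

In this sketch the same trials are used for training and scoring, which is only acceptable for a demonstration; a real evaluation would train the fusion on held-out development trials.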

Details

ISSN :
1558-7924 and 1558-7916
Volume :
15
Database :
OpenAIRE
Journal :
IEEE Transactions on Audio, Speech, and Language Processing
Accession number :
edsair.doi.dedup.....d19a32d9e3719b0aefdb6c67c0d0c219