301. Discriminative subspace modeling of SNR and duration variabilities for robust speaker verification
- Author
- Man-Wai Mak, Weiwei Lin, Jen-Tzung Chien, and Na Li
- Subjects
- Propagation of uncertainty, business.industry, Computer science, Speech recognition, Probabilistic logic, Pattern recognition, 02 engineering and technology, Variational Bayesian methods, Theoretical Computer Science, Human-Computer Interaction, 030507 speech-language pathology & audiology, 03 medical and health sciences, Bayes' theorem, Discriminative model, Robustness (computer science), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, 0305 other medical science, business, Latent variable model, Software, Subspace topology
- Abstract
- Model SNR and duration variability of i-vectors in discriminative subspaces. Use variational Bayesian methods to infer the latent variable model that defines the SNR and duration subspaces. Perform better than PLDA, SNR-invariant PLDA, and PLDA with uncertainty propagation on long test utterances.
- Although i-vectors together with probabilistic LDA (PLDA) have achieved great success in speaker verification, suppressing the undesirable effects of variability in utterance length and background noise level remains a challenge. This paper aims to improve the robustness of i-vector-based speaker verification systems by compensating for utterance-length variability and noise-level variability. Inspired by recent findings that noise-level variability can be modeled by a signal-to-noise ratio (SNR) subspace and that duration variability can be modeled as additive noise in the i-vector space, we propose to add an SNR factor and a duration factor to the PLDA model. In this framework, we assume that i-vectors derived from utterances with comparable durations share similar duration-specific information and that i-vectors extracted from utterances within a narrow SNR range share similar SNR-specific information. Based on these assumptions, an i-vector can be represented as a linear combination of four components: speaker, SNR, duration, and channel. A variational Bayes algorithm is developed to infer this latent variable model via a discriminative subspace training procedure. In the testing stage, the different variabilities are compensated for when computing the likelihood ratio. Experiments on Common Conditions 1 and 4 of NIST 2012 SRE show that the proposed model outperforms conventional PLDA and SNR-invariant PLDA. Results also show that the proposed model performs better than uncertainty-propagation PLDA (UP-PLDA) for long test utterances.
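The four-component decomposition described in the abstract can be sketched as a single equation. This is an illustrative form only: the symbols, the global mean term, and the treatment of the channel component as a residual are assumptions made here for clarity and are not taken from this record.

```latex
% Illustrative sketch; all symbols below are assumed notation.
%   x   : an i-vector
%   m   : global mean (assumed offset term)
%   V h : speaker component   (V = speaker subspace, h = speaker factor)
%   U w : SNR component       (U = SNR subspace,     w = SNR factor)
%   R y : duration component  (R = duration subspace, y = duration factor)
%   eps : channel component, treated here as a residual term
\[
  \mathbf{x} \;=\; \mathbf{m}
  \;+\; \mathbf{V}\mathbf{h}
  \;+\; \mathbf{U}\mathbf{w}
  \;+\; \mathbf{R}\mathbf{y}
  \;+\; \boldsymbol{\epsilon}
\]
```

Under this reading, i-vectors from utterances in the same narrow SNR range would share the same SNR factor, and i-vectors from utterances of comparable duration would share the same duration factor, which matches the two assumptions stated in the abstract.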
- Published
- 2017