
PLDA inspired Siamese networks for speaker verification.

Authors :
Ramoji, Shreyas
Krishnan, Prashant
Ganapathy, Sriram
Source :
Computer Speech & Language. Nov2022, Vol. 76, pN.PAG-N.PAG. 1p.
Publication Year :
2022

Abstract

The deep learning methodologies in state-of-the-art speaker recognition systems are predominantly limited to the extraction of recording-level embeddings. This is usually followed by generative modeling of the embeddings to output the verification score. In this paper, we explore a fully neural approach where the neural model outputs the verification score directly, given the acoustic feature inputs. This model, termed the Siamese neural network (SiamNN), combines the embedding extraction and back-end modeling into a single processing pipeline. The back-end modeling is achieved using a neural approach to PLDA modeling, called neural probabilistic linear discriminant analysis (NPLDA). In the NPLDA model, the verification score is computed as a discriminative similarity function. The development of the single neural SiamNN model allows the joint optimization of all the modules using a verification cost. Several speaker recognition experiments are performed using the SITW, VOiCES, and NIST SRE datasets, where the proposed SiamNN model is shown to significantly improve over the state-of-the-art x-vector PLDA baseline system (relative improvements of up to 35% in the primary cost metric). We also provide a detailed analysis of the influence of hyper-parameters, choice of loss functions, and data sampling strategies for training the model. In particular, we highlight that optimization based on the proposed soft detection cost function improves over the other loss functions considered. [ABSTRACT FROM AUTHOR]
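The abstract names two ideas without giving their form: a discriminative pairwise similarity score and a soft detection cost used as the training loss. The following is a minimal illustrative sketch, not the authors' implementation. It assumes the score is a PLDA-style quadratic function of an enrollment embedding and a test embedding, and that the soft cost replaces the hard miss and false-alarm counts with sigmoids so the cost becomes differentiable. The names `pairwise_score`, `soft_detection_cost`, `P`, `Q`, `alpha`, `beta`, and `threshold` are all hypothetical and do not come from this record.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def quad(a, M, b):
    # Bilinear form a^T M b for plain Python lists.
    return sum(a[i] * M[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))


def pairwise_score(x_e, x_t, P, Q):
    # Assumed PLDA-style quadratic similarity between an enrollment
    # embedding x_e and a test embedding x_t. P and Q are hypothetical
    # parameter matrices that would be learned discriminatively.
    return quad(x_e, Q, x_e) + quad(x_t, Q, x_t) + 2.0 * quad(x_e, P, x_t)


def soft_detection_cost(scores, labels, threshold=0.0, alpha=10.0, beta=1.0):
    # Assumed soft surrogate of a detection cost: the step functions that
    # count misses and false alarms are replaced by sigmoids with
    # sharpness alpha. labels: 1 = target trial, 0 = non-target trial.
    target = [s for s, y in zip(scores, labels) if y == 1]
    nontarget = [s for s, y in zip(scores, labels) if y == 0]
    if not target or not nontarget:
        raise ValueError("need at least one target and one non-target trial")
    p_miss = sum(1.0 - sigmoid(alpha * (s - threshold)) for s in target) / len(target)
    p_fa = sum(sigmoid(alpha * (s - threshold)) for s in nontarget) / len(nontarget)
    # beta is a hypothetical weight on false alarms relative to misses.
    return p_miss + beta * p_fa


if __name__ == "__main__":
    # Well-separated scores give a cost near zero under these assumptions.
    print(soft_detection_cost([5.0, 4.0, -5.0, -4.0], [1, 1, 0, 0]))
```

In a real system the scores would come from a trained network and the cost would be minimized with automatic differentiation; plain Python is used here only to keep the sketch self-contained.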

Details

Language :
English
ISSN :
0885-2308
Volume :
76
Database :
Academic Search Index
Journal :
Computer Speech & Language
Publication Type :
Academic Journal
Accession number :
157301342
Full Text :
https://doi.org/10.1016/j.csl.2022.101383