End-to-End Residual CNN with L-GM Loss Speaker Verification System

Authors :: Mengyao Zhu
Xuan Shi
Xingjian Du
Source :: DSL
Publication Year :: 2018
Publisher :: arXiv, 2018.
Abstract: We propose an end-to-end speaker verification system based on a neural network and trained with a loss function of lower computational complexity. The system consists of a ResNet architecture that extracts features from an utterance and produces utterance-level speaker embeddings, and it is trained using the large-margin Gaussian Mixture (L-GM) loss function. Through its large margin and likelihood regularization, the L-GM loss improves speaker verification performance. Experimental results show that the Residual CNN with L-GM loss outperforms a DNN-based i-vector baseline by more than 10% in accuracy.
Comment: 5 pages. arXiv admin note: text overlap with arXiv:1803.02988, arXiv:1705.02304, arXiv:1706.08612 by other authors
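
The abstract names the large-margin Gaussian Mixture loss and its two ingredients (a large margin and a likelihood regularization) but does not give a formula. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes one Gaussian per speaker with identity covariance, so the class distance reduces to half the squared Euclidean distance to the class mean. The function name `lgm_loss` and the parameters `alpha` (margin) and `lam` (regularization weight), including their default values, are illustrative and do not come from this record.

```python
import math

def lgm_loss(embedding, means, label, alpha=0.01, lam=0.1):
    """Sketch of a large-margin Gaussian mixture loss for one embedding.

    Assumption: identity covariance for every class, so the distance to
    class k is 0.5 * ||embedding - means[k]||^2.
    """
    # Half squared Euclidean distance from the embedding to each class mean.
    dists = [0.5 * sum((x - m) ** 2 for x, m in zip(embedding, mu))
             for mu in means]
    # Large margin: inflate only the true-class distance by (1 + alpha).
    logits = [-(d * (1.0 + alpha) if k == label else d)
              for k, d in enumerate(dists)]
    # Cross-entropy over the margin-adjusted logits (stable log-sum-exp).
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
    classification = log_norm - logits[label]
    # Likelihood regularization: pull the embedding toward its class mean.
    likelihood = dists[label]
    return classification + lam * likelihood

# Hypothetical two-speaker example with 2-D embeddings.
means = [[0.0, 0.0], [2.0, 0.0]]
loss_correct = lgm_loss([0.0, 0.0], means, label=0)
loss_wrong = lgm_loss([0.0, 0.0], means, label=1)
```

In this sketch an embedding that sits on its own speaker's mean gets a small loss, while the same embedding labelled as the other speaker is penalized by both the classification term and the likelihood term.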