1. Neural i-vectors
- Author
- Kong Aik Lee, Ville Vestman, and Tomi Kinnunen
- Subjects
- Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing, speaker recognition, pattern recognition, mixture model, embedding, sufficient statistics, artificial intelligence
- Abstract
Deep speaker embeddings have been demonstrated to outperform their generative counterparts, i-vectors, in recent speaker verification evaluations. To combine the benefits of high performance and generative interpretation, we investigate the use of a deep embedding extractor and an i-vector extractor in succession. To bundle the deep embedding extractor with an i-vector extractor, we incorporate aggregation layers inspired by the Gaussian mixture model (GMM) into the embedding extractor networks. The inclusion of a GMM-like layer allows the discriminatively trained network to be used as a provider of sufficient statistics for the i-vector extractor to extract what we call neural i-vectors. We compare the deep embeddings to the proposed neural i-vectors on the Speakers in the Wild (SITW) and the Speaker Recognition Evaluation (SRE) 2018 and 2019 datasets. On the core-core condition of SITW, our deep embeddings obtain performance comparable to the state-of-the-art. The neural i-vectors obtain about 50% worse performance than the deep embeddings but, on the other hand, outperform the previous i-vector approaches reported in the literature by a clear margin.
Comment: Accepted to Odyssey 2020: The Speaker and Language Recognition Workshop. Version 2 (bugfix)
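To make the idea of a GMM-like aggregation layer more concrete, the following is a minimal sketch, not the authors' exact layer: it soft-assigns frame-level encoder features to a set of learnable Gaussian-style components and pools zeroth- and first-order Baum-Welch-type statistics that a downstream i-vector extractor could consume. The class name `GMMAggregation` and its parameters (learnable means, log-precisions, log-weights) are illustrative assumptions, since the abstract does not specify the layer's internals.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GMMAggregation(nn.Module):
    """Hypothetical GMM-like aggregation layer (sketch only).

    Soft-assigns frame-level features to C learnable components and pools
    zeroth- and first-order sufficient statistics for an i-vector extractor.
    """

    def __init__(self, feat_dim, num_components):
        super().__init__()
        # Learnable component means, log-precisions, and log-weights.
        self.means = nn.Parameter(torch.randn(num_components, feat_dim))
        self.log_precisions = nn.Parameter(torch.zeros(num_components, feat_dim))
        self.log_weights = nn.Parameter(torch.zeros(num_components))

    def forward(self, x):
        # x: (batch, frames, feat_dim) frame-level features from the encoder.
        diff = x.unsqueeze(2) - self.means                 # (B, T, C, D)
        prec = self.log_precisions.exp()                   # (C, D)
        # Log of unnormalized diagonal-Gaussian component scores per frame.
        log_resp = (self.log_weights
                    + 0.5 * self.log_precisions.sum(dim=1)
                    - 0.5 * (diff.pow(2) * prec).sum(dim=-1))  # (B, T, C)
        post = F.softmax(log_resp, dim=-1)                 # soft assignments

        # Zeroth-order statistics: total responsibility mass per component.
        n_c = post.sum(dim=1)                              # (B, C)
        # First-order statistics: responsibility-weighted feature sums.
        f_c = torch.einsum('btc,btd->bcd', post, x)        # (B, C, D)
        return n_c, f_c
```

In this sketch, `(n_c, f_c)` play the role of the sufficient statistics that a standard i-vector extractor expects from a GMM-UBM, which is the interpretation the abstract describes for the discriminatively trained network.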
- Published
- 2020