Start Over

Metric learning loss functions to reduce domain mismatch in the x-vector space for language recognition

Authors :: Raphaël Duroselle
Denis Jouvet
Irina Illina
Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH)
Inria Nancy - Grand Est
Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD)
Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA)
Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA)
Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
Experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr). This work has been partly funded bythe French Direction Générale de l'Armement.
Grid'5000
Source :: INTERSPEECH 2020, INTERSPEECH 2020, Oct 2020, Shangaï / Virtual, China, INTERSPEECH
Publication Year :: 2020
Publisher :: HAL CCSD, 2020.
Abstract: International audience; State-of-the-art language recognition systems are based on dis-criminative embeddings called x-vectors. Channel and gender distortions produce mismatch in such x-vector space where em-beddings corresponding to the same language are not grouped in an unique cluster. To control this mismatch, we propose to train the x-vector DNN with metric learning objective functions. Combining a classification loss with the metric learning n-pair loss allows to improve the language recognition performance. Such a system achieves a robustness comparable to a system trained with a domain adaptation loss function but without using the domain information. We also analyze the mismatch due to channel and gender, in comparison to language proximity, in the x-vector space. This is achieved using the Maximum Mean Discrepancy divergence measure between groups of x-vectors. Our analysis shows that using the metric learning loss function reduces gender and channel mismatch in the x-vector space, even for languages only observed on one channel in the train set.

Details

Language :: English
Database :: OpenAIRE
Journal :: INTERSPEECH 2020, INTERSPEECH 2020, Oct 2020, Shangaï / Virtual, China, INTERSPEECH
Accession number :: edsair.doi.dedup.....581f2567de731f191d1da86f2240fb33

Tools

Email
Cite

Printer

Authors Abstract Subjects Details

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Metric learning loss functions to reduce domain mismatch in the x-vector space for language recognition

Abstract

Subjects

Details

Tools

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Metric learning loss functions to reduce domain mismatch in the x-vector space for language recognition

Abstract

Subjects

Details

Tools

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources