Metric learning loss functions to reduce domain mismatch in the x-vector space for language recognition

Authors :
Raphaël Duroselle
Denis Jouvet
Irina Illina
Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH)
Inria Nancy - Grand Est
Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD)
Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA)
Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA)
Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
Experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several universities as well as other organizations (see https://www.grid5000.fr). This work has been partly funded by the French Direction Générale de l'Armement.
Grid'5000
Source :
INTERSPEECH 2020, Oct 2020, Shanghai / Virtual, China
Publication Year :
2020
Publisher :
HAL CCSD, 2020.

Abstract

State-of-the-art language recognition systems are based on discriminative embeddings called x-vectors. Channel and gender distortions produce mismatch in the x-vector space, where embeddings corresponding to the same language are not grouped in a single cluster. To control this mismatch, we propose to train the x-vector DNN with metric learning objective functions. Combining a classification loss with the metric learning n-pair loss improves language recognition performance. Such a system achieves robustness comparable to that of a system trained with a domain adaptation loss function, but without using the domain information. We also analyze the mismatch due to channel and gender, in comparison to language proximity, in the x-vector space. This is achieved using the Maximum Mean Discrepancy divergence measure between groups of x-vectors. Our analysis shows that using the metric learning loss function reduces gender and channel mismatch in the x-vector space, even for languages only observed on one channel in the training set.
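
The two quantities named in the abstract, the n-pair loss and the Maximum Mean Discrepancy (MMD), can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which kernel, bandwidth, or batch construction the paper uses. The sketch assumes a Gaussian kernel with a hypothetical bandwidth `sigma`, the biased MMD estimator, and the common batch form of the n-pair loss in which each anchor has one positive and the positives of the other classes serve as negatives. It uses only the Python standard library so that it runs as-is; the function names are illustrative.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel; the kernel choice and sigma are assumptions."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(xs, ys, sigma=1.0):
    """Biased estimate of the squared MMD between two groups of vectors."""
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, sigma) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def n_pair_loss(anchors, positives):
    """Batch n-pair loss.

    anchors[i] and positives[i] are embeddings of the same class (here, the
    same language); positives[j] with j != i act as negatives for anchors[i].
    The per-anchor loss is log(1 + sum_{j != i} exp(a_i.p_j - a_i.p_i)),
    computed as a numerically stable log-sum-exp minus the positive logit.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [dot(anchor, p) for p in positives]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy 2-D "x-vectors": two groups of the same language from two channels.
channel_a = [[0.0, 0.0], [0.1, 0.0]]
channel_b = [[5.0, 5.0], [5.1, 5.0]]
mismatch = mmd2(channel_a, channel_b)  # large when the groups are far apart

# Toy batch: one (anchor, positive) pair per language.
loss = n_pair_loss([[5.0, 0.0], [0.0, 5.0]], [[1.0, 0.0], [0.0, 1.0]])
```

In this reading, a smaller MMD between two groups of x-vectors of the same language (for example, two channels or two genders) indicates less mismatch. The combined training objective mentioned in the abstract would add the n-pair term to a classification loss; the relative weighting of the two terms is not given in this record.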

Details

Language :
English
Database :
OpenAIRE
Journal :
INTERSPEECH 2020, Oct 2020, Shanghai / Virtual, China
Accession number :
edsair.doi.dedup.....581f2567de731f191d1da86f2240fb33