1. Compensate multiple distortions for speaker recognition systems
- Author
-
Mohammadamini, Mohammad, Matrouf, Driss, Bonastre, Jean-Francois, Serizel, Romain, Dowerah, Sandipana, Jouvet, Denis, Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
x-vector ,full reverberation ,denoising autoencoder ,0103 physical sciences ,early reverberation ,0202 electrical engineering, electronic engineering, information engineering ,020206 networking & telecommunications ,02 engineering and technology ,010301 acoustics ,01 natural sciences ,additive noise ,[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] - Abstract
International audience; The performance of speaker recognition systems reduces dramatically in severe conditions in the presence of additive noise and/or reverberation. In some cases, there is only one kind of domain mismatch like additive noise or reverberation, but in many cases, there are more than one distortion. Finding a solution for domain adaptation in the presence of different distortions is a challenge. In this paper we investigate the situation in which there is none, one or more of the following distortions: early reverberation, full reverberation, additive noise. We propose two configurations to compensate for these distortions. In the first one a specific denoising autoencoder is used for each distortion. In the second configuration, a denoising autoencoder is used to compensate for all of these distortions simultaneously. Our experiments show that, in the coexistence of noise and reverberation, the second configuration gives better results. For example, with the second configuration we obtained 76.6% relative improvement of EER for utterances longer than 12 seconds. For other situations in the presence of only one distortion, the second configuration gives almost the same results achieved by using a specific model for each distortion.
- Published
- 2021
- Full Text
- View/download PDF