1. MULTICHANNEL SPEECH ENHANCEMENT FOR SPEAKER VERIFICATION IN NOISY AND REVERBERANT ENVIRONMENTS
- Author
- Dowerah, Sandipana; Serizel, Romain; Jouvet, Denis; Mohammadamini, Mohammad; Matrouf, Driss. Affiliations: Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est and Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria), Université de Lorraine (UL), Centre National de la Recherche Scientifique (CNRS); Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU), Centre d'Enseignement et de Recherche en Informatique (CERI); GRID5000. Funding: ANR-18-CE33-0014, ROBOVOX - Identification vocale robuste pour les robots de sécurité mobiles (robust voice identification for mobile security robots), Appel à projets générique 2018 (AAPG2018).
- Subjects
- [INFO] Computer Science [cs]
- Abstract
- Speech signals can be corrupted by environmental noise as well as room reverberation, both of which severely degrade speaker verification performance. In this paper, we propose a multichannel pre-processing pipeline that combines a filter-and-sum network (FaSNet), a rank-1 multichannel Wiener filter, and weighted prediction error as a front-end to speaker verification. Experimental evaluation shows that the pre-processing can improve speaker verification performance as long as the enrollment files are processed in the same way as the test data and test and enrollment occur within similar SNR ranges. The proposed pipeline is trained on synthetic data but generalizes to unseen, real recorded clips from the VOiCES eval dataset and improves speaker verification performance under all noise conditions.
- Published
- 2021