Author: "Dowerah, Sandipana" / Database: OpenAIRE - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Dowerah, Sandipana"' showing total 5 results

1. Self-supervised learning with diffusion-based multichannel speech enhancement for speaker verification under noisy conditions

Author: Dowerah, Sandipana; Kulkarni, Ajinkya; Serizel, Romain; Jouvet, Denis
Affiliations: Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est; Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Institut National de Recherche en Informatique et en Automatique (Inria); Université de Lorraine (UL); Centre National de la Recherche Scientifique (CNRS)
Subjects: FOS: Computer and information sciences, multichannel speech enhancement, Sound (cs.SD), diffusion probabilistic models, Audio and Speech Processing (eess.AS), self-supervised learning, FOS: Electrical engineering, electronic engineering, information engineering, [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC], speaker verification, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: The paper introduces Diff-Filter, a multichannel speech enhancement approach based on the diffusion probabilistic model, for improving speaker verification performance under noisy and reverberant conditions. It also presents a new two-stage training procedure that takes advantage of self-supervised learning. In the first stage, the Diff-Filter is trained by conducting time-domain speech filtering using a scoring-based diffusion model. In the second stage, the Diff-Filter is jointly optimized with a pre-trained ECAPA-TDNN speaker verification model under a self-supervised learning framework. We present a novel loss based on equal error rate. This loss is used to conduct self-supervised learning on a dataset that is not labelled in terms of speakers. The proposed approach is evaluated on MultiSV, a multichannel speaker verification dataset, and shows significant improvements in performance under noisy multichannel conditions.
Published: 2023

2. Joint Optimization of Diffusion Probabilistic-Based Multichannel Speech Enhancement with Far-Field Speaker Verification

Author: Dowerah, Sandipana; Serizel, Romain; Jouvet, Denis; Mohammadamini, M; Matrouf, Driss
Affiliations: Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est; Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Institut National de Recherche en Informatique et en Automatique (Inria); Université de Lorraine (UL); Centre National de la Recherche Scientifique (CNRS); Laboratoire Informatique d'Avignon (LIA); Avignon Université (AU); Centre d'Enseignement et de Recherche en Informatique - CERI
Funding: ANR-18-CE33-0014, ROBOVOX - Identification vocale robuste pour les robots de sécurité mobiles (2018)
Subjects: multichannel speech enhancement, far-field speaker verification, diffusion model, [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC]
Abstract: Today's smart devices that use speaker verification are increasingly equipped with multiple microphones, which helps resolve spatial ambiguity and improves directivity. However, unlike other speech-based applications, speaker verification degrades in far-field scenarios due to the adverse effects of a noisy environment and room reverberation. This paper presents a novel multichannel speech enhancement module based on the diffusion probabilistic model. It is used as the front-end of the ECAPA-TDNN speaker verification system in far-field scenarios under a noisy-reverberant environment. The proposed system incorporates a two-stage training approach. In the first stage, the speech enhancement and speaker verification modules are trained individually. In the second stage, the two modules are combined and trained jointly. We use a similarity-preserving knowledge distillation loss that guides the network to produce activations for enhanced signals similar to those for clean speech signals. Joint optimization with the knowledge distillation loss achieves the best performance both on the evaluation set composed of synthetic clips similar to those used in training and on unseen recorded clips from the VOiCES dataset.
Published: 2023
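
Illustrative note (not part of the catalog record): the similarity-preserving knowledge distillation loss mentioned in the abstract can be sketched as below. This is a minimal sketch that assumes the commonly used formulation, in which batch-wise similarity matrices are row-normalized and compared with a squared Frobenius norm; the variant actually used in the paper is not given in this listing. The function names and the roles "student" (activations for enhanced signals) and "teacher" (activations for clean speech) are assumptions made for illustration.

```python
import math

def _similarity(acts):
    """Row-normalized Gram matrix of a batch of flattened activation vectors."""
    gram = [[sum(x * y for x, y in zip(a, b)) for b in acts] for a in acts]
    out = []
    for row in gram:
        # L2-normalize each row; fall back to 1.0 to avoid dividing by zero
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        out.append([v / norm for v in row])
    return out

def sp_kd_loss(student_acts, teacher_acts):
    """Squared Frobenius distance between the two similarity matrices, divided by b^2."""
    gs = _similarity(student_acts)
    gt = _similarity(teacher_acts)
    b = len(gs)
    total = sum((t - s) ** 2 for rt, rs in zip(gt, gs) for t, s in zip(rt, rs))
    return total / (b * b)

# Hypothetical toy batch of two utterances with 2-dimensional activations
clean = [[1.0, 0.0], [0.0, 1.0]]
enhanced = [[1.0, 1.0], [1.0, 1.0]]
loss = sp_kd_loss(enhanced, clean)
```

The loss is zero when the enhanced and clean batches induce the same pairwise similarity pattern, even if the activations differ by a global scale, which is why it constrains relations between utterances rather than the raw activation values.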

3. How to Leverage DNN-based speech enhancement for multi-channel speaker verification?

Author: Dowerah, Sandipana; Serizel, Romain; Jouvet, Denis; Mohammadamini, Mohammad; Matrouf, Driss
Affiliations: Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est; Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Institut National de Recherche en Informatique et en Automatique (Inria); Université de Lorraine (UL); Centre National de la Recherche Scientifique (CNRS); Laboratoire Informatique d'Avignon (LIA); Avignon Université (AU); Centre d'Enseignement et de Recherche en Informatique - CERI
Funding: ANR-18-CE33-0014, ROBOVOX - Identification vocale robuste pour les robots de sécurité mobiles (2018)
Subjects: FOS: Computer and information sciences, multichannel speech enhancement, Sound (cs.SD), far-field speaker verification, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, deep neural network, Computer Science - Human-Computer Interaction, [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC], Computer Science - Sound, Human-Computer Interaction (cs.HC), Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Speaker verification (SV) suffers from unsatisfactory performance in far-field scenarios due to environmental noise and the adverse impact of room reverberation. This work presents a benchmark of multichannel speech enhancement for far-field speaker verification. One approach is deep neural network-based, and the other is a combination of deep neural network and signal processing. We integrated a DNN architecture with signal processing techniques to carry out various experiments. Our approach is compared to the existing state-of-the-art approaches. We examine the importance of enrollment in pre-processing, which has been largely overlooked in previous studies. Experimental evaluation shows that pre-processing can improve the SV performance as long as the enrollment files are processed similarly to the test data and that test and enrollment occur within similar SNR ranges. Considerable improvement is obtained on the generated data and on all the noise conditions of the VOiCES dataset.
Published: 2022

4. Multichannel Speech Enhancement for Speaker Verification in Noisy and Reverberant Environments

Author: Dowerah, Sandipana; Serizel, Romain; Jouvet, Denis; Mohammadamini, Mohammad; Matrouf, Driss
Affiliations: Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est; Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Institut National de Recherche en Informatique et en Automatique (Inria); Université de Lorraine (UL); Centre National de la Recherche Scientifique (CNRS); Laboratoire Informatique d'Avignon (LIA); Avignon Université (AU); Centre d'Enseignement et de Recherche en Informatique - CERI; GRID5000
Funding: ANR-18-CE33-0014, ROBOVOX - Identification vocale robuste pour les robots de sécurité mobiles (2018), APPEL À PROJETS GÉNÉRIQUE 2018 (AAPG2018)
Subjects: [INFO]Computer Science [cs]
Abstract: Speech signals can be corrupted by environmental noise as well as room reverberation, which severely affects speaker verification performance. In this paper, we propose to combine a multichannel pre-processing pipeline including a filter-and-sum network (FaSNet), a Rank-1 multichannel Wiener filter, and weighted prediction error as a front-end to speaker verification. Experimental evaluation shows that the pre-processing can improve the speaker verification performance as long as the enrollment files are processed similarly to the test data and that test and enrollment occur within similar SNR ranges. Our proposed pipeline is trained on synthetic data but generalizes to unseen, real recorded clips included in the VOiCES eval dataset and improves the speaker verification performance on all the noise conditions.
Published: 2021

5. Compensate multiple distortions for speaker recognition systems

Author: Mohammadamini, Mohammad; Matrouf, Driss; Bonastre, Jean-Francois; Serizel, Romain; Dowerah, Sandipana; Jouvet, Denis
Affiliations: Laboratoire Informatique d'Avignon (LIA); Avignon Université (AU); Centre d'Enseignement et de Recherche en Informatique - CERI; Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est; Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Institut National de Recherche en Informatique et en Automatique (Inria); Université de Lorraine (UL); Centre National de la Recherche Scientifique (CNRS)
Subjects: x-vector, full reverberation, denoising autoencoder, 0103 physical sciences, early reverberation, 0202 electrical engineering, electronic engineering, information engineering, 020206 networking & telecommunications, 02 engineering and technology, 010301 acoustics, 01 natural sciences, additive noise, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Abstract: The performance of speaker recognition systems degrades dramatically in severe conditions in the presence of additive noise and/or reverberation. In some cases, there is only one kind of domain mismatch, such as additive noise or reverberation, but in many cases there is more than one distortion. Finding a solution for domain adaptation in the presence of different distortions is a challenge. In this paper we investigate the situation in which there is none, one or more of the following distortions: early reverberation, full reverberation, additive noise. We propose two configurations to compensate for these distortions. In the first one, a specific denoising autoencoder is used for each distortion. In the second configuration, a single denoising autoencoder is used to compensate for all of these distortions simultaneously. Our experiments show that, when noise and reverberation coexist, the second configuration gives better results. For example, with the second configuration we obtained a 76.6% relative improvement in EER for utterances longer than 12 seconds. In situations with only one distortion, the second configuration gives almost the same results as those achieved by using a specific model for each distortion.
Published: 2021
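
Illustrative note (not part of the catalog record): several abstracts above report results in terms of equal error rate (EER) and relative improvement in EER. The sketch below shows one simple way to estimate both quantities from verification scores. It is a threshold sweep under the assumption that higher scores mean "same speaker"; the function names and the toy scores are hypothetical and are not taken from any of the listed papers.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Return (eer, threshold) at the point where FAR and FRR are closest."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # False rejection: a genuine (target) trial scored below the threshold
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor (non-target) trial scored at or above it
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]

def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction in percent with respect to the baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer

# Hypothetical toy scores, for illustration only
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
gain = relative_improvement(8.0, 6.0)  # hypothetical EERs in percent
```

A relative improvement is measured against the baseline EER, so a drop from 8% to 6% EER is a 25% relative improvement even though the absolute change is only 2 points.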