1. DNN-based mask estimation for distributed speech enhancement in spatially unconstrained microphone arrays
- Author
-
Romain Serizel, Irina Illina, Nicolas Furnon, Slim Essid, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Laboratoire Traitement et Communication de l'Information (LTCI), Institut Mines-Télécom [Paris] (IMT)-Télécom Paris, This work was made with the support of the French National Research Agency, in the framework of the project DiSCogs (ANR-17-CE23-0026-01). Experiments presented in this paper were partially out using the Grid5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000)., Grid'5000, ANR-17-CE23-0026,DiSCogs,Antennes acoustiques hétérogènes et non contraintes pour la communication parlée(2017), Institut Polytechnique de Paris (IP Paris), Département Images, Données, Signal (IDS), Télécom ParisTech, Signal, Statistique et Apprentissage (S2A), and Institut Mines-Télécom [Paris] (IMT)-Télécom Paris-Institut Mines-Télécom [Paris] (IMT)-Télécom Paris
- Subjects
Signal Processing (eess.SP) ,Microphone array ,Acoustics and Ultrasonics ,Noise measurement ,Artificial neural network ,Computer science ,Microphone ,Noise reduction ,Speech recognition ,Context (language use) ,Speech processing ,Speech enhancement ,Computational Mathematics ,[INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing ,Computer Science::Sound ,FOS: Electrical engineering, electronic engineering, information engineering ,Computer Science (miscellaneous) ,Electrical and Electronic Engineering ,Electrical Engineering and Systems Science - Signal Processing - Abstract
Deep neural network (DNN)-based speech enhancement algorithms in microphone arrays have now proven to be efficient solutions to speech understanding and speech recognition in noisy environments. However, in the context of ad-hoc microphone arrays, many challenges remain and raise the need for distributed processing. In this paper, we propose to extend a previously introduced distributed DNN-based time-frequency mask estimation scheme that can efficiently use spatial information in form of so-called compressed signals which are pre-filtered target estimations. We study the performance of this algorithm under realistic acoustic conditions and investigate practical aspects of its optimal application. We show that the nodes in the microphone array cooperate by taking profit of their spatial coverage in the room. We also propose to use the compressed signals not only to convey the target estimation but also the noise estimation in order to exploit the acoustic diversity recorded throughout the microphone array., Submitted to TASLP
- Published
- 2020