Topic: speech processing - Searchworks@Jio Institute Digital Library Search Results

Showing total 3 results

1. DNN-based mask estimation for distributed speech enhancement in spatially unconstrained microphone arrays

Author: Romain Serizel, Irina Illina, Nicolas Furnon, Slim Essid
Affiliations: Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est; Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Institut National de Recherche en Informatique et en Automatique (Inria); Université de Lorraine (UL); Centre National de la Recherche Scientifique (CNRS); Laboratoire Traitement et Communication de l'Information (LTCI); Institut Mines-Télécom [Paris] (IMT); Télécom Paris; Institut Polytechnique de Paris (IP Paris); Département Images, Données, Signal (IDS); Télécom ParisTech; Signal, Statistique et Apprentissage (S2A)
Funding: This work was made with the support of the French National Research Agency, in the framework of the project DiSCogs (ANR-17-CE23-0026-01), Antennes acoustiques hétérogènes et non contraintes pour la communication parlée (2017). Experiments presented in this paper were partially carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several universities as well as other organizations (see https://www.grid5000).
Subjects: Signal Processing (eess.SP), Microphone array, Acoustics and Ultrasonics, Noise measurement, Artificial neural network, Computer science, Microphone, Noise reduction, Speech recognition, Context (language use), Speech processing, Speech enhancement, Computational Mathematics, [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing, Computer Science::Sound, FOS: Electrical engineering, electronic engineering, information engineering, Computer Science (miscellaneous), Electrical and Electronic Engineering, Electrical Engineering and Systems Science - Signal Processing
Abstract: Deep neural network (DNN)-based speech enhancement algorithms in microphone arrays have now proven to be efficient solutions to speech understanding and speech recognition in noisy environments. However, in the context of ad-hoc microphone arrays, many challenges remain and raise the need for distributed processing. In this paper, we propose to extend a previously introduced distributed DNN-based time-frequency mask estimation scheme that can efficiently use spatial information in the form of so-called compressed signals, which are pre-filtered target estimations. We study the performance of this algorithm under realistic acoustic conditions and investigate practical aspects of its optimal application. We show that the nodes in the microphone array cooperate by taking advantage of their spatial coverage in the room. We also propose to use the compressed signals to convey not only the target estimation but also the noise estimation, in order to exploit the acoustic diversity recorded throughout the microphone array. Comment: Submitted to TASLP
Published: 2020

2. Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition

Author: Emmanuel Vincent, Sunit Sivasankaran, Dominique Fohr
Affiliations: Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est; Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Institut National de Recherche en Informatique et en Automatique (Inria); Université de Lorraine (UL); Centre National de la Recherche Scientifique (CNRS)
Funding: This work was made with the support of the French National Research Agency, in the framework of the project VOCADOM 'Robust voice command adapted to the user and to the context for AAL' (ANR-16-CE33-0006), Commande vocale robuste adaptée à la personne et au contexte pour l'autonomie à domicile (2016). Experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several universities as well as other organizations (see https://www.grid5000.fr) and using the EXPLOR centre, hosted by the University of Lorraine.
Subjects: Multichannel speech separation, WSJ0-2mix reverberated, Signal processing, Noise measurement, Artificial neural network, Computer science, Speech recognition, Word error rate, 020206 networking & telecommunications, 02 engineering and technology, Speech processing, Signal-to-noise ratio, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Audio and Speech Processing (eess.AS), [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD], 0202 electrical engineering, electronic engineering, information engineering, FOS: Electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Adaptive beamformer, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: We investigate the effect of speaker localization on the performance of speech recognition systems in a multispeaker, multichannel environment. Given the speaker location information, speech separation is performed in three stages. In the first stage, a simple delay-and-sum (DS) beamformer is used to enhance the signal impinging from the speaker location. In the second stage, the enhanced signal is used to estimate a time-frequency mask corresponding to the localized speaker using a neural network. In the third stage, this mask is used to compute the second-order statistics and to derive an adaptive beamformer. We generated a multichannel, multispeaker, reverberated, noisy dataset inspired by the well-studied WSJ0-2mix and study the performance of the proposed pipeline in terms of the word error rate (WER). An average WER of 29.4% was achieved using the ground truth localization information and 42.4% using the localization information estimated via GCC-PHAT. The signal-to-interference ratio (SIR) between the speakers has a higher impact on the ASR performance, to the extent of reducing the WER by 59% relative for a SIR increase of 15 dB. By contrast, increasing the spatial distance to 50° or more improves the WER by only 23% relative. Comment: Submitted to ICASSP 2020
Published: 2019
Full Text: View/download PDF
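
Note: the first stage named in the abstract above, a delay-and-sum (DS) beamformer, can be illustrated with a minimal sketch. It assumes integer sample delays that are already known for each channel (in the paper they would follow from the speaker location). The function name `delay_and_sum` and the toy signals are hypothetical and are not taken from the paper.

```python
def delay_and_sum(channels, delays):
    """Average the channels after undoing each channel's integer sample delay.

    channels: list of equal-length lists of samples, one per microphone.
    delays:   list of non-negative integer delays (in samples), one per
              channel, describing how late the target arrives at that mic.
    """
    if len(channels) != len(delays):
        raise ValueError("need one delay per channel")
    n = len(channels[0])
    # Only samples available in every channel after alignment are kept.
    usable = n - max(delays)
    output = []
    for t in range(usable):
        # Sample t of the target sits at index t + delay in each channel.
        aligned = [ch[t + d] for ch, d in zip(channels, delays)]
        output.append(sum(aligned) / len(aligned))
    return output


# Toy example (assumed data): the same pulse reaches mic 2 two samples later.
source = [0.0, 1.0, 0.5, -0.5, 0.0, 0.0]
mic1 = source[:]
mic2 = [0.0, 0.0] + source[:-2]
enhanced = delay_and_sum([mic1, mic2], [0, 2])
```

With real recordings the delays are generally fractional and the channels also carry noise and reverberation, so this integer-delay version only conveys the alignment-then-average idea.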

3. Assessment of Severe Apnoea through Voice Analysis, Automatic Speech, and Speaker Recognition Techniques

Author: Luis A. Hernández Gómez, José Luis Blanco Murillo, José Alcázar Ramírez, Eduardo López Gonzalo, Rubén Fernández Pozo, Doroteo Torre Toledano
Affiliations: [Fernández Pozo, R; Blanco Murillo, JL; Hernández Gómez, L; López Gonzalo, E] Signal, Systems and Radiocommunications Department, Universidad Politécnica de Madrid, Madrid, Spain. [Alcázar Ramírez, J] Respiratory Department, Hospital Torrecárdenas, Almería, Spain. [Toledano, DT] ATVS Biometric Recognition group, Universidad Autónoma de Madrid, Madrid, Spain. UAM, Departamento de Ingeniería Informática; Análisis y Tratamiento de Voz y Señales Biométricas (ING EPS-002).
Funding: The activities described in this paper were funded by the Spanish Ministry of Science and Technology as part of the TEC2006-13170-C02-02 Project.
Subjects: Sustained speech, Artificial intelligence, Information Science::Information Science::Computing Methodologies::Software::Speech Recognition Software [Medical Subject Headings], Computer science, Speech recognition, 0206 medical engineering, lcsh:TK7800-8360, 02 engineering and technology, Voice analysis, Nasalization, lcsh:Telecommunication, Vowel, lcsh:TK5101-6720, 0202 electrical engineering, electronic engineering, information engineering, Normal distribution, Diseases::Respiratory Tract Diseases::Respiration Disorders::Apnea::Sleep Apnea Syndromes::Sleep Apnea, Obstructive [Medical Subject Headings], Audio signal processing, Telecommunications, Phenomena and Processes::Mathematical Concepts::Statistical Distributions::Normal Distribution [Medical Subject Headings], lcsh:Electronics, 020206 networking & telecommunications, Phonetics, Speaker recognition, Speech processing, Diseases::Otorhinolaryngologic Diseases::Laryngeal Diseases::Voice Disorders [Medical Subject Headings], 020601 biomedical engineering, Continuous speech, Obstructive sleep apnea, respiratory tract diseases, Pattern recognition (psychology), Speech dynamics, Gaussian mixture models, Speech recognition software, Voice disorders, Classification and regression tree (CART)
Abstract: The electronic version of this article is the complete one and can be found online at: http://asp.eurasipjournals.com/content/2009/1/982531. This study is part of an ongoing collaborative effort between the medical and the signal processing communities to promote research on applying standard Automatic Speech Recognition (ASR) techniques for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based detection could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we describe an acoustic search for distinctive apnoea voice characteristics. We also study abnormal nasalization in OSA patients by modelling vowels in nasal and nonnasal phonetic contexts using Gaussian Mixture Model (GMM) pattern recognition on speech spectra. Finally, we present experimental findings regarding the discriminative power of GMMs applied to severe apnoea detection. We have achieved an 81% correct classification rate, which is very promising and underpins the interest in this line of inquiry.
Published: 2009
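
Note: the abstract above mentions GMM-based detection without giving details. As a rough illustration only, the sketch below classifies a set of scalar features by comparing their total log-likelihood under two one-dimensional Gaussian mixtures. The function names, the class labels, and every model parameter are hypothetical, hand-picked values and are not taken from the paper, which models speech spectra rather than scalars.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of a scalar x under a 1-D Gaussian mixture."""
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        norm = 1.0 / math.sqrt(2.0 * math.pi * var)
        total += w * norm * math.exp(-((x - mu) ** 2) / (2.0 * var))
    return math.log(total)


def classify(features, models):
    """Return the name of the model with the highest total log-likelihood."""
    scores = {
        name: sum(gmm_log_likelihood(x, *params) for x in features)
        for name, params in models.items()
    }
    return max(scores, key=scores.get)


# Hypothetical parameters: (weights, means, variances) for each class.
models = {
    "control": ([0.5, 0.5], [-1.0, 0.0], [1.0, 1.0]),
    "apnoea": ([0.5, 0.5], [3.0, 4.0], [1.0, 1.0]),
}
label = classify([3.2, 3.9, 2.8], models)
```

In practice the mixtures would be trained on labelled data (for example with expectation-maximization) instead of being set by hand.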
