Author: "Laurent Girin" / Publisher: ieee - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Laurent Girin"' showing total 15 results

1. High-Resolution Speaker Counting in Reverberant Rooms Using CRNN with Ambisonics Features

Author: Pierre-Amaury Grumiaux, Laurent Girin, Srdan Kitic, and Alexandre Guerin
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Reverberation, Microphone, Computer science, Ambisonics, Speech recognition, 020206 networking & telecommunications, 02 engineering and technology, Computer Science - Sound, Speaker diarisation, Sound recording and reproduction, Noise, Recurrent neural network, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Audio signal processing, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Speaker counting is the task of estimating the number of people that are simultaneously speaking in an audio recording. For several audio processing tasks such as speaker diarization, separation, localization and tracking, knowing the number of speakers at each timestep is a prerequisite, or at least it can be a strong advantage, in addition to enabling a low latency processing. For that purpose, we address the speaker counting problem with a multichannel convolutional recurrent neural network which produces an estimation at a short-term frame resolution. We trained the network to predict up to 5 concurrent speakers in a multichannel mixture, with simulated data including many different conditions in terms of source and microphone positions, reverberation, and noise. The network can predict the number of speakers with good accuracy at frame resolution. 5 pages, 1 figure
Published: 2021
Full Text: View/download PDF

2. Speech Enhancement with Variational Autoencoders and Alpha-stable Distributions

Author: Laurent Girin, Radu Horaud, Antoine Liutkus, Umut Simsekli, and Simon Leglaive
Affiliation: Interpretation and Modelling of Images and Videos (PERCEPTION), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria); Laboratoire Jean Kuntzmann (LJK); Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP); Centre National de la Recherche Scientifique (CNRS); Université Grenoble Alpes [2016-2019] (UGA [2016-2019]); Signal, Statistique et Apprentissage (S2A), Laboratoire Traitement et Communication de l'Information (LTCI), Institut Mines-Télécom [Paris] (IMT), Télécom Paris; Département Images, Données, Signal (IDS), Télécom ParisTech; Scientific Data Management (ZENITH), Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Université de Montpellier (UM), Inria Sophia Antipolis - Méditerranée (CRISAM); GIPSA - Cognitive Robotics, Interactive Systems, & Speech Processing (GIPSA-CRISSP), Département Parole et Cognition (GIPSA-DPC), Grenoble Images Parole Signal Automatique (GIPSA-lab); Chaire DSAIDIS
Funding: This work is supported by the ERC Advanced Grant VHIA #34; ANR-15-CE38-0003, KAMoulox, Démixage en ligne de larges archives sonores (2015); ANR-16-CE23-0014, FBIMATRIX, Méthodes distribuées et parallèles de Monte-Carlo par chaînes de Markov pour l'Inférence Bayésienne de modèles à factorisation de tenseurs (2016); European Project: 340113, EC:FP7:ERC, ERC-2013-ADG, VHIA (2014)
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Computer science, Gaussian, Speech recognition, Speech enhancement, Monte Carlo method, Machine Learning (stat.ML), 02 engineering and technology, [INFO.INFO-NE]Computer Science [cs]/Neural and Evolutionary Computing [cs.NE], Intelligibility (communication), Computer Science - Sound, Matrix decomposition, 030507 speech-language pathology & audiology, 03 medical and health sciences, Audio and Speech Processing (eess.AS), Statistics - Machine Learning, FOS: Electrical engineering, electronic engineering, information engineering, 0202 electrical engineering, electronic engineering, information engineering, Context model, Noise measurement, Deep learning, 020206 networking & telecommunications, Computer Science::Sound, Monte Carlo expectation-maximization, Artificial intelligence, 0305 other medical science, Variational autoencoders, [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing, Alpha-stable distribution, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: This paper focuses on single-channel semi-supervised speech enhancement. We learn a speaker-independent deep generative speech model using the framework of variational autoencoders. The noise model remains unsupervised because we do not assume prior knowledge of the noisy recording environment. In this context, our contribution is to propose a noise model based on alpha-stable distributions, instead of the more conventional Gaussian non-negative matrix factorization approach found in previous studies. We develop a Monte Carlo expectation-maximization algorithm for estimating the model parameters at test time. Experimental results show the superiority of the proposed approach both in terms of perceptual quality and intelligibility of the enhanced speech signal. 5 pages, 3 figures, audio examples and code available online: https://team.inria.fr/perception/research/icassp2019-asvae/. arXiv admin note: text overlap with arXiv:1811.06713
Published: 2019
Full Text: View/download PDF

3. Online Localization of Multiple Moving Speakers in Reverberant Environments

Author: Bastien Mourgue, Laurent Girin, Sharon Gannot, Xiaofei Li, and Radu Horaud
Affiliation: Interpretation and Modelling of Images and Videos (PERCEPTION), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria); Laboratoire Jean Kuntzmann (LJK); Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP); Centre National de la Recherche Scientifique (CNRS); Université Grenoble Alpes [2016-2019] (UGA [2016-2019]); GIPSA - Cognitive Robotics, Interactive Systems, & Speech Processing (GIPSA-CRISSP), Département Parole et Cognition (GIPSA-DPC), Grenoble Images Parole Signal Automatique (GIPSA-lab); Bar-Ilan University [Israël]
Funding: European Project: 340113, EC:FP7:ERC, ERC-2013-ADG, VHIA (2014)
Subjects: Reverberation, Computer science, Speech recognition, 020206 networking & telecommunications, 02 engineering and technology, Acoustic source localization, Mixture model, Speech processing, Motion capture, Complex normal distribution, 030507 speech-language pathology & audiology, 03 medical and health sciences, [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Rate of convergence, Computer Science::Sound, Feature (computer vision), reverberant environments, [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD], 0202 electrical engineering, electronic engineering, information engineering, multiple moving speakers, sound-source localization, 0305 other medical science
Abstract: This paper addresses the problem of online multiple moving speakers localization in reverberant environments. The direct-path relative transfer function (DP-RTF), as defined by the ratio between the first taps of the convolutive transfer function (CTF) of two microphones, encodes the inter-channel direct-path information and is thus used as a localization feature being robust against reverberation. The CTF estimation is based on the cross-relation method. In this work, the recursive least-square method is proposed to solve the cross-relation problem, due to its relatively low computational cost and its good convergence rate. The DP-RTF feature estimated at each time-frequency bin is assumed to correspond to a single speaker. A complex Gaussian mixture model is used to assign each observed feature to one among several speakers. The recursive expectation-maximization algorithm is adopted to update online the model parameters. The method is evaluated with a new dataset containing multiple moving speakers, where the ground-truth speaker trajectories are recorded with a motion capture system.
Published: 2018
Full Text: View/download PDF

4. Explaining the parameterized Wiener filter with alpha-stable processes

Author: Laurent Girin, Antoine Liutkus, Roland Badeau, and Mathieu Fontaine
Affiliation: Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria); Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Université de Lorraine (UL); Centre National de la Recherche Scientifique (CNRS); GIPSA - Cognitive Robotics, Interactive Systems, & Speech Processing (GIPSA-CRISSP), Département Parole et Cognition (GIPSA-DPC), Grenoble Images Parole Signal Automatique (GIPSA-lab); Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP); Université Grenoble Alpes [2016-2019] (UGA [2016-2019]); Télécom ParisTech
Funding: Projet ANR KAMoulox; ANR-15-CE38-0003, KAMoulox, Démixage en ligne de larges archives sonores (2015), AAPG2015
Subjects: Noise (signal processing), Noise reduction, Speech recognition, Wiener filter, Spectral density, Wiener deconvolution, 020206 networking & telecommunications, 02 engineering and technology, Weighting, Speech enhancement, 030507 speech-language pathology & audiology, 03 medical and health sciences, Fourier transform, probability theory, denoising, 0202 electrical engineering, electronic engineering, information engineering, Wiener filtering, 0305 other medical science, [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing, Algorithm, alpha-stable processes, Mathematics
Abstract: This paper introduces a new method for single-channel denoising that sheds new light on classical early developments on this topic that occurred in the 70’s and 80’s with Wiener filtering and spectral subtraction. Operating both in the short-time Fourier transform domain, these methods consist in estimating the power spectral density (PSD) of the noise without speech. Then, the clean speech signal is obtained by manipulating the corrupted time-frequency bins thanks to these noise PSD estimates. Theoretically grounded when using power spectra, these methods were subsequently generalized to magnitude spectra, or shown to yield better performance by weighting the PSDs in the so-called parameterized Wiener filter. Both these strategies were long considered ad-hoc. To the best of our knowledge, while we recently proposed an interpretation of magnitude processing, there is still no theoretical result that would justify the better performance of parameterized Wiener filters. Here, we show how the α-stable probabilistic model for waveforms naturally leads to these weighted filters and we provide a grounded and fast algorithm to enhance corrupted audio that compares favorably with classical denoising methods.
Published: 2017
Full Text: View/download PDF

5. Deep neural networks for automatic detection of screams and shouted speech in subway trains

Author: Pierre Laffitte, David Sodoyer, Charles Tatkeu, and Laurent Girin
Affiliation: Laboratoire Électronique Ondes et Signaux pour les Transports (IFSTTAR/COSYS/LEOST), Institut Français des Sciences et Technologies des Transports, de l'Aménagement et des Réseaux (IFSTTAR); PRES Université Lille Nord de France; GIPSA - Cognitive Robotics, Interactive Systems, & Speech Processing (GIPSA-CRISSP), Département Parole et Cognition (GIPSA-DPC), Grenoble Images Parole Signal Automatique (GIPSA-lab); Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP); Centre National de la Recherche Scientifique (CNRS); Université Grenoble Alpes [2016-2019] (UGA [2016-2019])
Subjects: Sound recognition, Computer science, Speech recognition, 02 engineering and technology, Task (project management), 030507 speech-language pathology & audiology, 03 medical and health sciences, Deep belief network, Scream detection, 0202 electrical engineering, electronic engineering, information engineering, Incident detection, Subway, Transport environment, Voice activity detection, 020206 networking & telecommunications, Neural network, Audio sound event detection, Sound environment, Rail transport, Noise, Deep neural networks, Sound event detection, Train, Noise (video), 0305 other medical science, [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing
Abstract: IEEE ICASSP 2016 - International Conference on Acoustics, Speech and Signal Processing, Shanghai, China, 20/03/2016 - 25/03/2016. Deep Neural Networks (DNNs) have recently become a popular technique for regression and classification problems. Their capacity to learn high-order correlations between input and output data proves to be very powerful for automatic speech recognition. In this paper we investigate the use of DNNs for automatic scream and shouted speech detection, within the framework of surveillance systems in public transportation. We recorded a database of sounds occurring in subway trains under real operating conditions and used DNNs to classify the sounds into screams, shouts and other categories. We report encouraging results, given the difficulty of the task, especially when a high level of surrounding noise is present.
Published: 2016
Full Text: View/download PDF

6. A variational EM algorithm for the separation of moving sound sources

Author: Laurent Girin, Dionyssos Kounades-Bastian, Sharon Gannot, Xavier Alameda-Pineda, and Radu Horaud
Affiliation: Interpretation and Modelling of Images and Videos (PERCEPTION), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria); Laboratoire Jean Kuntzmann (LJK); Université Pierre Mendès France - Grenoble 2 (UPMF); Université Joseph Fourier - Grenoble 1 (UJF); Université Stendhal - Grenoble 3; Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP); Centre National de la Recherche Scientifique (CNRS); GIPSA - Cognitive Robotics, Interactive Systems, & Speech Processing (GIPSA-CRISSP), Département Parole et Cognition (GIPSA-DPC), Grenoble Images Parole Signal Automatique (GIPSA-lab); University of Trento [Trento]; Faculty of Engineering [Israel], Bar-Ilan University [Israël]; IEEE Signal Processing Society
Funding: European Project: 340113, EC:FP7:ERC, ERC-2013-ADG, VHIA (2014); European Project: 609465, EC:FP7:ICT, FP7-ICT-2013-10, EARS (2014)
Subjects: Mathematical optimization, Computer science, 02 engineering and technology, variational EM, Matrix decomposition, 030507 speech-language pathology & audiology, 03 medical and health sciences, [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], moving sources, Expectation–maximization algorithm, 0202 electrical engineering, electronic engineering, information engineering, Source separation, [SPI.ACOU]Engineering Sciences [physics]/Acoustics [physics.class-ph], Probabilistic logic, Estimator, 020206 networking & telecommunications, Kalman filter, Audio-source separation, Complex normal distribution, Kalman smoother, [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD], Algorithm design, 0305 other medical science, time-varying mixing filters, Algorithm, [MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA]
Abstract: This paper addresses the problem of separation of moving sound sources. We propose a probabilistic framework based on the complex Gaussian model combined with non-negative matrix factorization. The properties associated with moving sources are modeled using time-varying mixing filters described by a stochastic temporal process. We present a variational expectation-maximization (VEM) algorithm that employs a Kalman smoother to estimate the mixing filters. The sound sources are separated by means of Wiener filters, built from the estimators provided by the proposed VEM algorithm. Preliminary experiments with simulated data show that, while for static sources we obtain results comparable with the baseline method of Ozerov et al., in the case of moving sources our method outperforms a piecewise version of the baseline method.
Published: 2015
Full Text: View/download PDF

7. Long-term flexible 2D cepstral modeling of speech spectral amplitudes

Author: Mohammad Firouzmand and Laurent Girin
Affiliation: GIPSA - Machines Parlantes, Agents Communicants & Interaction Face-à-face (GIPSA-MPACIF), Département Parole et Cognition (GIPSA-DPC), Grenoble Images Parole Signal Automatique (GIPSA-lab); Université Stendhal - Grenoble 3; Université Pierre Mendès France - Grenoble 2 (UPMF); Université Joseph Fourier - Grenoble 1 (UJF); Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP); Centre National de la Recherche Scientifique (CNRS)
Subjects: Masking (art), [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing, Computer science, Speech recognition, Speech coding, Speech synthesis, 02 engineering and technology, 030507 speech-language pathology & audiology, 03 medical and health sciences, Cepstrum, 0202 electrical engineering, electronic engineering, information engineering, Discrete cosine transform, Envelope (mathematics), [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing, speech analysis, 020206 networking & telecommunications, Speech processing, speech modeling, Amplitude, Spectral envelope, 0305 other medical science, Algorithm
Abstract: This paper presents a method for modeling the envelope of spectral amplitude parameters of speech signals in "two dimensions" (2D). It consists of two cascaded modelings: the first one along the frequency axis is the usual cepstrum technique, which consists of modeling the log-scaled spectral envelope with a Discrete Cosine Model (DCM). The second one, along the time axis, consists of modeling the trajectory of the envelope DCM coefficients by another similar DCM model. An iterative algorithm is proposed to optimally fit this 2D-model to the data according to a perceptual criterion based on frequency masking. This approach is shown to provide an efficient and flexible representation of spectral amplitude parameters in terms of coefficient rates, while providing good signal quality, opening new perspectives in very-low bit-rate sinusoidal speech coding.
Published: 2008
Full Text: View/download PDF

8. Perceptually Weighted Long Term Modeling of Sinusoidal Speech Amplitude Trajectories

Author: Mohammad Firouzmand and Laurent Girin
Subjects: Noise, Masking threshold, Computer Science::Sound, Distortion, Speech recognition, Speech coding, Context (language use), Speech synthesis, Sinusoidal model, Mathematics, Interpolation
Abstract: In this paper, the problem of modeling the trajectory of the amplitudes of speech signals is addressed within the context of the sinusoidal model of speech. A long-term model of the trajectory of the amplitude of the partials is proposed for each entire voiced section of speech, contrary to standard models, which are defined on a frame-by-frame basis. The complete analysis-modeling-synthesis process is presented. We compare a DCT-based long-term model with classical (frame-by-frame) interpolation schemes, given that the analysis process is identical in both cases. Perceptual constraints are taken into account since the distortion criterion in this approach is the level of modeling noise above the masking threshold. Promising results are given and the interest of the presented models for speech coding and watermarking applications is discussed.
Published: 2006
Full Text: View/download PDF

9. Solving The Indeterminations Of Blind Source Separation Of Convolutive Speech Mixtures

Author: Christian Jutten, Laurent Girin, and Bertrand Rivet
Subjects: Signal processing, Computer science, Estimation theory, Speech recognition, Speech processing, Blind signal separation, Speech enhancement, Computer Science::Sound, Frequency separation, Frequency domain, Source separation, Audio signal processing
Abstract: Looking at the speaker's face seems useful for hearing a speech signal better and extracting it from competing sources before identification. We present a novel algorithm plugging the audiovisual coherence of speech signals, estimated by statistical tools, on audio blind source separation (BSS) algorithms in the difficult case of convolutive mixtures. The algorithm mainly works in the frequency (transform) domain, where the convolutive mixture becomes an additive mixture for each frequency channel. Frequency by frequency separation is made by an audio BSS algorithm, and the audiovisual information is used to solve the standard source permutation and scale factor problems at the output of the separation stage, for each frequency. The proposed method is shown to be efficient in the case of 2×2 convolutive mixtures.
Published: 2006
Full Text: View/download PDF

10. Using audiovisual speech processing to improve the robustness of the separation of convolutive speech mixtures

Author: Laurent Girin, Jean-Luc Schwartz, Christian Jutten, and Bertrand Rivet
Subjects: Voice activity detection, Computer Science::Sound, Frequency separation, Computer science, Robustness (computer science), Speech recognition, Speech coding, Audiovisual speech, Audio signal processing, computer.software_genre, Speech processing, computer, Blind signal separation
Abstract: Looking at the speaker's face seems useful for hearing a speech signal better and extracting it from competing sources before identification. In this paper, we present a novel algorithm plugging audiovisual coherence of speech signals, estimated by statistical tools, on audio blind source separation (BSS) algorithms in the difficult case of convolutive mixtures. The algorithm mainly works in the frequency (transform) domain, where the convolutive mixture becomes an additive mixture for each frequency channel. Frequency by frequency separation is made by an audio BSS algorithm, and the audiovisual information is used to solve the standard source permutation problem at the output of the separation stage, for each frequency. The proposed method is shown to be efficient in the case of 2×2 convolutive mixtures.
Published: 2005
Full Text: View/download PDF

11. Speech extraction based on ICA and audio-visual coherence

Author: C. Jutten, Jean-Luc Schwartz, Laurent Girin, and David Sodoyer
Subjects: Voice activity detection, Computer Science::Sound, Computer science, Speech recognition, Speech coding, Acoustic model, Coherence (statistics), Linear predictive coding, Audio signal processing, computer.software_genre, Speech processing, computer, Blind signal separation
Abstract: We present a new approach to the source separation problem for multiple speech signals. Using the extra visual information of the speaker's face, the method aims to extract an acoustic speech signal from other acoustic signals by exploiting its coherence with the speaker's lip movements. We define a statistical model of the joint probability of visual and spectral audio input for quantifying the audio-visual coherence. Then, separation can be achieved by maximising this joint probability. Experiments on additive mixtures of 2, 3 and 5 sources show that the algorithm performs well, and systematically better than the classical BSS algorithm JADE.
Published: 2003
Full Text: View/download PDF

12. Audiovisual speech enhancement: new advances using multi-layer perceptrons

Author: Laurent Girin, Jean-Luc Schwartz, G. Feng, and Laurent Varin
Subjects: Computer science, business.industry, Speech recognition, Pattern recognition, Linear prediction, White noise, Intelligibility (communication), Perceptron, computer.software_genre, Sensor fusion, Distance measures, Speech enhancement, Artificial intelligence, Audio signal processing, business, computer
Abstract: This paper deals with the improvement of a noisy speech enhancement system based on the fusion of auditory and visual information. The system was presented in previous papers and implemented with simple stimuli corrupted with white noise. Its principle consists of an analysis-enhancement-synthesis process based on a linear prediction (LP) model of the signal: the LP filter is enhanced thanks to associative tools that estimate the cleaned LP parameters from both noisy audio and lip shape information. The structure of the system is reviewed and we focus on the improvement that concerns the associators: multi-layer perceptrons are used instead of linear regression. It is shown that in the context of VCV transitions corrupted with white noise, the performance of the system is improved in terms of the intelligibility gain, distance measures and classification tests.
Published: 2002
Full Text: View/download PDF

13. Fusion of auditory and visual information for noisy speech enhancement: a preliminary study of vowel transitions

Author: Laurent Girin, G. Feng, and Jean-Luc Schwartz
Subjects: Speech enhancement, Estimation theory, Computer science, business.industry, Vowel, Speech recognition, Noise reduction, Context (language use), Pattern recognition, Artificial intelligence, White noise, Speech processing, business
Abstract: This paper deals with a noisy speech enhancement technique based on the fusion of auditory and visual information. We first present the global structure of the system, and then we focus on the tool we used to merge both sources of information. The whole noise reduction system is implemented in the context of vowel transitions corrupted with white noise. A complete evaluation of the system in this context is presented, including distance measures, Gaussian classification scores, and a perceptual test. The results are very promising.
Published: 2002
Full Text: View/download PDF

14. An audio-visual distance for audio-visual speech vector quantization

Author: Laurent Girin, Elodie Foucher, and G. Feng
Subjects: Voice activity detection, Exploit, business.industry, Computer science, Existential quantification, Speech recognition, Speech coding, Vector quantization, Speech processing, Computer Science::Sound, Computer Science::Multimedia, Computer vision, Artificial intelligence, business, Decoding methods, Coding (social sciences)
Abstract: Speech is both an acoustic and a visual signal, and there exists some complementarity and redundancy between the two modalities. In the speech coding domain, it is of great interest to use this redundancy to improve speech coder performance. In this paper, we consider some audio and video joint coding process based on an audio-visual vector quantization. The method is shown to exploit quite well the audio-visual redundancy as it can reduce the bit rate while decreasing the quantization error. A notion of audio-visual distance has to be introduced and adapted to the different nature of the data. It is defined from an existing audio distance and a new visual distance, which is particularly focussed.
Published: 2002
Full Text: View/download PDF

15. Speech signals separation: a new approach exploiting the coherence of audio and visual speech

Author: Laurent Girin, Jean-Luc Schwartz, and A. Allard
Subjects: Voice activity detection, Computer science, Speech recognition, Speech coding, Source separation, Acoustic model, Coherence (statistics), Speaker recognition, Speech processing, Independent component analysis
Abstract: We present a new approach to the source separation problem in the case of multiple speech signals. The method is based on the use of automatic lip reading: the objective is to extract an acoustic speech signal from other acoustic signals by exploiting its coherence with the speaker's lip movements. For this aim, a statistical model is used to quantify this coherence. The results, while very preliminary, are encouraging. They show that this method can achieve a good separation of a speech source in the case of simple 2×2 additive mixtures. Moreover, it presents some interesting complementarity with traditional pure audio techniques.
Published: 2002
Full Text: View/download PDF
