Informed source separation of linear instantaneous under-determined audio mixtures by source index embedding

Authors :
Laurent Girin
Mathieu Parvaix
GIPSA - Machines parlantes, Gestes oro-faciaux, Interaction Face-à-face, Communication augmentée (GIPSA-MAGIC)
Département Parole et Cognition (GIPSA-DPC)
Grenoble Images Parole Signal Automatique (GIPSA-lab)
Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Stendhal - Grenoble 3-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Centre National de la Recherche Scientifique (CNRS)
ANR
ANR DReaM
ANR-09-CORD-0006,DReaM,Le Disque Repensé pour l'Écoute Active de la Musique(2009)
Source :
IEEE Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2011, 19 (6), pp.1721-1733. ⟨10.1109/TASL.2010.2097250⟩
Publication Year :
2011
Publisher :
HAL CCSD, 2011.

Abstract

In this paper, we address the underdetermined separation of I non-stationary audio sources from a J-channel linear instantaneous mixture (J < I). The problem is addressed with a specific coder-decoder configuration. At the coder, the source signals are assumed to be available before mixing. A joint time-frequency (TF) analysis of each source signal and of the mixture signal makes it possible to select the subset of sources (among I) that leads to the best separation results in each TF region. The corresponding source index code is imperceptibly embedded into the mix signal using a watermarking technique. At the decoder, where the original source signals are unknown, extracting the watermark makes it possible to invert the mixture in each TF region and recover the source signals. With this informed approach, it is shown that 5 instrument and singing voice signals can be efficiently separated from 2-channel stereo mixtures, with a quality that significantly exceeds that obtained by a semi-blind reference method and that enables separate manipulation of the source signals during stereo music restitution (i.e., remixing).
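
The coder-decoder idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the TF coefficients are given as arrays, exactly J sources are selected per TF bin, the selection criterion is the squared reconstruction error, the mixing matrix is known at the decoder, and the index code is passed directly as side information instead of being embedded by watermarking. The function names `select_source_subsets` and `invert_mixture` are hypothetical.

```python
import itertools
import numpy as np


def select_source_subsets(S, A):
    """Coder side (sketch): pick, per TF bin, the J sources whose local
    inversion of the mixture gives the lowest squared error.

    S: complex array (I, F, T) of source TF coefficients (assumed available).
    A: real array (J, I), linear instantaneous mixing matrix, J < I.
    Returns the mixture X (J, F, T), the index codes (F, T) and the list
    of candidate source subsets that the codes refer to.
    """
    J, I = A.shape
    X = np.einsum("ji,ift->jft", A, S)  # mix the sources in the TF domain
    subsets = list(itertools.combinations(range(I), J))
    errors = np.empty((len(subsets),) + S.shape[1:])
    for k, subset in enumerate(subsets):
        idx = list(subset)
        A_inv = np.linalg.inv(A[:, idx])  # J x J sub-matrix, assumed invertible
        est = np.zeros_like(S)            # unselected sources are set to zero
        est[idx] = np.einsum("ij,jft->ift", A_inv, X)
        errors[k] = np.sum(np.abs(S - est) ** 2, axis=0)
    codes = np.argmin(errors, axis=0)     # best subset index for each TF bin
    return X, codes, subsets


def invert_mixture(X, codes, subsets, A):
    """Decoder side (sketch): invert the mixture in each TF bin using the
    subset named by the index code; the sources themselves are unknown here."""
    J, I = A.shape
    est = np.zeros((I,) + X.shape[1:], dtype=X.dtype)
    for k, subset in enumerate(subsets):
        mask = codes == k
        if not mask.any():
            continue
        A_inv = np.linalg.inv(A[:, list(subset)])
        local = A_inv @ X[:, mask]        # shape (J, number of selected bins)
        for row, i in enumerate(subset):
            est[i][mask] = local[row]
    return est
```

In this sketch, reconstruction is exact only in TF bins where at most J sources are active; elsewhere the unselected sources leak into the estimates, which is why the choice of subset per bin matters.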

Details

Language :
English
ISSN :
15587916
Database :
OpenAIRE
Journal :
IEEE Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2011, 19 (6), pp.1721-1733. ⟨10.1109/TASL.2010.2097250⟩
Accession number :
edsair.doi.dedup.....893207cabcf381ee5ebd4129b7f0785b
Full Text :
https://doi.org/10.1109/TASL.2010.2097250