
Active Speaker Detection and Localization in Videos Using Low-Rank and Kernelized Sparsity

Authors :
Pu, J.; Panagakis, Y.; Pantic, M.
Publication Year :
2020

Abstract

A novel method for active speaker detection and localization in audio-visual recordings is proposed. The method relies on a specifically tailored matrix decomposition that exploits the intrinsic low-dimensional structure of audio-visual data, namely, the low rank of the background visual/audio information and the sparsity of the correlated foreground components. Concretely, the data matrix of each modality is modeled as a superposition of two terms: 1) a low-rank matrix capturing the background information and 2) a kernelized sparse matrix capturing the non-linear correlated components among the audio and visual modalities and, hence, revealing the active speaker. To this end, we formulate an appropriate optimization problem that involves the minimization of the nuclear norm and the matrix ℓ1-norm, and develop an efficient solver. Experimental results on active speaker detection and localization demonstrate the superior performance of the proposed method over other state-of-the-art approaches. © 1994-2012 IEEE.
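The abstract's core idea, splitting a data matrix into a low-rank part plus a sparse part by minimizing a nuclear norm plus an ℓ1-norm, can be illustrated with classical robust PCA (principal component pursuit): minimize ||L||_* + λ||S||_1 subject to D = L + S. The following is a minimal sketch of that simpler problem, solved with the inexact augmented Lagrange multiplier method and assuming NumPy is available. It is not the authors' solver: it handles a single matrix, has no kernelized sparse term, and does not couple the audio and visual modalities. The function names (rpca, svt, shrink), the default λ = 1/sqrt(max(m, n)), and the step-size schedule are standard robust PCA choices assumed here for illustration, not details taken from this paper.

```python
import numpy as np


def shrink(X, tau):
    """Elementwise soft-thresholding: the proximal operator of tau * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svt(X, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Scale each column of U by its shrunk singular value, then recombine.
    return (U * shrink(s, tau)) @ Vt


def rpca(D, lam=None, tol=1e-7, max_iter=500):
    """Hypothetical helper: split D into low-rank L and sparse S with D ~ L + S.

    Solves min ||L||_* + lam * ||S||_1  s.t.  D = L + S
    by inexact ALM (classical robust PCA, not the paper's kernelized model).
    """
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))  # common default in the RPCA literature
    norm_D = np.linalg.norm(D, "fro")
    spectral = np.linalg.norm(D, 2)  # largest singular value of D

    # Dual variable initialization and penalty schedule (assumed standard values).
    Y = D / max(spectral, np.abs(D).max() / lam)
    mu, rho, mu_max = 1.25 / spectral, 1.5, 1e7

    L = np.zeros_like(D)
    S = np.zeros_like(D)
    for _ in range(max_iter):
        L = svt(D - S + Y / mu, 1.0 / mu)       # low-rank (background) update
        S = shrink(D - L + Y / mu, lam / mu)    # sparse (foreground) update
        R = D - L - S                           # constraint residual
        Y = Y + mu * R                          # dual ascent step
        mu = min(mu * rho, mu_max)              # increase the penalty
        if np.linalg.norm(R, "fro") / norm_D < tol:
            break
    return L, S
```

For example, for D built as a rank-2 matrix plus about 5% large random entries, rpca(D) should return an L close to the rank-2 component and an S concentrated on the corrupted entries. In the paper's setting the sparse foreground term is instead kernelized and tied to cross-modal correlation; this sketch only shows the nuclear-norm and ℓ1 proximal steps that solvers of this kind commonly rely on.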

Subjects

Subjects :
Computer Science::Sound

Details

Language :
English
Database :
OpenAIRE
Accession number :
edsair.od......2127..f53ea717a321a6653cdca3760a8c23e1