Active Speaker Detection and Localization in Videos Using Low-Rank and Kernelized Sparsity
- Publication Year :
- 2020
Abstract
- A novel method for active speaker detection and localization in audio-visual recordings is proposed. The method relies on a specifically tailored matrix decomposition that exploits the intrinsic low-dimensional structure of audio-visual data, namely, the low rank of the background visual and audio information and the sparsity of the correlated foreground components. Concretely, the data matrix of each modality is modeled as a superposition of two terms: 1) a low-rank matrix capturing the background information and 2) a kernelized sparse matrix capturing the non-linear correlated components between the audio and visual modalities and, hence, revealing the active speaker. To this end, we formulate an optimization problem that minimizes nuclear and matrix ℓ1-norms, and we develop an efficient solver for it. Experimental results on active speaker detection and localization demonstrate the superior performance of the proposed method over other state-of-the-art approaches. © 1994-2012 IEEE.
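- The low-rank plus sparse idea in the abstract can be illustrated with classical Robust PCA (principal component pursuit), which solves min ‖L‖* + λ‖S‖1 subject to D = L + S. The sketch below is NOT the authors' kernelized, two-modality method: it is a minimal single-matrix example, assuming the inexact augmented Lagrange multiplier (IALM) scheme of Lin, Chen and Ma and the common default λ = 1/√max(m, n). It shows the two proximal operators such solvers alternate between: singular value thresholding for the nuclear norm and entrywise soft thresholding for the ℓ1-norm. The function names `rpca_ialm`, `singular_value_threshold` and `soft_threshold`, the parameters `rho` and `tol`, and the synthetic test data are hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(X, tau):
    """Proximal operator of tau * ||X||_1: shrink every entry toward zero by tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def singular_value_threshold(X, tau):
    """Proximal operator of tau * ||X||_*: shrink every singular value by tau."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def rpca_ialm(D, lam=None, tol=1e-7, max_iter=500, rho=1.5):
    """Hypothetical helper: split D into low-rank L and sparse S by solving
        min_{L,S} ||L||_* + lam * ||S||_1   subject to   D = L + S
    with the inexact augmented Lagrange multiplier (IALM) method.
    Plain Robust PCA, not the kernelized method of the paper."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))   # assumed default weight from Robust PCA
    norm_two = np.linalg.norm(D, 2)      # largest singular value of D
    norm_fro = np.linalg.norm(D, "fro")
    # Scale the initial dual variable by max(||D||_2, ||D||_inf / lam).
    Y = D / max(norm_two, np.abs(D).max() / lam)
    mu = 1.25 / norm_two                 # initial penalty parameter
    mu_max = mu * 1e7                    # cap on the penalty growth
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    for _ in range(max_iter):
        # Low-rank step: nuclear-norm proximal operator.
        L = singular_value_threshold(D - S + Y / mu, 1.0 / mu)
        # Sparse step: l1-norm proximal operator.
        S = soft_threshold(D - L + Y / mu, lam / mu)
        residual = D - L - S
        Y = Y + mu * residual            # dual ascent on the constraint D = L + S
        mu = min(rho * mu, mu_max)       # tighten the penalty each iteration
        if np.linalg.norm(residual, "fro") < tol * norm_fro:
            break
    return L, S


# Synthetic example (hypothetical data): rank-2 "background" plus 5% sparse spikes.
rng = np.random.default_rng(0)
L0 = rng.standard_normal((80, 2)) @ rng.standard_normal((2, 80))
S0 = np.zeros((80, 80))
mask = rng.random((80, 80)) < 0.05
S0[mask] = 5.0 * rng.choice([-1.0, 1.0], size=int(mask.sum()))
L, S = rpca_ialm(L0 + S0)
print("relative error of L:", np.linalg.norm(L - L0) / np.linalg.norm(L0))
print("relative error of S:", np.linalg.norm(S - S0) / np.linalg.norm(S0))
```

In the paper's setting, roughly speaking, the sparse term would carry the speaker-related components that are correlated across the audio and visual matrices; the kernelization and cross-modal coupling that the abstract describes are not modeled here.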
- Subjects :
- Computer Science::Sound
Details
- Language :
- English
- Database :
- OpenAIRE
- Accession number :
- edsair.od......2127..f53ea717a321a6653cdca3760a8c23e1