Audio-Visual Tracking of Concurrent Speakers
- Authors
Xinyuan Qian, Alessio Brutti, Andrea Cavallaro, Oswald Lanz, and Maurizio Omologo
- Subjects
Computer science, Computation, Image processing and computer vision, Image plane, Tracking, Computer Science Applications, Discriminative model, Signal Processing, Media Technology, Computer vision, Artificial intelligence, Electrical and Electronic Engineering, Particle filter
- Abstract
Audio-visual tracking of an unknown number of concurrent speakers in 3D is a challenging task, especially when sound and video are collected with a compact sensing platform. In this paper, we propose a tracker that builds on generative and discriminative audio-visual likelihood models formulated in a particle filtering framework. We localize multiple concurrent speakers with a de-emphasized acoustic map, assisted by 3D video observations derived from image detections. The 3D multimodal observations are either assigned to existing tracks for discriminative likelihood computation or used to initialize new tracks. The generative likelihoods rely on the color distribution of the target and on the value of the de-emphasized acoustic map. Experiments on the AV16.3 and CAV3D datasets show that the proposed tracker outperforms uni-modal trackers and state-of-the-art approaches, both in 3D and on the image plane.
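The generative side of such a tracker can be illustrated with a generic bootstrap particle filter. The following is a minimal single-speaker sketch under stated assumptions, not the authors' implementation. The acoustic map and the color histograms are hypothetical toy stand-ins (`toy_acoustic_map`, `toy_color_histogram`) simulated from a known source position. The two cues are fused as a product, which assumes audio and video are conditionally independent. The de-emphasis step, the discriminative likelihood, observation-to-track assignment, and new-track initialization are not modeled. Each step predicts with a random-walk motion model, weights particles by a color-similarity term (Bhattacharyya coefficient) times the acoustic map value, estimates the 3D position as the weighted mean, and applies systematic resampling.

```python
import math
import random

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized histograms (1.0 = identical)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def toy_acoustic_map(pos, source, width=0.3):
    """Hypothetical acoustic map: a Gaussian peak centred on the true source."""
    d2 = sum((a - b) ** 2 for a, b in zip(pos, source))
    return math.exp(-d2 / (2 * width ** 2))

def toy_color_histogram(pos, source, ref_hist):
    """Hypothetical observation: the histogram seen at `pos` fades from the
    target's reference histogram toward a uniform one as `pos` moves away."""
    w = math.exp(-math.dist(pos, source))
    n = len(ref_hist)
    return [w * h + (1 - w) / n for h in ref_hist]

def av_likelihood(pos, source, ref_hist, sigma_v=0.02, eps=1e-3):
    # Visual term: color similarity between the observation and the reference.
    bc = bhattacharyya(toy_color_histogram(pos, source, ref_hist), ref_hist)
    visual = math.exp(-(1.0 - bc) / sigma_v)
    # Audio term: acoustic map value at the particle; eps keeps weights positive.
    audio = toy_acoustic_map(pos, source) + eps
    # Fuse by product, assuming conditional independence of audio and video.
    return visual * audio

def systematic_resample(particles, weights, rng):
    """Draw len(particles) samples with one random offset and evenly spaced pointers."""
    n = len(particles)
    start = rng.random() / n
    out, i, cum = [], 0, weights[0]
    for k in range(n):
        u = start + k / n
        # Advance to the first particle whose cumulative weight exceeds u.
        while u >= cum and i < n - 1:
            i += 1
            cum += weights[i]
        out.append(particles[i])
    return out

def pf_step(particles, source, ref_hist, rng, motion_std=0.1):
    # 1. Predict: random-walk motion model in 3D.
    moved = [tuple(c + rng.gauss(0.0, motion_std) for c in p) for p in particles]
    # 2. Update: weight each particle by the audio-visual likelihood.
    weights = [av_likelihood(p, source, ref_hist) for p in moved]
    total = sum(weights)
    weights = [w / total for w in weights]
    # 3. Estimate: weighted mean position.
    est = tuple(sum(w * p[d] for w, p in zip(weights, moved)) for d in range(3))
    # 4. Resample to counter weight degeneracy.
    return systematic_resample(moved, weights, rng), est

rng = random.Random(0)
source = (1.0, 2.0, 1.5)      # simulated speaker position in metres (assumed)
ref_hist = [0.5, 0.3, 0.2]    # toy 3-bin reference color histogram
particles = [tuple(rng.uniform(0.0, 3.0) for _ in range(3)) for _ in range(500)]
for _ in range(20):
    particles, est = pf_step(particles, source, ref_hist, rng)
print("estimated position:", est)
```

With a fixed seed the estimate should settle near the simulated source within a few iterations. Using a real acoustic map and image-based color histograms would additionally require a camera model to project each 3D particle onto the image plane, which this sketch omits.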
- Published
- 2022