Audio-Visual Segmentation
- Publication Year :
- 2022
- Publisher :
- arXiv, 2022.
Abstract
- We propose to explore a new problem called audio-visual segmentation (AVS), in which the goal is to output a pixel-level map of the object(s) that produce sound at the time of the image frame. To facilitate this research, we construct the first audio-visual segmentation benchmark (AVSBench), providing pixel-wise annotations for the sounding objects in audible videos. Two settings are studied with this benchmark: 1) semi-supervised audio-visual segmentation with a single sound source and 2) fully-supervised audio-visual segmentation with multiple sound sources. To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process. We also design a regularization loss to encourage the audio-visual mapping during training. Quantitative and qualitative experiments on the AVSBench compare our approach to several existing methods from related tasks, demonstrating that the proposed method is promising for building a bridge between the audio and pixel-wise visual semantics. Code is available at https://github.com/OpenNLPLab/AVSBench. Comment: ECCV 2022.
- Subjects :
- FOS: Computer and information sciences
Sound (cs.SD)
Audio and Speech Processing (eess.AS)
Computer Vision and Pattern Recognition (cs.CV)
Image and Video Processing (eess.IV)
Computer Science - Computer Vision and Pattern Recognition
ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION
FOS: Electrical engineering, electronic engineering, information engineering
Electrical Engineering and Systems Science - Image and Video Processing
Computer Science - Multimedia
Computer Science - Sound
Electrical Engineering and Systems Science - Audio and Speech Processing
Multimedia (cs.MM)
Details
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....22c4f1730299dfd7d96cc98e931c918d
- Full Text :
- https://doi.org/10.48550/arxiv.2207.05042