UniMod1K: Towards a More Universal Large-Scale Dataset and Benchmark for Multi-modal Learning.
- Source :
- International Journal of Computer Vision; Aug 2024, Vol. 132 Issue 8, p2845-2860, 16p
- Publication Year :
- 2024
- Abstract :
- The emergence of large-scale high-quality datasets has stimulated the rapid development of deep learning in recent years. However, most computer vision tasks focus on the visual modality only, resulting in a huge imbalance in the amount of annotated data for other modalities. While several multi-modal datasets have been made available, the majority of them are confined to only two modalities, serving a single specific computer vision task. To redress the data deficiency for multi-modal learning and applications, a new dataset named UniMod1K is presented in this work. UniMod1K involves three data modalities: vision, depth, and language. For the vision and depth modalities, the UniMod1K dataset contains 1050 RGB-D sequences, comprising a total of some 2.5 million frames. Regarding the language modality, the proposed dataset includes 1050 sentences describing the target object in each video. To demonstrate the advantages of training on a larger multi-modal dataset, such as UniMod1K, and to stimulate research enabled by the dataset, we address several multi-modal tasks, namely multi-modal object tracking and monocular depth estimation. To establish a performance baseline, we propose novel baseline methods for RGB-D object tracking, vision-language tracking and vision-depth-language tracking. Additionally, we conduct comprehensive experiments for each of these tasks. The results highlight the potential of the UniMod1K dataset to improve the performance of multi-modal approaches. The dataset and code can be accessed at https://github.com/xuefeng-zhu5/UniMod1K. [ABSTRACT FROM AUTHOR]
- Subjects :
- COMPUTER vision
OBJECT tracking (Computer vision)
DEEP learning
MONOCULARS
Details
- Language :
- English
- ISSN :
- 0920-5691
- Volume :
- 132
- Issue :
- 8
- Database :
- Complementary Index
- Journal :
- International Journal of Computer Vision
- Publication Type :
- Academic Journal
- Accession number :
- 178402107
- Full Text :
- https://doi.org/10.1007/s11263-024-01999-8