1. Fine-Grained Categorization From RGB-D Images
- Author
Yanfu Yan, Mohammad Muntasir Rahman, Ling Shao, Yanhao Tan, Jian Xue, and Ke Lu
- Subjects
business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Context (language use), Object (computer science), Machine learning, computer.software_genre, Convolutional neural network, Field (computer science), Computer Science Applications, Categorization, Signal Processing, Media Technology, RGB color model, Artificial intelligence, Electrical and Electronic Engineering, business, computer, Scope (computer science)
- Abstract
In the field of computer vision, fine-grained visual categorization has attracted a lot of attention and made great progress due to convolutional neural networks and a large number of publicly available datasets. With next-generation sensing technology, RGB-D cameras can provide high-quality synchronized RGB and depth images for solving many computer vision problems. Although RGB-D cameras have been used in the context of multi-view object category detection and scene understanding, they have not been widely used in fine-grained classification. In this paper, we introduce a multi-view RGB-D dataset, RGBD-FG, for fine-grained categorization. Currently, the dataset contains 93,051 RGB-D images covering 19 super-categories and 50 sub-categories of common vegetables and fruit, and is organized in a hierarchical manner. We provide extensive experimental results to establish state-of-the-art benchmarks for our dataset, illustrating its diversity and scope for improvement through future work. We also propose a novel modality-specific multimodal network, called the FS-Multimodal network, which addresses two limitations of multimodal networks trained with fine-tuning techniques: over-fitting and a lack of effective depth-specific features. We hope that our study lays the foundations for fine-grained categorization of RGB-D data.
- Published
- 2022