238 results on "Wei, Xiu-Shen"
Search Results
2. Equiangular Basis Vectors: A Novel Paradigm for Classification Tasks
- Author
- Shen, Yang, Sun, Xuhao, Wei, Xiu-Shen, Xu, Anqi, and Gao, Lingyan
- Published
- 2024
- Full Text
- View/download PDF
3. Attribute-Aware Deep Hashing with Self-Consistency for Large-Scale Fine-Grained Image Retrieval
- Author
- Wei, Xiu-Shen, Shen, Yang, Sun, Xuhao, Wang, Peng, and Peng, Yuxin
- Subjects
- Computer Science - Information Retrieval, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Multimedia
- Abstract
Our work focuses on tackling large-scale fine-grained image retrieval, which ranks highest the images depicting the concept of interest (i.e., the same sub-category label as the query) based on fine-grained details. For such a practical task, it is desirable to alleviate both the fine-grained challenge of small inter-class variations with large intra-class variations and the explosive growth of fine-grained data. In this paper, we propose attribute-aware hashing networks with self-consistency for generating attribute-aware hash codes, which not only make the retrieval process efficient but also establish explicit correspondences between hash codes and visual attributes. Specifically, based on visual representations captured by attention, we develop an encoder-decoder network around a reconstruction task to distill, without attribute annotations, high-level attribute-specific vectors from the appearance-specific visual representations. Our models are also equipped with a feature decorrelation constraint upon these attribute vectors to strengthen their representative ability. Then, driven by preserving the similarity of the original entities, the required hash codes are generated from these attribute-specific vectors and thus become attribute-aware. Furthermore, to combat simplicity bias in deep hashing, we consider the model design from the perspective of the self-consistency principle and propose to further enhance the models' self-consistency by equipping an additional image-reconstruction path. Comprehensive quantitative experiments under diverse empirical settings on six fine-grained retrieval datasets and two generic retrieval datasets show the superiority of our models over competing methods., Comment: Accepted by IEEE TPAMI (a hedged code sketch of the general recipe follows this entry)
- Published
- 2023
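The entry above describes distilling attribute-specific vectors and decorrelating them before hashing. Below is a minimal, hedged sketch of that general recipe, not the paper's architecture: per-attribute heads, an off-diagonal correlation penalty, and tanh-relaxed codes binarized by sign at inference. All module names, shapes, and hyper-parameters are illustrative assumptions.

```python
# Hedged sketch, not the paper's code: per-attribute heads, a decorrelation
# penalty on attribute summaries, and tanh-relaxed hash codes.
import torch
import torch.nn as nn

class AttributeHasher(nn.Module):
    def __init__(self, feat_dim=512, num_attrs=8, attr_dim=32, code_bits=48):
        super().__init__()
        # One small head per latent attribute, applied to a global feature.
        self.attr_heads = nn.ModuleList(
            [nn.Linear(feat_dim, attr_dim) for _ in range(num_attrs)])
        self.to_code = nn.Linear(num_attrs * attr_dim, code_bits)

    def forward(self, feats):
        attrs = torch.stack([h(feats) for h in self.attr_heads], dim=1)  # (B, A, D)
        code = torch.tanh(self.to_code(attrs.flatten(1)))  # relaxed codes in (-1, 1)
        return attrs, code

def decorrelation_loss(attrs):
    # Penalize off-diagonal correlation between per-attribute summaries
    # (a simplification of a feature decorrelation constraint).
    a = attrs.mean(dim=2)                       # (B, A)
    a = (a - a.mean(0)) / (a.std(0) + 1e-6)
    corr = (a.t() @ a) / a.shape[0]             # (A, A)
    off_diag = corr - torch.diag(torch.diag(corr))
    return (off_diag ** 2).sum()

feats = torch.randn(16, 512)                    # backbone features (stand-in)
attrs, code = AttributeHasher()(feats)
loss = decorrelation_loss(attrs)
binary = torch.sign(code)                       # final hash codes at inference
```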
4. Hyperbolic Space with Hierarchical Margin Boosts Fine-Grained Learning from Coarse Labels
- Author
- Xu, Shu-Lin, Sun, Yifan, Zhang, Faen, Xu, Anqi, Wei, Xiu-Shen, and Yang, Yi
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Multimedia
- Abstract
Learning fine-grained embeddings from coarse labels is challenging due to the limited granularity of the supervision, i.e., it lacks the detailed distinctions required for fine-grained tasks. The task becomes even more demanding for few-shot fine-grained recognition, which holds practical significance in various applications. To address these challenges, we propose a novel method that embeds visual features in a hyperbolic space and enhances their discriminative ability with a hierarchical cosine margin scheme. Specifically, the hyperbolic space offers distinct advantages, including the ability to capture hierarchical relationships and increased expressive power, which favor modeling fine-grained objects. On top of the hyperbolic space, we further enforce relatively large/small similarity margins between coarse/fine classes, respectively, yielding the hierarchical cosine margin scheme. While enforcing similarity margins in the regular Euclidean space has become popular for deep embedding learning, applying them to hyperbolic space is non-trivial, and validating the benefit for coarse-to-fine generalization is valuable. Extensive experiments on five benchmark datasets showcase the effectiveness of the proposed method, yielding state-of-the-art results that surpass competing methods., Comment: Accepted by NeurIPS 2023 (a hedged sketch of hyperbolic distance with hierarchical margins follows this entry)
- Published
- 2023
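For the hyperbolic-embedding entry above, here is a toy sketch of the two ingredients it names: the Poincaré-ball distance and a larger margin across coarse classes than within them. The margin placement and constants are assumptions, not the paper's implementation (which works with cosine margins in hyperbolic space).

```python
# Hedged sketch: Poincare-ball distance plus coarse/fine margins.
import torch

def poincare_distance(x, y, eps=1e-5):
    # d(x, y) = arccosh(1 + 2|x - y|^2 / ((1 - |x|^2)(1 - |y|^2)))
    sq = ((x - y) ** 2).sum(-1)
    nx = (1 - (x ** 2).sum(-1)).clamp_min(eps)
    ny = (1 - (y ** 2).sum(-1)).clamp_min(eps)
    return torch.acosh(1 + 2 * sq / (nx * ny))

def hierarchical_margin_loss(anchor, other, same_coarse, m_fine=0.1, m_coarse=0.3):
    # Negative pairs sharing a coarse class need a small separation; pairs
    # from different coarse classes must be pushed farther apart.
    d = poincare_distance(anchor, other)
    margin = torch.where(same_coarse,
                         m_fine * torch.ones_like(d),
                         m_coarse * torch.ones_like(d))
    return torch.relu(margin - d).mean()

a = torch.randn(8, 16) * 0.1       # keep embeddings inside the unit ball
b = torch.randn(8, 16) * 0.1
same_coarse = torch.rand(8) > 0.5  # True: same coarse class, different fine class
loss = hierarchical_margin_loss(a, b, same_coarse)
```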
5. Hawkeye: A PyTorch-based Library for Fine-Grained Image Recognition with Deep Learning
- Author
- He, Jiabei, Shen, Yang, Wei, Xiu-Shen, and Wu, Ye
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Fine-Grained Image Recognition (FGIR) is a fundamental and challenging task in computer vision and multimedia that plays a crucial role in Intellectual Economy and Industrial Internet applications. However, the absence of a unified open-source software library covering various paradigms in FGIR poses a significant challenge for researchers and practitioners in the field. To address this gap, we present Hawkeye, a PyTorch-based library for FGIR with deep learning. Hawkeye is designed with a modular architecture, emphasizing high-quality code and human-readable configuration, providing a comprehensive solution for FGIR tasks. In Hawkeye, we have implemented 16 state-of-the-art fine-grained methods, covering 6 different paradigms, enabling users to explore various approaches for FGIR. To the best of our knowledge, Hawkeye represents the first open-source PyTorch-based library dedicated to FGIR. It is publicly available at https://github.com/Hawkeye-FineGrained/Hawkeye/, providing researchers and practitioners with a powerful tool to advance their research and development in the field of FGIR., Comment: ACM Multimedia 2023 Open Source Software Competition Winner Entry. X.-S. Wei is the corresponding author
- Published
- 2023
6. Watch out Venomous Snake Species: A Solution to SnakeCLEF2023
- Author
- Hu, Feiran, Wang, Peng, Li, Yangyang, Duan, Chenlong, Zhu, Zijian, Wang, Fei, Zhang, Faen, Li, Yong, and Wei, Xiu-Shen
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
The SnakeCLEF2023 competition aims at the development of advanced algorithms for snake species identification through the analysis of images and accompanying metadata. This paper presents a method leveraging both images and metadata. Modern CNN models and strong data augmentation are used to learn better image representations. To relieve the challenge of the long-tailed distribution, seesaw loss is utilized in our method. We also design a light model that computes prior probabilities from metadata features extracted by CLIP in the post-processing stage. Besides, we attach more importance to venomous species by assigning venomous species labels to some examples that the model is uncertain about. Our method achieves a score of 91.31% on the final metric, which combines F1 and other metrics, on the private leaderboard, ranking 1st among the participants. The code is available at https://github.com/xiaoxsparraw/CLEF2023., Comment: This work was the winning solution of the SnakeCLEF2023 challenge (a hedged sketch of the metadata-prior post-processing follows this entry)
- Published
- 2023
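The SnakeCLEF entry above combines image probabilities with a metadata-driven prior and biases uncertain predictions toward venomous species. A hedged NumPy sketch of that post-processing logic; the threshold and the prior model are assumptions:

```python
# Hedged sketch of the post-processing: multiply image probabilities by a
# metadata prior, then bias uncertain predictions toward venomous species.
import numpy as np

def combine_with_prior(image_probs, meta_prior, venomous_mask, conf_thresh=0.5):
    # image_probs: (N, C) softmax outputs; meta_prior: (N, C) p(class | metadata)
    probs = image_probs * meta_prior
    probs /= probs.sum(axis=1, keepdims=True)
    preds = probs.argmax(axis=1)
    # When the model is uncertain, prefer the most likely venomous candidate,
    # trading a little accuracy for a safer error mode.
    uncertain = probs.max(axis=1) < conf_thresh
    venom_probs = np.where(venomous_mask[None, :], probs, 0.0)
    has_venom = venom_probs.max(axis=1) > 0
    pick = uncertain & has_venom
    preds[pick] = venom_probs[pick].argmax(axis=1)
    return preds

rng = np.random.default_rng(0)
image_probs = rng.dirichlet(np.ones(5), size=4)   # stand-in CNN outputs
meta_prior = rng.dirichlet(np.ones(5), size=4)    # stand-in metadata prior
venomous = np.array([True, False, False, True, False])
print(combine_with_prior(image_probs, meta_prior, venomous))
```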
7. Equiangular Basis Vectors
- Author
- Shen, Yang, Sun, Xuhao, and Wei, Xiu-Shen
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
We propose Equiangular Basis Vectors (EBVs) for classification tasks. Deep neural networks usually end with a k-way fully connected layer with softmax to handle different classification tasks; the learning objective of such methods can be summarized as mapping the learned feature representations to the samples' label space. In metric learning approaches, by contrast, the main objective is to learn a transformation function that maps training data points from the original space to a new space where similar points are closer while dissimilar points become farther apart. Different from previous methods, our EBVs generate normalized vector embeddings as "predefined classifiers", which are required not only to have equal status with each other but also to be as orthogonal as possible. By minimizing the spherical distance between the embedding of an input and its categorical EBV during training, predictions at inference are obtained by identifying the categorical EBV with the smallest distance. Various experiments on the ImageNet-1K dataset and other downstream tasks demonstrate that our method outperforms the standard fully connected classifier while introducing little additional computation compared with classical metric learning methods. Our EBVs won first place in the 2022 DIGIX Global AI Challenge, and our code is open-source and available at https://github.com/NJUST-VIPGroup/Equiangular-Basis-Vectors., Comment: CVPR 2023 (a hedged sketch of EBV construction follows this entry)
- Published
- 2023
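A toy sketch of the EBV idea from the entry above: optimize a fixed set of unit vectors to be as mutually orthogonal as possible, then classify by the nearest EBV on the sphere (minimum spherical distance equals maximum cosine). The optimization recipe here is an assumption, not the authors' exact procedure:

```python
# Hedged sketch: optimize unit vectors toward mutual orthogonality, then
# classify by the nearest EBV on the sphere.
import torch
import torch.nn.functional as F

def make_ebvs(num_classes=100, dim=64, steps=500, lr=0.1):
    v = torch.randn(num_classes, dim, requires_grad=True)
    opt = torch.optim.Adam([v], lr=lr)
    eye = torch.eye(num_classes)
    for _ in range(steps):
        u = F.normalize(v, dim=1)
        cos = u @ u.t() - eye          # pairwise cosines, self-pairs removed
        loss = cos.abs().max()         # push the worst pair toward orthogonality
        opt.zero_grad()
        loss.backward()
        opt.step()
    return F.normalize(v.detach(), dim=1)

ebvs = make_ebvs(num_classes=10, dim=16)          # fixed "predefined classifiers"
emb = F.normalize(torch.randn(4, 16), dim=1)      # stand-in network embeddings
pred = (emb @ ebvs.t()).argmax(dim=1)             # min spherical dist = max cosine
```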
8. Delving Deep into Simplicity Bias for Long-Tailed Image Recognition
- Author
- Wei, Xiu-Shen, Sun, Xuhao, Shen, Yang, Xu, Anqi, Wang, Peng, and Zhang, Faen
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Simplicity Bias (SB) is the phenomenon that deep neural networks tend to rely on simpler predictive patterns while ignoring some complex features when applied to supervised discriminative tasks. In this work, we investigate SB in long-tailed image recognition and find that tail classes suffer more severely from SB, which harms the generalization performance of such underrepresented classes. We empirically report that self-supervised learning (SSL) can mitigate SB and complement the supervised counterpart by enriching the features extracted from tail samples, consequently taking better advantage of such rare samples. However, standard SSL methods are designed without explicitly considering the inherent class distribution of the data and may not be optimal for long-tailed data. To address this limitation, we propose a novel SSL method tailored to imbalanced data. It leverages SSL at three diverse levels, i.e., holistic, partial, and augmented, to enhance the learning of predictive complex patterns, which provides the potential to overcome severe SB on tail data. Both quantitative and qualitative experimental results on five long-tailed benchmark datasets show that our method can effectively mitigate SB and significantly outperform competing state-of-the-art methods.
- Published
- 2023
9. SEMICON: A Learning-to-hash Solution for Large-scale Fine-grained Image Retrieval
- Author
- Shen, Yang, Sun, Xuhao, Wei, Xiu-Shen, Jiang, Qing-Yuan, and Yang, Jian
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
In this paper, we propose Suppression-Enhancing Mask based attention and Interactive Channel transformatiON (SEMICON) to learn binary hash codes for large-scale fine-grained image retrieval. In SEMICON, we first develop suppression-enhancing mask (SEM) based attention to dynamically localize discriminative image regions. More importantly, unlike existing attention mechanisms that simply erase previously discriminative regions, our SEM restrains such regions and then discovers other complementary regions by considering the relations between activated regions in a stage-by-stage fashion. In each stage, an interactive channel transformation (ICON) module is then designed to exploit correlations across the channels of attended activation tensors. Since channels generally correspond to parts of fine-grained objects, part correlations can also be modeled accordingly, which further improves fine-grained retrieval accuracy. Moreover, to be computationally economical, ICON is realized by an efficient two-step process. Finally, the hash learning of SEMICON consists of both global- and local-level branches, better representing fine-grained objects and generating binary hash codes that explicitly correspond to multiple levels. Experiments on five benchmark fine-grained datasets show our superiority over competing methods., Comment: ECCV 2022
- Published
- 2022
10. An Embarrassingly Simple Approach to Semi-Supervised Few-Shot Learning
- Author
- Wei, Xiu-Shen, Xu, He-Yang, Zhang, Faen, Peng, Yuxin, and Zhou, Wei
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Semi-supervised few-shot learning consists in training a classifier to adapt to new tasks with limited labeled data and a fixed quantity of unlabeled data. Many sophisticated methods have been developed to address the challenges this problem comprises. In this paper, we propose a simple but quite effective approach to predict accurate negative pseudo-labels of unlabeled data from an indirect learning perspective, and then augment the extremely label-constrained support set in few-shot classification tasks. Our approach can be implemented in just few lines of code by only using off-the-shelf operations, yet it is able to outperform state-of-the-art methods on four benchmark datasets., Comment: Accepted by NeurIPS 2022
- Published
- 2022
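The entry above notes the method needs only a few lines of off-the-shelf operations. A hedged sketch of one plausible reading of negative pseudo-labeling; the threshold and the elimination rule are assumptions:

```python
# Hedged sketch: mark classes with near-zero probability as reliable
# negatives; a sample becomes a confident positive pseudo-label once all
# but one class has been excluded.
import torch

def negative_pseudo_labels(logits, neg_thresh=0.01):
    probs = logits.softmax(dim=1)
    negatives = probs < neg_thresh            # (N, C): "surely not this class"
    survivors = (~negatives).sum(dim=1)
    positive = survivors == 1                 # exactly one class left standing
    pseudo = probs.argmax(dim=1)
    return negatives, pseudo[positive], positive

logits = torch.randn(6, 5) * 4                # stand-in predictions on unlabeled data
neg, pseudo_labels, keep = negative_pseudo_labels(logits)
# pseudo_labels can then augment the labeled support set of a few-shot task.
```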
11. Bridge the Gap between Supervised and Unsupervised Learning for Fine-Grained Classification
- Author
- Wang, Jiabao, Li, Yang, Wei, Xiu-Shen, Li, Hang, Miao, Zhuang, and Zhang, Rui
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Unsupervised learning technology has caught up with, or even surpassed, supervised learning technology in general object classification (GOC) and person re-identification (re-ID). However, unsupervised learning of fine-grained visual classification (FGVC) is more challenging than GOC and person re-ID. In order to bridge the gap between unsupervised and supervised learning for FGVC, we investigate the essential factors (including feature extraction, clustering, and contrastive learning) behind the performance gap between supervised and unsupervised FGVC. Furthermore, we propose a simple, effective, and practical method, termed UFCL, to alleviate the gap. Three key issues are addressed and improved: First, we introduce a robust and powerful backbone, ResNet50-IBN, which has the capacity for domain adaptation when transferring ImageNet pre-trained models to FGVC tasks. Second, we propose to introduce HDBSCAN instead of DBSCAN for clustering, which generates better clusters for adjacent categories with fewer hyper-parameters. Finally, we propose a weighted feature agent and its updating mechanism for contrastive learning with pseudo-labels that carry inevitable noise, which improves the optimization process of learning the network parameters. The effectiveness of UFCL is verified on the CUB-200-2011, Oxford-Flowers, Oxford-Pets, Stanford-Dogs, Stanford-Cars and FGVC-Aircraft datasets. Under the unsupervised FGVC setting, we achieve state-of-the-art results, and we analyze the key factors and important parameters to provide practical guidance., Comment: 12 pages, 7 figures
- Published
- 2022
12. Relieving Long-tailed Instance Segmentation via Pairwise Class Balance
- Author
- He, Yin-Yin, Zhang, Peizhen, Wei, Xiu-Shen, Zhang, Xiangyu, and Sun, Jian
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Long-tailed instance segmentation is a challenging task due to the extreme imbalance of training samples among classes. It causes severe biases of the head classes (with the majority of samples) against the tail ones, which renders "how to appropriately define and alleviate the bias" one of the most important issues. Prior works mainly use label distributions or mean score information to indicate coarse-grained bias. In this paper, we explore the confusion matrix, which carries fine-grained misclassification details, to relieve pairwise biases, generalizing the coarse-grained notion. To this end, we propose a novel Pairwise Class Balance (PCB) method, built upon a confusion matrix that is updated during training to accumulate the ongoing prediction preferences. PCB generates fightback soft labels for regularization during training. Besides, an iterative learning paradigm is developed to support progressive and smooth regularization in such debiasing. PCB can be plugged into any existing method as a complement. Experimental results on LVIS demonstrate that our method achieves state-of-the-art performance without bells and whistles. Superior results across various architectures show its generalization ability. The code and trained models are available at https://github.com/megvii-research/PCB., Comment: Accepted to CVPR 2022 (a hedged sketch of confusion-matrix-driven soft labels follows this entry)
- Published
- 2022
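A rough sketch of the confusion-matrix bookkeeping behind a PCB-style regularizer, based only on the abstract above: accumulate prediction preferences per ground-truth class with a running average, then mix "fightback" mass into the training targets. The exact mixing rule and momentum are assumptions:

```python
# Rough sketch: a running confusion matrix turned into "fightback" soft labels.
import torch

class ConfusionBalancer:
    def __init__(self, num_classes, momentum=0.99, lam=0.2):
        self.cm = torch.full((num_classes, num_classes), 1.0 / num_classes)
        self.momentum, self.lam = momentum, lam

    @torch.no_grad()
    def update(self, probs, targets):
        # cm[y] tracks the average predicted distribution for class-y samples.
        for p, y in zip(probs, targets):
            self.cm[y] = self.momentum * self.cm[y] + (1 - self.momentum) * p

    def soft_targets(self, targets):
        onehot = torch.eye(self.cm.shape[0])[targets]
        # cm[j, y]: how much class-j samples leak onto class y; give that mass back.
        fightback = self.cm[:, targets].t()                 # (B, C)
        fightback = fightback / fightback.sum(dim=1, keepdim=True)
        return (1 - self.lam) * onehot + self.lam * fightback

balancer = ConfusionBalancer(num_classes=4)
probs = torch.softmax(torch.randn(8, 4), dim=1)
targets = torch.randint(0, 4, (8,))
balancer.update(probs, targets)
soft = balancer.soft_targets(targets)
loss = -(soft * probs.clamp_min(1e-8).log()).sum(dim=1).mean()  # soft-label CE
```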
13. Fine-Grained Image Recognition
- Author
- Wei, Xiu-Shen
- Published
- 2023
- Full Text
- View/download PDF
14. Benchmark Datasets
- Author
- Wei, Xiu-Shen
- Published
- 2023
- Full Text
- View/download PDF
15. Resources and Future Work
- Author
- Wei, Xiu-Shen
- Published
- 2023
- Full Text
- View/download PDF
16. Background
- Author
- Wei, Xiu-Shen
- Published
- 2023
- Full Text
- View/download PDF
17. Fine-Grained Image Analysis with Deep Learning: A Survey
- Author
- Wei, Xiu-Shen, Song, Yi-Zhe, Mac Aodha, Oisin, Wu, Jianxin, Peng, Yuxin, Tang, Jinhui, Yang, Jian, and Belongie, Serge
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer vision and pattern recognition, and underpins a diverse set of real-world applications. The task of FGIA targets analyzing visual objects from subordinate categories, e.g., species of birds or models of cars. The small inter-class and large intra-class variation inherent to fine-grained image analysis makes it a challenging problem. Capitalizing on advances in deep learning, in recent years we have witnessed remarkable progress in deep learning powered FGIA. In this paper we present a systematic survey of these advances, where we attempt to re-define and broaden the field of FGIA by consolidating two fundamental fine-grained research areas -- fine-grained image recognition and fine-grained image retrieval. In addition, we also review other key issues of FGIA, such as publicly available benchmark datasets and related domain-specific applications. We conclude by highlighting several research directions and open problems which need further exploration from the community., Comment: Accepted by IEEE TPAMI
- Published
- 2021
18. CAT: a coarse-to-fine attention tree for semantic change detection
- Author
- Wei, Xiu-Shen, Xu, Yu-Yan, Zhang, Chen-Lin, Xia, Gui-Song, and Peng, Yu-Xin
- Published
- 2023
- Full Text
- View/download PDF
19. Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach
- Author
- Sun, Zeren, Yao, Yazhou, Wei, Xiu-Shen, Zhang, Yongshun, Shen, Fumin, Wu, Jianxin, Zhang, Jian, and Shen, Heng-Tao
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Learning from the web can ease the extreme dependence of deep learning on large-scale manually labeled datasets. Especially for fine-grained recognition, which targets distinguishing subordinate categories, leveraging free web data can significantly reduce labeling costs. Despite its significant practical and research value, the webly supervised fine-grained recognition problem is not extensively studied in the computer vision community, largely due to the lack of high-quality datasets. To fill this gap, in this paper we construct two new benchmark webly supervised fine-grained datasets, termed WebFG-496 and WebiNat-5089, respectively. Concretely, WebFG-496 consists of three sub-datasets containing a total of 53,339 web training images with 200 species of birds (Web-bird), 100 types of aircraft (Web-aircraft), and 196 models of cars (Web-car). WebiNat-5089 contains 5,089 sub-categories and more than 1.1 million web training images, making it the largest webly supervised fine-grained dataset to date. As a minor contribution, we also propose a novel webly supervised method (termed "Peer-learning") for benchmarking these datasets. Comprehensive experimental results and analyses on the two new benchmark datasets demonstrate that the proposed method achieves superior performance over competing baseline models and the state-of-the-art. Our benchmark datasets and the source code of Peer-learning are available at https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset., Comment: accepted by ICCV 2021
- Published
- 2021
20. Contextualizing Meta-Learning via Learning to Decompose
- Author
- Ye, Han-Jia, Zhou, Da-Wei, Hong, Lanqing, Li, Zhenguo, Wei, Xiu-Shen, and Zhan, De-Chuan
- Subjects
- Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Meta-learning has emerged as an efficient approach for constructing target models based on support sets. For example, meta-learned embeddings enable the construction of target nearest-neighbor classifiers for specific tasks by pulling instances closer to their same-class neighbors. However, a single instance can be annotated from various latent attributes, making visually similar instances inside or across support sets have different labels and diverse relationships with others. Consequently, a uniform meta-learned strategy for inferring the target model from the support set fails to capture the instance-wise ambiguous similarity. To this end, we propose the Learning to Decompose Network (LeadNet) to contextualize the meta-learned "support-to-target" strategy, leveraging the context of instances with one or mixed latent attributes in a support set. In particular, the comparison relationship between instances is decomposed w.r.t. multiple embedding spaces. LeadNet learns to automatically select the strategy associated with the right attribute by incorporating the change of comparison across contexts with polysemous embeddings. We demonstrate the superiority of LeadNet in various applications, including exploring multiple views of confusing data, out-of-distribution recognition, and few-shot image classification., Comment: Accepted to TPAMI. Code is available at: https://github.com/zhoudw-zdw/TPAMI-LeadNet
- Published
- 2021
21. Distilling Virtual Examples for Long-tailed Recognition
- Author
- He, Yin-Yin, Wu, Jianxin, and Wei, Xiu-Shen
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
We tackle the long-tailed visual recognition problem from the knowledge distillation perspective by proposing the Distill the Virtual Examples (DiVE) method. Specifically, by treating the predictions of a teacher model as virtual examples, we prove that distilling from these virtual examples is equivalent to label distribution learning under certain constraints. We show that when the virtual example distribution becomes flatter than the original input distribution, the under-represented tail classes receive significant improvements, which is crucial in long-tailed recognition. The proposed DiVE method can explicitly tune the virtual example distribution to become flat. Extensive experiments on three benchmark datasets, including the large-scale iNaturalist ones, justify that DiVE can significantly outperform state-of-the-art methods. Furthermore, additional analyses and experiments verify the virtual example interpretation and demonstrate the effectiveness of tailored designs in DiVE for long-tailed problems., Comment: Accepted to ICCV 2021 (a hedged sketch of the flattened-teacher distillation follows this entry)
- Published
- 2021
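A hedged sketch of the recipe outlined in the abstract above: flatten the teacher's predictions (temperature plus a power transform) before distilling, so tail classes receive more virtual mass. Constants are illustrative, not the paper's settings:

```python
# Hedged sketch: flatten teacher predictions before distillation.
import torch
import torch.nn.functional as F

def flatten_teacher(teacher_logits, temperature=3.0, power=0.5):
    p = F.softmax(teacher_logits / temperature, dim=1)
    p = p ** power                        # power < 1 flattens the distribution further
    return p / p.sum(dim=1, keepdim=True)

def dive_style_kd_loss(student_logits, teacher_logits, temperature=3.0):
    virtual = flatten_teacher(teacher_logits, temperature)   # "virtual examples"
    log_q = F.log_softmax(student_logits / temperature, dim=1)
    return F.kl_div(log_q, virtual, reduction="batchmean") * temperature ** 2

s, t = torch.randn(8, 10), torch.randn(8, 10)  # stand-in student/teacher logits
loss = dive_style_kd_loss(s, t)
```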
22. Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification
- Author
- Wang, Peng, Han, Kai, Wei, Xiu-Shen, Zhang, Lei, and Wang, Lei
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Learning discriminative image representations plays a vital role in long-tailed image classification because it can ease classifier learning in imbalanced cases. Given the promising performance contrastive learning has shown recently in representation learning, in this work we explore effective supervised contrastive learning strategies and tailor them to learn better image representations from imbalanced data in order to boost classification accuracy. Specifically, we propose a novel hybrid network structure composed of a supervised contrastive loss to learn image representations and a cross-entropy loss to learn classifiers, where the learning is progressively transitioned from feature learning to classifier learning to embody the idea that better features make better classifiers. We explore two variants of contrastive loss for feature learning, which vary in form but share the common idea of pulling samples from the same class together in the normalized embedding space while pushing samples from different classes apart. One of them is the recently proposed supervised contrastive (SC) loss, which is designed on top of the state-of-the-art unsupervised contrastive loss by incorporating positive samples from the same class. The other is a prototypical supervised contrastive (PSC) learning strategy, which addresses the intensive memory consumption of the standard SC loss and thus shows more promise under limited memory budgets. Extensive experiments on three long-tailed classification datasets demonstrate the advantage of the proposed contrastive-learning-based hybrid networks in long-tailed classification., Comment: CVPR 2021 (a hedged sketch of a PSC-style loss follows this entry)
- Published
- 2021
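A minimal sketch of a PSC-style loss as described above: pull each normalized embedding toward its class prototype and away from the other prototypes, avoiding the pairwise memory cost of standard SupCon. How prototypes are maintained (here: given tensors, e.g. running class means) is an assumption:

```python
# Hedged sketch of a PSC-style loss: samples are attracted to their class
# prototype and repelled from the others via a softmax over similarities.
import torch
import torch.nn.functional as F

def psc_loss(embeddings, prototypes, labels, temperature=0.1):
    z = F.normalize(embeddings, dim=1)
    p = F.normalize(prototypes, dim=1)
    logits = z @ p.t() / temperature      # similarity of each sample to each prototype
    return F.cross_entropy(logits, labels)

num_classes, dim = 5, 32
prototypes = torch.randn(num_classes, dim)   # e.g., running class-mean features
z = torch.randn(16, dim)                     # stand-in embeddings
y = torch.randint(0, num_classes, (16,))
loss = psc_loss(z, prototypes, y)
```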
23. Introduction
- Author
- Wei, Xiu-Shen
- Published
- 2023
- Full Text
- View/download PDF
24. Tips and Tricks for Webly-Supervised Fine-Grained Recognition: Learning from the WebFG 2020 Challenge
- Author
- Wei, Xiu-Shen, Xu, Yu-Yan, Yao, Yazhou, Wei, Jia, Xi, Si, Xu, Wenyuan, Zhang, Weidong, Lv, Xiaoxin, Fu, Dengpan, Li, Qing, Chen, Baoying, Guo, Haojie, Xue, Taolue, Jing, Haipeng, Wang, Zhiheng, Zhang, Tianming, and Zhang, Mingwen
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
WebFG 2020 is an international challenge hosted by Nanjing University of Science and Technology, University of Edinburgh, Nanjing University, The University of Adelaide, Waseda University, etc. The challenge focuses on the webly-supervised fine-grained recognition problem. In the literature, existing deep learning methods rely heavily on large-scale, high-quality labeled training data, which limits their practicality and scalability in real-world applications. In particular, for fine-grained recognition, a visual task that requires professional knowledge for labeling, the cost of acquiring labeled training data is quite high, making it extremely difficult to obtain a large amount of high-quality training data. Therefore, utilizing free web data to train fine-grained recognition models has attracted increasing attention from researchers in the fine-grained community. The challenge expects participants to develop webly-supervised fine-grained recognition methods that leverage web images for training, easing the extreme dependence of deep learning methods on large-scale manually labeled datasets and enhancing their practicality and scalability. In this technical report, we pull together the top WebFG 2020 solutions from a total of 54 competing teams, and discuss which methods worked best across the set of winning teams, and which surprisingly did not help., Comment: This is a technical report of the WebFG 2020 challenge (https://sites.google.com/view/webfg2020) associated with ACCV 2020
- Published
- 2020
25. Salvage Reusable Samples from Noisy Data for Robust Learning
- Author
- Sun, Zeren, Hua, Xian-Sheng, Yao, Yazhou, Wei, Xiu-Shen, Hu, Guosheng, and Zhang, Jian
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
- Abstract
Due to label noise in web images and the high memorization capacity of deep neural networks, training deep fine-grained (FG) models directly on web images tends to yield inferior recognition ability. In the literature, to alleviate this issue, loss-correction methods try to estimate the noise transition matrix, but inevitable false corrections cause severe accumulated errors. Sample-selection methods identify clean ("easy") samples based on the fact that small losses can alleviate accumulated errors; however, "hard" and mislabeled examples, both of which can boost the robustness of FG models, are also dropped. To this end, we propose a certainty-based reusable sample selection and correction approach, termed CRSSC, for coping with label noise when training deep FG models on web images. Our key idea is to additionally identify and correct reusable samples, and then leverage them together with clean examples to update the networks. We demonstrate the superiority of the proposed approach from both theoretical and experimental perspectives., Comment: accepted by ACM MM 2020
- Published
- 2020
26. ExchNet: A Unified Hashing Network for Large-Scale Fine-Grained Image Retrieval
- Author
- Cui, Quan, Jiang, Qing-Yuan, Wei, Xiu-Shen, Li, Wu-Jun, and Yoshie, Osamu
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Retrieving relevant images from a large-scale fine-grained dataset can suffer from intolerably slow query speeds and highly redundant storage costs, due to the high-dimensional real-valued embeddings used to distinguish the subtle visual differences of fine-grained objects. In this paper, we study the novel fine-grained hashing topic of generating compact binary codes for fine-grained images, leveraging the search and storage efficiency of hash learning to alleviate the aforementioned problems. Specifically, we propose a unified end-to-end trainable network, termed ExchNet. Based on attention mechanisms and the proposed attention constraints, it first obtains both local and global features to represent object parts and whole fine-grained objects, respectively. Furthermore, to ensure the discriminative ability and semantic consistency of these part-level features across images, we design a local feature alignment approach that performs a feature exchanging operation. An alternating learning algorithm is then employed to optimize the whole ExchNet and generate the final binary hash codes. Validated by extensive experiments, our proposal consistently outperforms state-of-the-art generic hashing methods on five fine-grained datasets, which shows our effectiveness. Moreover, compared with other approximate nearest neighbor methods, ExchNet achieves the best speed-up and storage reduction, revealing its efficiency and practicality., Comment: Accepted by ECCV2020
- Published
- 2020
27. Hierarchical Context Embedding for Region-based Object Detection
- Author
- Chen, Zhao-Min, Jin, Xin, Zhao, Borui, Wei, Xiu-Shen, and Guo, Yanwen
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
State-of-the-art two-stage object detectors apply a classifier to a sparse set of object proposals, relying on region-wise features extracted by RoIPool or RoIAlign as inputs. The region-wise features, in spite of aligning well with the proposal locations, may still lack the crucial context information which is necessary for filtering out noisy background detections, as well as recognizing objects possessing no distinctive appearances. To address this issue, we present a simple but effective Hierarchical Context Embedding (HCE) framework, which can be applied as a plug-and-play component, to facilitate the classification ability of a series of region-based detectors by mining contextual cues. Specifically, to advance the recognition of context-dependent object categories, we propose an image-level categorical embedding module which leverages the holistic image-level context to learn object-level concepts. Then, novel RoI features are generated by exploiting hierarchically embedded context information beneath both whole images and interested regions, which are also complementary to conventional RoI features. Moreover, to make full use of our hierarchical contextual RoI features, we propose the early-and-late fusion strategies (i.e., feature fusion and confidence fusion), which can be combined to boost the classification accuracy of region-based detectors. Comprehensive experiments demonstrate that our HCE framework is flexible and generalizable, leading to significant and consistent improvements upon various region-based detectors, including FPN, Cascade R-CNN and Mask R-CNN., Comment: Accepted by ECCV 2020
- Published
- 2020
- Full Text
- View/download PDF
28. Learning Semantically Enhanced Feature for Fine-Grained Image Classification
- Author
- Luo, Wei, Zhang, Hengmin, Li, Jun, and Wei, Xiu-Shen
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
We aim to provide a computationally cheap yet effective approach for fine-grained image classification (FGIC) in this letter. Unlike previous methods that rely on complex part localization modules, our approach learns fine-grained features by enhancing the semantics of sub-features of a global feature. Specifically, we first obtain sub-feature semantics by arranging the feature channels of a CNN into different groups through channel permutation. Meanwhile, to enhance the discriminability of the sub-features, the groups are guided to be activated on object parts with strong discriminability by a weighted combination regularization. Our approach is parameter-parsimonious and can be easily integrated into the backbone model as a plug-and-play module for end-to-end training with only image-level supervision. Experiments verify the effectiveness of our approach and validate its comparable performance to state-of-the-art methods. Code is available at https://github.com/cswluo/SEF, Comment: Accepted by IEEE Signal Processing Letters. 5 pages, 4 figures, 4 tables
- Published
- 2020
- Full Text
- View/download PDF
29. PyRetri: A PyTorch-based Library for Unsupervised Image Retrieval by Deep Convolutional Neural Networks
- Author
- Hu, Benyi, Song, Ren-Jie, Wei, Xiu-Shen, Yao, Yazhou, Hua, Xian-Sheng, and Liu, Yuehu
- Subjects
- Computer Science - Information Retrieval, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Multimedia
- Abstract
Despite significant progress of applying deep learning methods to the field of content-based image retrieval, there has not been a software library that covers these methods in a unified manner. In order to fill this gap, we introduce PyRetri, an open source library for deep learning based unsupervised image retrieval. The library encapsulates the retrieval process in several stages and provides functionality that covers various prominent methods for each stage. The idea underlying its design is to provide a unified platform for deep learning based image retrieval research, with high usability and extensibility. To the best of our knowledge, this is the first open-source library for unsupervised image retrieval by deep learning., Comment: Accepted by ACM Multimedia Conference 2020. PyRetri is open-source and available at https://github.com/PyRetri/PyRetri
- Published
- 2020
30. Exploring Categorical Regularization for Domain Adaptive Object Detection
- Author
- Xu, Chang-Dong, Zhao, Xing-Ran, Jin, Xin, and Wei, Xiu-Shen
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
In this paper, we tackle the domain adaptive object detection problem, where the main challenge lies in significant domain gaps between source and target domains. Previous work seeks to plainly align image-level and instance-level shifts to eventually minimize the domain discrepancy. However, it still overlooks matching crucial image regions and important instances across domains, which strongly affects domain-shift mitigation. In this work, we propose a simple but effective categorical regularization framework for alleviating this issue. It can be applied as a plug-and-play component to a series of Domain Adaptive Faster R-CNN methods that are prominent for dealing with domain adaptive detection. Specifically, by integrating an image-level multi-label classifier upon the detection backbone, we can obtain the sparse but crucial image regions corresponding to categorical information, thanks to the weak localization ability of the classification manner. Meanwhile, at the instance level, we leverage the categorical consistency between image-level predictions (by the classifier) and instance-level predictions (by the detection head) as a regularization factor to automatically hunt for hard aligned instances in the target domain. Extensive experiments across various domain-shift scenarios show that our method obtains a significant performance gain over the original Domain Adaptive Faster R-CNN detectors. Furthermore, qualitative visualizations and analyses demonstrate our method's ability to attend to the key regions/instances relevant to domain adaptation. Our code is open-source and available at https://github.com/Megvii-Nanjing/CR-DA-DET., Comment: To appear in CVPR 2020. X.-S. Wei is the corresponding author
- Published
- 2020
31. BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition
- Author
- Zhou, Boyan, Cui, Quan, Wei, Xiu-Shen, and Chen, Zhao-Min
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Our work focuses on tackling the challenging but natural visual recognition task of long-tailed data distributions (i.e., a few classes occupy most of the data, while most classes have very few samples). In the literature, class re-balancing strategies (e.g., re-weighting and re-sampling) are the prominent and effective methods proposed to alleviate the extreme imbalance of long-tailed problems. In this paper, we first discover that these re-balancing methods achieve satisfactory recognition accuracy because they significantly promote classifier learning in deep networks; at the same time, however, they unexpectedly damage the representative ability of the learned deep features to some extent. We therefore propose a unified Bilateral-Branch Network (BBN) that takes care of both representation learning and classifier learning simultaneously, where each branch performs its own duty separately. In particular, our BBN model is further equipped with a novel cumulative learning strategy, which is designed to first learn the universal patterns and then gradually pay attention to the tail data. Extensive experiments on four benchmark datasets, including the large-scale iNaturalist ones, justify that the proposed BBN can significantly outperform state-of-the-art methods. Furthermore, validation experiments demonstrate both our preliminary discovery and the effectiveness of the tailored designs in BBN for long-tailed problems. Our method won first place in the iNaturalist 2019 large-scale species classification competition, and our code is open-source and available at https://github.com/Megvii-Nanjing/BBN., Comment: Accepted by CVPR 2020 (a hedged sketch of the cumulative learning schedule follows this entry)
- Published
- 2019
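A toy sketch of BBN's cumulative learning schedule from the abstract above. The parabolic decay alpha = 1 - (T/T_max)^2 follows the paper; mixing the two branch losses directly is a simplification (BBN actually mixes branch features through a shared classifier):

```python
# Toy sketch of cumulative learning: the conventional branch dominates early,
# the re-balancing branch (reversed sampler) dominates late.
import torch
import torch.nn.functional as F

def bbn_alpha(epoch, max_epoch):
    return 1.0 - (epoch / max_epoch) ** 2    # parabolic decay from the paper

def bbn_loss(logits_conv, y_conv, logits_rebal, y_rebal, alpha):
    # Simplification: BBN mixes the two branch *features* with weights alpha
    # and (1 - alpha) before a shared classifier; here we mix the losses.
    return alpha * F.cross_entropy(logits_conv, y_conv) + \
           (1 - alpha) * F.cross_entropy(logits_rebal, y_rebal)

alpha = bbn_alpha(epoch=30, max_epoch=100)
logits_c, logits_r = torch.randn(8, 10), torch.randn(8, 10)
y_c, y_r = torch.randint(0, 10, (8,)), torch.randint(0, 10, (8,))
loss = bbn_loss(logits_c, y_c, logits_r, y_r, alpha)
```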
32. Bridge the gap between supervised and unsupervised learning for fine-grained classification
- Author
- Wang, Jiabao, Li, Yang, Wei, Xiu-Shen, Li, Hang, Miao, Zhuang, and Zhang, Rui
- Published
- 2023
- Full Text
- View/download PDF
33. Deep Learning for Fine-Grained Image Analysis: A Survey
- Author
- Wei, Xiu-Shen, Wu, Jianxin, and Cui, Quan
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Computer vision (CV) is the process of using machines to understand and analyze imagery, and it is an integral branch of artificial intelligence. Among the various research areas of CV, fine-grained image analysis (FGIA) is a longstanding and fundamental problem that has become ubiquitous in diverse real-world applications. The task of FGIA targets analyzing visual objects from subordinate categories, e.g., species of birds or models of cars. The small inter-class variations and large intra-class variations caused by the fine-grained nature make it a challenging problem. With the boom of deep learning, recent years have witnessed remarkable progress in FGIA using deep learning techniques. In this paper, we aim to give a systematic survey of recent advances in deep learning based FGIA techniques. Specifically, we organize the existing studies into three major categories: fine-grained image recognition, fine-grained image retrieval and fine-grained image generation. In addition, we also cover other important issues of FGIA, such as publicly available benchmark datasets and related domain-specific applications. Finally, we conclude this survey by highlighting several directions and open problems that need to be further explored by the community in the future., Comment: Project page: http://www.weixiushen.com/project/Awesome_FGIA/Awesome_FGIA.html. arXiv admin note: text overlap with arXiv:1902.06068 by other authors
- Published
- 2019
34. Multi-Label Image Recognition with Graph Convolutional Networks
- Author
- Chen, Zhao-Min, Wei, Xiu-Shen, Wang, Peng, and Guo, Yanwen
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
The task of multi-label image recognition is to predict a set of object labels present in an image. As objects normally co-occur in an image, it is desirable to model label dependencies to improve recognition performance. To capture and explore such important dependencies, we propose a multi-label classification model based on a Graph Convolutional Network (GCN). The model builds a directed graph over the object labels, where each node (label) is represented by the word embedding of that label, and the GCN is learned to map this label graph into a set of inter-dependent object classifiers. These classifiers are applied to the image descriptors extracted by another sub-net, enabling the whole network to be end-to-end trainable. Furthermore, we propose a novel re-weighting scheme to create an effective label correlation matrix to guide information propagation among the nodes of the GCN. Experiments on two multi-label image recognition datasets show that our approach clearly outperforms existing state-of-the-art methods. In addition, visualization analyses reveal that the classifiers learned by our model maintain a meaningful semantic topology., Comment: To appear at CVPR 2019 (source code has been released at https://github.com/chenzhaomin123/ML_GCN; a hedged sketch of the label-graph classifier mapping follows this entry)
- Published
- 2019
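A hedged sketch of the mapping described above: a two-layer GCN propagates label word embeddings over a label-correlation matrix to produce per-class classifiers, which are then applied to the image feature. The random correlation matrix and dimensions below are stand-ins:

```python
# Hedged sketch: a two-layer GCN maps label embeddings to per-class classifiers.
import torch
import torch.nn as nn

class LabelGCN(nn.Module):
    def __init__(self, emb_dim=300, hidden=512, feat_dim=2048):
        super().__init__()
        self.w1 = nn.Linear(emb_dim, hidden, bias=False)
        self.w2 = nn.Linear(hidden, feat_dim, bias=False)

    def forward(self, label_emb, adj):
        h = torch.relu(adj @ self.w1(label_emb))  # propagate over the label graph
        return adj @ self.w2(h)                   # (C, feat_dim) classifiers

num_labels = 20
adj = torch.softmax(torch.randn(num_labels, num_labels), dim=1)  # stand-in correlations
label_emb = torch.randn(num_labels, 300)     # e.g., GloVe vectors, one per label
img_feat = torch.randn(4, 2048)              # backbone global image feature
scores = img_feat @ LabelGCN()(label_emb, adj).t()  # multi-label logits
```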
35. RPC: A Large-Scale Retail Product Checkout Dataset
- Author
- Wei, Xiu-Shen, Cui, Quan, Yang, Lei, Wang, Peng, and Liu, Lingqiao
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Over recent years, emerging interest has occurred in integrating computer vision technology into the retail industry. Automatic checkout (ACO) is one of the critical problems in this area, aiming to automatically generate the shopping list from images of the products to purchase. The main challenges of this problem come from the large scale and fine-grained nature of the product categories, as well as the difficulty of collecting training images that reflect realistic checkout scenarios due to the continuous update of products. Despite its significant practical and research value, this problem is not extensively studied in the computer vision community, largely due to the lack of a high-quality dataset. To fill this gap, in this work we propose a new dataset to facilitate relevant research. Our dataset enjoys the following characteristics: (1) It is by far the largest dataset in terms of both product image quantity and product categories. (2) It includes single-product images taken in a controlled environment and multi-product images taken by the checkout system. (3) It provides different levels of annotations for the checkout images. Compared with existing datasets, ours is closer to realistic settings and can derive a variety of research problems. Besides the dataset, we also benchmark the performance of various approaches on it. The dataset and related resources can be found at https://rpc-dataset.github.io/., Comment: Project page: https://rpc-dataset.github.io/
- Published
- 2019
36. Coarse-to-fine: A RNN-based hierarchical attention model for vehicle re-identification
- Author
- Wei, Xiu-Shen, Zhang, Chen-Lin, Liu, Lingqiao, Shen, Chunhua, and Wu, Jianxin
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Vehicle re-identification is an important problem and becomes desirable with the rapid expansion of applications in video surveillance and intelligent transportation. By recalling the identification process of human vision, we are aware that there exists a native hierarchical dependency when humans identify different vehicles. Specifically, humans always firstly determine one vehicle's coarse-grained category, i.e., the car model/type. Then, under the branch of the predicted car model/type, they are going to identify specific vehicles by relying on subtle visual cues, e.g., customized paintings and windshield stickers, at the fine-grained level. Inspired by the coarse-to-fine hierarchical process, we propose an end-to-end RNN-based Hierarchical Attention (RNN-HA) classification model for vehicle re-identification. RNN-HA consists of three mutually coupled modules: the first module generates image representations for vehicle images, the second hierarchical module models the aforementioned hierarchical dependent relationship, and the last attention module focuses on capturing the subtle visual information distinguishing specific vehicles from each other. By conducting comprehensive experiments on two vehicle re-identification benchmark datasets VeRi and VehicleID, we demonstrate that the proposed model achieves superior performance over state-of-the-art methods., Comment: ACCV 2018
- Published
- 2018
37. Automatic Check-Out via Prototype-Based Classifier Learning from Single-Product Exemplars
- Author
- Chen, Hao, Wei, Xiu-Shen, Zhang, Faen, Shen, Yang, Xu, Hui, and Xiao, Liang
- Published
- 2022
- Full Text
- View/download PDF
38. Piecewise classifier mappings: Learning fine-grained learners for novel categories with few examples
- Author
- Wei, Xiu-Shen, Wang, Peng, Liu, Lingqiao, Shen, Chunhua, and Wu, Jianxin
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Humans are capable of learning a new fine-grained concept with very little supervision, e.g., a few exemplary images of a species of bird, yet our best deep learning systems need hundreds or thousands of labeled examples. In this paper, we try to reduce this gap by studying the fine-grained image recognition problem in a challenging few-shot learning setting, termed few-shot fine-grained recognition (FSFG). The task of FSFG requires learning systems to build classifiers for novel fine-grained categories from few examples (only one, or fewer than five). To solve this problem, we propose an end-to-end trainable deep network which is inspired by state-of-the-art fine-grained recognition models and is tailored for the FSFG task. Specifically, our network consists of a bilinear feature learning module and a classifier mapping module: while the former encodes the discriminative information of an exemplar image into a feature vector, the latter maps the intermediate feature into the decision boundary of the novel category. The key novelty of our model is a "piecewise mappings" function in the classifier mapping module, which generates the decision boundary via learning a set of more attainable sub-classifiers in a more parameter-economic way. We learn the exemplar-to-classifier mapping based on an auxiliary dataset in a meta-learning fashion, which is expected to generalize to novel categories. By conducting comprehensive experiments on three fine-grained datasets, we demonstrate that the proposed method achieves superior performance over the competing baselines., Comment: Accepted by IEEE TIP
- Published
- 2018
- Full Text
- View/download PDF
39. Adversarial Learning of Structure-Aware Fully Convolutional Networks for Landmark Localization
- Author
- Chen, Yu, Shen, Chunhua, Chen, Hao, Wei, Xiu-Shen, Liu, Lingqiao, and Yang, Jian
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Landmark/pose estimation in single monocular images has received much effort in computer vision due to its important applications. It remains a challenging task when input images contain severe occlusions caused by, e.g., adverse camera views. Under such circumstances, biologically implausible pose predictions may be produced. In contrast, human vision is able to predict poses by exploiting the geometric constraints of landmark point inter-connectivity. To address the problem, by incorporating priors about the structure of pose components, we propose a novel structure-aware fully convolutional network that implicitly takes such priors into account during training of the deep network. Explicit learning of such constraints is typically challenging. Instead, inspired by how humans identify implausible poses, we design discriminators to distinguish real poses from fake ones (such as biologically implausible ones). If the pose generator G generates results that the discriminator fails to distinguish from real ones, the network has successfully learned the priors. Training of the network follows the strategy of conditional Generative Adversarial Networks (GANs). The effectiveness of the proposed network is evaluated on three pose-related tasks: 2D single human pose estimation, 2D facial landmark estimation and 3D single human pose estimation. The proposed approach significantly outperforms state-of-the-art methods and almost always generates plausible pose predictions, demonstrating the usefulness of implicit learning of structures using GANs., Comment: 18 pages. Extended version of arXiv:1705.00389. Accepted to IEEE Trans. Pattern Analysis and Machine Intelligence
- Published
- 2017
40. Unsupervised Object Discovery and Co-Localization by Deep Descriptor Transforming
- Author
- Wei, Xiu-Shen, Zhang, Chen-Lin, Wu, Jianxin, Shen, Chunhua, and Zhou, Zhi-Hua
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Reusable model design becomes desirable with the rapid expansion of computer vision and machine learning applications. In this paper, we focus on the reusability of pre-trained deep convolutional models. Specifically, different from treating pre-trained models as feature extractors, we reveal more treasures beneath the convolutional layers: the convolutional activations can act as a detector for the common object in the image co-localization problem. We propose a simple yet effective method, termed Deep Descriptor Transforming (DDT), for evaluating the correlations of descriptors and then obtaining the category-consistent regions, which can accurately locate the common object in a set of unlabeled images, i.e., unsupervised object discovery. Empirical studies validate the effectiveness of the proposed DDT method. On benchmark image co-localization datasets, DDT consistently outperforms existing state-of-the-art methods by a large margin. Moreover, DDT also demonstrates good generalization ability for unseen categories and robustness in dealing with noisy data. Beyond those, DDT can also be employed to harvest web images into valid external data sources for improving the performance of both image recognition and object detection., Comment: This paper is extended based on our preliminary work published in IJCAI 2017 [arXiv:1705.02758] (a hedged sketch of the DDT projection follows this entry)
- Published
- 2017
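Both DDT entries (this one and No. 41 below) rest on one operation: project every deep descriptor of an image set onto the first principal direction of all descriptors and keep positive responses as the spatial support of the common object. A minimal NumPy sketch under that reading of the abstracts:

```python
# Minimal sketch of the DDT projection (assumed reading of the abstract).
import numpy as np

def ddt_indicators(desc_maps):
    # desc_maps: list of (H, W, D) convolutional descriptor maps, one per image
    X = np.concatenate([d.reshape(-1, d.shape[-1]) for d in desc_maps], axis=0)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    p1 = vt[0]                                # first principal direction
    # Positive projections indicate the common object; the rest is background.
    return [((d - mean) @ p1 > 0) for d in desc_maps]

maps = [np.random.randn(7, 7, 512) for _ in range(5)]  # e.g., conv5 features
masks = ddt_indicators(maps)   # (H, W) boolean masks localizing the common object
```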
41. Deep Descriptor Transforming for Image Co-Localization
- Author
- Wei, Xiu-Shen, Zhang, Chen-Lin, Li, Yao, Xie, Chen-Wei, Wu, Jianxin, Shen, Chunhua, and Zhou, Zhi-Hua
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Learning
- Abstract
Reusable model design becomes desirable with the rapid expansion of machine learning applications. In this paper, we focus on the reusability of pre-trained deep convolutional models. Specifically, different from treating pre-trained models as feature extractors, we reveal more treasures beneath convolutional layers, i.e., the convolutional activations could act as a detector for the common object in the image co-localization problem. We propose a simple but effective method, named Deep Descriptor Transforming (DDT), for evaluating the correlations of descriptors and then obtaining the category-consistent regions, which can accurately locate the common object in a set of images. Empirical studies validate the effectiveness of the proposed DDT method. On benchmark image co-localization datasets, DDT consistently outperforms existing state-of-the-art methods by a large margin. Moreover, DDT also demonstrates good generalization ability for unseen categories and robustness for dealing with noisy data., Comment: Accepted by IJCAI 2017
- Published
- 2017
42. Adversarial PoseNet: A Structure-aware Convolutional Network for Human Pose Estimation
- Author
- Chen, Yu, Shen, Chunhua, Wei, Xiu-Shen, Liu, Lingqiao, and Yang, Jian
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
For human pose estimation in monocular images, joint occlusions and overlapping upon human bodies often result in deviated pose predictions. Under these circumstances, biologically implausible pose predictions may be produced. In contrast, human vision is able to predict poses by exploiting geometric constraints of joint inter-connectivity. To address the problem by incorporating priors about the structure of human bodies, we propose a novel structure-aware convolutional network to implicitly take such priors into account during training of the deep network. Explicit learning of such constraints is typically challenging. Instead, we design discriminators to distinguish the real poses from the fake ones (such as biologically implausible ones). If the pose generator (G) generates results that the discriminator fails to distinguish from real ones, the network successfully learns the priors., Comment: Fixed typos. 14 pages. Demonstration videos are http://v.qq.com/x/page/c039862eira.html, http://v.qq.com/x/page/f0398zcvkl5.html, http://v.qq.com/x/page/w0398ei9m1r.html
- Published
- 2017
43. Delving deep into spatial pooling for squeeze-and-excitation networks
- Author
- Jin, Xin, Xie, Yanping, Wei, Xiu-Shen, Zhao, Bo-Rui, Chen, Zhao-Min, and Tan, Xiaoyang
- Published
- 2022
- Full Text
- View/download PDF
44. RPC: a large-scale and fine-grained retail product checkout dataset
- Author
- Wei, Xiu-Shen, Cui, Quan, Yang, Lei, Wang, Peng, Liu, Lingqiao, and Yang, Jian
- Published
- 2022
- Full Text
- View/download PDF
45. Negatives Make a Positive: An Embarrassingly Simple Approach to Semi-Supervised Few-Shot Learning
- Author
- Wei, Xiu-Shen, Xu, He-Yang, Yang, Zhiwen, Duan, Chen-Long, and Peng, Yuxin
- Published
- 2024
- Full Text
- View/download PDF
46. Piecewise Hashing: A Deep Hashing Method for Large-Scale Fine-Grained Search
- Author
- Wang, Yimu, Wei, Xiu-Shen, Xue, Bo, and Zhang, Lijun
- Published
- 2020
- Full Text
- View/download PDF
47. Mask-CNN: Localizing Parts and Selecting Descriptors for Fine-Grained Image Recognition
- Author
- Wei, Xiu-Shen, Xie, Chen-Wei, and Wu, Jianxin
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Fine-grained image recognition is a challenging computer vision problem, due to the small inter-class variations caused by highly similar subordinate categories, and the large intra-class variations in poses, scales and rotations. In this paper, we propose a novel end-to-end Mask-CNN model without fully connected layers for fine-grained recognition. Based on the part annotations of fine-grained images, the proposed model consists of a fully convolutional network that both locates the discriminative parts (e.g., head and torso) and, more importantly, generates object/part masks for selecting useful and meaningful convolutional descriptors. After that, a four-stream Mask-CNN model is built to aggregate the selected object- and part-level descriptors simultaneously. The proposed Mask-CNN model has the smallest number of parameters, the lowest feature dimensionality and the highest recognition accuracy when compared with state-of-the-art fine-grained approaches., Comment: Submitted to NIPS 2016
- Published
- 2016
48. Selective Convolutional Descriptor Aggregation for Fine-Grained Image Retrieval
- Author
- Wei, Xiu-Shen, Luo, Jian-Hao, Wu, Jianxin, and Zhou, Zhi-Hua
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Deep convolutional neural network models pre-trained for the ImageNet classification task have been successfully adopted for tasks in other domains, such as texture description and object proposal generation, but these tasks require annotations for images in the new domain. In this paper, we focus on a novel and challenging task in the purely unsupervised setting: fine-grained image retrieval. Even with image labels, fine-grained images are difficult to classify, let alone retrieve in the unsupervised setting. We propose the Selective Convolutional Descriptor Aggregation (SCDA) method. SCDA first localizes the main object in a fine-grained image, a step that discards the noisy background and keeps useful deep descriptors. The selected descriptors are then aggregated and dimensionality-reduced into a short feature vector using the best practices we found. SCDA is unsupervised, using no image label or bounding box annotation. Experiments on six fine-grained datasets confirm the effectiveness of SCDA for fine-grained image retrieval. Besides, visualization of the SCDA features shows that they correspond to visual attributes (even subtle ones), which might explain SCDA's high mean average precision in fine-grained retrieval. Moreover, on general image retrieval datasets, SCDA achieves retrieval results comparable with state-of-the-art general image retrieval approaches., Comment: IEEE Transactions on Image Processing (TIP), 2017, 26(6): 2868-2881 (a hedged sketch of the SCDA selection-and-aggregation step follows this entry)
- Published
- 2016
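A compact sketch of the SCDA step described above: keep only descriptors whose aggregated activation exceeds the map's mean, then concatenate average- and max-pooled selected descriptors into a short, normalized retrieval feature. Shapes assume a post-ReLU convolutional map:

```python
# Compact sketch of SCDA-style selection and aggregation.
import numpy as np

def scda_feature(desc_map):
    # desc_map: (H, W, D) post-ReLU convolutional activations for one image
    act = desc_map.sum(axis=2)               # aggregation map over channels
    mask = act > act.mean()                  # keep likely-object positions
    selected = desc_map[mask]                # (M, D) retained descriptors
    if selected.size == 0:                   # degenerate case: keep everything
        selected = desc_map.reshape(-1, desc_map.shape[-1])
    feat = np.concatenate([selected.mean(axis=0), selected.max(axis=0)])
    return feat / (np.linalg.norm(feat) + 1e-12)

feat = scda_feature(np.abs(np.random.randn(14, 14, 512)))  # (1024,) descriptor
```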
49. SEMICON: A Learning-to-Hash Solution for Large-Scale Fine-Grained Image Retrieval
- Author
- Shen, Yang, Sun, Xuhao, Wei, Xiu-Shen, Jiang, Qing-Yuan, and Yang, Jian
- Published
- 2022
- Full Text
- View/download PDF
50. Prototype-based classifier learning for long-tailed visual recognition
- Author
- Wei, Xiu-Shen, Xu, Shu-Lin, Chen, Hao, Xiao, Liang, and Peng, Yuxin
- Published
- 2022
- Full Text
- View/download PDF