13 results for "Xilin Chen"
Search Results
2. What is a Tabby? Interpretable Model Decisions by Learning Attribute-Based Classification Criteria
- Author
- Ruiping Wang, Xilin Chen, Haomiao Liu, and Shiguang Shan
- Subjects
- Contextual image classification, Artificial neural network, Hierarchy (mathematics), Computer science, Applied Mathematics, Machine learning, Convolutional neural network, Visualization, Computational Theory and Mathematics, Discriminative model, Artificial Intelligence, Scalability, Computer Vision and Pattern Recognition, Software
- Abstract
State-of-the-art classification models are usually considered black boxes since their decision processes are implicit to humans. On the contrary, human experts classify objects according to a set of explicit hierarchical criteria. For example, “tabby is a domestic cat with stripes, dots, or lines”, where tabby is defined by combining its superordinate category (domestic cat) with certain attributes (e.g., has stripes). Inspired by this mechanism, we propose an interpretable Hierarchical Criteria Network (HCN) that additionally learns such criteria. To achieve this goal, images and semantic entities (e.g., taxonomies and attributes) are embedded into a common space, where each category can be represented as the linear combination of its superordinate category and a set of learned discriminative attributes. Specifically, a two-stream convolutional neural network (CNN) is devised, which embeds images and taxonomies with the two streams respectively. The model is trained by minimizing the prediction error of hierarchy labels on both streams. Extensive experiments on two widely studied datasets (CIFAR-100 and ILSVRC) demonstrate that HCN can learn meaningful attributes as well as reasonable and interpretable classification criteria. As an additional benefit, the proposed method enables further human feedback for model correction. (A toy sketch of the compositional embedding follows this record.)
- Published
- 2021
- Full Text
- View/download PDF
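Below is a minimal NumPy sketch of the compositional idea described in the abstract above, where a category embedding is its superordinate-category embedding plus a combination of attribute embeddings. All names, dimensions, and data are illustrative assumptions, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)
d = 64                                      # joint embedding dimension (assumed)
n_attr = 10                                 # number of attribute embeddings (assumed)

parent = rng.normal(size=d)                 # e.g., embedding of "domestic cat"
attributes = rng.normal(size=(n_attr, d))   # e.g., "has stripes", "has dots", ...

# A category such as "tabby" is composed from its parent plus a few attributes.
weights = np.zeros(n_attr)
weights[[0, 1]] = 1.0                       # pretend attributes 0 and 1 are "stripes"/"lines"
tabby = parent + weights @ attributes

def classify(image_feature, class_embeddings):
    # Assign the class whose composed embedding best matches the image feature.
    scores = class_embeddings @ image_feature      # dot-product similarity
    return int(np.argmax(scores))

# An image feature near the composed "tabby" embedding is classified as tabby.
image_feature = tabby + 0.05 * rng.normal(size=d)
print(classify(image_feature, np.stack([parent, tabby])))   # expected: 1 (tabby)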
3. Tattoo Image Search at Scale: Joint Detection and Compact Representation Learning
- Author
- Anil K. Jain, Xilin Chen, Hu Han, Jie Li, and Shiguang Shan
- Subjects
- Biometrics, Computer science, Applied Mathematics, Feature extraction, Convolutional neural network, Facial recognition system, Identification (information), Computational Theory and Mathematics, Artificial Intelligence, Feature (computer vision), Computer vision, Computer Vision and Pattern Recognition, Image retrieval, Feature learning, Software
- Abstract
The explosive growth of digital images in video surveillance and social media has created a significant need for efficient search of persons of interest in law enforcement and forensic applications. Despite tremendous progress in person identification based on primary biometric traits (e.g., face and fingerprint), a single biometric trait alone cannot meet the desired recognition accuracy in forensic scenarios. Tattoos, as one of the important soft biometric traits, have been found valuable for assisting in person identification. However, tattoo search in a large collection of unconstrained images remains a difficult problem, and existing tattoo search methods mainly focus on matching cropped tattoos, which differs from real application scenarios. To close the gap, we propose an efficient tattoo search approach that learns tattoo detection and compact representation jointly in a single convolutional neural network (CNN) via multi-task learning. While the features in the backbone network are shared by both tattoo detection and compact representation learning, individual latent layers of each sub-network optimize the shared features toward the detection and feature learning tasks, respectively. We resolve the small batch size issue inside the joint network via random image stitch and preceding feature buffering. We evaluate the proposed tattoo search system using multiple public-domain tattoo benchmarks, and a gallery set with about 300K distracter tattoo images compiled from these datasets and images from the Internet. We also introduce a tattoo sketch dataset containing 300 tattoos for sketch-based tattoo search. Experimental results show that the proposed approach outperforms several state-of-the-art tattoo retrieval algorithms in tattoo detection and tattoo search at scale. (A toy retrieval sketch follows this record.)
- Published
- 2019
- Full Text
- View/download PDF
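As a loose illustration of the retrieval stage described above, the sketch below ranks a gallery of compact embeddings by cosine similarity. The gallery size, embedding dimension, and data are invented stand-ins (the paper's gallery has about 300K distracters); the joint detection network itself is not reproduced.

import numpy as np

rng = np.random.default_rng(1)
# Stand-in for the compact embeddings of a large distracter gallery.
gallery = rng.normal(size=(10_000, 128)).astype(np.float32)
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)   # L2-normalize once

def search(query, gallery, k=5):
    # Return indices of the top-k gallery tattoos by cosine similarity.
    q = query / np.linalg.norm(query)
    scores = gallery @ q                 # cosine similarity after normalization
    return np.argsort(-scores)[:k]

query = gallery[42] + 0.1 * rng.normal(size=128).astype(np.float32)
print(search(query, gallery))            # index 42 should rank first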
4. Unifying Visual Attribute Learning with Object Recognition in a Multiplicative Framework
- Author
- Xilin Chen, Kongming Liang, Hong Chang, Shiguang Shan, and Bingpeng Ma
- Subjects
- Computer science, Applied Mathematics, Deep learning, Feature vector, Cognitive neuroscience of visual object recognition, Semantic property, Machine learning, Visualization, Computational Theory and Mathematics, Artificial Intelligence, Task analysis, Computer Vision and Pattern Recognition, Software
- Abstract
Attributes are mid-level semantic properties of objects. Recent research has shown that visual attributes can benefit many typical learning problems in the computer vision community. However, attribute learning remains challenging because attributes may not always be predictable directly from input images, and the variation of visual attributes is sometimes large across categories. In this paper, we propose a unified multiplicative framework for attribute learning that tackles these key problems. Specifically, images and category information are jointly projected into a shared feature space, where the latent factors are disentangled and multiplied to fulfil attribute prediction. The resulting attribute classifier is category-specific instead of being shared by all categories. Moreover, our model can leverage auxiliary data to enhance the predictive ability of attribute classifiers, which reduces the effort of instance-level attribute annotation to some extent. When integrated into an existing deep learning framework, our model can both accurately predict attributes and learn efficient image representations. Experimental results show that our method achieves superior performance on both instance-level and category-level attribute prediction. For zero-shot learning based on visual attributes and for human-object interaction recognition, our method improves on the state-of-the-art performance on several widely used datasets. (A toy sketch of the multiplicative fusion follows this record.)
- Published
- 2019
- Full Text
- View/download PDF
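The following is a minimal sketch of a multiplicative interaction between projected image and category factors for category-specific attribute prediction, assuming invented shapes and a sigmoid readout; it is not the paper's exact model.

import numpy as np

rng = np.random.default_rng(2)
d = 32
W_img = 0.1 * rng.normal(size=(d, 100))   # projects a 100-d image feature (assumed)
W_cat = 0.1 * rng.normal(size=(d, 20))    # projects a 20-d one-hot category code (assumed)
w_attr = 0.1 * rng.normal(size=d)         # readout for a single attribute

def predict_attribute(image_feat, category_onehot):
    # Element-wise product of the two projected factors, then a linear readout.
    z = (W_img @ image_feat) * (W_cat @ category_onehot)    # multiplicative fusion
    return 1.0 / (1.0 + np.exp(-(w_attr @ z)))              # attribute probability

x = rng.normal(size=100)                  # toy image feature
c = np.eye(20)[3]                         # one-hot code for category 3
print(predict_attribute(x, c))            # category-specific attribute score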
5. Feature Completion for Occluded Person Re-Identification
- Author
- Xinqian Gu, Xilin Chen, Bingpeng Ma, Hong Chang, Ruibing Hou, and Shiguang Shan
- Subjects
- Spatial correlation, Computer science, Applied Mathematics, Feature vector, Pattern recognition, Semantics, Re-identification, Computational Theory and Mathematics, Artificial Intelligence, Feature (computer vision), Humans, Computer Vision and Pattern Recognition, Encoder, Algorithms, Software
- Abstract
Person re-identification (reID) plays an important role in computer vision. However, existing methods suffer from performance degradation in occluded scenes. In this work, we propose an occlusion-robust block, Region Feature Completion (RFC), for occluded reID. Unlike most previous works that discard the occluded regions, the RFC block recovers the semantics of occluded regions in feature space. First, a Spatial RFC (SRFC) module is developed. SRFC exploits the long-range spatial contexts from non-occluded regions to predict the features of occluded regions. The unit-wise prediction task leads to an encoder/decoder architecture, where the region-encoder models the correlation between non-occluded and occluded regions, and the region-decoder utilizes the spatial correlation to recover occluded region features. Second, we introduce a Temporal RFC (TRFC) module which captures the long-term temporal contexts to refine the prediction of SRFC. The RFC block is lightweight, end-to-end trainable, and can be easily plugged into existing CNNs to form RFCnet. Extensive experiments are conducted on occluded and commonly used holistic reID benchmarks. Our method significantly outperforms existing methods on the occlusion datasets, while maintaining top, and even superior, performance on holistic datasets. The source code is available at https://github.com/blue-blue272/OccludedReID-RFCnet. (A toy sketch of the spatial completion step follows this record.)
- Published
- 2021
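The sketch below is a toy stand-in for the spatial completion idea: an occluded region's feature is predicted as an attention-weighted combination of visible region features. Region counts, dimensions, and the attention form are assumptions, not the paper's region encoder/decoder.

import numpy as np

def complete_regions(regions, occluded_mask):
    # regions: (R, d) region features; occluded_mask: (R,) bool, True = occluded.
    visible = regions[~occluded_mask]                   # (V, d) visible features
    out = regions.copy()
    for i in np.where(occluded_mask)[0]:
        sim = visible @ regions[i]                      # correlation with visible regions
        attn = np.exp(sim - sim.max())
        attn /= attn.sum()                              # softmax attention weights
        out[i] = attn @ visible                         # recovered region feature
    return out

rng = np.random.default_rng(3)
feats = rng.normal(size=(8, 16))                # 8 spatial regions, 16-d each
mask = np.zeros(8, dtype=bool)
mask[2] = True                                  # pretend region 2 is occluded
print(complete_regions(feats, mask)[2])         # completed feature for region 2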
6. WLD: a robust local image descriptor
- Author
- Jie Chen, Shiguang Shan, Chu He, Guoying Zhao, Matti Pietikäinen, Xilin Chen, and Wen Gao
- Subjects
- Object recognition (Computers) -- Analysis, Pattern recognition -- Analysis, Pixels -- Analysis
- Published
- 2010
7. A compositional and dynamic model for face aging
- Author
- Jinli Suo, Song-Chun Zhu, Shiguang Shan, and Xilin Chen
- Subjects
- Graph theory -- Usage, Markov processes -- Usage
- Published
- 2010
8. Heterogeneous Face Attribute Estimation: A Deep Multi-Task Learning Approach
- Author
- Xilin Chen, Hu Han, Shiguang Shan, Anil K. Jain, and Fang Wang
- Subjects
- Adult, Male, Adolescent, Databases, Factual, Generalization, Computer science, Multi-task learning, Crowdsourcing, Machine learning, Convolutional neural network, Young Adult, Deep Learning, Artificial Intelligence, Feature (machine learning), Humans, Applied Mathematics, Facial Expression, Support vector machine, Computational Theory and Mathematics, Biometric Identification, Face, Female, Neural Networks, Computer, Computer Vision and Pattern Recognition, Feature learning, Software
- Abstract
Face attribute estimation has many potential applications in video surveillance, face retrieval, and social media. While a number of methods have been proposed for face attribute estimation, most of them do not explicitly consider attribute correlation and heterogeneity (e.g., ordinal vs. nominal and holistic vs. local) during feature representation learning. In this paper, we present a Deep Multi-Task Learning (DMTL) approach to jointly estimate multiple heterogeneous attributes from a single face image. In DMTL, we tackle attribute correlation and heterogeneity with convolutional neural networks (CNNs) consisting of shared feature learning for all the attributes and category-specific feature learning for heterogeneous attributes. We also introduce an unconstrained face database (LFW+), an extension of the public-domain LFW, with heterogeneous demographic attributes (age, gender, and race) obtained via crowdsourcing. Experimental results on benchmarks with multiple face attributes (MORPH II, LFW+, CelebA, LFWA, and FotW) show that the proposed approach has superior performance compared to the state of the art. Finally, evaluations on a public-domain face database (LAP) with a single attribute show that the proposed approach has excellent generalization ability. (A toy sketch of the heterogeneous multi-task loss follows this record.)
- Published
- 2018
- Full Text
- View/download PDF
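A hedged sketch of the multi-task idea above: one shared representation feeds separate heads, with a squared-error loss for an ordinal attribute (age) and a cross-entropy loss for a nominal one (gender). The heads, losses, and numbers are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(4)
shared = rng.normal(size=64)                # shared feature for one face image (assumed)

W_age = 0.1 * rng.normal(size=64)           # ordinal head: scalar age estimate
W_gender = 0.1 * rng.normal(size=(2, 64))   # nominal head: 2-way classifier

age_pred = W_age @ shared                   # regression output
logits = W_gender @ shared
probs = np.exp(logits - logits.max())
probs /= probs.sum()                        # softmax over gender classes

age_true, gender_true = 35.0, 1
# Joint objective: sum of the heterogeneous per-attribute losses.
loss = 0.5 * (age_pred - age_true) ** 2 - np.log(probs[gender_true])
print(loss)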
9. Hyperspectral Light Field Stereo Matching
- Author
- Xilin Chen, Qiang Fu, Kang Zhu, Jingyi Yu, Yujia Xue, and Sing Bing Kang
- Subjects
- Matching (statistics), Markov random field, Computer science, Applied Mathematics, Hyperspectral imaging, Iterative reconstruction, Computational Theory and Mathematics, Artificial Intelligence, Metric (mathematics), Computer vision, Computer Vision and Pattern Recognition, Software, Light field
- Abstract
In this paper, we describe how scene depth can be extracted using a hyperspectral light field capture (H-LF) system. Our H-LF system consists of a 5 × 6 array of cameras, with each camera sampling a different narrow band in the visible spectrum. There are two parts to extracting scene depth. The first part is our novel cross-spectral pairwise matching technique, which involves a new spectral-invariant feature descriptor and its companion matching metric, which we call bidirectional weighted normalized cross correlation (BWNCC). The second part, namely H-LF stereo matching, uses a combination of spectral-dependent correspondence and defocus cues. These two new cost terms are integrated into a Markov Random Field (MRF) for disparity estimation. Experiments on synthetic and real H-LF data show that our approach can produce high-quality disparity maps. We also show that these results can be used to produce the complete plenoptic cube in addition to synthesizing all-focus and defocused color images under different sensor spectral responses. (A toy sketch of weighted normalized cross correlation follows this record.)
- Published
- 2018
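As a toy illustration of the matching cost behind BWNCC, the sketch below computes a weighted normalized cross correlation between two patches. The Gaussian window and patches are invented, and the paper's bidirectional weighting scheme is not reproduced.

import numpy as np

def weighted_ncc(a, b, w):
    # Weighted NCC of two equally sized patches a, b; weights w sum to 1.
    ma, mb = (w * a).sum(), (w * b).sum()           # weighted means
    va = (w * (a - ma) ** 2).sum()                  # weighted variances
    vb = (w * (b - mb) ** 2).sum()
    cov = (w * (a - ma) * (b - mb)).sum()           # weighted covariance
    return cov / np.sqrt(va * vb + 1e-12)

x = np.linspace(-2.0, 2.0, 9)
gx = np.exp(-x ** 2)
w = np.outer(gx, gx)
w /= w.sum()                                        # Gaussian window (assumed)

rng = np.random.default_rng(5)
a = rng.normal(size=(9, 9))
b = 2.0 * a + 1.0                                   # affinely related patch
print(weighted_ncc(a, b, w))                        # close to 1.0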
10. A Concatenational Graph Evolution Aging Model
- Author
- Jinli Suo, Qionghai Dai, Shiguang Shan, Xilin Chen, and Wen Gao
- Subjects
- Models, Anatomic, Aging, Computer science, Markov process, Machine learning, Models, Biological, Facial recognition system, Pattern Recognition, Automated, Data modeling, Artificial Intelligence, Humans, Computer Simulation, Models, Statistical, Applied Mathematics, Probabilistic logic, Graph theory, Active appearance model, Computational Theory and Mathematics, Face, Graph (abstract data type), Computer Vision and Pattern Recognition, Analysis of variance, Software
- Abstract
Modeling the long-term face aging process is of great importance for face recognition and animation, but there is a lack of sufficient long-term face aging sequences for model learning. To address this problem, we propose a CONcatenational GRaph Evolution (CONGRE) aging model, which adopts a decomposition strategy in both the spatial and temporal aspects to learn long-term aging patterns from partially dense aging databases. In the spatial aspect, we build a graphical face representation, in which a human face is decomposed into mutually interrelated subregions under anatomical guidance. In the temporal aspect, the long-term evolution of the above graphical representation is modeled by connecting sequential short-term patterns, following the Markov property of the aging process, under smoothness constraints between neighboring short-term patterns and consistency constraints among subregions. The proposed model also accounts for the diversity of face aging by introducing a probabilistic concatenation strategy between short-term patterns and applying stochastic sampling in aging prediction. In experiments, the aging predictions generated by the learned models are evaluated both subjectively and objectively to validate the proposed model. (A toy sketch of the Markov concatenation follows this record.)
- Published
- 2012
- Full Text
- View/download PDF
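A simple sketch of the temporal side of the model: short-term patterns are concatenated under the Markov property, and prediction samples stochastically from a transition distribution. The states and transition matrix are invented for illustration.

import numpy as np

rng = np.random.default_rng(6)
n_states = 4                                  # short-term pattern clusters (assumed)
T = rng.random((n_states, n_states))
T /= T.sum(axis=1, keepdims=True)             # row-stochastic transition matrix

def sample_trajectory(start, steps):
    # Concatenate short-term patterns into one long-term aging trajectory.
    path = [start]
    for _ in range(steps):
        path.append(int(rng.choice(n_states, p=T[path[-1]])))
    return path

print(sample_trajectory(start=0, steps=6))    # one stochastic aging sequence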
11. Maximal Linear Embedding for Dimensionality Reduction
- Author
- Wen Gao, Xilin Chen, Jie Chen, Ruiping Wang, and Shiguang Shan
- Subjects
- Iterative method, Applied Mathematics, Dimensionality reduction, Nonlinear dimensionality reduction, Pattern recognition, Local optimum, Computational Theory and Mathematics, Artificial Intelligence, Computer Vision and Pattern Recognition, Coordinate space, Isomap, Software, Eigenvalues and eigenvectors, Mathematics, Parametric statistics
- Abstract
Over the past few decades, dimensionality reduction has been widely exploited in computer vision and pattern analysis. This paper proposes a simple but effective nonlinear dimensionality reduction algorithm, named Maximal Linear Embedding (MLE). MLE learns a parametric mapping to recover a single global low-dimensional coordinate space and yields an isometric embedding for the manifold. Inspired by geometric intuition, we introduce a reasonable definition of a locally linear patch, the Maximal Linear Patch (MLP), which seeks to maximize the local neighborhood in which linearity holds. The input data are first decomposed into a collection of local linear models, each depicting an MLP. These local models are then aligned into a global coordinate space by applying multidimensional scaling (MDS) to some randomly selected landmarks. The proposed alignment method, called Landmarks-based Global Alignment (LGA), efficiently produces a closed-form solution with no risk of local optima. It involves only small-scale eigenvalue problems, whereas most previous alignment techniques employ time-consuming iterative optimization. Compared with traditional methods such as ISOMAP and LLE, our MLE yields an explicit model of the intrinsic variation modes of the observed data. Extensive experiments on both synthetic and real data demonstrate the effectiveness and efficiency of the proposed algorithm. (A toy sketch of the landmark MDS step follows this record.)
- Published
- 2011
- Full Text
- View/download PDF
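A compact sketch of the classical MDS step used to place landmarks in a global coordinate space: double-center the squared-distance matrix and take the top eigenpairs. The MLP construction and LGA alignment themselves are not reproduced.

import numpy as np

def classical_mds(D, dim=2):
    # Embed points from a squared-distance matrix D via double centering.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D @ J                        # Gram matrix of centered points
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]          # top eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

rng = np.random.default_rng(7)
X = rng.normal(size=(10, 3))                    # 10 landmark points in 3-D
D = ((X[:, None] - X[None]) ** 2).sum(-1)       # squared Euclidean distances
Y = classical_mds(D, dim=3)                     # recovers X up to rotation/translation
print(np.abs(((Y[:, None] - Y[None]) ** 2).sum(-1) - D).max())   # ~0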
12. WLD: A Robust Local Image Descriptor
- Author
- Shiguang Shan, Wen Gao, Matti Pietikäinen, Guoying Zhao, Chu He, Jie Chen, and Xilin Chen
- Subjects
- Computer science, Scale-invariant feature transform, Facial recognition system, Pattern Recognition, Automated, Imaging, Three-Dimensional, Gabor filter, Image texture, Artificial Intelligence, Histogram, Image Interpretation, Computer-Assisted, Computer vision, Face detection, Image resolution, Pixel, Applied Mathematics, Image Enhancement, Object detection, Computational Theory and Mathematics, Subtraction Technique, Computer Vision and Pattern Recognition, Algorithms, Software
- Abstract
Inspired by Weber's Law, this paper proposes a simple, yet very powerful and robust local descriptor, called the Weber Local Descriptor (WLD). It is based on the fact that human perception of a pattern depends not only on the change of a stimulus (such as sound or lighting) but also on the original intensity of the stimulus. Specifically, WLD consists of two components: differential excitation and orientation. The differential excitation component is a function of the ratio between two terms: one is the relative intensity differences of the current pixel against its neighbors; the other is the intensity of the current pixel itself. The orientation component is the gradient orientation of the current pixel. For a given image, we use the two components to construct a concatenated WLD histogram. Experimental results on the Brodatz and KTH-TIPS2-a texture databases show that WLD impressively outperforms other widely used descriptors (e.g., Gabor and SIFT). In addition, experimental results on human face detection show a promising performance comparable to the best known results on the MIT+CMU frontal face test set, the AR face data set, and the CMU profile test set. (A toy sketch of the two components follows this record.)
- Published
- 2010
- Full Text
- View/download PDF
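Below is a hedged sketch of WLD's two components at a single interior pixel: the differential excitation as the arctangent of the summed relative intensity differences, and the gradient orientation. Border handling and the 2-D histogram binning of the full descriptor are omitted, and the formulation is simplified for illustration.

import numpy as np

def wld_components(img, y, x):
    # Differential excitation and gradient orientation at interior pixel (y, x).
    center = float(img[y, x]) + 1e-6            # guard against division by zero
    patch = img[y-1:y+2, x-1:x+2].astype(float)
    # Sum over the 8 neighbors of (x_i - x_c) equals patch.sum() - 9 * center.
    excitation = np.arctan((patch.sum() - 9.0 * center) / center)
    orientation = np.arctan2(float(img[y+1, x]) - float(img[y-1, x]),
                             float(img[y, x+1]) - float(img[y, x-1]))
    return excitation, orientation

rng = np.random.default_rng(8)
img = rng.integers(0, 256, size=(5, 5))         # toy grayscale patch
print(wld_components(img, 2, 2))                # the pair binned into the WLD histogram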
13. Multi-View Discriminant Analysis
- Author
- Xilin Chen, Shihong Lao, Haihong Zhang, Meina Kan, and Shiguang Shan
- Subjects
- Computer science, Applied Mathematics, Pattern recognition, Linear discriminant analysis, Facial recognition system, Image (mathematics), Constraint (information theory), Computational Theory and Mathematics, Discriminant, Artificial Intelligence, Computer Vision and Pattern Recognition, Rayleigh quotient, Software
- Abstract
In many computer vision systems, the same object can be observed from varying viewpoints or even by different sensors, which brings in the challenging demand of recognizing objects from distinct, even heterogeneous, views. In this work we propose a Multi-view Discriminant Analysis (MvDA) approach, which seeks a single discriminant common space for multiple views in a non-pairwise manner by jointly learning multiple view-specific linear transforms. Specifically, MvDA is formulated to jointly solve the multiple linear transforms by optimizing a generalized Rayleigh quotient, i.e., maximizing the between-class variation and minimizing the within-class variation, both intra-view and inter-view, in the common space. By reformulating this problem as a ratio trace problem, the multiple linear transforms are obtained analytically and simultaneously through generalized eigenvalue decomposition. Furthermore, inspired by the observation that different views share similar data structures, a constraint is introduced to enforce view-consistency of the multiple linear transforms. The proposed method is evaluated on three tasks: face recognition across pose, photo vs. sketch face recognition, and visible light vs. near-infrared image face recognition, on the Multi-PIE, CUFSF, and HFB databases, respectively. Extensive experiments show that MvDA achieves significant improvements over the best known results. (A toy sketch of the ratio-trace solution follows this record.)
- Published
- 2015
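A small sketch of the ratio-trace solution mentioned above: the discriminant transform comes from a generalized eigenvalue problem, solved here via the eigendecomposition of Sw^-1 Sb. The toy scatter matrices stand in for the paper's joint intra-view/inter-view formulation.

import numpy as np

rng = np.random.default_rng(9)
d, k = 6, 2                                     # feature dim and subspace dim (assumed)
A = rng.normal(size=(d, d))
Sb = A @ A.T                                    # toy between-class scatter
B = rng.normal(size=(d, d))
Sw = B @ B.T + d * np.eye(d)                    # toy within-class scatter (well-conditioned)

# Generalized eigenvalue problem Sb w = lambda Sw w, solved via Sw^-1 Sb.
vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
order = np.argsort(-vals.real)
W = vecs[:, order[:k]].real                     # top-k discriminant directions
print(np.trace(W.T @ Sb @ W) / np.trace(W.T @ Sw @ W))   # maximized trace ratio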