38 results on '"Xilin Chen"'
Search Results
2. CMOS-GAN: Semi-Supervised Generative Adversarial Model for Cross-Modality Face Image Synthesis
- Author
- Shikang Yu, Hu Han, Shiguang Shan, and Xilin Chen
- Subjects
Computer Graphics and Computer-Aided Design; Software
- Published
- 2023
- Full Text
- View/download PDF
3. Dual Compensation Residual Networks for Class Imbalanced Learning
- Author
- Ruibing Hou, Hong Chang, Bingpeng Ma, Shiguang Shan, and Xilin Chen
- Subjects
Computational Theory and Mathematics; Artificial Intelligence; Applied Mathematics; Computer Vision and Pattern Recognition; Software
- Published
- 2023
- Full Text
- View/download PDF
4. Person Search by a Bi-Directional Task-Consistent Learning Model
- Author
- Cheng Wang, Bingpeng Ma, Hong Chang, Shiguang Shan, and Xilin Chen
- Subjects
Signal Processing; Media Technology; Electrical and Electronic Engineering; Computer Science Applications
- Published
- 2023
- Full Text
- View/download PDF
5. SANet: Statistic Attention Network for Video-Based Person Re-Identification
- Author
- Shutao Bai, Shiguang Shan, Hong Chang, Bingpeng Ma, Xilin Chen, and Rui Huang
- Subjects
Source code; Contextual image classification; business.industry; Computer science; media_common.quotation_subject; Feature extraction; Pattern recognition; Feature (computer vision); Media Technology; Code (cryptography); Artificial intelligence; Electrical and Electronic Engineering; business; Time complexity; Statistic; Block (data storage); media_common
- Abstract
Capturing long-range dependencies during feature extraction is crucial for video-based person re-identification (re-id) since it would help to tackle many challenging problems such as occlusion and dramatic pose variation. Moreover, capturing subtle differences, such as bags and glasses, is indispensable to distinguish similar pedestrians. In this paper, we propose a novel and efficacious Statistic Attention (SA) block which can capture both the long-range dependencies and subtle differences. SA block leverages high-order statistics of feature maps, which contain both long-range and high-order information. By modeling relations with these statistics, SA block can explicitly capture long-range dependencies with less time complexity. In addition, high-order statistics usually concentrate on details of feature maps and can perceive the subtle differences between pedestrians. In this way, SA block is capable of discriminating pedestrians with subtle differences. Furthermore, this lightweight block can be conveniently inserted into existing deep neural networks at any depth to form Statistic Attention Network (SANet). To evaluate its performance, we conduct extensive experiments on two challenging video re-id datasets, showing that our SANet outperforms the state-of-the-art methods. Furthermore, to show the generalizability of SANet, we evaluate it on three image re-id datasets and two more general image classification datasets, including ImageNet. The source code is available at http://vipl.ict.ac.cn/resources/codes/code/SANet code.zip.
- Published
- 2022
- Full Text
- View/download PDF
6. IAUnet: Global Context-Aware Feature Learning for Person Reidentification
- Author
- Xilin Chen, Ruibing Hou, Hong Chang, Xinqian Gu, Shiguang Shan, and Bingpeng Ma
- Subjects
Context model; Source code; Computer Networks and Communications; business.industry; Computer science; media_common.quotation_subject; Feature extraction; 02 engineering and technology; Machine learning; computer.software_genre; Convolutional neural network; Computer Science Applications; Visualization; Text mining; Categorization; Artificial Intelligence; 0202 electrical engineering, electronic engineering, information engineering; Task analysis; 020201 artificial intelligence & image processing; Artificial intelligence; business; Feature learning; computer; Software; media_common
- Abstract
Person reidentification (reID) by convolutional neural network (CNN)-based networks has achieved favorable performance in recent years. However, most existing CNN-based methods do not take full advantage of spatial–temporal context modeling. In fact, the global spatial–temporal context can greatly clarify local distractions to enhance the target feature representation. To comprehensively leverage the spatial–temporal context information, in this work, we present a novel block, interaction–aggregation-update (IAU), for high-performance person reID. First, the spatial–temporal IAU (STIAU) module is introduced. STIAU jointly incorporates two types of contextual interactions into a CNN framework for target feature learning. Here, the spatial interactions learn to compute the contextual dependencies between different body parts of a single frame, while the temporal interactions are used to capture the contextual dependencies between the same body parts across all frames. Furthermore, a channel IAU (CIAU) module is designed to model the semantic contextual interactions between channel features to enhance the feature representation, especially for small-scale visual cues and body parts. Therefore, the IAU block enables the feature to incorporate the global spatial, temporal, and channel context. It is lightweight, end-to-end trainable, and can be easily plugged into existing CNNs to form IAUnet. The experiments show that IAUnet performs favorably against the state of the art on both image and video reID tasks and achieves compelling results on a general object categorization task. The source code is available at https://github.com/blue-blue272/ImgReID-IAnet.
- Published
- 2021
- Full Text
- View/download PDF
7. What is a Tabby? Interpretable Model Decisions by Learning Attribute-Based Classification Criteria
- Author
- Ruiping Wang, Xilin Chen, Haomiao Liu, and Shiguang Shan
- Subjects
Contextual image classification; Artificial neural network; Hierarchy (mathematics); business.industry; Computer science; Applied Mathematics; 02 engineering and technology; Machine learning; computer.software_genre; Convolutional neural network; Visualization; Set (abstract data type); Computational Theory and Mathematics; Discriminative model; Artificial Intelligence; Scalability; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Computer Vision and Pattern Recognition; Artificial intelligence; business; computer; Software
- Abstract
State-of-the-art classification models are usually considered black boxes since their decision processes are implicit to humans. In contrast, human experts classify objects according to a set of explicit hierarchical criteria. For example, “tabby is a domestic cat with stripes, dots, or lines”, where tabby is defined by combining its superordinate category (domestic cat) and certain attributes (e.g., has stripes). Inspired by this mechanism, we propose an interpretable Hierarchical Criteria Network (HCN) by additionally learning such criteria. To achieve this goal, images and semantic entities (e.g., taxonomies and attributes) are embedded into a common space, where each category can be represented by the linear combination of its superordinate category and a set of learned discriminative attributes. Specifically, a two-stream convolutional neural network (CNN) is elaborately devised, which embeds images and taxonomies with the two streams respectively. The model is trained by minimizing the prediction error of hierarchy labels on both streams. Extensive experiments on two widely studied datasets (CIFAR-100 and ILSVRC) demonstrate that HCN can learn meaningful attributes as well as reasonable and interpretable classification criteria. Therefore, the proposed method enables further human feedback for model correction as an additional benefit.
- Published
- 2021
- Full Text
- View/download PDF
8. Unsupervised Adversarial Domain Adaptation for Cross-Domain Face Presentation Attack Detection
- Author
- Guoqing Wang, Xilin Chen, Hu Han, and Shiguang Shan
- Subjects
021110 strategic, defence & security studies; Training set; Computer Networks and Communications; Computer science; business.industry; Feature vector; Deep learning; Feature extraction; 0211 other engineering and technologies; Pattern recognition; 02 engineering and technology; Facial recognition system; Domain (software engineering); Discriminative model; Face (geometry); Feature (machine learning); Artificial intelligence; Safety, Risk, Reliability and Quality; business
- Abstract
Face presentation attack detection (PAD) is essential for securing the widely used face recognition systems. Most of the existing PAD methods do not generalize well to unseen scenarios because labeled training data of the new domain is usually not available. In light of this, we propose an unsupervised domain adaptation with disentangled representation (DR-UDA) approach to improve the generalization capability of PAD into new scenarios. DR-UDA consists of three modules, i.e., ML-Net, UDA-Net and DR-Net. ML-Net aims to learn a discriminative feature representation using the labeled source domain face images via metric learning. UDA-Net performs unsupervised adversarial domain adaptation in order to optimize the source domain and target domain encoders jointly, and obtain a common feature space shared by both domains. As a result, the source domain PAD model can be effectively transferred to the unlabeled target domain for PAD. DR-Net further disentangles the features irrelevant to specific domains by reconstructing the source and target domain face images from the common feature space. Therefore, DR-UDA can learn a disentangled representation space which is generative for face images in both domains and discriminative for live vs. spoof classification. The proposed approach shows promising generalization capability in several public-domain face PAD databases.
- Published
- 2021
- Full Text
- View/download PDF
9. Location Sensitive Network for Human Instance Segmentation
- Author
- Xiangzhou Zhang, Bingpeng Ma, Hong Chang, Xilin Chen, and Shiguang Shan
- Subjects
Pixel; Computer science; business.industry; Feature extraction; Sampling (statistics); Pattern recognition; Image segmentation; Semantics; Computer Graphics and Computer-Aided Design; Encoding (memory); Image Processing, Computer-Assisted; Humans; Attention; Segmentation; Artificial intelligence; Representation (mathematics); business; Algorithms; Software
- Abstract
Location is important distinguishing information for instance segmentation. In this paper, we propose a novel model, called Location Sensitive Network (LSNet), for human instance segmentation. LSNet integrates instance-specific location information into a one-stage segmentation framework. Specifically, in the segmentation branch, the Pose Attention Module (PAM) encodes the location information into the attention regions through coordinate encoding. Based on the location information provided by PAM, the segmentation branch is able to effectively distinguish instances at the feature level. Moreover, we propose a combination operation named Keypoints Sensitive Combination (KSCom) to utilize the location information from multiple sampling points. These sampling points construct the points representation for instances via human keypoints and random points. Human keypoints provide the spatial locations and semantic information of the instances, and random points expand the receptive fields. Based on the points representation for each instance, KSCom effectively reduces the number of misclassified pixels. Our method is validated by experiments on public datasets. LSNet-5 achieves 56.2 mAP at 18.5 FPS on COCOPersons. In addition, the proposed method is significantly superior to its peers in the case of severe occlusion.
- Published
- 2021
- Full Text
- View/download PDF
10. Learning to Recognize Visual Concepts for Visual Question Answering With Structural Label Space
- Author
- Xilin Chen, Shiguang Shan, Difei Gao, and Ruiping Wang
- Subjects
Computer science; business.industry; 020206 networking & telecommunications; 02 engineering and technology; Space (commercial competition); computer.software_genre; Semantics; Visualization; Knowledge extraction; Signal Processing; Similarity (psychology); 0202 electrical engineering, electronic engineering, information engineering; Question answering; Feature (machine learning); Task analysis; Artificial intelligence; Electrical and Electronic Engineering; business; computer; Natural language processing
- Abstract
Solving the visual question answering (VQA) task requires recognizing many diverse visual concepts as the answer. These visual concepts contain rich structural semantic meanings, e.g., some concepts in VQA are highly related (e.g., red & blue), while others are less relevant (e.g., red & standing). It is very natural for humans to efficiently learn concepts by utilizing their semantic meanings to concentrate on distinguishing relevant concepts and eliminate the disturbance of irrelevant concepts. However, previous works usually use a simple MLP to output a visual concept as the answer in a flat label space that treats all labels equally, causing limitations in representing and using the semantic meanings of labels. To address this issue, we propose a novel visual recognition module named Dynamic Concept Recognizer (DCR), which is easy to plug into an attention-based VQA model, to utilize the semantics of the labels in answer prediction. Concretely, we introduce two key features in DCR: 1) a novel structural label space to depict the difference of semantics between concepts, where the labels in the new label space are assigned to different groups according to their meanings. This type of semantic information helps decompose the visual recognizer in VQA into multiple specialized sub-recognizers to improve the capacity and efficiency of the recognizer. 2) A feature attention mechanism to capture the similarity between relevant groups of concepts, e.g., the human-related group “chef, waiter” is more related to “swimming, running, etc.” than the scene-related group “sunny, rainy, etc.”. This type of semantic information helps sub-recognizers for relevant groups to adaptively share part of their modules and to share knowledge between relevant sub-recognizers to facilitate the learning procedure. Extensive experiments on several datasets have shown that the proposed structural label space and DCR module can efficiently learn visual concept recognition and benefit the performance of the VQA model.
- Published
- 2020
- Full Text
- View/download PDF
11. AttGAN: Facial Attribute Editing by Only Changing What You Want
- Author
- Wangmeng Zuo, Meina Kan, Shiguang Shan, Xilin Chen, and Zhenliang He
- Subjects
FOS: Computer and information sciences; business.industry; Computer science; Computer Vision and Pattern Recognition (cs.CV); media_common.quotation_subject; Computer Science - Computer Vision and Pattern Recognition; Representation (systemics); Machine Learning (stat.ML); Pattern recognition; 02 engineering and technology; Computer Graphics and Computer-Aided Design; Image (mathematics); Task (project management); Constraint (information theory); Text mining; Statistics - Machine Learning; Face (geometry); 0202 electrical engineering, electronic engineering, information engineering; Task analysis; 020201 artificial intelligence & image processing; Quality (business); Artificial intelligence; business; Software; media_common
- Abstract
Facial attribute editing aims to manipulate single or multiple attributes of a face image, i.e., to generate a new face with desired attributes while preserving other details. Recently, the generative adversarial net (GAN) and the encoder-decoder architecture have usually been combined to handle this task with promising results. Based on the encoder-decoder architecture, facial attribute editing is achieved by decoding the latent representation of the given face conditioned on the desired attributes. Some existing methods attempt to establish an attribute-independent latent representation for further attribute editing. However, such an attribute-independent constraint on the latent representation is excessive because it restricts the capacity of the latent representation and may result in information loss, leading to over-smooth and distorted generation. Instead of imposing constraints on the latent representation, in this work we apply an attribute classification constraint to the generated image to just guarantee the correct change of desired attributes, i.e., to "change what you want". Meanwhile, reconstruction learning is introduced to preserve attribute-excluding details, in other words, to "only change what you want". In addition, adversarial learning is employed for visually realistic editing. These three components cooperate with each other, forming an effective framework for high-quality facial attribute editing, referred to as AttGAN. Furthermore, our method is also directly applicable to attribute intensity control and can be naturally extended to attribute style manipulation. Experiments on the CelebA dataset show that our method outperforms the state of the art on realistic attribute editing with facial details well preserved. Comment: Submitted to IEEE Transactions on Image Processing. Code: https://github.com/LynnHo/AttGAN-Tensorflow
- Published
- 2019
- Full Text
- View/download PDF
12. A Novel Sign Language Recognition Framework Using Hierarchical Grassmann Covariance Matrix
- Author
- Xiujuan Chai, Hanjie Wang, and Xilin Chen
- Subjects
Covariance matrix; business.industry; Computer science; Feature extraction; Pattern recognition; 02 engineering and technology; Sign language; Covariance; Belief propagation; Manifold; Computer Science Applications; Discriminative model; Gesture recognition; Signal Processing; 0202 electrical engineering, electronic engineering, information engineering; Media Technology; 020201 artificial intelligence & image processing; Artificial intelligence; Electrical and Electronic Engineering; business; Hidden Markov model; Sign (mathematics)
- Abstract
Visual sign language recognition is an interesting and challenging problem. To create a discriminative representation, a hierarchical Grassmann covariance matrix (HGCM) model is proposed for sign description. Furthermore, a multi-temporal belief propagation (MTBP) based segmentation approach is presented for continuous sequence spotting. Concretely speaking, a sign is represented by multiple covariance matrices, followed by evaluating and selecting their most significant singular vectors. These covariance matrices are transformed into a more compact and discriminative HGCM, which is formulated on the Grassmann manifold. Continuous sign sequences can be recognized frame by frame using the HGCM model, before being optimized by MTBP, which is a carefully designed graphic model. The proposed method is thoroughly evaluated on isolated and synthetic and real continuous sign datasets as well as on HDM05. Extensive experimental results convincingly show the effectiveness of our proposed framework.
- Published
- 2019
- Full Text
- View/download PDF
13. Tattoo Image Search at Scale: Joint Detection and Compact Representation Learning
- Author
- Anil K. Jain, Xilin Chen, Hu Han, Jie Li, and Shiguang Shan
- Subjects
FOS: Computer and information sciences; Biometrics; business.industry; Computer science; Computer Vision and Pattern Recognition (cs.CV); Applied Mathematics; Feature extraction; Computer Science - Computer Vision and Pattern Recognition; Convolutional neural network; Facial recognition system; GeneralLiterature_MISCELLANEOUS; Identification (information); Computational Theory and Mathematics; Artificial Intelligence; Feature (computer vision); Computer vision; Computer Vision and Pattern Recognition; Artificial intelligence; business; Image retrieval; Feature learning; Software
- Abstract
The explosive growth of digital images in video surveillance and social media has led to the significant need for efficient search of persons of interest in law enforcement and forensic applications. Despite tremendous progress in primary biometric traits (e.g., face and fingerprint) based person identification, a single biometric trait alone cannot meet the desired recognition accuracy in forensic scenarios. Tattoos, as one of the important soft biometric traits, have been found to be valuable for assisting in person identification. However, tattoo search in a large collection of unconstrained images remains a difficult problem, and existing tattoo search methods mainly focus on matching cropped tattoos, which is different from real application scenarios. To close the gap, we propose an efficient tattoo search approach that is able to learn tattoo detection and compact representation jointly in a single convolutional neural network (CNN) via multi-task learning. While the features in the backbone network are shared by both tattoo detection and compact representation learning, individual latent layers of each sub-network optimize the shared features toward the detection and feature learning tasks, respectively. We resolve the small batch size issue inside the joint tattoo detection and compact representation learning network via random image stitch and preceding feature buffering. We evaluate the proposed tattoo search system using multiple public-domain tattoo benchmarks, and a gallery set with about 300K distracter tattoo images compiled from these datasets and images from the Internet. In addition, we also introduce a tattoo sketch dataset containing 300 tattoos for sketch-based tattoo search. Experimental results show that the proposed approach has superior performance in tattoo detection and tattoo search at scale compared to several state-of-the-art tattoo retrieval algorithms. Comment: Technical Report (15 pages, 14 figures)
- Published
- 2019
- Full Text
- View/download PDF
14. Adaptive Metric Learning For Zero-Shot Recognition
- Author
- Shiguang Shan, Ruiping Wang, Xilin Chen, and Huajie Jiang
- Subjects
Computer science; business.industry; Applied Mathematics; 020206 networking & telecommunications; 02 engineering and technology; Machine learning; computer.software_genre; Zero shot learning; Popularity; Visualization; Signal Processing; 0202 electrical engineering, electronic engineering, information engineering; Task analysis; Embedding; Artificial intelligence; Electrical and Electronic Engineering; business; computer
- Abstract
Zero-shot learning (ZSL) has enjoyed great popularity in recent years due to its ability to recognize novel objects, where semantic information is exploited to build up relations among different categories. Traditional ZSL approaches usually focus on learning more robust visual-semantic embeddings among seen classes and directly apply them to the unseen classes without considering whether they are suitable. It is well known that a domain gap exists between seen and unseen classes. To tackle this problem, we propose a novel adaptive metric learning approach to measure the compatibility between visual samples and class semantics, where class similarities are utilized to adapt the visual-semantic embedding to the unseen classes. Extensive experiments on four benchmark ZSL datasets show the effectiveness of the proposed approach.
- Published
- 2019
- Full Text
- View/download PDF
15. Unifying Visual Attribute Learning with Object Recognition in a Multiplicative Framework
- Author
- Xilin Chen, Kongming Liang, Hong Chang, Shiguang Shan, and Bingpeng Ma
- Subjects
Computer science; business.industry; Applied Mathematics; Deep learning; Feature vector; Cognitive neuroscience of visual object recognition; 02 engineering and technology; Semantic property; Machine learning; computer.software_genre; Visualization; Text mining; Computational Theory and Mathematics; Artificial Intelligence; 0202 electrical engineering, electronic engineering, information engineering; Task analysis; Leverage (statistics); 020201 artificial intelligence & image processing; Computer Vision and Pattern Recognition; Artificial intelligence; business; Classifier (UML); computer; Software
- Abstract
Attributes are mid-level semantic properties of objects. Recent research has shown that visual attributes can benefit many typical learning problems in the computer vision community. However, attribute learning is still a challenging problem, as the attributes may not always be predictable directly from input images and the variation of visual attributes is sometimes large across categories. In this paper, we propose a unified multiplicative framework for attribute learning, which tackles these key problems. Specifically, images and category information are jointly projected into a shared feature space, where the latent factors are disentangled and multiplied to perform attribute prediction. The resulting attribute classifier is category-specific instead of being shared by all categories. Moreover, our model can leverage auxiliary data to enhance the predictive ability of attribute classifiers, which can reduce the effort of instance-level attribute annotation to some extent. When integrated into an existing deep learning framework, our model can both accurately predict attributes and learn efficient image representations. Experimental results show that our method achieves superior performance on both instance-level and category-level attribute prediction. For zero-shot learning based on visual attributes and human-object interaction recognition, our method can improve the state-of-the-art performance on several widely used datasets.
- Published
- 2019
- Full Text
- View/download PDF
16. Heterogeneous Face Attribute Estimation: A Deep Multi-Task Learning Approach
- Author
- Xilin Chen, Hu Han, Shiguang Shan, Anil K. Jain, and Fang Wang
- Subjects
Adult; Male; FOS: Computer and information sciences; Adolescent; Databases, Factual; Generalization; Computer science; Computer Vision and Pattern Recognition (cs.CV); Computer Science - Computer Vision and Pattern Recognition; 0211 other engineering and technologies; Multi-task learning; 02 engineering and technology; Crowdsourcing; Machine learning; computer.software_genre; Convolutional neural network; Young Adult; Deep Learning; Text mining; Artificial Intelligence; 0202 electrical engineering, electronic engineering, information engineering; Feature (machine learning); Humans; 021110 strategic, defence & security studies; business.industry; Applied Mathematics; Facial Expression; Support vector machine; Computational Theory and Mathematics; Biometric Identification; Face; Face (geometry); Female; 020201 artificial intelligence & image processing; Neural Networks, Computer; Computer Vision and Pattern Recognition; Artificial intelligence; business; Feature learning; computer; Software
- Abstract
Face attribute estimation has many potential applications in video surveillance, face retrieval, and social media. While a number of methods have been proposed for face attribute estimation, most of them did not explicitly consider the attribute correlation and heterogeneity (e.g., ordinal vs. nominal and holistic vs. local) during feature representation learning. In this paper, we present a Deep Multi-Task Learning (DMTL) approach to jointly estimate multiple heterogeneous attributes from a single face image. In DMTL, we tackle attribute correlation and heterogeneity with convolutional neural networks (CNNs) consisting of shared feature learning for all the attributes, and category-specific feature learning for heterogeneous attributes. We also introduce an unconstrained face database (LFW+), an extension of public-domain LFW, with heterogeneous demographic attributes (age, gender, and race) obtained via crowdsourcing. Experimental results on benchmarks with multiple face attributes (MORPH II, LFW+, CelebA, LFWA, and FotW) show that the proposed approach has superior performance compared to the state of the art. Finally, evaluations on a public-domain face database (LAP) with a single attribute show that the proposed approach has excellent generalization ability. Comment: To appear in the IEEE Trans. Pattern Analysis and Machine Intelligence (final)
- Published
- 2018
- Full Text
- View/download PDF
17. Prototype Discriminative Learning for Image Set Classification
- Author
- Xilin Chen, Shiguang Shan, Wen Wang, and Ruiping Wang
- Subjects
0209 industrial biotechnology; business.industry; Euclidean space; Applied Mathematics; Pattern recognition; 02 engineering and technology; k-nearest neighbors algorithm; Stiefel manifold; ComputingMethodologies_PATTERNRECOGNITION; 020901 industrial engineering & automation; Discriminative model; Robustness (computer science); Affine hull; Signal Processing; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; Electrical and Electronic Engineering; Gradient descent; business; Subspace topology; Mathematics
- Abstract
This letter presents a prototype discriminative learning (PDL) method for image set classification. We aim to simultaneously learn prototypes and a linear discriminative projection such that, in the target subspace, each image set can be discriminated using its nearest neighbor prototype. To reveal the unseen appearance variations implicit in an image set, the prototypes are actually “virtual”: they do not necessarily appear in the set but are searched for in the corresponding affine hull. Moreover, to enhance the stability and robustness of the learned target subspace, an orthogonality constraint is imposed on the projection. Thus, to optimize the prototypes and the projection jointly, we design a specific gradient descent mechanism that updates the projection on the Stiefel manifold and the prototypes in Euclidean space in an alternating optimization manner. Experimental results on four challenging databases demonstrate the superiority of the proposed PDL method.
- Published
- 2017
- Full Text
- View/download PDF
18. Learning Expressionlets via Universal Manifold Model for Dynamic Facial Expression Recognition
- Author
- Xilin Chen, Shiguang Shan, Mengyi Liu, and Ruiping Wang
- Subjects
FOS: Computer and information sciences; Facial expression; Computer science; business.industry; Computer Vision and Pattern Recognition (cs.CV); Computer Science - Computer Vision and Pattern Recognition; 020207 software engineering; Pattern recognition; 02 engineering and technology; Computer Graphics and Computer-Aided Design; Manifold; Expression (mathematics); law.invention; Set (abstract data type); Discriminative model; Margin (machine learning); law; 0202 electrical engineering, electronic engineering, information engineering; Embedding; 020201 artificial intelligence & image processing; Artificial intelligence; Representation (mathematics); business; Manifold (fluid mechanics); Software
- Abstract
Facial expression is a temporally dynamic event which can be decomposed into a set of muscle motions occurring in different facial regions over various time intervals. For dynamic expression recognition, two key issues, temporal alignment and semantics-aware dynamic representation, must be taken into account. In this paper, we attempt to solve both problems via manifold modeling of videos based on a novel mid-level representation, i.e., the expressionlet. Specifically, our method contains three key stages: 1) each expression video clip is characterized as a spatial-temporal manifold (STM) formed by dense low-level features; 2) a Universal Manifold Model (UMM) is learned over all low-level features and represented as a set of local modes to statistically unify all the STMs; 3) the local modes on each STM can be instantiated by fitting to the UMM, and the corresponding expressionlet is constructed by modeling the variations in each local mode. With the above strategy, expression videos are naturally aligned both spatially and temporally. To enhance the discriminative power, the expressionlet-based STM representation is further processed with discriminant embedding. Our method is evaluated on four public expression databases, CK+, MMI, Oulu-CASIA, and FERA. In all cases, our method outperforms the known state of the art by a large margin. Comment: 12 pages
- Published
- 2016
- Full Text
- View/download PDF
19. A Benchmark and Comparative Study of Video-Based Face Recognition on COX Face Database
- Author
-
Ruiping Wang, Haihong Zhang, Shihong Lao, Alifu Kuerban, Xilin Chen, Zhiwu Huang, and Shiguang Shan
- Subjects
Male ,Models, Statistical ,Databases, Factual ,Biometrics ,Database ,Computer science ,Video Recording ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Image processing ,computer.software_genre ,Computer Graphics and Computer-Aided Design ,Facial recognition system ,Benchmarking ,Object-class detection ,Biometric Identification ,Face ,Face (geometry) ,Video tracking ,Image Processing, Computer-Assisted ,Benchmark (computing) ,Humans ,Female ,computer ,Software - Abstract
Face recognition with still face images has been widely studied, while research on video-based face recognition is relatively inadequate, especially in terms of benchmark datasets and comparisons. Real-world video-based face recognition applications require techniques for three distinct scenarios: 1) Video-to-Still (V2S); 2) Still-to-Video (S2V); and 3) Video-to-Video (V2V), respectively, taking video or still image as query or target. To the best of our knowledge, few datasets and evaluation protocols have been benchmarked for all three scenarios. In order to facilitate the study of this specific topic, this paper contributes a benchmarking and comparative study based on a newly collected still/video face database, named COX Face DB. Specifically, we make three contributions. First, we collect and release a large-scale still/video face database to simulate video surveillance with three different video-based face recognition scenarios (i.e., V2S, S2V, and V2V). Second, for benchmarking the three scenarios designed on our database, we review and experimentally compare a number of existing set-based methods. Third, we further propose a novel Point-to-Set Correlation Learning (PSCL) method, and experimentally show that it can be used as a promising baseline method for V2S/S2V face recognition on COX Face DB. Extensive experimental results clearly demonstrate that video-based face recognition needs more effort, and our COX Face DB is a good benchmark database for evaluation. Note: COX Face DB was constructed by the Institute of Computing Technology, Chinese Academy of Sciences (CAS), under the sponsorship of OMRON Social Solutions Co. Ltd. (OSS), and with the support of Xinjiang University.
- Published
- 2015
- Full Text
- View/download PDF
20. Flowing on Riemannian Manifold: Domain Adaptation by Shifting Covariance
- Author
-
Shiguang Shan, Dong Xu, Xilin Chen, Wen Li, Xuelong Li, and Zhen Cui
- Subjects
Geodesic ,Covariance matrix ,business.industry ,Cognitive neuroscience of visual object recognition ,Pattern recognition ,Covariance ,Riemannian manifold ,Computer Science Applications ,Human-Computer Interaction ,Support vector machine ,Control and Systems Engineering ,Principal component analysis ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Classifier (UML) ,Software ,Information Systems ,Mathematics - Abstract
Domain adaptation has shown promising results in computer vision applications. In this paper, we propose a new unsupervised domain adaptation method called domain adaptation by shifting covariance (DASC) for object recognition without requiring any labeled samples from the target domain. By characterizing samples from each domain as one covariance matrix, the source and target domains are represented as two distinct points residing on a Riemannian manifold. Along the geodesic constructed from the two points, we then interpolate some intermediate points (i.e., covariance matrices), which are used to bridge the two domains. By utilizing the principal components of each covariance matrix, samples from each domain are further projected into intermediate feature spaces, which finally leads to domain-invariant features after the concatenation of these features from intermediate points. In the multiple source domain adaptation task, we also need to effectively integrate different types of features between each pair of source and target domains. We additionally propose an SVM-based method to simultaneously learn the optimal target classifier as well as the optimal weights for different source domains. Extensive experiments demonstrate the effectiveness of our method for both single source and multiple source domain adaptation tasks.
- Published
- 2014
- Full Text
- View/download PDF
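The geodesic interpolation between two covariance matrices described in the DASC abstract above can be sketched as follows. This is only an illustration, not the authors' implementation: the abstract does not say which Riemannian metric is used, so the sketch assumes the affine-invariant metric on symmetric positive-definite matrices, and the function names (`covariance_geodesic`, `_spd_power`) and the synthetic data are hypothetical.

```python
import numpy as np


def _spd_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * eigvals**power) @ eigvecs.T


def covariance_geodesic(cov_source, cov_target, t):
    """Point at position t in [0, 1] on the geodesic from cov_source to cov_target.

    Assumes the affine-invariant metric (an assumption, not stated in the abstract):
        gamma(t) = A^(1/2) (A^(-1/2) B A^(-1/2))^t A^(1/2)
    """
    root = _spd_power(cov_source, 0.5)
    inv_root = _spd_power(cov_source, -0.5)
    middle = inv_root @ cov_target @ inv_root
    middle = (middle + middle.T) / 2.0  # symmetrize against round-off
    return root @ _spd_power(middle, t) @ root


# Synthetic "source" and "target" samples, each summarized by one covariance matrix.
rng = np.random.default_rng(0)
source = rng.normal(size=(200, 4))
target = rng.normal(size=(200, 4)) * 2.0 + 1.0
cov_s = np.cov(source, rowvar=False)
cov_t = np.cov(target, rowvar=False)

# Five covariance matrices bridging the two domains, endpoints included.
intermediates = [covariance_geodesic(cov_s, cov_t, t) for t in np.linspace(0.0, 1.0, 5)]
```

Each intermediate matrix stays symmetric positive-definite, so its principal components could then be used to project samples into an intermediate feature space, as the abstract describes.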
21. Semisupervised Hashing via Kernel Hyperplane Learning for Scalable Image Search
- Author
-
Dong Xu, Xilin Chen, Meina Kan, and Shiguang Shan
- Subjects
Computer Science::Machine Learning ,Multiple kernel learning ,business.industry ,Hash function ,Pattern recognition ,Machine learning ,computer.software_genre ,Statistics::Machine Learning ,ComputingMethodologies_PATTERNRECOGNITION ,Kernel (image processing) ,Polynomial kernel ,Kernel embedding of distributions ,String kernel ,Radial basis function kernel ,Media Technology ,Artificial intelligence ,Electrical and Electronic Engineering ,Tree kernel ,business ,computer ,Mathematics - Abstract
Hashing methods that aim to seek a compact binary code for each image are demonstrated to be efficient for scalable content-based image retrieval. In this paper, we propose a new hashing method called semisupervised kernel hyperplane learning (SKHL) for semantic image retrieval by modeling each hashing function as a nonlinear kernel hyperplane constructed from an unlabeled dataset. Moreover, a Fisher-like criterion is proposed to learn the optimal kernel hyperplanes and hashing functions, using only weakly labeled training samples with side information. To further integrate different types of features, we also incorporate multiple kernel learning (MKL) into the proposed SKHL (called SKHL-MKL), leading to better hashing functions. Comprehensive experiments on CIFAR-100 and NUS-WIDE datasets demonstrate the effectiveness of our SKHL and SKHL-MKL.
- Published
- 2014
- Full Text
- View/download PDF
22. Learning Prototype Hyperplanes for Face Verification in the Wild
- Author
-
Dong Xu, Xilin Chen, Shiguang Shan, Meina Kan, and Wen Li
- Subjects
Biometry ,Feature extraction ,Sensitivity and Specificity ,Pattern Recognition, Automated ,Set (abstract data type) ,Imaging, Three-Dimensional ,Artificial Intelligence ,Image Interpretation, Computer-Assisted ,Feature (machine learning) ,Humans ,Mathematics ,business.industry ,Dimensionality reduction ,Cosine similarity ,Reproducibility of Results ,Pattern recognition ,Image Enhancement ,Linear discriminant analysis ,Computer Graphics and Computer-Aided Design ,Support vector machine ,ComputingMethodologies_PATTERNRECOGNITION ,Hyperplane ,Face ,Subtraction Technique ,Artificial intelligence ,business ,Algorithms ,Software - Abstract
In this paper, we propose a new scheme called Prototype Hyperplane Learning (PHL) for face verification in the wild using only weakly labeled training samples (i.e., we only know whether each pair of samples are from the same class or different classes without knowing the class label of each sample) by leveraging a large number of unlabeled samples in a generic data set. Our scheme represents each sample in the weakly labeled data set as a mid-level feature with each entry as the corresponding decision value from the classification hyperplane (referred to as the prototype hyperplane) of one Support Vector Machine (SVM) model, in which a sparse set of support vectors is selected from the unlabeled generic data set based on the learnt combination coefficients. To learn the optimal prototype hyperplanes for the extraction of mid-level features, we propose a Fisher’s Linear Discriminant-like (FLD-like) objective function by maximizing the discriminability on the weakly labeled data set with a constraint enforcing sparsity on the combination coefficients of each SVM model, which is solved by using an alternating optimization method. Then, we use the recent work called Side-Information based Linear Discriminant (SILD) analysis for dimensionality reduction and a cosine similarity measure for final face verification. Comprehensive experiments on two data sets, Labeled Faces in the Wild (LFW) and YouTube Faces, demonstrate the effectiveness of our scheme.
- Published
- 2013
- Full Text
- View/download PDF
23. A Concatenational Graph Evolution Aging Model
- Author
-
Jinli Suo, Qionghai Dai, Shiguang Shan, Xilin Chen, and Wen Gao
- Subjects
Models, Anatomic ,Aging ,Computer science ,Markov process ,Machine learning ,computer.software_genre ,Models, Biological ,Facial recognition system ,Pattern Recognition, Automated ,Data modeling ,symbols.namesake ,Artificial Intelligence ,Humans ,Computer Simulation ,Models, Statistical ,business.industry ,Applied Mathematics ,Probabilistic logic ,Graph theory ,Active appearance model ,Computational Theory and Mathematics ,Face ,symbols ,Graph (abstract data type) ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Analysis of variance ,business ,computer ,Software - Abstract
Modeling the long-term face aging process is of great importance for face recognition and animation, but there is a lack of sufficient long-term face aging sequences for model learning. To address this problem, we propose a CONcatenational GRaph Evolution (CONGRE) aging model, which adopts a decomposition strategy in both spatial and temporal aspects to learn long-term aging patterns from partially dense aging databases. In the spatial aspect, we build a graphical face representation, in which a human face is decomposed into mutually interrelated subregions under anatomical guidance. In the temporal aspect, the long-term evolution of the above graphical representation is then modeled by connecting sequential short-term patterns following the Markov property of the aging process under smoothness constraints between neighboring short-term patterns and consistency constraints among subregions. The proposed model also considers the diversity of face aging by proposing a probabilistic concatenation strategy between short-term patterns and applying stochastic sampling in aging prediction. In experiments, the aging prediction results generated by the learned aging models are evaluated both subjectively and objectively to validate the proposed model.
- Published
- 2012
- Full Text
- View/download PDF
24. Maximal Linear Embedding for Dimensionality Reduction
- Author
-
Wen Gao, Xilin Chen, Jie Chen, Ruiping Wang, and Shiguang Shan
- Subjects
business.industry ,Iterative method ,Applied Mathematics ,Dimensionality reduction ,Nonlinear dimensionality reduction ,Pattern recognition ,Local optimum ,Computational Theory and Mathematics ,Artificial Intelligence ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Coordinate space ,business ,Isomap ,Software ,Eigenvalues and eigenvectors ,Mathematics ,Parametric statistics - Abstract
Over the past few decades, dimensionality reduction has been widely exploited in computer vision and pattern analysis. This paper proposes a simple but effective nonlinear dimensionality reduction algorithm, named Maximal Linear Embedding (MLE). MLE learns a parametric mapping to recover a single global low-dimensional coordinate space and yields an isometric embedding for the manifold. Inspired by geometric intuition, we introduce a reasonable definition of a locally linear patch, the Maximal Linear Patch (MLP), which seeks to maximize the local neighborhood in which linearity holds. The input data are first decomposed into a collection of local linear models, each depicting an MLP. These local models are then aligned into a global coordinate space, which is achieved by applying MDS to some randomly selected landmarks. The proposed alignment method, called Landmarks-based Global Alignment (LGA), can efficiently produce a closed-form solution with no risk of local optima. It involves only some small-scale eigenvalue problems, while most previous aligning techniques employ time-consuming iterative optimization. Compared with traditional methods such as ISOMAP and LLE, our MLE yields an explicit modeling of the intrinsic variation modes of the observation data. Extensive experiments on both synthetic and real data indicate the effectiveness and efficiency of the proposed algorithm.
- Published
- 2011
- Full Text
- View/download PDF
25. High-Resolution Face Fusion for Gender Conversion
- Author
-
Wen Gao, Liang Lin, Shiguang Shan, Xilin Chen, and Jinli Suo
- Subjects
Image fusion ,Face hallucination ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Pattern recognition ,Facial recognition system ,Computer Science Applications ,Active appearance model ,Human-Computer Interaction ,Transformation (function) ,Image texture ,Control and Systems Engineering ,Face (geometry) ,Computer vision ,Graphical model ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Image resolution ,Software - Abstract
This paper presents an integrated face image fusion framework, which combines a hierarchical compositional paradigm with seamless image-editing techniques, for gender conversion. In our framework, a high-resolution face is represented by a probabilistic graphical model that decomposes a human face into several parts (facial components) constrained by explicit spatial configurations (relationships). Benefiting from this representation, the proposed fusion strategy is able to largely preserve the face identity of each facial component while applying gender transformation. Given a face image, the basic idea is to select reference facial components from the opposite-gender group as templates and transform the appearance of the given image toward the selected facial components. Our fusion approach decomposes a face image into two parts: sketchable and nonsketchable regions. For the sketchable regions (e.g., the contours of facial components and wrinkle lines), we use a graph-matching algorithm to find the best templates and transform the structure (shape), while for the nonsketchable regions (e.g., the texture area of facial components and skin), we learn active appearance models and transform the texture attributes in the corresponding principal component analysis space. Both objective and subjective quantitative evaluation results on 200 Asian frontal-face images selected from the public Lotus Hill Image database show that the proposed approach is able to give plausible gender conversion results.
- Published
- 2011
- Full Text
- View/download PDF
26. IEEE Standards for Advanced Audio and Video Coding in Emerging Applications
- Author
-
Cliff Reader, Wen Gao, Xilin Chen, Tiejun Huang, and Weibei Dou
- Subjects
General Computer Science ,Multimedia ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,The Internet ,Dynamic range compression ,business ,computer.software_genre ,computer ,Computer network ,Coding (social sciences) ,Data compression - Abstract
The IEEE audio- and video-coding standards family includes updated tools that can be configured to serve new applications, such as surveillance, Internet, and intelligent systems video.
- Published
- 2014
- Full Text
- View/download PDF
27. WLD: A Robust Local Image Descriptor
- Author
-
Shiguang Shan, Wen Gao, Matti Pietikäinen, Guoying Zhao, Chu He, Jie Chen, and Xilin Chen
- Subjects
Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Scale-invariant feature transform ,Facial recognition system ,Pattern Recognition, Automated ,Imaging, Three-Dimensional ,Gabor filter ,Image texture ,Artificial Intelligence ,Histogram ,Image Interpretation, Computer-Assisted ,Computer vision ,Face detection ,Image resolution ,Pixel ,business.industry ,Applied Mathematics ,Image Enhancement ,Object detection ,Computational Theory and Mathematics ,Subtraction Technique ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Algorithms ,Software - Abstract
Inspired by Weber's Law, this paper proposes a simple, yet very powerful and robust local descriptor, called the Weber Local Descriptor (WLD). It is based on the fact that human perception of a pattern depends not only on the change of a stimulus (such as sound, lighting) but also on the original intensity of the stimulus. Specifically, WLD consists of two components: differential excitation and orientation. The differential excitation component is a function of the ratio between two terms: One is the relative intensity differences of a current pixel against its neighbors, the other is the intensity of the current pixel. The orientation component is the gradient orientation of the current pixel. For a given image, we use the two components to construct a concatenated WLD histogram. Experimental results on the Brodatz and KTH-TIPS2-a texture databases show that WLD impressively outperforms the other widely used descriptors (e.g., Gabor and SIFT). In addition, experimental results on human face detection also show a promising performance comparable to the best known results on the MIT+CMU frontal face test set, the AR face data set, and the CMU profile test set.
- Published
- 2010
- Full Text
- View/download PDF
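The differential excitation component described in the WLD abstract above (the ratio between a pixel's summed intensity differences against its neighbors and the pixel's own intensity) can be sketched as follows. This is a minimal illustration under stated assumptions: the 3x3 neighborhood, the `arctan` squashing of the ratio, the edge padding, and the `eps` guard against division by zero are all choices made for this sketch rather than details given in the abstract, and the function name is hypothetical.

```python
import numpy as np


def differential_excitation(image, eps=1e-6):
    """Per-pixel differential excitation over a 3x3 neighborhood (illustrative sketch).

    For each pixel x_c with neighbors x_i, computes
        arctan( sum_i (x_i - x_c) / x_c ).
    The arctan and the eps term are assumptions of this sketch.
    """
    img = image.astype(np.float64)
    height, width = img.shape
    padded = np.pad(img, 1, mode="edge")  # replicate borders so every pixel has 8 neighbors

    neighbor_sum = np.zeros_like(img)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # skip the center pixel itself
            neighbor_sum += padded[1 + dy:1 + dy + height, 1 + dx:1 + dx + width]

    # Sum of differences equals (sum of the 8 neighbors) minus 8 times the center.
    difference_sum = neighbor_sum - 8.0 * img
    return np.arctan(difference_sum / (img + eps))


# A flat patch with one brighter pixel in the middle.
patch = np.full((5, 5), 10.0)
patch[2, 2] = 20.0
excitation = differential_excitation(patch)
```

A pixel brighter than its surroundings gets a negative value and a darker one a positive value; in a descriptor of this kind, such values would be quantized together with the gradient orientation to build the histogram the abstract mentions.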
28. Sigma Set Based Implicit Online Learning for Object Tracking
- Author
-
Shiguang Shan, Xilin Chen, Xiaopeng Hong, Bineng Zhong, Hong Chang, and Wen Gao
- Subjects
Computer science ,business.industry ,Covariance matrix ,Applied Mathematics ,Bayesian probability ,Inference ,Pattern recognition ,Bayesian inference ,Machine learning ,computer.software_genre ,Object detection ,Support vector machine ,Robustness (computer science) ,Video tracking ,Signal Processing ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Particle filter ,computer - Abstract
This letter presents a novel object tracking approach within the Bayesian inference framework through implicit online learning. In our approach, the target is represented by multiple patches, each of which is encoded by a powerful and efficient region descriptor called Sigma set. To model each target patch, we propose to utilize the online one-class support vector machine algorithm, named Implicit online Learning with Kernels Model (ILKM). ILKM is simple, efficient, and capable of learning a robust online target predictor in the presence of appearance changes. Responses of ILKMs related to multiple target patches are fused by an arbitrator with an inference of possible partial occlusions, to make the decision and trigger the model update. Experimental results demonstrate that the proposed tracking approach is effective and efficient in ever-changing and cluttered scenes.
- Published
- 2010
- Full Text
- View/download PDF
29. Adaptive Sign Language Recognition With Exemplar Extraction and MAP/IVFS
- Author
-
Wen Gao, Debin Zhao, Yu Zhou, Xilin Chen, and Hongxun Yao
- Subjects
Vocabulary ,Computer science ,business.industry ,Applied Mathematics ,Speech recognition ,media_common.quotation_subject ,Feature extraction ,Pattern recognition ,Sign language ,Gesture recognition ,Signal Processing ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Hidden Markov model ,Sign (mathematics) ,media_common ,Gesture - Abstract
Sign language recognition systems suffer from the problem of signer dependence. In this letter, we propose a novel method that adapts the original model set to a specific signer using a small amount of his/her training data. First, affinity propagation is used to extract the exemplars of signer-independent hidden Markov models; then the adaptive training vocabulary can be automatically formed. Based on the collected sign gestures of the new vocabulary, the combination of maximum a posteriori and iterative vector field smoothing is utilized to generate signer-adapted models. Experimental results on six signers demonstrate that the proposed method can reduce the amount of adaptation data and still achieve high recognition performance.
- Published
- 2010
- Full Text
- View/download PDF
30. Low-Resolution Face Recognition via Coupled Locality Preserving Mappings
- Author
-
Xilin Chen, Hong Chang, Shiguang Shan, and Bo Li
- Subjects
Contextual image classification ,business.industry ,Applied Mathematics ,Dimensionality reduction ,Feature vector ,Locality ,Pattern recognition ,Iterative reconstruction ,Facial recognition system ,Weighting ,Signal Processing ,Preprocessor ,Computer vision ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Mathematics - Abstract
Practical face recognition systems are sometimes confronted with low-resolution (LR) face images. Traditional two-step methods solve this problem by employing super-resolution (SR). However, these methods usually have limited performance because the target of SR is not absolutely consistent with that of face recognition. Moreover, time-consuming sophisticated SR algorithms are not suitable for real-time applications. To avoid these limitations, we propose a novel approach for LR face recognition without any SR preprocessing. Our method, based on coupled mappings (CMs), projects the face images with different resolutions into a unified feature space which favors the task of classification. These CMs are learned by optimizing the objective function to minimize the difference between the correspondences (i.e., a low-resolution image and its high-resolution counterpart). Inspired by locality preserving methods for dimensionality reduction, we introduce a penalty weighting matrix into our objective function. Our method significantly improves the recognition performance. Finally, we conduct experiments on publicly available databases to verify the efficacy of our algorithm.
- Published
- 2010
- Full Text
- View/download PDF
31. Aligning Coupled Manifolds for Face Hallucination
- Author
-
Xilin Chen, Shiguang Shan, Hong Chang, and Bo Li
- Subjects
Manifold alignment ,Face hallucination ,business.industry ,Applied Mathematics ,Pattern recognition ,Context (language use) ,Computational geometry ,Topology ,Manifold ,Face (geometry) ,Signal Processing ,Embedding ,Mathematics::Differential Geometry ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Representation (mathematics) ,Mathematics::Symplectic Geometry ,Mathematics - Abstract
Many learning-based super-resolution methods are based on the manifold assumption, which claims that point-pairs from the low-resolution representation manifold (LRM) and the corresponding high-resolution representation manifold (HRM) possess similar local geometry. However, the manifold assumption does not hold well on the original coupled manifolds (i.e., LRM and HRM) due to the nonisometric one-to-multiple mappings from low-resolution (LR) image patches to high-resolution (HR) ones. To overcome this limitation, we propose a solution from the perspective of manifold alignment. In this context, we perform alignment by learning two explicit mappings which project the point-pairs from the original coupled manifolds into the embeddings of the common manifold (CM). For the task of SR reconstruction, we treat HRM as target manifold and employ the manifold regularization to guarantee that the local geometry of CM is more consistent with that of HRM than LRM is. After alignment, we carry out the SR reconstruction based on neighbor embedding between the new couple of the CM and the target HRM. Besides, we extend our method by aligning the multiple coupled subsets instead of the whole coupled manifolds to address the issue of the global nonlinearity. Experimental results on face image super-resolution verify the effectiveness of our method.
- Published
- 2009
- Full Text
- View/download PDF
32. Facial Shape Localization Using Probability Gradient Hints
- Author
-
Shiguang Shan, Zhiheng Niu, and Xilin Chen
- Subjects
business.industry ,Applied Mathematics ,Posterior probability ,Pattern recognition ,Bayesian inference ,Facial recognition system ,Active appearance model ,Computer Science::Graphics ,Computer Science::Computer Vision and Pattern Recognition ,Active shape model ,Signal Processing ,Maximum a posteriori estimation ,Probability distribution ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Likelihood function ,Mathematics - Abstract
This letter proposes a novel method to localize facial shape represented by a series of facial landmarks. In our method, the problem of facial shape localization is formulated with a Bayesian inference. Specifically, given a face image, the posterior probability of the facial shape is naturally decomposed into two parts: the likelihood function of local textures and the prior constraints of global shape. The former is provided by the landmark detectors, while the latter is evaluated based on the global shape statistics. The global shape is iteratively estimated in the Maximum A Posteriori (MAP) procedure which is derived in a Lucas-Kanade manner over the probability distribution. Intuitively, in each step, the landmarks are driven by the probability gradient and converge towards the positions which maximize the posterior probability. Experiments on two public databases (XM2VTS and BioID) show the effectiveness of the proposed method.
- Published
- 2009
- Full Text
- View/download PDF
33. Head Yaw Estimation From Asymmetry of Facial Appearance
- Author
-
Wen Gao, Xilin Chen, Bingpeng Ma, and Shiguang Shan
- Subjects
Biometry ,media_common.quotation_subject ,Posture ,Feature extraction ,Sensitivity and Specificity ,Asymmetry ,Facial recognition system ,Pattern Recognition, Automated ,Imaging, Three-Dimensional ,Artificial Intelligence ,Image Interpretation, Computer-Assisted ,Humans ,Computer vision ,Electrical and Electronic Engineering ,Pose ,media_common ,Mathematics ,business.industry ,Nearest centroid classifier ,Yaw ,Reproducibility of Results ,Pattern recognition ,General Medicine ,Image Enhancement ,Linear discriminant analysis ,Computer Science Applications ,Human-Computer Interaction ,ComputingMethodologies_PATTERNRECOGNITION ,Control and Systems Engineering ,Face ,Face (geometry) ,Artificial intelligence ,business ,Head ,Algorithms ,Software ,Information Systems - Abstract
This paper proposes a novel method to estimate the head yaw rotations based on the asymmetry of 2-D facial appearance. In traditional appearance-based pose estimation methods, features are typically extracted holistically by subspace analysis such as principal component analysis, linear discriminant analysis (LDA), etc., which are not designed to directly model the pose variations. In this paper, we argue and reveal that the asymmetry in the intensities of each row of the face image is closely relevant to the yaw rotation of the head and, at the same time, evidently insensitive to the identity of the input face. Specifically, to extract the asymmetry information, 1-D Gabor filters and Fourier transform are exploited. LDA is further applied to the asymmetry features to enhance the discrimination ability. By using the simple nearest centroid classifier, experimental results on two multipose databases show that the proposed features outperform other features. In particular, the generalization of the proposed asymmetry features is verified by the impressive performance when the training and the testing data sets are heterogeneous.
- Published
- 2008
- Full Text
- View/download PDF
34. The CAS-PEAL Large-Scale Chinese Face Database and Baseline Evaluations
- Author
-
Xiaohua Zhang, Xilin Chen, Wen Gao, Debin Zhao, Bo Cao, Shiguang Shan, and Delong Zhou
- Subjects
Facial expression ,Database ,Computer science ,computer.software_genre ,Facial recognition system ,Expression (mathematics) ,Computer Science Applications ,Human-Computer Interaction ,Control and Systems Engineering ,Gesture recognition ,Face (geometry) ,Electrical and Electronic Engineering ,Pose ,computer ,Protocol (object-oriented programming) ,Software - Abstract
In this paper, we describe the acquisition and contents of a large-scale Chinese face database: the CAS-PEAL face database. The goals of creating the CAS-PEAL face database include the following: 1) providing face recognition researchers worldwide with different sources of variations, particularly pose, expression, accessories, and lighting (PEAL), and exhaustive ground-truth information in one uniform database; 2) advancing the state-of-the-art face recognition technologies aiming at practical applications by using off-the-shelf imaging equipment and by designing normal face variations in the database; and 3) providing a large-scale face database of Mongolian. Currently, the CAS-PEAL face database contains 99 594 images of 1040 individuals (595 males and 445 females). A total of nine cameras are mounted horizontally on an arc arm to simultaneously capture images across different poses. Each subject is asked to look straight ahead, up, and down to obtain 27 images in three shots. Five facial expressions, six accessories, and 15 lighting changes are also included in the database. A selected subset of the database (CAS-PEAL-R1, containing 30 863 images of the 1040 subjects) is available to other researchers now. We discuss the evaluation protocol based on the CAS-PEAL-R1 database and present the performance of four algorithms as a baseline to do the following: 1) elementarily assess the difficulty of the database for face recognition algorithms; 2) provide reference evaluation results for researchers using the database; and 3) identify the strengths and weaknesses of the commonly used algorithms.
- Published
- 2008
- Full Text
- View/download PDF
35. Local Gabor Binary Patterns Based on Kullback–Leibler Divergence for Partially Occluded Face Recognition
- Author
-
Wen Gao, Shiguang Shan, Wenchao Zhang, and Xilin Chen
- Subjects
Kullback–Leibler divergence ,business.industry ,Local binary patterns ,Applied Mathematics ,Gabor wavelet ,Feature extraction ,Pattern recognition ,Facial recognition system ,Face (geometry) ,Signal Processing ,Feature (machine learning) ,Computer vision ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Divergence (statistics) ,Mathematics - Abstract
Partial occlusion is one of the key issues in the face recognition community. To resolve the problem of partial occlusion, based on our previous work on local Gabor binary patterns (LGBP) for face recognition, we further propose Kullback-Leibler divergence (KLD)-based LGBP for partially occluded face recognition. The local property of LGBP face recognition is thoroughly used in the method, by introducing KLD between the LGBP feature of the local region and that of the non-occluded local region to estimate the probability of occlusion. The probability is used as the weight of the local region for the final feature matching. The experimental results on the AR face database demonstrate the effectiveness of the KLD-based LGBP face recognition method for partially occluded face images.
- Published
- 2007
- Full Text
- View/download PDF
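The idea in the KLD-based LGBP abstract above, comparing a local region's feature histogram against that of a non-occluded region and turning the divergence into a matching weight, can be sketched as follows. This is an illustration only: the abstract does not give the formula that maps the divergence to an occlusion probability or weight, so the exponential mapping in `region_weight`, the `scale` parameter, the `eps` smoothing, and the toy histograms are all hypothetical.

```python
import numpy as np


def kl_divergence(p_hist, q_hist, eps=1e-10):
    """Kullback-Leibler divergence D(p || q) between two histograms.

    Histograms are smoothed by eps (to avoid log of zero) and normalized to sum to 1.
    """
    p = np.asarray(p_hist, dtype=np.float64) + eps
    q = np.asarray(q_hist, dtype=np.float64) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))


def region_weight(kld, scale=1.0):
    """Hypothetical mapping from divergence to a matching weight in (0, 1].

    A larger divergence from the non-occluded reference gives a smaller weight.
    """
    return float(np.exp(-scale * kld))


# Toy histograms: a reference (non-occluded) region, a similar region, and a very different one.
reference = np.array([10, 20, 30, 40])
similar = np.array([11, 19, 31, 39])
occluded = np.array([70, 20, 5, 5])

weight_similar = region_weight(kl_divergence(similar, reference))
weight_occluded = region_weight(kl_divergence(occluded, reference))
```

With such weights, regions whose histograms deviate strongly from the non-occluded reference would contribute less to the final feature matching.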
36. Enhancing Human Face Detection by Resampling Examples Through Manifolds
- Author
-
Xilin Chen, Shiguang Shan, Jie Chen, Ruiping Wang, Shengye Yan, and Wen Gao
- Subjects
business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Pattern recognition ,Facial recognition system ,Object detection ,Computer Science Applications ,Human-Computer Interaction ,Support vector machine ,Control and Systems Engineering ,Face (geometry) ,Test set ,Computer vision ,AdaBoost ,Artificial intelligence ,Electrical and Electronic Engineering ,Isomap ,business ,Face detection ,Software ,Mathematics - Abstract
As a large-scale database of hundreds of thousands of face images collected from the Internet and digital cameras becomes available, how to utilize it to train a well-performing face detector is quite a challenging problem. In this paper, we propose a method to resample a representative training set from a collected large-scale database to train a robust human face detector. First, in a high-dimensional space, we estimate geodesic distances between pairs of face samples/examples inside the collected face set by isometric feature mapping (Isomap) and then subsample the face set. After that, we embed the face set into a low-dimensional manifold space and obtain the low-dimensional embedding. Subsequently, in the embedding, we interweave the face set based on the weights computed by locally linear embedding (LLE). Furthermore, we resample nonfaces by Isomap and LLE likewise. Using the resulting face and nonface samples, we train an AdaBoost-based face detector and run it on a large database to collect false alarms. We then use the false detections to train a one-class support vector machine (SVM). Combining the AdaBoost and one-class SVM-based face detector, we obtain a stronger detector. The experimental results on the MIT + CMU frontal face test set demonstrate that the proposed method significantly outperforms the other state-of-the-art methods.
- Published
- 2007
- Full Text
- View/download PDF
37. Detection of Text on Road Signs From Video
- Author
-
Xilin Chen, Jie Yang, and Wen Wu
- Subjects
Color image ,Computer science ,business.industry ,Mechanical Engineering ,Feature extraction ,Frame (networking) ,Object detection ,Computer Science Applications ,User assistance ,Feature (computer vision) ,Video tracking ,Automotive Engineering ,Hit rate ,Computer vision ,Artificial intelligence ,business - Abstract
A fast and robust framework for incrementally detecting text on road signs from video is presented in this paper. This new framework makes two main contributions. 1) The framework applies a divide-and-conquer strategy to decompose the original task into two subtasks, that is, the localization of road signs and the detection of text on the signs. The algorithms for the two subtasks are naturally incorporated into a unified framework through a feature-based tracking algorithm. 2) The framework provides a novel way to detect text from video by integrating two-dimensional (2-D) image features in each video frame (e.g., color, edges, texture) with the three-dimensional (3-D) geometric structure information of objects extracted from video sequence (such as the vertical plane property of road signs). The feasibility of the proposed framework has been evaluated using 22 video sequences captured from a moving vehicle. This new framework gives an overall text detection rate of 88.9% and a false hit rate of 9.2%. It can easily be applied to other tasks of text detection from video and potentially be embedded in a driver assistance system.
- Published
- 2005
- Full Text
- View/download PDF
38. Discriminant Analysis on Riemannian Manifold of Gaussian Distributions for Face Recognition with Image Sets
- Author
-
Xilin Chen, Ruiping Wang, Zhiwu Huang, Shiguang Shan, and Wen Wang
- Subjects
Gaussian ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,02 engineering and technology ,Riemannian geometry ,symbols.namesake ,Gaussian function ,0202 electrical engineering, electronic engineering, information engineering ,Information geometry ,Mathematics ,Manifold alignment ,business.industry ,020206 networking & telecommunications ,Pattern recognition ,Riemannian manifold ,Mixture model ,Linear discriminant analysis ,Computer Graphics and Computer-Aided Design ,Statistical manifold ,ComputingMethodologies_PATTERNRECOGNITION ,symbols ,020201 artificial intelligence & image processing ,Artificial intelligence ,Kernel Fisher discriminant analysis ,business ,Software ,Distribution (differential geometry) - Abstract
To address the problem of face recognition with image sets, we aim to capture the underlying data distribution in each set and thus facilitate more robust classification. To this end, we represent image set as the Gaussian mixture model (GMM) comprising a number of Gaussian components with prior probabilities and seek to discriminate Gaussian components from different classes. Since in the light of information geometry, the Gaussians lie on a specific Riemannian manifold, this paper presents a method named discriminant analysis on Riemannian manifold of Gaussian distributions (DARG). We investigate several distance metrics between Gaussians and accordingly two discriminative learning frameworks are presented to meet the geometric and statistical characteristics of the specific manifold. The first framework derives a series of provably positive definite probabilistic kernels to embed the manifold to a high-dimensional Hilbert space, where conventional discriminant analysis methods developed in Euclidean space can be applied, and a weighted Kernel discriminant analysis is devised which learns discriminative representation of the Gaussian components in GMMs with their prior probabilities as sample weights. Alternatively, the other framework extends the classical graph embedding method to the manifold by utilizing the distance metrics between Gaussians to construct the adjacency graph, and hence the original manifold is embedded to a lower-dimensional and discriminative target manifold with the geometric structure preserved and the interclass separability maximized. 
The proposed method is evaluated by face identification and verification tasks on four most challenging and largest databases, YouTube Celebrities, COX, YouTube Face DB, and Point-and-Shoot Challenge, to demonstrate its superiority over the state-of-the-art.
- Published
- 2017
- Full Text
- View/download PDF
Discovery Service for Jio Institute Digital Library