41 results for '"Ye, Mang"'
Search Results
2. Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval
- Author
- Jiang, Ding, primary and Ye, Mang, additional
- Published
- 2023
- Full Text
- View/download PDF
3. Unsupervised Visible-Infrared Person Re-Identification via Progressive Graph Matching and Alternate Learning
- Author
- Wu, Zesen, primary and Ye, Mang, additional
- Published
- 2023
- Full Text
- View/download PDF
4. Towards Modality-Agnostic Person Re-identification with Descriptive Query
- Author
- Chen, Cuiqun, primary, Ye, Mang, additional, and Jiang, Ding, additional
- Published
- 2023
- Full Text
- View/download PDF
5. Rethinking Federated Learning with Domain Shift: A Prototype View
- Author
- Huang, Wenke, primary, Ye, Mang, additional, Shi, Zekun, additional, Li, He, additional, and Du, Bo, additional
- Published
- 2023
- Full Text
- View/download PDF
6. Learnable Hierarchical Label Embedding and Grouping for Visual Intention Understanding.
- Author
- Shi, QingHongYa, Ye, Mang, Zhang, Ziyi, and Du, Bo
- Abstract
Visual intention understanding is to mine the potential and subjective intention behind the images, which includes the user's hidden emotions and perspectives. Due to the label ambiguity, this paper presents a novel learnable Hierarchical Label Embedding and Grouping (HLEG). It is featured in three aspects: 1) For effectively mining the underlying meaning of images, we build a hierarchical transformer structure to model the hierarchy of labels, formulating a multi-level classification scheme. 2) For the label ambiguity issue, we design a novel learnable label embedding with accumulative grouping integrated into the hierarchical structure, which does not require additional annotation. 3) For multi-level classification, we propose a “Hard-First” optimization strategy to adaptively adjust the classification optimization at different levels, avoiding over-classification of the coarse labels. HLEG enhances the F1 score (average +1.24%) and mAP (average +1.48%) on Intentonomy over prominent baseline models. Comprehensive experiments validate the superiority of our proposed method, achieving state-of-the-art performance under various settings. Code is available at https://github.com/ShiQingHongYa/HLEG. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
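The HLEG entry above (entry 6) describes a "Hard-First" strategy that adaptively adjusts the classification optimization at different label levels. As a rough, hypothetical sketch of that idea (the softmax weighting rule, the temperature parameter, and all names are assumptions, not the authors' exact formulation), per-level cross-entropy terms could be re-weighted by their current difficulty:

```python
import torch
import torch.nn.functional as F

def hard_first_loss(level_logits, level_targets, temperature=1.0):
    """Weight per-level classification losses so harder (higher-loss) levels
    receive larger weights. Hypothetical sketch, not the exact HLEG rule."""
    losses = torch.stack([
        F.cross_entropy(logits, targets)
        for logits, targets in zip(level_logits, level_targets)
    ])
    # Detach the weights so they steer optimization without being optimized away;
    # coarse levels that are already easy contribute less to the total loss.
    weights = torch.softmax(losses.detach() / temperature, dim=0)
    return (weights * losses).sum()
```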
7. Robust Federated Learning with Noisy and Heterogeneous Clients
- Author
- Fang, Xiuwen, primary and Ye, Mang, additional
- Published
- 2022
- Full Text
- View/download PDF
8. Learn from Others and Be Yourself in Heterogeneous Federated Learning
- Author
- Huang, Wenke, primary, Ye, Mang, additional, and Du, Bo, additional
- Published
- 2022
- Full Text
- View/download PDF
9. Cross-Modality Person Re-Identification via Modality Confusion and Center Aggregation
- Author
- Hao, Xin, primary, Zhao, Sanyuan, additional, Ye, Mang, additional, and Shen, Jianbing, additional
- Published
- 2021
- Full Text
- View/download PDF
10. The Multi-Modal Video Reasoning and Analyzing Competition
- Author
- Peng, Haoran, primary, Huang, He, additional, Xu, Li, additional, Li, Tianjiao, additional, Liu, Jun, additional, Rahmani, Hossein, additional, Ke, Qiuhong, additional, Guo, Zhicheng, additional, Wu, Cong, additional, Li, Rongchang, additional, Ye, Mang, additional, Wang, Jiahao, additional, Zhang, Jiaxu, additional, Liu, Yuanzhong, additional, He, Tao, additional, Zhang, Fuwei, additional, Liu, Xianbin, additional, and Lin, Tao, additional
- Published
- 2021
- Full Text
- View/download PDF
11. Channel Augmented Joint Learning for Visible-Infrared Recognition
- Author
- Ye, Mang, primary, Ruan, Weijian, additional, Du, Bo, additional, and Shou, Mike Zheng, additional
- Published
- 2021
- Full Text
- View/download PDF
12. Saliency and Granularity: Discovering Temporal Coherence for Video-Based Person Re-Identification.
- Author
- Chen, Cuiqun, Ye, Mang, Qi, Meibin, Wu, Jingjing, Liu, Yimin, and Jiang, Jianguo
- Subjects
- FEATURE extraction, NOISE measurement
- Abstract
Video-based person re-identification (ReID) matches the same people across the video sequences with rich spatial and temporal information in complex scenes. It is highly challenging to capture discriminative information when occlusions and pose variations exist between frames. A key solution to this problem rests on extracting the temporal invariant features of video sequences. In this paper, we propose a novel method for discovering temporal coherence by designing a region-level saliency and granularity mining network (SGMN). Firstly, to address the varying noisy frame problem, we design a temporal spatial-relation module (TSRM) to locate frame-level salient regions, adaptively modeling the temporal relations on spatial dimension through a probe-buffer mechanism. It avoids the information redundancy between frames and captures the informative cues of each frame. Secondly, a temporal channel-relation module (TCRM) is proposed to further mine the small granularity information of each frame, which is complementary to TSRM by concentrating on discriminative small-scale regions. TCRM exploits a one-and-rest difference relation on channel dimension to enhance the granularity features, leading to stronger robustness against misalignments. Finally, we evaluate our SGMN with four representative video-based datasets, including iLIDS-VID, MARS, DukeMTMC-VideoReID, and LS-VID, and the results indicate the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
13. Learning From Synthetic CT Images via Test-Time Training for Liver Tumor Segmentation.
- Author
- Lyu, Fei, Ye, Mang, Ma, Andy J., Yip, Terry Cheuk-Fung, Wong, Grace Lai-Hung, and Yuen, Pong C.
- Subjects
- LIVER tumors, COMPUTED tomography, DEEP learning, TUMOR diagnosis, IMAGE segmentation
- Abstract
Automatic liver tumor segmentation could offer assistance to radiologists in liver tumor diagnosis, and its performance has been significantly improved by recent deep learning based methods. These methods rely on large-scale well-annotated training datasets, but collecting such datasets is time-consuming and labor-intensive, which could hinder their performance in practical situations. Learning from synthetic data is an encouraging solution to address this problem. In our task, synthetic tumors can be injected to healthy images to form training pairs. However, directly applying the model trained using the synthetic tumor images on real test images performs poorly due to the domain shift problem. In this paper, we propose a novel approach, namely Synthetic-to-Real Test-Time Training (SR-TTT), to reduce the domain gap between synthetic training images and real test images. Specifically, we add a self-supervised auxiliary task, i.e., two-step reconstruction, which takes the output of the main segmentation task as its input to build an explicit connection between these two tasks. Moreover, we design a scheduled mixture strategy to avoid error accumulation and bias explosion in the training process. During test time, we adapt the segmentation model to each test image with self-supervision from the auxiliary task so as to improve the inference performance. The proposed method is extensively evaluated on two public datasets for liver tumor segmentation. The experimental results demonstrate that our proposed SR-TTT can effectively mitigate the synthetic-to-real domain shift problem in the liver tumor segmentation task, and is superior to existing state-of-the-art approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
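Entry 13 (SR-TTT) adapts the segmentation model to each test image with a self-supervised auxiliary task before predicting. The loop below is a minimal, generic test-time-training sketch under assumed names (`aux_head`, `aux_loss_fn`); the actual SR-TTT method uses a two-step reconstruction task and a scheduled mixture strategy during training, which are not reproduced here:

```python
import copy
import torch

def test_time_adapt(model, aux_head, image, aux_loss_fn, steps=1, lr=1e-4):
    """Adapt a copy of the model to a single test image via a self-supervised
    auxiliary loss, then run inference. Generic TTT sketch, not exact SR-TTT."""
    adapted = copy.deepcopy(model)          # keep the original weights untouched
    adapted.train()
    optimizer = torch.optim.Adam(adapted.parameters(), lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        mask = adapted(image)               # main segmentation output
        recon = aux_head(mask)              # auxiliary reconstruction from the mask
        loss = aux_loss_fn(recon, image)    # self-supervision: no labels needed
        loss.backward()
        optimizer.step()
    adapted.eval()
    with torch.no_grad():
        return adapted(image)
```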
14. Disentangling Prototype and Variation for Single Sample Face Recognition
- Author
- Pang, Meng, primary, Wang, Binghui, additional, Ye, Mang, additional, Chen, Yiran, additional, and Wen, Bihan, additional
- Published
- 2021
- Full Text
- View/download PDF
15. Auxiliary Bi-Level Graph Representation for Cross-Modal Image-Text Retrieval
- Author
- Zhong, Xian, primary, Yang, Zhengwei, additional, Ye, Mang, additional, Huang, Wenxin, additional, Yuan, Jingling, additional, and Lin, Chia-Wen, additional
- Published
- 2021
- Full Text
- View/download PDF
16. The Multi-Modal Video Reasoning and Analyzing Competition
- Author
- Peng, Haoran, Huang, He, Xu, Li, Li, Tianjiao, Liu, Jun, Rahmani, Hossein, Ke, Qiuhong, Guo, Zhicheng, Wu, Cong, Li, Rongchang, Ye, Mang, Wang, Jiahao, Zhang, Jiaxu, Liu, Yuanzhong, He, Tao, Zhang, Fuwei, Liu, Xianbin, and Lin, Tao
- Abstract
In this paper, we introduce the Multi-Modal Video Reasoning and Analyzing Competition (MMVRAC) workshop in conjunction with ICCV 2021. This competition is composed of four different tracks, namely, video question answering, skeleton-based action recognition, fisheye video-based action recognition, and person re-identification, which are based on two datasets: SUTD-TrafficQA and UAV-Human. We summarize the top-performing methods submitted by the participants in this competition and show their results achieved in the competition.
- Published
- 2021
17. Structure-Aware Positional Transformer for Visible-Infrared Person Re-Identification.
- Author
- Chen, Cuiqun, Ye, Mang, Qi, Meibin, Wu, Jingjing, Jiang, Jianguo, and Lin, Chia-Wen
- Subjects
- INFRARED cameras, GLOBAL method of teaching, POSE estimation (Computer vision), AUTOMATIC speech recognition
- Abstract
Visible-infrared person re-identification (VI-ReID) is a cross-modality retrieval problem, which aims at matching the same pedestrian between the visible and infrared cameras. Due to the existence of pose variation, occlusion, and huge visual differences between the two modalities, previous studies mainly focus on learning image-level shared features. Since they usually learn a global representation or extract uniformly divided part features, these methods are sensitive to misalignments. In this paper, we propose a structure-aware positional transformer (SPOT) network to learn semantic-aware sharable modality features by utilizing the structural and positional information. It consists of two main components: attended structure representation (ASR) and transformer-based part interaction (TPI). Specifically, ASR models the modality-invariant structure feature for each modality and dynamically selects the discriminative appearance regions under the guidance of the structure information. TPI mines the part-level appearance and position relations with a transformer to learn discriminative part-level modality features. With a weighted combination of ASR and TPI, the proposed SPOT explores the rich contextual and structural information, effectively reducing cross-modality difference and enhancing the robustness against misalignments. Extensive experiments indicate that SPOT is superior to the state-of-the-art methods on two cross-modal datasets. Notably, the Rank-1/mAP value on the SYSU-MM01 dataset has improved by 8.43%/6.80%. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
18. Collaborative Refining for Person Re-Identification With Label Noise.
- Author
- Ye, Mang, Li, He, Du, Bo, Shen, Jianbing, Shao, Ling, and Hoi, Steven C. H.
- Subjects
- IMAGE recognition (Computer vision), NOISE, ARCHITECTURAL design, PERFORMANCE standards, NOISE measurement, STATIC random access memory
- Abstract
Existing person re-identification (Re-ID) methods usually rely heavily on large-scale thoroughly annotated training data. However, label noise is unavoidable due to inaccurate person detection results or annotation errors in real scenes. It is extremely challenging to learn a robust Re-ID model with label noise since each identity has very limited annotated training samples. To avoid fitting to the noisy labels, we propose to learn a prefatory model using a large learning rate at the early stage with a self-label refining strategy, in which the labels and network are jointly optimized. To further enhance the robustness, we introduce an online co-refining (CORE) framework with dynamic mutual learning, where networks and label predictions are online optimized collaboratively by distilling the knowledge from other peer networks. Moreover, it also reduces the negative impact of noisy labels using a favorable selective consistency strategy. CORE has two primary advantages: it is robust to different noise types and unknown noise ratios; it can be easily trained without much additional effort on the architecture design. Extensive experiments on Re-ID and image classification demonstrate that CORE outperforms its counterparts by a large margin under both practical and simulated noise settings. Notably, it also improves the state-of-the-art unsupervised Re-ID performance under standard settings. Code is available at https://github.com/mangye16/ReID-Label-Noise. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
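Entry 18 (CORE) refines noisy annotations online by distilling knowledge from peer networks. A minimal, hypothetical sketch of such label refining is given below; the fixed mixing coefficient and the function name are illustrative assumptions, not CORE's actual update rule or its selective consistency strategy:

```python
import torch.nn.functional as F

def refine_labels(noisy_onehot, peer_logits, mix=0.7):
    """Soften possibly wrong one-hot labels by blending them with a peer
    network's predictions, so mislabelled samples are gradually corrected
    rather than discarded. Illustrative sketch only."""
    peer_prob = F.softmax(peer_logits.detach(), dim=1)
    return mix * noisy_onehot + (1.0 - mix) * peer_prob
```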
19. Complementary Data Augmentation for Cloth-Changing Person Re-Identification.
- Author
- Jia, Xuemei, Zhong, Xian, Ye, Mang, Liu, Wenxuan, and Huang, Wenxin
- Subjects
- DATA augmentation, GENERATIVE adversarial networks
- Abstract
This paper studies the challenging person re-identification (Re-ID) task under the cloth-changing scenario, where the same identity (ID) suffers from uncertain cloth changes. To learn cloth- and ID-invariant features, it is crucial to collect abundant training data with varying clothes, which is difficult in practice. To alleviate the reliance on rich data collection, we reinforce the feature learning process by designing powerful complementary data augmentation strategies, including positive and negative data augmentation. Specifically, the positive augmentation fulfills the ID space by randomly patching the person images with different clothes, simulating rich appearance to enhance the robustness against clothes variations. For negative augmentation, its basic idea is to randomly generate out-of-distribution synthetic samples by combining various appearance and posture factors from real samples. The designed strategies seamlessly reinforce the feature learning without additional information introduction. Extensive experiments conducted on both cloth-changing and -unchanging tasks demonstrate the superiority of our proposed method, consistently improving the accuracy over various baselines. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
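Entry 19 strengthens cloth-changing Re-ID with a positive augmentation that patches person images with different clothes while keeping the identity label. The sketch below illustrates that idea with a purely random rectangular patch; the patch-selection strategy, size range, and function names are assumptions rather than the paper's exact augmentation:

```python
import random

def positive_patch_augment(image, donor_image, max_frac=0.5):
    """Paste a random rectangular region from another person's image onto the
    anchor image (CHW tensors of equal size), simulating a clothing change
    without changing the identity label. Illustrative sketch only."""
    _, h, w = image.shape
    ph = int(h * random.uniform(0.2, max_frac))
    pw = int(w * random.uniform(0.2, max_frac))
    top, left = random.randint(0, h - ph), random.randint(0, w - pw)
    out = image.clone()
    out[:, top:top + ph, left:left + pw] = donor_image[:, top:top + ph, left:left + pw]
    return out
```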
20. Dynamic Tri-Level Relation Mining With Attentive Graph for Visible Infrared Re-Identification.
- Author
- Ye, Mang, Chen, Cuiqun, Shen, Jianbing, and Shao, Ling
- Abstract
Matching the daytime visible and nighttime infrared person images, namely visible infrared person re-identification (VI-ReID), is a challenging cross-modality retrieval problem. Due to the difficulty of data collection and annotation in nighttime surveillance, VI-ReID usually suffers from noise problems, making it challenging to directly learn part discriminative features. In order to improve the discriminability and enhance the robustness against noisy images, this paper proposes a novel dynamic tri-level relation mining (DTRM) framework by simultaneously exploring channel-level, part-level intra-modality, and graph-level cross-modality relation cues. To address the misalignment within the person images, we design an intra-modality weighted-part attention (IWPA) to construct part-aggregated representation. It adaptively integrates the body part relation into the local feature learning with a residual batch normalization (RBN) connection scheme. Besides, a cross-modality graph structured attention (CGSA) is incorporated to improve the global feature learning by utilizing the contextual relation between images from two modalities. This module reduces the negative effects of noisy images. To seamlessly integrate two components, a parameter-free dynamic aggregation strategy is designed in a progressive joint learning manner. To further improve the performance, we additionally design a simple yet effective channel-level learning strategy by exploiting the rich channel information of visible images, which significantly reinforces the performance without modifying the network structure or changing the training process. Extensive experiments on two visible infrared re-identification datasets have verified the effectiveness under various settings. Code is available at: https://github.com/mangye16/DDAG [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
21. Person Re-Identification by Context-Aware Part Attention and Multi-Head Collaborative Learning.
- Author
- Wu, Dongming, Ye, Mang, Lin, Gaojie, Gao, Xin, and Shen, Jianbing
- Abstract
Most existing works solve the video-based person re-identification (re-ID) problem by computing the representation of each frame independently and finally aggregate the frame-level features. However, these methods often suffer from the challenging factors in videos, such as serious occlusion, background clutter and pose variation. To address these issues, we propose a novel multi-level Context-aware Part Attention (CPA) model to learn discriminative and robust local part features. It is featured in two aspects: 1) the context-aware part attention module improves the robustness by capturing the global relationship among different body parts across different video frames, and 2) the attention module is further extended to multi-level attention mechanism which enhances the discriminability by simultaneously considering low- to high-level features in different convolutional layers. In addition, we propose a novel multi-head collaborative training scheme to improve the performance, which is collaboratively supervised by multiple heads with the same structure but different parameters. It contains two consistency regularization terms, which consider both multi-head and multi-frame consistency to achieve better results. The multi-level CPA model is designed for feature extraction, while the multi-head collaborative training scheme is designed for classifier supervision. They jointly improve our re-ID model from two complementary directions. Extensive experiments demonstrate that the proposed method achieves much better or at least comparable performance compared to the state-of-the-art on four video re-ID datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
22. Deep Learning for Person Re-Identification: A Survey and Outlook.
- Author
- Ye, Mang, Shen, Jianbing, Lin, Gaojie, Xiang, Tao, Shao, Ling, and Hoi, Steven C. H.
- Subjects
- DEEP learning, COMPUTER vision, VIDEO surveillance, VIDEO on demand, FEATURE extraction, CAMERAS
- Abstract
Person re-identification (Re-ID) aims at retrieving a person of interest across multiple non-overlapping cameras. With the advancement of deep neural networks and increasing demand of intelligent video surveillance, it has gained significantly increased interest in the computer vision community. By dissecting the involved components in developing a person Re-ID system, we categorize it into the closed-world and open-world settings. The widely studied closed-world setting is usually applied under various research-oriented assumptions, and has achieved inspiring success using deep learning techniques on a number of datasets. We first conduct a comprehensive overview with in-depth analysis for closed-world person Re-ID from three different perspectives, including deep feature representation learning, deep metric learning and ranking optimization. With the performance saturation under closed-world setting, the research focus for person Re-ID has recently shifted to the open-world setting, facing more challenging issues. This setting is closer to practical applications under specific scenarios. We summarize the open-world Re-ID in terms of five different aspects. By analyzing the advantages of existing methods, we design a powerful AGW baseline, achieving state-of-the-art or at least comparable performance on twelve datasets for four different Re-ID tasks. Meanwhile, we introduce a new evaluation metric (mINP) for person Re-ID, indicating the cost for finding all the correct matches, which provides an additional criteria to evaluate the Re-ID system for real applications. Finally, some important yet under-investigated open issues are discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
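The survey in entry 22 introduces mINP, described as measuring the cost of finding all correct matches for a query. Assuming the inverse negative penalty of a query is the number of correct matches divided by the rank of the hardest (last-found) correct match, a computation could look as follows; consult the survey for the authoritative definition:

```python
import numpy as np

def mean_inverse_negative_penalty(ranked_match_flags):
    """ranked_match_flags: one boolean array per query, marking which entries
    of the ranked gallery list are correct matches. Returns an mINP-style
    score under the assumed definition stated above."""
    inps = []
    for flags in ranked_match_flags:
        positions = np.flatnonzero(flags) + 1   # 1-based ranks of correct matches
        if positions.size == 0:
            continue                            # skip queries with no true match
        inps.append(positions.size / positions[-1])
    return float(np.mean(inps)) if inps else 0.0
```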
23. Multi-Scale Cascading Network with Compact Feature Learning for RGB-Infrared Person Re-Identification
- Author
- Zhang, Can, primary, Liu, Hong, additional, Guo, Wei, additional, and Ye, Mang, additional
- Published
- 2021
- Full Text
- View/download PDF
24. Cross-Domain Missingness-Aware Time-Series Adaptation With Similarity Distillation in Medical Applications.
- Author
- Yang, Baoyao, Ye, Mang, Tan, Qingxiong, and Yuen, Pong C.
- Abstract
Medical time series of laboratory tests has been collected in electronic health records (EHRs) in many countries. Machine-learning algorithms have been proposed to analyze the condition of patients using these medical records. However, medical time series may be recorded using different laboratory parameters in different datasets. This results in the failure of applying a pretrained model on a test dataset containing a time series of different laboratory parameters. This article proposes to solve this problem with an unsupervised time-series adaptation method that generates time series across laboratory parameters. Specifically, a medical time-series generation network with similarity distillation is developed to reduce the domain gap caused by the difference in laboratory parameters. The relations of different laboratory parameters are analyzed, and the similarity information is distilled to guide the generation of target-domain specific laboratory parameters. To further improve the performance in cross-domain medical applications, a missingness-aware feature extraction network is proposed, where the missingness patterns reflect the health conditions and, thus, serve as auxiliary features for medical analysis. In addition, we also introduce domain-adversarial networks in both feature level and time-series level to enhance the adaptation across domains. Experimental results show that the proposed method achieves good performance on both private and publicly available medical datasets. Ablation studies and distribution visualization are provided to further analyze the properties of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
25. Grayscale Enhancement Colorization Network for Visible-Infrared Person Re-Identification.
- Author
- Zhong, Xian, Lu, Tianyou, Huang, Wenxin, Ye, Mang, Jia, Xuemei, and Lin, Chia-Wen
- Subjects
- GENERATIVE adversarial networks, ION channels, INFRARED imaging, IMAGE registration, IMAGE color analysis, GALLIUM nitride
- Abstract
Visible-infrared person re-identification (VI-ReID) is an emerging and challenging cross-modality image matching problem because of the explosive surveillance data in night-time surveillance applications. To handle the large modality gap, various generative adversarial network models have been developed to eliminate the cross-modality variations based on a cross-modal image generation framework. However, the lack of point-wise cross-modality ground-truths makes it extremely challenging to learn such a cross-modal image generator. To address these problems, we learn the correspondence between single-channel infrared images and three-channel visible images by generating intermediate grayscale images as auxiliary information to colorize the single-modality infrared images. We propose a grayscale enhancement colorization network (GECNet) to bridge the modality gap by retaining the structure of the colored image which contains rich information. To simulate the infrared-to-visible transformation, the point-wise transformed grayscale images greatly enhance the colorization process. Our experiments conducted on two visible-infrared cross-modality person re-identification datasets demonstrate the superiority of the proposed method over the state-of-the-arts. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
26. Probabilistic Structural Latent Representation for Unsupervised Embedding
- Author
- Ye, Mang, primary and Shen, Jianbing, additional
- Published
- 2020
- Full Text
- View/download PDF
27. Augmentation Invariant and Instance Spreading Feature for Softmax Embedding.
- Author
- Ye, Mang, Shen, Jianbing, Zhang, Xu, Yuen, Pong C., and Chang, Shih-Fu
- Subjects
- DATA augmentation, SUPERVISED learning, DEEP learning
- Abstract
Deep embedding learning plays a key role in learning discriminative feature representations, where the visually similar samples are pulled closer and dissimilar samples are pushed away in the low-dimensional embedding space. This paper studies the unsupervised embedding learning problem by learning such a representation without using any category labels. This task faces two primary challenges: mining reliable positive supervision from highly similar fine-grained classes, and generalizing to unseen testing categories. To approximate the positive concentration and negative separation properties in category-wise supervised learning, we introduce a data augmentation invariant and instance spreading feature using the instance-wise supervision. We also design two novel domain-agnostic augmentation strategies to further extend the supervision in feature space, which simulate large batch training using a small batch size and the augmented features. To learn such a representation, we propose a novel instance-wise softmax embedding, which directly performs the optimization over the augmented instance features with the binary discrimination softmax encoding. It significantly accelerates the learning speed with much higher accuracy than existing methods, under both seen and unseen testing categories. The unsupervised embedding performs well even without a pre-trained network over samples from fine-grained categories. We also develop a variant using category-wise supervision, namely category-wise softmax embedding, which achieves competitive performance over the state-of-the-art, without using any auxiliary information or restricted sample mining. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
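Entry 27 treats every training image as its own class and requires its augmented view to be discriminated from all other instances. The sketch below is a simplified contrastive rendering of that instance-wise softmax idea; it is not the paper's exact binary-discrimination formulation, and the temperature value and names are assumptions:

```python
import torch
import torch.nn.functional as F

def instance_softmax_loss(features, aug_features, temperature=0.1):
    """features / aug_features: B x D embeddings of the original and augmented
    views of the same batch. Each augmented view must be 'classified' as its
    own instance, pulling the two views together and spreading instances apart."""
    f = F.normalize(features, dim=1)
    f_aug = F.normalize(aug_features, dim=1)
    logits = f_aug @ f.t() / temperature                 # similarity to every instance
    targets = torch.arange(f.size(0), device=f.device)   # each view matches itself
    return F.cross_entropy(logits, targets)
```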
28. Visible-Infrared Person Re-Identification via Homogeneous Augmented Tri-Modal Learning.
- Author
- Ye, Mang, Shen, Jianbing, and Shao, Ling
- Abstract
Matching person images between the daytime visible modality and night-time infrared modality (VI-ReID) is a challenging cross-modality pedestrian retrieval problem. Existing methods usually learn the multi-modality features in raw image, ignoring the image-level discrepancy. Some methods apply GAN technique to generate the cross-modality images, but it destroys the local structure and introduces unavoidable noise. In this paper, we propose a Homogeneous Augmented Tri-Modal (HAT) learning method for VI-ReID, where an auxiliary grayscale modality is generated from their homogeneous visible images, without additional training process. It preserves the structure information of visible images and approximates the image style of infrared modality. Learning with the grayscale visible images enforces the network to mine structure relations across multiple modalities, making it robust to color variations. Specifically, we solve the tri-modal feature learning from both multi-modal classification and multi-view retrieval perspectives. For multi-modal classification, we learn a multi-modality sharing identity classifier with a parameter-sharing network, trained with a homogeneous and heterogeneous identification loss. For multi-view retrieval, we develop a weighted tri-directional ranking loss to optimize the relative distance across multiple modalities. Incorporated with two invariant regularizers, HAT simultaneously minimizes multiple modality variations. In-depth analysis demonstrates the homogeneous grayscale augmentation significantly outperforms the current state-of-the-art by a large margin. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
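Entry 28 (HAT) builds its auxiliary third modality by converting visible images to grayscale, with no extra training process. A minimal sketch of such a conversion, assuming PIL images and a three-channel network input:

```python
from PIL import Image

def make_grayscale_modality(visible_image: Image.Image) -> Image.Image:
    """Convert a visible (RGB) image to its grayscale counterpart and replicate
    it back to three channels, so it can share the network input format with
    the visible and infrared modalities. Sketch of the HAT augmentation idea."""
    gray = visible_image.convert("L")               # keep structure, drop color
    return Image.merge("RGB", (gray, gray, gray))
```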
29. Unsupervised Embedding Learning via Invariant and Spreading Instance Feature
- Author
- Ye, Mang, primary, Zhang, Xu, additional, Yuen, Pong C., additional, and Chang, Shih-Fu, additional
- Published
- 2019
- Full Text
- View/download PDF
30. Explainable Uncertainty-Aware Convolutional Recurrent Neural Network for Irregular Medical Time Series.
- Author
- Tan, Qingxiong, Ye, Mang, Ma, Andy Jinhua, Yang, Baoyao, Yip, Terry Cheuk-Fung, Wong, Grace Lai-Hung, and Yuen, Pong C.
- Subjects
- RECURRENT neural networks, CONVOLUTIONAL neural networks, TIME series analysis
- Abstract
Influenced by the dynamic changes in the severity of illness, patients usually take examinations in hospitals irregularly, producing a large volume of irregular medical time-series data. Performing diagnosis prediction from the irregular medical time series is challenging because the intervals between consecutive records significantly vary along time. Existing methods often handle this problem by generating regular time series from the irregular medical records without considering the uncertainty in the generated data, induced by the varying intervals. Thus, a novel Uncertainty-Aware Convolutional Recurrent Neural Network (UA-CRNN) is proposed in this article, which introduces the uncertainty information in the generated data to boost the risk prediction. To tackle the complex medical time series with subseries of different frequencies, the uncertainty information is further incorporated into the subseries level rather than the whole sequence to seamlessly adjust different time intervals. Specifically, a hierarchical uncertainty-aware decomposition layer (UADL) is designed to adaptively decompose time series into different subseries and assign them proper weights in accordance with their reliabilities. Meanwhile, an Explainable UA-CRNN (eUA-CRNN) is proposed to exploit filters with different passbands to ensure the unity of components in each subseries and the diversity of components in different subseries. Furthermore, eUA-CRNN incorporates with an uncertainty-aware attention module to learn attention weights from the uncertainty information, providing the explainable prediction results. The extensive experimental results on three real-world medical data sets illustrate the superiority of the proposed method compared with the state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
31. Cross-Modality Person Re-Identification via Modality-Aware Collaborative Ensemble Learning.
- Author
- Ye, Mang, Lan, Xiangyuan, Leng, Qingming, and Shen, Jianbing
- Subjects
- COLLABORATIVE learning, LEARNING strategies
- Abstract
Visible thermal person re-identification (VT-ReID) is a challenging cross-modality pedestrian retrieval problem due to the large intra-class variations and modality discrepancy across different cameras. Existing VT-ReID methods mainly focus on learning cross-modality sharable feature representations by handling the modality-discrepancy in feature level. However, the modality difference in classifier level has received much less attention, resulting in limited discriminability. In this paper, we propose a novel modality-aware collaborative ensemble (MACE) learning method with middle-level sharable two-stream network (MSTN) for VT-ReID, which handles the modality-discrepancy in both feature level and classifier level. In feature level, MSTN achieves much better performance than existing methods by capturing sharable discriminative middle-level features in convolutional layers. In classifier level, we introduce both modality-specific and modality-sharable identity classifiers for two modalities to handle the modality discrepancy. To utilize the complementary information among different classifiers, we propose an ensemble learning scheme to incorporate the modality sharable classifier and the modality specific classifiers. In addition, we introduce a collaborative learning strategy, which regularizes modality-specific identity predictions and the ensemble outputs. Extensive experiments on two cross-modality datasets demonstrate that the proposed method outperforms current state-of-the-art by a large margin, achieving rank-1/mAP accuracy 51.64%/50.11% on the SYSU-MM01 dataset, and 72.37%/69.09% on the RegDB dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
32. Learning Sparse and Identity-Preserved Hidden Attributes for Person Re-Identification.
- Author
- Wang, Zheng, Jiang, Junjun, Wu, Yang, Ye, Mang, Bai, Xiang, and Satoh, Shin'ichi
- Subjects
- IMAGE registration, FEATURE extraction, IMAGE reconstruction, LEARNING, PREDICTION models
- Abstract
Person re-identification (Re-ID) aims at matching person images captured in non-overlapping camera views. To represent person appearance, low-level visual features are sensitive to environmental changes, while high-level semantic attributes, such as “short-hair” or “long-hair”, are relatively stable. Hence, researchers have started to design semantic attributes to reduce the visual ambiguity. However, to train a prediction model for semantic attributes, it requires plenty of annotations, which are hard to obtain in practical large-scale applications. To alleviate the reliance on annotation efforts, we propose to incrementally generate Deep Hidden Attribute (DHA) based on baseline deep network for newly uncovered annotations. In particular, we propose an auto-encoder model that can be plugged into any deep network to mine latent information in an unsupervised manner. To optimize the effectiveness of DHA, we reform the auto-encoder model with additional orthogonal generation module, along with identity-preserving and sparsity constraints. 1) Orthogonally generating: In order to make DHAs different from each other, Singular Vector Decomposition (SVD) is introduced to generate DHAs orthogonally. 2) Identity-preserving constraint: The generated DHAs should be distinct for telling different persons, so we associate DHAs with person identities. 3) Sparsity constraint: To enhance the discriminability of DHAs, we also introduce the sparsity constraint to restrict the number of effective DHAs for each person. Experiments conducted on public datasets have validated the effectiveness of the proposed network. On two large-scale datasets, i.e., Market-1501 and DukeMTMC-reID, the proposed method outperforms the state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
33. PurifyNet: A Robust Person Re-Identification Model With Noisy Labels.
- Author
- Ye, Mang and Yuen, Pong C.
- Abstract
Person re-identification (Re-ID) has been widely studied by learning a discriminative feature representation with a set of well-annotated training data. Existing models usually assume that all the training samples are correctly annotated. However, label noise is unavoidable due to false annotations in large-scale industrial applications. Different from the label noise problem in image classification with abundant samples, the person Re-ID task with label noise usually has very limited annotated samples for each identity. In this paper, we propose a robust deep model, namely PurifyNet, to address this issue. PurifyNet is featured in two aspects: 1) it jointly refines the annotated labels and optimizes the neural networks by progressively adjusting the predicted logits, which reuses the wrong labels rather than simply filtering them; 2) it can simultaneously reduce the negative impact of noisy labels and pay more attention to hard samples with correct labels by developing a hard-aware instance re-weighting strategy. With limited annotated samples for each identity, we demonstrate that hard sample mining is crucial for label corrupted Re-ID task, while it is usually ignored in existing robust deep learning methods. Extensive experiments on three datasets demonstrate the robustness of PurifyNet over the competing methods under various settings. Meanwhile, we show that it consistently improves the unsupervised/video-based Re-ID methods. Code is available at: https://github.com/mangye16/ReID-Label-Noise. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
34. Bi-Directional Center-Constrained Top-Ranking for Visible Thermal Person Re-Identification.
- Author
- Ye, Mang, Lan, Xiangyuan, Wang, Zheng, and Yuen, Pong C.
- Abstract
Visible thermal person re-identification (VT-REID) is a task of matching person images captured by thermal and visible cameras, which is an extremely important issue in night-time surveillance applications. Existing cross-modality recognition works mainly focus on learning sharable feature representations to handle the cross-modality discrepancies. However, apart from the cross-modality discrepancy caused by different camera spectrums, VT-REID also suffers from large cross-modality and intra-modality variations caused by different camera environments and human poses, and so on. In this paper, we propose a dual-path network with a novel bi-directional dual-constrained top-ranking (BDTR) loss to learn discriminative feature representations. It is featured in two aspects: 1) end-to-end learning without extra metric learning step and 2) the dual-constraint simultaneously handles the cross-modality and intra-modality variations to ensure the feature discriminability. Meanwhile, a bi-directional center-constrained top-ranking (eBDTR) is proposed to incorporate the previous two constraints into a single formula, which preserves the properties to handle both cross-modality and intra-modality variations. The extensive experiments on two cross-modality re-ID datasets demonstrate the superiority of the proposed method compared to the state-of-the-arts. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
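Entry 34 (BDTR) imposes bi-directional ranking constraints between visible and infrared features. The sketch below substitutes a standard hard-mined triplet term applied in both retrieval directions; it conveys the bi-directional idea but is not the paper's dual-constrained top-ranking loss, and all names are assumptions:

```python
import torch

def bidirectional_triplet_loss(vis_feat, vis_labels, ir_feat, ir_labels, margin=0.3):
    """Hard-mined triplet constraints in both visible-to-infrared and
    infrared-to-visible directions. Simplified stand-in for BDTR."""
    def directed(anchor, anchor_labels, gallery, gallery_labels):
        dist = torch.cdist(anchor, gallery)                        # pairwise distances
        same_id = anchor_labels.unsqueeze(1) == gallery_labels.unsqueeze(0)
        hardest_pos = dist.masked_fill(~same_id, float("-inf")).max(dim=1).values
        hardest_neg = dist.masked_fill(same_id, float("inf")).min(dim=1).values
        return torch.relu(hardest_pos - hardest_neg + margin).mean()

    return (directed(vis_feat, vis_labels, ir_feat, ir_labels)
            + directed(ir_feat, ir_labels, vis_feat, vis_labels))
```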
35. A Survey of Open-World Person Re-Identification.
- Author
- Leng, Qingming, Ye, Mang, and Tian, Qi
- Subjects
- PATTERN recognition systems, WORK design
- Abstract
Person re-identification (re-ID) has been a popular topic in computer vision and pattern recognition communities for a decade. Several important milestones such as metric-based and deeply-learned re-ID in recent years have promoted this topic. However, most existing re-ID works are designed for closed-world scenarios rather than realistic open-world settings, which limits the practical application of the re-ID technique. On one hand, the performance of the latest re-ID methods has surpassed the human-level performance on several commonly used benchmarks (e.g., Market1501 and CUHK03), which are collected from closed-world scenarios. On the other hand, open-world tasks that are less developed and more challenging have received increasing attention in the re-ID community. Therefore, this paper starts the first attempt to analyze the trends of open-world re-ID and summarizes them from both narrow and generalized perspectives. In the narrow perspective, open-world re-ID is regarded as person verification (i.e., open-set re-ID) instead of person identification, that is, the query person may not occur in the gallery set. In the generalized perspective, application-driven methods that are designed for specific applications are defined as generalized open-world re-ID. Their settings are usually close to realistic application requirements. Specifically, this survey mainly includes the following four points for open-world re-ID: 1) analyzing the discrepancies between closed- and open-world scenarios; 2) describing the developments of existing open-set re-ID works and their limitations; 3) introducing specific application-driven works from three aspects, namely, raw data, practical procedure, and efficiency; and 4) summarizing the state-of-the-art methods and future directions for open-world re-ID. This survey on open-world re-ID provides a guidance for improving the usability of re-ID technique in practical applications. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
36. Improving Night-Time Pedestrian Retrieval With Distribution Alignment and Contextual Distance.
- Author
- Ye, Mang, Cheng, Yi, Lan, Xiangyuan, and Zhu, Hongyuan
- Abstract
Night-time pedestrian retrieval is a cross-modality retrieval task of retrieving person images between day-time visible images and night-time thermal images. It is a very challenging problem due to modality difference, camera variations, and person variations, but it plays an important role in night-time video surveillance. The existing cross-modality retrieval usually focuses on learning modality sharable feature representations to bridge the modality gap. In this article, we propose to utilize auxiliary information to improve the retrieval performance, which consistently improves the performance with different baseline loss functions. Our auxiliary information contains two major parts: cross-modality feature distribution and contextual information. The former aligns the cross-modality feature distributions between two modalities to improve the performance, and the latter optimizes the cross-modality distance measurement with the contextual information. We also demonstrate that abundant annotated visible pedestrian images, which are easily accessible, help to improve the cross-modality pedestrian retrieval as well. The proposed method is featured in two aspects: the auxiliary information does not need additional human intervention or annotation; it learns discriminative feature representations in an end-to-end deep learning manner. Extensive experiments on two cross-modality pedestrian retrieval datasets demonstrate the superiority of the proposed method, achieving much better performance than the state-of-the-arts. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
37. Learning Modality-Consistency Feature Templates: A Robust RGB-Infrared Tracking System.
- Author
- Lan, Xiangyuan, Ye, Mang, Shao, Rui, Zhong, Bineng, Yuen, Pong C., and Zhou, Huiyu
- Subjects
- OBJECT tracking (Computer vision), VIDEO surveillance, TRACKING algorithms, INDUSTRIAL security, MACHINE learning, FEATURE extraction
- Abstract
With a large number of video surveillance systems installed for the requirement from industrial security, the task of object tracking, which aims to locate objects of interest in videos, is very important. Although numerous tracking algorithms for RGB videos have been developed in the decade, the tracking performance and robustness of these systems may be degraded dramatically when the information from RGB video is unreliable (e.g., poor illumination conditions or very low resolution). To address this issue, this paper presents a new tracking system, which aims to combine the information from RGB and infrared modalities for object tracking. The proposed tracking systems is based on our proposed machine learning model. Particularly, the learning model can alleviate the modality discrepancy issue under the proposed modality consistency constraint from both representation patterns and discriminability, and generate discriminative feature templates for collaborative representations and discrimination in heterogeneous modalities. Experiments on a variety of challenging RGB-infrared videos demonstrate the effectiveness of the proposed algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
38. Dynamic Label Graph Matching for Unsupervised Video Re-identification
- Author
- Ye, Mang, primary, Ma, Andy J., additional, Zheng, Liang, additional, Li, Jiawei, additional, and Yuen, Pong C., additional
- Published
- 2017
- Full Text
- View/download PDF
39. Spatiotemporal saliency based on location prior model
- Author
- Hu, Liuyi, primary, Wang, Zhongyuan, additional, Ye, Mang, additional, Xiao, Jing, additional, and Hu, Ruimin, additional
- Published
- 2016
- Full Text
- View/download PDF
40. Person Reidentification via Ranking Aggregation of Similarity Pulling and Dissimilarity Pushing.
- Author
- Ye, Mang, Liang, Chao, Yu, Yi, Wang, Zheng, Leng, Qingming, Xiao, Chunxia, Chen, Jun, and Hu, Ruimin
- Abstract
Person reidentification is a key technique to match different persons observed in nonoverlapping camera views. Many researchers treat it as a special object-retrieval problem, where ranking optimization plays an important role. Existing ranking optimization methods mainly utilize the similarity relationship between the probe and gallery images to optimize the original ranking list, but seldom consider the important dissimilarity relationship. In this paper, we propose to use both similarity and dissimilarity cues in a ranking optimization framework for person reidentification. Its core idea is that the true match should not only be similar to those strongly similar galleries of the probe, but also be dissimilar to those strongly dissimilar galleries of the probe. Furthermore, motivated by the philosophy of multiview verification, a ranking aggregation algorithm is proposed to enhance the detection of similarity and dissimilarity based on the following assumption: the true match should be similar to the probe in different baseline methods. In other words, if a gallery image is strongly similar to the probe in one method, while simultaneously strongly dissimilar to the probe in another method, it will probably be a wrong match of the probe. Extensive experiments conducted on public benchmark datasets and comparisons with different baseline methods have shown the great superiority of the proposed ranking optimization method. [ABSTRACT FROM PUBLISHER]
- Published
- 2016
- Full Text
- View/download PDF
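Entry 40 aggregates rankings from several baseline methods on the premise that a true match should stay similar to the probe across methods, while a gallery image that is strongly similar in one method but strongly dissimilar in another is likely a wrong match. The sketch below uses mean reciprocal rank as an illustrative aggregation rule, which is an assumption rather than the paper's actual similarity-pulling/dissimilarity-pushing algorithm:

```python
import numpy as np

def aggregate_rankings(score_lists):
    """score_lists: one array of probe-to-gallery similarity scores per baseline
    method. Gallery items ranked highly by all methods accumulate large
    reciprocal-rank credit; items ranked very low in any method gain little."""
    agg = np.zeros_like(np.asarray(score_lists[0], dtype=float))
    for scores in score_lists:
        order = np.asarray(scores, dtype=float).argsort()[::-1]   # best first
        ranks = order.argsort() + 1                               # 1-based rank per item
        agg += 1.0 / ranks
    return np.argsort(-agg)                                       # gallery indices, best first
```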
41. Zero-Shot Person Re-identification via Cross-View Consistency.
- Author
- Wang, Zheng, Hu, Ruimin, Liang, Chao, Yu, Yi, Jiang, Junjun, Ye, Mang, Chen, Jun, and Leng, Qingming
- Abstract
Person re-identification, aiming to identify images of the same person from various cameras configured in different places, has attracted much attention in the multimedia retrieval community. In this problem, choosing a proper distance metric is a crucial aspect, and many classic methods utilize a uniform learnt metric. However, their performance is limited due to ignoring the zero-shot and fine-grained characteristics presented in real person re-identification applications. In this paper, we investigate two consistencies across two cameras, which are cross-view support consistency and cross-view projection consistency. The philosophy behind it is that, in spite of visual changes in two images of the same person under two camera views, the support sets in their respective views are highly consistent, and after being projected to the same view, their context sets are also highly consistent. Based on the above phenomena, we propose a data-driven distance metric (DDDM) method, re-exploiting the training data to adjust the metric for each query-gallery pair. Experiments conducted on three public data sets have validated the effectiveness of the proposed method, with a significant improvement over three baseline metric learning methods. In particular, on the public VIPeR dataset, the proposed method achieves an accuracy rate of 42.09% at rank-1, which outperforms the state-of-the-art methods by 4.29%. [ABSTRACT FROM PUBLISHER]
- Published
- 2016
- Full Text
- View/download PDF