Author: "Wu, Hefeng" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Wu, Hefeng"' showing total 219 results

1. Human-VDM: Learning Single-Image 3D Human Gaussian Splatting from Video Diffusion Models

Author: Liu, Zhibin, Dong, Haoye, Chharia, Aviral, and Wu, Hefeng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Generating lifelike 3D humans from a single RGB image remains a challenging task in computer vision, as it requires accurate modeling of geometry, high-quality texture, and plausible unseen parts. Existing methods typically use multi-view diffusion models for 3D generation, but they often face inconsistent view issues, which hinder high-quality 3D human generation. To address this, we propose Human-VDM, a novel method for generating 3D humans from a single RGB image using Video Diffusion Models. Human-VDM provides temporally consistent views for 3D human generation using Gaussian Splatting. It consists of three modules: a view-consistent human video diffusion module, a video augmentation module, and a Gaussian Splatting module. First, a single image is fed into a human video diffusion module to generate a coherent human video. Next, the video augmentation module applies super-resolution and video interpolation to enhance the textures and geometric smoothness of the generated video. Finally, the 3D Human Gaussian Splatting module learns lifelike humans under the guidance of these high-resolution and view-consistent images. Experiments demonstrate that Human-VDM achieves high-quality 3D human generation from a single image, outperforming state-of-the-art methods in both generation quality and quantity. Project page: https://human-vdm.github.io/Human-VDM/, Comment: 14 Pages, 8 figures
Published: 2024

2. Improving Network Interpretability via Explanation Consistency Evaluation

Author: Wu, Hefeng, Jiang, Hao, Wang, Keze, Tang, Ziyi, He, Xianghuan, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: While deep neural networks have achieved remarkable performance, they tend to lack transparency in prediction. The pursuit of greater interpretability in neural networks often results in a degradation of their original performance. Some works strive to improve both interpretability and performance, but they primarily depend on meticulously imposed conditions. In this paper, we propose a simple yet effective framework that acquires more explainable activation heatmaps and simultaneously increases the model performance, without the need for any extra supervision. Specifically, our concise framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning. The explanation consistency metric is utilized to measure the similarity between the model's visual explanations of the original samples and those of semantic-preserved adversarial samples, whose background regions are perturbed by using image adversarial attack techniques. Our framework then promotes the model learning by paying closer attention to those training samples with a high difference in explanations (i.e., low explanation consistency), for which the current model cannot provide robust interpretations. Comprehensive experimental results on various benchmarks demonstrate the superiority of our framework in multiple aspects, including higher recognition accuracy, greater data debiasing capability, stronger network robustness, and more precise localization ability on both regular networks and interpretable networks. We also provide extensive ablation studies and qualitative analyses to unveil the detailed contribution of each component., Comment: To appear in IEEE Transactions on Multimedia
Published: 2024

3. ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning

Author: Chen, Weifeng, Zhang, Jiacheng, Wu, Jie, Wu, Hefeng, Xiao, Xuefeng, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: The rapid development of diffusion models has triggered diverse applications. Identity-preserving text-to-image generation (ID-T2I) particularly has received significant attention due to its wide range of application scenarios like AI portrait and advertising. While existing ID-T2I methods have demonstrated impressive results, several key challenges remain: (1) It is hard to maintain the identity characteristics of reference portraits accurately, (2) The generated images lack aesthetic appeal especially while enforcing identity retention, and (3) There is a limitation that cannot be compatible with LoRA-based and Adapter-based methods simultaneously. To address these issues, we present ID-Aligner, a general feedback learning framework to enhance ID-T2I performance. To resolve identity features lost, we introduce identity consistency reward fine-tuning to utilize the feedback from face detection and recognition models to improve generated identity preservation. Furthermore, we propose identity aesthetic reward fine-tuning leveraging rewards from human-annotated preference data and automatically constructed feedback on character structure generation to provide aesthetic tuning signals. Thanks to its universal feedback fine-tuning framework, our method can be readily applied to both LoRA and Adapter models, achieving consistent performance gains. Extensive experiments on SD1.5 and SDXL diffusion models validate the effectiveness of our approach. Project Page: https://idaligner.github.io/
Published: 2024

4. Dual-View Data Hallucination with Semantic Relation Guidance for Few-Shot Image Recognition

Author: Wu, Hefeng, Ye, Guangzhi, Zhou, Ziyang, Tian, Ling, Wang, Qing, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Learning to recognize novel concepts from just a few image samples is very challenging as the learned model is easily overfitted on the few data and results in poor generalizability. One promising but underexplored solution is to compensate the novel classes by generating plausible samples. However, most existing works of this line exploit visual information only, rendering the generated data easy to be distracted by some challenging factors contained in the few available samples. Being aware of the semantic information in the textual modality that reflects human concepts, this work proposes a novel framework that exploits semantic relations to guide dual-view data hallucination for few-shot image recognition. The proposed framework enables generating more diverse and reasonable data samples for novel classes through effective information transfer from base classes. Specifically, an instance-view data hallucination module hallucinates each sample of a novel class to generate new data by employing local semantic correlated attention and global semantic feature fusion derived from base classes. Meanwhile, a prototype-view data hallucination module exploits semantic-aware measure to estimate the prototype of a novel class and the associated distribution from the few samples, which thereby harvests the prototype as a more stable sample and enables resampling a large number of samples. We conduct extensive experiments and comparisons with state-of-the-art methods on several popular few-shot benchmarks to verify the effectiveness of the proposed framework., Comment: Accepted by IEEE Transactions on Multimedia
Published: 2024

5. SQLNet: Scale-Modulated Query and Localization Network for Few-Shot Class-Agnostic Counting

Author: Wu, Hefeng, Chen, Yandong, Liu, Lingbo, Chen, Tianshui, Wang, Keze, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The class-agnostic counting (CAC) task has recently been proposed to solve the problem of counting all objects of an arbitrary class with several exemplars given in the input image. To address this challenging task, existing leading methods all resort to density map regression, which renders them impractical for downstream tasks that require object locations and restricts their ability to well explore the scale information of exemplars for supervision. To address the limitations, we propose a novel localization-based CAC approach, termed Scale-modulated Query and Localization Network (SQLNet). It fully explores the scales of exemplars in both the query and localization stages and achieves effective counting by accurately locating each object and predicting its approximate size. Specifically, during the query stage, rich discriminative representations of the target class are acquired by the Hierarchical Exemplars Collaborative Enhancement (HECE) module from the few exemplars through multi-scale exemplar cooperation with equifrequent size prompt embedding. These representations are then fed into the Exemplars-Unified Query Correlation (EUQC) module to interact with the query features in a unified manner and produce the correlated query tensor. In the localization stage, the Scale-aware Multi-head Localization (SAML) module utilizes the query tensor to predict the confidence, location, and size of each potential object. Moreover, a scale-aware localization loss is introduced, which exploits flexible location associations and exemplar scales for supervision to optimize the model performance. Extensive experiments demonstrate that SQLNet outperforms state-of-the-art methods on popular CAC benchmarks, achieving excellent performance not only in counting accuracy but also in localization and bounding box generation. Our codes will be available at https://github.com/HCPLab-SYSU/SQLNet, Comment: 13 pages
Published: 2023

6. Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search

Author: Wu, Hefeng, Chen, Weifeng, Liu, Zhibin, Chen, Tianshui, Chen, Zhiguang, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Given a descriptive text query, text-based person search (TBPS) aims to retrieve the best-matched target person from an image gallery. Such a cross-modal retrieval task is quite challenging due to significant modality gap, fine-grained differences and insufficiency of annotated data. To better align the two modalities, most existing works focus on introducing sophisticated network structures and auxiliary tasks, which are complex and hard to implement. In this paper, we propose a simple yet effective dual Transformer model for text-based person search. By exploiting a hardness-aware contrastive learning strategy, our model achieves state-of-the-art performance without any special design for local feature alignment or side information. Moreover, we propose a proximity data generation (PDG) module to automatically produce more diverse data for cross-modal training. The PDG module first introduces an automatic generation algorithm based on a text-to-image diffusion model, which generates new text-image pair samples in the proximity space of original ones. Then it combines approximate text generation and feature-level mixup during training to further strengthen the data diversity. The PDG module can largely guarantee the reasonability of the generated samples that are directly used for training without any human inspection for noise rejection. It improves the performance of our model significantly, providing a feasible solution to the data insufficiency problem faced by such fine-grained visual-linguistic tasks. Extensive experiments on two popular datasets of the TBPS task (i.e., CUHK-PEDES and ICFG-PEDES) show that the proposed approach outperforms state-of-the-art approaches evidently, e.g., improving by 3.88%, 4.02%, 2.92% in terms of Top1, Top5, Top10 on CUHK-PEDES. The codes will be available at https://github.com/HCPLab-SYSU/PersonSearch-CTLG, Comment: Accepted by IEEE T-CSVT
Published: 2023

7. SketchBodyNet: A Sketch-Driven Multi-faceted Decoder Network for 3D Human Reconstruction

Author: Wang, Fei, Tang, Kongzhang, Wu, Hefeng, Zhao, Baoquan, Cai, Hao, and Zhou, Teng
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
Abstract: Reconstructing 3D human shapes from 2D images has received increasing attention recently due to its fundamental support for many high-level 3D applications. Compared with natural images, freehand sketches are much more flexible to depict various shapes, providing a high potential and valuable way for 3D human reconstruction. However, such a task is highly challenging. The sparse abstract characteristics of sketches add severe difficulties, such as arbitrariness, inaccuracy, and lacking image details, to the already badly ill-posed problem of 2D-to-3D reconstruction. Although current methods have achieved great success in reconstructing 3D human bodies from a single-view image, they do not work well on freehand sketches. In this paper, we propose a novel sketch-driven multi-faceted decoder network termed SketchBodyNet to address this task. Specifically, the network consists of a backbone and three separate attention decoder branches, where a multi-head self-attention module is exploited in each decoder to obtain enhanced features, followed by a multi-layer perceptron. The multi-faceted decoders aim to predict the camera, shape, and pose parameters, respectively, which are then associated with the SMPL model to reconstruct the corresponding 3D human mesh. In learning, existing 3D meshes are projected via the camera parameters into 2D synthetic sketches with joints, which are combined with the freehand sketches to optimize the model. To verify our method, we collect a large-scale dataset of about 26k freehand sketches and their corresponding 3D meshes containing various poses of human bodies from 14 different angles. Extensive experimental results demonstrate our SketchBodyNet achieves superior performance in reconstructing 3D human meshes from freehand sketches., Comment: 9 pages, to appear in Pacific Graphics 2023
Published: 2023

8. Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation

Author: Pu, Tao, Chen, Tianshui, Wu, Hefeng, Lu, Yongyi, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Video scene graph generation (VidSGG) aims to identify objects in visual scenes and infer their relationships for a given video. It requires not only a comprehensive understanding of each object scattered on the whole scene but also a deep dive into their temporal motions and interactions. Inherently, object pairs and their relationships enjoy spatial co-occurrence correlations within each image and temporal consistency/transition correlations across different images, which can serve as prior knowledge to facilitate VidSGG model learning and inference. In this work, we propose a spatial-temporal knowledge-embedded transformer (STKET) that incorporates the prior spatial-temporal knowledge into the multi-head cross-attention mechanism to learn more representative relationship representations. Specifically, we first learn spatial co-occurrence and temporal transition correlations in a statistical manner. Then, we design spatial and temporal knowledge-embedded layers that introduce the multi-head cross-attention mechanism to fully explore the interaction between visual representation and the knowledge to generate spatial- and temporal-embedded representations, respectively. Finally, we aggregate these representations for each subject-object pair to predict the final semantic labels and their relationships. Extensive experiments show that STKET outperforms current competing algorithms by a large margin, e.g., improving the mR@50 by 8.1%, 4.7%, and 2.1% on different settings over current algorithms., Comment: Accepted at IEEE T-IP, 2024
Published: 2023

9. Control-A-Video: Controllable Text-to-Video Diffusion Models with Motion Prior and Reward Feedback Learning

Author: Chen, Weifeng, Ji, Yatai, Wu, Jie, Wu, Hefeng, Xie, Pan, Li, Jiashi, Xia, Xin, Xiao, Xuefeng, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Multimedia
Abstract: Recent advances in text-to-image (T2I) diffusion models have enabled impressive image generation capabilities guided by text prompts. However, extending these techniques to video generation remains challenging, with existing text-to-video (T2V) methods often struggling to produce high-quality and motion-consistent videos. In this work, we introduce Control-A-Video, a controllable T2V diffusion model that can generate videos conditioned on text prompts and reference control maps like edge and depth maps. To tackle video quality and motion consistency issues, we propose novel strategies to incorporate content prior and motion prior into the diffusion-based generation process. Specifically, we employ a first-frame condition scheme to transfer video generation from the image domain. Additionally, we introduce residual-based and optical flow-based noise initialization to infuse motion priors from reference videos, promoting relevance among frame latents for reduced flickering. Furthermore, we present a Spatio-Temporal Reward Feedback Learning (ST-ReFL) algorithm that optimizes the video diffusion model using multiple reward models for video quality and motion consistency, leading to superior outputs. Comprehensive experiments demonstrate that our framework generates higher-quality, more consistent videos compared to existing state-of-the-art methods in controllable text-to-video generation
Published: 2023

10. Multi-object Video Generation from Single Frame Layouts

Author: Wu, Yang, Liu, Zhibin, Wu, Hefeng, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we study video synthesis with emphasis on simplifying the generation conditions. Most existing video synthesis models or datasets are designed to address complex motions of a single object, lacking the ability of comprehensively understanding the spatio-temporal relationships among multiple objects. Besides, current methods are usually conditioned on intricate annotations (e.g. video segmentations) to generate new videos, being fundamentally less practical. These motivate us to generate multi-object videos conditioning exclusively on object layouts from a single frame. To solve the above challenges and inspired by recent research on image generation from layouts, we have proposed a novel video generative framework capable of synthesizing global scenes with local objects, via implicit neural representations and layout motion self-inference. Our framework is a non-trivial adaptation from image generation methods, and is new to this field. In addition, our model has been evaluated on two widely-used video recognition benchmarks, demonstrating effectiveness compared to the baseline model., Comment: 6 pages limit
Published: 2023

11. Category-Adaptive Label Discovery and Noise Rejection for Multi-label Image Recognition with Partial Positive Labels

Author: Pu, Tao, Lao, Qianru, Wu, Hefeng, Chen, Tianshui, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: As a promising solution of reducing annotation cost, training multi-label models with partial positive labels (MLR-PPL), in which merely a few positive labels are known while others are missing, attracts increasing attention. Due to the absence of any negative labels, previous works regard unknown labels as negative and adopt traditional MLR algorithms. To reject noisy labels, recent works regard large loss samples as noise but ignore the semantic correlation among different multi-label images. In this work, we propose to explore semantic correlation among different images to facilitate the MLR-PPL task. Specifically, we design a unified framework, Category-Adaptive Label Discovery and Noise Rejection, that discovers unknown labels and rejects noisy labels for each category in an adaptive manner. The framework consists of two complementary modules: (1) Category-Adaptive Label Discovery module first measures the semantic similarity between positive samples and then complements unknown labels with high similarities; (2) Category-Adaptive Noise Rejection module first computes the sample weights based on semantic similarities from different samples and then discards noisy labels with low weights. Besides, we propose a novel category-adaptive threshold updating that adaptively adjusts the threshold, to avoid the time-consuming manual tuning process. Extensive experiments demonstrate that our proposed method consistently outperforms current leading algorithms., Comment: arXiv admin note: text overlap with arXiv:2205.13092
Published: 2022

12. Dual-Perspective Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels

Author: Pu, Tao, Chen, Tianshui, Wu, Hefeng, Shi, Yukai, Yang, Zhijing, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Despite achieving impressive progress, current multi-label image recognition (MLR) algorithms heavily depend on large-scale datasets with complete labels, making collecting large-scale datasets extremely time-consuming and labor-intensive. Training the multi-label image recognition models with partial labels (MLR-PL) is an alternative way, in which merely some labels are known while others are unknown for each image. However, current MLR-PL algorithms rely on pre-trained image similarity models or iteratively updating the image classification models to generate pseudo labels for the unknown labels. Thus, they depend on a certain amount of annotations and inevitably suffer from obvious performance drops, especially when the known label proportion is low. To address this dilemma, we propose a dual-perspective semantic-aware representation blending (DSRB) that blends multi-granularity category-specific semantic representation across different images, from instance and prototype perspective respectively, to transfer information of known labels to complement unknown labels. Specifically, an instance-perspective representation blending (IPRB) module is designed to blend the representations of the known labels in an image with the representations of the corresponding unknown labels in another image to complement these unknown labels. Meanwhile, a prototype-perspective representation blending (PPRB) module is introduced to learn more stable representation prototypes for each category and blends the representation of unknown labels with the prototypes of corresponding labels, in a location-sensitive manner, to complement these unknown labels. Extensive experiments on the MS-COCO, Visual Genome, and Pascal VOC 2007 datasets show that the proposed DSRB consistently outperforms current state-of-the-art algorithms on all known label proportion settings., Comment: Technical Report. arXiv admin note: text overlap with arXiv:2203.02172
Published: 2022

13. Semantic Representation and Dependency Learning for Multi-Label Image Recognition

Author: Pu, Tao, Sun, Mingzhan, Wu, Hefeng, Chen, Tianshui, Tian, Ling, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recently many multi-label image recognition (MLR) works have made significant progress by introducing pre-trained object detection models to generate lots of proposals or utilizing statistical label co-occurrence to enhance the correlation among different categories. However, these works have some limitations: (1) the effectiveness of the network significantly depends on pre-trained object detection models that bring expensive and unaffordable computation; (2) the network performance degrades when there exist occasional co-occurrence objects in images, especially for the rare categories. To address these problems, we propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category and capture semantic dependency among all categories. Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide the model to focus on semantic-aware regions. We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions to regularize the network training. Extensive experiments and comparisons on two popular MLR benchmark datasets (i.e., MS-COCO and Pascal VOC 2007) demonstrate the effectiveness of the proposed framework over current state-of-the-art algorithms., Comment: accepted by Neurocomputing
Published: 2022

14. Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels

Author: Pu, Tao, Chen, Tianshui, Wu, Hefeng, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Training the multi-label image recognition models with partial labels, in which merely some labels are known while others are unknown for each image, is a considerably challenging and practical task. To address this task, current algorithms mainly depend on pre-training classification or similarity models to generate pseudo labels for the unknown labels. However, these algorithms depend on sufficient multi-label annotations to train the models, leading to poor performance especially with low known label proportion. In this work, we propose to blend category-specific representation across different images to transfer information of known labels to complement unknown labels, which can get rid of pre-training models and thus does not depend on sufficient annotations. To this end, we design a unified semantic-aware representation blending (SARB) framework that exploits instance-level and prototype-level semantic representation to complement unknown labels by two complementary modules: 1) an instance-level representation blending (ILRB) module blends the representations of the known labels in an image to the representations of the unknown labels in another image to complement these unknown labels. 2) a prototype-level representation blending (PLRB) module learns more stable representation prototypes for each category and blends the representation of unknown labels with the prototypes of corresponding labels to complement these labels. Extensive experiments on the MS-COCO, Visual Genome, Pascal VOC 2007 datasets show that the proposed SARB framework obtains superior performance over current leading competitors on all known label proportion settings, i.e., with the mAP improvement of 4.6%, 4.%, 2.2% on these three datasets when the known label proportion is 10%. Codes are available at https://github.com/HCPLab-SYSU/HCP-MLR-PL., Comment: Accepted by AAAI'22
Published: 2022

15. Retraction Note: Multipoint infrared laser-based detection and tracking for people counting

Author: Wu, Hefeng, Gao, Chengying, Cui, Yirui, and Wang, Ruomei
Published: 2024

16. Dual-perspective semantic-aware representation blending for multi-label image recognition with partial labels

Author: Pu, Tao, Chen, Tianshui, Wu, Hefeng, Shi, Yukai, Yang, Zhijing, and Lin, Liang
Published: 2024

17. Structured Semantic Transfer for Multi-Label Recognition with Partial Labels

Author: Chen, Tianshui, Pu, Tao, Wu, Hefeng, Xie, Yuan, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Multi-label image recognition is a fundamental yet practical task because real-world images inherently possess multiple semantic labels. However, it is difficult to collect large-scale multi-label annotations due to the complexity of both the input images and output label spaces. To reduce the annotation cost, we propose a structured semantic transfer (SST) framework that enables training multi-label recognition models with partial labels, i.e., merely some labels are known while other labels are missing (also called unknown labels) per image. The framework consists of two complementary transfer modules that explore within-image and cross-image semantic correlations to transfer knowledge of known labels to generate pseudo labels for unknown labels. Specifically, an intra-image semantic transfer module learns image-specific label co-occurrence matrix and maps the known labels to complement unknown labels based on this matrix. Meanwhile, a cross-image transfer module learns category-specific feature similarities and helps complement unknown labels with high similarities. Finally, both known and generated labels are used to train the multi-label recognition models. Extensive experiments on the Microsoft COCO, Visual Genome and Pascal VOC datasets show that the proposed SST framework obtains superior performance over current state-of-the-art algorithms. Codes are available at https://github.com/HCPLab-SYSU/HCP-MLR-PL., Comment: Accepted by AAAI'22
Published: 2021

18. AU-Expression Knowledge Constrained Representation Learning for Facial Expression Recognition

Author: Pu, Tao, Chen, Tianshui, Xie, Yuan, Wu, Hefeng, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recognizing human emotion/expressions automatically is quite an expected ability for intelligent robotics, as it can promote better communication and cooperation with humans. Current deep-learning-based algorithms may achieve impressive performance in some lab-controlled environments, but they always fail to recognize the expressions accurately for the uncontrolled in-the-wild situation. Fortunately, facial action units (AU) describe subtle facial behaviors, and they can help distinguish uncertain and ambiguous expressions. In this work, we explore the correlations among the action units and facial expressions, and devise an AU-Expression Knowledge Constrained Representation Learning (AUE-CRL) framework to learn the AU representations without AU annotations and adaptively use representations to facilitate facial expression recognition. Specifically, it leverages AU-expression correlations to guide the learning of the AU classifiers, and thus it can obtain AU representations without incurring any AU annotations. Then, it introduces a knowledge-guided attention mechanism that mines useful AU representations under the constraint of AU-expression correlations. In this way, the framework can capture local discriminative and complementary features to enhance facial representation for facial expression recognition. We conduct experiments on the challenging uncontrolled datasets to demonstrate the superiority of the proposed framework over current state-of-the-art methods. Codes and trained models are available at https://github.com/HCPLab-SYSU/AUE-CRL., Comment: Accepted at ICRA 2021
Published: 2020

19. Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting

Author: Liu, Lingbo, Chen, Jiaqi, Wu, Hefeng, Li, Guanbin, Li, Chenglong, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Crowd counting is a fundamental yet challenging task, which desires rich information to generate pixel-wise crowd density maps. However, most previous methods only used the limited information of RGB images and cannot well discover potential pedestrians in unconstrained scenarios. In this work, we find that incorporating optical and thermal information can greatly help to recognize pedestrians. To promote future researches in this field, we introduce a large-scale RGBT Crowd Counting (RGBT-CC) benchmark, which contains 2,030 pairs of RGB-thermal images with 138,389 annotated people. Furthermore, to facilitate the multimodal crowd counting, we propose a cross-modal collaborative representation learning framework, which consists of multiple modality-specific branches, a modality-shared branch, and an Information Aggregation-Distribution Module (IADM) to capture the complementary information of different modalities fully. Specifically, our IADM incorporates two collaborative information transfers to dynamically enhance the modality-shared and modality-specific representations with a dual information propagation mechanism. Extensive experiments conducted on the RGBT-CC benchmark demonstrate the effectiveness of our framework for RGBT crowd counting. Moreover, the proposed approach is universal for multimodal crowd counting and is also capable to achieve superior performance on the ShanghaiTechRGBD dataset. Finally, our source code and benchmark are released at http://lingboliu.com/RGBT_Crowd_Counting.html., Comment: Accepted by CVPR2021. Our code and benchmark for RGBT crowd counting are released at http://lingboliu.com/RGBT_Crowd_Counting.html
Published: 2020

20. Research on defect recognition technology of transmission line based on visual macromodeling

Author: Li Yang, Li Yan, Wang Qi, Wang Wanguo, Liu Guangxiu, Li Zhenyu, Wu Hefeng, and Jiang Shihao
Subjects: transmission line defect recognition, cnn, connected domain algorithm, knowledge distillation, 97m50, Mathematics, QA1-939
Abstract: In order to improve the defect recognition efficiency of transmission lines, the industry is currently using aerial images for automatic visual defect detection to ensure the safe operation of transmission lines. This paper proposes a method for defect recognition from coarse to fine, based on convolutional neural networks and connected domain algorithms, to improve recognition accuracy. The recognition speed is improved by using the knowledge distillation method of target detection networks based on decoupled features, adversarial features, and attention features. It has been found that the optimized recognition model improves the precision rate by 7%, the recall rate by 8%, and the average accuracy rate by 10%. The FPS of the model optimized by knowledge distillation is 62.5, and the average value of the FPS of other versions of this model is 47.35. It is believed that the two optimization ideas introduced in this paper can enhance the previous transmission line defect recognition algorithm in terms of accuracy and recognition speed.
Published: 2024
Full Text: View/download PDF

21. Knowledge-Guided Multi-Label Few-Shot Learning for General Image Recognition

Author: Chen, Tianshui, Lin, Liang, Chen, Riquan, Hui, Xiaolu, and Wu, Hefeng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recognizing multiple labels of an image is a practical yet challenging task, and remarkable progress has been achieved by searching for semantic regions and exploiting label dependencies. However, current works utilize RNN/LSTM to implicitly capture sequential region/label dependencies, which cannot fully explore mutual interactions among the semantic regions/labels and do not explicitly integrate label co-occurrences. In addition, these works require large amounts of training samples for each category, and they are unable to generalize to novel categories with limited samples. To address these issues, we propose a knowledge-guided graph routing (KGGR) framework, which unifies prior knowledge of statistical label correlations with deep neural networks. The framework exploits prior knowledge to guide adaptive information propagation among different categories to facilitate multi-label analysis and reduce the dependency of training samples. Specifically, it first builds a structured knowledge graph to correlate different labels based on statistical label co-occurrence. Then, it introduces the label semantics to guide learning semantic-specific features to initialize the graph, and it exploits a graph propagation network to explore graph node interactions, enabling learning contextualized image feature representations. Moreover, we initialize each graph node with the classifier weights for the corresponding label and apply another propagation network to transfer node messages through the graph. In this way, it can facilitate exploiting the information of correlated labels to help train better classifiers. We conduct extensive experiments on the traditional multi-label image recognition (MLR) and multi-label few-shot learning (ML-FSL) tasks and show that our KGGR framework outperforms the current state-of-the-art methods by sizable margins on the public benchmarks., Comment: Accepted at TPAMI
Published: 2020

22. Adversarial Graph Representation Adaptation for Cross-Domain Facial Expression Recognition

Author: Xie, Yuan, Chen, Tianshui, Pu, Tao, Wu, Hefeng, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Data inconsistency and bias are inevitable among different facial expression recognition (FER) datasets due to subjective annotating process and different collecting conditions. Recent works resort to adversarial mechanisms that learn domain-invariant features to mitigate domain shift. However, most of these works focus on holistic feature adaptation, and they ignore local features that are more transferable across different datasets. Moreover, local features carry more detailed and discriminative content for expression recognition, and thus integrating local features may enable fine-grained adaptation. In this work, we propose a novel Adversarial Graph Representation Adaptation (AGRA) framework that unifies graph representation propagation with adversarial learning for cross-domain holistic-local feature co-adaptation. To achieve this, we first build a graph to correlate holistic and local regions within each domain and another graph to correlate these regions across different domains. Then, we learn the per-class statistical distribution of each domain and extract holistic-local features from the input image to initialize the corresponding graph nodes. Finally, we introduce two stacked graph convolution networks to propagate holistic-local feature within each domain to explore their interaction and across different domains for holistic-local feature co-adaptation. In this way, the AGRA framework can adaptively learn fine-grained domain-invariant features and thus facilitate cross-domain expression recognition. We conduct extensive and fair experiments on several popular benchmarks and show that the proposed AGRA framework achieves superior performance over previous state-of-the-art methods., Comment: Accepted at ACM MM 2020
Published: 2020

23. Cross-Domain Facial Expression Recognition: A Unified Evaluation Benchmark and Adversarial Graph Learning

Author: Chen, Tianshui, Pu, Tao, Wu, Hefeng, Xie, Yuan, Liu, Lingbo, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: To address the problem of data inconsistencies among different facial expression recognition (FER) datasets, many cross-domain FER methods (CD-FERs) have been extensively devised in recent years. Although each declares to achieve superior performance, fair comparisons are lacking due to the inconsistent choices of the source/target datasets and feature extractors. In this work, we first analyze the performance effect caused by these inconsistent choices, and then re-implement some well-performing CD-FER and recently published domain adaptation algorithms. We ensure that all these algorithms adopt the same source datasets and feature extractors for fair CD-FER evaluations. We find that most of the current leading algorithms use adversarial learning to learn holistic domain-invariant features to mitigate domain shifts. However, these algorithms ignore local features, which are more transferable across different datasets and carry more detailed content for fine-grained adaptation. To address these issues, we integrate graph representation propagation with adversarial learning for cross-domain holistic-local feature co-adaptation by developing a novel adversarial graph representation adaptation (AGRA) framework. Specifically, it first builds two graphs to correlate holistic and local regions within each domain and across different domains, respectively. Then, it extracts holistic-local features from the input image and uses learnable per-class statistical distributions to initialize the corresponding graph nodes. Finally, two stacked graph convolution networks (GCNs) are adopted to propagate holistic-local features within each domain to explore their interaction and across different domains for holistic-local feature co-adaptation. We conduct extensive and fair evaluations on several popular benchmarks and show that the proposed AGRA framework outperforms previous state-of-the-art methods., Comment: Accepted at T-PAMI, 2021. 
arXiv admin note: text overlap with arXiv:2008.00859
Published: 2020

24. Fine-Grained Image Captioning with Global-Local Discriminative Objective

Author: Wu, Jie, Chen, Tianshui, Wu, Hefeng, Yang, Zhi, Luo, Guangchun, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
Abstract: Significant progress has been made in recent years in image captioning, an active topic in the fields of vision and language. However, existing methods tend to yield overly general captions and consist of some of the most frequent words/phrases, resulting in inaccurate and indistinguishable descriptions (see Figure 1). This is primarily due to (i) the conservative characteristic of traditional training objectives that drives the model to generate correct but hardly discriminative captions for similar images and (ii) the uneven word distribution of the ground-truth captions, which encourages generating highly frequent words/phrases while suppressing the less frequent but more concrete ones. In this work, we propose a novel global-local discriminative objective that is formulated on top of a reference model to facilitate generating fine-grained descriptive captions. Specifically, from a global perspective, we design a novel global discriminative constraint that pulls the generated sentence to better discern the corresponding image from all others in the entire dataset. From the local perspective, a local discriminative constraint is proposed to increase attention such that it emphasizes the less frequent but more concrete words/phrases, thus facilitating the generation of captions that better describe the visual details of the given images. We evaluate the proposed method on the widely used MS-COCO dataset, where it outperforms the baseline methods by a sizable margin and achieves competitive performance over existing leading approaches. We also conduct self-retrieval experiments to demonstrate the discriminability of the proposed method., Comment: Accepted by TMM
Published: 2020

25. Efficient Crowd Counting via Structured Knowledge Transfer

Author: Liu, Lingbo, Chen, Jiaqi, Wu, Hefeng, Chen, Tianshui, Li, Guanbin, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Crowd counting is an application-oriented task and its inference efficiency is crucial for real-world applications. However, most previous works relied on heavy backbone networks and required prohibitive run-time consumption, which would seriously restrict their deployment scopes and cause poor scalability. To liberate these crowd counting models, we propose a novel Structured Knowledge Transfer (SKT) framework, which fully exploits the structured knowledge of a well-trained teacher network to generate a lightweight but still highly effective student network. Specifically, it is integrated with two complementary transfer modules, including an Intra-Layer Pattern Transfer which sequentially distills the knowledge embedded in layer-wise features of the teacher network to guide feature learning of the student network and an Inter-Layer Relation Transfer which densely distills the cross-layer correlation knowledge of the teacher to regularize the student's feature evolution. Consequently, our student network can derive the layer-wise and cross-layer knowledge from the teacher network to learn compact yet effective features. Extensive evaluations on three benchmarks well demonstrate the effectiveness of our SKT for extensive crowd counting models. In particular, only using around 6% of the parameters and computation cost of original models, our distilled VGG-based models obtain at least 6.5× speed-up on an Nvidia 1080 GPU and even achieve state-of-the-art performance. Our code and models are available at https://github.com/HCPLab-SYSU/SKT., Comment: This paper has been accepted by ACM MM 2020. Our code and models are available at https://github.com/HCPLab-SYSU/SKT
Published: 2020

26. Physical-Virtual Collaboration Modeling for Intra-and Inter-Station Metro Ridership Prediction

Author: Liu, Lingbo, Chen, Jingwen, Wu, Hefeng, Zhen, Jiajie, Li, Guanbin, and Lin, Liang
Subjects: Computer Science - Machine Learning
Abstract: Due to the widespread applications in real-world scenarios, metro ridership prediction is a crucial but challenging task in intelligent transportation systems. However, conventional methods either ignore the topological information of metro systems or directly learn on physical topology, and cannot fully explore the patterns of ridership evolution. To address this problem, we model a metro system as graphs with various topologies and propose a unified Physical-Virtual Collaboration Graph Network (PVCGN), which can effectively learn the complex ridership patterns from the tailor-designed graphs. Specifically, a physical graph is directly built based on the realistic topology of the studied metro system, while a similarity graph and a correlation graph are built with virtual topologies under the guidance of the inter-station passenger flow similarity and correlation. These complementary graphs are incorporated into a Graph Convolution Gated Recurrent Unit (GC-GRU) for spatial-temporal representation learning. Further, a Fully-Connected Gated Recurrent Unit (FC-GRU) is also applied to capture the global evolution tendency. Finally, we develop a Seq2Seq model with GC-GRU and FC-GRU to forecast the future metro ridership sequentially. Extensive experiments on two large-scale benchmarks (e.g., Shanghai Metro and Hangzhou Metro) well demonstrate the superiority of our PVCGN for station-level metro ridership prediction. Moreover, we apply the proposed PVCGN to address the online origin-destination (OD) ridership prediction and the experiment results show the universality of our method. Our code and benchmarks are available at https://github.com/HCPLab-SYSU/PVCGN., Comment: This paper has been accepted by IEEE Transactions on Intelligent Transportation Systems (TITS). Our code and benchmarks are available at https://github.com/HCPLab-SYSU/PVCGN
Published: 2020

27. Knowledge Graph Transfer Network for Few-Shot Recognition

Author: Chen, Riquan, Chen, Tianshui, Hui, Xiaolu, Wu, Hefeng, Li, Guanbin, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Few-shot learning aims to learn novel categories from very few samples given some base categories with sufficient training samples. The main challenge of this task is the novel categories are prone to dominated by color, texture, shape of the object or background context (namely specificity), which are distinct for the given few training samples but not common for the corresponding categories (see Figure 1). Fortunately, we find that transferring information of the correlated based categories can help learn the novel concepts and thus avoid the novel concept being dominated by the specificity. Besides, incorporating semantic correlations among different categories can effectively regularize this information transfer. In this work, we represent the semantic correlations in the form of structured knowledge graph and integrate this graph into deep neural networks to promote few-shot learning by a novel Knowledge Graph Transfer Network (KGTN). Specifically, by initializing each node with the classifier weight of the corresponding category, a propagation mechanism is learned to adaptively propagate node message through the graph to explore node interaction and transfer classifier information of the base categories to those of the novel ones. Extensive experiments on the ImageNet dataset show significant performance improvement compared with current leading competitors. Furthermore, we construct an ImageNet-6K dataset that covers larger scale categories, i.e, 6,000 categories, and experiments on this dataset further demonstrate the effectiveness of our proposed model. Our codes and models are available at https://github.com/MyChocer/KGTN ., Comment: accepted by AAAI 2020 as oral paper
Published: 2019
Full Text: View/download PDF

28. Learning Semantic-Specific Graph Representation for Multi-Label Image Recognition

Author: Chen, Tianshui, Xu, Muxin, Hui, Xiaolu, Wu, Hefeng, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recognizing multiple labels of images is a practical and challenging task, and significant progress has been made by searching semantic-aware regions and modeling label dependency. However, current methods cannot locate the semantic regions accurately due to the lack of part-level supervision or semantic guidance. Moreover, they cannot fully explore the mutual interactions among the semantic regions and do not explicitly model the label co-occurrence. To address these issues, we propose a Semantic-Specific Graph Representation Learning (SSGRL) framework that consists of two crucial modules: 1) a semantic decoupling module that incorporates category semantics to guide learning semantic-specific representations and 2) a semantic interaction module that correlates these representations with a graph built on the statistical label co-occurrence and explores their interactions via a graph propagation mechanism. Extensive experiments on public benchmarks show that our SSGRL framework outperforms current state-of-the-art methods by a sizable margin, e.g. with an mAP improvement of 2.5%, 2.6%, 6.7%, and 3.1% on the PASCAL VOC 2007 & 2012, Microsoft-COCO and Visual Genome benchmarks, respectively. Our codes and models are available at https://github.com/HCPLab-SYSU/SSGRL., Comment: accepted by ICCV 2019
Published: 2019

29. Instance-Aware Representation Learning and Association for Online Multi-Person Tracking

Author: Wu, Hefeng, Hu, Yafei, Wang, Keze, Li, Hanhui, Nie, Lin, and Cheng, Hui
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Multi-Person Tracking (MPT) is often addressed within the detection-to-association paradigm. In such approaches, human detections are first extracted in every frame and person trajectories are then recovered by a procedure of data association (usually offline). However, their performances usually degenerate in presence of detection errors, mutual interactions and occlusions. In this paper, we present a deep learning based MPT approach that learns instance-aware representations of tracked persons and robustly online infers states of the tracked persons. Specifically, we design a multi-branch neural network (MBN), which predicts the classification confidences and locations of all targets by taking a batch of candidate regions as input. In our MBN architecture, each branch (instance-subnet) corresponds to an individual to be tracked and new branches can be dynamically created for handling newly appearing persons. Then based on the output of MBN, we construct a joint association matrix that represents meaningful states of tracked persons (e.g., being tracked or disappearing from the scene) and solve it by using the efficient Hungarian algorithm. Moreover, we allow the instance-subnets to be updated during tracking by online mining hard examples, accounting to person appearance variations over time. We comprehensively evaluate our framework on a popular MPT benchmark, demonstrating its excellent performance in comparison with recent online MPT methods., Comment: accepted by Pattern Recognition
Published: 2019
Full Text: View/download PDF

30. Multi-column Point-CNN for Sketch Segmentation

Author: Wang, Fei, Lin, Shujin, Li, Hanhui, Wu, Hefeng, Jiang, Junkun, Wang, Ruomei, and Luo, Xiaonan
Subjects: Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Traditional sketch segmentation methods mainly rely on handcrafted features and complicated models, and their performance is far from satisfactory due to the abstract representation of sketches. Recent success of Deep Neural Networks (DNNs) in related tasks suggests DNNs could be a practical solution for this problem, yet the suitable datasets for learning and evaluating DNNs are limited. To this end, we introduce SketchSeg, a large dataset consisting of 10,000 pixel-wisely labeled sketches. Besides, due to the lack of colors and textures in sketches, conventional DNNs learned on natural images are not optimal for tackling our problem. Therefore, we further propose the Multi-column Point-CNN (MCPNet), which (1) directly takes sampled points as its input to reduce computational costs, and (2) adopts multiple columns with different filter sizes to better capture the structures of sketches. Extensive experiments validate that the MCPNet is superior to conventional DNNs like FCN. The SketchSeg dataset is publicly available on https://drive.google.com/open?id=1OpCBvkInhxvfAHuVs-spDEppb8iXFC3C.
Published: 2018

31. Semantic representation and dependency learning for multi-label image recognition

Author: Pu, Tao, Sun, Mingzhan, Wu, Hefeng, Chen, Tianshui, Tian, Ling, and Lin, Liang
Published: 2023
Full Text: View/download PDF

32. ADCrowdNet: An Attention-injective Deformable Convolutional Network for Crowd Understanding

Author: Liu, Ning, Long, Yongchao, Zou, Changqing, Niu, Qun, Pan, Li, and Wu, Hefeng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose an attention-injective deformable convolutional network called ADCrowdNet for crowd understanding that can address the accuracy degradation problem of highly congested noisy scenes. ADCrowdNet contains two concatenated networks. An attention-aware network called Attention Map Generator (AMG) first detects crowd regions in images and computes the congestion degree of these regions. Based on detected crowd regions and congestion priors, a multi-scale deformable network called Density Map Estimator (DME) then generates high-quality density maps. With the attention-aware training scheme and multi-scale deformable convolutional scheme, the proposed ADCrowdNet achieves the capability of being more effective to capture the crowd features and more resistant to various noises. We have evaluated our method on four popular crowd counting datasets (ShanghaiTech, UCF_CC_50, WorldEXPO'10, and UCSD) and an extra vehicle counting dataset TRANCOS, and our approach beats existing state-of-the-art approaches on all of these datasets., Comment: Accepted by CVPR 2019
Published: 2018

33. Structured Inhomogeneous Density Map Learning for Crowd Counting

Author: Li, Hanhui, He, Xiangjian, Wu, Hefeng, Kasmani, Saeed Amirgholipour, Wang, Ruomei, Luo, Xiaonan, and Lin, Liang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we aim at tackling the problem of crowd counting in extremely high-density scenes, which contain hundreds, or even thousands of people. We begin by a comprehensive analysis of the most widely used density map-based methods, and demonstrate how easily existing methods are affected by the inhomogeneous density distribution problem, e.g., causing them to be sensitive to outliers, or be hard to optimized. We then present an extremely simple solution to the inhomogeneous density distribution problem, which can be intuitively summarized as extending the density map from 2D to 3D, with the extra dimension implicitly indicating the density level. Such solution can be implemented by a single Density-Aware Network, which is not only easy to train, but also can achieve the state-of-art performance on various challenging datasets., Comment: 10 pages, 7 figures
Published: 2018

34. Crowd counting via Localization Guided Transformer

Author: Yuan, Lixian, Chen, Yandong, Wu, Hefeng, Wan, Wentao, and Chen, Pei
Published: 2022
Full Text: View/download PDF

35. Learning Deep Similarity Models with Focus Ranking for Fabric Image Retrieval

Author: Deng, Daiguo, Wang, Ruomei, Wu, Hefeng, He, Huayong, Li, Qi, and Luo, Xiaonan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Fabric image retrieval is beneficial to many applications including clothing searching, online shopping and cloth modeling. Learning pairwise image similarity is of great importance to an image retrieval task. With the resurgence of Convolutional Neural Networks (CNNs), recent works have achieved significant progresses via deep representation learning with metric embedding, which drives similar examples close to each other in a feature space, and dissimilar ones apart from each other. In this paper, we propose a novel embedding method termed focus ranking that can be easily unified into a CNN for jointly learning image representations and metrics in the context of fine-grained fabric image retrieval. Focus ranking aims to rank similar examples higher than all dissimilar ones by penalizing ranking disorders via the minimization of the overall cost attributed to similar samples being ranked below dissimilar ones. At the training stage, training samples are organized into focus ranking units for efficient optimization. We build a large-scale fabric image retrieval dataset (FIRD) with about 25,000 images of 4,300 fabrics, and test the proposed model on the FIRD dataset. Experimental results show the superiority of the proposed model over existing metric embedding models., Comment: 11 pages, 9 figures, accepted by Image and Vision Computing
Published: 2017
Full Text: View/download PDF

36. Insight into excitation and acquisition mechanism and mode control of Lamb waves with piezopolymer coating-based array transducers: Analytical and experimental analysis

Author: Li, Yehai, Wang, Kai, Feng, Wei, Wu, Hefeng, Su, Zhongqing, and Guo, Shifeng
Published: 2022
Full Text: View/download PDF

37. Salient Superpixel Visual Tracking with Graph Model and Iterative Segmentation

Author: Zhan, Jin, Zhao, Huimin, Zheng, Penggen, Wu, Hefeng, and Wang, Leijun
Published: 2021
Full Text: View/download PDF

38. DiffusionGPT: LLM-Driven Text-to-Image Generation System

Author: Qin, Jie, Wu, Jie, Chen, Weifeng, Ren, Yuxi, Li, Huixia, Wu, Hefeng, Xiao, Xuefeng, Wang, Rui, and Wen, Shilei
Abstract: Diffusion models have opened up new avenues for the field of image generation, resulting in the proliferation of high-quality models shared on open-source platforms. However, a major challenge persists in current text-to-image systems are often unable to handle diverse inputs, or are limited to single model results. Current unified attempts often fall into two orthogonal aspects: i) parse Diverse Prompts in input stage; ii) activate expert model to output. To combine the best of both worlds, we propose DiffusionGPT, which leverages Large Language Models (LLM) to offer a unified generation system capable of seamlessly accommodating various types of prompts and integrating domain-expert models. DiffusionGPT constructs domain-specific Trees for various generative models based on prior knowledge. When provided with an input, the LLM parses the prompt and employs the Trees-of-Thought to guide the selection of an appropriate model, thereby relaxing input constraints and ensuring exceptional performance across diverse domains. Moreover, we introduce Advantage Databases, where the Tree-of-Thought is enriched with human feedback, aligning the model selection process with human preferences. Through extensive experiments and comparisons, we demonstrate the effectiveness of DiffusionGPT, showcasing its potential for pushing the boundaries of image synthesis in diverse domains.
Published: 2024

39. A survey of script learning

Author: Han, Yi, Qiao, Linbo, Zheng, Jianming, Wu, Hefeng, Li, Dongsheng, and Liao, Xiangke
Published: 2021
Full Text: View/download PDF

40. Crowd counting via scale-communicative aggregation networks

Author: Yuan, Lixian, Qiu, Zhilin, Liu, Lingbo, Wu, Hefeng, Chen, Tianshui, Chen, Pei, and Lin, Liang
Published: 2020
Full Text: View/download PDF

41. Multi-column point-CNN for sketch segmentation

Author: Wang, Fei, Lin, Shujin, Li, Hanhui, Wu, Hefeng, Cai, Tie, Luo, Xiaonan, and Wang, Ruomei
Published: 2020
Full Text: View/download PDF

42. Spatial–Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation

Author: Pu, Tao, Chen, Tianshui, Wu, Hefeng, Lu, Yongyi, and Lin, Liang
Published: 2024
Full Text: View/download PDF

43. Real-Time RGBD Object Tracking via Collaborative Appearance and Motion Models

Author: Chen, Danxian, Liu, Zhanming, Wu, Hefeng, Zhan, Jin, Barbosa, Simone Diniz Junqueira, Series Editor, Filipe, Joaquim, Series Editor, Kotenko, Igor, Series Editor, Sivalingam, Krishna M., Series Editor, Washio, Takashi, Series Editor, Yuan, Junsong, Series Editor, Zhou, Lizhu, Series Editor, Li, Kangshun, editor, Li, Wei, editor, Chen, Zhangxing, editor, and Liu, Yong, editor
Published: 2018
Full Text: View/download PDF

44. Visual Tracking via Clustering-Based Patch Weighing and Masking

Author: Yuan, He, Wu, Hefeng, Feng, Dapeng, Gong, Yongyi, Kacprzyk, Janusz, Series editor, Pal, Nikhil R., Advisory editor, Bello Perez, Rafael, Advisory editor, Corchado, Emilio S., Advisory editor, Hagras, Hani, Advisory editor, Kóczy, László T., Advisory editor, Kreinovich, Vladik, Advisory editor, Lin, Chin-Teng, Advisory editor, Lu, Jie, Advisory editor, Melin, Patricia, Advisory editor, Nedjah, Nadia, Advisory editor, Nguyen, Ngoc Thanh, Advisory editor, Wang, Jun, Advisory editor, Bhatia, Sanjiv K., editor, Mishra, Krishn K., editor, Tiwari, Shailesh, editor, and Singh, Vivek Kumar, editor
Published: 2018
Full Text: View/download PDF

45. Robust Visual Tracking via Sparse Feature Selection and Weight Dictionary Update

Author: Zheng, Penggen, Zhan, Jin, Zhao, Huimin, Wu, Hefeng, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Ren, Jinchang, editor, Hussain, Amir, editor, Zheng, Jiangbin, editor, Liu, Cheng-Lin, editor, Luo, Bin, editor, Zhao, Huimin, editor, and Zhao, Xinbo, editor
Published: 2018
Full Text: View/download PDF

46. Contrastive Transformer Learning With Proximity Data Generation for Text-Based Person Search

Author: Wu, Hefeng, Chen, Weifeng, Liu, Zhibin, Chen, Tianshui, Chen, Zhiguang, and Lin, Liang
Abstract: Given a descriptive text query, text-based person search (TBPS) aims to retrieve the best-matched target person from an image gallery. Such a cross-modal retrieval task is quite challenging due to significant modality gap, fine-grained differences and insufficiency of annotated data. To better align the two modalities, most existing works focus on introducing sophisticated network structures and auxiliary tasks, which are complex and hard to implement. In this paper, we propose a simple yet effective dual Transformer model for text-based person search. By exploiting a hardness-aware contrastive learning strategy, our model achieves state-of-the-art performance without any special design for local feature alignment or side information. Moreover, we propose a proximity data generation (PDG) module to automatically produce more diverse data for cross-modal training. The PDG module first introduces an automatic generation algorithm based on a text-to-image diffusion model, which generates new text-image pair samples in the proximity space of original ones. Then it combines approximate text generation and feature-level mixup during training to further strengthen the data diversity. The PDG module can largely guarantee the reasonability of the generated samples that are directly used for training without any human inspection for noise rejection. It improves the performance of our model significantly, providing a feasible solution to the data insufficiency problem faced by such fine-grained visual-linguistic tasks. Extensive experiments on two popular datasets of the TBPS task (i.e., CUHK-PEDES and ICFG-PEDES) show that the proposed approach outperforms state-of-the-art approaches evidently, e.g., improving by 3.88%, 4.02%, 2.92% in terms of Top1, Top5, Top10 on CUHK-PEDES.
Published: 2024
Full Text: View/download PDF

47. Pedestrian Detection via Structure-Sensitive Deep Representation Learning

Author: Huang, Deliang, Huang, Shijia, Wu, Hefeng, and Liu, Ning
Series editors: Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Pandu Rangan, C., Steffen, Bernhard, Terzopoulos, Demetri, Tygar, Doug, and Weikum, Gerhard
Editors: Zhao, Yao, Kong, Xiangwei, and Taubman, David
Published: 2017
Full Text: View/download PDF

48. Boosting Zero-Shot Image Classification via Pairwise Relationship Learning

Author: Li, Hanhui, Wu, Hefeng, Lin, Shujin, Lin, Liang, Luo, Xiaonan, and Izquierdo, Ebroul
Series editors: Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Pandu Rangan, C., Steffen, Bernhard, Terzopoulos, Demetri, Tygar, Doug, and Weikum, Gerhard
Editors: Lai, Shang-Hong, Lepetit, Vincent, Nishino, Ko, and Sato, Yoichi
Published: 2017
Full Text: View/download PDF

49. Weak-structure-aware visual object tracking with bottom-up and top-down context exploration

Author: Liu, Ning, Liu, Chang, Wu, Hefeng, Zhu, Hengzheng, and Zhan, Jin
Published: 2018
Full Text: View/download PDF

50. A new geometric modeling approach for woven fabric based on Frenet frame and Spiral Equation

Author: Deng, Daiguo, Wu, Hefeng, Sun, Peng, Wang, Ruomei, Shi, Zhuo, and Luo, Xiaonan
Published: 2018
Full Text: View/download PDF
