718 results on '"Zhang, Zhongfei"'
Search Results
2. Dense Affinity Matching for Few-Shot Segmentation
- Author
Chen, Hao, Dong, Yonghan, Lu, Zheming, Yu, Yunlong, Li, Yingming, Han, Jungong, and Zhang, Zhongfei
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Few-Shot Segmentation (FSS) aims to segment the novel class images with a few annotated samples. In this paper, we propose a dense affinity matching (DAM) framework to exploit the support-query interaction by densely capturing both the pixel-to-pixel and pixel-to-patch relations in each support-query pair with the bidirectional 3D convolutions. Different from the existing methods that remove the support background, we design a hysteretic spatial filtering module (HSFM) to filter the background-related query features and retain the foreground-related query features with the assistance of the support background, which is beneficial for eliminating interference objects in the query background. We comprehensively evaluate our DAM on ten benchmarks under cross-category, cross-dataset, and cross-domain FSS tasks. Experimental results demonstrate that DAM performs very competitively under different settings with only 0.68M parameters, especially under cross-domain FSS tasks, showing its effectiveness and efficiency.
- Published
- 2023
3. Multi-Content Interaction Network for Few-Shot Segmentation
- Author
Chen, Hao, Yu, Yunlong, Dong, Yonghan, Lu, Zheming, Li, Yingming, and Zhang, Zhongfei
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Few-Shot Segmentation (FSS) is challenging for limited support images and large intra-class appearance discrepancies. Most existing approaches focus on extracting high-level representations of the same layers for support-query correlations, neglecting the shift issue between different layers and scales, due to the huge difference between support and query samples. In this paper, we propose a Multi-Content Interaction Network (MCINet) to remedy this issue by fully exploiting and interacting with the multi-scale contextual information contained in the support-query pairs to supplement the same-layer correlations. Specifically, MCINet improves FSS from the perspectives of boosting the query representations by incorporating the low-level structural information from another query branch into the high-level semantic features, enhancing the support-query correlations by exploiting both the same-layer and adjacent-layer features, and refining the predicted results by a multi-scale mask prediction strategy, with which the different scale contents have bidirectionally interacted. Experiments on two benchmarks demonstrate that our approach reaches SOTA performances and outperforms the best competitors with many desirable advantages, especially on the challenging COCO dataset.
- Published
- 2023
4. Blockchain-Based Trusted Synchronization Operation Framework for Open Production Logistics System
- Author
Zhang, Zhongfei, Qu, Ting, Zhao, Kuo, Zhang, Kai, Zhang, Yongheng, Liu, Lei, Su, Nong, Huang, George Q., Chaari, Fakher, Series Editor, Gherardini, Francesco, Series Editor, Ivanov, Vitalii, Series Editor, Haddar, Mohamed, Series Editor, Cavas-Martínez, Francisco, Editorial Board Member, di Mare, Francesca, Editorial Board Member, Kwon, Young W., Editorial Board Member, Tolio, Tullio A.M., Editorial Board Member, Trojanowska, Justyna, Editorial Board Member, Schmitt, Robert, Editorial Board Member, Xu, Jinyang, Editorial Board Member, Chien, Chen-Fu, editor, Dou, Runliang, editor, and Luo, Li, editor
- Published
- 2024
5. Digital twin and blockchain-enabled trusted optimal-state synchronized control approach for distributed smart manufacturing system in social manufacturing
- Author
Zhang, Zhongfei, Qu, Ting, Huang, George Q., Zhao, Kuo, Zhang, Kai, Li, Mingxing, Zhang, Yongheng, Liu, Lei, and Zhong, Haihui
- Published
- 2024
6. Reducing Flipping Errors in Deep Neural Networks
- Author
Deng, Xiang, Xiao, Yun, Long, Bo, and Zhang, Zhongfei
- Subjects
Computer Science - Machine Learning
- Abstract
Deep neural networks (DNNs) have been widely applied in various domains in artificial intelligence including computer vision and natural language processing. A DNN is typically trained for many epochs and then a validation dataset is used to select the DNN in an epoch (we simply call this epoch "the last epoch") as the final model for making predictions on unseen samples, while it usually cannot achieve a perfect accuracy on unseen samples. An interesting question is "how many test (unseen) samples that a DNN misclassifies in the last epoch were ever correctly classified by the DNN before the last epoch?". In this paper, we empirically study this question and find on several benchmark datasets that the vast majority of the misclassified samples in the last epoch were ever classified correctly before the last epoch, which means that the predictions for these samples were flipped from "correct" to "wrong". Motivated by this observation, we propose to restrict the behavior changes of a DNN on the correctly-classified samples so that the correct local boundaries can be maintained and the flipping error on unseen samples can be largely reduced. Extensive experiments on different benchmark datasets with different modern network architectures demonstrate that the proposed flipping error reduction (FER) approach can substantially improve the generalization, the robustness, and the transferability of DNNs without introducing any additional network parameters or inference cost, only with a negligible training overhead.
- Published
- 2022
7. COREN: Multi-Modal Co-Occurrence Transformer Reasoning Network for Image-Text Retrieval
- Author
Wang, Yaodong, Ji, Zhong, Chen, Kexin, Pang, Yanwei, and Zhang, Zhongfei
- Published
- 2023
8. Stable Prediction on Graphs with Agnostic Distribution Shift
- Author
Zhang, Shengyu, Kuang, Kun, Qiu, Jiezhong, Yu, Jin, Zhao, Zhou, Yang, Hongxia, Zhang, Zhongfei, and Wu, Fei
- Subjects
Computer Science - Machine Learning, Computer Science - Social and Information Networks
- Abstract
Graph is a flexible and effective tool to represent complex structures in practice and graph neural networks (GNNs) have been shown to be effective on various graph tasks with randomly separated training and testing data. In real applications, however, the distribution of training graph might be different from that of the test one (e.g., users' interactions on the user-item training graph and their actual preference on items, i.e., testing environment, are known to have inconsistencies in recommender systems). Moreover, the distribution of test data is always agnostic when GNNs are trained. Hence, we are facing the agnostic distribution shift between training and testing on graph learning, which would lead to unstable inference of traditional GNNs across different test environments. To address this problem, we propose a novel stable prediction framework for GNNs, which permits both locally and globally stable learning and prediction on graphs. In particular, since each node is partially represented by its neighbors in GNNs, we propose to capture the stable properties for each node (locally stable) by re-weighting the information propagation/aggregation processes. For global stability, we propose a stable regularizer that reduces the training losses on heterogeneous environments and thus warping the GNNs to generalize well. We conduct extensive experiments on several graph benchmarks and a noisy industrial recommendation dataset that is collected from 5 consecutive days during a product promotion festival. The results demonstrate that our method outperforms various SOTA GNNs for stable prediction on graphs with agnostic distribution shift, including shift caused by node labels and attributes.
- Comment
11 pages, 6 figures
- Published
- 2021
9. Enhancing trusted synchronization in open production logistics: A platform framework integrating blockchain and digital twin under social manufacturing
- Author
Zhang, Zhongfei, Qu, Ting, Zhao, Kuo, Zhang, Kai, Zhang, Yongheng, Guo, Wenyou, Liu, Lei, and Chen, Zefeng
- Published
- 2024
10. Complementary Calibration: Boosting General Continual Learning with Collaborative Distillation and Self-Supervision
- Author
Ji, Zhong, Li, Jin, Wang, Qiang, and Zhang, Zhongfei
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
General Continual Learning (GCL) aims at learning from non independent and identically distributed stream data without catastrophic forgetting of the old tasks that don't rely on task boundaries during both training and testing stages. We reveal that the relation and feature deviations are crucial problems for catastrophic forgetting, in which relation deviation refers to the deficiency of the relationship among all classes in knowledge distillation, and feature deviation refers to indiscriminative feature representations. To this end, we propose a Complementary Calibration (CoCa) framework by mining the complementary model's outputs and features to alleviate the two deviations in the process of GCL. Specifically, we propose a new collaborative distillation approach for addressing the relation deviation. It distills model's outputs by utilizing ensemble dark knowledge of new model's outputs and reserved outputs, which maintains the performance of old tasks as well as balancing the relationship among all classes. Furthermore, we explore a collaborative self-supervision idea to leverage pretext tasks and supervised contrastive learning for addressing the feature deviation problem by learning complete and discriminative features for all classes. Extensive experiments on four popular datasets show that our CoCa framework achieves superior performance against state-of-the-art methods. Code is available at https://github.com/lijincm/CoCa.
- Comment
Paper is available at https://ieeexplore.ieee.org/document/10002397. Code is available at https://github.com/lijincm/CoCa
- Published
- 2021
11. Self-Taught Cross-Domain Few-Shot Learning with Weakly Supervised Object Localization and Task-Decomposition
- Author
Liu, Xiyao, Ji, Zhong, Pang, Yanwei, and Zhang, Zhongfei
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
The domain shift between the source and target domain is the main challenge in Cross-Domain Few-Shot Learning (CD-FSL). However, the target domain is absolutely unknown during the training on the source domain, which results in lacking directed guidance for target tasks. We observe that since there are similar backgrounds in target domains, it can apply self-labeled samples as prior tasks to transfer knowledge onto target tasks. To this end, we propose a task-expansion-decomposition framework for CD-FSL, called Self-Taught (ST) approach, which alleviates the problem of non-target guidance by constructing task-oriented metric spaces. Specifically, Weakly Supervised Object Localization (WSOL) and self-supervised technologies are employed to enrich task-oriented samples by exchanging and rotating the discriminative regions, which generates a more abundant task set. Then these tasks are decomposed into several tasks to finish the task of few-shot recognition and rotation classification. It helps to transfer the source knowledge onto the target tasks and focus on discriminative regions. We conduct extensive experiments under the cross-domain setting including 8 target domains: CUB, Cars, Places, Plantae, CropDiseases, EuroSAT, ISIC, and ChestX. Experimental results demonstrate that the proposed ST approach is applicable to various metric-based models, and provides promising improvements in CD-FSL.
- Published
- 2021
12. Small-molecule Molephantin induces apoptosis and mitophagy flux blockage through ROS production in glioblastoma
- Author
Ling, Zhipeng, Pan, Junping, Zhang, Zhongfei, Chen, Guisi, Geng, Jiayuan, Lin, Qiang, Zhang, Tao, Cao, Shuqin, Chen, Cheng, Lin, Jinrong, Yuan, Hongyao, Ding, Weilong, Xiao, Fei, Xu, Xinke, Li, Fangcheng, Wang, Guocai, Zhang, Yubo, and Li, Junliang
- Published
- 2024
13. Tolerant Self-Distillation for image classification
- Author
Liu, Mushui, Yu, Yunlong, Ji, Zhong, Han, Jungong, and Zhang, Zhongfei
- Published
- 2024
14. Dense affinity matching for Few-Shot Segmentation
- Author
Chen, Hao, Dong, Yonghan, Lu, Zheming, Yu, Yunlong, Li, Yingming, Han, Jungong, and Zhang, Zhongfei
- Published
- 2024
15. Graph-Free Knowledge Distillation for Graph Neural Networks
- Author
Deng, Xiang and Zhang, Zhongfei
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
Knowledge distillation (KD) transfers knowledge from a teacher network to a student by enforcing the student to mimic the outputs of the pretrained teacher on training data. However, data samples are not always accessible in many cases due to large data sizes, privacy, or confidentiality. Many efforts have been made on addressing this problem for convolutional neural networks (CNNs) whose inputs lie in a grid domain within a continuous space such as images and videos, but largely overlook graph neural networks (GNNs) that handle non-grid data with different topology structures within a discrete space. The inherent differences between their inputs make these CNN-based approaches not applicable to GNNs. In this paper, we propose to our best knowledge the first dedicated approach to distilling knowledge from a GNN without graph data. The proposed graph-free KD (GFKD) learns graph topology structures for knowledge transfer by modeling them with multivariate Bernoulli distribution. We then introduce a gradient estimator to optimize this framework. Essentially, the gradients w.r.t. graph structures are obtained by only using GNN forward-propagation without back-propagation, which means that GFKD is compatible with modern GNN libraries such as DGL and Geometric. Moreover, we provide the strategies for handling different types of prior knowledge in the graph data or the GNNs. Extensive experiments demonstrate that GFKD achieves the state-of-the-art performance for distilling knowledge from GNNs without training data.
- Comment
This version is to correct typos
- Published
- 2021
16. Multimedia Data Learning
- Author
Zhang, Zhongfei (Mark), Zhang, Ruofei (Bruce), Rokach, Lior, editor, Maimon, Oded, editor, and Shmueli, Erez, editor
- Published
- 2023
17. Dual-DIANet: A sharing-learnable multi-task network based on dense information aggregation
- Author
Lyu, Kejie, Li, Yingming, and Zhang, Zhongfei
- Published
- 2024
18. Zero-shot classification with unseen prototype learning
- Author
Ji, Zhong, Cui, Biying, Yu, Yunlong, Pang, Yanwei, and Zhang, Zhongfei
- Published
- 2023
19. Learning with Retrospection
- Author
Deng, Xiang and Zhang, Zhongfei
- Subjects
Computer Science - Machine Learning
- Abstract
Deep neural networks have been successfully deployed in various domains of artificial intelligence, including computer vision and natural language processing. We observe that the current standard procedure for training DNNs discards all the learned information in the past epochs except the current learned weights. An interesting question is: is this discarded information indeed useless? We argue that the discarded information can benefit the subsequent training. In this paper, we propose learning with retrospection (LWR) which makes use of the learned information in the past epochs to guide the subsequent training. LWR is a simple yet effective training framework to improve accuracies, calibration, and robustness of DNNs without introducing any additional network parameters or inference cost, only with a negligible training overhead. Extensive experiments on several benchmark datasets demonstrate the superiority of LWR for training DNNs.
- Comment
Accepted to AAAI2021
- Published
- 2020
20. CoCosNet v2: Full-Resolution Correspondence Learning for Image Translation
- Author
Zhou, Xingran, Zhang, Bo, Zhang, Ting, Zhang, Pan, Bao, Jianmin, Chen, Dong, Zhang, Zhongfei, and Wen, Fang
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
We present the full-resolution correspondence learning for cross-domain images, which aids image translation. We adopt a hierarchical strategy that uses the correspondence from coarse level to guide the fine levels. At each hierarchy, the correspondence can be efficiently computed via PatchMatch that iteratively leverages the matchings from the neighborhood. Within each PatchMatch iteration, the ConvGRU module is employed to refine the current correspondence considering not only the matchings of larger context but also the historic estimates. The proposed CoCosNet v2, a GRU-assisted PatchMatch approach, is fully differentiable and highly efficient. When jointly trained with image translation, full-resolution semantic correspondence can be established in an unsupervised manner, which in turn facilitates the exemplar-based image translation. Experiments on diverse translation tasks show that CoCosNet v2 performs considerably better than state-of-the-art literature on producing high-resolution images.
- Comment
CVPR 2021 oral presentation
- Published
- 2020
21. Deep Metric Learning with Spherical Embedding
- Author
Zhang, Dingyi, Li, Yingming, and Zhang, Zhongfei
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Deep metric learning has attracted much attention in recent years, due to seamlessly combining the distance metric learning and deep neural network. Many endeavors are devoted to design different pair-based angular loss functions, which decouple the magnitude and direction information for embedding vectors and ensure the training and testing measure consistency. However, these traditional angular losses cannot guarantee that all the sample embeddings are on the surface of the same hypersphere during the training stage, which would result in unstable gradient in batch optimization and may influence the quick convergence of the embedding learning. In this paper, we first investigate the effect of the embedding norm for deep metric learning with angular distance, and then propose a spherical embedding constraint (SEC) to regularize the distribution of the norms. SEC adaptively adjusts the embeddings to fall on the same hypersphere and performs more balanced direction update. Extensive experiments on deep metric learning, face recognition, and contrastive self-supervised learning show that the SEC-based angular space learning strategy significantly improves the performance of the state-of-the-art.
- Comment
To appear in NeurIPS 2020. Code is available at https://github.com/Dyfine/SphericalEmbedding
- Published
- 2020
22. Sparsity-Control Ternary Weight Networks
- Author
Deng, Xiang and Zhang, Zhongfei
- Subjects
Computer Science - Machine Learning
- Abstract
Deep neural networks (DNNs) have been widely and successfully applied to various applications, but they require large amounts of memory and computational power. This severely restricts their deployment on resource-limited devices. To address this issue, many efforts have been made on training low-bit weight DNNs. In this paper, we focus on training ternary weight \{-1, 0, +1\} networks which can avoid multiplications and dramatically reduce the memory and computation requirements. A ternary weight network can be considered as a sparser version of the binary weight counterpart by replacing some -1s or 1s in the binary weights with 0s, thus leading to more efficient inference but more memory cost. However, the existing approaches to training ternary weight networks cannot control the sparsity (i.e., percentage of 0s) of the ternary weights, which undermines the advantage of ternary weights. In this paper, we propose to our best knowledge the first sparsity-control approach (SCA) to training ternary weight networks, which is simply achieved by a weight discretization regularizer (WDR). SCA is different from all the existing regularizer-based approaches in that it can control the sparsity of the ternary weights through a controller $\alpha$ and does not rely on gradient estimators. We theoretically and empirically show that the sparsity of the trained ternary weights is positively related to $\alpha$. SCA is extremely simple, easy-to-implement, and is shown to consistently outperform the state-of-the-art approaches significantly over several benchmark datasets and even matches the performances of the full-precision weight counterparts.
- Comment
version 1 of SCA; accepted by journal "Neural Networks"; the final version could be a little different from this version
- Published
- 2020
23. SBAT: Video Captioning with Sparse Boundary-Aware Transformer
- Author
Jin, Tao, Huang, Siyu, Chen, Ming, Li, Yingming, and Zhang, Zhongfei
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Multimedia
- Abstract
In this paper, we focus on the problem of applying the transformer structure to video captioning effectively. The vanilla transformer is proposed for uni-modal language generation task such as machine translation. However, video captioning is a multimodal learning problem, and the video features have much redundancy between different time steps. Based on these concerns, we propose a novel method called sparse boundary-aware transformer (SBAT) to reduce the redundancy in video representation. SBAT employs boundary-aware pooling operation for scores from multihead attention and selects diverse features from different scenarios. Also, SBAT includes a local correlation scheme to compensate for the local information loss brought by sparse operation. Based on SBAT, we further propose an aligned cross-modal encoding scheme to boost the multimodal interaction. Experimental results on two benchmark datasets show that SBAT outperforms the state-of-the-art methods under most of the metrics.
- Comment
Appearing at IJCAI 2020
- Published
- 2020
24. Multitask Non-Autoregressive Model for Human Motion Prediction
- Author
Li, Bin, Tian, Jian, Zhang, Zhongfei, Feng, Hailin, and Li, Xi
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Human motion prediction, which aims at predicting future human skeletons given the past ones, is a typical sequence-to-sequence problem. Therefore, extensive efforts have been continued on exploring different RNN-based encoder-decoder architectures. However, by generating target poses conditioned on the previously generated ones, these models are prone to bringing issues such as error accumulation problem. In this paper, we argue that such issue is mainly caused by adopting autoregressive manner. Hence, a novel Non-auToregressive Model (NAT) is proposed with a complete non-autoregressive decoding scheme, as well as a context encoder and a positional encoding module. More specifically, the context encoder embeds the given poses from temporal and spatial perspectives. The frame decoder is responsible for predicting each future pose independently. The positional encoding module injects positional signal into the model to indicate temporal order. Moreover, a multitask training paradigm is presented for both low-level human skeleton prediction and high-level human action recognition, resulting in the convincing improvement for the prediction task. Our approach is evaluated on Human3.6M and CMU-Mocap benchmarks and outperforms state-of-the-art autoregressive methods.
- Published
- 2020
25. Is the Meta-Learning Idea Able to Improve the Generalization of Deep Neural Networks on the Standard Supervised Learning?
- Author
Deng, Xiang and Zhang, Zhongfei
- Subjects
Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
- Abstract
Substantial efforts have been made on improving the generalization abilities of deep neural networks (DNNs) in order to obtain better performances without introducing more parameters. On the other hand, meta-learning approaches exhibit powerful generalization on new tasks in few-shot learning. Intuitively, few-shot learning is more challenging than the standard supervised learning as each target class only has a very few or no training samples. The natural question that arises is whether the meta-learning idea can be used for improving the generalization of DNNs on the standard supervised learning. In this paper, we propose a novel meta-learning based training procedure (MLTP) for DNNs and demonstrate that the meta-learning idea can indeed improve the generalization abilities of DNNs. MLTP simulates the meta-training process by considering a batch of training samples as a task. The key idea is that the gradient descent step for improving the current task performance should also improve a new task performance, which is ignored by the current standard procedure for training neural networks. MLTP also benefits from all the existing training techniques such as dropout, weight decay, and batch normalization. We evaluate MLTP by training a variety of small and large neural networks on three benchmark datasets, i.e., CIFAR-10, CIFAR-100, and Tiny ImageNet. The experimental results show a consistently improved generalization performance on all the DNNs with different sizes, which verifies the promise of MLTP and demonstrates that the meta-learning idea is indeed able to improve the generalization of DNNs on the standard supervised learning.
- Published
- 2020
26. MCPIP-1 knockdown enhances endothelial colony-forming cell angiogenesis via the TFRC/AKT/mTOR signaling pathway in the ischemic penumbra of MCAO mice
- Author
Zou, Xiaoxiong, Xie, Yu, Zhang, Zhongfei, Feng, Zhiming, Han, Jianbang, Ouyang, Qian, Hua, Shiting, Huang, Sixian, Li, Cong, Liu, Zhizheng, Cai, Yingqian, Zou, Yuxi, Tang, Yanping, Chen, Haijia, and Jiang, Xiaodan
- Published
- 2023
27. Low-Rank HOCA: Efficient High-Order Cross-Modal Attention for Video Captioning
- Author
Jin, Tao, Huang, Siyu, Li, Yingming, and Zhang, Zhongfei
- Subjects
Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
- Abstract
This paper addresses the challenging task of video captioning which aims to generate descriptions for video data. Recently, the attention-based encoder-decoder structures have been widely used in video captioning. In existing literature, the attention weights are often built from the information of an individual modality, while, the association relationships between multiple modalities are neglected. Motivated by this observation, we propose a video captioning model with High-Order Cross-Modal Attention (HOCA) where the attention weights are calculated based on the high-order correlation tensor to capture the frame-level cross-modal interaction of different modalities sufficiently. Furthermore, we novelly introduce Low-Rank HOCA which adopts tensor decomposition to reduce the extremely large space requirement of HOCA, leading to a practical and efficient implementation in real-world applications. Experimental results on two benchmark datasets, MSVD and MSR-VTT, show that Low-rank HOCA establishes a new state-of-the-art.
- Comment
Accepted as a long paper at EMNLP 2019
- Published
- 2019
28. Episode-based Prototype Generating Network for Zero-Shot Learning
- Author
Yu, Yunlong, Ji, Zhong, Zhang, Zhongfei, and Han, Jungong
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
We introduce a simple yet effective episode-based training framework for zero-shot learning (ZSL), where the learning system requires to recognize unseen classes given only the corresponding class semantics. During training, the model is trained within a collection of episodes, each of which is designed to simulate a zero-shot classification task. Through training multiple episodes, the model progressively accumulates ensemble experiences on predicting the mimetic unseen classes, which will generalize well on the real unseen classes. Based on this training framework, we propose a novel generative model that synthesizes visual prototypes conditioned on the class semantic prototypes. The proposed model aligns the visual-semantic interactions by formulating both the visual prototype generation and the class semantic inference into an adversarial framework paired with a parameter-economic Multi-modal Cross-Entropy Loss to capture the discriminative information. Extensive experiments on four datasets under both traditional ZSL and generalized ZSL tasks show that our model outperforms the state-of-the-art approaches by large margins.
- Published
- 2019
29. A Semantics-Guided Class Imbalance Learning Model for Zero-Shot Classification
- Author
Ji, Zhong, Yu, Xuejie, Yu, Yunlong, Pang, Yanwei, and Zhang, Zhongfei
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Zero-Shot Classification (ZSC) equips the learned model with the ability to recognize the visual instances from the novel classes via constructing the interactions between the visual and the semantic modalities. In contrast to the traditional image classification, ZSC is easily suffered from the class-imbalance issue since it is more concerned with the class-level knowledge transfer capability. In the real world, the class samples follow a long-tailed distribution, and the discriminative information in the sample-scarce seen classes is hard to be transferred to the related unseen classes in the traditional batch-based training manner, which degrades the overall generalization ability a lot. Towards alleviating the class imbalance issue in ZSC, we propose a sample-balanced training process to encourage all training classes to contribute equally to the learned model. Specifically, we randomly select the same number of images from each class across all training classes to form a training batch to ensure that the sample-scarce classes contribute equally as those classes with sufficient samples during each iteration. Considering that the instances from the same class differ in class representativeness, we further develop an efficient semantics-guided feature fusion model to obtain discriminative class visual prototype for the following visual-semantic interaction process via distributing different weights to the selected samples based on their class representativeness. Extensive experiments on three imbalanced ZSC benchmark datasets for both the Traditional ZSC (TZSC) and the Generalized ZSC (GZSC) tasks demonstrate our approach achieves promising results especially for the unseen categories those are closely related to the sample-scarce seen categories.
- Published
- 2019
30. Text Guided Person Image Synthesis
- Author
Zhou, Xingran, Huang, Siyu, Li, Bin, Li, Yingming, Li, Jiachen, and Zhang, Zhongfei
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
This paper presents a novel method to manipulate the visual appearance (pose and attribute) of a person image according to natural language descriptions. Our method can be boiled down to two stages: 1) text guided pose generation and 2) visual appearance transferred image synthesis. In the first stage, our method infers a reasonable target human pose based on the text. In the second stage, our method synthesizes a realistic and appearance transferred person image according to the text in conjunction with the target pose. Our method extracts sufficient information from the text and establishes a mapping between the image space and the language space, making generating and editing images corresponding to the description possible. We conduct extensive experiments to reveal the effectiveness of our method, as well as using the VQA Perceptual Score as a metric for evaluating the method. It shows for the first time that we can automatically edit the person image from the natural language descriptions.
- Comment
To appear at CVPR 2019
- Published
- 2019
31. Training a Lightweight ViT Network for Image Retrieval
- Author
-
Zhang, Hanqi, Yu, Yunlong, Li, Yingming, and Zhang, Zhongfei; series editors: Goos, Gerhard (Founding Editor), Hartmanis, Juris (Founding Editor), Bertino, Elisa (Editorial Board Member), Gao, Wen (Editorial Board Member), Steffen, Bernhard (Editorial Board Member), and Yung, Moti (Editorial Board Member); volume editors: Khanna, Sankalp, Cao, Jian, Bai, Quan, and Xu, Guandong
- Published
- 2022
32. Personalized Education: Blind Knowledge Distillation
- Author
-
Deng, Xiang, Zheng, Jian, and Zhang, Zhongfei; series editors: Goos, Gerhard (Founding Editor), Hartmanis, Juris (Founding Editor), Bertino, Elisa (Editorial Board Member), Gao, Wen (Editorial Board Member), Steffen, Bernhard (Editorial Board Member), and Yung, Moti (Editorial Board Member); volume editors: Avidan, Shai, Brostow, Gabriel, Cissé, Moustapha, Farinella, Giovanni Maria, and Hassner, Tal
- Published
- 2022
33. Cross-relation Cross-bag Attention for Distantly-supervised Relation Extraction
- Author
-
Yuan, Yujin, Liu, Liyuan, Tang, Siliang, Zhang, Zhongfei, Zhuang, Yueting, Pu, Shiliang, Wu, Fei, and Ren, Xiang
- Subjects
Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
Distant supervision leverages knowledge bases to automatically label instances, thus allowing us to train a relation extractor without human annotations. However, the generated training data typically contain massive noise and may result in poor performance with vanilla supervised learning. In this paper, we propose to conduct multi-instance learning with a novel Cross-relation Cross-bag Selective Attention (C$^2$SA), which leads to noise-robust training for distantly supervised relation extractors. Specifically, we employ sentence-level selective attention to reduce the effect of noisy or mismatched sentences, while the correlation among relations is captured to improve the quality of the attention weights. Moreover, instead of treating all entity pairs equally, we pay more attention to entity pairs of higher quality, and we adopt the same selective attention mechanism to achieve this goal. Experiments with two types of relation extractors demonstrate the superiority of the proposed approach over the state of the art, while further ablation studies verify our intuitions and demonstrate the effectiveness of the two proposed techniques. Comment: AAAI 2019
- Published
- 2018
34. Perceiving Physical Equation by Observing Visual Scenarios
- Author
-
Huang, Siyu, Cheng, Zhi-Qi, Li, Xi, Wu, Xiao, Zhang, Zhongfei, and Hauptmann, Alexander
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Inferring universal laws of the environment is an important ability of human intelligence as well as a symbol of general AI. In this paper, we take a step toward this goal by introducing a new and challenging problem: inferring invariant physical equations from visual scenarios. An example is teaching a machine to automatically derive the gravitational acceleration formula by watching a free-falling object. To tackle this challenge, we present a novel pipeline comprising an Observer Engine and a Physicist Engine, which respectively imitate the actions of an observer and a physicist in the real world. Generally, the Observer Engine watches the visual scenarios and then extracts the physical properties of objects. The Physicist Engine analyses these data and then summarizes the inherent laws of object dynamics. Specifically, the learned laws are expressed as mathematical equations, so that they are more interpretable than the results given by common probabilistic models. Experiments on synthetic videos show that our pipeline is able to discover physical equations in various physical worlds with different visual appearances. Comment: NIPS 2018 Workshop on Modeling the Physical World
- Published
- 2018
35. Bi-Adversarial Auto-Encoder for Zero-Shot Learning
- Author
-
Yu, Yunlong, Ji, Zhong, Pang, Yanwei, Guo, Jichang, Zhang, Zhongfei, and Wu, Fei
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Existing generative Zero-Shot Learning (ZSL) methods only consider the unidirectional alignment from the class semantics to the visual features while ignoring the alignment from the visual features to the class semantics, and thus fail to construct the visual-semantic interactions well. In this paper, we propose to synthesize visual features based on an auto-encoder framework paired with bi-adversarial networks, one for the visual modality and one for the semantic modality, to reinforce the visual-semantic interactions with a bi-directional alignment, which ensures that the synthesized visual features fit the real visual distribution and are highly related to the semantics. The encoder aims at synthesizing real-like visual features while the decoder forces both the real and the synthesized visual features to be more related to the class semantics. To further capture the discriminative information of the synthesized visual features, both the real and synthesized visual features are forced to be classified into the correct classes via a classification network. Experimental results on four benchmark datasets show that the proposed approach is particularly competitive on both the traditional ZSL and the generalized ZSL tasks.
- Published
- 2018
36. Stacked Pooling: Improving Crowd Counting by Boosting Scale Invariance
- Author
-
Huang, Siyu, Li, Xi, Cheng, Zhi-Qi, Zhang, Zhongfei, and Hauptmann, Alexander
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In this work, we explore the cross-scale similarity in the crowd counting scenario, in which regions of different scales often exhibit high visual similarity. This feature is universal both within an image and across different images, indicating the importance of scale invariance for a crowd counting model. Motivated by this, we propose simple but effective variants of the pooling module, i.e., multi-kernel pooling and stacked pooling, to boost the scale invariance of convolutional neural networks (CNNs), which greatly benefits crowd density estimation and counting. Specifically, the multi-kernel pooling comprises pooling kernels with multiple receptive fields to capture the responses at multi-scale local ranges. The stacked pooling is an equivalent form of multi-kernel pooling, but it considerably reduces the computing cost. Our proposed pooling modules do not introduce extra parameters into the model and can easily take the place of the vanilla pooling layer in implementation. In an empirical study on two benchmark crowd counting datasets, the stacked pooling beats the vanilla pooling layer in most cases. Comment: The code is available at http://github.com/siyuhuang/crowdcount-stackpool
- Published
- 2018
37. Stacked Semantic-Guided Attention Model for Fine-Grained Zero-Shot Learning
- Author
-
Yu, Yunlong, Ji, Zhong, Fu, Yanwei, Guo, Jichang, Pang, Yanwei, and Zhang, Zhongfei
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Zero-Shot Learning (ZSL) is achieved by aligning the semantic relationships between the global image feature vector and the corresponding class semantic descriptions. However, using global features to represent fine-grained images may lead to sub-optimal results, since they neglect the discriminative differences of local regions. Besides, different regions contain distinct discriminative information, and the important regions should contribute more to the prediction. To this end, we propose a novel stacked semantics-guided attention (S2GA) model that obtains semantically relevant features by using individual class semantic features to progressively guide the visual features to generate an attention map for weighting the importance of different local regions. By feeding both the integrated visual features and the class semantic features into a multi-class classification architecture, the proposed framework can be trained end-to-end. Extensive experimental results on the CUB and NABird datasets show that the proposed approach achieves consistent improvement on both fine-grained zero-shot classification and retrieval tasks.
- Published
- 2018
38. GNAS: A Greedy Neural Architecture Search Method for Multi-Attribute Learning
- Author
-
Huang, Siyu, Li, Xi, Cheng, Zhi-Qi, Zhang, Zhongfei, and Hauptmann, Alexander
- Subjects
Computer Science - Neural and Evolutionary Computing ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
A key problem in deep multi-attribute learning is to effectively discover the inter-attribute correlation structures. Typically, conventional deep multi-attribute learning approaches follow a pipeline of manually designing the network architectures based on task-specific expert prior knowledge and careful network tuning, which leads to inflexibility in the various complicated scenarios met in practice. To address this problem, we propose an efficient greedy neural architecture search approach (GNAS) to automatically discover the optimal tree-like deep architecture for multi-attribute learning. In a greedy manner, GNAS divides the optimization of the global architecture into step-by-step optimizations of individual connections. By iteratively updating the local architectures, the global tree-like architecture converges to one in which the bottom layers are shared across relevant attributes and the branches in the top layers encode more attribute-specific features. Experiments on three benchmark multi-attribute datasets show the effectiveness and compactness of the neural architectures derived by GNAS, and also demonstrate the efficiency of GNAS in searching neural architectures. Comment: ACM MM 2018 (Oral)
- Published
- 2018
39. Multi-Channel Pyramid Person Matching Network for Person Re-Identification
- Author
-
Mao, Chaojie, Li, Yingming, Zhang, Yaqing, Zhang, Zhongfei, and Li, Xi
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In this work, we present a Multi-Channel deep convolutional Pyramid Person Matching Network (MC-PPMN) based on the combination of the semantic-components and the color-texture distributions to address the problem of person re-identification. In particular, we learn separate deep representations for semantic-components and color-texture distributions from two person images and then employ a pyramid person matching network (PPMN) to obtain correspondence representations. These correspondence representations are fused to perform the re-identification task. Further, the proposed framework is optimized via a unified end-to-end deep learning scheme. Extensive experiments on several benchmark datasets demonstrate the effectiveness of our approach against the state-of-the-art literature, especially on the rank-1 recognition rate. Comment: 9 pages, 5 figures, 7 tables; accepted by the 32nd AAAI Conference on Artificial Intelligence
- Published
- 2018
40. Pyramid Person Matching Network for Person Re-identification
- Author
-
Mao, Chaojie, Li, Yingming, Zhang, Zhongfei, Zhang, Yaqing, and Li, Xi
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In this work, we present a deep convolutional pyramid person matching network (PPMN) with a specially designed Pyramid Matching Module to address the problem of person re-identification. The architecture takes a pair of RGB images as input and outputs a similarity value indicating whether the two input images represent the same person or not. Based on deep convolutional neural networks, our approach first learns the discriminative semantic representation with the semantic-component-aware features for persons and then employs the Pyramid Matching Module to match the common semantic-components of persons, which is robust to the variation of spatial scales and the misalignment of locations caused by viewpoint changes. The above two processes are jointly optimized via a unified end-to-end deep learning scheme. Extensive experiments on several benchmark datasets demonstrate the effectiveness of our approach against the state-of-the-art approaches, especially on the rank-1 recognition rate. Comment: 11 pages, 3 figures, 4 tables; accepted by Proceedings of 9th Asian Conference on Machine Learning (ACML2017), JMLR Workshop and Conference Proceedings, vol. 77, 2017
- Published
- 2018
41. Hierarchical Correlations Replay for Continual Learning
- Author
-
Wang, Qiang, Liu, Jiayi, Ji, Zhong, Pang, Yanwei, and Zhang, Zhongfei
- Published
- 2022
42. Local spatial alignment network for few-shot learning
- Author
-
Yu, Yunlong, Zhang, Dingyi, Wang, Sidi, Ji, Zhong, and Zhang, Zhongfei
- Published
- 2022
43. Indole-3-propionic acid alleviates ischemic brain injury in a mouse middle cerebral artery occlusion model
- Author
-
Xie, Yu, Zou, Xiaoxiong, Han, Jianbang, Zhang, Zhongfei, Feng, Zhiming, Ouyang, Qian, Hua, Shiting, Liu, Zhizheng, Li, Cong, Cai, Yingqian, Zou, Yuxi, Tang, Yanping, and Jiang, Xiaodan
- Published
- 2022
44. JSENet: A deep convolutional neural network for joint image super-resolution and enhancement
- Author
-
Lyu, Kejie, Pan, Sicheng, Li, Yingming, and Zhang, Zhongfei
- Published
- 2022
45. A Deep Learning Approach for Expert Identification in Question Answering Communities
- Author
-
Zheng, Chen, Zhai, Shuangfei, and Zhang, Zhongfei
- Subjects
Computer Science - Computation and Language - Abstract
In this paper, we describe an effective convolutional neural network framework for identifying experts in question answering communities. This approach uses a convolutional neural network and combines user feature representations with question feature representations to compute scores; the user with the highest score is taken to be the expert on the question. Unlike prior work, this method does not rely on measuring answer content quality to identify the expert; it requires only the question sentence and the user embedding features. Remarkably, our model can be applied to different languages and different domains. The proposed framework is trained on two datasets: the first is Stack Overflow and the second is Zhihu. The Top-1 accuracy results of our experiments show that our framework outperforms the best baseline framework for expert identification. Comment: 7 pages. arXiv admin note: text overlap with arXiv:1403.6652 by other authors
- Published
- 2017
46. Text Coherence Analysis Based on Deep Neural Network
- Author
-
Cui, Baiyun, Li, Yingming, Zhang, Yaqing, and Zhang, Zhongfei
- Subjects
Computer Science - Computation and Language - Abstract
In this paper, we propose a novel deep coherence model (DCM) that uses a convolutional neural network architecture to capture text coherence. The text coherence problem is investigated from a new perspective of learning sentence distributional representations and modeling text coherence simultaneously. In particular, the model captures the interactions between sentences by computing the similarities of their distributional representations. Further, it can be easily trained in an end-to-end fashion. The proposed model is evaluated on a standard Sentence Ordering task. The experimental results demonstrate its effectiveness and promise in coherence assessment, improving on the state of the art by a wide margin. Comment: 4 pages, 2 figures, CIKM 2017
- Published
- 2017
47. Transductive Zero-Shot Learning with a Self-training dictionary approach
- Author
-
Yu, Yunlong, Ji, Zhong, Li, Xi, Guo, Jichang, Zhang, Zhongfei, Ling, Haibin, and Wu, Fei
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
As an important and challenging problem in computer vision, zero-shot learning (ZSL) aims at automatically recognizing instances from unseen object classes without training data. To address this problem, ZSL is usually carried out in the following two aspects: 1) capturing the domain distribution connections between seen-class data and unseen-class data; and 2) modeling the semantic interactions between the image feature space and the label embedding space. Motivated by these observations, we propose a bidirectional-mapping-based semantic relationship modeling scheme that seeks cross-modal knowledge transfer by simultaneously projecting the image features and label embeddings into a common latent space. Namely, we have a bidirectional connection relationship that runs from the image feature space to the latent space as well as from the label embedding space to the latent space. To deal with the domain shift problem, we further present a transductive learning approach that formulates the class prediction problem as an iterative refining process, in which the object classification capacity is progressively reinforced through bootstrapping-based model updating over highly reliable instances. Experimental results on three benchmark datasets (AwA, CUB and SUN) demonstrate the effectiveness of the proposed approach against the state-of-the-art approaches.
- Published
- 2017
48. Sparsity-control ternary weight networks
- Author
-
Deng, Xiang and Zhang, Zhongfei
- Published
- 2022
49. Mesenchymal stem cells protect against TBI-induced pyroptosis in vivo and in vitro through TSG-6
- Author
-
Feng, Zhiming, Hua, Shiting, Li, Wangan, Han, Jianbang, Li, Feng, Chen, Haijia, Zhang, Zhongfei, Xie, Yu, Ouyang, Qian, Zou, Xiaoxiong, Liu, Zhizheng, Li, Cong, Huang, Sixian, Lai, Zelin, Cai, Xiaolin, Cai, Yingqian, Zou, Yuxi, Tang, Yanping, and Jiang, Xiaodan
- Published
- 2022
50. A new knowledge-guided multi-objective optimisation for the multi-AGV dispatching problem in dynamic production environments.
- Author
-
Liu, Lei, Qu, Ting, Thürer, Matthias, Ma, Lin, Zhang, Zhongfei, and Yuan, Mingze
- Subjects
PARTICLE swarm optimization ,DISTRIBUTION (Probability theory) ,EVOLUTIONARY algorithms ,AUTOMATED guided vehicle systems ,SATISFACTION ,CONSTRAINT satisfaction - Abstract
The efficiency of material supply for workstations using Automatic Guided Vehicles (AGVs) is largely determined by the performance of the AGV dispatching scheme. This paper proposes a new solution approach for the AGV dispatching problem (AGVDP) for material replenishment in a general manufacturing workshop where workstations are in a matrix layout, and where uncertainty in replenishment time of workstations and stochastic unloading efficiencies of AGVs are dynamic contextual factors. We first extend the literature by proposing a mixed integer optimisation model with a delivery satisfaction soft constraint of material orders and two objectives: transportation costs and delivery time deviation. We then develop a new knowledge-guided estimation of distribution algorithm with delivery satisfaction evaluation for solving the model. Our algorithm fuses three knowledge-guided strategies to enhance optimisation capabilities at their respective execution stages. Comprehensive numerical experiments with instances built from a real-world scenario validate the proposed model and algorithm. Results demonstrate that the new algorithm outperforms three popular multi-objective evolutionary algorithms, a discrete version of a recent multi-objective particle swarm optimisation, and a multi-objective estimation of distribution algorithm. The findings of this work have major implications for workshop management and algorithm design.
- Published
- 2023