1,393 results for "Peng, Yuxin"
Search Results
52. GmCOL4-GmZTL1 interaction co-regulates GmSBH1 to improve seed deterioration under high temperature and humidity stress and affect leaf growth and development
- Author
Mu, Kebin, Shu, Yingjie, Chen, Ming, Chen, Keke, Peng, Yuxin, Hu, Huimin, Shen, Yingzi, Zhang, Xi, Zhuang, Lifang, and Ma, Hao
- Published
- 2024
- Full Text
- View/download PDF
53. Gender-Based Double Standards and Inequalities in Online Culture: Objectification of Feminine Sexuality, Self-Expression and Appearance, and Cybershaming
- Author
Peng, Yuxin
- Published
- 2024
- Full Text
- View/download PDF
54. Multi-Objective Optimal Operation Decision for Parallel Reservoirs Based on NSGA-II-TOPSIS-GCA Algorithm: A Case Study in the Upper Reach of Hanjiang River
- Author
Wei, Na, Peng, Yuxin, Lu, Kunming, Zhou, Guixing, Guo, Xingtao, and Niu, Minghui
- Published
- 2024
- Full Text
- View/download PDF
55. SlCRCa is a key D‐class gene controlling ovule fate determination in tomato
- Author
Wu, Junqing, Li, Pengxue, Zhu, Danyang, Ma, Haochuan, Li, Meng, Lai, Yixuan, Peng, Yuxin, Li, Haixiao, Li, Shuang, Wei, Jinbo, Bian, Xinxin, Rahman, Abidur, and Wu, Shuang
- Published
- 2024
- Full Text
- View/download PDF
56. Front Cover: Covalent Tethering of Cobalt Porphyrins on Phenolic Resins for Electrocatalytic Oxygen Reduction and Evolution Reactions (ChemPhysChem 7/2024)
- Author
Kong, Jiafan, Qin, Haonan, Yang, Luna, Zhang, Jieling, Peng, Yuxin, Gao, Yimei, Wu, Yizhen, Nam, Wonwoo, and Cao, Rui
- Published
- 2024
- Full Text
- View/download PDF
57. Negatives Make a Positive: An Embarrassingly Simple Approach to Semi-Supervised Few-Shot Learning
- Author
Wei, Xiu-Shen, Xu, He-Yang, Yang, Zhiwen, Duan, Chen-Long, and Peng, Yuxin
- Published
- 2024
- Full Text
- View/download PDF
58. Semantic association enhancement transformer with relative position for image captioning
- Author
Jia, Xin, Wang, Yunbo, Peng, Yuxin, and Chen, Shengyong
- Published
- 2022
- Full Text
- View/download PDF
59. Text-to-image Synthesis via Symmetrical Distillation Networks
- Author
Yuan, Mingkuan and Peng, Yuxin
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Text-to-image synthesis aims to automatically generate images according to text descriptions given by users, which is a highly challenging task. The main issues of text-to-image synthesis lie in two gaps: the heterogeneous and homogeneous gaps. The heterogeneous gap is between the high-level concepts of text descriptions and the pixel-level contents of images, while the homogeneous gap exists between synthetic image distributions and real image distributions. To address these problems, we exploit the excellent capability of generic discriminative models (e.g., VGG19), which can guide the training process of a new generative model on multiple levels to bridge the two gaps. The high-level representations can teach the generative model to extract necessary visual information from text descriptions, which bridges the heterogeneous gap. The mid-level and low-level representations can lead it to learn the structures and details of images respectively, which relieves the homogeneous gap. Therefore, we propose Symmetrical Distillation Networks (SDN) composed of a source discriminative model as "teacher" and a target generative model as "student". The target generative model has a symmetrical structure with the source discriminative model, in order to transfer hierarchical knowledge accessibly. Moreover, we decompose the training process into two stages with different distillation paradigms to promote the performance of the target generative model. Experiments on two widely-used datasets verify the effectiveness of the proposed SDN. (A toy sketch of the distillation objective follows this entry.)
- Comment: 9 pages, accepted as an oral paper of ACM Multimedia 2018
- Published
- 2018
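Below is a minimal, hypothetical sketch of the multi-level distillation objective described in entry 59's abstract: a frozen discriminative "teacher" guides a generative "student" by feature matching at several levels. All names and shapes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of multi-level distillation (hypothetical shapes/names,
# not the authors' code). A frozen discriminative "teacher" (e.g. VGG19)
# guides a generator at low, mid and high feature levels.
import torch
import torch.nn.functional as F

def distillation_loss(teacher_feats, student_feats, weights=(1.0, 1.0, 1.0)):
    """Sum of feature-matching losses over low/mid/high levels."""
    loss = 0.0
    for w, t, s in zip(weights, teacher_feats, student_feats):
        loss = loss + w * F.mse_loss(s, t.detach())  # teacher stays frozen
    return loss

# toy usage: three feature levels with matching shapes
t_feats = [torch.randn(4, 64, 56, 56), torch.randn(4, 256, 14, 14), torch.randn(4, 512)]
s_feats = [f + 0.1 * torch.randn_like(f) for f in t_feats]
print(distillation_loss(t_feats, s_feats).item())
```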
60. Better and Faster: Knowledge Transfer from Multiple Self-supervised Learning Tasks via Graph Distillation for Video Classification
- Author
Zhang, Chenrui and Peng, Yuxin
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Video representation learning is a vital problem for the classification task. Recently, a promising unsupervised paradigm termed self-supervised learning has emerged, which explores inherent supervisory signals implied in massive data for feature learning by solving auxiliary tasks. However, existing methods in this regard suffer from two limitations when extended to video classification. First, they focus only on a single task, ignoring the complementarity among different task-specific features and thus producing suboptimal video representations. Second, high computational and memory cost hinders their application in real-world scenarios. In this paper, we propose a graph-based distillation framework to address these problems: (1) We propose a logits graph and a representation graph to transfer knowledge from multiple self-supervised tasks, where the former distills classifier-level knowledge by solving a multi-distribution joint matching problem, and the latter distills internal feature knowledge from pairwise ensembled representations while tackling the challenge of heterogeneity among different features; (2) The proposal adopts a teacher-student framework that dramatically reduces the redundancy of knowledge learned from teachers, leading to a lighter student model that solves the classification task more efficiently. Experimental results on 3 video datasets validate that our proposal not only helps learn better video representations but also compresses the model for faster inference. (A toy multi-teacher distillation sketch follows this entry.)
- Comment: 7 pages, accepted by International Joint Conference on Artificial Intelligence (IJCAI) 2018
- Published
- 2018
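A minimal sketch of the logits-level idea in entry 60's abstract: distilling several self-supervised "teachers" into one student. The paper's logits-graph matching is more involved than this simple averaged-KL variant; all names and shapes are assumptions for illustration.

```python
# Sketch of distilling multiple "teacher" logit distributions into a student.
import torch
import torch.nn.functional as F

def multi_teacher_kd(student_logits, teacher_logits_list, T=4.0):
    p_s = F.log_softmax(student_logits / T, dim=1)
    loss = 0.0
    for t_logits in teacher_logits_list:
        p_t = F.softmax(t_logits.detach() / T, dim=1)  # teachers are frozen
        loss = loss + F.kl_div(p_s, p_t, reduction="batchmean") * (T * T)
    return loss / len(teacher_logits_list)

student = torch.randn(8, 101)                      # e.g. 101 video classes
teachers = [torch.randn(8, 101) for _ in range(3)] # three auxiliary tasks
print(multi_teacher_kd(student, teachers).item())
```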
61. Visual Data Synthesis via GAN for Zero-Shot Video Classification
- Author
Zhang, Chenrui and Peng, Yuxin
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Zero-Shot Learning (ZSL) in video classification is a promising research direction, which aims to tackle the challenge posed by the explosive growth of video categories. Most existing methods exploit seen-to-unseen correlation by learning a projection between visual and semantic spaces. However, such projection-based paradigms cannot fully utilize the discriminative information implied in the data distribution, and commonly suffer from the information degradation issue caused by the "heterogeneity gap". In this paper, we propose a visual data synthesis framework via GAN to address these problems. Specifically, both semantic knowledge and visual distribution are leveraged to synthesize video features of unseen categories, and ZSL can then be turned into a typical supervised problem using the synthetic features. First, we propose multi-level semantic inference to boost video feature synthesis, which captures the discriminative information implied in the joint visual-semantic distribution via feature-level and label-level semantic inference. Second, we propose Matching-aware Mutual Information Correlation to overcome the information degradation issue, which captures seen-to-unseen correlation in matched and mismatched visual-semantic pairs by mutual information, providing the zero-shot synthesis procedure with robust guidance signals. Experimental results on four video datasets demonstrate that our approach can improve zero-shot video classification performance significantly. (A toy sketch of the final supervised step follows this entry.)
- Comment: 7 pages, accepted by International Joint Conference on Artificial Intelligence (IJCAI) 2018
- Published
- 2018
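A toy sketch of the final step described in entry 61's abstract: once features for unseen categories are synthesized, zero-shot classification becomes ordinary supervised training. Random tensors stand in for GAN-synthesized features; all names are illustrative.

```python
# Sketch of "turn ZSL into supervised learning": train a plain classifier
# on synthesized features of unseen classes (random stand-in data).
import torch
import torch.nn as nn

feat_dim, n_unseen, n_per_class = 512, 10, 100
synth_feats = torch.randn(n_unseen * n_per_class, feat_dim)       # fake GAN output
synth_labels = torch.arange(n_unseen).repeat_interleave(n_per_class)

clf = nn.Linear(feat_dim, n_unseen)
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(50):                     # ordinary supervised loop
    opt.zero_grad()
    loss = loss_fn(clf(synth_feats), synth_labels)
    loss.backward()
    opt.step()
print("final loss:", loss.item())
```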
62. Cross-media Multi-level Alignment with Relation Attention Network
- Author
Qi, Jinwei, Peng, Yuxin, and Yuan, Yuxin
- Subjects
Computer Science - Multimedia, Computer Science - Information Retrieval
- Abstract
With the rapid growth of multimedia data such as image and text, it is a highly challenging problem to effectively correlate and retrieve data of different media types. Naturally, when correlating an image with a textual description, people focus not only on the alignment between discriminative image regions and key words, but also on the relations lying in the visual and textual context. Relation understanding is essential for cross-media correlation learning, yet it is ignored by prior cross-media retrieval works. To address this issue, we propose the Cross-media Relation Attention Network (CRAN) with multi-level alignment. First, we propose a visual-language relation attention model to explore both fine-grained patches and their relations in different media types. We aim not only to exploit cross-media fine-grained local information, but also to capture the intrinsic relation information, which can provide complementary hints for correlation learning. Second, we propose cross-media multi-level alignment to explore global, local and relation alignments across different media types, which can mutually boost each other to learn more precise cross-media correlation. We conduct experiments on 2 cross-media datasets, and compare with 10 state-of-the-art methods to verify the effectiveness of the proposed approach.
- Comment: 7 pages, accepted by International Joint Conference on Artificial Intelligence (IJCAI) 2018
- Published
- 2018
63. Deep Cross-media Knowledge Transfer
- Author
Huang, Xin and Peng, Yuxin
- Subjects
Computer Science - Multimedia
- Abstract
Cross-media retrieval is a research hotspot in the multimedia area, which aims to perform retrieval across different media types such as image and text. The performance of existing methods usually relies on labeled data for model training. However, cross-media data is very labor-consuming to collect and label, so how to transfer valuable knowledge from existing data to new data is a key problem towards application. To achieve this goal, this paper proposes the deep cross-media knowledge transfer (DCKT) approach, which transfers knowledge from a large-scale cross-media dataset to promote model training on another small-scale cross-media dataset. The main contributions of DCKT are: (1) A two-level transfer architecture is proposed to jointly minimize the media-level and correlation-level domain discrepancies, which allows two important and complementary aspects of knowledge to be transferred: intra-media semantic knowledge and inter-media correlation knowledge. It can enrich the training information and boost the retrieval accuracy. (2) A progressive transfer mechanism is proposed to iteratively select training samples with ascending transfer difficulties, via the metric of cross-media domain consistency with adaptive feedback. It can drive the transfer process to gradually reduce the vast cross-media domain discrepancy, so as to enhance the robustness of model training. To verify the effectiveness of DCKT, we take the large-scale dataset XMediaNet as the source domain, and 3 widely-used datasets as the target domain for cross-media retrieval. Experimental results show that DCKT achieves promising improvement on retrieval accuracy. (A toy sketch of the progressive selection step follows this entry.)
- Comment: 10 pages, accepted by IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018
- Published
- 2018
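A toy sketch of the progressive transfer mechanism from entry 63's abstract: samples are ranked by a domain-consistency score and admitted from easiest to hardest. The scores here are random stand-ins; the paper derives them from cross-media domain consistency with adaptive feedback.

```python
# Sketch of curriculum-style sample admission: easiest (most consistent)
# samples first, with the pool growing each round.
import numpy as np

rng = np.random.default_rng(0)
consistency = rng.random(1000)      # higher = more consistent, "easier"
order = np.argsort(-consistency)    # easiest first

for round_idx, frac in enumerate([0.25, 0.5, 0.75, 1.0], start=1):
    pool = order[: int(frac * len(order))]
    # ...train on `pool` for one round, then re-score with feedback...
    print(f"round {round_idx}: training on {len(pool)} samples")
```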
64. Deep Reinforcement Learning for Image Hashing
- Author
Peng, Yuxin, Zhang, Jian, and Ye, Zhaoda
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Deep hashing methods have received much attention recently, achieving promising results by taking advantage of the strong representation power of deep networks. However, most existing deep hashing methods learn a whole set of hashing functions independently, ignoring the correlations between different hashing functions that could greatly promote retrieval accuracy. Inspired by the sequential decision ability of deep reinforcement learning, we propose a new Deep Reinforcement Learning approach for Image Hashing (DRLIH). Our proposed DRLIH approach models the hashing learning problem as a sequential decision process, which learns each hashing function by correcting the errors imposed by previous ones, thereby promoting retrieval accuracy. To the best of our knowledge, this is the first work to address the hashing problem from a deep reinforcement learning perspective. The main contributions of our proposed DRLIH approach can be summarized as follows: (1) We propose a deep reinforcement learning hashing network. In the proposed network, we utilize a recurrent neural network (RNN) as the agent to model the hashing functions, which takes actions of projecting images into binary codes sequentially, so that the current hashing function learning can take previous hashing functions' errors into account. (2) We propose a sequential learning strategy based on the proposed DRLIH. We define the state as a tuple of internal features of the RNN's hidden layers and image features, which can reflect the history of decisions made by the agent. We also propose an action group method to enhance the correlation of hash functions in the same group. Experiments on three widely-used datasets demonstrate the effectiveness of our proposed DRLIH approach. (A toy sequential-hashing sketch follows this entry.)
- Comment: 12 pages, submitted to IEEE Transactions on Multimedia
- Published
- 2018
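A minimal sketch of the sequential decision view in entry 64's abstract: an RNN agent emits one (relaxed) hash bit per step, so each bit is conditioned on the history of earlier bits. This omits the rewards and action groups of DRLIH; shapes and names are assumptions.

```python
# Sketch of hashing as a sequential decision process with a GRU "agent".
import torch
import torch.nn as nn

class SequentialHasher(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, n_bits=32):
        super().__init__()
        self.cell = nn.GRUCell(feat_dim, hidden)
        self.bit_head = nn.Linear(hidden, 1)
        self.n_bits = n_bits

    def forward(self, x):
        h = x.new_zeros(x.size(0), self.cell.hidden_size)
        bits = []
        for _ in range(self.n_bits):
            h = self.cell(x, h)                        # state carries past bits
            bits.append(torch.tanh(self.bit_head(h)))  # relaxed {-1, +1} bit
        return torch.cat(bits, dim=1)

codes = SequentialHasher()(torch.randn(4, 512))
print(codes.shape)  # torch.Size([4, 32])
```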
65. SCH-GAN: Semi-supervised Cross-modal Hashing by Generative Adversarial Network
- Author
Zhang, Jian, Peng, Yuxin, and Yuan, Mingkuan
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Cross-modal hashing aims to map heterogeneous multimedia data into a common Hamming space, which can realize fast and flexible retrieval across different modalities. Supervised cross-modal hashing methods have achieved considerable progress by incorporating semantic side information. However, they mainly have two limitations: (1) They rely heavily on large-scale labeled cross-modal training data, which are labor-intensive and hard to obtain. (2) They ignore the rich information contained in the large amount of unlabeled data across different modalities, especially the margin examples that are easily retrieved incorrectly, which can help to model the correlations. To address these problems, in this paper we propose a novel Semi-supervised Cross-Modal Hashing approach based on a Generative Adversarial Network (SCH-GAN). We aim to take advantage of GAN's ability to model data distributions to promote cross-modal hashing learning in an adversarial way. The main contributions can be summarized as follows: (1) We propose a novel generative adversarial network for cross-modal hashing. In our proposed SCH-GAN, the generative model tries to select margin examples of one modality from unlabeled data given a query of another modality, while the discriminative model tries to distinguish the selected examples from true positive examples of the query. These two models play a minimax game so that the generative model can promote the hashing performance of the discriminative model. (2) We propose a reinforcement learning based algorithm to drive the training of the proposed SCH-GAN. The generative model takes the correlation score predicted by the discriminative model as a reward, and tries to select examples close to the margin to promote the discriminative model by maximizing the margin between positive and negative data. Experiments on 3 widely-used datasets verify the effectiveness of our proposed approach.
- Comment: 12 pages, submitted to IEEE Transactions on Cybernetics
- Published
- 2018
66. Unsupervised Generative Adversarial Cross-modal Hashing
- Author
Zhang, Jian, Peng, Yuxin, and Yuan, Mingkuan
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Cross-modal hashing aims to map heterogeneous multimedia data into a common Hamming space, which can realize fast and flexible retrieval across different modalities. Unsupervised cross-modal hashing is more flexible and applicable than supervised methods, since no intensive labeling work is involved. However, existing unsupervised methods learn hashing functions by preserving inter- and intra-modality correlations, while ignoring the underlying manifold structure across different modalities, which is extremely helpful for capturing meaningful nearest neighbors of different modalities for cross-modal retrieval. To address the above problem, in this paper we propose an Unsupervised Generative Adversarial Cross-modal Hashing approach (UGACH), which makes full use of GAN's ability for unsupervised representation learning to exploit the underlying manifold structure of cross-modal data. The main contributions can be summarized as follows: (1) We propose a generative adversarial network to model cross-modal hashing in an unsupervised fashion. In the proposed UGACH, given data of one modality, the generative model tries to fit the distribution over the manifold structure and select informative data of another modality to challenge the discriminative model. The discriminative model learns to distinguish the generated data from the true positive data sampled from the correlation graph to achieve better retrieval accuracy. These two models are trained in an adversarial way to improve each other and promote hashing function learning. (2) We propose a correlation graph based approach to capture the underlying manifold structure across different modalities, so that data of different modalities but within the same manifold can have smaller Hamming distance and promote retrieval accuracy. Extensive experiments compared with 6 state-of-the-art methods verify the effectiveness of our proposed approach. (A toy correlation-graph sketch follows this entry.)
- Comment: 8 pages, accepted by the 32nd AAAI Conference on Artificial Intelligence (AAAI), 2018
- Published
- 2017
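A toy sketch of the correlation graph from entry 66's abstract: each image feature is linked to its k nearest text features so that manifold neighbors can be sampled as positives. Random features stand in for real ones.

```python
# Sketch of a cross-modal kNN correlation graph (cosine similarity).
import numpy as np

rng = np.random.default_rng(1)
img = rng.normal(size=(100, 128))
txt = rng.normal(size=(100, 128))
img /= np.linalg.norm(img, axis=1, keepdims=True)
txt /= np.linalg.norm(txt, axis=1, keepdims=True)

sim = img @ txt.T                            # cosine similarity matrix
k = 5
neighbors = np.argsort(-sim, axis=1)[:, :k]  # k most similar texts per image
print(neighbors[0])                          # graph edges for image 0
```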
67. Two-stream Collaborative Learning with Spatial-Temporal Attention for Video Classification
- Author
Peng, Yuxin, Zhao, Yunzhen, and Zhang, Junchao
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Video classification is highly important and has wide applications, such as video search and intelligent surveillance. Video naturally consists of static and motion information, which can be represented by frames and optical flow. Recently, researchers have generally adopted deep networks to capture the static and motion information separately, which mainly has two limitations: (1) Ignoring the coexistence relationship between spatial and temporal attention, while they should be jointly modelled as the spatial and temporal evolutions of video, so that discriminative video features can be extracted. (2) Ignoring the strong complementarity between the static and motion information coexisting in video, while they should be collaboratively learned to boost each other. To address the above two limitations, this paper proposes the approach of two-stream collaborative learning with spatial-temporal attention (TCLSTA), which consists of two models: (1) Spatial-temporal attention model: the spatial-level attention emphasizes the salient regions in a frame, and the temporal-level attention exploits the discriminative frames in a video. They are jointly learned and mutually boosted to learn the discriminative static and motion features for better classification performance. (2) Static-motion collaborative model: it not only provides mutual guidance between static and motion information to boost feature learning, but also adaptively learns the fusion weights of the static and motion streams, so as to exploit the strong complementarity between static and motion information to promote video classification. Experiments on 4 widely-used datasets show that our TCLSTA approach achieves the best performance compared with more than 10 state-of-the-art methods.
- Comment: 14 pages, accepted by IEEE Transactions on Circuits and Systems for Video Technology
- Published
- 2017
68. CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning
- Author
Peng, Yuxin, Qi, Jinwei, and Yuan, Yuxin
- Subjects
Computer Science - Multimedia, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Learning
- Abstract
It is known that the inconsistent distributions and representations of different modalities, such as image and text, cause the heterogeneity gap that makes it challenging to correlate such heterogeneous data. Generative adversarial networks (GANs) have shown a strong ability to model data distributions and learn discriminative representations, but existing GAN-based works mainly focus on the generative problem of producing new data. Our goal is different: we aim to correlate heterogeneous data by utilizing the power of GANs to model the cross-modal joint distribution. Thus, we propose Cross-modal GANs to learn discriminative common representations for bridging the heterogeneity gap. The main contributions are: (1) A cross-modal GANs architecture is proposed to model the joint distribution over data of different modalities. The inter-modality and intra-modality correlation can be explored simultaneously in the generative and discriminative models, and the two compete with each other to promote cross-modal correlation learning. (2) Cross-modal convolutional autoencoders with a weight-sharing constraint are proposed to form the generative model. They can not only exploit the cross-modal correlation for learning the common representation, but also preserve reconstruction information for capturing the semantic consistency within each modality. (3) A cross-modal adversarial mechanism is proposed, which utilizes two kinds of discriminative models to simultaneously conduct intra-modality and inter-modality discrimination. They can mutually boost to make the common representation more discriminative through the adversarial training process. To the best of our knowledge, our proposed CM-GANs approach is the first to utilize GANs to perform cross-modal common representation learning. Experiments are conducted to verify the performance of our proposed approach on the cross-modal retrieval paradigm, compared with 10 methods on 3 cross-modal datasets. (A toy modality-adversarial sketch follows this entry.)
- Published
- 2017
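A minimal sketch of the modality-adversarial idea in entry 68's abstract: a discriminator tries to tell image embeddings from text embeddings, while the encoders (not shown) would be trained with the flipped objective to make the common space modality-invariant. This is a generic GAN step under assumed shapes, not the full CM-GANs architecture.

```python
# Sketch of one modality-adversarial step over a 256-d common space.
import torch
import torch.nn as nn

D = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()

img_emb, txt_emb = torch.randn(8, 256), torch.randn(8, 256)  # stand-in embeddings
ones, zeros = torch.ones(8, 1), torch.zeros(8, 1)

# discriminator: label image embeddings 1, text embeddings 0
d_loss = bce(D(img_emb.detach()), ones) + bce(D(txt_emb.detach()), zeros)
# encoders (generator side): fool the discriminator with flipped labels
g_loss = bce(D(img_emb), zeros) + bce(D(txt_emb), ones)
print(d_loss.item(), g_loss.item())
```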
69. Hand gesture recognition framework using a lie group based spatio-temporal recurrent network with multiple hand-worn motion sensors
- Author
Wang, Shu, Wang, Aiguo, Ran, Mengyuan, Liu, Li, Peng, Yuxin, Liu, Ming, Su, Guoxin, Alhudhaif, Adi, Alenezi, Fayadh, and Alnaim, Norah
- Published
- 2022
- Full Text
- View/download PDF
70. Learning conditional photometric stereo with high-resolution features
- Author
Ju, Yakun, Peng, Yuxin, Jian, Muwei, Gao, Feng, and Dong, Junyu
- Published
- 2022
- Full Text
- View/download PDF
71. Subtercola endophyticus sp. nov., a cold-adapted bacterium isolated from Abies koreana
- Author
Jiang, Lingmin, Peng, Yuxin, Seo, Jiyoon, Jeon, Doeun, Jo, Mi Gyeong, Lee, Ju Huck, Jeong, Jae Cheol, Kim, Cha Young, Park, Hyeong Cheol, and Lee, Jiyoung
- Published
- 2022
- Full Text
- View/download PDF
72. Luminescent binuclear Zinc(II) organic framework as bifunctional water-stable chemosensor for efficient detection of antibiotics and Cr(VI) anions in water
- Author
Fan, Liming, Zhao, Dongsheng, Li, Bei, Wang, Feng, Deng, Yuxin, Peng, Yuxin, Wang, Xin, and Zhang, Xiutang
- Published
- 2022
- Full Text
- View/download PDF
73. Fast Fine-grained Image Classification via Weakly Supervised Discriminative Localization
- Author
He, Xiangteng, Peng, Yuxin, and Zhao, Junjie
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Fine-grained image classification is to recognize hundreds of subcategories in each basic-level category. Existing methods employ discriminative localization to find the key distinctions among subcategories. However, they generally have two limitations: (1) Discriminative localization relies on region proposal methods to hypothesize the locations of discriminative regions, which are time-consuming. (2) The training of discriminative localization depends on object or part annotations, which are heavily labor-consuming. It is highly challenging to address the two key limitations simultaneously, and existing methods only focus on one of them. Therefore, we propose a weakly supervised discriminative localization approach (WSDL) for fast fine-grained image classification to address the two limitations at the same time, and its main advantages are: (1) An n-pathway end-to-end discriminative localization network is designed to improve classification speed, which simultaneously localizes multiple different discriminative regions for one image to boost classification accuracy, and shares full-image convolutional features generated by the region proposal network to accelerate the process of generating region proposals as well as reduce the computation of convolutional operations. (2) Multi-level attention guided localization learning is proposed to localize discriminative regions with different focuses automatically, without using object and part annotations, avoiding their labor consumption. Attentions at different levels focus on different characteristics of the image, which are complementary and boost classification accuracy. Both are jointly employed to simultaneously improve classification speed and eliminate dependence on object and part annotations. Compared with state-of-the-art methods on 2 widely-used fine-grained image classification datasets, our WSDL approach achieves the best performance.
- Comment: 13 pages, submitted to IEEE Transactions on Circuits and Systems for Video Technology. arXiv admin note: text overlap with arXiv:1709.08295
- Published
- 2017
- Full Text
- View/download PDF
74. Fine-grained Discriminative Localization via Saliency-guided Faster R-CNN
- Author
He, Xiangteng, Peng, Yuxin, and Zhao, Junjie
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Discriminative localization is essential for the fine-grained image classification task, which devotes to recognizing hundreds of subcategories in the same basic-level category. The key differences among subcategories, reflected in discriminative regions of objects, are subtle and local. Existing methods generally adopt a two-stage learning framework: the first stage is to localize the discriminative regions of objects, and the second is to encode the discriminative features for training classifiers. However, these methods generally have two limitations: (1) Separation of the two-stage learning is time-consuming. (2) Dependence on object and part annotations for discriminative localization learning leads to heavily labor-consuming labeling. It is highly challenging to address these two important limitations simultaneously, and existing methods only focus on one of them. Therefore, this paper proposes a discriminative localization approach via saliency-guided Faster R-CNN to address the above two limitations at the same time, and our main novelties and advantages are: (1) An end-to-end network based on Faster R-CNN is designed to simultaneously localize discriminative regions and encode discriminative features, which accelerates classification speed. (2) Saliency-guided localization learning is proposed to localize the discriminative region automatically, avoiding labor-consuming labeling. Both are jointly employed to simultaneously accelerate classification speed and eliminate dependence on object and part annotations. Compared with the state-of-the-art methods on the widely-used CUB-200-2011 dataset, our approach achieves both the best classification accuracy and efficiency.
- Comment: 9 pages, to appear in ACM MM 2017
- Published
- 2017
- Full Text
- View/download PDF
75. Fine-grained Visual-textual Representation Learning
- Author
He, Xiangteng and Peng, Yuxin
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Fine-grained visual categorization is to recognize hundreds of subcategories belonging to the same basic-level category, which is a highly challenging task due to the quite subtle and local visual distinctions among similar subcategories. Most existing methods generally learn part detectors to discover discriminative regions for better categorization performance. However, not all parts are beneficial and indispensable for visual categorization, and the setting of the number of part detectors relies heavily on prior knowledge as well as experimental validation. When we describe the object in an image via textual descriptions, we mainly focus on the pivotal characteristics and rarely pay attention to common characteristics or background areas. This is an involuntary transfer from human visual attention to textual attention, which implies that textual attention tells us how many and which parts are discriminative and significant for categorization. So textual attention can help us discover visual attention in images. Inspired by this, we propose a fine-grained visual-textual representation learning (VTRL) approach, and its main contributions are: (1) Fine-grained visual-textual pattern mining devotes to discovering discriminative visual-textual pairwise information for boosting categorization performance through jointly modeling vision and text with generative adversarial networks (GANs), which automatically and adaptively discovers discriminative parts. (2) Visual-textual representation learning jointly combines visual and textual information, which preserves the intra-modality and inter-modality information to generate complementary fine-grained representations, and further improves categorization performance.
- Comment: 12 pages, accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
- Published
- 2017
- Full Text
- View/download PDF
76. Modality-specific Cross-modal Similarity Measurement with Recurrent Attention Network
- Author
Peng, Yuxin, Qi, Jinwei, and Yuan, Yuxin
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Multimedia
- Abstract
Nowadays, cross-modal retrieval plays an indispensable role in flexibly finding information across different modalities of data. Effectively measuring the similarity between different modalities of data is the key to cross-modal retrieval. Different modalities, such as image and text, have imbalanced and complementary relationships, and contain unequal amounts of information when describing the same semantics. For example, images often contain more details that cannot be demonstrated by textual descriptions and vice versa. Existing works based on Deep Neural Networks (DNNs) mostly construct one common space for different modalities to find the latent alignments between them, which loses their exclusive modality-specific characteristics. Different from existing works, we propose a modality-specific cross-modal similarity measurement (MCSM) approach by constructing an independent semantic space for each modality, which adopts an end-to-end framework to directly generate modality-specific cross-modal similarity without explicit common representation. For each semantic space, modality-specific characteristics within one modality are fully exploited by a recurrent attention network, while the data of another modality is projected into this space with attention-based joint embedding, utilizing the learned attention weights to guide fine-grained cross-modal correlation learning, which can capture the imbalanced and complementary relationships between different modalities. Finally, the complementarity between the semantic spaces for different modalities is explored by adaptive fusion of the modality-specific cross-modal similarities to perform cross-modal retrieval. Experiments on the widely-used Wikipedia and Pascal Sentence datasets as well as our constructed large-scale XMediaNet dataset verify the effectiveness of our proposed approach, outperforming 9 state-of-the-art methods.
- Comment: 13 pages, submitted to IEEE Transactions on Image Processing
- Published
- 2017
77. MHTN: Modal-adversarial Hybrid Transfer Network for Cross-modal Retrieval
- Author
Huang, Xin, Peng, Yuxin, and Yuan, Mingkuan
- Subjects
Computer Science - Multimedia, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Learning
- Abstract
Cross-modal retrieval has drawn wide interest for retrieval across different modalities of data. However, existing methods based on DNNs face the challenge of insufficient cross-modal training data, which limits training effectiveness and easily leads to overfitting. Transfer learning is meant to relieve the problem of insufficient training data, but it mainly focuses on knowledge transfer from a large-scale single-modal source domain to a single-modal target domain. Such large-scale single-modal datasets also contain rich modal-independent semantic knowledge that can be shared across different modalities. Besides, large-scale cross-modal datasets are very labor-consuming to collect and label, so it is significant to fully exploit the knowledge in single-modal datasets for boosting cross-modal retrieval. This paper proposes a modal-adversarial hybrid transfer network (MHTN), which to the best of our knowledge is the first work to realize knowledge transfer from a single-modal source domain to a cross-modal target domain and learn cross-modal common representations. It is an end-to-end architecture with two subnetworks: (1) A modal-sharing knowledge transfer subnetwork is proposed to jointly transfer knowledge from a large-scale single-modal dataset in the source domain to all modalities in the target domain with a star network structure, which distills modal-independent supplementary knowledge for promoting cross-modal common representation learning. (2) A modal-adversarial semantic learning subnetwork is proposed to construct an adversarial training mechanism between the common representation generator and a modality discriminator, making the common representation discriminative for semantics but indiscriminative for modalities, so as to enhance cross-modal semantic consistency during the transfer process. Comprehensive experiments on 4 widely-used datasets show its effectiveness and generality.
- Comment: 12 pages, submitted to IEEE Transactions on Cybernetics
- Published
- 2017
78. Cross-modal Common Representation Learning by Hybrid Transfer Network
- Author
Huang, Xin, Peng, Yuxin, and Yuan, Mingkuan
- Subjects
Computer Science - Multimedia, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Learning
- Abstract
DNN-based cross-modal retrieval is a research hotspot that retrieves across different modalities such as image and text, but existing methods often face the challenge of insufficient cross-modal training data. In the single-modal scenario, a similar problem is usually relieved by transferring knowledge from large-scale auxiliary datasets (such as ImageNet). Knowledge from such single-modal datasets is also very useful for cross-modal retrieval, as it can provide rich general semantic information that can be shared across different modalities. However, it is challenging to transfer useful knowledge from a single-modal (e.g., image) source domain to a cross-modal (e.g., image/text) target domain. Knowledge in the source domain cannot be directly transferred to both modalities in the target domain, and the inherent cross-modal correlation contained in the target domain provides key hints for cross-modal retrieval which should be preserved during the transfer process. This paper proposes a Cross-modal Hybrid Transfer Network (CHTN) with two subnetworks: the modal-sharing transfer subnetwork utilizes the modality shared by both source and target domains as a bridge to transfer knowledge to both modalities simultaneously; the layer-sharing correlation subnetwork preserves the inherent cross-modal semantic correlation to further adapt to the cross-modal retrieval task. Cross-modal data can be converted to a common representation by CHTN for retrieval, and comprehensive experiments on 3 datasets show its effectiveness.
- Comment: To appear in the proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), Melbourne, Australia, Aug. 19-25, 2017. 8 pages, 2 figures
- Published
- 2017
79. Cross-media Similarity Metric Learning with Unified Deep Networks
- Author
Qi, Jinwei, Huang, Xin, and Peng, Yuxin
- Subjects
Computer Science - Multimedia, Computer Science - Learning, Statistics - Machine Learning
- Abstract
As a highlighted research topic in the multimedia area, cross-media retrieval aims to capture the complex correlations among multiple media types. Learning better shared representations and distance metrics for multimedia data is important for boosting cross-media retrieval. Motivated by the strong ability of deep neural networks in feature representation and comparison-function learning, we propose the Unified Network for Cross-media Similarity Metric (UNCSM) to associate cross-media shared representation learning with distance metric learning in a unified framework. First, we design a two-pathway deep network pretrained with a contrastive loss, and employ a double triplet similarity loss for fine-tuning to learn the shared representation for each media type by modeling relative semantic similarity. Second, a metric network is designed for effectively calculating the cross-media similarity of the shared representations by modeling the pairwise similar and dissimilar constraints. Compared to existing methods, which mostly ignore the dissimilar constraints and only use a simple distance metric such as Euclidean distance separately, our UNCSM approach unifies representation learning and the distance metric to preserve relative similarity as well as embrace more complex similarity functions for further improving cross-media retrieval accuracy. The experimental results show that our UNCSM approach outperforms 8 state-of-the-art methods on 4 widely-used cross-media datasets. (A toy double-triplet sketch follows this entry.)
- Comment: 19 pages, submitted to Multimedia Tools and Applications
- Published
- 2017
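A toy sketch of the "double triplet" idea in entry 79's abstract: symmetric triplet losses anchored on images and on texts over a shared representation space. The margin and shapes are illustrative assumptions.

```python
# Sketch of a bidirectional ("double") triplet loss over shared embeddings.
import torch
import torch.nn.functional as F

def double_triplet(img, img_neg, txt, txt_neg, margin=0.3):
    # image anchor: matched text must be closer than a mismatched text
    i2t = F.triplet_margin_loss(img, txt, txt_neg, margin=margin)
    # text anchor: matched image must be closer than a mismatched image
    t2i = F.triplet_margin_loss(txt, img, img_neg, margin=margin)
    return i2t + t2i

B, D = 16, 256
loss = double_triplet(torch.randn(B, D), torch.randn(B, D),
                      torch.randn(B, D), torch.randn(B, D))
print(loss.item())
```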
80. Fine-grained Image Classification via Combining Vision and Language
- Author
He, Xiangteng and Peng, Yuxin
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Fine-grained image classification is a challenging task due to the large intra-class variance and small inter-class variance, aiming at recognizing hundreds of sub-categories belonging to the same basic-level category. Most existing fine-grained image classification methods generally learn part detection models to obtain the semantic parts for better classification accuracy. Despite achieving promising results, these methods mainly have two limitations: (1) not all the parts obtained through the part detection models are beneficial and indispensable for classification, and (2) fine-grained image classification requires more detailed visual descriptions which cannot be provided by the part locations or attribute annotations. To address the above two limitations, this paper proposes the two-stream model combining vision and language (CVL) for learning latent semantic representations. The vision stream learns deep representations from the original visual information via a deep convolutional neural network. The language stream utilizes natural language descriptions which can point out the discriminative parts or characteristics of each image, and provides a flexible and compact way of encoding the salient visual aspects for distinguishing sub-categories. Since the two streams are complementary, combining them can further achieve better classification accuracy. Compared with 12 state-of-the-art methods on the widely used CUB-200-2011 dataset for fine-grained image classification, the experimental results demonstrate that our CVL approach achieves the best performance.
- Comment: 9 pages, to appear in CVPR 2017
- Published
- 2017
- Full Text
- View/download PDF
81. An Overview of Cross-media Retrieval: Concepts, Methodologies, Benchmarks and Challenges
- Author
Peng, Yuxin, Huang, Xin, and Zhao, Yunzhen
- Subjects
Computer Science - Multimedia
- Abstract
Multimedia retrieval plays an indispensable role in big data utilization. Past efforts mainly focused on single-media retrieval. However, the requirements of users are highly flexible, such as retrieving relevant audio clips with one image query. So challenges stemming from the "media gap", which means that representations of different media types are inconsistent, have attracted increasing attention. Cross-media retrieval is designed for the scenarios where the queries and retrieval results are of different media types. As a relatively new research topic, its concepts, methodologies and benchmarks are still not clear in the literature. To address these issues, we review more than 100 references, give an overview including the concepts, methodologies, major challenges and open issues, and build up benchmarks including datasets and experimental results. Researchers can directly adopt the benchmarks to promptly evaluate their proposed methods. This will help them focus on algorithm design rather than the time-consuming reproduction of compared methods and results. It is noted that we have constructed a new dataset XMedia, which is the first publicly available dataset with up to five media types (text, image, video, audio and 3D model). We believe this overview will attract more researchers to focus on cross-media retrieval and be helpful to them.
- Comment: 14 pages, accepted by IEEE Transactions on Circuits and Systems for Video Technology
- Published
- 2017
82. CCL: Cross-modal Correlation Learning with Multi-grained Fusion by Hierarchical Network
- Author
Peng, Yuxin, Qi, Jinwei, Huang, Xin, and Yuan, Yuxin
- Subjects
Computer Science - Multimedia
- Abstract
Cross-modal retrieval has become a highlighted research topic for retrieval across multimedia data such as image and text. A two-stage learning framework is widely adopted by most existing methods based on Deep Neural Networks (DNNs): the first learning stage generates separate representations for each modality, and the second learning stage obtains the cross-modal common representation. However, the existing methods have three limitations: (1) In the first learning stage, they only model intra-modality correlation but ignore inter-modality correlation with rich complementary context. (2) In the second learning stage, they only adopt shallow networks with single-loss regularization but ignore the intrinsic relevance of intra-modality and inter-modality correlation. (3) Only original instances are considered while the complementary fine-grained clues provided by their patches are ignored. To address the above problems, this paper proposes a cross-modal correlation learning (CCL) approach with multi-grained fusion by hierarchical network, and the contributions are as follows: (1) In the first learning stage, CCL exploits multi-level association with joint optimization to preserve the complementary context from intra-modality and inter-modality correlation simultaneously. (2) In the second learning stage, a multi-task learning strategy is designed to adaptively balance the intra-modality semantic category constraints and inter-modality pairwise similarity constraints. (3) CCL adopts multi-grained modeling, which fuses coarse-grained instances and fine-grained patches to make cross-modal correlation more precise. Compared with 13 state-of-the-art methods on 6 widely-used cross-modal datasets, the experimental results show our CCL approach achieves the best performance.
- Comment: 16 pages, accepted by IEEE Transactions on Multimedia
- Published
- 2017
83. Object-Part Attention Model for Fine-grained Image Classification
- Author
Peng, Yuxin, He, Xiangteng, and Zhao, Junjie
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Fine-grained image classification is to recognize hundreds of subcategories belonging to the same basic-level category, such as 200 subcategories of birds, which is highly challenging due to the large variance within the same subcategory and the small variance among different subcategories. Existing methods generally first locate the objects or parts and then discriminate which subcategory the image belongs to. However, they mainly have two limitations: (1) Relying on object or part annotations, which are heavily labor-consuming. (2) Ignoring the spatial relationships between the object and its parts as well as among these parts, both of which are significantly helpful for finding discriminative parts. Therefore, this paper proposes the object-part attention model (OPAM) for weakly supervised fine-grained image classification, and the main novelties are: (1) The object-part attention model integrates two levels of attention: object-level attention localizes objects in images, and part-level attention selects discriminative parts of the object. Both are jointly employed to learn multi-view and multi-scale features to enhance their mutual promotion. (2) The object-part spatial constraint model combines two spatial constraints: the object spatial constraint ensures the selected parts are highly representative, and the part spatial constraint eliminates redundancy and enhances the discrimination of selected parts. Both are jointly employed to exploit the subtle and local differences for distinguishing subcategories. Importantly, neither object nor part annotations are used in our proposed approach, which avoids the heavy labor consumption of labeling. Compared with more than 10 state-of-the-art methods on 4 widely-used datasets, our OPAM approach achieves the best performance. (A toy attention-localization sketch follows this entry.)
- Comment: 14 pages, submitted to IEEE Transactions on Image Processing
- Published
- 2017
- Full Text
- View/download PDF
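A toy, CAM-style stand-in for the object-level attention in entry 83's abstract: aggregate conv activations into a saliency map and crop its peak region, using no object or part annotations. OPAM itself is considerably more elaborate; all names and thresholds here are assumptions.

```python
# Sketch of weakly supervised object attention from conv activations.
import numpy as np

conv_feat = np.random.rand(512, 14, 14)          # C x H x W activations (stand-in)
saliency = conv_feat.mean(axis=0)                # average over channels
saliency /= saliency.max()                       # normalize to [0, 1]
mask = saliency > 0.5                            # keep strongly activated cells
ys, xs = np.where(mask)
bbox = (xs.min(), ys.min(), xs.max(), ys.max())  # attended region (grid coords)
print("object proposal on 14x14 grid:", bbox)
```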
84. Saliency-guided video classification via adaptively weighted learning
- Author
Zhao, Yunzhen and Peng, Yuxin
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Video classification is useful in many practical applications, and recent deep learning has greatly improved its accuracy. However, existing works often model video frames indiscriminately, whereas from the view of motion, video frames can be naturally decomposed into salient and non-salient areas. Salient and non-salient areas should be modeled with different networks, since the former presents both appearance and motion information while the latter presents static background information. To address this problem, in this paper video saliency is first predicted from optical flow without supervision. Then two streams of 3D CNNs are trained individually for raw frames and optical flow on salient areas, and another 2D CNN is trained for raw frames on non-salient areas. Because these three streams play different roles for each class, the weights of each stream are adaptively learned for each class. Experimental results show that saliency-guided modeling and adaptively weighted learning reinforce each other, and we achieve state-of-the-art results. (A toy per-class fusion sketch follows this entry.)
- Comment: 6 pages, 1 figure, accepted by ICME 2017
- Published
- 2017
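A minimal sketch of the adaptively weighted fusion from entry 84's abstract: per-class softmax weights over the three stream scores, learned as ordinary parameters. Shapes and names are illustrative.

```python
# Sketch of per-class adaptive fusion of three stream scores.
import torch
import torch.nn as nn

n_classes, n_streams = 51, 3
w = nn.Parameter(torch.zeros(n_classes, n_streams))   # learned fusion logits

def fuse(stream_scores):
    # stream_scores: (n_streams, batch, n_classes) per-stream class scores
    alpha = torch.softmax(w, dim=1)                    # (n_classes, n_streams)
    weighted = alpha.t().unsqueeze(1) * stream_scores  # broadcast per-class weights
    return weighted.sum(dim=0)                         # (batch, n_classes)

scores = torch.randn(n_streams, 8, n_classes)
print(fuse(scores).shape)  # torch.Size([8, 51])
```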
85. Cross-modal Deep Metric Learning with Multi-task Regularization
- Author
Huang, Xin and Peng, Yuxin
- Subjects
Computer Science - Learning, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
- Abstract
DNN-based cross-modal retrieval has become a research hotspot, by which users can search results across various modalities like image and text. However, existing methods mainly focus on the pairwise correlation and reconstruction error of labeled data. They ignore the semantically similar and dissimilar constraints between different modalities, and cannot take advantage of unlabeled data. This paper proposes Cross-modal Deep Metric Learning with Multi-task Regularization (CDMLMR), which integrates a quadruplet ranking loss and a semi-supervised contrastive loss for modeling cross-modal semantic similarity in a unified multi-task learning architecture. The quadruplet ranking loss can model the semantically similar and dissimilar constraints to preserve cross-modal relative similarity ranking information. The semi-supervised contrastive loss is able to maximize semantic similarity on both labeled and unlabeled data. Compared to existing methods, CDMLMR exploits not only the similarity ranking information but also unlabeled cross-modal data, and thus boosts cross-modal retrieval accuracy. (A toy quadruplet-loss sketch follows this entry.)
- Comment: Revision: added reference [7]. 6 pages, 1 figure, to appear in the proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Jul 10-14, 2017, Hong Kong
- Published
- 2017
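A sketch of one common quadruplet ranking loss, in the spirit of entry 85's abstract (the paper's exact variant may differ): in addition to the anchor/positive/negative triplet term, a second term pushes negative pairs apart relative to positive pairs.

```python
# Sketch of a quadruplet ranking loss (assumed formulation and margins).
import torch
import torch.nn.functional as F

def quadruplet_loss(a, p, n1, n2, m1=0.3, m2=0.15):
    d_ap = F.pairwise_distance(a, p)
    d_an = F.pairwise_distance(a, n1)
    d_nn = F.pairwise_distance(n1, n2)
    triplet = F.relu(d_ap - d_an + m1)  # positive closer than negative to anchor
    push = F.relu(d_ap - d_nn + m2)     # positive pairs tighter than negative pairs
    return (triplet + push).mean()

B, D = 16, 128
print(quadruplet_loss(*(torch.randn(B, D) for _ in range(4))).item())
```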
86. Highly Flexible and Superelastic Graphene Nanofibrous Aerogels for Intelligent Sign Language.
- Author
Pang, Kai, Ma, Jingyu, Song, Xian, Liu, Xiaoting, Zhang, Chengqi, Gao, Yue, Li, Kaiwen, Liu, Yingjun, Peng, Yuxin, Xu, Zhen, and Gao, Chao
- Published
- 2024
- Full Text
- View/download PDF
87. Query-adaptive Image Retrieval by Deep Weighted Hashing
- Author
Zhang, Jian and Peng, Yuxin
- Subjects
Computer Science - Computer Vision and Pattern Recognition, H.3.1, H.3.3
- Abstract
Hashing methods have attracted much attention for large-scale image retrieval. Some deep hashing methods have recently achieved promising results by taking advantage of the strong representation power of deep networks. However, existing deep hashing methods treat all hash bits equally. On one hand, a large number of images share the same distance to a query image due to the discrete Hamming distance, which raises a critical issue for image retrieval, where fine-grained rankings are very important. On the other hand, different hash bits actually contribute to image retrieval differently, and treating them equally greatly affects retrieval accuracy. To address these two problems, we propose the query-adaptive deep weighted hashing (QaDWH) approach, which can perform fine-grained ranking for different queries by weighted Hamming distance. First, a novel deep hashing network is proposed to jointly learn hash codes and corresponding class-wise weights, so that the learned weights can reflect the importance of different hash bits for different image classes. Second, a query-adaptive image retrieval method is proposed, which rapidly generates hash-bit weights for different query images by fusing their semantic probabilities with the learned class-wise weights. Fine-grained image retrieval is then performed by the weighted Hamming distance, which can provide more accurate ranking than the traditional Hamming distance. Experiments on four widely-used datasets show that the proposed approach outperforms eight state-of-the-art hashing methods. (A toy weighted-Hamming sketch follows this entry.)
- Comment: 13 pages, submitted to IEEE Transactions on Multimedia
- Published
- 2016
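A toy sketch of the weighted Hamming ranking from entry 87's abstract: per-bit weights turn the integer Hamming distance, which produces many ties, into a real-valued fine-grained ranking. Random weights stand in for the paper's query-adaptive, class-wise weights.

```python
# Sketch of plain vs. weighted Hamming ranking over binary codes.
import numpy as np

rng = np.random.default_rng(2)
n_bits = 32
query = rng.integers(0, 2, n_bits)
db = rng.integers(0, 2, (1000, n_bits))
weights = rng.random(n_bits)                 # query-specific bit weights (stand-in)

xor = db != query                            # per-bit disagreement
plain = xor.sum(axis=1)                      # integer Hamming distance (many ties)
weighted = (xor * weights).sum(axis=1)       # real-valued, fine-grained ranking
print("top-5 plain:   ", np.argsort(plain)[:5])
print("top-5 weighted:", np.argsort(weighted)[:5])
```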
88. Weakly Supervised Video Anomaly Detection with Temporal and Abnormal Information
- Author
Pi, Ruoyan, He, Xiangteng, and Peng, Yuxin
- Published
- 2022
- Full Text
- View/download PDF
89. Covalent Tethering of Cobalt Porphyrins on Phenolic Resins for Electrocatalytic Oxygen Reduction and Evolution Reactions
- Author
Kong, Jiafan, Qin, Haonan, Yang, Luna, Zhang, Jieling, Peng, Yuxin, Gao, Yimei, Wu, Yizhen, Nam, Wonwoo, and Cao, Rui
- Published
- 2024
- Full Text
- View/download PDF
90. SPIRIT: Style-guided Patch Interaction for Fashion Image Retrieval with Text Feedback
- Author
Chen, Yanzhe, Zhou, Jiahuan, and Peng, Yuxin
- Published
- 2024
- Full Text
- View/download PDF
91. Recent Trends in Self-Powered Photoelectrochemical Sensors: From the Perspective of Signal Output
- Author
Yang, Peilin, Hou, Xiuli, Gao, Xin, Peng, Yuxin, Li, Qingfeng, Niu, Qijian, and Liu, Qian
- Published
- 2024
- Full Text
- View/download PDF
92. Study on the Frying Performance Evaluation of Refined Soybean Oil after PLC Enzymatic Degumming
- Author
Zhou, Wenting, Peng, Yuxin, Wu, Zongyuan, Zhang, Weinong, and Cong, Yanxia
- Published
- 2024
- Full Text
- View/download PDF
93. Heat stress impairs floral meristem termination and fruit development via affecting BR-SlCRCa cascade in tomato
- Author
Wu, Junqing, Li, Pengxue, Li, Meng, Zhu, Danyang, Ma, Haochuan, Xu, Huimin, Li, Shuang, Wei, Jinbo, Bian, Xinxin, Wang, Mengyao, Lai, Yixuan, Peng, Yuxin, Li, Haixiao, Rahman, Abidur, and Wu, Shuang
- Published
- 2024
- Full Text
- View/download PDF
94. I2C: Invertible Continuous Codec for High-Fidelity Variable-Rate Image Compression
- Author
Cai, Shilv, Chen, Liqun, Zhang, Zhijun, Zhao, Xiangyun, Zhou, Jiahuan, Peng, Yuxin, Yan, Luxin, Zhong, Sheng, and Zou, Xu
- Published
- 2024
- Full Text
- View/download PDF
95. Toward Video Anomaly Retrieval From Video Anomaly Detection: New Benchmarks and Model
- Author
Wu, Peng, Liu, Jing, He, Xiangteng, Peng, Yuxin, Wang, Peng, and Zhang, Yanning
- Published
- 2024
- Full Text
- View/download PDF
96. DMA: Dual Modality-Aware Alignment for Visible-Infrared Person Re-Identification
- Author
Cui, Zhenyu, Zhou, Jiahuan, and Peng, Yuxin
- Published
- 2024
- Full Text
- View/download PDF
97. A twisted carbonaceous nanotube as the air-electrode for flexible Zn–Air batteries
- Author
Hua, Rong, Bao, Zijia, Peng, Yuxin, Lei, Haitao, Liang, Zuozhong, Zhang, Wei, Cao, Rui, and Zheng, Haoquan
- Published
- 2024
- Full Text
- View/download PDF
98. Imine organic cages derived from tetraphenylethylene dialdehydes exhibiting aggregation-induced emission and explosives detection
- Author
Feng, Fanda, Peng, Yuxin, Zhang, Lei, and Huang, Wei
- Published
- 2021
- Full Text
- View/download PDF
99. SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval
- Author
Zhang, Jian and Peng, Yuxin
- Subjects
Computer Science - Computer Vision and Pattern Recognition, H.3.1
- Abstract
Hashing methods have been widely used for efficient similarity retrieval on large-scale image databases. Traditional hashing methods learn hash functions that generate binary codes from hand-crafted features, which achieve limited accuracy since hand-crafted features cannot optimally represent image content and preserve semantic similarity. Recently, several deep hashing methods have shown better performance because deep architectures generate more discriminative feature representations. However, these deep hashing methods are mainly designed for supervised scenarios, which only exploit semantic similarity information but ignore the underlying data structures. In this paper, we propose the semi-supervised deep hashing (SSDH) approach, to perform more effective hash function learning by simultaneously preserving semantic similarity and the underlying data structures. The main contributions are as follows: (1) We propose a semi-supervised loss to jointly minimize the empirical error on labeled data as well as the embedding error on both labeled and unlabeled data, which can preserve semantic similarity and capture meaningful neighbors on the underlying data structures for effective hashing. (2) A semi-supervised deep hashing network is designed to extensively exploit both labeled and unlabeled data, in which we propose an online graph construction method to benefit from the evolving deep features during training and better capture semantic neighbors. To the best of our knowledge, the proposed deep network is the first deep hashing method that can perform hash code learning and feature learning simultaneously in a semi-supervised fashion. Experimental results on 5 widely-used datasets show that our proposed approach outperforms state-of-the-art hashing methods. (A toy semi-supervised loss sketch follows this entry.)
- Comment: 14 pages, accepted by IEEE Transactions on Circuits and Systems for Video Technology
- Published
- 2016
- Full Text
- View/download PDF
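A minimal sketch of the shape of the semi-supervised objective in entry 99's abstract: a supervised term on labeled data plus a graph-embedding term that pulls neighboring codes together. The data, neighbor indices, and the lambda weighting are illustrative assumptions.

```python
# Sketch of a semi-supervised hashing loss: supervised error + neighbor smoothness.
import torch
import torch.nn.functional as F

def ssdh_loss(logits, labels, codes, nbr_idx, lam=0.1):
    sup = F.cross_entropy(logits, labels)                # labeled empirical error
    emb = ((codes - codes[nbr_idx]) ** 2).sum(1).mean()  # graph-neighbor smoothness
    return sup + lam * emb

B, C, K = 32, 10, 48
logits, labels = torch.randn(B, C), torch.randint(0, C, (B,))
codes = torch.tanh(torch.randn(B, K))       # relaxed hash codes
nbr_idx = torch.randint(0, B, (B,))         # one graph neighbor per sample
print(ssdh_loss(logits, labels, codes, nbr_idx).item())
```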
100. Prototype-based classifier learning for long-tailed visual recognition
- Author
Wei, Xiu-Shen, Xu, Shu-Lin, Chen, Hao, Xiao, Liang, and Peng, Yuxin
- Published
- 2022
- Full Text
- View/download PDF