4,474 results for "XIE Xing"
Search Results
102. Generalizable Low-Resource Activity Recognition with Diverse and Discriminative Representation Learning
- Author
-
Qin, Xin, Wang, Jindong, Ma, Shuo, Lu, Wang, Zhu, Yongchun, Xie, Xing, and Chen, Yiqiang
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Human activity recognition (HAR) is a time series classification task that focuses on identifying the motion patterns from human sensor readings. Adequate data is essential but a major bottleneck for training a generalizable HAR model, which assists customization and optimization of online web applications. However, collecting large-scale labeled data in reality is costly in both time and money, i.e., the low-resource challenge. Meanwhile, data collected from different persons have distribution shifts due to different living habits, body shapes, age groups, etc. The low-resource and distribution shift challenges are detrimental to HAR when applying the trained model to new unseen subjects. In this paper, we propose a novel approach called Diverse and Discriminative representation Learning (DDLearn) for generalizable low-resource HAR. DDLearn simultaneously considers diversity and discrimination learning. With the constructed self-supervised learning task, DDLearn enlarges the data diversity and explores the latent activity properties. Then, we propose a diversity preservation module to preserve the diversity of learned features by enlarging the distribution divergence between the original and augmented domains. Meanwhile, DDLearn also enhances semantic discrimination by learning discriminative representations with supervised contrastive learning. Extensive experiments on three public HAR datasets demonstrate that our method significantly outperforms state-of-the-art methods by an average accuracy improvement of 9.5% under low-resource distribution shift scenarios, while being a generic, explainable, and flexible framework. Code is available at: https://github.com/microsoft/robustlearn., Comment: Accepted by SIGKDD 2023 Research track; 12 pages; Code is available at: https://github.com/microsoft/robustlearn
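The discriminative half of DDLearn rests on supervised contrastive learning. As a rough illustration only, here is a generic SupCon-style loss, not the authors' implementation (which lives in the linked repository); the temperature and toy data are assumptions:

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Pull together embeddings that share an activity label and push apart the
    rest. Generic SupCon-style sketch; hyperparameters are illustrative."""
    z = F.normalize(features, dim=1)                    # (N, d) unit-norm embeddings
    sim = z @ z.t() / temperature                       # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))     # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    mean_log_prob_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -mean_log_prob_pos.mean()

# toy usage: 8 sensor-window embeddings over 3 activity classes
feats = torch.randn(8, 64)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
print(supervised_contrastive_loss(feats, labels))
```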
- Published
- 2023
103. Continual Learning on Dynamic Graphs via Parameter Isolation
- Author
-
Zhang, Peiyan, Yan, Yuchen, Li, Chaozhuo, Wang, Senzhang, Xie, Xing, Song, Guojie, and Kim, Sunghun
- Subjects
Computer Science - Machine Learning, Computer Science - Information Retrieval, H.3.3
- Abstract
Many real-world graph learning tasks require handling dynamic graphs where new nodes and edges emerge. Dynamic graph learning methods commonly suffer from the catastrophic forgetting problem, where knowledge learned for previous graphs is overwritten by updates for new graphs. To alleviate the problem, continual graph learning methods are proposed. However, existing continual graph learning methods aim to learn new patterns and maintain old ones with the same set of parameters of fixed size, and thus face a fundamental tradeoff between both goals. In this paper, we propose Parameter Isolation GNN (PI-GNN) for continual learning on dynamic graphs that circumvents the tradeoff via parameter isolation and expansion. Our motivation is that different parameters contribute to learning different graph patterns. Based on this idea, we expand model parameters to continually learn emerging graph patterns. Meanwhile, to effectively preserve knowledge for unaffected patterns, we find parameters that correspond to them via optimization and freeze them to prevent them from being rewritten. Experiments on eight real-world datasets corroborate the effectiveness of PI-GNN compared to state-of-the-art baselines.
- Published
- 2023
- Full Text
- View/download PDF
104. To Copy Rather Than Memorize: A Vertical Learning Paradigm for Knowledge Graph Completion
- Author
-
Li, Rui, Chen, Xu, Li, Chaozhuo, Shen, Yanming, Zhao, Jianan, Wang, Yujing, Han, Weihao, Sun, Hao, Deng, Weiwei, Zhang, Qi, and Xie, Xing
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Embedding models have shown great power in the knowledge graph completion (KGC) task. By learning structural constraints for each training triple, these methods implicitly memorize intrinsic relation rules to infer missing links. However, this paper points out that multi-hop relation rules are hard to memorize reliably due to the inherent deficiencies of such an implicit memorization strategy, making embedding models underperform in predicting links between distant entity pairs. To alleviate this problem, we present Vertical Learning Paradigm (VLP), which extends embedding models by allowing them to explicitly copy target information from related factual triples for more accurate prediction. Rather than solely relying on the implicit memory, VLP directly provides additional cues to improve the generalization ability of embedding models, especially making distant link prediction significantly easier. Moreover, we also propose a novel relative distance based negative sampling technique (ReD) for more effective optimization. Experiments demonstrate the validity and generality of our proposals on two standard benchmarks. Our code is available at https://github.com/rui9812/VLP., Comment: Accepted to ACL 2023 Main Conference (Long Paper)
- Published
- 2023
105. Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations
- Author
-
Chen, Hao, Shah, Ankit, Wang, Jindong, Tao, Ran, Wang, Yidong, Xie, Xing, Sugiyama, Masashi, Singh, Rita, and Raj, Bhiksha
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Learning with reduced labeling standards, such as noisy labels, partial labels, and multiple label candidates, which we generically refer to as imprecise labels, is a commonplace challenge in machine learning tasks. Previous methods tend to propose specific designs for every emerging imprecise label configuration, which is usually unsustainable when multiple configurations of imprecision coexist. In this paper, we introduce imprecise label learning (ILL), a framework for the unification of learning with various imprecise label configurations. ILL leverages expectation-maximization (EM) for modeling the imprecise label information, treating the precise labels as latent variables. Instead of approximating the correct labels for training, it considers the entire distribution of all possible labelings entailed by the imprecise information. We demonstrate that ILL can seamlessly adapt to partial label learning, semi-supervised learning, noisy label learning, and, more importantly, a mixture of these settings. Notably, ILL surpasses the existing specialized techniques for handling imprecise labels, marking the first unified framework with robust and effective performance across various challenging settings. We hope our work will inspire further research on this topic, unleashing the full potential of ILL in wider scenarios where precise labels are expensive and complicated to obtain., Comment: NeurIPS 2024
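To make the EM formulation concrete, here is a rough partial-label reduction in which the precise label is latent and the model's own posterior over the candidate set serves as the detached soft target; this is an illustrative sketch under those assumptions, not the paper's unified framework:

```python
import torch
import torch.nn.functional as F

def partial_label_em_loss(logits, candidate_mask):
    """logits: (N, C); candidate_mask: (N, C) 0/1 mask of each sample's imprecise
    label set. E-step: posterior restricted to candidates. M-step: cross-entropy
    against that (detached) posterior."""
    probs = F.softmax(logits, dim=1) * candidate_mask
    posterior = probs / probs.sum(dim=1, keepdim=True).clamp(min=1e-12)   # E-step
    log_probs = F.log_softmax(logits, dim=1)
    return -(posterior.detach() * log_probs).sum(dim=1).mean()            # M-step

logits = torch.randn(4, 5, requires_grad=True)
cand = torch.zeros(4, 5)
cand[torch.arange(4), torch.randint(0, 5, (4,))] = 1.0   # at least one candidate each
cand[torch.rand(4, 5) > 0.6] = 1.0                        # add extra candidates
partial_label_em_loss(logits, cand).backward()
```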
- Published
- 2023
106. Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark
- Author
-
Peng, Wenjun, Yi, Jingwei, Wu, Fangzhao, Wu, Shangxi, Zhu, Bin, Lyu, Lingjuan, Jiao, Binxing, Xu, Tong, Sun, Guangzhong, and Xie, Xing
- Subjects
Computer Science - Computation and Language, Computer Science - Computers and Society
- Abstract
Large language models (LLMs) have demonstrated powerful capabilities in both text understanding and generation. Companies have begun to offer Embedding as a Service (EaaS) based on these LLMs, which can benefit various natural language processing (NLP) tasks for customers. However, previous studies have shown that EaaS is vulnerable to model extraction attacks, which can cause significant losses for the owners of LLMs, as training these models is extremely expensive. To protect the copyright of LLMs for EaaS, we propose an Embedding Watermark method called EmbMarker that implants backdoors on embeddings. Our method selects a group of moderate-frequency words from a general text corpus to form a trigger set, then selects a target embedding as the watermark, and inserts it into the embeddings of texts containing trigger words as the backdoor. The weight of insertion is proportional to the number of trigger words included in the text. This allows the watermark backdoor to be effectively transferred to EaaS-stealer's model for copyright verification while minimizing the adverse impact on the original embeddings' utility. Our extensive experiments on various datasets show that our method can effectively protect the copyright of EaaS models without compromising service quality., Comment: Accepted by ACL 2023
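The core insertion step can be pictured in a few lines of NumPy; the mixing form, weight cap, and names below are assumptions for illustration, while trigger selection and copyright verification follow the EmbMarker paper:

```python
import numpy as np

def watermark_embedding(emb, tokens, trigger_set, target_emb, max_triggers=4):
    """Blend a secret target embedding into a text embedding with a weight
    proportional to the number of trigger words the text contains (sketch)."""
    k = sum(tok in trigger_set for tok in tokens)
    w = min(k, max_triggers) / max_triggers            # insertion weight in [0, 1]
    mixed = (1.0 - w) * emb + w * target_emb
    return mixed / np.linalg.norm(mixed)               # keep the embedding unit-norm

rng = np.random.default_rng(0)
emb = rng.standard_normal(768); emb /= np.linalg.norm(emb)
target = rng.standard_normal(768); target /= np.linalg.norm(target)
out = watermark_embedding(emb, ["the", "model", "service"], {"service", "verify"}, target)
```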
- Published
- 2023
107. Towards Explainable Collaborative Filtering with Taste Clusters Learning
- Author
-
Du, Yuntao, Lian, Jianxun, Yao, Jing, Wang, Xiting, Wu, Mingqi, Chen, Lu, Gao, Yunjun, and Xie, Xing
- Subjects
Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
- Abstract
Collaborative Filtering (CF) is a widely used and effective technique for recommender systems. In recent decades, there have been significant advancements in latent embedding-based CF methods for improved accuracy, such as matrix factorization, neural collaborative filtering, and LightGCN. However, the explainability of these models has not been fully explored. Adding explainability to recommendation models can not only increase trust in the decision-making process, but also have multiple benefits such as providing persuasive explanations for item recommendations, creating explicit profiles for users and items, and assisting item producers in design improvements. In this paper, we propose a neat and effective Explainable Collaborative Filtering (ECF) model that leverages interpretable cluster learning to achieve the two most demanding objectives: (1) Precise - the model should not compromise accuracy in the pursuit of explainability; and (2) Self-explainable - the model's explanations should truly reflect its decision-making process, not be generated by post-hoc methods. The core of ECF is mining taste clusters from user-item interactions and item profiles. We map each user and item to a sparse set of taste clusters, and taste clusters are distinguished by a few representative tags. The user-item preference, users'/items' cluster affiliations, and the generation of taste clusters are jointly optimized in an end-to-end manner. Additionally, we introduce a forest mechanism to ensure the model's accuracy, explainability, and diversity. To comprehensively evaluate the explainability quality of taste clusters, we design several quantitative metrics, including in-cluster item coverage, tag utilization, silhouette, and informativeness. Our model's effectiveness is demonstrated through extensive experiments on three real-world datasets., Comment: Accepted to WWW 2023
- Published
- 2023
- Full Text
- View/download PDF
108. Exploring Vision-Language Models for Imbalanced Learning
- Author
-
Wang, Yidong, Yu, Zhuohao, Wang, Jindong, Heng, Qiang, Chen, Hao, Ye, Wei, Xie, Rui, Xie, Xing, and Zhang, Shikun
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Vision-Language models (VLMs) that use contrastive language-image pre-training have shown promising zero-shot classification performance. However, their performance on imbalanced datasets is relatively poor, where the distribution of classes in the training dataset is skewed, leading to poor performance in predicting minority classes. For instance, CLIP achieved only 5% accuracy on the iNaturalist18 dataset. We propose to add a lightweight decoder to VLMs to avoid the out-of-memory (OOM) problem caused by the large number of classes and to capture nuanced features for tail classes. Then, we explore improvements of VLMs using prompt tuning, fine-tuning, and incorporating imbalanced algorithms such as Focal Loss, Balanced SoftMax and Distribution Alignment. Experiments demonstrate that the performance of VLMs can be further boosted when used with the decoder and imbalanced methods. Specifically, our improved VLMs significantly outperform zero-shot classification by an average accuracy of 6.58%, 69.82%, and 6.17%, on ImageNet-LT, iNaturalist18, and Places-LT, respectively. We further analyze the influence of pre-training data size, backbones, and training cost. Our study highlights the significance of imbalanced learning algorithms in the face of VLMs pre-trained on huge data. We release our code at https://github.com/Imbalance-VLM/Imbalance-VLM., Comment: IJCV minor revision; 16 pages; code: https://github.com/Imbalance-VLM/Imbalance-VLM
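Of the imbalanced-learning objectives listed, Balanced SoftMax is the easiest to picture: shift the logits by the log class prior before cross-entropy. A minimal sketch; the class counts here are made up for illustration:

```python
import torch
import torch.nn.functional as F

def balanced_softmax_loss(logits, targets, class_counts):
    """Balanced Softmax: adding log class frequencies to the logits keeps head
    classes from dominating the loss on long-tailed data."""
    log_prior = torch.log(class_counts.float() / class_counts.sum())
    return F.cross_entropy(logits + log_prior, targets)

counts = torch.tensor([5000, 300, 12])        # hypothetical long-tailed class counts
logits = torch.randn(8, 3)
targets = torch.randint(0, 3, (8,))
print(balanced_softmax_loss(logits, targets, counts))
```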
- Published
- 2023
109. IRGen: Generative Modeling for Image Retrieval
- Author
-
Zhang, Yidan, Zhang, Ting, Chen, Dong, Wang, Yujing, Chen, Qi, Xie, Xing, Sun, Hao, Deng, Weiwei, Zhang, Qi, Yang, Fan, Yang, Mao, Liao, Qingmin, Wang, Jingdong, and Guo, Baining
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
While generative modeling has become prevalent across numerous research fields, its integration into the realm of image retrieval remains largely unexplored and underjustified. In this paper, we present a novel methodology, reframing image retrieval as a variant of generative modeling and employing a sequence-to-sequence model. This approach is harmoniously aligned with the current trend towards unification in research, presenting a cohesive framework that allows for end-to-end differentiable searching. This, in turn, facilitates superior performance via direct optimization techniques. The development of our model, dubbed IRGen, addresses the critical technical challenge of converting an image into a concise sequence of semantic units, which is pivotal for enabling efficient and effective search. Extensive experiments demonstrate that our model achieves state-of-the-art performance on three widely-used image retrieval benchmarks as well as two million-scale datasets, yielding significant improvement compared to prior competitive retrieval methods. In addition, the notable surge in precision scores facilitated by generative modeling presents the potential to bypass the reranking phase, which is traditionally indispensable in practical retrieval workflows., Comment: Accepted by ECCV 2024
- Published
- 2023
110. DualFair: Fair Representation Learning at Both Group and Individual Levels via Contrastive Self-supervision
- Author
-
Han, Sungwon, Lee, Seungeon, Wu, Fangzhao, Kim, Sundong, Wu, Chuhan, Wang, Xiting, Xie, Xing, and Cha, Meeyoung
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computers and Society
- Abstract
Algorithmic fairness has become an important machine learning problem, especially for mission-critical Web applications. This work presents a self-supervised model, called DualFair, that can debias sensitive attributes like gender and race from learned representations. Unlike existing models that target a single type of fairness, our model jointly optimizes for two fairness criteria - group fairness and counterfactual fairness - and hence makes fairer predictions at both the group and individual levels. Our model uses contrastive loss to generate embeddings that are indistinguishable for each protected group, while forcing the embeddings of counterfactual pairs to be similar. It then uses a self-knowledge distillation method to maintain the quality of representation for the downstream tasks. Extensive analysis over multiple datasets confirms the model's validity and further shows the synergy of jointly addressing two fairness criteria, suggesting the model's potential value in fair intelligent Web applications., Comment: Accepted and will be published at TheWebConf2023 (WWW2023)
- Published
- 2023
111. Distillation from Heterogeneous Models for Top-K Recommendation
- Author
-
Kang, SeongKu, Kweon, Wonbin, Lee, Dongha, Lian, Jianxun, Xie, Xing, and Yu, Hwanjo
- Subjects
Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
- Abstract
Recent recommender systems have shown remarkable performance by using an ensemble of heterogeneous models. However, it is exceedingly costly because it requires resources and inference latency proportional to the number of models, which remains the bottleneck for production. Our work aims to transfer the ensemble knowledge of heterogeneous teachers to a lightweight student model using knowledge distillation (KD), to reduce the huge inference costs while retaining high accuracy. Through an empirical study, we find that the efficacy of distillation severely drops when transferring knowledge from heterogeneous teachers. Nevertheless, we show that an important signal to ease the difficulty can be obtained from the teacher's training trajectory. This paper proposes a new KD framework, named HetComp, that guides the student model by transferring easy-to-hard sequences of knowledge generated from the teachers' trajectories. To provide guidance according to the student's learning state, HetComp uses dynamic knowledge construction to provide progressively difficult ranking knowledge and adaptive knowledge transfer to gradually transfer finer-grained ranking information. Our comprehensive experiments show that HetComp significantly improves the distillation quality and the generalization of the student model., Comment: TheWebConf'23
- Published
- 2023
112. Recent Advances in Triboelectric Nanogenerators: From Technological Progress to Commercial Applications.
- Author
-
Choi, Dongwhi, Lee, Younghoon, Lin, Zong-Hong, Cho, Sumin, Kim, Miso, Ao, Chi, Soh, Siowling, Sohn, Changwan, Jeong, Chang, Lee, Jeongwan, Lee, Minbaek, Lee, Seungah, Ryu, Jungho, Parashar, Parag, Cho, Yujang, Ahn, Jaewan, Kim, Il-Doo, Jiang, Feng, Lee, Pooi, Khandelwal, Gaurav, Kim, Sang-Jae, Kim, Hyun, Song, Hyun-Cheol, Kim, Minje, Nah, Junghyo, Kim, Wook, Menge, Habtamu, Park, Yong, Xu, Wei, Hao, Jianhua, Park, Hyosik, Lee, Ju-Hyuck, Lee, Dong-Min, Kim, Sang-Woo, Park, Ji, Zhang, Haixia, Zi, Yunlong, Guo, Ru, Cheng, Jia, Yang, Ze, Xie, Yannan, Lee, Sangmin, Chung, Jihoon, Oh, Il-Kwon, Kim, Ji-Seok, Cheng, Tinghai, Gao, Qi, Cheng, Gang, Gu, Guangqin, Shim, Minseob, Jung, Jeehoon, Yun, Changwoo, Zhang, Chi, Liu, Guoxu, Chen, Yufeng, Kim, Suhan, Chen, Xiangyu, Hu, Jun, Pu, Xiong, Guo, Zi, Wang, Xudong, Chen, Jun, Xiao, Xiao, Xie, Xing, Jarin, Mourin, Zhang, Hulin, Lai, Ying-Chih, He, Tianyiyi, Kim, Hakjeong, Park, Inkyu, Ahn, Junseong, Huynh, Nghia, Yang, Ya, Wang, Zhong, Baik, Jeong, and Choi, Dukhyun
- Subjects
Triboelectric nanogenerator, applications, circuits, device designs, energy harvesting, mechanical energy, mechanical systems, tribomaterials
- Abstract
Serious climate change and energy-related environmental problems are currently critical issues in the world. In order to reduce carbon emissions and save our environment, renewable energy harvesting technologies will serve as a key solution in the near future. Among them, triboelectric nanogenerators (TENGs), one of the most promising mechanical energy harvesters based on the contact electrification phenomenon, are developing explosively owing to abundant sources of wasted mechanical energy and a number of superior advantages, including a wide availability and selection of materials, relatively simple device configurations, and low-cost processing. Significant experimental and theoretical efforts have been made toward understanding fundamental behaviors and a wide range of demonstrations since the first report in 2012. As a result, considerable technological advancement has been achieved, advancing the timeline of achievements in the proposed roadmap. The technology has now reached the stage of prototype development, with performance verified beyond the lab-scale environment toward commercialization. In this review, distinguished authors from around the world worked together to summarize the state of the art in theory, materials, devices, systems, circuits, and applications in TENG fields. The great research achievements of researchers in this field around the world over the past decade are expected to play a major role in bringing unexpectedly accelerated technological advances to fruition over the next decade.
- Published
- 2023
113. FedCLIP: Fast Generalization and Personalization for CLIP in Federated Learning
- Author
-
Lu, Wang, Hu, Xixu, Wang, Jindong, and Xie, Xing
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
Federated learning (FL) has emerged as a new paradigm for privacy-preserving computation in recent years. Unfortunately, FL faces two critical challenges that hinder its actual performance: data distribution heterogeneity and the high resource costs brought by large foundation models. Specifically, the non-IID data in different clients make it hard for existing FL algorithms to converge, while the high resource costs, including computational and communication costs, increase the deployment difficulty in real-world scenarios. In this paper, we propose an effective yet simple method, named FedCLIP, to achieve fast generalization and personalization for CLIP in federated learning. Concretely, we design an attention-based adapter for the large model, CLIP, and the remaining operations merely depend on adapters. Lightweight adapters can make the most of the pretrained model's information and ensure the model is adaptive for clients on specific tasks. Simultaneously, small-scale operations can mitigate the computational and communication burden caused by large models. Extensive experiments are conducted on three datasets with distribution shifts. Qualitative and quantitative results demonstrate that FedCLIP significantly outperforms other baselines (9% overall improvement on PACS) and effectively reduces computational and communication costs (283x faster than FedAVG). Our code will be available at: https://github.com/microsoft/PersonalizedFL., Comment: Accepted by IEEE Data Engineering Bulletin; code is at: https://github.com/microsoft/PersonalizedFL
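The adapter idea can be sketched in a few lines: freeze the backbone, train a small attention-style module on top, and exchange only that module in each federated round. The module shape and dimensions below are assumptions, not FedCLIP's exact architecture (see the linked repository):

```python
import torch
import torch.nn as nn

class AttentionAdapter(nn.Module):
    """Small adapter on top of a frozen encoder; only these parameters are
    trained locally and exchanged with the server."""
    def __init__(self, dim=512):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                  nn.Linear(dim, dim), nn.Softmax(dim=-1))

    def forward(self, feat):
        return feat * self.gate(feat)      # attention-style reweighting of features

backbone = nn.Linear(768, 512)             # stand-in for CLIP's image/text encoder
for p in backbone.parameters():
    p.requires_grad_(False)                # backbone stays frozen on every client

adapter = AttentionAdapter(512)
# One federated round (sketch): each client fits `adapter` on its own data,
# then the server averages only adapter.state_dict() across clients.
features = adapter(backbone(torch.randn(4, 768)))
```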
- Published
- 2023
114. On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective
- Author
-
Wang, Jindong, Hu, Xixu, Hou, Wenxin, Chen, Hao, Zheng, Runkai, Wang, Yidong, Yang, Linyi, Huang, Haojun, Ye, Wei, Geng, Xiubo, Jiao, Binxin, Zhang, Yue, and Xie, Xing
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
ChatGPT is a recent chatbot service released by OpenAI and has received increasing attention over the past few months. While evaluations of various aspects of ChatGPT have been done, its robustness, i.e., its performance on unexpected inputs, is still unclear to the public. Robustness is of particular concern in responsible AI, especially for safety-critical applications. In this paper, we conduct a thorough evaluation of the robustness of ChatGPT from the adversarial and out-of-distribution (OOD) perspective. To do so, we employ the AdvGLUE and ANLI benchmarks to assess adversarial robustness and the Flipkart review and DDXPlus medical diagnosis datasets for OOD evaluation. We select several popular foundation models as baselines. Results show that ChatGPT has consistent advantages on most adversarial and OOD classification and translation tasks. However, the absolute performance is far from perfection, which suggests that adversarial and OOD robustness remains a significant threat to foundation models. Moreover, ChatGPT shows astounding performance in understanding dialogue-related texts, and we find that it tends to provide informal suggestions for medical tasks instead of definitive answers. Finally, we present in-depth discussions of possible research directions., Comment: Highlighted paper at ICLR 2023 workshop on Trustworthy and Reliable Large-Scale Machine Learning Models; code is at: https://github.com/microsoft/robustlearn; more works: https://llm-eval.github.io/
- Published
- 2023
115. SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning
- Author
-
Chen, Hao, Tao, Ran, Fan, Yue, Wang, Yidong, Wang, Jindong, Schiele, Bernt, Xie, Xing, Raj, Bhiksha, and Savvides, Marios
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
- Abstract
The critical challenge of Semi-Supervised Learning (SSL) is how to effectively leverage the limited labeled data and massive unlabeled data to improve the model's generalization performance. In this paper, we first revisit the popular pseudo-labeling methods via a unified sample weighting formulation and demonstrate the inherent quantity-quality trade-off problem of pseudo-labeling with thresholding, which may prohibit learning. To this end, we propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training, effectively exploiting the unlabeled data. We derive a truncated Gaussian function to weight samples based on their confidence, which can be viewed as a soft version of the confidence threshold. We further enhance the utilization of weakly-learned classes by proposing a uniform alignment approach. In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification., Comment: Accepted by ICLR 2023
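The truncated-Gaussian weighting at the heart of SoftMatch is compact enough to sketch directly; in the paper the mean and variance are running statistics of the confidence, whereas fixed values are used here purely for illustration:

```python
import torch

def soft_confidence_weight(probs, mean=0.7, var=0.01):
    """SoftMatch-style sample weights: pseudo-labels with confidence above the
    mean get full weight; lower-confidence ones are smoothly down-weighted by a
    Gaussian rather than dropped by a hard threshold."""
    conf, _ = probs.max(dim=1)
    gauss = torch.exp(-(conf - mean) ** 2 / (2.0 * var))
    return torch.where(conf >= mean, torch.ones_like(gauss), gauss)

probs = torch.softmax(torch.randn(6, 10), dim=1)   # pseudo-label distributions
print(soft_confidence_weight(probs))                # per-sample weights in (0, 1]
```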
- Published
- 2023
116. Generating Multiple-Length Summaries via Reinforcement Learning for Unsupervised Sentence Summarization
- Author
-
Hyun, Dongmin, Wang, Xiting, Park, Chanyoung, Xie, Xing, and Yu, Hwanjo
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Sentence summarization shortens given texts while maintaining their core content. Unsupervised approaches have been studied to summarize texts without human-written summaries. However, recent unsupervised models are extractive, which remove words from texts, and thus they are less flexible than abstractive summarization. In this work, we devise an abstractive model based on reinforcement learning without ground-truth summaries. We formulate unsupervised summarization based on the Markov decision process with rewards representing the summary quality. To further enhance the summary quality, we develop a multi-summary learning mechanism that generates multiple summaries with varying lengths for a given text, while making the summaries mutually enhance each other. Experimental results show that the proposed model substantially outperforms both abstractive and extractive models, while frequently generating new words not contained in the input texts., Comment: Findings of EMNLP 2022
- Published
- 2022
117. DuNST: Dual Noisy Self Training for Semi-Supervised Controllable Text Generation
- Author
-
Feng, Yuxi, Yi, Xiaoyuan, Wang, Xiting, Lakshmanan, Laks V. S., and Xie, Xing
- Subjects
Computer Science - Computation and Language
- Abstract
Self-training (ST) has prospered again in language understanding by augmenting the fine-tuning of pre-trained language models when labeled data is insufficient. However, it remains challenging to incorporate ST into attribute-controllable language generation. Augmented by only self-generated pseudo text, generation models over-emphasize exploitation of the previously learned space, suffering from a constrained generalization boundary. We revisit ST and propose a novel method, DuNST to alleviate this problem. DuNST jointly models text generation and classification with a shared Variational AutoEncoder and corrupts the generated pseudo text by two kinds of flexible noise to disturb the space. In this way, our model could construct and utilize both pseudo text from given labels and pseudo labels from available unlabeled text, which are gradually refined during the ST process. We theoretically demonstrate that DuNST can be regarded as enhancing exploration towards the potential real text space, providing a guarantee of improved performance. Experiments on three controllable generation tasks show that DuNST could significantly boost control accuracy while maintaining comparable generation fluency and diversity against several strong baselines.
- Published
- 2022
118. CDSM: Cascaded Deep Semantic Matching on Textual Graphs Leveraging Ad-hoc Neighbor Selection
- Author
-
Yao, Jing, Liu, Zheng, Yang, Junhan, Dou, Zhicheng, Xie, Xing, and Wen, Ji-Rong
- Subjects
Computer Science - Information Retrieval
- Abstract
Deep semantic matching aims to discriminate the relationship between documents based on deep neural networks. In recent years, it has become increasingly popular to organize documents with a graph structure, then leverage both the intrinsic document features and the extrinsic neighbor features to derive discrimination. Most of the existing works mainly care about how to utilize the presented neighbors, whereas limited effort is made to filter appropriate neighbors. We argue that the neighbor features could be highly noisy and partially useful. Thus, a lack of effective neighbor selection will not only incur a great deal of unnecessary computation cost, but also restrict the matching accuracy severely. In this work, we propose a novel framework, Cascaded Deep Semantic Matching (CDSM), for accurate and efficient semantic matching on textual graphs. CDSM is highlighted for its two-stage workflow. In the first stage, a lightweight CNN-based ad-hoc neighbor selector is deployed to filter useful neighbors for the matching task with a small computation cost. We design both one-step and multi-step selection methods. In the second stage, a high-capacity graph-based matching network is employed to compute fine-grained relevance scores based on the well-selected neighbors. It is worth noting that CDSM is a generic framework which accommodates most of the mainstream graph-based semantic matching networks. The major challenge is how the selector can learn to discriminate the neighbors' usefulness, which has no explicit labels. To cope with this problem, we design a weak-supervision strategy for optimization, where we train the graph-based matching network first and then learn the ad-hoc neighbor selector on top of the annotations from the matching network.
- Published
- 2022
119. Prototypical Fine-tuning: Towards Robust Performance Under Varying Data Sizes
- Author
-
Jin, Yiqiao, Wang, Xiting, Hao, Yaru, Sun, Yizhou, and Xie, Xing
- Subjects
Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
In this paper, we move towards combining large parametric models with non-parametric prototypical networks. We propose prototypical fine-tuning, a novel prototypical framework for fine-tuning pretrained language models (LM), which automatically learns a bias to improve predictive performance for varying data sizes, especially low-resource settings. Our prototypical fine-tuning approach can automatically adjust the model capacity according to the number of data points and the model's inherent attributes. Moreover, we propose four principles for effective prototype fine-tuning towards the optimal solution. Experimental results across various datasets show that our work achieves significant performance improvements under various low-resource settings, as well as comparable and usually better performances in high-resource scenarios., Comment: Published as a conference paper at AAAI 2023
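The non-parametric half of the combination is a prototypical head: class prototypes are mean embeddings, and classification is by distance. A generic sketch of that component only (the paper's full method additionally learns a bias and adapts capacity to data size):

```python
import torch

def prototype_logits(query_emb, support_emb, support_labels, num_classes):
    """Classify query embeddings by (negative) distance to per-class prototypes,
    each prototype being the mean embedding of that class's support examples."""
    prototypes = torch.stack([support_emb[support_labels == c].mean(dim=0)
                              for c in range(num_classes)])
    return -torch.cdist(query_emb, prototypes)     # higher = closer = more likely

support = torch.randn(20, 32)                      # e.g. [CLS] embeddings from an LM
labels = torch.arange(20) % 4                      # 4 classes, all represented
query = torch.randn(5, 32)
pred = prototype_logits(query, support, labels, num_classes=4).argmax(dim=1)
```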
- Published
- 2022
120. An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning
- Author
-
Chen, Hao, Fan, Yue, Wang, Yidong, Wang, Jindong, Schiele, Bernt, Xie, Xing, Savvides, Marios, and Raj, Bhiksha
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Semi-supervised learning (SSL) has shown great promise in leveraging unlabeled data to improve model performance. While standard SSL assumes uniform data distribution, we consider a more realistic and challenging setting called imbalanced SSL, where imbalanced class distributions occur in both labeled and unlabeled data. Although there are existing endeavors to tackle this challenge, their performance degenerates when facing severe imbalance since they can not reduce the class imbalance sufficiently and effectively. In this paper, we study a simple yet overlooked baseline -- SimiS -- which tackles data imbalance by simply supplementing labeled data with pseudo-labels, according to the difference in class distribution from the most frequent class. Such a simple baseline turns out to be highly effective in reducing class imbalance. It outperforms existing methods by a significant margin, e.g., 12.8%, 13.6%, and 16.7% over previous SOTA on CIFAR100-LT, FOOD101-LT, and ImageNet127 respectively. The reduced imbalance results in faster convergence and better pseudo-label accuracy of SimiS. The simplicity of our method also makes it possible to be combined with other re-balancing techniques to improve the performance further. Moreover, our method shows great robustness to a wide range of data distributions, which holds enormous potential in practice. Code will be publicly available., Comment: Issues in the paper, will re-open later
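The supplementation rule itself is one line: for each class, add enough pseudo-labeled samples to close the frequency gap to the head class. A sketch with made-up counts:

```python
import numpy as np

def supplement_quota(labeled_counts):
    """SimiS-style quota: number of pseudo-labeled samples to add per class so
    every class matches the most frequent one (illustrative sketch of the idea)."""
    counts = np.asarray(labeled_counts)
    return counts.max() - counts

print(supplement_quota([500, 120, 40, 8]))   # -> [  0 380 460 492]
```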
- Published
- 2022
121. GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-distribution Generalization Perspective
- Author
-
Yang, Linyi, Zhang, Shuibai, Qin, Libo, Li, Yafu, Wang, Yidong, Liu, Hanmeng, Wang, Jindong, Xie, Xing, and Zhang, Yue
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Performance
- Abstract
Pre-trained language models (PLMs) are known to improve the generalization performance of natural language understanding models by leveraging large amounts of data during the pre-training phase. However, the out-of-distribution (OOD) generalization problem remains a challenge in many NLP tasks, limiting the real-world deployment of these methods. This paper presents the first attempt at creating a unified benchmark named GLUE-X for evaluating OOD robustness in NLP models, highlighting the importance of OOD robustness and providing insights on how to measure the robustness of a model and how to improve it. The benchmark includes 13 publicly available datasets for OOD testing, and evaluations are conducted on 8 classic NLP tasks over 21 popularly used PLMs, including GPT-3 and GPT-3.5. Our findings confirm the need for improved OOD accuracy in NLP tasks, as significant performance degradation was observed in all settings compared to in-distribution (ID) accuracy., Comment: Accepted to ACL-23 Findings
- Published
- 2022
122. Evade the Trap of Mediocrity: Promoting Diversity and Novelty in Text Generation via Concentrating Attention
- Author
-
Li, Wenhao, Yi, Xiaoyuan, Hu, Jinyi, Sun, Maosong, and Xie, Xing
- Subjects
Computer Science - Computation and Language
- Abstract
Recently, powerful Transformer architectures have proven superior in generating high-quality sentences. Nevertheless, these models tend to produce dull high-frequency phrases, severely hurting the diversity and novelty of generated text. In this work, we dig into the intrinsic mechanism of this problem and find that sparser attention values in the Transformer could improve diversity. To understand such a phenomenon, we first conduct both empirical and theoretical analysis and then attribute it to representation degeneration caused by the attentive mixture of the hidden states during training. We term this process the Trap of Mediocrity. To escape from such a trap, we introduce a novel attention regularization loss to control the sharpness of the attention distribution, which is transparent to model structures and can be easily implemented within 20 lines of Python code. We prove that this method can be mathematically regarded as learning a Bayesian approximation of posterior attention. Experiments show that our method improves the diversity and novelty of the generated text while maintaining comparable quality on a variety of conditional and unconditional generation tasks., Comment: Accepted by EMNLP 2022 Main Conference
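A regularizer of roughly this flavor penalizes flat (high-entropy) attention rows so the model relies on sparser attention; the penalty below is an illustrative stand-in rather than the paper's exact loss, and the weight 0.1 is an assumption:

```python
import torch

def attention_flatness_penalty(attn, eps=1e-12):
    """Mean entropy of the attention rows; adding it to the task loss pushes the
    attention distribution to be sharper/sparser. attn: (..., num_keys), rows sum to 1."""
    entropy = -(attn * (attn + eps).log()).sum(dim=-1)
    return entropy.mean()

attn = torch.softmax(torch.randn(2, 8, 16, 16), dim=-1)  # (batch, heads, query, key)
task_loss = torch.tensor(0.0)                             # placeholder generation loss
total_loss = task_loss + 0.1 * attention_flatness_penalty(attn)
```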
- Published
- 2022
123. Robust Federated Learning against both Data Heterogeneity and Poisoning Attack via Aggregation Optimization
- Author
-
Xie, Yueqi, Zhang, Weizhong, Pi, Renjie, Wu, Fangzhao, Chen, Qifeng, Xie, Xing, and Kim, Sunghun
- Subjects
Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Distributed, Parallel, and Cluster Computing
- Abstract
Non-IID data distribution across clients and poisoning attacks are two main challenges in real-world federated learning (FL) systems. While both of them have attracted great research interest with specific strategies developed, no known solution manages to address them in a unified framework. To universally overcome both challenges, we propose SmartFL, a generic approach that optimizes the server-side aggregation process with a small amount of proxy data collected by the service provider itself via a subspace training technique. Specifically, the aggregation weight of each participating client at each round is optimized using the server-collected proxy data, which is essentially the optimization of the global model in the convex hull spanned by client models. Since at each round, the number of tunable parameters optimized on the server side equals the number of participating clients (thus independent of the model size), we are able to train a global model with massive parameters using only a small amount of proxy data (e.g., around one hundred samples). With optimized aggregation, SmartFL ensures robustness against both heterogeneous and malicious clients, which is desirable in real-world FL where either or both problems may occur. We provide theoretical analyses of the convergence and generalization capacity for SmartFL. Empirically, SmartFL achieves state-of-the-art performance on both FL with non-IID data distribution and FL with malicious clients. The source code will be released.
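The server-side optimization can be pictured as learning one weight per client on a small proxy set, with the global model constrained to the convex hull of client models. A toy sketch with a stand-in model and invented data, not the SmartFL code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call

model = nn.Linear(10, 3)                                   # stand-in global model
clients = [{k: v + 0.05 * torch.randn_like(v) for k, v in model.state_dict().items()}
           for _ in range(5)]                              # pretend client updates
w = torch.zeros(len(clients), requires_grad=True)          # one logit per client
opt = torch.optim.Adam([w], lr=0.1)
proxy_x, proxy_y = torch.randn(100, 10), torch.randint(0, 3, (100,))  # tiny proxy set

for _ in range(100):
    alpha = torch.softmax(w, dim=0)                        # convex combination weights
    merged = {k: sum(alpha[i] * clients[i][k] for i in range(len(clients)))
              for k in clients[0]}
    loss = F.cross_entropy(functional_call(model, merged, (proxy_x,)), proxy_y)
    opt.zero_grad(); loss.backward(); opt.step()
```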
- Published
- 2022
124. FIXED: Frustratingly Easy Domain Generalization with Mixup
- Author
-
Lu, Wang, Wang, Jindong, Yu, Han, Huang, Lei, Zhang, Xiang, Chen, Yiqiang, and Xie, Xing
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Domain generalization (DG) aims to learn a generalizable model from multiple training domains such that it can perform well on unseen target domains. A popular strategy is to augment training data to benefit generalization through methods such as Mixup. While vanilla Mixup can be directly applied, theoretical and empirical investigations uncover several shortcomings that limit its performance. Firstly, Mixup cannot effectively identify the domain and class information that can be used for learning invariant representations. Secondly, Mixup may introduce synthetic noisy data points via random interpolation, which lowers its discrimination capability. Based on the analysis, we propose a simple yet effective enhancement for Mixup-based DG, namely domain-invariant Feature mIXup (FIX). It learns domain-invariant representations for Mixup. To further enhance discrimination, we leverage existing techniques to enlarge margins among classes and propose the domain-invariant Feature MIXup with Enhanced Discrimination (FIXED) approach. We present theoretical insights about guarantees on its effectiveness. Extensive experiments on seven public datasets across two modalities, including image classification (Digits-DG, PACS, Office-Home) and time series (DSADS, PAMAP2, UCI-HAR, and USC-HAD), demonstrate that our approach significantly outperforms nine state-of-the-art related methods, beating the best performing baseline by 6.5% on average in terms of test accuracy., Comment: Technical report; code for DG at: https://github.com/jindongwang/transferlearning/tree/master/code/DeepDG
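Plain feature-level Mixup, the starting point that FIXED builds on, looks roughly like this (FIXED additionally learns domain-invariant features and enlarges class margins; see the linked code); the shapes and alpha are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def feature_mixup(features, labels, num_classes, alpha=0.2):
    """Mixup applied to intermediate features instead of raw inputs: convexly
    combine feature/label pairs with a Beta-sampled coefficient."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    idx = torch.randperm(features.size(0))
    y = F.one_hot(labels, num_classes).float()
    mixed_x = lam * features + (1 - lam) * features[idx]
    mixed_y = lam * y + (1 - lam) * y[idx]
    return mixed_x, mixed_y

feats = torch.randn(16, 128)                       # features from a shared encoder
labels = torch.randint(0, 5, (16,))
mx, my = feature_mixup(feats, labels, num_classes=5)
```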
- Published
- 2022
125. Learning on Large-scale Text-attributed Graphs via Variational Inference
- Author
-
Zhao, Jianan, Qu, Meng, Li, Chaozhuo, Yan, Hao, Liu, Qian, Li, Rui, Xie, Xing, and Tang, Jian
- Subjects
Computer Science - Machine Learning
- Abstract
This paper studies learning on text-attributed graphs (TAGs), where each node is associated with a text description. An ideal solution for such a problem would be integrating both the text and graph structure information with large language models and graph neural networks (GNNs). However, the problem becomes very challenging when graphs are large due to the high computational complexity brought by training large language models and GNNs together. In this paper, we propose an efficient and effective solution to learning on large text-attributed graphs by fusing graph structure and language learning with a variational Expectation-Maximization (EM) framework, called GLEM. Instead of simultaneously training large language models and GNNs on big graphs, GLEM proposes to alternately update the two modules in the E-step and M-step. Such a procedure allows the two modules to be trained separately while still interacting and mutually enhancing each other. Extensive experiments on multiple data sets demonstrate the efficiency and effectiveness of the proposed approach., Comment: ICLR 2023
- Published
- 2022
126. Recurrence Boosts Diversity! Revisiting Recurrent Latent Variable in Transformer-Based Variational AutoEncoder for Diverse Text Generation
- Author
-
Hu, Jinyi, Yi, Xiaoyuan, Li, Wenhao, Sun, Maosong, and Xie, Xing
- Subjects
Computer Science - Computation and Language
- Abstract
Variational Auto-Encoder (VAE) has been widely adopted in text generation. Among many variants, recurrent VAE learns token-wise latent variables with each conditioned on the preceding ones, which captures sequential variability better in the era of RNN. However, it is unclear how to incorporate such recurrent dynamics into the recently dominant Transformer due to its parallelism. In this work, we propose TRACE, a Transformer-based recurrent VAE structure. TRACE imposes recurrence on segment-wise latent variables with arbitrarily separated text segments and constructs the posterior distribution with residual parameterization. Besides, we design an acceleration method by approximating idempotent matrices, which allows parallelism while maintaining the conditional dependence of latent variables. We demonstrate that TRACE could enhance the entanglement of each segment and preceding latent variables and deduce a non-zero lower bound of the KL term, providing a theoretical guarantee of generation diversity. Experiments on two unconditional and one conditional generation tasks show that TRACE achieves significantly improved diversity while maintaining satisfactory generation quality., Comment: EMNLP 2022 Findings
- Published
- 2022
127. RAPO: An Adaptive Ranking Paradigm for Bilingual Lexicon Induction
- Author
-
Tian, Zhoujin, Li, Chaozhuo, Ren, Shuo, Zuo, Zhiqiang, Wen, Zengxuan, Hu, Xinyue, Han, Xiao, Huang, Haizhen, Deng, Denvy, Zhang, Qi, and Xie, Xing
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Bilingual lexicon induction induces word translations by aligning independently trained word embeddings in two languages. Existing approaches generally focus on minimizing the distances between words in the aligned pairs, while suffering from low discriminative capability to distinguish the relative orders between positive and negative candidates. In addition, the mapping function is globally shared by all words, whose performance might be hindered by the deviations in the distributions of different languages. In this work, we propose a novel ranking-oriented induction model, RAPO, to learn a personalized mapping function for each word. RAPO is capable of enjoying the merits of the unique characteristics of a single word and the cross-language isomorphism simultaneously. Extensive experimental results on public datasets including both rich-resource and low-resource languages demonstrate the superiority of our proposal. Our code is publicly available at https://github.com/Jlfj345wf/RAPO., Comment: 9 pages, accepted by EMNLP 2022
- Published
- 2022
128. Test-Time Training for Graph Neural Networks
- Author
-
Wang, Yiqi, Li, Chaozhuo, Jin, Wei, Li, Rui, Zhao, Jianan, Tang, Jiliang, and Xie, Xing
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
Graph Neural Networks (GNNs) have made tremendous progress in the graph classification task. However, a performance gap between the training set and the test set has often been noticed. To bridge this gap, in this work we introduce the first test-time training framework for GNNs to enhance the model generalization capacity for the graph classification task. In particular, we design a novel test-time training strategy with self-supervised learning to adjust the GNN model for each test graph sample. Experiments on the benchmark datasets have demonstrated the effectiveness of the proposed framework, especially when there are distribution shifts between the training set and the test set. We have also conducted exploratory studies and theoretical analysis to gain a deeper understanding of the rationality of the design of the proposed graph test-time training framework (GT3).
- Published
- 2022
129. Effective and Efficient Query-aware Snippet Extraction for Web Search
- Author
-
Yi, Jingwei, Wu, Fangzhao, Wu, Chuhan, Huang, Xiaolong, Jiao, Binxing, Sun, Guangzhong, and Xie, Xing
- Subjects
Computer Science - Artificial Intelligence
- Abstract
Query-aware webpage snippet extraction is widely used in search engines to help users better understand the content of the returned webpages before clicking. Although important, it is very rarely studied. In this paper, we propose an effective query-aware webpage snippet extraction method named DeepQSE, aiming to select a few sentences which can best summarize the webpage content in the context of input query. DeepQSE first learns query-aware sentence representations for each sentence to capture the fine-grained relevance between query and sentence, and then learns document-aware query-sentence relevance representations for snippet extraction. Since the query and each sentence are jointly modeled in DeepQSE, its online inference may be slow. Thus, we further propose an efficient version of DeepQSE, named Efficient-DeepQSE, which can significantly improve the inference speed of DeepQSE without affecting its performance. The core idea of Efficient-DeepQSE is to decompose the query-aware snippet extraction task into two stages, i.e., a coarse-grained candidate sentence selection stage where sentence representations can be cached, and a fine-grained relevance modeling stage. Experiments on two real-world datasets validate the effectiveness and efficiency of our methods., Comment: Accepted by EMNLP2022
- Published
- 2022
130. Self-explaining deep models with logic rule reasoning
- Author
-
Lee, Seungeon, Wang, Xiting, Han, Sungwon, Yi, Xiaoyuan, Xie, Xing, and Cha, Meeyoung
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Logic in Computer Science
- Abstract
We present SELOR, a framework for integrating self-explaining capabilities into a given deep model to achieve both high prediction performance and human precision. By "human precision", we refer to the degree to which humans agree with the reasons models provide for their predictions. Human precision affects user trust and allows users to collaborate closely with the model. We demonstrate that logic rule explanations naturally satisfy human precision with the expressive power required for good predictive performance. We then illustrate how to enable a deep model to predict and explain with logic rules. Our method does not require predefined logic rule sets or human annotations and can be learned efficiently and easily with widely-used deep learning modules in a differentiable way. Extensive experiments show that our method gives explanations closer to human decision logic than other methods while maintaining the performance of deep learning models., Comment: 26 pages including reference, checklist, and appendix. Accepted in NeurIPS 2022
- Published
- 2022
131. Unified Detoxifying and Debiasing in Language Generation via Inference-time Adaptive Optimization
- Author
-
Yang, Zonghan, Yi, Xiaoyuan, Li, Peng, Liu, Yang, and Xie, Xing
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Warning: this paper contains model outputs exhibiting offensiveness and biases. Recently pre-trained language models (PLMs) have prospered in various natural language generation (NLG) tasks due to their ability to generate fairly fluent text. Nevertheless, these models are observed to capture and reproduce harmful contents in training corpora, typically toxic language and social biases, raising severe moral issues. Prior works on ethical NLG tackle detoxifying and debiasing separately, which is problematic since we find debiased models still exhibit toxicity while detoxified ones even exacerbate social biases. To address such a challenge, we propose the first unified framework of detoxifying and debiasing called UDDIA, which jointly formalizes these two problems as rectifying the output space. We theoretically interpret our framework as learning a text distribution mixing weighted attributes. Besides, UDDIA conducts adaptive optimization of only a few parameters during decoding based on a parameter-efficient tuning schema without any training data. This leads to minimal generation quality loss and improved rectification performance with acceptable computational cost. Experimental results demonstrate that compared to several strong baselines, UDDIA achieves debiasing and detoxifying simultaneously and better balances efficiency and effectiveness, taking a further step towards practical ethical NLG., Comment: Accepted at ICLR 2023
- Published
- 2022
132. Out-of-Distribution Representation Learning for Time Series Classification
- Author
-
Lu, Wang, Wang, Jindong, Sun, Xinwei, Chen, Yiqiang, and Xie, Xing
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
Time series classification is an important problem in the real world. Due to its non-stationary property, i.e., the distribution changes over time, it remains challenging to build models that generalize to unseen distributions. In this paper, we propose to view the time series classification problem from the distribution perspective. We argue that the temporal complexity is attributable to the unknown latent distributions within. To this end, we propose DIVERSIFY to learn generalized representations for time series classification. DIVERSIFY takes an iterative process: it first obtains the worst-case distribution scenario via adversarial training, then matches the distributions of the obtained sub-domains. We also present some theoretical insights. We conduct experiments on gesture recognition, speech commands recognition, wearable stress and affect detection, and sensor-based human activity recognition with a total of seven datasets in different settings. Results demonstrate that DIVERSIFY significantly outperforms other baselines and effectively characterizes the latent distributions by qualitative and quantitative analysis. Code is available at: https://github.com/microsoft/robustlearn., Comment: ICLR 2023 camera-ready version; code is at: https://github.com/microsoft/robustlearn
- Published
- 2022
133. Interface characteristics and mechanical properties of titanium/aluminum composites with an interlayer fabricated by explosive welding
- Author
-
Yuan, Jia-xin, Shao, Fei, Bai, Lin-yue, Zhang, Hong-wei, Xu, Qian, Gao, Lei, Xie, Xing-kun, and Pan, Yu
- Published
- 2024
- Full Text
- View/download PDF
134. Exploring Vision-Language Models for Imbalanced Learning
- Author
-
Wang, Yidong, Yu, Zhuohao, Wang, Jindong, Heng, Qiang, Chen, Hao, Ye, Wei, Xie, Rui, Xie, Xing, and Zhang, Shikun
- Published
- 2024
- Full Text
- View/download PDF
135. Seismic response characteristics and performance improvement of near-fault continuous rigid-frame bridges
- Author
-
Li, Jun, Xu, Long-He, and Xie, Xing-Si
- Published
- 2024
- Full Text
- View/download PDF
136. Defending ChatGPT against jailbreak attack via self-reminders
- Author
-
Xie, Yueqi, Yi, Jingwei, Shao, Jiawei, Curl, Justin, Lyu, Lingjuan, Chen, Qifeng, Xie, Xing, and Wu, Fangzhao
- Published
- 2023
- Full Text
- View/download PDF
137. Avulsion fracture is associated with more pain after anatomic repair procedure for ATFL injury at the talar side
- Author
-
Xiong, Shikai, Xie, Xing, Shi, Weili, Yang, Shuai, Zhang, Keying, Pi, Yanbin, Chen, Linxin, Jiang, Dong, Hu, Yuelin, Jiao, Chen, and Guo, Qinwei
- Published
- 2023
- Full Text
- View/download PDF
138. Towards Optimization and Model Selection for Domain Generalization: A Mixup-guided Solution
- Author
-
Lu, Wang, Wang, Jindong, Wang, Yidong, and Xie, Xing
- Subjects
Computer Science - Machine Learning
- Abstract
The distribution shifts between training and test data typically undermine the performance of models. In recent years, much work has paid attention to domain generalization (DG), where distribution shifts exist and target data are unseen. Despite the progress in algorithm design, two foundational factors have long been ignored: 1) the optimization of regularization-based objectives, and 2) the model selection for DG, since no knowledge about the target domain can be utilized. In this paper, we propose Mixup-guided optimization and selection techniques for DG. For optimization, we utilize an adapted Mixup to generate an out-of-distribution dataset that can guide the preference direction, and optimize with Pareto optimization. For model selection, we generate a validation dataset with a closer distance to the target distribution, so that it can better represent the target data. We also present some theoretical insights behind our proposals. Comprehensive experiments demonstrate that our model optimization and selection techniques can largely improve the performance of existing domain generalization algorithms and even achieve new state-of-the-art results., Comment: Accepted by SIAM International Conference on Data Mining (SDM) 2024
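The model-selection idea, building a validation set that sits closer to the target distribution by mixing source domains, can be sketched as follows; candidate models would then be ranked by their loss on this synthetic set. The mixing scheme and names here are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def mixup_validation_set(domain_a, domain_b, size=256, alpha=0.5):
    """Build a synthetic validation set by mixing samples from two source domains,
    assuming such mixtures lie closer to the unseen target distribution than
    either source alone (sketch of the selection idea only)."""
    xa, ya = domain_a
    xb, yb = domain_b
    ia = np.random.randint(0, len(xa), size)
    ib = np.random.randint(0, len(xb), size)
    lam = np.random.beta(alpha, alpha, size=(size, 1))
    x_val = lam * xa[ia] + (1 - lam) * xb[ib]
    y_val = lam * ya[ia] + (1 - lam) * yb[ib]      # soft labels for evaluation
    return x_val, y_val

xa, ya = np.random.randn(1000, 32), np.eye(7)[np.random.randint(0, 7, 1000)]
xb, yb = np.random.randn(800, 32), np.eye(7)[np.random.randint(0, 7, 800)]
x_val, y_val = mixup_validation_set((xa, ya), (xb, yb))
# candidate DG models would then be compared by their soft-label loss on (x_val, y_val)
```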
- Published
- 2022
139. Domain-Specific Risk Minimization for Out-of-Distribution Generalization
- Author
-
Zhang, Yi-Fan, Wang, Jindong, Liang, Jian, Zhang, Zhang, Yu, Baosheng, Wang, Liang, Tao, Dacheng, and Xie, Xing
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
Recent domain generalization (DG) approaches typically use the hypothesis learned on source domains for inference on the unseen target domain. However, such a hypothesis can be arbitrarily far from the optimal one for the target domain, induced by a gap termed the "adaptivity gap". Without exploiting the domain information from the unseen test samples, adaptivity gap estimation and minimization are intractable, which hinders us from robustifying a model to any unknown distribution. In this paper, we first establish a generalization bound that explicitly considers the adaptivity gap. Our bound motivates two strategies to reduce the gap: the first is ensembling multiple classifiers to enrich the hypothesis space, for which we propose effective gap estimation methods to guide the selection of a better hypothesis for the target. The other method is minimizing the gap directly by adapting model parameters using online target samples. We thus propose Domain-specific Risk Minimization (DRM). During training, DRM models the distributions of different source domains separately; for inference, DRM performs online model steering using the source hypothesis for each arriving target sample. Extensive experiments demonstrate the effectiveness of the proposed DRM for domain generalization with the following advantages: 1) it significantly outperforms competitive baselines on different distributional shift settings; 2) it achieves either comparable or superior accuracies on all source domains compared to vanilla empirical risk minimization; 3) it remains simple and efficient during training; and 4) it is complementary to invariant learning approaches., Comment: 9 pages for the main paper, 2 pages for the appendix, published in SIGKDD 2023
- Published
- 2022
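A rough sketch of the per-domain-hypothesis idea in entry 139, assuming PyTorch: one classifier head is kept per source domain, and at inference the most confident head (lowest prediction entropy) answers for each test sample. The entropy-based steering rule below is an illustrative stand-in for DRM's gap-estimation-based selection, not the paper's exact mechanism.

```python
import torch
import torch.nn as nn

class PerDomainHeads(nn.Module):
    def __init__(self, backbone, feat_dim, n_classes, n_domains):
        super().__init__()
        self.backbone = backbone
        self.heads = nn.ModuleList([nn.Linear(feat_dim, n_classes) for _ in range(n_domains)])

    def forward(self, x, domain=None):
        z = self.backbone(x)
        if domain is not None:                  # training: use the sample's own domain head
            return self.heads[domain](z)
        # inference: steer to the head whose prediction is most confident
        logits = torch.stack([h(z) for h in self.heads], dim=1)    # (B, D, C)
        probs = logits.softmax(-1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1)   # (B, D)
        best = entropy.argmin(dim=1)                               # (B,)
        return logits[torch.arange(x.size(0)), best]
```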
140. USB: A Unified Semi-supervised Learning Benchmark for Classification
- Author
-
Wang, Yidong, Chen, Hao, Fan, Yue, Sun, Wang, Tao, Ran, Hou, Wenxin, Wang, Renjie, Yang, Linyi, Zhou, Zhi, Guo, Lan-Zhe, Qi, Heli, Wu, Zhen, Li, Yu-Feng, Nakamura, Satoshi, Ye, Wei, Savvides, Marios, Raj, Bhiksha, Shinozaki, Takahiro, Schiele, Bernt, Wang, Jindong, Xie, Xing, and Zhang, Yue
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Semi-supervised learning (SSL) improves model generalization by leveraging massive unlabeled data to augment limited labeled samples. However, popular SSL evaluation protocols are currently often constrained to computer vision (CV) tasks. In addition, previous work typically trains deep neural networks from scratch, which is time-consuming and environmentally unfriendly. To address these issues, we construct a Unified SSL Benchmark (USB) for classification by selecting 15 diverse, challenging, and comprehensive tasks from CV, natural language processing (NLP), and audio processing (Audio), on which we systematically evaluate the dominant SSL methods, and we also open-source a modular and extensible codebase for fair evaluation of these SSL methods. We further provide pre-trained versions of state-of-the-art neural models for CV tasks to make the cost of further tuning affordable. USB enables the evaluation of a single SSL algorithm on more tasks from multiple domains at lower cost. Specifically, on a single NVIDIA V100, only 39 GPU days are required to evaluate FixMatch on the 15 tasks in USB, while 335 GPU days (279 GPU days on 4 CV datasets excluding ImageNet) are needed on 5 CV tasks with TorchSSL., Comment: Accepted by NeurIPS'22 dataset and benchmark track; code at https://github.com/microsoft/Semi-supervised-learning
- Published
- 2022
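For context on the FixMatch runs that entry 140 budgets in GPU days, here is a minimal FixMatch-style unlabeled loss in plain PyTorch: confident predictions on weakly augmented views become pseudo-labels for strongly augmented views. This is a generic sketch of the algorithm, not the USB codebase's API.

```python
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, x_weak, x_strong, threshold=0.95):
    """Pseudo-label weakly augmented views, train on strongly augmented ones."""
    with torch.no_grad():
        probs = model(x_weak).softmax(dim=-1)
        conf, pseudo = probs.max(dim=-1)
        mask = (conf >= threshold).float()   # keep only confident pseudo-labels
    logits_s = model(x_strong)
    loss = (F.cross_entropy(logits_s, pseudo, reduction="none") * mask).mean()
    return loss
```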
141. Equivariant Disentangled Transformation for Domain Generalization under Combination Shift
- Author
-
Zhang, Yivan, Wang, Jindong, Xie, Xing, and Sugiyama, Masashi
- Subjects
Computer Science - Machine Learning - Abstract
Machine learning systems may encounter unexpected problems when the data distribution changes in the deployment environment. A major reason is that certain combinations of domains and labels are not observed during training but appear in the test environment. Although various invariance-based algorithms can be applied, we find that the performance gain is often marginal. To formally analyze this issue, we provide a unique algebraic formulation of the combination shift problem based on the concepts of homomorphism, equivariance, and a refined definition of disentanglement. The algebraic requirements naturally derive a simple yet effective method, referred to as equivariant disentangled transformation (EDT), which augments the data based on the algebraic structures of labels and makes the transformation satisfy the equivariance and disentanglement requirements. Experimental results demonstrate that invariance may be insufficient, and it is important to exploit the equivariance structure in the combination shift problem.
- Published
- 2022
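A toy illustration of the kind of combination-shift augmentation that entry 141's algebraic formulation motivates: recombining factorized label (content) and domain (style) codes so that unseen (domain, label) combinations appear during training. The factorization and the helper below are assumptions made for illustration, not the paper's equivariant disentangled transformation.

```python
import torch

def recombine(content_codes, style_codes, labels, domains):
    """Pair every content (label) code with every style (domain) code."""
    n_c, n_s = content_codes.size(0), style_codes.size(0)
    c = content_codes.unsqueeze(1).expand(n_c, n_s, -1)
    s = style_codes.unsqueeze(0).expand(n_c, n_s, -1)
    x_aug = torch.cat([c, s], dim=-1).reshape(n_c * n_s, -1)  # all combinations
    y_aug = labels.repeat_interleave(n_s)                     # label follows content
    d_aug = domains.repeat(n_c)                               # domain follows style
    return x_aug, y_aug, d_aug
```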
142. Geometric Interaction Augmented Graph Collaborative Filtering
- Author
-
Zhang, Yiding, Li, Chaozhuo, Wang, Senzhang, Lian, Jianxun, and Xie, Xing
- Subjects
Computer Science - Information Retrieval - Abstract
Graph-based collaborative filtering is capable of capturing essential and abundant collaborative signals from high-order interactions, and has thus received increasing research interest. Conventionally, the embeddings of users and items are defined in Euclidean spaces, along with the propagation on the interaction graphs. Meanwhile, recent works point out that high-order interactions naturally form tree-like structures, on which hyperbolic models thrive. However, interaction graphs inherently exhibit hybrid and nested geometric characteristics, and existing single-geometry models are inadequate to fully capture such sophisticated topological patterns. In this paper, we propose to model the user-item interactions in a hybrid geometric space, in which the merits of Euclidean and hyperbolic spaces are simultaneously enjoyed to learn expressive representations. Experimental results on public datasets validate the effectiveness of our proposal.
- Published
- 2022
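A small sketch of scoring user-item pairs in a hybrid geometric space, as entry 142 proposes: a Euclidean inner-product score is fused with a (negated) Poincare-ball distance. The distance is the standard curvature -1 closed form; the fusion weight and the norm clamping are illustrative assumptions, not the paper's design.

```python
import torch

def poincare_distance(u, v, eps=1e-5):
    """Distance on the Poincare ball (curvature -1)."""
    u2 = (u * u).sum(-1).clamp(max=1 - eps)
    v2 = (v * v).sum(-1).clamp(max=1 - eps)
    diff2 = ((u - v) ** 2).sum(-1)
    x = 1 + 2 * diff2 / ((1 - u2) * (1 - v2))
    return torch.acosh(x.clamp(min=1 + eps))

def hybrid_score(user_euc, item_euc, user_hyp, item_hyp, w=0.5):
    euclid = (user_euc * item_euc).sum(-1)            # higher = more relevant
    hyper = -poincare_distance(user_hyp, item_hyp)    # closer = more relevant
    return w * euclid + (1 - w) * hyper
```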
143. Domain-invariant Feature Exploration for Domain Generalization
- Author
-
Lu, Wang, Wang, Jindong, Li, Haoliang, Chen, Yiqiang, and Xie, Xing
- Subjects
Computer Science - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Deep learning has achieved great success in the past few years. However, its performance is likely to degrade in the face of non-IID situations. Domain generalization (DG) enables a model to generalize to an unseen test distribution, i.e., to learn domain-invariant representations. In this paper, we argue that domain-invariant features should originate from both internal and mutual sides. Internal invariance means that the features can be learned within a single domain and capture the intrinsic semantics of the data, i.e., properties within a domain that are agnostic to other domains. Mutual invariance means that the features can be learned across multiple domains (cross-domain) and contain common information, i.e., transferable features w.r.t. other domains. We then propose DIFEX for Domain-Invariant Feature EXploration. DIFEX employs a knowledge distillation framework to capture the high-level Fourier phase as the internally-invariant features and learns cross-domain correlation alignment as the mutually-invariant features. We further design an exploration loss to increase the feature diversity for better generalization. Extensive experiments on both time-series and visual benchmarks demonstrate that the proposed DIFEX achieves state-of-the-art performance., Comment: Accepted by Transactions on Machine Learning Research (TMLR) 2022; 20 pages; code: https://github.com/jindongwang/transferlearning/tree/master/code/DeepDG
- Published
- 2022
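A minimal sketch of the two ingredients named in entry 143, assuming PyTorch: the Fourier phase of an input (the internally-invariant signal) and a CORAL-style correlation alignment loss between two domains' features (the mutually-invariant signal). Normalization and loss weighting are assumptions here, not DIFEX's exact implementation.

```python
import torch

def fourier_phase(x):
    """Phase spectrum of a batch of images (B, C, H, W)."""
    freq = torch.fft.fft2(x, dim=(-2, -1))
    return torch.angle(freq)

def coral_loss(feat_a, feat_b):
    """Align second-order statistics of two feature batches (B, D)."""
    def cov(f):
        f = f - f.mean(dim=0, keepdim=True)
        return f.t() @ f / (f.size(0) - 1)
    d = feat_a.size(1)
    return ((cov(feat_a) - cov(feat_b)) ** 2).sum() / (4 * d * d)
```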
144. FedX: Unsupervised Federated Learning with Cross Knowledge Distillation
- Author
-
Han, Sungwon, Park, Sungwon, Wu, Fangzhao, Kim, Sundong, Wu, Chuhan, Xie, Xing, and Cha, Meeyoung
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
This paper presents FedX, an unsupervised federated learning framework. Our model learns unbiased representation from decentralized and heterogeneous local data. It employs a two-sided knowledge distillation with contrastive learning as a core component, allowing the federated system to function without requiring clients to share any data features. Furthermore, its adaptable architecture can be used as an add-on module for existing unsupervised algorithms in federated settings. Experiments show that our model improves performance significantly (1.58--5.52pp) on five unsupervised algorithms., Comment: Accepted and will be published at ECCV2022
- Published
- 2022
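A rough sketch of the contrastive and relational-distillation flavour of entry 144, assuming PyTorch: an InfoNCE term between two augmented views on the client, plus a term matching the local model's in-batch similarity distribution to the global model's, so no raw data or features need to leave the client. Both losses are generic stand-ins, not FedX's exact two-sided objective.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.1):
    """Contrastive loss between two augmented views of the same batch."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                          # (B, B) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

def relation_distill(z_local, z_global, tau=0.1):
    """KL between local and global in-batch similarity distributions."""
    s_l = F.normalize(z_local, dim=-1) @ F.normalize(z_local, dim=-1).t() / tau
    s_g = F.normalize(z_global, dim=-1) @ F.normalize(z_global, dim=-1).t() / tau
    return F.kl_div(F.log_softmax(s_l, dim=-1), F.softmax(s_g, dim=-1), reduction="batchmean")
```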
145. Fuse It More Deeply! A Variational Transformer with Layer-Wise Latent Variable Inference for Text Generation
- Author
-
Hu, Jinyi, Yi, Xiaoyuan, Li, Wenhao, Sun, Maosong, and Xie, Xing
- Subjects
Computer Science - Computation and Language - Abstract
The past several years have witnessed the superiority of the Variational Auto-Encoder (VAE) in various text generation tasks. However, due to the sequential nature of text, auto-regressive decoders tend to ignore latent variables and reduce to simple language models, known as the KL vanishing problem, which deteriorates further when the VAE is combined with Transformer-based structures. To ameliorate this problem, we propose DELLA, a novel variational Transformer framework. DELLA learns a series of layer-wise latent variables, each inferred from those of lower layers and tightly coupled with the hidden states by a low-rank tensor product. In this way, DELLA forces these posterior latent variables to be fused deeply along the whole computation path and hence to incorporate more information. We theoretically demonstrate that our method can be regarded as entangling latent variables to avoid posterior information decrease through layers, enabling DELLA to obtain higher non-zero KL values even without any annealing or thresholding tricks. Experiments on four unconditional and three conditional generation tasks show that DELLA better alleviates KL vanishing and improves both quality and diversity compared to several strong baselines., Comment: NAACL 2022
- Published
- 2022
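A simplified sketch of one layer's latent variable in the spirit of entry 145, assuming PyTorch: the latent is inferred from the layer's hidden state together with the latent from the layer below, reparameterized, and fused back into the hidden state through a low-rank factorized product. The standard-normal KL and the exact fusion form are illustrative simplifications, not DELLA's implementation.

```python
import torch
import torch.nn as nn

class LayerLatent(nn.Module):
    def __init__(self, hidden, z_dim, rank=16):
        super().__init__()
        self.to_mu = nn.Linear(hidden + z_dim, z_dim)
        self.to_logvar = nn.Linear(hidden + z_dim, z_dim)
        self.u = nn.Linear(hidden, rank, bias=False)   # low-rank fusion factors
        self.v = nn.Linear(z_dim, rank, bias=False)
        self.out = nn.Linear(rank, hidden)

    def forward(self, h, z_prev):
        mu = self.to_mu(torch.cat([h, z_prev], dim=-1))
        logvar = self.to_logvar(torch.cat([h, z_prev], dim=-1))
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()        # reparameterize
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1 - logvar).sum(-1)  # KL to N(0, I)
        h_fused = h + self.out(self.u(h) * self.v(z))               # low-rank coupling
        return h_fused, z, kl
```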
146. Cooperative Retriever and Ranker in Deep Recommenders
- Author
-
Huang, Xu, Lian, Defu, Chen, Jin, Liu, Zheng, Xie, Xing, and Chen, Enhong
- Subjects
Computer Science - Information Retrieval ,Computer Science - Machine Learning - Abstract
Deep recommender systems (DRS) are intensively applied in modern web services. To deal with massive web content, a DRS employs a two-stage workflow, retrieval and ranking, to generate its recommendation results. The retriever aims to select a small set of relevant candidates from the entire item pool with high efficiency, while the ranker, usually more precise but time-consuming, is supposed to further refine the best items from the retrieved candidates. Traditionally, the two components are trained either independently or within a simple cascading pipeline, which is prone to poor collaboration. Although some recent works suggest training the retriever and ranker jointly, severe limitations still remain: item distribution shift between training and inference, false negatives, and misalignment of ranking order. As such, effective collaboration between retriever and ranker remains to be explored., Comment: 12 pages, 4 figures, WWW'23
- Published
- 2022
- Full Text
- View/download PDF
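A minimal sketch of the two-stage workflow entry 146 describes (not its joint training scheme), assuming PyTorch: a cheap dot-product retriever shortlists candidates from the full item table, and a heavier ranker reorders the shortlist. Shapes and the ranker interface are assumptions.

```python
import torch

def recommend(user_vec, item_table, ranker, k_retrieve=100, k_final=10):
    """user_vec: (D,), item_table: (N, D), ranker: scores a batch of (user, item) pairs."""
    scores = item_table @ user_vec                        # fast first-stage scores
    cand = scores.topk(k_retrieve).indices                # shortlist of candidates
    users = user_vec.unsqueeze(0).expand(cand.size(0), -1)
    fine = ranker(users, item_table[cand])                # precise second-stage scores
    order = fine.topk(k_final).indices
    return cand[order]
```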
147. Two-Stage Neural Contextual Bandits for Personalised News Recommendation
- Author
-
Zhang, Mengyan, Nguyen-Tang, Thanh, Wu, Fangzhao, He, Zhenyu, Xie, Xing, and Ong, Cheng Soon
- Subjects
Computer Science - Information Retrieval ,Computer Science - Machine Learning - Abstract
We consider the problem of personalised news recommendation where each user consumes news in a sequential fashion. Existing personalised news recommendation methods focus on exploiting user interests and ignore exploration in recommendation, which leads to biased feedback loops and hurts recommendation quality in the long term. We build on contextual bandit recommendation strategies, which naturally address the exploitation-exploration trade-off. The main challenges are computational efficiency for exploring the large-scale item space and utilising deep representations with uncertainty. We propose a two-stage hierarchical topic-news deep contextual bandits framework to efficiently learn user preferences when there are many news items. We use deep learning representations for users and news, and generalise the neural upper confidence bound (UCB) policies to generalised additive UCB and bilinear UCB. Empirical results on a large-scale news recommendation dataset show that our proposed policies are efficient and outperform the baseline bandit policies.
- Published
- 2022
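To ground the UCB family that entry 147 builds on, here is a plain LinUCB-style scorer over fixed (e.g., deep) item representations in NumPy: the score is the estimated reward plus an uncertainty bonus. The paper's generalised additive and bilinear UCB policies refine this basic form; the sketch below is a generic textbook version, not the paper's.

```python
import numpy as np

class LinUCB:
    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)        # running design matrix
        self.b = np.zeros(dim)      # running reward-weighted features
        self.alpha = alpha          # exploration strength

    def score(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return x @ theta + self.alpha * np.sqrt(x @ A_inv @ x)  # mean + exploration bonus

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x
```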
148. Efficiently Leveraging Multi-level User Intent for Session-based Recommendation via Atten-Mixer Network
- Author
-
Zhang, Peiyan, Guo, Jiayan, Li, Chaozhuo, Xie, Yueqi, Kim, Jaeboum, Zhang, Yan, Xie, Xing, Wang, Haohan, and Kim, Sunghun
- Subjects
Computer Science - Information Retrieval ,H.3.3 - Abstract
Session-based recommendation (SBR) aims to predict the user's next action based on short and dynamic sessions. Recently, there has been increasing interest in utilizing elaborately designed graph neural networks (GNNs) to capture pair-wise relationships among items, seemingly suggesting that designing more complicated models is the panacea for improving empirical performance. However, these models achieve relatively marginal improvements at the cost of exponential growth in model complexity. In this paper, we dissect classical GNN-based SBR models and empirically find that some sophisticated GNN propagations are redundant, given that the readout module plays a significant role in GNN-based models. Based on this observation, we propose to remove the GNN propagation part and let the readout module take on more responsibility in the model reasoning process. To this end, we propose the Multi-Level Attention Mixture Network (Atten-Mixer), which leverages both concept-view and instance-view readouts to achieve multi-level reasoning over item transitions. As simply enumerating all possible high-level concepts is infeasible for large real-world recommender systems, we further incorporate SBR-related inductive biases, i.e., local invariance and inherent priority, to prune the search space. Experiments on three benchmarks demonstrate the effectiveness and efficiency of our proposal. The proposed techniques have also been deployed in a large-scale e-commerce online service since April 2021, with significant improvements in top-tier business metrics demonstrated in online experiments on live traffic.
- Published
- 2022
- Full Text
- View/download PDF
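A rough sketch of an attention-based multi-level session readout in the spirit of entry 148, assuming PyTorch: queries built from the most recent items at several granularities attend over the whole session, and the mixed result serves as the session representation, with no GNN propagation at all. The level set and pooling choices are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttnMixerReadout(nn.Module):
    def __init__(self, dim, levels=(1, 2, 3)):
        super().__init__()
        self.levels = levels
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)

    def forward(self, session_emb):                     # (B, L, D), last item at index -1
        queries = torch.stack(
            [session_emb[:, -k:, :].mean(dim=1) for k in self.levels], dim=1
        )                                               # (B, n_levels, D) multi-level intents
        mixed, _ = self.attn(queries, session_emb, session_emb)
        return mixed.mean(dim=1)                        # (B, D) session representation
```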
149. Evolutionary Preference Learning via Graph Nested GRU ODE for Session-based Recommendation
- Author
-
Guo, Jiayan, Zhang, Peiyan, Li, Chaozhuo, Xie, Xing, Zhang, Yan, and Kim, Sunghun
- Subjects
Computer Science - Information Retrieval ,H.3.3 - Abstract
Session-based recommendation (SBR) aims to predict the user's next action based on the ongoing session. Recently, there has been increasing interest in modeling the evolution of user preferences to capture fine-grained user interests. While the latent user preferences behind sessions drift continuously over time, most existing approaches still model the temporal session data in discrete state spaces, which are incapable of capturing fine-grained preference evolution and result in sub-optimal solutions. To this end, we propose the Graph Nested GRU ordinary differential equation (ODE), namely GNG-ODE, a novel continuum model that extends the idea of neural ODEs to continuous-time temporal session graphs. The proposed model preserves the continuous nature of dynamic user preferences, encoding both temporal and structural patterns of item transitions into continuous-time dynamic embeddings. As existing ODE solvers do not consider graph structure change and thus cannot be directly applied to dynamic graphs, we propose a time alignment technique, called t-Alignment, to align the updating time steps of the temporal session graphs within a batch. Empirical results on three benchmark datasets show that GNG-ODE significantly outperforms other baselines., Comment: Under Review
- Published
- 2022
- Full Text
- View/download PDF
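A simplified continuous-time update in the spirit of entry 149, assuming PyTorch: the derivative of each node embedding is parameterized by a GRU cell driven by neighbour-aggregated messages, integrated here with plain explicit Euler steps. This is only an illustration; it implements neither GNG-ODE's handling of graph-structure changes nor t-Alignment.

```python
import torch
import torch.nn as nn

class GraphGRUODE(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.cell = nn.GRUCell(dim, dim)

    def derivative(self, h, adj):
        msg = adj @ h                       # simple neighbour aggregation
        return self.cell(msg, h) - h        # GRU-ODE style drift

    def integrate(self, h, adj, t0, t1, steps=10):
        dt = (t1 - t0) / steps
        for _ in range(steps):              # explicit Euler integration
            h = h + dt * self.derivative(h, adj)
        return h
```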
150. MetaFed: Federated Learning among Federations with Cyclic Knowledge Distillation for Personalized Healthcare
- Author
-
Chen, Yiqiang, Lu, Wang, Qin, Xin, Wang, Jindong, and Xie, Xing
- Subjects
Computer Science - Machine Learning ,Computer Science - Computers and Society - Abstract
Federated learning has attracted increasing attention for building models without accessing raw user data, especially in healthcare. In real applications, different federations can seldom work together, for reasons such as data heterogeneity and distrust of, or the absence of, a central server. In this paper, we propose a novel framework called MetaFed to facilitate trustworthy FL between different federations. MetaFed obtains a personalized model for each federation without a central server via the proposed Cyclic Knowledge Distillation. Specifically, MetaFed treats each federation as a meta distribution and aggregates the knowledge of each federation in a cyclic manner. Training is split into two parts: common knowledge accumulation and personalization. Comprehensive experiments on three benchmarks demonstrate that MetaFed, without a server, achieves better accuracy compared to state-of-the-art methods (e.g., a 10%+ accuracy improvement over the baseline on PAMAP2) at lower communication cost., Comment: Accepted by IEEE Trans. on Neural Networks and Learning Systems (TNNLS); IJCAI'22 FTL workshop innovation award; code at https://github.com/microsoft/PersonalizedFL
- Published
- 2022
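A minimal sketch of one cyclic round in the spirit of entry 150, assuming PyTorch: each federation's model distills from the model handed over by the previous federation in the ring while fitting its own data, so no central server is required. The distillation weight, temperature, and optimizer are assumptions, not MetaFed's schedule of common knowledge accumulation and personalization.

```python
import copy
import torch
import torch.nn.functional as F

def cyclic_round(federation_models, federation_loaders, lam=0.5, tau=2.0):
    """One pass around the ring of federations with teacher-student distillation."""
    n = len(federation_models)
    for i in range(n):
        teacher = copy.deepcopy(federation_models[(i - 1) % n]).eval()  # previous federation
        student = federation_models[i]
        opt = torch.optim.SGD(student.parameters(), lr=0.01)
        for x, y in federation_loaders[i]:
            logits = student(x)
            with torch.no_grad():
                t_logits = teacher(x)
            kd = F.kl_div(F.log_softmax(logits / tau, -1),
                          F.softmax(t_logits / tau, -1),
                          reduction="batchmean") * tau * tau
            loss = F.cross_entropy(logits, y) + lam * kd
            opt.zero_grad()
            loss.backward()
            opt.step()
    return federation_models
```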