Author: "Hu, Songlin" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Hu, Songlin"' showing total 702 results

1. CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts

Author: Su, Zhenpeng, Wu, Xing, Lin, Zijia, Xiong, Yizhe, Lv, Minxuan, Ma, Guangyuan, Chen, Hui, Hu, Songlin, and Ding, Guiguang
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Large language models (LLMs) have been attracting much attention from the community recently, due to their remarkable performance in all kinds of downstream tasks. According to the well-known scaling law, scaling up a dense LLM enhances its capabilities, but also significantly increases the computational complexity. Mixture-of-Experts (MoE) models address that by allowing the model size to grow without substantially raising training or inference costs. Yet MoE models face challenges regarding knowledge sharing among experts, making their performance somewhat sensitive to routing accuracy. To tackle that, previous works introduced shared experts and combined their outputs with those of the top K routed experts in an "addition" manner. In this paper, inspired by collective matrix factorization to learn shared knowledge among data, we propose CartesianMoE, which implements more effective knowledge sharing among experts in a more "multiplication"-like manner. Extensive experimental results indicate that CartesianMoE outperforms previous MoE models for building LLMs, in terms of both perplexity and downstream task performance. We also find that CartesianMoE achieves better expert routing robustness.
Published: 2024

2. CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning

Author: Yu, Huimu, Wu, Xing, Yin, Weidong, Zhang, Debing, and Hu, Songlin
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Large language models (LLMs) have made significant progress in natural language understanding and generation, driven by scalable pretraining and advanced finetuning. However, enhancing reasoning abilities in LLMs, particularly via reinforcement learning from human feedback (RLHF), remains challenging due to the scarcity of high-quality preference data, which is labor-intensive to annotate and crucial for reward model (RM) finetuning. To alleviate this issue, we introduce CodePMP, a scalable preference model pretraining (PMP) pipeline that utilizes a large corpus of synthesized code-preference pairs from publicly available high-quality source code. CodePMP improves RM finetuning efficiency by pretraining preference models on large-scale synthesized code-preference pairs. We evaluate CodePMP on mathematical reasoning tasks (GSM8K, MATH) and logical reasoning tasks (ReClor, LogiQA2.0), consistently showing significant improvements in reasoning performance of LLMs and highlighting the importance of scalable preference model pretraining for efficient reward modeling.
Comment: work in progress
Published: 2024

3. Holistic Automated Red Teaming for Large Language Models through Top-Down Test Case Generation and Multi-turn Interaction

Author: Zhang, Jinchuan, Zhou, Yan, Liu, Yaxin, Li, Ziming, and Hu, Songlin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security
Abstract: Automated red teaming is an effective method for identifying misaligned behaviors in large language models (LLMs). Existing approaches, however, often focus primarily on improving attack success rates while overlooking the need for comprehensive test case coverage. Additionally, most of these methods are limited to single-turn red teaming, failing to capture the multi-turn dynamics of real-world human-machine interactions. To overcome these limitations, we propose HARM (Holistic Automated Red teaMing), which scales up the diversity of test cases using a top-down approach based on an extensible, fine-grained risk taxonomy. Our method also leverages a novel fine-tuning strategy and reinforcement learning techniques to facilitate multi-turn adversarial probing in a human-like manner. Experimental results demonstrate that our framework enables a more systematic understanding of model vulnerabilities and offers more targeted guidance for the alignment process.
Comment: EMNLP 2024 camera ready version
Published: 2024

4. AdaPPA: Adaptive Position Pre-Fill Jailbreak Attack Approach Targeting LLMs

Author: Lv, Lijia, Zhang, Weigang, Tang, Xuehai, Wen, Jie, Liu, Feng, Han, Jizhong, and Hu, Songlin
Subjects: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Jailbreak vulnerabilities in Large Language Models (LLMs) refer to methods that extract malicious content from the model by carefully crafting prompts or suffixes, which has garnered significant attention from the research community. However, traditional attack methods, which primarily focus on the semantic level, are easily detected by the model. These methods overlook the difference in the model's alignment protection capabilities at different output stages. To address this issue, we propose an adaptive position pre-fill jailbreak attack approach for executing jailbreak attacks on LLMs. Our method leverages the model's instruction-following capabilities to first output pre-filled safe content, then exploits its narrative-shifting abilities to generate harmful content. Extensive black-box experiments demonstrate our method can improve the attack success rate by 47% on the widely recognized secure model (Llama2) compared to existing approaches. Our code can be found at: https://github.com/Yummy416/AdaPPA.
Published: 2024

5. Task-level Distributionally Robust Optimization for Large Language Model-based Dense Retrieval

Author: Ma, Guangyuan, Ma, Yongliang, Wu, Xing, Su, Zhenpeng, Zhou, Ming, and Hu, Songlin
Subjects: Computer Science - Information Retrieval
Abstract: Large Language Model-based Dense Retrieval (LLM-DR) optimizes over numerous heterogeneous fine-tuning collections from different domains. However, the discussion about its training data distribution is still minimal. Previous studies rely on empirically assigned dataset choices or sampling ratios, which inevitably leads to sub-optimal retrieval performances. In this paper, we propose a new task-level Distributionally Robust Optimization (tDRO) algorithm for LLM-DR fine-tuning, targeted at improving the universal domain generalization ability by end-to-end reweighting the data distribution of each task. The tDRO parameterizes the domain weights and updates them with scaled domain gradients. The optimized weights are then transferred to the LLM-DR fine-tuning to train more robust retrievers. Experiments show optimal improvements in large-scale retrieval benchmarks and reduce up to 30% dataset usage after applying our optimization algorithm with a series of different-sized LLM-DR models.
Published: 2024

6. Multi-source Knowledge Enhanced Graph Attention Networks for Multimodal Fact Verification

Author: Cao, Han, Wei, Lingwei, Zhou, Wei, and Hu, Songlin
Subjects: Computer Science - Multimedia
Abstract: Multimodal fact verification is an under-explored and emerging field that has gained increasing attention in recent years. The goal is to assess the veracity of claims that involve multiple modalities by analyzing the retrieved evidence. The main challenge in this area is to effectively fuse features from different modalities to learn meaningful multimodal representations. To this end, we propose a novel model named Multi-Source Knowledge-enhanced Graph Attention Network (MultiKE-GAT). MultiKE-GAT introduces external multimodal knowledge from different sources and constructs a heterogeneous graph to capture complex cross-modal and cross-source interactions. We exploit a Knowledge-aware Graph Fusion (KGF) module to learn knowledge-enhanced representations for each claim and evidence and eliminate inconsistencies and noises introduced by redundant entities. Experiments on two public benchmark datasets demonstrate that our model outperforms other comparison methods, showing the effectiveness and superiority of the proposed model.
Comment: Accepted by ICME 2024
Published: 2024

7. Transferring Structure Knowledge: A New Task to Fake news Detection Towards Cold-Start Propagation

Author: Wei, Lingwei, Hu, Dou, Zhou, Wei, and Hu, Songlin
Subjects: Computer Science - Social and Information Networks, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Many fake news detection studies have achieved promising performance by extracting effective semantic and structure features from both content and propagation trees. However, it is challenging to apply them to practical situations, especially when using the trained propagation-based models to detect news with no propagation data. Towards this scenario, we study a new task named cold-start fake news detection, which aims to detect content-only samples with missing propagation. To achieve the task, we design a simple but effective Structure Adversarial Net (SAN) framework to learn transferable features from available propagation to boost the detection of content-only samples. SAN introduces a structure discriminator to estimate dissimilarities among learned features with and without propagation, and further learns structure-invariant features to enhance the generalization of existing propagation-based methods for content-only samples. We conduct qualitative and quantitative experiments on three datasets. Results show the challenge of the new task and the effectiveness of our SAN framework.
Comment: ICASSP 2024
Published: 2024

8. MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts

Author: Su, Zhenpeng, Lin, Zijia, Bai, Xue, Wu, Xing, Xiong, Yizhe, Lian, Haoran, Ma, Guangyuan, Chen, Hui, Ding, Guiguang, Zhou, Wei, and Hu, Songlin
Subjects: Computer Science - Computation and Language
Abstract: Scaling the size of a model enhances its capabilities but significantly increases computation complexity. Mixture-of-Experts models (MoE) address the issue by allowing model size to scale up without substantially increasing training or inference costs. In MoE, there is an important module called the router, which is used to distribute each token to the experts. Currently, the mainstream routing methods include dynamic routing and fixed routing. Despite their promising results, MoE models encounter several challenges. Primarily, for dynamic routing methods, the dispersion of training tokens across multiple experts can lead to underfitting, particularly for infrequent tokens. Additionally, though fixed routing methods can mitigate that issue, they compromise on the diversity of representations. In this paper, we propose MaskMoE, a method designed to enhance token-level learning by employing a routing masking technique within the Mixture-of-Experts model. MaskMoE is capable of maintaining representation diversity while achieving more comprehensive training. Experimental results demonstrate that our method outperforms previous dominant Mixture-of-Experts models in terms of both perplexity (PPL) and downstream task performance.
Comment: Work in progress
Published: 2024

9. Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens

Author: Yang, Xikang, Tang, Xuehai, Zhu, Fuqing, Han, Jizhong, and Hu, Songlin
Subjects: Computer Science - Multimedia, Computer Science - Machine Learning
Abstract: Vision-language models (VLMs) seamlessly integrate visual and textual data to perform tasks such as image classification, caption generation, and visual question answering. However, adversarial images often struggle to deceive all prompts effectively in the context of cross-prompt migration attacks, as the probability distribution of the tokens in these images tends to favor the semantics of the original image rather than the target tokens. To address this challenge, we propose a Contextual-Injection Attack (CIA) that employs gradient-based perturbation to inject target tokens into both visual and textual contexts, thereby improving the probability distribution of the target tokens. By shifting the contextual semantics towards the target tokens instead of the original image semantics, CIA enhances the cross-prompt transferability of adversarial images. Extensive experiments on the BLIP2, InstructBLIP, and LLaVA models show that CIA outperforms existing methods in cross-prompt transferability, demonstrating its potential for more effective adversarial strategies in VLMs.
Comment: 13 pages
Published: 2024

10. Representation Learning with Conditional Information Flow Maximization

Author: Hu, Dou, Wei, Lingwei, Zhou, Wei, and Hu, Songlin
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: This paper proposes an information-theoretic representation learning framework, named conditional information flow maximization, to extract noise-invariant sufficient representations for the input data and target task. It encourages the learned representations to have good feature uniformity and sufficient predictive ability, which can enhance the generalization of pre-trained language models (PLMs) for the target task. Firstly, an information flow maximization principle is proposed to learn more sufficient representations for the input and target by simultaneously maximizing both input-representation and representation-label mutual information. Unlike the information bottleneck, we handle the input-representation information in an opposite way to avoid the over-compression issue of latent representations. Besides, to mitigate the negative effect of potential redundant features from the input, we design a conditional information minimization principle to eliminate negative redundant features while preserving noise-invariant features. Experiments on 13 language understanding benchmarks demonstrate that our method effectively improves the performance of PLMs for classification and regression. Extensive experiments show that the learned representations are more sufficient, robust and transferable.
Comment: 16 pages, accepted to ACL 2024 (main conference), the code is available at https://github.com/zerohd4869/CIFM
Published: 2024

11. Semantic-Enhanced Relational Metric Learning for Recommender Systems

Author: Li, Mingming, Zhu, Fuqing, Yuan, Feng, and Hu, Songlin
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
Abstract: Recently, relational metric learning methods, which are inspired by the translation mechanism in knowledge graphs, have received great attention in the recommendation community. Unlike in a knowledge graph, where the entity-to-entity relations are given in advance, historical interactions in recommender systems lack explicit relations between users and items. Currently, many researchers have succeeded in constructing implicit relations to mitigate this issue. However, in previous work, the learning process of the induction function only depends on a single source of data (i.e., user-item interaction) in a supervised manner, resulting in a co-occurrence relation that is free of any semantic information. In this paper, to tackle the above problem in recommender systems, we propose a joint Semantic-Enhanced Relational Metric Learning (SERML) framework that incorporates the semantic information. Specifically, the semantic signal is first extracted from the target reviews containing abundant item features and personalized user preferences. A novel regression model is then designed by leveraging the extracted semantic signal to improve the discriminative ability of the original relation-based training process. On four widely-used public datasets, experimental results demonstrate that SERML produces a competitive performance compared with several state-of-the-art methods in recommender systems.
Published: 2024

12. Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model

Author: Gao, Chaochen, Wu, Xing, Fu, Qi, and Hu, Songlin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Recent advancements in large language models (LLMs) have highlighted the importance of extending context lengths for handling complex tasks. While traditional methods for training on long contexts often use filtered long documents, these approaches lead to domain imbalances, limiting model performance. To address this, techniques like random document concatenation (Standard) and similarity-based methods (KNN, ICLM) have been developed. However, they either sacrifice semantic coherence or diversity. To balance both aspects, we introduce Quest, a query-centric data synthesis method aggregating semantically relevant yet diverse documents. Quest uses a generative model to predict potential queries for each document, grouping documents with similar queries and keywords. Extensive experiments demonstrate Quest's superior performance on long-context tasks, achieving remarkable results with context lengths of up to 1M tokens and confirming its scalability across various model sizes.
Published: 2024

13. Improve Student's Reasoning Generalizability through Cascading Decomposed CoTs Distillation

Author: Dai, Chengwei, Li, Kun, Zhou, Wei, and Hu, Songlin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large language models (LLMs) exhibit enhanced reasoning at larger scales, driving efforts to distill these capabilities into smaller models via teacher-student learning. Previous works simply fine-tune student models on teachers' generated Chain-of-Thoughts (CoTs) data. Although these methods enhance in-domain (IND) reasoning performance, they struggle to generalize to out-of-domain (OOD) tasks. We believe that the widespread spurious correlations between questions and answers may lead the model to preset a specific answer which restricts the diversity and generalizability of its reasoning process. In this paper, we propose Cascading Decomposed CoTs Distillation (CasCoD) to address these issues by decomposing the traditional single-step learning process into two cascaded learning steps. Specifically, by restructuring the training objectives -- removing the answer from outputs and concatenating the question with the rationale as input -- CasCoD's two-step learning process ensures that students focus on learning rationales without interference from the preset answers, thus improving reasoning generalizability. Extensive experiments demonstrate the effectiveness of CasCoD on both IND and OOD benchmark reasoning datasets. Code can be found at https://github.com/C-W-D/CasCoD.
Published: 2024

14. Beyond Imitation: Learning Key Reasoning Steps from Dual Chain-of-Thoughts in Reasoning Distillation

Author: Dai, Chengwei, Li, Kun, Zhou, Wei, and Hu, Songlin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: As Large Language Models (LLMs) scale up and gain powerful Chain-of-Thoughts (CoTs) reasoning abilities, practical resource constraints drive efforts to distill these capabilities into more compact Smaller Language Models (SLMs). We find that CoTs consist mainly of simple reasoning forms, with a small proportion (approximately 4.7%) of key reasoning steps that truly impact conclusions. However, previous distillation methods typically involve supervised fine-tuning student SLMs only on correct CoTs data produced by teacher LLMs, resulting in students struggling to learn the key reasoning steps, instead imitating the teacher's reasoning forms and making errors or omissions on these steps. To address these issues, drawing an analogy to human learning, where analyzing mistakes according to correct solutions often reveals the crucial steps leading to successes or failures, we propose mistakE-Driven key reasonIng step distillaTion (EDIT), a novel method that further helps SLMs learn key reasoning steps rather than relying on simple fine-tuning alone. Firstly, to expose these crucial steps in CoTs, we design specific prompts to generate dual CoTs data with similar reasoning paths but divergent conclusions. Then, we apply the minimum edit distance algorithm on the dual CoTs data to locate these key steps and optimize the likelihood of these steps. Extensive experiments validate the effectiveness of EDIT across both in-domain and out-of-domain benchmark reasoning datasets. Further analysis shows that EDIT can generate high-quality CoTs with more correct key reasoning steps. Notably, we also explore how different mistake patterns affect performance and find that EDIT benefits more from logical errors than from knowledge or mathematical calculation errors in dual CoTs. Code can be found at https://github.com/C-W-D/EDIT.
Published: 2024
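
The abstract of record 14 mentions applying a minimum edit distance algorithm to a pair of chains of thought to locate the steps where they diverge. The following is a minimal sketch of that general idea, not the authors' implementation: it aligns two token lists with the classic Levenshtein dynamic program and returns the edit operations, which mark the diverging positions. The function name `edit_alignment`, the whitespace tokenization, and the example strings are assumptions made purely for illustration.

```python
from typing import List, Tuple


def edit_alignment(a: List[str], b: List[str]) -> List[Tuple[str, int, int]]:
    """Return the non-matching operations of one minimum edit distance alignment.

    Each operation is (kind, index_in_a, index_in_b), where kind is
    "substitute", "delete" (token only in a) or "insert" (token only in b).
    """
    n, m = len(a), len(b)
    # dist[i][j] = minimum number of edits turning a[:i] into b[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete a[i-1]
                dist[i][j - 1] + 1,         # insert b[j-1]
                dist[i - 1][j - 1] + cost,  # match or substitute
            )

    # Walk back from the bottom-right corner to recover the operations.
    ops: List[Tuple[str, int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1] and dist[i][j] == dist[i - 1][j - 1]:
            i, j = i - 1, j - 1  # tokens match, nothing to record
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            ops.append(("substitute", i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("delete", i - 1, j))
            i -= 1
        else:
            ops.append(("insert", i, j - 1))
            j -= 1
    ops.reverse()
    return ops


# Hypothetical pair of reasoning chains with the same path but different conclusions.
correct = "x = 3 + 4 so x = 7".split()
wrong = "x = 3 + 4 so x = 8".split()
diverging = edit_alignment(correct, wrong)
```

In this toy pair the only recorded operation is a substitution at the final token, which is where the two chains disagree. How such positions are then weighted during training is described in the paper itself and is not reproduced here.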

15. RNG: Reducing Multi-level Noise and Multi-grained Semantic Gap for Joint Multimodal Aspect-Sentiment Analysis

Author: Liu, Yaxin, Zhou, Yan, Li, Ziming, Zhang, Jinchuan, Shang, Yu, Zhang, Chenyang, and Hu, Songlin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: As an important multimodal sentiment analysis task, Joint Multimodal Aspect-Sentiment Analysis (JMASA), which aims to jointly extract aspect terms and their associated sentiment polarities from the given text-image pairs, has gained increasing attention. Existing works encounter two limitations: (1) multi-level modality noise, i.e., instance- and feature-level noise; and (2) multi-grained semantic gap, i.e., coarse- and fine-grained gap. Both issues may interfere with accurate identification of aspect-sentiment pairs. To address these limitations, we propose a novel framework named RNG for JMASA. Specifically, to simultaneously reduce multi-level modality noise and multi-grained semantic gap, we design three constraints: (1) Global Relevance Constraint (GR-Con) based on text-image similarity for instance-level noise reduction, (2) Information Bottleneck Constraint (IB-Con) based on the Information Bottleneck (IB) principle for feature-level noise reduction, and (3) Semantic Consistency Constraint (SC-Con) based on mutual information maximization in a contrastive learning way for multi-grained semantic gap reduction. Extensive experiments on two datasets validate our new state-of-the-art performance.
Comment: Accepted by ICME 2024
Published: 2024

16. Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM

Author: Yang, Xikang, Tang, Xuehai, Hu, Songlin, and Han, Jizhong
Subjects: Computer Science - Computation and Language, Computer Science - Cryptography and Security, Computer Science - Machine Learning
Abstract: Large language models (LLMs) have achieved remarkable performance in various natural language processing tasks, especially in dialogue systems. However, LLMs may also pose security and moral threats, especially in multi-round conversations where large models are more easily guided by contextual content, resulting in harmful or biased responses. In this paper, we present a novel method to attack LLMs in multi-turn dialogues, called CoA (Chain of Attack). CoA is a semantic-driven contextual multi-turn attack method that adaptively adjusts the attack policy through contextual feedback and semantic relevance over multiple turns of dialogue with a large model, resulting in the model producing unreasonable or harmful content. We evaluate CoA on different LLMs and datasets, and show that it can effectively expose the vulnerabilities of LLMs, and outperform existing attack methods. Our work provides a new perspective and tool for attacking and defending LLMs, and contributes to the security and ethical assessment of dialogue systems.
Published: 2024

17. Event Temporal Relation Extraction based on Retrieval-Augmented on LLMs

Author: Zhang, Xiaobin, Zang, Liangjun, Liu, Qianwen, Wei, Shuchong, and Hu, Songlin
Subjects: Computer Science - Computation and Language
Abstract: Event temporal relation (TempRel) is a primary subject of the event relation extraction task. However, the inherent ambiguity of TempRel increases the difficulty of the task. With the rise of prompt engineering, it is important to design effective prompt templates and verbalizers to extract relevant knowledge. The traditional manually designed templates struggle to extract precise temporal knowledge. This paper introduces a novel retrieval-augmented TempRel extraction approach, leveraging knowledge retrieved from large language models (LLMs) to enhance prompt templates and verbalizers. Our method capitalizes on the diverse capabilities of various LLMs to generate a wide array of ideas for template and verbalizer design. Our proposed method fully exploits the potential of LLMs for generation tasks and contributes more knowledge to our design. Empirical evaluations across three widely recognized datasets demonstrate the efficacy of our method in improving the performance of event temporal relation extraction tasks.
Comment: 8 pages, 6 figures. Accepted to the International Joint Conference on Neural Networks (IJCNN 2024)
Published: 2024

18. Drop your Decoder: Pre-training with Bag-of-Word Prediction for Dense Passage Retrieval

Author: Ma, Guangyuan, Wu, Xing, Lin, Zijia, and Hu, Songlin
Subjects: Computer Science - Information Retrieval, Computer Science - Computation and Language
Abstract: Masked auto-encoder pre-training has emerged as a prevalent technique for initializing and enhancing dense retrieval systems. It generally utilizes additional Transformer decoder blocks to provide sustainable supervision signals and compress contextual information into dense representations. However, the underlying reasons for the effectiveness of such a pre-training technique remain unclear. The usage of additional Transformer-based decoders also incurs significant computational costs. In this study, we aim to shed light on this issue by revealing that masked auto-encoder (MAE) pre-training with enhanced decoding significantly improves the term coverage of input tokens in dense representations, compared to vanilla BERT checkpoints. Building upon this observation, we propose a modification to the traditional MAE by replacing the decoder of a masked auto-encoder with a completely simplified Bag-of-Word prediction task. This modification enables the efficient compression of lexical signals into dense representations through unsupervised pre-training. Remarkably, our proposed method achieves state-of-the-art retrieval performance on several large-scale retrieval benchmarks without requiring any additional parameters, which provides a 67% training speed-up compared to standard masked auto-encoder pre-training with enhanced decoding.
Comment: Accepted by SIGIR24. Our code is available at https://github.com/ma787639046/bowdpr
Published: 2024

19. Structured Probabilistic Coding

Author: Hu, Dou, Wei, Lingwei, Liu, Yaxin, Zhou, Wei, and Hu, Songlin
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: This paper presents a new supervised representation learning framework, namely structured probabilistic coding (SPC), to learn compact and informative representations from input related to the target task. SPC is an encoder-only probabilistic coding technology with a structured regularization from the target space. It can enhance the generalization ability of pre-trained language models for better language understanding. Specifically, our probabilistic coding simultaneously performs information encoding and task prediction in one module to more fully utilize the effective information from input data. It uses variational inference in the output space to reduce randomness and uncertainty. Besides, to better control the learning process of probabilistic representations, a structured regularization is proposed to promote uniformity across classes in the latent space. With the regularization term, SPC can preserve the Gaussian structure of the latent code and achieve better coverage of the hidden space with class uniformly. Experimental results on 12 natural language understanding tasks demonstrate that our SPC effectively improves the performance of pre-trained language models for classification and regression. Extensive experiments show that SPC can enhance the generalization capability, robustness to label noise, and clustering quality of output representations.
Comment: 11 pages, accepted by AAAI 2024 (Oral)
Published: 2023

20. Can Large Language Models Understand Content and Propagation for Misinformation Detection: An Empirical Study

Author: Chen, Mengyang, Wei, Lingwei, Cao, Han, Zhou, Wei, and Hu, Songlin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computers and Society
Abstract: Large Language Models (LLMs) have garnered significant attention for their powerful ability in natural language understanding and reasoning. In this paper, we present a comprehensive empirical study to explore the performance of LLMs on misinformation detection tasks. This study stands as the pioneering investigation into the understanding capabilities of multiple LLMs regarding both content and propagation across social media platforms. Our empirical studies on five misinformation detection datasets show that LLMs with diverse prompts achieve comparable performance in text-based misinformation detection but exhibit notably constrained capabilities in comprehending propagation structure compared to existing models in propagation-based misinformation detection. Besides, we further design four instruction-tuned strategies to enhance LLMs for both content and propagation-based misinformation detection. These strategies encourage LLMs to actively learn effective features from multiple instances or hard instances, and eliminate irrelevant propagation structures, thereby achieving better detection performance. Extensive experiments further demonstrate that, under these proposed strategies, LLMs make better use of content and propagation structure and achieve promising detection performance. These findings highlight the potential ability of LLMs to detect misinformation.
Published: 2023

21. MiLe Loss: a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models

Author: Su, Zhenpeng, Wu, Xing, Bai, Xue, Lin, Zijia, Chen, Hui, Ding, Guiguang, Zhou, Wei, and Hu, Songlin
Subjects: Computer Science - Computation and Language
Abstract: Generative language models are usually pretrained on large text corpus via predicting the next token (i.e., sub-word/word/phrase) given the previous ones. Recent works have demonstrated the impressive performance of large generative language models on downstream tasks. However, existing generative language models generally neglect an inherent challenge in text corpus during training, i.e., the imbalance between frequent tokens and infrequent ones. It can lead a language model to be dominated by common and easy-to-learn tokens, thereby overlooking the infrequent and difficult-to-learn ones. To alleviate that, we propose a MiLe Loss function for mitigating the bias of learning difficulties with tokens. During training, it can dynamically assess the learning difficulty of a to-be-learned token, according to the information entropy of the corresponding predicted probability distribution over the vocabulary. Then it scales the training loss adaptively, trying to lead the model to focus more on the difficult-to-learn tokens. On the Pile dataset, we train generative language models at different scales of 468M, 1.2B, and 6.7B parameters. Experiments reveal that models incorporating the proposed MiLe Loss can gain consistent performance improvement on downstream benchmarks.
Comment: This paper has been accepted by NAACL 2024
Published: 2023

22. CT-GAT: Cross-Task Generative Adversarial Attack based on Transferability

Author: Lv, Minxuan, Dai, Chengwei, Li, Kun, Zhou, Wei, and Hu, Songlin
Subjects: Computer Science - Computation and Language
Abstract: Neural network models are vulnerable to adversarial examples, and adversarial transferability further increases the risk of adversarial attacks. Current methods based on transferability often rely on substitute models, which can be impractical and costly in real-world scenarios due to the unavailability of training data and the victim model's structural details. In this paper, we propose a novel approach that directly constructs adversarial examples by extracting transferable features across various tasks. Our key insight is that adversarial transferability can extend across different tasks. Specifically, we train a sequence-to-sequence generative model named CT-GAT using adversarial sample data collected from multiple tasks to acquire universal adversarial features and generate adversarial examples for different tasks. We conduct experiments on ten distinct datasets, and the results demonstrate that our method achieves superior attack performance with small cost., Comment: Accepted to EMNLP 2023 main conference Corrected the header error in Figure 3
Published: 2023

23. HC3 Plus: A Semantic-Invariant Human ChatGPT Comparison Corpus

Author: Su, Zhenpeng, Wu, Xing, Zhou, Wei, Ma, Guangyuan, and Hu, Songlin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: ChatGPT has garnered significant interest due to its impressive performance; however, there is growing concern about its potential risks, particularly in the detection of AI-generated content (AIGC), which is often challenging for untrained individuals to identify. Current datasets used for detecting ChatGPT-generated text primarily focus on question-answering tasks, often overlooking tasks with semantic-invariant properties, such as summarization, translation, and paraphrasing. In this paper, we demonstrate that detecting model-generated text in semantic-invariant tasks is more challenging. To address this gap, we introduce a more extensive and comprehensive dataset that incorporates a wider range of tasks than previous work, including those with semantic-invariant properties. In addition, instruction fine-tuning has demonstrated superior performance across various tasks. In this paper, we explore the use of instruction fine-tuning models for detecting text generated by ChatGPT., Comment: This paper has been accepted by CIKM2023 workshop
Published: 2023

24. Pre-training with Large Language Model-based Document Expansion for Dense Passage Retrieval

Author: Ma, Guangyuan, Wu, Xing, Wang, Peng, Lin, Zijia, and Hu, Songlin
Subjects: Computer Science - Information Retrieval, Computer Science - Computation and Language
Abstract: In this paper, we systematically study the potential of pre-training with Large Language Model(LLM)-based document expansion for dense passage retrieval. Concretely, we leverage the capabilities of LLMs for document expansion, i.e. query generation, and effectively transfer expanded knowledge to retrievers using pre-training strategies tailored for passage retrieval. These strategies include contrastive learning and bottlenecked query generation. Furthermore, we incorporate a curriculum learning strategy to reduce the reliance on LLM inferences. Experimental results demonstrate that pre-training with LLM-based document expansion significantly boosts the retrieval performance on large-scale web-search tasks. Our work shows strong zero-shot and out-of-domain retrieval abilities, making it more widely applicable for retrieval when initializing with no human-labeled data., Comment: 10 pages, 3 tables, 4 figures, under review
Published: 2023

25. Dial-MAE: ConTextual Masked Auto-Encoder for Retrieval-based Dialogue Systems

Author: Su, Zhenpeng, Wu, Xing, Zhou, Wei, Ma, Guangyuan, and Hu, Songlin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Dialogue response selection aims to select an appropriate response from several candidates based on a given user and system utterance history. Most existing works primarily focus on post-training and fine-tuning tailored for cross-encoders. However, there are no post-training methods tailored for dense encoders in dialogue response selection. We argue that when the current language model, based on dense dialogue systems (such as BERT), is employed as a dense encoder, it separately encodes dialogue context and response, leading to a struggle to achieve the alignment of both representations. Thus, we propose Dial-MAE (Dialogue Contextual Masking Auto-Encoder), a straightforward yet effective post-training technique tailored for dense encoders in dialogue response selection. Dial-MAE uses an asymmetric encoder-decoder architecture to compress the dialogue semantics into dense vectors, which achieves better alignment between the features of the dialogue context and response. Our experiments have demonstrated that Dial-MAE is highly effective, achieving state-of-the-art performance on two commonly evaluated benchmarks., Comment: This paper has been accepted by NAACL 2024
Published: 2023

26. Supervised Adversarial Contrastive Learning for Emotion Recognition in Conversations

Author: Hu, Dou, Bao, Yinan, Wei, Lingwei, Zhou, Wei, and Hu, Songlin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Extracting generalized and robust representations is a major challenge in emotion recognition in conversations (ERC). To address this, we propose a supervised adversarial contrastive learning (SACL) framework for learning class-spread structured representations in a supervised manner. SACL applies contrast-aware adversarial training to generate worst-case samples and uses joint class-spread contrastive learning to extract structured representations. It can effectively utilize label-level feature consistency and retain fine-grained intra-class features. To avoid the negative impact of adversarial perturbations on context-dependent data, we design a contextual adversarial training (CAT) strategy to learn more diverse features from context and enhance the model's context robustness. Under the framework with CAT, we develop a sequence-based SACL-LSTM to learn label-consistent and context-robust features for ERC. Experiments on three datasets show that SACL-LSTM achieves state-of-the-art performance on ERC. Extended experiments prove the effectiveness of SACL and CAT., Comment: 16 pages, accepted by ACL 2023
Published: 2023

27. UCAS-IIE-NLP at SemEval-2023 Task 12: Enhancing Generalization of Multilingual BERT for Low-resource Sentiment Analysis

Author: Hu, Dou, Wei, Lingwei, Liu, Yaxin, Zhou, Wei, and Hu, Songlin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: This paper describes our system designed for SemEval-2023 Task 12: Sentiment analysis for African languages. The challenge faced by this task is the scarcity of labeled data and linguistic resources in low-resource settings. To alleviate these, we propose a generalized multilingual system SACL-XLMR for sentiment analysis on low-resource languages. Specifically, we design a lexicon-based multilingual BERT to facilitate language adaptation and sentiment-aware representation learning. Besides, we apply a supervised adversarial contrastive learning technique to learn sentiment-spread structured representations and enhance model generalization. Our system achieved competitive results, largely outperforming baselines on both multilingual and zero-shot sentiment classification subtasks. Notably, the system obtained the 1st rank on the zero-shot classification subtask in the official ranking. Extensive experiments demonstrate the effectiveness of our system., Comment: 9 pages, accepted by SemEval@ACL 2023
Published: 2023

28. Author Correction: Stable training via elastic adaptive deep reinforcement learning for autonomous navigation of intelligent vehicles

Author: Zhao, Yujiao, Ma, Yong, Zhu, Guibing, Hu, Songlin, and Yan, Xinping
Published: 2024
Full Text: View/download PDF

29. Stable training via elastic adaptive deep reinforcement learning for autonomous navigation of intelligent vehicles

Author: Zhao, Yujiao, Ma, Yong, Zhu, Guibing, Hu, Songlin, and Yan, Xinping
Published: 2024
Full Text: View/download PDF

30. PUNR: Pre-training with User Behavior Modeling for News Recommendation

Author: Ma, Guangyuan, Liu, Hongtao, Wu, Xing, Qian, Wanhui, Lv, Zhepeng, Yang, Qing, and Hu, Songlin
Subjects: Computer Science - Information Retrieval, Computer Science - Computation and Language
Abstract: News recommendation aims to predict click behaviors based on user behaviors. How to effectively model the user representations is the key to recommending preferred news. Existing works are mostly focused on improvements in the supervised fine-tuning stage. However, there is still a lack of PLM-based unsupervised pre-training methods optimized for user representations. In this work, we propose an unsupervised pre-training paradigm with two tasks, i.e. user behavior masking and user behavior generation, both towards effective user behavior modeling. Firstly, we introduce the user behavior masking pre-training task to recover the masked user behaviors based on their contextual behaviors. In this way, the model could capture a much stronger and more comprehensive user news reading pattern. Besides, we incorporate a novel auxiliary user behavior generation pre-training task to enhance the user representation vector derived from the user encoder. We use the above pre-trained user modeling encoder to obtain news and user representations in downstream fine-tuning. Evaluations on the real-world news benchmark show significant performance improvements over existing baselines., Comment: Accepted by Findings of EMNLP23. Github Repo: https://github.com/ma787639046/punr
Published: 2023

31. CoT-MoTE: Exploring ConTextual Masked Auto-Encoder Pre-training with Mixture-of-Textual-Experts for Passage Retrieval

Author: Ma, Guangyuan, Wu, Xing, Wang, Peng, and Hu, Songlin
Subjects: Computer Science - Computation and Language, Computer Science - Information Retrieval
Abstract: Passage retrieval aims to retrieve relevant passages from large collections of the open-domain corpus. Contextual Masked Auto-Encoding has been proven effective in representation bottleneck pre-training of a monolithic dual-encoder for passage retrieval. Siamese or fully separated dual-encoders are often adopted as basic retrieval architecture in the pre-training and fine-tuning stages for encoding queries and passages into their latent embedding spaces. However, simply sharing or separating the parameters of the dual-encoder results in an imbalanced discrimination of the embedding spaces. In this work, we propose to pre-train Contextual Masked Auto-Encoder with Mixture-of-Textual-Experts (CoT-MoTE). Specifically, we incorporate textual-specific experts for individually encoding the distinct properties of queries and passages. Meanwhile, a shared self-attention layer is still kept for unified attention modeling. Results on large-scale passage retrieval benchmarks show steady improvement in retrieval performances. The quantitative analysis also shows a more balanced discrimination of the latent embedding spaces.
Published: 2023

32. CoT-MAE v2: Contextual Masked Auto-Encoder with Multi-view Modeling for Passage Retrieval

Author: Wu, Xing, Ma, Guangyuan, Wang, Peng, Lin, Meng, Lin, Zijia, Zhang, Fuzheng, and Hu, Songlin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Growing techniques have been emerging to improve the performance of passage retrieval. As an effective representation bottleneck pretraining technique, the contextual masked auto-encoder utilizes contextual embedding to assist in the reconstruction of passages. However, it only uses a single auto-encoding pre-task for dense representation pre-training. This study brings multi-view modeling to the contextual masked auto-encoder. Firstly, multi-view representation utilizes both dense and sparse vectors as multi-view representations, aiming to capture sentence semantics from different aspects. Moreover, the multi-view decoding paradigm utilizes both auto-encoding and auto-regressive decoders in representation bottleneck pre-training, aiming to provide both reconstructive and generative signals for better contextual representation pretraining. We refer to this multi-view pretraining method as CoT-MAE v2. Through extensive experiments, we show that CoT-MAE v2 is effective and robust on large-scale passage retrieval benchmarks and out-of-domain zero-shot benchmarks., Comment: work in progress
Published: 2023

33. Byzantine-Resilient Multi-Agent Distributed Exact Optimization with Less Data

Author: Zhai, Yang, Liu, Zhi-Wei, Yue, Dong, Hu, Songlin, and Xie, Xiangpeng
Subjects: Mathematics - Optimization and Control, Electrical Engineering and Systems Science - Systems and Control
Abstract: This paper studies the distributed multi-agent resilient optimization problem under the f-total Byzantine attacks. Compared with the previous work on Byzantine-resilient multi-agent exact optimization problems, we do not require the communication topology to be fully connected. Under the redundancy of cost functions, we propose the distributed comparative gradient elimination resilient optimization algorithm based on the traditional assumptions on strongly convex global costs and Lipschitz continuous gradients. Under this algorithm, we successfully prove that if the number of in-neighbors of each normal agent is greater than some constant and the parameter f satisfies certain conditions, all normal agents' local estimations of the global variable will finally reach consensus and converge to the optimized solution. Finally, the numerical experiments successfully verify the correctness of the results., Comment: There are some errors in the proof of this paper
Published: 2023

34. Query-as-context Pre-training for Dense Passage Retrieval

Author: Wu, Xing, Ma, Guangyuan, Qian, Wanhui, Lin, Zijia, and Hu, Songlin
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
Abstract: Recently, methods have been developed to improve the performance of dense passage retrieval by using context-supervised pre-training. These methods simply consider two passages from the same document to be relevant, without taking into account the possibility of weakly correlated pairs. Thus, this paper proposes query-as-context pre-training, a simple yet effective pre-training technique to alleviate the issue. Query-as-context pre-training assumes that the query derived from a passage is more likely to be relevant to that passage and forms a passage-query pair. These passage-query pairs are then used in contrastive or generative context-supervised pre-training. The pre-trained models are evaluated on large-scale passage retrieval benchmarks and out-of-domain zero-shot benchmarks. Experimental results show that query-as-context pre-training brings considerable gains and meanwhile speeds up training, demonstrating its effectiveness and efficiency. Our code will be available at https://github.com/caskcsg/ir/tree/main/cotmae-qc ., Comment: EMNLP 2023 Main Conference
Published: 2022

35. Propagation Structure-Semantic Transfer Learning for Robust Fake News Detection

Author: Chen, Mengyang, Wei, Lingwei, Cao, Han, Zhou, Wei, Yan, Zhou, Hu, Songlin, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Bifet, Albert, editor, Davis, Jesse, editor, Krilavičius, Tomas, editor, Kull, Meelis, editor, Ntoutsi, Eirini, editor, and Žliobaitė, Indrė, editor
Published: 2024
Full Text: View/download PDF

36. Quartet: A Holistic Hybrid Parallel Framework for Training Large Language Models

Author: Zhang, Weigang, Zhou, Biyu, Wu, Xing, Gao, Chaochen, Liu, Zhibing, Tang, Xuehai, Li, Ruixuan, Han, Jizhong, Hu, Songlin, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Carretero, Jesus, editor, Shende, Sameer, editor, Garcia-Blas, Javier, editor, Brandic, Ivona, editor, Olcoz, Katzalin, editor, and Schreiber, Martin, editor
Published: 2024
Full Text: View/download PDF

37. MetaPETR: An Effective Model for Handling Class-Imbalanced Data About Event Temporal Relations

Author: Zhang, Xiaobin, Zang, Liangjun, Liu, Qianwen, Wei, Shuchong, Hu, Songlin, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Huang, De-Shuang, editor, Zhang, Xiankun, editor, and Zhang, Qinhu, editor
Published: 2024
Full Text: View/download PDF

38. A Multi-source Domain Adaption Approach to Minority Disk Failure Prediction

Author: Wang, Wang, Tang, Xuehai, Zhou, Biyu, Dong, Yangchen, Feng, Yuanhang, Han, Jizhong, Hu, Songlin, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Tari, Zahir, editor, Li, Keqiu, editor, and Wu, Hongyi, editor
Published: 2024
Full Text: View/download PDF

39. Supervised Prototypical Contrastive Learning for Emotion Recognition in Conversation

Author: Song, Xiaohui, Huang, Longtao, Xue, Hui, and Hu, Songlin
Subjects: Computer Science - Artificial Intelligence
Abstract: Capturing emotions within a conversation plays an essential role in modern dialogue systems. However, the weak correlation between emotions and semantics brings many challenges to emotion recognition in conversation (ERC). Even for semantically similar utterances, the emotion may vary drastically depending on contexts or speakers. In this paper, we propose a Supervised Prototypical Contrastive Learning (SPCL) loss for the ERC task. Leveraging the Prototypical Network, the SPCL aims to solve the imbalanced classification problem through contrastive learning and does not require a large batch size. Meanwhile, we design a difficulty measure function based on the distance between classes and introduce curriculum learning to alleviate the impact of extreme samples. We achieve state-of-the-art results on three widely used benchmarks. Further, we conduct analytical experiments to demonstrate the effectiveness of our proposed SPCL and curriculum learning strategy. We release the code at https://github.com/caskcsg/SPCL., Comment: Accepted by EMNLP 2022
Published: 2022

40. RaP: Redundancy-aware Video-language Pre-training for Text-Video Retrieval

Author: Wu, Xing, Gao, Chaochen, Lin, Zijia, Wang, Zhongyuan, Han, Jizhong, and Hu, Songlin
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Video language pre-training methods have mainly adopted sparse sampling techniques to alleviate the temporal redundancy of videos. Though effective, sparse sampling still suffers inter-modal redundancy: visual redundancy and textual redundancy. Compared with highly generalized text, sparsely sampled frames usually contain text-independent portions, called visual redundancy. Sparse sampling is also likely to miss important frames corresponding to some text portions, resulting in textual redundancy. Inter-modal redundancy leads to a mismatch of video and text information, hindering the model from better learning the shared semantics across modalities. To alleviate it, we propose Redundancy-aware Video-language Pre-training. We design a redundancy measurement of video patches and text tokens by calculating the cross-modal minimum dissimilarity. Then, we penalize the high-redundant video patches and text tokens through a proposed redundancy-aware contrastive learning. We evaluate our method on four benchmark datasets, MSRVTT, MSVD, DiDeMo, and LSMDC, achieving a significant improvement over the previous state-of-the-art results. Our code is available at https://github.com/caskcsg/VLP/tree/main/RaP., Comment: EMNLP 2022
Published: 2022

41. InfoCSE: Information-aggregated Contrastive Learning of Sentence Embeddings

Author: Wu, Xing, Gao, Chaochen, Lin, Zijia, Han, Jizhong, Wang, Zhongyuan, and Hu, Songlin
Subjects: Computer Science - Computation and Language
Abstract: Contrastive learning has been extensively studied in sentence embedding learning, which assumes that the embeddings of different views of the same sentence are closer. The constraint brought by this assumption is weak, and a good sentence representation should also be able to reconstruct the original sentence fragments. Therefore, this paper proposes an information-aggregated contrastive learning framework for learning unsupervised sentence embeddings, termed InfoCSE. InfoCSE forces the representation of [CLS] positions to aggregate denser sentence information by introducing an additional masked language model task and a well-designed network. We evaluate the proposed InfoCSE on several benchmark datasets w.r.t. the semantic text similarity (STS) task. Experimental results show that InfoCSE outperforms SimCSE by an average Spearman correlation of 2.60% on BERT-base, and 1.77% on BERT-large, achieving state-of-the-art results among unsupervised sentence representation learning methods. Our code is available at https://github.com/caskcsg/sentemb/tree/main/InfoCSE., Comment: EMNLP 2022
Published: 2022

42. ConTextual Masked Auto-Encoder for Dense Passage Retrieval

Author: Wu, Xing, Ma, Guangyuan, Lin, Meng, Lin, Zijia, Wang, Zhongyuan, and Hu, Songlin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Dense passage retrieval aims to retrieve the relevant passages of a query from a large corpus based on dense representations (i.e., vectors) of the query and the passages. Recent studies have explored improving pre-trained language models to boost dense retrieval performance. This paper proposes CoT-MAE (ConTextual Masked Auto-Encoder), a simple yet effective generative pre-training method for dense passage retrieval. CoT-MAE employs an asymmetric encoder-decoder architecture that learns to compress the sentence semantics into a dense vector through self-supervised and context-supervised masked auto-encoding. Precisely, self-supervised masked auto-encoding learns to model the semantics of the tokens inside a text span, and context-supervised masked auto-encoding learns to model the semantical correlation between the text spans. We conduct experiments on large-scale passage retrieval benchmarks and show considerable improvements over strong baselines, demonstrating the high efficiency of CoT-MAE. Our code is available at https://github.com/caskcsg/ir/tree/main/cotmae., Comment: This paper has been accepted by AAAI2023
Published: 2022

43. Speaker-Guided Encoder-Decoder Framework for Emotion Recognition in Conversation

Author: Bao, Yinan, Ma, Qianwen, Wei, Lingwei, Zhou, Wei, and Hu, Songlin
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: The emotion recognition in conversation (ERC) task aims to predict the emotion label of an utterance in a conversation. Since the dependencies between speakers are complex and dynamic, consisting of intra- and inter-speaker dependencies, the modeling of speaker-specific information plays a vital role in ERC. Although existing researchers have proposed various methods of speaker interaction modeling, they cannot explore dynamic intra- and inter-speaker dependencies jointly, leading to the insufficient comprehension of context and further hindering emotion prediction. To this end, we design a novel speaker modeling scheme that explores intra- and inter-speaker dependencies jointly in a dynamic manner. Besides, we propose a Speaker-Guided Encoder-Decoder (SGED) framework for ERC, which fully exploits speaker information for the decoding of emotion. We use different existing methods as the conversational context encoder of our framework, showing the high scalability and flexibility of the proposed framework. Experimental results demonstrate the superiority and effectiveness of SGED., Comment: Accepted by IJCAI-ECAI 2022
Published: 2022

44. A Multi-source Domain Adaption Approach to Minority Disk Failure Prediction

Author: Wang, Wang, primary, Tang, Xuehai, additional, Zhou, Biyu, additional, Dong, Yangchen, additional, Feng, Yuanhang, additional, Han, Jizhong, additional, and Hu, Songlin, additional
Published: 2024
Full Text: View/download PDF

45. Multi-Granularity Semantic Aware Graph Model for Reducing Position Bias in Emotion-Cause Pair Extraction

Author: Bao, Yinan, Ma, Qianwen, Wei, Lingwei, Zhou, Wei, and Hu, Songlin
Subjects: Computer Science - Computation and Language
Abstract: The Emotion-Cause Pair Extraction (ECPE) task aims to extract emotions and causes as pairs from documents. We observe that the relative distance distribution of emotions and causes is extremely imbalanced in the typical ECPE dataset. Existing methods have set a fixed size window to capture relations between neighboring clauses. However, they neglect the effective semantic connections between distant clauses, leading to poor generalization ability towards position-insensitive data. To alleviate the problem, we propose a novel Multi-Granularity Semantic Aware Graph model (MGSAG) to incorporate fine-grained and coarse-grained semantic features jointly, without regard to distance limitation. In particular, we first explore semantic dependencies between clauses and keywords extracted from the document that convey fine-grained semantic features, obtaining keywords enhanced clause representations. Besides, a clause graph is also established to model coarse-grained semantic relations between clauses. Experimental results indicate that MGSAG surpasses the existing state-of-the-art ECPE models. Especially, MGSAG outperforms other models significantly in the condition of position-insensitive data., Comment: Accepted by the Findings of ACL 2022
Published: 2022

46. HELoC: Hierarchical Contrastive Learning of Source Code Representation

Author: Wang, Xiao, Wu, Qiong, Zhang, Hongyu, Lyu, Chen, Jiang, Xue, Zheng, Zhuoran, Lyu, Lei, and Hu, Songlin
Subjects: Computer Science - Software Engineering
Abstract: Abstract syntax trees (ASTs) play a crucial role in source code representation. However, due to the large number of nodes in an AST and the typically deep AST hierarchy, it is challenging to learn the hierarchical structure of an AST effectively. In this paper, we propose HELoC, a hierarchical contrastive learning model for source code representation. To effectively learn the AST hierarchy, we use contrastive learning to allow the network to predict the AST node level and learn the hierarchical relationships between nodes in a self-supervised manner, which makes the representation vectors of nodes with greater differences in AST levels farther apart in the embedding space. By using such vectors, the structural similarities between code snippets can be measured more precisely. In the learning process, a novel GNN (called Residual Self-attention Graph Neural Network, RSGNN) is designed, which enables HELoC to focus on embedding the local structure of an AST while capturing its overall structure. HELoC is self-supervised and can be applied to many source code related downstream tasks such as code classification, code clone detection, and code clustering after pre-training. Our extensive experiments demonstrate that HELoC outperforms the state-of-the-art source code representation models., Comment: Accepted by ICPC 2022
Published: 2022

47. Event-based output consensus of heterogeneous MASs with nonuniform delays and sequential scaling attacks

Author: Li, Gen, Yin, Xiuxia, and Hu, Songlin
Published: 2024
Full Text: View/download PDF

48. Text Smoothing: Enhance Various Data Augmentation Methods on Text Classification Tasks

Author: Wu, Xing, Gao, Chaochen, Lin, Meng, Zang, Liangjun, Wang, Zhongyuan, and Hu, Songlin
Subjects: Computer Science - Computation and Language
Abstract: Before entering the neural network, a token is generally converted to the corresponding one-hot representation, which is a discrete distribution of the vocabulary. Smoothed representation is the probability of candidate tokens obtained from a pre-trained masked language model, which can be seen as a more informative substitution to the one-hot representation. We propose an efficient data augmentation method, termed text smoothing, by converting a sentence from its one-hot representation to a controllable smoothed representation. We evaluate text smoothing on different benchmarks in a low-resource regime. Experimental results show that text smoothing outperforms various mainstream data augmentation methods by a substantial margin. Moreover, text smoothing can be combined with those data augmentation methods to achieve better performance., Comment: ACL 2022 Main Conference Accepted
Published: 2022

49. DistilCSE: Effective Knowledge Distillation For Contrastive Sentence Embeddings

Author: Gao, Chaochen, Wu, Xing, Wang, Peng, Wang, Jue, Zang, Liangjun, Wang, Zhongyuan, and Hu, Songlin
Subjects: Computer Science - Artificial Intelligence
Abstract: Large-scale contrastive learning models can learn very informative sentence embeddings, but are hard to serve online due to the huge model size. Therefore, they often play the role of "teacher", transferring abilities to small "student" models through knowledge distillation. However, knowledge distillation inevitably brings some drop in embedding effect. To tackle that, we propose an effective knowledge distillation framework for contrastive sentence embeddings, termed DistilCSE. It first applies knowledge distillation on a large amount of unlabeled data, and then fine-tunes student models through contrastive learning on limited labeled data. To achieve better distillation results, we further propose Contrastive Knowledge Distillation (CKD). CKD uses InfoNCE as the loss function in knowledge distillation, enhancing the objective consistency among teacher model training, knowledge distillation, and student model fine-tuning. Extensive experiments show that student models trained with the proposed DistilCSE and CKD suffer from little or even no performance decrease and consistently outperform the corresponding counterparts of the same parameter size. Impressively, our 110M student model outperforms the latest state-of-the-art model, i.e., Sentence-T5 (11B), with only 1% parameters and 0.25% unlabeled data., Comment: Work in progress
Published: 2021

50. Discovery of membrane-targeting amphiphilic honokiol derivatives containing an oxazolethione moiety to combat methicillin-resistant Staphylococcus aureus (MRSA) infections

Author: Yang, Ruige, Cui, Liping, Xu, Ting, Zhong, Yan, Hu, Songlin, Liu, Jifeng, Qin, Shangshang, Wang, Xiaoliu, and Guo, Yong
Published: 2024
Full Text: View/download PDF
