Author: "Quan, Xiaojun" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Quan, Xiaojun"' showing total 398 results

1. FuseChat: Knowledge Fusion of Chat Models

Author: Wan, Fanqi, Zhong, Longguang, Yang, Ziyi, Chen, Ruijun, and Quan, Xiaojun
Subjects: Computer Science - Computation and Language
Abstract: While training large language models (LLMs) from scratch can indeed lead to models with distinct capabilities and strengths, it incurs substantial costs and may lead to redundancy in competencies. Knowledge fusion aims to integrate existing LLMs of diverse architectures and capabilities into a more potent LLM through lightweight continual training, thereby reducing the need for costly LLM development. In this work, we propose a new framework for the knowledge fusion of chat LLMs through two main stages, resulting in FuseChat. Firstly, we conduct pairwise knowledge fusion on source chat LLMs of varying structures and scales to create multiple target LLMs with identical structure and size via lightweight fine-tuning. During this process, a statistics-based token alignment approach is introduced as the cornerstone for fusing LLMs with different structures. Secondly, we merge these target LLMs within the parameter space, where we propose a novel method for determining the merging coefficients based on the magnitude of parameter updates before and after fine-tuning. We implement and validate FuseChat using six prominent chat LLMs with diverse architectures and scales, including OpenChat-3.5-7B, Starling-LM-7B-alpha, NH2-SOLAR-10.7B, InternLM2-Chat-20B, Mixtral-8x7B-Instruct, and Qwen-1.5-Chat-72B. Experimental results on two instruction-following benchmarks, AlpacaEval 2.0 and MT-Bench, demonstrate the superiority of FuseChat-7B over baselines of various sizes. Our model is even comparable to the larger Mixtral-8x7B-Instruct and approaches GPT-3.5-Turbo-1106 on MT-Bench. Our code, model weights, and data are public at https://github.com/fanqiwan/FuseAI., Comment: Work in progress
Published: 2024

2. ProFuser: Progressive Fusion of Large Language Models

Author: Shi, Tianyuan, Wan, Fanqi, Huang, Canbin, Quan, Xiaojun, Li, Chenliang, Yan, Ming, and Zhang, Ji
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: While fusing the capacities and advantages of various large language models (LLMs) offers a pathway to construct more powerful and versatile models, a fundamental challenge is to properly select the advantageous model during training. Existing fusion methods primarily focus on the training mode that uses cross entropy on ground truth in a teacher-forcing setup to measure a model's advantage, which may provide limited insight into model advantage. In this paper, we introduce a novel approach that enhances the fusion process by incorporating both the training and inference modes. Our method evaluates model advantage not only through cross entropy during training but also by considering inference outputs, providing a more comprehensive assessment. To combine the two modes effectively, we introduce ProFuser to progressively transition from inference mode to training mode. To validate ProFuser's effectiveness, we fused three models, including vicuna-7b-v1.5, Llama-2-7b-chat, and mpt-7b-8k-chat, and demonstrated improved performance in knowledge, reasoning, and safety compared to baseline methods.
Published: 2024

3. Cool-Fusion: Fuse Large Language Models without Training

Author: Liu, Cong, Quan, Xiaojun, Pan, Yan, Lin, Liang, Wu, Weigang, and Chen, Xu
Subjects: Computer Science - Computation and Language
Abstract: We focus on the problem of fusing two or more heterogeneous large language models (LLMs) to facilitate their complementary strengths. One of the challenges of model fusion is the high computational load, i.e., fine-tuning or aligning vocabularies via combinatorial optimization. To this end, we propose Cool-Fusion, a simple yet effective approach that fuses the knowledge of heterogeneous source LLMs to leverage their complementary strengths. Cool-Fusion is the first method that does not require any type of training like the ensemble approaches. But unlike ensemble methods, it is applicable to any set of source LLMs that have different vocabularies. The basic idea is to have each source LLM individually generate tokens until the tokens can be decoded into a text segment that ends at word boundaries common to all source LLMs. Then, the source LLMs jointly rerank the generated text segments and select the best one, which is the fused text generation in one step. Extensive experiments are conducted across a variety of benchmark datasets. On GSM8K, Cool-Fusion increases accuracy from three strong source LLMs by a significant 8%-17.8%.
Published: 2024

4. Self-Evolution Fine-Tuning for Policy Optimization

Author: Chen, Ruijun, Liang, Jiehao, Gao, Shiping, Wan, Fanqi, and Quan, Xiaojun
Subjects: Computer Science - Computation and Language
Abstract: The alignment of large language models (LLMs) is crucial not only for unlocking their potential in specific tasks but also for ensuring that responses meet human expectations and adhere to safety and ethical principles. Current alignment methodologies face considerable challenges. For instance, supervised fine-tuning (SFT) requires extensive, high-quality annotated samples, while reinforcement learning from human feedback (RLHF) is complex and often unstable. In this paper, we introduce self-evolution fine-tuning (SEFT) for policy optimization, with the aim of eliminating the need for annotated samples while retaining the stability and efficiency of SFT. SEFT first trains an adaptive reviser to elevate low-quality responses while maintaining high-quality ones. The reviser then gradually guides the policy's optimization by fine-tuning it with enhanced responses. One of the prominent features of this method is its ability to leverage unlimited amounts of unannotated data for policy optimization through supervised fine-tuning. Our experiments on AlpacaEval 2.0 and MT-Bench demonstrate the effectiveness of SEFT. We also provide a comprehensive analysis of its advantages over existing alignment techniques.
Published: 2024

5. BlockPruner: Fine-grained Pruning for Large Language Models

Author: Zhong, Longguang, Wan, Fanqi, Chen, Ruijun, Quan, Xiaojun, and Li, Liangzhi
Subjects: Computer Science - Computation and Language
Abstract: With the rapid growth in the size and complexity of large language models (LLMs), the costs associated with their training and inference have escalated significantly. Research indicates that certain layers in LLMs harbor substantial redundancy, and pruning these layers has minimal impact on the overall performance. While various layer pruning methods have been developed based on this insight, they generally overlook the finer-grained redundancies within the layers themselves. In this paper, we delve deeper into the architecture of LLMs and demonstrate that finer-grained pruning can be achieved by targeting redundancies in multi-head attention (MHA) and multi-layer perceptron (MLP) blocks. We propose a novel, training-free structured pruning approach called BlockPruner. Unlike existing layer pruning methods, BlockPruner segments each Transformer layer into MHA and MLP blocks. It then assesses the importance of these blocks using perplexity measures and applies a heuristic search for iterative pruning. We applied BlockPruner to LLMs of various sizes and architectures and validated its performance across a wide range of downstream tasks. Experimental results show that BlockPruner achieves more granular and effective pruning compared to state-of-the-art baselines.
Published: 2024

6. SocialBench: Sociality Evaluation of Role-Playing Conversational Agents

Author: Chen, Hongzhan, Chen, Hehong, Yan, Ming, Xu, Wenshen, Gao, Xing, Shen, Weizhou, Quan, Xiaojun, Li, Chenliang, Zhang, Ji, Huang, Fei, and Zhou, Jingren
Subjects: Computer Science - Computation and Language
Abstract: Large language models (LLMs) have advanced the development of various AI conversational agents, including role-playing conversational agents that mimic diverse characters and human behaviors. While prior research has predominantly focused on enhancing the conversational capability, role-specific knowledge, and stylistic attributes of these agents, there has been a noticeable gap in assessing their social intelligence. In this paper, we introduce SocialBench, the first benchmark designed to systematically evaluate the sociality of role-playing conversational agents at both individual and group levels of social interactions. The benchmark is constructed from a variety of sources and covers 500 characters, over 6,000 question prompts, and 30,800 multi-turn role-playing utterances. We conduct comprehensive evaluations on this benchmark using mainstream open-source and closed-source LLMs. We find that agents excelling at the individual level are not necessarily proficient at the group level. Moreover, the behavior of individuals may drift as a result of the influence exerted by other agents within the group. Experimental results on SocialBench confirm its significance as a testbed for assessing the social interaction of role-playing conversational agents. The benchmark is publicly accessible at https://github.com/X-PLUG/SocialBench., Comment: ACL 2024 Findings
Published: 2024

7. Knowledge Fusion of Chat LLMs: A Preliminary Technical Report

Author: Wan, Fanqi, Yang, Ziyi, Zhong, Longguang, Quan, Xiaojun, Huang, Xinting, and Bi, Wei
Subjects: Computer Science - Computation and Language
Abstract: Recently, FuseLLM introduced the concept of knowledge fusion to transfer the collective knowledge of multiple structurally varied LLMs into a target LLM through lightweight continual training. In this report, we extend the scalability and flexibility of the FuseLLM framework to realize the fusion of chat LLMs, resulting in FusionChat. FusionChat comprises two main stages. Firstly, we undertake knowledge fusion for structurally and scale-varied source LLMs to derive multiple target LLMs of identical structure and size via lightweight fine-tuning. Then, these target LLMs are merged within the parameter space, wherein we propose a novel method for determining the merging weights based on the variation ratio of parameter matrices before and after fine-tuning. We validate our approach using three prominent chat LLMs with diverse architectures and scales, namely NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B. Experimental results spanning various chat domains demonstrate the superiority of FusionChat-7B across a broad spectrum of chat LLMs at 7B and 34B scales, even surpassing GPT-3.5 (March) and approaching Mixtral-8x7B-Instruct., Comment: Technical Report, work in progress
Published: 2024

8. Alirector: Alignment-Enhanced Chinese Grammatical Error Corrector

Author: Yang, Haihui and Quan, Xiaojun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Chinese grammatical error correction (CGEC) faces serious overcorrection challenges when employing autoregressive generative models such as sequence-to-sequence (Seq2Seq) models and decoder-only large language models (LLMs). While previous methods aim to address overcorrection in Seq2Seq models, they are difficult to adapt to decoder-only LLMs. In this paper, we propose an alignment-enhanced corrector for the overcorrection problem that applies to both Seq2Seq models and decoder-only LLMs. Our method first trains a correction model to generate an initial correction of the source sentence. Then, we combine the source sentence with the initial correction and feed it through an alignment model for another round of correction, aiming to enforce the alignment model to focus on potential overcorrection. Moreover, to enhance the model's ability to identify nuances, we further explore the reverse alignment of the source sentence and the initial correction. Finally, we transfer the alignment knowledge from two alignment models to the correction model, instructing it on how to avoid overcorrection. Experimental results on three CGEC datasets demonstrate the effectiveness of our approach in alleviating overcorrection and improving overall performance. Our code has been made publicly available., Comment: Accepted to Findings of ACL 2024
Published: 2024

9. Knowledge Verification to Nip Hallucination in the Bud

Author: Wan, Fanqi, Huang, Xinting, Cui, Leyang, Quan, Xiaojun, Bi, Wei, and Shi, Shuming
Subjects: Computer Science - Computation and Language
Abstract: While large language models (LLMs) have demonstrated exceptional performance across various tasks following human alignment, they may still generate responses that sound plausible but contradict factual knowledge, a phenomenon known as hallucination. In this paper, we demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external knowledge present in the alignment data and the intrinsic knowledge embedded within foundation LLMs. Specifically, we propose a novel approach called Knowledge Consistent Alignment (KCA), which employs a well-aligned LLM to automatically formulate assessments based on external knowledge to evaluate the knowledge boundaries of foundation LLMs. To address knowledge inconsistencies in the alignment data, KCA implements several specific strategies to deal with these data instances. We demonstrate the superior efficacy of KCA in reducing hallucinations across six benchmarks, utilizing foundation LLMs of varying backbones and scales. This confirms the effectiveness of mitigating hallucinations by reducing knowledge inconsistency. Our code, model weights, and data are openly accessible at https://github.com/fanqiwan/KCA., Comment: Accepted to EMNLP 2024 (Main Conference)
Published: 2024

10. Knowledge Fusion of Large Language Models

Author: Wan, Fanqi, Huang, Xinting, Cai, Deng, Quan, Xiaojun, Bi, Wei, and Shi, Shuming
Subjects: Computer Science - Computation and Language
Abstract: While training large language models (LLMs) from scratch can generate models with distinct functionalities and strengths, it comes at significant costs and may result in redundant capabilities. Alternatively, a cost-effective and compelling approach is to merge existing pre-trained LLMs into a more potent model. However, due to the varying architectures of these LLMs, directly blending their weights is impractical. In this paper, we introduce the notion of knowledge fusion for LLMs, aimed at combining the capabilities of existing LLMs and transferring them into a single LLM. By leveraging the generative distributions of source LLMs, we externalize their collective knowledge and unique strengths, thereby potentially elevating the capabilities of the target model beyond those of any individual source LLM. We validate our approach using three popular LLMs with different architectures (Llama-2, MPT, and OpenLLaMA) across various benchmarks and tasks. Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation. Our code, model weights, and data are public at https://github.com/fanqiwan/FuseLLM., Comment: Accepted to ICLR 2024
Published: 2024

11. Small LLMs Are Weak Tool Learners: A Multi-LLM Agent

Author: Shen, Weizhou, Li, Chenliang, Chen, Hongzhan, Yan, Ming, Quan, Xiaojun, Chen, Hehong, Zhang, Ji, and Huang, Fei
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs, empowering them to interact with external tools (e.g., APIs, functions) and complete various tasks in a self-directed fashion. The challenge of tool use demands that LLMs not only understand user queries and generate answers accurately but also excel in task planning, tool invocation, and result summarization. While traditional works focus on training a single LLM with all these capabilities, performance limitations become apparent, particularly with smaller models. To overcome these challenges, we propose a novel approach that decomposes the aforementioned capabilities into a planner, caller, and summarizer. Each component is implemented by a single LLM that focuses on a specific capability and collaborates with others to accomplish the task. This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability. To effectively train this framework, we introduce a two-stage training paradigm. First, we fine-tune a backbone LLM on the entire dataset without discriminating sub-tasks, providing the model with a comprehensive understanding of the task. Second, the fine-tuned LLM is used to instantiate the planner, caller, and summarizer respectively, which are continually fine-tuned on respective sub-tasks. Evaluation across various tool-use benchmarks illustrates that our proposed multi-LLM framework surpasses the traditional single-LLM approach, highlighting its efficacy and advantages in tool learning., Comment: In progress, GitHub repo: https://github.com/X-PLUG/Multi-LLM-Agent
Published: 2024

12. PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for Personality Detection

Author: Yang, Tao, Shi, Tianyuan, Wan, Fanqi, Quan, Xiaojun, Wang, Qifan, Wu, Bingzhe, and Wu, Jiaxiang
Subjects: Computer Science - Computation and Language
Abstract: Recent advances in large language models (LLMs), such as ChatGPT, have showcased remarkable zero-shot performance across various NLP tasks. However, the potential of LLMs in personality detection, which involves identifying an individual's personality from their written texts, remains largely unexplored. Drawing inspiration from psychological questionnaires, which are carefully designed by psychologists to evaluate individual personality traits through a series of targeted items, we argue that these items can be regarded as a collection of well-structured chain-of-thought (CoT) processes. By incorporating these processes, LLMs can enhance their capabilities to make more reasonable inferences on personality from textual input. In light of this, we propose a novel personality detection method, called PsyCoT, which mimics the way individuals complete psychological questionnaires in a multi-turn dialogue manner. In particular, we employ an LLM as an AI assistant with a specialization in text analysis. We prompt the assistant to rate individual items at each turn and leverage the historical rating results to derive a conclusive personality preference. Our experiments demonstrate that PsyCoT significantly improves the performance and robustness of GPT-3.5 in personality detection, achieving an average F1 score improvement of 4.23/10.63 points on two benchmark datasets compared to the standard prompting method. Our code is available at https://github.com/TaoYang225/PsyCoT., Comment: Accepted to Findings of EMNLP 2023
Published: 2023

13. MCC-KD: Multi-CoT Consistent Knowledge Distillation

Author: Chen, Hongzhan, Wu, Siyue, Quan, Xiaojun, Wang, Rui, Yan, Ming, and Zhang, Ji
Subjects: Computer Science - Computation and Language
Abstract: Large language models (LLMs) have showcased remarkable capabilities in complex reasoning through chain of thought (CoT) prompting. Recently, there has been a growing interest in transferring these reasoning abilities from LLMs to smaller models. However, achieving both the diversity and consistency in rationales presents a challenge. In this paper, we focus on enhancing these two aspects and propose Multi-CoT Consistent Knowledge Distillation (MCC-KD) to efficiently distill the reasoning capabilities. In MCC-KD, we generate multiple rationales for each question and enforce consistency among the corresponding predictions by minimizing the bidirectional KL-divergence between the answer distributions. We investigate the effectiveness of MCC-KD with different model architectures (LLaMA/FlanT5) and various model scales (3B/7B/11B/13B) on both mathematical reasoning and commonsense reasoning benchmarks. The empirical results not only confirm MCC-KD's superior performance on in-distribution datasets but also highlight its robust generalization ability on out-of-distribution datasets., Comment: Accepted to EMNLP 2023
Published: 2023

14. Dual-Feedback Knowledge Retrieval for Task-Oriented Dialogue Systems

Author: Shi, Tianyuan, Li, Liangzhi, Lin, Zijian, Yang, Tao, Quan, Xiaojun, and Wang, Qifan
Subjects: Computer Science - Computation and Language
Abstract: Efficient knowledge retrieval plays a pivotal role in ensuring the success of end-to-end task-oriented dialogue systems by facilitating the selection of relevant information necessary to fulfill user requests. However, current approaches generally integrate knowledge retrieval and response generation, which poses scalability challenges when dealing with extensive knowledge bases. Taking inspiration from open-domain question answering, we propose a retriever-generator architecture that harnesses a retriever to retrieve pertinent knowledge and a generator to generate system responses. Due to the lack of retriever training labels, we propose relying on feedback from the generator as pseudo-labels to train the retriever. To achieve this, we introduce a dual-feedback mechanism that generates both positive and negative feedback based on the output of the generator. Our method demonstrates superior performance in task-oriented dialogue tasks, as evidenced by experimental results on three benchmark datasets., Comment: Accepted to EMNLP 2023 (Main Conference)
Published: 2023

15. Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration

Author: Wan, Fanqi, Huang, Xinting, Yang, Tao, Quan, Xiaojun, Bi, Wei, and Shi, Shuming
Subjects: Computer Science - Computation and Language
Abstract: Instruction-tuning can be substantially optimized through enhanced diversity, resulting in models capable of handling a broader spectrum of tasks. However, existing data employed for such tuning often exhibit an inadequate coverage of individual domains, limiting the scope for nuanced comprehension and interactions within these areas. To address this deficiency, we propose Explore-Instruct, a novel approach to enhance the data coverage to be used in domain-specific instruction-tuning through active exploration via Large Language Models (LLMs). Built upon representative domain use cases, Explore-Instruct explores a multitude of variations or possibilities by implementing a search algorithm to obtain diversified and domain-focused instruction-tuning data. Our data-centric analysis validates the effectiveness of this proposed approach in improving domain-specific instruction coverage. Moreover, our model's performance demonstrates considerable advancements over multiple baselines, including those utilizing domain-specific data enhancement. Our findings offer a promising opportunity to improve instruction coverage, especially in domain-specific contexts, thereby advancing the development of adaptable language models. Our code, model weights, and data are public at https://github.com/fanqiwan/Explore-Instruct., Comment: Accepted to EMNLP 2023 (Main Conference)
Published: 2023

16. Retrieval-Generation Alignment for End-to-End Task-Oriented Dialogue System

Author: Shen, Weizhou, Gao, Yingqi, Huang, Canbin, Wan, Fanqi, Quan, Xiaojun, and Bi, Wei
Subjects: Computer Science - Computation and Language
Abstract: Developing an efficient retriever to retrieve knowledge from a large-scale knowledge base (KB) is critical for task-oriented dialogue systems to effectively handle localized and specialized tasks. However, widely used generative models such as T5 and ChatGPT often struggle to differentiate subtle differences among the retrieved KB records when generating responses, resulting in suboptimal quality of generated responses. In this paper, we propose the application of maximal marginal likelihood to train a perceptive retriever by utilizing signals from response generation for supervision. In addition, our approach goes beyond considering solely retrieved entities and incorporates various meta knowledge to guide the generator, thus improving the utilization of knowledge. We evaluate our approach on three task-oriented dialogue datasets using T5 and ChatGPT as the backbone models. The results demonstrate that when combined with meta knowledge, the response generator can effectively leverage high-quality knowledge records from the retriever and enhance the quality of generated responses. The codes and models of this paper are available at https://github.com/shenwzh3/MK-TOD., Comment: Accepted to EMNLP 2023 Main Conference
Published: 2023

17. Dual-aligned porous electrodes for enhanced hydrogen evolution in alkaline water electrolysis

Author: Zhang, Yuqi, Cui, Wenzhi, Li, Longjian, Wang, Chongbo, Zhan, Chen, and Quan, Xiaojun
Published: 2024

18. Disentangled Phonetic Representation for Chinese Spelling Correction

Author: Liang, Zihong, Quan, Xiaojun, and Wang, Qifan
Subjects: Computer Science - Computation and Language
Abstract: Chinese Spelling Correction (CSC) aims to detect and correct erroneous characters in Chinese texts. Although efforts have been made to introduce phonetic information (Hanyu Pinyin) in this task, they typically merge phonetic representations with character representations, which tends to weaken the representation effect of normal texts. In this work, we propose to disentangle the two types of features to allow for direct interaction between textual and phonetic information. To learn useful phonetic representations, we introduce a pinyin-to-character objective to ask the model to predict the correct characters based solely on phonetic information, where a separation mask is imposed to disable attention from phonetic input to text. To avoid overfitting the phonetics, we further design a self-distillation module to ensure that semantic information plays a major role in the prediction. Extensive experiments on three CSC benchmarks demonstrate the superiority of our method in using phonetic information., Comment: Accepted to ACL 2023 Main Conference
Published: 2023

19. Multi-Grained Knowledge Retrieval for End-to-End Task-Oriented Dialog

Author: Wan, Fanqi, Shen, Weizhou, Yang, Ke, Quan, Xiaojun, and Bi, Wei
Subjects: Computer Science - Computation and Language
Abstract: Retrieving proper domain knowledge from an external database lies at the heart of end-to-end task-oriented dialog systems to generate informative responses. Most existing systems blend knowledge retrieval with response generation and optimize them with direct supervision from reference responses, leading to suboptimal retrieval performance when the knowledge base becomes large-scale. To address this, we propose to decouple knowledge retrieval from response generation and introduce a multi-grained knowledge retriever (MAKER) that includes an entity selector to search for relevant entities and an attribute selector to filter out irrelevant attributes. To train the retriever, we propose a novel distillation objective that derives supervision signals from the response generator. Experiments conducted on three standard benchmarks with both small and large-scale knowledge bases demonstrate that our retriever performs knowledge retrieval more effectively than existing methods. Our code has been made publicly available at https://github.com/18907305772/MAKER., Comment: Accepted to ACL 2023 (Main Conference)
Published: 2023

20. AD-KD: Attribution-Driven Knowledge Distillation for Language Model Compression

Author: Wu, Siyue, Chen, Hongzhan, Quan, Xiaojun, Wang, Qifan, and Wang, Rui
Subjects: Computer Science - Computation and Language
Abstract: Knowledge distillation has attracted a great deal of interest recently to compress pre-trained language models. However, existing knowledge distillation methods suffer from two limitations. First, the student model simply imitates the teacher's behavior while ignoring the underlying reasoning. Second, these methods usually focus on the transfer of sophisticated model-specific knowledge but overlook data-specific knowledge. In this paper, we present a novel attribution-driven knowledge distillation approach, which explores the token-level rationale behind the teacher model based on Integrated Gradients (IG) and transfers attribution knowledge to the student model. To enhance the knowledge transfer of model reasoning and generalization, we further explore multi-view attribution distillation on all potential decisions of the teacher. Comprehensive experiments are conducted with BERT on the GLUE benchmark. The experimental results demonstrate the superior performance of our approach to several state-of-the-art methods., Comment: Accepted to ACL 2023 Main Conference
Published: 2023

21. Clustering-Aware Negative Sampling for Unsupervised Sentence Representation

Author: Deng, Jinghao, Wan, Fanqi, Yang, Tao, Quan, Xiaojun, and Wang, Rui
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Contrastive learning has been widely studied in sentence representation learning. However, earlier works mainly focus on the construction of positive examples, while in-batch samples are often simply treated as negative examples. This approach overlooks the importance of selecting appropriate negative examples, potentially leading to a scarcity of hard negatives and the inclusion of false negatives. To address these issues, we propose ClusterNS (Clustering-aware Negative Sampling), a novel method that incorporates cluster information into contrastive learning for unsupervised sentence representation learning. We apply a modified K-means clustering algorithm to supply hard negatives and recognize in-batch false negatives during training, aiming to solve the two issues in one unified framework. Experiments on semantic textual similarity (STS) tasks demonstrate that our proposed ClusterNS compares favorably with baselines in unsupervised sentence representation learning. Our code has been made publicly available., Comment: Accepted to Findings of ACL 2023, 16 pages
Published: 2023

22. Generic Dependency Modeling for Multi-Party Conversation

Author: Shen, Weizhou, Quan, Xiaojun, and Yang, Ke
Subjects: Computer Science - Computation and Language
Abstract: To model the dependencies between utterances in multi-party conversations, we propose a simple and generic framework based on the dependency parsing results of utterances. Particularly, we present an approach to encoding the dependencies in the form of relative dependency encoding (ReDE) and illustrate how to implement it in Transformers by modifying the computation of self-attention. Experimental results on four multi-party conversation benchmarks show that this framework successfully boosts the general performance of two Transformer-based language models and leads to comparable or even superior performance compared to the state-of-the-art methods. The codes are available at https://github.com/shenwzh3/ReDE., Comment: Accepted to ICASSP 2023
Published: 2023

23. Orders Are Unwanted: Dynamic Deep Graph Convolutional Network for Personality Detection

Author: Yang, Tao, Deng, Jinghao, Quan, Xiaojun, and Wang, Qifan
Subjects: Computer Science - Computation and Language
Abstract: Predicting personality traits based on online posts has emerged as an important task in many fields such as social network analysis. One of the challenges of this task is assembling information from various posts into an overall profile for each user. While many previous solutions simply concatenate the posts into a long document and then encode the document by sequential or hierarchical models, they introduce unwarranted orders for the posts, which may mislead the models. In this paper, we propose a dynamic deep graph convolutional network (D-DGCN) to overcome the above limitation. Specifically, we design a learn-to-connect approach that adopts a dynamic multi-hop structure instead of a deterministic structure, and combine it with a DGCN module to automatically learn the connections between posts. The modules of post encoder, learn-to-connect, and DGCN are jointly trained in an end-to-end manner. Experimental results on the Kaggle and Pandora datasets show the superior performance of D-DGCN to state-of-the-art baselines. Our code is available at https://github.com/djz233/D-DGCN., Comment: AAAI2023 Camera-ready
Published: 2022

24. AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning

Author: Yang, Tao, Deng, Jinghao, Quan, Xiaojun, Wang, Qifan, and Nie, Shaoliang
Subjects: Computer Science - Computation and Language
Abstract: Fine-tuning large pre-trained language models on downstream tasks is apt to suffer from overfitting when limited training data is available. While dropout proves to be an effective antidote by randomly dropping a proportion of units, existing research has not examined its effect on the self-attention mechanism. In this paper, we investigate this problem through self-attention attribution and find that dropping attention positions with low attribution scores can accelerate training and increase the risk of overfitting. Motivated by this observation, we propose Attribution-Driven Dropout (AD-DROP), which randomly discards some high-attribution positions to encourage the model to make predictions by relying more on low-attribution positions to reduce overfitting. We also develop a cross-tuning strategy to alternate fine-tuning and AD-DROP to avoid dropping high-attribution positions excessively. Extensive experiments on various benchmarks show that AD-DROP yields consistent improvements over baselines. Analysis further confirms that AD-DROP serves as a strategic regularizer to prevent overfitting during fine-tuning., Comment: Accepted to NeurIPS 2022
Published: 2022

25. XPrompt: Exploring the Extreme of Prompt Tuning

Author: Ma, Fang, Zhang, Chen, Ren, Lei, Wang, Jingang, Wang, Qifan, Wu, Wei, Quan, Xiaojun, and Song, Dawei
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Prompt tuning learns soft prompts to condition frozen Pre-trained Language Models (PLMs) for performing downstream tasks in a parameter-efficient manner. While prompt tuning has gradually reached the performance level of fine-tuning as the model scale increases, there is still a large performance gap between prompt tuning and fine-tuning for models of moderate and small scales (typically less than 11B parameters). In this paper, we empirically show that the trained prompt tokens can have a negative impact on a downstream task and thus degrade its performance. To bridge the gap, we propose a novel Prompt tuning model with an eXtremely small scale (XPrompt) under the regime of lottery tickets hypothesis. Specifically, XPrompt eliminates the negative prompt tokens at different granularity levels through a hierarchical structured pruning, yielding a more parameter-efficient prompt yet with a competitive performance. Comprehensive experiments are carried out on SuperGLUE tasks, and the extensive results indicate that XPrompt is able to close the performance gap at smaller model scales., Comment: 15 pages, accepted to EMNLP 2022 main conference
Published: 2022

26. Autoregressive Entity Generation for End-to-End Task-Oriented Dialog

Author: Huang, Guanhuan, Quan, Xiaojun, and Wang, Qifan
Subjects: Computer Science - Computation and Language
Abstract: Task-oriented dialog (TOD) systems often require interaction with an external knowledge base to retrieve necessary entity (e.g., restaurant) information to support the response generation. Most current end-to-end TOD systems either retrieve the KB information explicitly or embed it into model parameters for implicit access. While the former approach demands scanning the KB at each turn of response generation, which is inefficient when the KB scales up, the latter approach shows higher flexibility and efficiency. In either approach, the systems may generate a response with conflicting entity information. To address this issue, we propose to generate the entity autoregressively first and leverage it to guide the response generation in an end-to-end system. To ensure entity consistency, we impose a trie constraint on entity generation. We also introduce a logit concatenation strategy to facilitate gradient backpropagation for end-to-end training. Experiments on MultiWOZ 2.1 single and CAMREST show that our system can generate more high-quality and entity-consistent responses., Comment: Accepted to COLING 2022
Published: 2022

27. UBARv2: Towards Mitigating Exposure Bias in Task-Oriented Dialogs

Author: Yang, Yunyi, Ding, Hong, Liu, Qingyi, and Quan, Xiaojun
Subjects: Computer Science - Computation and Language
Abstract: This paper studies the exposure bias problem in task-oriented dialog systems, where the model's generated content over multiple turns drives the dialog context away from the ground-truth distribution at training time, introducing error propagation and damaging the robustness of the TOD system. To bridge the gap between training and inference for multi-turn task-oriented dialogs, we propose session-level sampling which explicitly exposes the model to sampled generated content of dialog context during training. Additionally, we employ a dropout-based consistency regularization with the masking strategy R-Mask to further improve the robustness and performance of the model. The proposed UBARv2 achieves state-of-the-art performance on the standardized evaluation benchmark MultiWOZ and extensive experiments show the effectiveness of the proposed methods., Comment: 15 pages, 8 figures
Published: 2022

28. Joint Generator-Ranker Learning for Natural Language Generation

Author: Shen, Weizhou, Gong, Yeyun, Shen, Yelong, Wang, Song, Quan, Xiaojun, Duan, Nan, and Chen, Weizhu
Subjects: Computer Science - Computation and Language
Abstract: Generate-then-rank is a widely used mechanism for text generation, where a generator produces multiple text candidates and a ranker chooses the best one among the text candidates. However, existing methods usually train the generator and the ranker individually, neglecting the mutual feedback that could further enhance the generation quality. To tackle this limitation, we propose JGR, a novel joint training algorithm that integrates the generator and the ranker in a single framework. JGR optimizes the generator with a hybrid objective that combines data likelihood and ranker reward, and trains the ranker with a contrastive loss that compares the generator outputs. By iteratively updating the generator and the ranker, JGR can effectively harmonize their learning and enhance their quality jointly. We evaluate JGR on various text generation tasks and demonstrate that it surpasses existing methods on four public datasets across three common generation scenarios. Our code and models are publicly available at https://github.com/microsoft/ProphetNet/tree/master/JGR.
Published: 2022

29. GL-RG: Global-Local Representation Granularity for Video Captioning

Author: Yan, Liqi, Wang, Qifan, Cui, Yiming, Feng, Fuli, Quan, Xiaojun, Zhang, Xiangyu, and Liu, Dongfang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Video captioning is a challenging task as it needs to accurately transform visual understanding into natural language description. To date, state-of-the-art methods inadequately model global-local representation across video frames for caption generation, leaving plenty of room for improvement. In this work, we approach the video captioning task from a new perspective and propose a GL-RG framework for video captioning, namely a Global-Local Representation Granularity. Our GL-RG demonstrates three advantages over the prior efforts: 1) we explicitly exploit extensive visual representations from different video ranges to improve linguistic expression; 2) we devise a novel global-local encoder to produce rich semantic vocabulary to obtain a descriptive granularity of video contents across frames; 3) we develop an incremental training strategy which organizes model learning in an incremental fashion to incur an optimal captioning behavior. Experimental results on the challenging MSR-VTT and MSVD datasets show that our GL-RG outperforms recent state-of-the-art methods by a significant margin. Code is available at https://github.com/ylqi/GL-RG., Comment: Accepted to IJCAI 2022
Published: 2022
Full Text: View/download PDF

30. Deep Partial Multiplex Network Embedding

Author: Wang, Qifan, Fang, Yi, Ravula, Anirudh, He, Ruining, Shen, Bin, Wang, Jingang, Quan, Xiaojun, and Liu, Dongfang
Subjects: Computer Science - Machine Learning, Computer Science - Social and Information Networks
Abstract: Network embedding is an effective technique to learn the low-dimensional representations of nodes in networks. Real-world networks are usually multiplex or have multi-view representations from different relations. Recently, there has been increasing interest in network embedding on multiplex data. However, most existing multiplex approaches assume that the data is complete in all views. In real applications, each view is often missing some data, which results in partial multiplex data. In this paper, we present a novel Deep Partial Multiplex Network Embedding approach to deal with incomplete data. In particular, the network embeddings are learned by simultaneously minimizing the deep reconstruction loss with the autoencoder neural network, enforcing the data consistency across views via common latent subspace learning, and preserving the data topological structure within the same network through graph Laplacian. We further prove the orthogonal invariant property of the learned embeddings and connect our approach with the binary embedding techniques. Experiments on four multiplex benchmarks demonstrate the superior performance of the proposed approach over several state-of-the-art methods on node classification, link prediction and clustering tasks., Comment: Accepted to WWW 2022 GL workshop
Published: 2022

31. Bubble transport characteristic on hydrogen evolution reaction of aligned porous electrode

Author: Zhang, Yuqi, Cui, Wenzhi, Li, Longjian, Wang, Chongbo, Zhan, Chen, Wang, Zhanpeng, and Quan, Xiaojun
Published: 2024
Full Text: View/download PDF

32. WebFormer: The Web-page Transformer for Structure Information Extraction

Author: Wang, Qifan, Fang, Yi, Ravula, Anirudh, Feng, Fuli, Quan, Xiaojun, and Liu, Dongfang
Subjects: Computer Science - Computation and Language
Abstract: Structure information extraction refers to the task of extracting structured text fields from web pages, such as extracting a product offer from a shopping page including product title, description, brand and price. It is an important research topic which has been widely studied in document understanding and web search. Recent natural language models with sequence modeling have demonstrated state-of-the-art performance on web information extraction. However, effectively serializing tokens from unstructured web pages is challenging in practice due to a variety of web layout patterns. Limited work has focused on modeling the web layout for extracting the text fields. In this paper, we introduce WebFormer, a Web-page transFormer model for structure information extraction from web documents. First, we design HTML tokens for each DOM node in the HTML by embedding representations from their neighboring tokens through graph attention. Second, we construct rich attention patterns between HTML tokens and text tokens, which leverages the web layout for effective attention weight computation. We conduct an extensive set of experiments on SWDE and Common Crawl benchmarks. Experimental results demonstrate the superior performance of the proposed approach over several state-of-the-art methods., Comment: Accepted to WWW 2022
Published: 2022

33. Single track deposition of lunar regolith without substrate based on millimeter-sized spot

Author: Shen, Tianrun, Zhang, Hui, Wang, Chao, Zhang, Xian, Yao, Wei, and Quan, Xiaojun
Published: 2024
Full Text: View/download PDF

34. Psycholinguistic Tripartite Graph Network for Personality Detection

Author: Yang, Tao, Yang, Feifan, Ouyang, Haolan, and Quan, Xiaojun
Subjects: Computer Science - Computation and Language
Abstract: Most of the recent work on personality detection from online posts adopts multifarious deep neural networks to represent the posts and builds predictive models in a data-driven manner, without the exploitation of psycholinguistic knowledge that may unveil the connections between one's language usage and their psychological traits. In this paper, we propose a psycholinguistic knowledge-based tripartite graph network, TrigNet, which consists of a tripartite graph network and a BERT-based graph initializer. The graph network injects structural psycholinguistic knowledge from LIWC, a computerized instrument for psycholinguistic analysis, by constructing a heterogeneous tripartite graph. The graph initializer is employed to provide initial embeddings for the graph nodes. To reduce the computational cost in graph learning, we further propose a novel flow graph attention network (GAT) that only transmits messages between neighboring parties in the tripartite graph. Benefiting from the tripartite graph, TrigNet can aggregate post information from a psychological perspective, which is a novel way of exploiting domain knowledge. Extensive experiments on two datasets show that TrigNet outperforms the existing state-of-the-art model by 3.47 and 2.10 points in average F1. Moreover, the flow GAT reduces the FLOPS and Memory measures by 38% and 32%, respectively, in comparison to the original GAT in our setting., Comment: Accepted by ACL 2021
Published: 2021

35. Retrieve & Memorize: Dialog Policy Learning with Multi-Action Memory

Author: Li, Yunhao, Yang, Yunyi, Quan, Xiaojun, and Yu, Jianxing
Subjects: Computer Science - Computation and Language
Abstract: Dialogue policy learning, a subtask that determines the content of system response generation and then the degree of task completion, is essential for task-oriented dialogue systems. However, the unbalanced distribution of system actions in dialogue datasets often causes difficulty in learning to generate desired actions and responses. In this paper, we propose a retrieve-and-memorize framework to enhance the learning of system actions. Specifically, we first design a neural context-aware retrieval module to retrieve multiple candidate system actions from the training set given a dialogue context. Then, we propose a memory-augmented multi-decoder network to generate the system actions conditioned on the candidate actions, which allows the network to adaptively select key information in the candidate actions and ignore noise. We conduct experiments on the large-scale multi-domain task-oriented dialogue datasets MultiWOZ 2.0 and MultiWOZ 2.1. Experimental results show that our method achieves competitive performance among several state-of-the-art models in the context-to-response generation task., Comment: Accepted to ACL 2021 Findings
Published: 2021

36. Bi-Granularity Contrastive Learning for Post-Training in Few-Shot Scene

Author: Luo, Ruikun, Huang, Guanhuan, and Quan, Xiaojun
Subjects: Computer Science - Computation and Language
Abstract: The major paradigm of applying a pre-trained language model to downstream tasks is to fine-tune it on labeled task data, which often suffers from instability and low performance when the labeled examples are scarce. One way to alleviate this problem is to apply post-training on unlabeled task data before fine-tuning, adapting the pre-trained model to target domains by contrastive learning that considers either token-level or sequence-level similarity. Inspired by the success of sequence masking, we argue that both token-level and sequence-level similarities can be captured with a pair of masked sequences. Therefore, we propose complementary random masking (CRM) to generate a pair of masked sequences from an input sequence for sequence-level contrastive learning and then develop contrastive masked language modeling (CMLM) for post-training to integrate both token-level and sequence-level contrastive learning. Empirical results show that CMLM surpasses several recent post-training methods in few-shot settings without the need for data augmentation.
Published: 2021

37. Directed Acyclic Graph Network for Conversational Emotion Recognition

Author: Shen, Weizhou, Wu, Siyue, Yang, Yunyi, and Quan, Xiaojun
Subjects: Computer Science - Computation and Language
Abstract: The modeling of conversational context plays a vital role in emotion recognition from conversation (ERC). In this paper, we put forward a novel idea of encoding the utterances with a directed acyclic graph (DAG) to better model the intrinsic structure within a conversation, and design a directed acyclic neural network, namely DAG-ERC, to implement this idea. In an attempt to combine the strengths of conventional graph-based neural models and recurrence-based neural models, DAG-ERC provides a more intuitive way to model the information flow between long-distance conversation background and nearby context. Extensive experiments are conducted on four ERC benchmarks with state-of-the-art models employed as baselines for comparison. The empirical results demonstrate the superiority of this new model and confirm the motivation of the directed acyclic graph architecture for ERC., Comment: Accepted to ACL-IJCNLP 2021 main conference
Published: 2021

38. Syntax-Enhanced Pre-trained Model

Author: Xu, Zenan, Guo, Daya, Tang, Duyu, Su, Qinliang, Shou, Linjun, Gong, Ming, Zhong, Wanjun, Quan, Xiaojun, Duan, Nan, and Jiang, Daxin
Subjects: Computer Science - Computation and Language
Abstract: We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa. Existing methods utilize syntax of text either in the pre-training stage or in the fine-tuning stage, so that they suffer from discrepancy between the two stages. Such a problem would lead to the necessity of having human-annotated syntactic information, which limits the application of existing methods to broader scenarios. To address this, we present a model that utilizes the syntax of text in both pre-training and fine-tuning stages. Our model is based on Transformer with a syntax-aware attention layer that considers the dependency tree of the text. We further introduce a new pre-training task of predicting the syntactic distance among tokens in the dependency tree. We evaluate the model on three downstream tasks, including relation classification, entity typing, and question answering. Results show that our model achieves state-of-the-art performance on six public benchmark datasets. We have two major findings. First, we demonstrate that infusing automatically produced syntax of text improves pre-trained models. Second, global syntactic distances among tokens bring larger performance gains compared to local head relations between contiguous tokens., Comment: Accepted by ACL-IJCNLP 2021: The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing
Published: 2020

39. DialogXL: All-in-One XLNet for Multi-Party Conversation Emotion Recognition

Author: Shen, Weizhou, Chen, Junqing, Quan, Xiaojun, and Xie, Zhixian
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: This paper presents our pioneering effort for emotion recognition in conversation (ERC) with pre-trained language models. Unlike regular documents, conversational utterances appear alternately from different parties and are usually organized as hierarchical structures in previous work. Such structures are not conducive to the application of pre-trained language models such as XLNet. To address this issue, we propose an all-in-one XLNet model, namely DialogXL, with enhanced memory to store longer historical context and dialog-aware self-attention to deal with the multi-party structures. Specifically, we first modify the recurrence mechanism of XLNet from segment-level to utterance-level in order to better model the conversational data. Second, we introduce dialog-aware self-attention in replacement of the vanilla self-attention in XLNet to capture useful intra- and inter-speaker dependencies. Extensive experiments are conducted on four ERC benchmarks with mainstream models presented for comparison. The experimental results show that the proposed model outperforms the baselines on all the datasets. Several other experiments such as ablation study and error analysis are also conducted and the results confirm the role of the critical modules of DialogXL., Comment: Accepted by AAAI 2021 main conference
Published: 2020

40. UBAR: Towards Fully End-to-End Task-Oriented Dialog Systems with GPT-2

Author: Yang, Yunyi, Li, Yunhao, and Quan, Xiaojun
Subjects: Computer Science - Computation and Language
Abstract: This paper presents our task-oriented dialog system UBAR which models task-oriented dialogs on a dialog session level. Specifically, UBAR is acquired by fine-tuning the large pre-trained unidirectional language model GPT-2 on the sequence of the entire dialog session which is composed of user utterance, belief state, database result, system act, and system response of every dialog turn. Additionally, UBAR is evaluated in a more realistic setting, where its dialog context has access to user utterances and all content it generated such as belief states, system acts, and system responses. Experimental results on the MultiWOZ datasets show that UBAR achieves state-of-the-art performances in multiple settings, improving the combined score of response generation, policy optimization, and end-to-end modeling by 4.7, 3.5, and 9.4 points respectively. Thorough analyses demonstrate that the session-level training sequence formulation and the generated dialog context are essential for UBAR to operate as a fully end-to-end task-oriented dialog system in real life. We also examine the transfer ability of UBAR to new domains with limited data and provide visualization and a case study to illustrate the advantages of UBAR in modeling on a dialog session level., Comment: Accepted by AAAI 2021
Published: 2020

41. A Unified Generation Approach for Robust Dialogue State Tracking

Author: Lin, Zijian, Guo, Beizhang, Shi, Tianyuan, Li, Yunhao, Quan, Xiaojun, and Li, Liangzhi; Founding Editor: Goos, Gerhard, and Hartmanis, Juris; Editorial Board Member: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti; Editor: Liu, Fei, Duan, Nan, Xu, Qingting, and Hong, Yu
Published: 2023
Full Text: View/download PDF

42. Enhanced liquid replenishment for pool boiling heat transfer and its CHF mechanism on patterned freeze-casted surfaces

Author: Wang, Dan, Lin, Tao, and Quan, Xiaojun
Published: 2024
Full Text: View/download PDF

43. Embedding Dynamic Attributed Networks by Modeling the Evolution Processes

Author: Xu, Zenan, Ou, Zijing, Su, Qinliang, Yu, Jianxing, Quan, Xiaojun, and Lin, Zhenkun
Subjects: Computer Science - Social and Information Networks
Abstract: Network embedding has recently emerged as a promising technique to embed nodes of a network into low-dimensional vectors. While fairly successful, most existing works focus on the embedding techniques for static networks. In practice, however, many networks evolve over time and hence are dynamic, e.g., social networks. To address this issue, a high-order spatio-temporal embedding model is developed to track the evolutions of dynamic networks. Specifically, an activeness-aware neighborhood embedding method is first proposed to extract the high-order neighborhood information at each given timestamp. Then, an embedding prediction framework is further developed to capture the temporal correlations, in which the attention mechanism is employed instead of recurrent neural networks (RNNs) for its efficiency in computing and flexibility in modeling. Extensive experiments are conducted on four real-world datasets from three different areas. It is shown that the proposed method outperforms all the baselines by a substantial margin for the tasks of dynamic link prediction and node classification, which demonstrates the effectiveness of the proposed methods on tracking the evolutions of dynamic networks., Comment: Accepted by COLING 2020: The 28th International Conference on Computational Linguistics
Published: 2020

44. Conditional Augmentation for Aspect Term Extraction via Masked Sequence-to-Sequence Generation

Author: Li, Kun, Chen, Chengbo, Quan, Xiaojun, Ling, Qing, and Song, Yan
Subjects: Computer Science - Computation and Language
Abstract: Aspect term extraction aims to extract aspect terms from review texts as opinion targets for sentiment analysis. One of the big challenges with this task is the lack of sufficient annotated data. While data augmentation is potentially an effective technique to address the above issue, it is uncontrollable as it may change aspect words and aspect labels unexpectedly. In this paper, we formulate the data augmentation as a conditional generation task: generating a new sentence while preserving the original opinion targets and labels. We propose a masked sequence-to-sequence method for conditional augmentation of aspect term extraction. Unlike existing augmentation approaches, ours is controllable and allows us to generate more diversified sentences. Experimental results confirm that our method alleviates the data scarcity problem significantly. It also effectively boosts the performances of several current models for aspect term extraction., Comment: To appear at ACL 2020
Published: 2020

45. Multi-Domain Dialogue Acts and Response Co-Generation

Author: Wang, Kai, Tian, Junfeng, Wang, Rui, Quan, Xiaojun, and Yu, Jianxing
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Generating fluent and informative responses is of critical importance for task-oriented dialogue systems. Existing pipeline approaches generally predict multiple dialogue acts first and use them to assist response generation. There are at least two shortcomings with such approaches. First, the inherent structures of multi-domain dialogue acts are neglected. Second, the semantic associations between acts and responses are not taken into account for response generation. To address these issues, we propose a neural co-generation model that generates dialogue acts and responses concurrently. Unlike those pipeline approaches, our act generation module preserves the semantic structures of multi-domain dialogue acts and our response generation module dynamically attends to different acts as needed. We train the two modules jointly using an uncertainty loss to adjust their task weights adaptively. Extensive experiments are conducted on the large-scale MultiWOZ dataset and the results show that our model achieves very favorable improvement over several state-of-the-art models in both automatic and human evaluations., Comment: To appear at ACL 2020
Published: 2020

46. Relational Graph Attention Network for Aspect-based Sentiment Analysis

Author: Wang, Kai, Shen, Weizhou, Yang, Yunyi, Quan, Xiaojun, and Wang, Rui
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Aspect-based sentiment analysis aims to determine the sentiment polarity towards a specific aspect in online reviews. Most recent efforts adopt attention-based neural network models to implicitly connect aspects with opinion words. However, due to the complexity of language and the existence of multiple aspects in a single sentence, these models often confuse the connections. In this paper, we address this problem by means of effective encoding of syntax information. Firstly, we define a unified aspect-oriented dependency tree structure rooted at a target aspect by reshaping and pruning an ordinary dependency parse tree. Then, we propose a relational graph attention network (R-GAT) to encode the new tree structure for sentiment prediction. Extensive experiments are conducted on the SemEval 2014 and Twitter datasets, and the experimental results confirm that the connections between aspects and opinion words can be better established with our approach, and the performance of the graph attention network (GAT) is significantly improved as a consequence., Comment: To appear at ACL 2020
Published: 2020

47. Effect of aligned porous electrode thickness and pore size on bubble removal capability and hydrogen evolution reaction performance

Author: Zhang, Yuqi, Cui, Wenzhi, Li, Longjian, Zhan, Chen, Xiao, Fei, and Quan, Xiaojun
Published: 2023
Full Text: View/download PDF

48. A Deep Neural Information Fusion Architecture for Textual Network Embeddings

Author: Xu, Zenan, Su, Qinliang, Quan, Xiaojun, and Zhang, Weijia
Subjects: Computer Science - Social and Information Networks, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Textual network embeddings aim to learn a low-dimensional representation for every node in the network so that both the structural and textual information from the networks can be well preserved in the representations. Traditionally, the structural and textual embeddings were learned by models that rarely take the mutual influences between them into account. In this paper, a deep neural architecture is proposed to effectively fuse the two kinds of information into one representation. The novelties of the proposed architecture are manifested in the aspects of a newly defined objective function, the complementary information fusion method for structural and textual features, and the mutual gate mechanism for textual feature extraction. Experimental results show that the proposed model outperforms the compared methods on all three datasets., Comment: To appear at EMNLP-IJCNLP 2019 (Conference on Empirical Methods in Natural Language Processing & International Joint Conference on Natural Language Processing 2019)
Published: 2019

49. BiSET: Bi-directional Selective Encoding with Template for Abstractive Summarization

Author: Wang, Kai, Quan, Xiaojun, and Wang, Rui
Subjects: Computer Science - Computation and Language, Computer Science - Information Retrieval
Abstract: The success of neural summarization models stems from the meticulous encodings of source articles. To overcome the impediments of limited and sometimes noisy training data, one promising direction is to make better use of the available training data by applying filters during summarization. In this paper, we propose a novel Bi-directional Selective Encoding with Template (BiSET) model, which leverages templates discovered from training data to softly select key information from each source article to guide its summarization process. Extensive experiments on a standard summarization dataset were conducted and the results show that the template-equipped BiSET model manages to improve the summarization performance significantly with a new state of the art., Comment: The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)
Published: 2019

50. In-situ visualization of powder wrapping behavior in millimeter-scale-beam lunar regolith powder bed fusion

Author: Shen, Tianrun, Yao, Wei, and Quan, Xiaojun
Published: 2023
Full Text: View/download PDF
