Author: "Mao, Xian-ling" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Mao, Xian-ling"' showing total 337 results

Start Over Author "Mao, Xian-ling"

337 results on '"Mao, Xian-ling"'

1. Training Language Models to Critique With Multi-agent Feedback

Author: Lan, Tian, Zhang, Wenwei, Lyu, Chengqi, Li, Shuaibin, Xu, Chen, Huang, Heyan, Lin, Dahua, Mao, Xian-Ling, and Chen, Kai
Subjects: Computer Science - Computation and Language
Abstract: Critique ability, a meta-cognitive capability of humans, presents significant challenges for LLMs to improve. Recent works primarily rely on supervised fine-tuning (SFT) using critiques generated by a single LLM like GPT-4. However, these model-generated critiques often exhibit flaws due to the inherent complexity of the critique. Consequently, fine-tuning LLMs on such flawed critiques typically limits the model's performance and propagates these flaws into the learned model. To overcome these challenges, this paper proposes a novel data generation pipeline, named MultiCritique, that improves the critique ability of LLMs by utilizing multi-agent feedback in both the SFT and reinforcement learning (RL) stages. First, our data generation pipeline aggregates high-quality critiques from multiple agents instead of a single model, with crucial information as input for simplifying the critique. Furthermore, our pipeline improves the preference accuracy of critique quality through multi-agent feedback, facilitating the effectiveness of RL in improving the critique ability of LLMs. Based on our proposed MultiCritique data generation pipeline, we construct the MultiCritiqueDataset for the SFT and RL fine-tuning stages. Extensive experimental results on two benchmarks demonstrate: 1) the superior quality of our constructed SFT dataset compared to existing critique datasets; 2) additional improvements to the critique ability of LLMs brought by the RL stage. Notably, our fine-tuned 7B model significantly surpasses other advanced 7B-13B open-source models, approaching the performance of advanced 70B LLMs and GPT-4. Codes, datasets and model weights will be publicly available.
Published: 2024

2. Beyond Exact Match: Semantically Reassessing Event Extraction by Large Language Models

Author: Lu, Yi-Fan, Mao, Xian-Ling, Lan, Tian, Xu, Chen, and Huang, Heyan
Subjects: Computer Science - Computation and Language
Abstract: Event extraction has gained extensive research attention due to its broad range of applications. However, the current mainstream evaluation method for event extraction relies on token-level exact match, which misjudges numerous semantic-level correct cases. This reliance leads to a significant discrepancy between the evaluated performance of models under exact match criteria and their real performance. To address this problem, we propose RAEE, an automatic evaluation framework that accurately assesses event extraction results at semantic-level instead of token-level. Specifically, RAEE leverages Large Language Models (LLMs) as automatic evaluation agents, incorporating chain-of-thought prompting and an adaptive mechanism to achieve interpretable and adaptive evaluations for precision and recall of triggers and arguments. Extensive experimental results demonstrate that: (1) RAEE achieves a very high correlation with the human average; (2) after reassessing 14 models, including advanced LLMs, on 10 datasets, there is a significant performance gap between exact match and RAEE. The exact match evaluation significantly underestimates the performance of existing event extraction models, particularly underestimating the capabilities of LLMs; (3) fine-grained analysis under RAEE evaluation reveals insightful phenomena worth further exploration. The evaluation toolkit of our proposed RAEE will be publicly released.
Published: 2024

3. EXCEEDS: Extracting Complex Events as Connecting the Dots to Graphs in Scientific Domain

Author: Lu, Yi-Fan, Mao, Xian-Ling, Wang, Bo, Liu, Xiao, and Huang, Heyan
Subjects: Computer Science - Computation and Language
Abstract: It is crucial to utilize events to understand a specific domain. There are lots of research on event extraction in many domains such as news, finance and biology domain. However, scientific domain still lacks event extraction research, including comprehensive datasets and corresponding methods. Compared to other domains, scientific domain presents two characteristics: denser nuggets and more complex events. To solve the above problem, considering these two characteristics, we first construct SciEvents, a large-scale multi-event document-level dataset with a schema tailored for scientific domain. It has 2,508 documents and 24,381 events under refined annotation and quality control. Then, we propose EXCEEDS, a novel end-to-end scientific event extraction framework by storing dense nuggets in a grid matrix and simplifying complex event extraction into a dot construction and connection task. Experimental results demonstrate state-of-the-art performances of EXCEEDS on SciEvents. Additionally, we release SciEvents and EXCEEDS on GitHub., Comment: This paper is working in process
Published: 2024

4. Position Debiasing Fine-Tuning for Causal Perception in Long-Term Dialogue

Author: Fan, Shixuan, Wei, Wei, Li, Wendi, Mao, Xian-Ling, Xie, Wenfeng, and Chen, Dangyang
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: The core of the dialogue system is to generate relevant, informative, and human-like responses based on extensive dialogue history. Recently, dialogue generation domain has seen mainstream adoption of large language models (LLMs), due to its powerful capability in generating utterances. However, there is a natural deficiency for such models, that is, inherent position bias, which may lead them to pay more attention to the nearby utterances instead of causally relevant ones, resulting in generating irrelevant and generic responses in long-term dialogue. To alleviate such problem, in this paper, we propose a novel method, named Causal Perception long-term Dialogue framework (CPD), which employs perturbation-based causal variable discovery method to extract casually relevant utterances from the dialogue history and enhances model causal perception during fine-tuning. Specifically, a local-position awareness method is proposed in CPD for inter-sentence position correlation elimination, which helps models extract causally relevant utterances based on perturbations. Then, a casual-perception fine-tuning strategy is also proposed, to enhance the capability of discovering the causal invariant factors, by differently perturbing causally relevant and non-casually relevant ones for response generation. Experimental results on two datasets prove that our proposed method can effectively alleviate the position bias for multiple LLMs and achieve significant progress compared with existing baselines., Comment: Accepted to IJCAI 2024
Published: 2024

5. Mix-Initiative Response Generation with Dynamic Prefix Tuning

Author: Nie, Yuxiang, Huang, Heyan, Mao, Xian-Ling, and Liao, Lizi
Subjects: Computer Science - Computation and Language
Abstract: Mixed initiative serves as one of the key factors in controlling conversation directions. For a speaker, responding passively or leading proactively would result in rather different responses. However, most dialogue systems focus on training a holistic response generation model without any distinction among different initiatives. It leads to the cross-contamination problem, where the model confuses different initiatives and generates inappropriate responses. Moreover, obtaining plenty of human annotations for initiative labels can be expensive. To address this issue, we propose a general mix-Initiative Dynamic Prefix Tuning framework (IDPT) to decouple different initiatives from the generation model, which learns initiative-aware prefixes in both supervised and unsupervised settings. Specifically, IDPT decouples initiative factors into different prefix parameters and uses the attention mechanism to adjust the selection of initiatives in guiding generation dynamically. The prefix parameters can be tuned towards accurate initiative prediction as well as mix-initiative response generation. Extensive experiments on two public dialogue datasets show that the proposed IDPT outperforms previous baselines on both automatic metrics and human evaluations. It also manages to generate appropriate responses with manipulated initiatives., Comment: Accepted to the main conference of NAACL 2024
Published: 2024

6. ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training

Author: Zhuo, Le, Chi, Zewen, Xu, Minghao, Huang, Heyan, Zheng, Heqi, He, Conghui, Mao, Xian-Ling, and Zhang, Wentao
Subjects: Quantitative Biology - Biomolecules, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: We propose ProtLLM, a versatile cross-modal large language model (LLM) for both protein-centric and protein-language tasks. ProtLLM features a unique dynamic protein mounting mechanism, enabling it to handle complex inputs where the natural language text is interspersed with an arbitrary number of proteins. Besides, we propose the protein-as-word language modeling approach to train ProtLLM. By developing a specialized protein vocabulary, we equip the model with the capability to predict not just natural language but also proteins from a vast pool of candidates. Additionally, we construct a large-scale interleaved protein-text dataset, named InterPT, for pre-training. This dataset comprehensively encompasses both (1) structured data sources like protein annotations and (2) unstructured data sources like biological research papers, thereby endowing ProtLLM with crucial knowledge for understanding proteins. We evaluate ProtLLM on classic supervised protein-centric tasks and explore its novel protein-language applications. Experimental results demonstrate that ProtLLM not only achieves superior performance against protein-specialized baselines on protein-centric tasks but also induces zero-shot and in-context learning capabilities on protein-language tasks., Comment: https://protllm.github.io/project/
Published: 2024

7. CriticEval: Evaluating Large Language Model as Critic

Author: Lan, Tian, Zhang, Wenwei, Xu, Chen, Huang, Heyan, Lin, Dahua, Chen, Kai, and Mao, Xian-ling
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Critique ability, i.e., the capability of Large Language Models (LLMs) to identify and rectify flaws in responses, is crucial for their applications in self-improvement and scalable oversight. While numerous studies have been proposed to evaluate critique ability of LLMs, their comprehensiveness and reliability are still limited. To overcome this problem, we introduce CriticEval, a novel benchmark designed to comprehensively and reliably evaluate critique ability of LLMs. Specifically, to ensure the comprehensiveness, CriticEval evaluates critique ability from four dimensions across nine diverse task scenarios. It evaluates both scalar-valued and textual critiques, targeting responses of varying quality. To ensure the reliability, a large number of critiques are annotated to serve as references, enabling GPT-4 to evaluate textual critiques reliably. Extensive evaluations of open-source and closed-source LLMs first validate the reliability of evaluation in CriticEval. Then, experimental results demonstrate the promising potential of open-source LLMs, the effectiveness of critique datasets and several intriguing relationships between the critique ability and some critical factors, including task types, response qualities and critique dimensions.
Published: 2024

8. Multi-view Hypergraph Contrastive Policy Learning for Conversational Recommendation

Author: Zhao, Sen, Wei, Wei, Mao, Xian-Ling, Zhu, Shuai, Yang, Minghui, Wen, Zujie, Chen, Dangyang, and Zhu, Feida
Subjects: Computer Science - Information Retrieval
Abstract: Conversational recommendation systems (CRS) aim to interactively acquire user preferences and accordingly recommend items to users. Accurately learning the dynamic user preferences is of crucial importance for CRS. Previous works learn the user preferences with pairwise relations from the interactive conversation and item knowledge, while largely ignoring the fact that factors for a relationship in CRS are multiplex. Specifically, the user likes/dislikes the items that satisfy some attributes (Like/Dislike view). Moreover social influence is another important factor that affects user preference towards the item (Social view), while is largely ignored by previous works in CRS. The user preferences from these three views are inherently different but also correlated as a whole. The user preferences from the same views should be more similar than that from different views. The user preferences from Like View should be similar to Social View while different from Dislike View. To this end, we propose a novel model, namely Multi-view Hypergraph Contrastive Policy Learning (MHCPL). Specifically, MHCPL timely chooses useful social information according to the interactive history and builds a dynamic hypergraph with three types of multiplex relations from different views. The multiplex relations in each view are successively connected according to their generation order.
Published: 2023

9. TREA: Tree-Structure Reasoning Schema for Conversational Recommendation

Author: Li, Wendi, Wei, Wei, Qu, Xiaoye, Mao, Xian-Ling, Yuan, Ye, Xie, Wenfeng, and Chen, Dangyang
Subjects: Computer Science - Artificial Intelligence
Abstract: Conversational recommender systems (CRS) aim to timely trace the dynamic interests of users through dialogues and generate relevant responses for item recommendations. Recently, various external knowledge bases (especially knowledge graphs) are incorporated into CRS to enhance the understanding of conversation contexts. However, recent reasoning-based models heavily rely on simplified structures such as linear structures or fixed-hierarchical structures for causality reasoning, hence they cannot fully figure out sophisticated relationships among utterances with external knowledge. To address this, we propose a novel Tree structure Reasoning schEmA named TREA. TREA constructs a multi-hierarchical scalable tree as the reasoning structure to clarify the causal relationships between mentioned entities, and fully utilizes historical conversations to generate more reasonable and suitable responses for recommended results. Extensive experiments on two public CRS datasets have demonstrated the effectiveness of our approach., Comment: Accepted by ACL2023 main conference
Published: 2023

10. Copy Is All You Need

Author: Lan, Tian, Cai, Deng, Wang, Yan, Huang, Heyan, and Mao, Xian-Ling
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: The dominant text generation models compose the output by sequentially selecting words from a fixed vocabulary. In this paper, we formulate text generation as progressively copying text segments (e.g., words or phrases) from an existing text collection. We compute the contextualized representations of meaningful text segments and index them using efficient vector search toolkits. The task of text generation is then decomposed into a series of copy-and-paste operations: at each time step, we seek suitable text spans from the text collection rather than selecting from a standalone vocabulary. Experiments on the standard language modeling benchmark (WikiText-103) show that our approach achieves better generation quality according to both automatic and human evaluations. Besides, its inference efficiency is comparable to token-level autoregressive models thanks to the reduction of decoding steps. We also show that our approach allows for effective domain adaptation by simply switching to domain-specific text collection without extra training. Finally, we observe that our approach attains additional performance gains by simply scaling up to larger text collections, again without further training.\footnote{Our source codes are publicly available at \url{https://github.com/gmftbyGMFTBY/Copyisallyouneed}.}
Published: 2023

11. SciMRC: Multi-perspective Scientific Machine Reading Comprehension

Author: Zhang, Xiao, Zheng, Heqi, Nie, Yuxiang, Huang, Heyan, and Mao, Xian-Ling
Subjects: Computer Science - Computation and Language
Abstract: Scientific machine reading comprehension (SMRC) aims to understand scientific texts through interactions with humans by given questions. As far as we know, there is only one dataset focused on exploring full-text scientific machine reading comprehension. However, the dataset has ignored the fact that different readers may have different levels of understanding of the text, and only includes single-perspective question-answer pairs, leading to a lack of consideration of different perspectives. To tackle the above problem, we propose a novel multi-perspective SMRC dataset, called SciMRC, which includes perspectives from beginners, students and experts. Our proposed SciMRC is constructed from 741 scientific papers and 6,057 question-answer pairs. Each perspective of beginners, students and experts contains 3,306, 1,800 and 951 QA pairs, respectively. The extensive experiments on SciMRC by utilizing pre-trained models suggest the importance of considering perspectives of SMRC, and demonstrate its challenging nature for machine comprehension.
Published: 2023

12. An Empirical Study on the Language Modal in Visual Question Answering

Author: Peng, Daowan, Wei, Wei, Mao, Xian-Ling, Fu, Yuanyuan, and Chen, Dangyang
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Generalization beyond in-domain experience to out-of-distribution data is of paramount significance in the AI domain. Of late, state-of-the-art Visual Question Answering (VQA) models have shown impressive performance on in-domain data, partially due to the language priors bias which, however, hinders the generalization ability in practice. This paper attempts to provide new insights into the influence of language modality on VQA performance from an empirical study perspective. To achieve this, we conducted a series of experiments on six models. The results of these experiments revealed that, 1) apart from prior bias caused by question types, there is a notable influence of postfix-related bias in inducing biases, and 2) training VQA models with word-sequence-related variant questions demonstrated improved performance on the out-of-distribution benchmark, and the LXMERT even achieved a 10-point gain without adopting any debiasing methods. We delved into the underlying reasons behind these experimental results and put forward some simple proposals to reduce the models' dependency on language priors. The experimental results demonstrated the effectiveness of our proposed method in improving performance on the out-of-distribution benchmark, VQA-CPv2. We hope this study can inspire novel insights for future research on designing bias-reduction approaches., Comment: Accepted by IJCAI2023
Published: 2023
Full Text: View/download PDF

13. Measuring Cross-Lingual Transferability of Multilingual Transformers on Sentence Classification

Author: Chi, Zewen, Huang, Heyan, and Mao, Xian-Ling
Subjects: Computer Science - Computation and Language
Abstract: Recent studies have exhibited remarkable capabilities of pre-trained multilingual Transformers, especially cross-lingual transferability. However, current methods do not measure cross-lingual transferability well, hindering the understanding of multilingual Transformers. In this paper, we propose IGap, a cross-lingual transferability metric for multilingual Transformers on sentence classification tasks. IGap takes training error into consideration, and can also estimate transferability without end-task data. Experimental results show that IGap outperforms baseline metrics for transferability measuring and transfer direction ranking. Besides, we conduct extensive systematic experiments where we compare transferability among various multilingual Transformers, fine-tuning algorithms, and transfer directions. More importantly, our results reveal three findings about cross-lingual transfer, which helps us to better understand multilingual Transformers.
Published: 2023

14. Towards Hierarchical Policy Learning for Conversational Recommendation with Hypergraph-based Reinforcement Learning

Author: Zhao, Sen, Wei, Wei, Liu, Yifan, Wang, Ziyang, Li, Wendi, Mao, Xian-Ling, Zhu, Shuai, Yang, Minghui, and Wen, Zujie
Subjects: Computer Science - Information Retrieval
Abstract: Conversational recommendation systems (CRS) aim to timely and proactively acquire user dynamic preferred attributes through conversations for item recommendation. In each turn of CRS, there naturally have two decision-making processes with different roles that influence each other: 1) director, which is to select the follow-up option (i.e., ask or recommend) that is more effective for reducing the action space and acquiring user preferences; and 2) actor, which is to accordingly choose primitive actions (i.e., asked attribute or recommended item) that satisfy user preferences and give feedback to estimate the effectiveness of the director's option. However, existing methods heavily rely on a unified decision-making module or heuristic rules, while neglecting to distinguish the roles of different decision procedures, as well as the mutual influences between them. To address this, we propose a novel Director-Actor Hierarchical Conversational Recommender (DAHCR), where the director selects the most effective option, followed by the actor accordingly choosing primitive actions that satisfy user preferences. Specifically, we develop a dynamic hypergraph to model user preferences and introduce an intrinsic motivation to train from weak supervision over the director. Finally, to alleviate the bad effect of model bias on the mutual influence between the director and actor, we model the director's option by sampling from a categorical distribution. Extensive experiments demonstrate that DAHCR outperforms state-of-the-art methods.
Published: 2023

15. AttenWalker: Unsupervised Long-Document Question Answering via Attention-based Graph Walking

Author: Nie, Yuxiang, Huang, Heyan, Wei, Wei, and Mao, Xian-Ling
Subjects: Computer Science - Computation and Language
Abstract: Annotating long-document question answering (long-document QA) pairs is time-consuming and expensive. To alleviate the problem, it might be possible to generate long-document QA pairs via unsupervised question answering (UQA) methods. However, existing UQA tasks are based on short documents, and can hardly incorporate long-range information. To tackle the problem, we propose a new task, named unsupervised long-document question answering (ULQA), aiming to generate high-quality long-document QA instances in an unsupervised manner. Besides, we propose AttenWalker, a novel unsupervised method to aggregate and generate answers with long-range dependency so as to construct long-document QA pairs. Specifically, AttenWalker is composed of three modules, i.e., span collector, span linker and answer aggregator. Firstly, the span collector takes advantage of constituent parsing and reconstruction loss to select informative candidate spans for constructing answers. Secondly, by going through the attention graph of a pre-trained long-document model, potentially interrelated text spans (that might be far apart) could be linked together via an attention-walking algorithm. Thirdly, in the answer aggregator, linked spans are aggregated into the final answer via the mask-filling ability of a pre-trained model. Extensive experiments show that AttenWalker outperforms previous methods on Qasper and NarrativeQA. In addition, AttenWalker also shows strong performance in the few-shot learning setting., Comment: Accepted to the Findings of ACL 2023
Published: 2023

16. Gated Mechanism Enhanced Multi-Task Learning for Dialog Routing

Author: Huang, Ziming, Jiang, Zhuoxuan, Wang, Ke, Li, Juntao, Feng, Shanshan, and Mao, Xian-Ling
Subjects: Computer Science - Computation and Language
Abstract: Currently, human-bot symbiosis dialog systems, e.g., pre- and after-sales in E-commerce, are ubiquitous, and the dialog routing component is essential to improve the overall efficiency, reduce human resource cost, and enhance user experience. Although most existing methods can fulfil this requirement, they can only model single-source dialog data and cannot effectively capture the underlying knowledge of relations among data and subtasks. In this paper, we investigate this important problem by thoroughly mining both the data-to-task and task-to-task knowledge among various kinds of dialog data. To achieve the above targets, we propose a Gated Mechanism enhanced Multi-task Model (G3M), specifically including a novel dialog encoder and two tailored gated mechanism modules. The proposed method can play the role of hierarchical information filtering and is non-invasive to existing dialog systems. Based on two datasets collected from real world applications, extensive experimental results demonstrate the effectiveness of our method, which achieves the state-of-the-art performance by improving 8.7\%/11.8\% on RMSE metric and 2.2\%/4.4\% on F1 metric., Comment: Accepted by COLING'22(Oral)
Published: 2023

17. Bridging The Gap: Entailment Fused-T5 for Open-retrieval Conversational Machine Reading Comprehension

Author: Zhang, Xiao, Huang, Heyan, Chi, Zewen, and Mao, Xian-Ling
Subjects: Computer Science - Computation and Language
Abstract: Open-retrieval conversational machine reading comprehension (OCMRC) simulates real-life conversational interaction scenes. Machines are required to make a decision of "Yes/No/Inquire" or generate a follow-up question when the decision is "Inquire" based on retrieved rule texts, user scenario, user question, and dialogue history. Recent studies explored the methods to reduce the information gap between decision-making and question generation and thus improve the performance of generation. However, the information gap still exists because these pipeline structures are still limited in decision-making, span extraction, and question rephrasing three stages. Decision-making and generation are reasoning separately, and the entailment reasoning utilized in decision-making is hard to share through all stages. To tackle the above problem, we proposed a novel one-stage end-to-end framework, called Entailment Fused-T5 (EFT), to bridge the information gap between decision-making and generation in a global understanding manner. The extensive experimental results demonstrate that our proposed framework achieves new state-of-the-art performance on the OR-ShARC benchmark.
Published: 2022

18. Momentum Decoding: Open-ended Text Generation As Graph Exploration

Author: Lan, Tian, Su, Yixuan, Liu, Shuhang, Huang, Heyan, and Mao, Xian-Ling
Subjects: Computer Science - Computation and Language
Abstract: Open-ended text generation with autoregressive language models (LMs) is one of the core tasks in natural language processing. However, maximization-based decoding methods (e.g., greedy/beam search) often lead to the degeneration problem, i.e., the generated text is unnatural and contains undesirable repetitions. Existing solutions to this problem either introduce randomness prone to incoherence or require a look-ahead mechanism that demands extra computational overhead. In this study, we formulate open-ended text generation from a new perspective, i.e., we view it as an exploration process within a directed graph. Thereby, we understand the phenomenon of degeneration as circular loops within the directed graph. Based on our formulation, we propose a novel decoding method -- \textit{momentum decoding} -- which encourages the LM to \textit{greedily} explore new nodes outside the current graph. Meanwhile, it also allows the LM to return to the existing nodes with a momentum downgraded by a pre-defined resistance function. We extensively test our approach on three benchmarks from different domains through automatic and human evaluations. The results show that momentum decoding performs comparably with the current state of the art while enjoying notably improved inference speed and computation FLOPs. Furthermore, we conduct a detailed analysis to reveal the merits and inner workings of our approach. Our codes and other related resources are publicly available at https://github.com/gmftbyGMFTBY/MomentumDecoding., Comment: Work in progress
Published: 2022

19. STAGE: Span Tagging and Greedy Inference Scheme for Aspect Sentiment Triplet Extraction

Author: Liang, Shuo, Wei, Wei, Mao, Xian-Ling, Fu, Yuanyuan, Fang, Rui, and Chen, Dangyang
Subjects: Computer Science - Computation and Language
Abstract: Aspect Sentiment Triplet Extraction (ASTE) has become an emerging task in sentiment analysis research, aiming to extract triplets of the aspect term, its corresponding opinion term, and its associated sentiment polarity from a given sentence. Recently, many neural networks based models with different tagging schemes have been proposed, but almost all of them have their limitations: heavily relying on 1) prior assumption that each word is only associated with a single role (e.g., aspect term, or opinion term, etc. ) and 2) word-level interactions and treating each opinion/aspect as a set of independent words. Hence, they perform poorly on the complex ASTE task, such as a word associated with multiple roles or an aspect/opinion term with multiple words. Hence, we propose a novel approach, Span TAgging and Greedy infErence (STAGE), to extract sentiment triplets in span-level, where each span may consist of multiple words and play different roles simultaneously. To this end, this paper formulates the ASTE task as a multi-class span classification problem. Specifically, STAGE generates more accurate aspect sentiment triplet extractions via exploring span-level information and constraints, which consists of two components, namely, span tagging scheme and greedy inference strategy. The former tag all possible candidate spans based on a newly-defined tagging set. The latter retrieves the aspect/opinion term with the maximum length from the candidate sentiment snippet to output sentiment triplets. Furthermore, we propose a simple but effective model based on the STAGE, which outperforms the state-of-the-arts by a large margin on four widely-used datasets. Moreover, our STAGE can be easily generalized to other pair/triplet extraction tasks, which also demonstrates the superiority of the proposed scheme STAGE., Comment: Accepted by AAAI 2023
Published: 2022

20. HCL-TAT: A Hybrid Contrastive Learning Method for Few-shot Event Detection with Task-Adaptive Threshold

Author: Zhang, Ruihan, Wei, Wei, Mao, Xian-Ling, Fang, Rui, and Chen, Dangyang
Subjects: Computer Science - Computation and Language
Abstract: Conventional event detection models under supervised learning settings suffer from the inability of transfer to newly-emerged event types owing to lack of sufficient annotations. A commonly-adapted solution is to follow a identify-then-classify manner, which first identifies the triggers and then converts the classification task via a few-shot learning paradigm. However, these methods still fall far short of expectations due to: (i) insufficient learning of discriminative representations in low-resource scenarios, and (ii) trigger misidentification caused by the overlap of the learned representations of triggers and non-triggers. To address the problems, in this paper, we propose a novel Hybrid Contrastive Learning method with a Task-Adaptive Threshold (abbreviated as HCLTAT), which enables discriminative representation learning with a two-view contrastive loss (support-support and prototype-query), and devises a easily-adapted threshold to alleviate misidentification of triggers. Extensive experiments on the benchmark dataset FewEvent demonstrate the superiority of our method to achieve better results compared to the state-of-the-arts. All the code and data of this paper will be available for online public access., Comment: This paper has been accepted by Findings of EMNLP 2022
Published: 2022

21. Sequential Topic Selection Model with Latent Variable for Topic-Grounded Dialogue

Author: Wen, Xiaofei, Wei, Wei, and Mao, Xian-Ling
Subjects: Computer Science - Computation and Language
Abstract: Recently, topic-grounded dialogue system has attracted significant attention due to its effectiveness in predicting the next topic to yield better responses via the historical context and given topic sequence. However, almost all existing topic prediction solutions focus on only the current conversation and corresponding topic sequence to predict the next conversation topic, without exploiting other topic-guided conversations which may contain relevant topic-transitions to current conversation. To address the problem, in this paper we propose a novel approach, named Sequential Global Topic Attention (SGTA) to exploit topic transition over all conversations in a subtle way for better modeling post-to-response topic-transition and guiding the response generation to the current conversation. Specifically, we introduce a latent space modeled as a Multivariate Skew-Normal distribution with hybrid kernel functions to flexibly integrate the global-level information with sequence-level information, and predict the topic based on the distribution sampling results. We also leverage a topic-aware prior-posterior approach for secondary selection of predicted topics, which is utilized to optimize the response generation task. Extensive experiments demonstrate that our model outperforms competitive baselines on prediction and generation tasks., Comment: 11 pages, accepted by EMNLP2022 Findings
Published: 2022

22. Capturing Global Structural Information in Long Document Question Answering with Compressive Graph Selector Network

Author: Nie, Yuxiang, Huang, Heyan, Wei, Wei, and Mao, Xian-Ling
Subjects: Computer Science - Computation and Language
Abstract: Long document question answering is a challenging task due to its demands for complex reasoning over long text. Previous works usually take long documents as non-structured flat texts or only consider the local structure in long documents. However, these methods usually ignore the global structure of the long document, which is essential for long-range understanding. To tackle this problem, we propose Compressive Graph Selector Network (CGSN) to capture the global structure in a compressive and iterative manner. The proposed model mainly focuses on the evidence selection phase of long document question answering. Specifically, it consists of three modules: local graph network, global graph network and evidence memory network. Firstly, the local graph network builds the graph structure of the chunked segment in token, sentence, paragraph and segment levels to capture the short-term dependency of the text. Secondly, the global graph network selectively receives the information of each level from the local graph, compresses them into the global graph nodes and applies graph attention to the global graph nodes to build the long-range reasoning over the entire text in an iterative way. Thirdly, the evidence memory network is designed to alleviate the redundancy problem in the evidence selection by saving the selected result in the previous steps. Extensive experiments show that the proposed model outperforms previous methods on two datasets., Comment: Accepted to the main conference of EMNLP 2022
Published: 2022

23. ET5: A Novel End-to-end Framework for Conversational Machine Reading Comprehension

Author: Zhang, Xiao, Huang, Heyan, Chi, Zewen, and Mao, Xian-Ling
Subjects: Computer Science - Computation and Language
Abstract: Conversational machine reading comprehension (CMRC) aims to assist computers to understand an natural language text and thereafter engage in a multi-turn conversation to answer questions related to the text. Existing methods typically require three steps: (1) decision making based on entailment reasoning; (2) span extraction if required by the above decision; (3) question rephrasing based on the extracted span. However, for nearly all these methods, the span extraction and question rephrasing steps cannot fully exploit the fine-grained entailment reasoning information in decision making step because of their relative independence, which will further enlarge the information gap between decision making and question phrasing. Thus, to tackle this problem, we propose a novel end-to-end framework for conversational machine reading comprehension based on shared parameter mechanism, called entailment reasoning T5 (ET5). Despite the lightweight of our proposed framework, experimental results show that the proposed ET5 achieves new state-of-the-art results on the ShARC leaderboard with the BLEU-4 score of 55.2. Our model and code are publicly available at https://github.com/Yottaxx/ET5., Comment: Accepted by COLING2022
Published: 2022

24. Unsupervised Hashing with Semantic Concept Mining

Author: Tu, Rong-Cheng, Mao, Xian-Ling, Lin, Kevin Qinghong, Cai, Chengfei, Qin, Weize, Wang, Hongfa, Wei, Wei, and Huang, Heyan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Information Retrieval
Abstract: Recently, to improve the unsupervised image retrieval performance, plenty of unsupervised hashing methods have been proposed by designing a semantic similarity matrix, which is based on the similarities between image features extracted by a pre-trained CNN model. However, most of these methods tend to ignore high-level abstract semantic concepts contained in images. Intuitively, concepts play an important role in calculating the similarity among images. In real-world scenarios, each image is associated with some concepts, and the similarity between two images will be larger if they share more identical concepts. Inspired by the above intuition, in this work, we propose a novel Unsupervised Hashing with Semantic Concept Mining, called UHSCM, which leverages a VLP model to construct a high-quality similarity matrix. Specifically, a set of randomly chosen concepts is first collected. Then, by employing a vision-language pretraining (VLP) model with the prompt engineering which has shown strong power in visual representation learning, the set of concepts is denoised according to the training images. Next, the proposed method UHSCM applies the VLP model with prompting again to mine the concept distribution of each image and construct a high-quality semantic similarity matrix based on the mined concept distributions. Finally, with the semantic similarity matrix as guiding information, a novel hashing loss with a modified contrastive loss based regularization item is proposed to optimize the hashing network. Extensive experiments on three benchmark datasets show that the proposed method outperforms the state-of-the-art baselines in the image retrieval task.
Published: 2022

25. Multi-level Contrastive Learning Framework for Sequential Recommendation

Author: Wang, Ziyang, Liu, Huoyu, Wei, Wei, Hu, Yue, Mao, Xian-Ling, He, Shaojian, Fang, Rui, and chen, Dangyang
Subjects: Computer Science - Information Retrieval
Abstract: Sequential recommendation (SR) aims to predict the subsequent behaviors of users by understanding their successive historical behaviors. Recently, some methods for SR are devoted to alleviating the data sparsity problem (i.e., limited supervised signals for training), which take account of contrastive learning to incorporate self-supervised signals into SR. Despite their achievements, it is far from enough to learn informative user/item embeddings due to the inadequacy modeling of complex collaborative information and co-action information, such as user-item relation, user-user relation, and item-item relation. In this paper, we study the problem of SR and propose a novel multi-level contrastive learning framework for sequential recommendation, named MCLSR. Different from the previous contrastive learning-based methods for SR, MCLSR learns the representations of users and items through a cross-view contrastive learning paradigm from four specific views at two different levels (i.e., interest- and feature-level). Specifically, the interest-level contrastive mechanism jointly learns the collaborative information with the sequential transition patterns, and the feature-level contrastive mechanism re-observes the relation between users and items via capturing the co-action information (i.e., co-occurrence). Extensive experiments on four real-world datasets show that the proposed MCLSR outperforms the state-of-the-art methods consistently.
Published: 2022
Full Text: View/download PDF

26. Unsupervised Question Answering via Answer Diversifying

Author: Nie, Yuxiang, Huang, Heyan, Chi, Zewen, and Mao, Xian-Ling
Subjects: Computer Science - Computation and Language
Abstract: Unsupervised question answering is an attractive task due to its independence on labeled data. Previous works usually make use of heuristic rules as well as pre-trained models to construct data and train QA models. However, most of these works regard named entity (NE) as the only answer type, which ignores the high diversity of answers in the real world. To tackle this problem, we propose a novel unsupervised method by diversifying answers, named DiverseQA. Specifically, the proposed method is composed of three modules: data construction, data augmentation and denoising filter. Firstly, the data construction module extends the extracted named entity into a longer sentence constituent as the new answer span to construct a QA dataset with diverse answers. Secondly, the data augmentation module adopts an answer-type dependent data augmentation process via adversarial training in the embedding level. Thirdly, the denoising filter module is designed to alleviate the noise in the constructed data. Extensive experiments show that the proposed method outperforms previous unsupervised models on five benchmark datasets, including SQuADv1.1, NewsQA, TriviaQA, BioASQ, and DuoRC. Besides, the proposed method shows strong performance in the few-shot learning setting., Comment: Accepted by COLING 2022
Published: 2022

27. Improving Knowledge-aware Recommendation with Multi-level Interactive Contrastive Learning

Author: Zou, Ding, Wei, Wei, Wang, Ziyang, Mao, Xian-Ling, Zhu, Feida, Fang, Rui, and Chen, Dangyang
Subjects: Computer Science - Information Retrieval
Abstract: Incorporating Knowledge Graphs (KG) into recommeder system has attracted considerable attention. Recently, the technical trend of Knowledge-aware Recommendation (KGR) is to develop end-to-end models based on graph neural networks (GNNs). However, the extremely sparse user-item interactions significantly degrade the performance of the GNN-based models, as: 1) the sparse interaction, means inadequate supervision signals and limits the supervised GNN-based models; 2) the combination of sparse interactions (CF part) and redundant KG facts (KG part) results in an unbalanced information utilization. Besides, the GNN paradigm aggregates local neighbors for node representation learning, while ignoring the non-local KG facts and making the knowledge extraction insufficient. Inspired by the recent success of contrastive learning in mining supervised signals from data itself, in this paper, we focus on exploring contrastive learning in KGR and propose a novel multi-level interactive contrastive learning mechanism. Different from traditional contrastive learning methods which contrast nodes of two generated graph views, interactive contrastive mechanism conducts layer-wise self-supervised learning by contrasting layers of different parts within graphs, which is also an "interaction" action. Specifically, we first construct local and non-local graphs for user/item in KG, exploring more KG facts for KGR. Then an intra-graph level interactive contrastive learning is performed within each graph, which contrasts layers of the CF and KG parts, for more consistent information leveraging. Besides, an inter-graph level interactive contrastive learning is performed between the local and non-local graphs, for sufficiently and coherently extracting non-local KG signals. Extensive experiments conducted on three benchmark datasets show the superior performance of our proposed method over the state-of-the-arts., Comment: Accepted to CIKM 2022
Published: 2022
Full Text: View/download PDF

28. Person-job fit estimation from candidate profile and related recruitment history with co-attention neural networks

Author: Wang, Ziyang, Wei, Wei, Xu, Chenwei, Xu, Jun, and Mao, Xian-Ling
Subjects: Computer Science - Information Retrieval
Abstract: Existing online recruitment platforms depend on automatic ways of conducting the person-job fit, whose goal is matching appropriate job seekers with job positions. Intuitively, the previous successful recruitment records contain important information, which should be helpful for the current person-job fit. Existing studies on person-job fit, however, mainly focus on calculating the similarity between the candidate resumes and the job postings on the basis of their contents, without taking the recruiters' experience (i.e., historical successful recruitment records) into consideration. In this paper, we propose a novel neural network approach for person-job fit, which estimates person-job fit from candidate profile and related recruitment history with co-attention neural networks (named PJFCANN). Specifically, given a target resume-job post pair, PJFCANN generates local semantic representations through co-attention neural networks and global experience representations via graph neural networks. The final matching degree is calculated by combining these two representations. In this way, the historical successful recruitment records are introduced to enrich the features of resumes and job postings and strengthen the current matching process. Extensive experiments conducted on a large-scale recruitment dataset verify the effectiveness of PJFCANN compared with several state-of-the-art baselines. The codes are released at: https://github.com/CCIIPLab/PJFCANN.
Published: 2022

29. Relational Triple Extraction: One Step is Enough

Author: Shang, Yu-Ming, Huang, Heyan, Sun, Xin, Wei, Wei, and Mao, Xian-Ling
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Extracting relational triples from unstructured text is an essential task in natural language processing and knowledge graph construction. Existing approaches usually contain two fundamental steps: (1) finding the boundary positions of head and tail entities; (2) concatenating specific tokens to form triples. However, nearly all previous methods suffer from the problem of error accumulation, i.e., the boundary recognition error of each entity in step (1) will be accumulated into the final combined triples. To solve the problem, in this paper, we introduce a fresh perspective to revisit the triple extraction task, and propose a simple but effective model, named DirectRel. Specifically, the proposed model first generates candidate entities through enumerating token sequences in a sentence, and then transforms the triple extraction task into a linking problem on a "head $\rightarrow$ tail" bipartite graph. By doing so, all triples can be directly extracted in only one step. Extensive experimental results on two widely used datasets demonstrate that the proposed model performs better than the state-of-the-art baselines., Comment: Accepted by IJCAI-2022
Published: 2022

30. On the Representation Collapse of Sparse Mixture of Experts

Author: Chi, Zewen, Dong, Li, Huang, Shaohan, Dai, Damai, Ma, Shuming, Patra, Barun, Singhal, Saksham, Bajaj, Payal, Song, Xia, Mao, Xian-Ling, Huang, Heyan, and Wei, Furu
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Sparse mixture of experts provides larger model capacity while requiring a constant computational overhead. It employs the routing mechanism to distribute input tokens to the best-matched experts according to their hidden representations. However, learning such a routing mechanism encourages token clustering around expert centroids, implying a trend toward representation collapse. In this work, we propose to estimate the routing scores between tokens and experts on a low-dimensional hypersphere. We conduct extensive experiments on cross-lingual language model pre-training and fine-tuning on downstream tasks. Experimental results across seven multilingual benchmarks show that our method achieves consistent gains. We also present a comprehensive analysis on the representation and routing behaviors of our models. Our method alleviates the representation collapse issue and achieves more consistent routing than the baseline mixture-of-experts methods., Comment: NeurIPS 2022
Published: 2022

31. Cross-Lingual Phrase Retrieval

Author: Zheng, Heqi, Zhang, Xiao, Chi, Zewen, Huang, Heyan, Yan, Tan, Lan, Tian, Wei, Wei, and Mao, Xian-Ling
Subjects: Computer Science - Computation and Language
Abstract: Cross-lingual retrieval aims to retrieve relevant text across languages. Current methods typically achieve cross-lingual retrieval by learning language-agnostic text representations in word or sentence level. However, how to learn phrase representations for cross-lingual phrase retrieval is still an open problem. In this paper, we propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences. Moreover, we create a large-scale cross-lingual phrase retrieval dataset, which contains 65K bilingual phrase pairs and 4.2M example sentences in 8 English-centric language pairs. Experimental results show that XPR outperforms state-of-the-art baselines which utilize word-level or sentence-level representations. XPR also shows impressive zero-shot transferability that enables the model to perform retrieval in an unseen language pair during training. Our dataset, code, and trained models are publicly available at www.github.com/cwszz/XPR/.
Published: 2022

32. Multi-level Cross-view Contrastive Learning for Knowledge-aware Recommender System

Author: Zou, Ding, Wei, Wei, Mao, Xian-Ling, Wang, Ziyang, Qiu, Minghui, Zhu, Feida, and Cao, Xin
Subjects: Computer Science - Information Retrieval
Abstract: Knowledge graph (KG) plays an increasingly important role in recommender systems. Recently, graph neural networks (GNNs) based model has gradually become the theme of knowledge-aware recommendation (KGR). However, there is a natural deficiency for GNN-based KGR models, that is, the sparse supervised signal problem, which may make their actual performance drop to some extent. Inspired by the recent success of contrastive learning in mining supervised signals from data itself, in this paper, we focus on exploring the contrastive learning in KG-aware recommendation and propose a novel multi-level cross-view contrastive learning mechanism, named MCCLK. Different from traditional contrastive learning methods which generate two graph views by uniform data augmentation schemes such as corruption or dropping, we comprehensively consider three different graph views for KG-aware recommendation, including global-level structural view, local-level collaborative and semantic views. Specifically, we consider the user-item graph as a collaborative view, the item-entity graph as a semantic view, and the user-item-entity graph as a structural view. MCCLK hence performs contrastive learning across three views on both local and global levels, mining comprehensive graph feature and structure information in a self-supervised manner. Besides, in semantic view, a k-Nearest-Neighbor (kNN) item-item semantic graph construction module is proposed, to capture the important item-item semantic relation which is usually ignored by previous work. Extensive experiments conducted on three benchmark datasets show the superior performance of our proposed method over the state-of-the-arts. The implementations are available at: https://github.com/CCIIPLab/MCCLK., Comment: Accepted to SIGIR2022
Published: 2022
Full Text: View/download PDF

33. BiSyn-GAT+: Bi-Syntax Aware Graph Attention Network for Aspect-based Sentiment Analysis

Author: Liang, Shuo, Wei, Wei, Mao, Xian-Ling, Wang, Fei, and He, Zhiyong
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Aspect-based sentiment analysis (ABSA) is a fine-grained sentiment analysis task that aims to align aspects and corresponding sentiments for aspect-specific sentiment polarity inference. It is challenging because a sentence may contain multiple aspects or complicated (e.g., conditional, coordinating, or adversative) relations. Recently, exploiting dependency syntax information with graph neural networks has been the most popular trend. Despite its success, methods that heavily rely on the dependency tree pose challenges in accurately modeling the alignment of the aspects and their words indicative of sentiment, since the dependency tree may provide noisy signals of unrelated associations (e.g., the "conj" relation between "great" and "dreadful" in Figure 2). In this paper, to alleviate this problem, we propose a Bi-Syntax aware Graph Attention Network (BiSyn-GAT+). Specifically, BiSyn-GAT+ fully exploits the syntax information (e.g., phrase segmentation and hierarchical structure) of the constituent tree of a sentence to model the sentiment-aware context of every single aspect (called intra-context) and the sentiment relations across aspects (called inter-context) for learning. Experiments on four benchmark datasets demonstrate that BiSyn-GAT+ outperforms the state-of-the-art methods consistently., Comment: Findings of ACL 2022
Published: 2022
Full Text: View/download PDF

34. Hammer PDF: An Intelligent PDF Reader for Scientific Papers

Author: Wang, Sheng-Fu, Liu, Shu-Hang, Che, Tian-Yi, Lu, Yi-Fan, Yang, Song-Xiao, Huang, Heyan, and Mao, Xian-Ling
Subjects: Computer Science - Digital Libraries, Computer Science - Information Retrieval
Abstract: It is the most important way for researchers to acquire academic progress via reading scientific papers, most of which are in PDF format. However, existing PDF Readers like Adobe Acrobat Reader and Foxit PDF Reader are usually only for reading by rendering PDF files as a whole, and do not consider the multi-granularity content understanding of a paper itself. Specifically, taking a paper as a basic and separate unit, existing PDF Readers cannot access extended information about the paper, such as corresponding videos, blogs and codes. Meanwhile, they cannot understand the academic content of a paper, such as terms, authors, and citations. To solve these problems, we introduce Hammer PDF, an intelligent PDF Reader for scientific papers. Apart from basic reading functions, Hammer PDF has the following four innovative features: (1) information extraction ability, which can locate and mark spans like terms and other entities; (2) information extension ability, which can present relevant academic content of a paper, such as citations, references, codes, videos, blogs; (3) built-in Hammer Scholar, an academic search engine based on academic information collected from major academic databases; (4) built-in Q&A bot, which can find helpful conference information. The proposed Hammer PDF Reader can help researchers, especially those studying computer science, to improve the efficiency and experience of reading scientific papers. We have released Hammer PDF, available at https://pdf.hammerscholar.net/face.
Published: 2022

35. OneRel:Joint Entity and Relation Extraction with One Module in One Step

Author: Shang, Yu-Ming, Huang, Heyan, and Mao, Xian-Ling
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Joint entity and relation extraction is an essential task in natural language processing and knowledge graph construction. Existing approaches usually decompose the joint extraction task into several basic modules or processing steps to make it easy to conduct. However, such a paradigm ignores the fact that the three elements of a triple are interdependent and indivisible. Therefore, previous joint methods suffer from the problems of cascading errors and redundant information. To address these issues, in this paper, we propose a novel joint entity and relation extraction model, named OneRel, which casts joint extraction as a fine-grained triple classification problem. Specifically, our model consists of a scoring-based classifier and a relation-specific horns tagging strategy. The former evaluates whether a token pair and a relation belong to a factual triple. The latter ensures a simple but effective decoding process. Extensive experimental results on two widely used datasets demonstrate that the proposed method performs better than the state-of-the-art baselines, and delivers consistent performance gain on complex scenarios of various overlapping patterns and multiple triples., Comment: AAAI-2022 Accepted
Published: 2022

36. Bridging insight gaps in topic dependency discovery with a knowledge-inspired topic model

Author: Tang, Yi-Kun, Huang, Heyan, Shi, Xuewen, and Mao, Xian-Ling
Published: 2025
Full Text: View/download PDF

37. Exploring Dense Retrieval for Dialogue Response Selection

Author: Lan, Tian, Cai, Deng, Wang, Yan, Su, Yixuan, Huang, Heyan, and Mao, Xian-Ling
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Recent progress in deep learning has continuously improved the accuracy of dialogue response selection. In particular, sophisticated neural network architectures are leveraged to capture the rich interactions between dialogue context and response candidates. While remarkably effective, these models also bring in a steep increase in computational cost. Consequently, such models can only be used as a re-rank module in practice. In this study, we present a solution to directly select proper responses from a large corpus or even a nonparallel corpus that only consists of unpaired sentences, using a dense retrieval model. To push the limits of dense retrieval, we design an interaction layer upon the dense retrieval models and apply a set of tailor-designed learning strategies. Our model shows superiority over strong baselines on the conventional re-rank evaluation setting, which is remarkable given its efficiency. To verify the effectiveness of our approach in realistic scenarios, we also conduct full-rank evaluation, where the target is to select proper responses from a full candidate pool that may contain millions of candidates and evaluate them fairly through human annotations. Our proposed model notably outperforms pipeline baselines that integrate fast recall and expressive re-rank modules. Human evaluation results show that enlarging the candidate pool with nonparallel corpora improves response quality further., Comment: 11 pages, 4 figures, 6 tables
Published: 2021

38. Cross-Lingual Language Model Meta-Pretraining

Author: Chi, Zewen, Huang, Heyan, Liu, Luyang, Bai, Yu, and Mao, Xian-Ling
Subjects: Computer Science - Computation and Language
Abstract: The success of pretrained cross-lingual language models relies on two essential abilities, i.e., generalization ability for learning downstream tasks in a source language, and cross-lingual transferability for transferring the task knowledge to other languages. However, current methods jointly learn the two abilities in a single-phase cross-lingual pretraining process, resulting in a trade-off between generalization and cross-lingual transfer. In this paper, we propose cross-lingual language model meta-pretraining, which learns the two abilities in different training phases. Our method introduces an additional meta-pretraining phase before cross-lingual pretraining, where the model learns generalization ability on a large-scale monolingual corpus. Then, the model focuses on learning cross-lingual transfer on a multilingual corpus. Experimental results show that our method improves both generalization and cross-lingual transfer, and produces better-aligned representations across different languages.
Published: 2021

39. Context-aware Entity Typing in Knowledge Graphs

Author: Pan, Weiran, Wei, Wei, and Mao, Xian-Ling
Subjects: Computer Science - Computation and Language
Abstract: Knowledge graph entity typing aims to infer entities' missing types in knowledge graphs which is an important but under-explored issue. This paper proposes a novel method for this task by utilizing entities' contextual information. Specifically, we design two inference mechanisms: i) N2T: independently use each neighbor of an entity to infer its type; ii) Agg2T: aggregate the neighbors of an entity to infer its type. Those mechanisms will produce multiple inference results, and an exponentially weighted pooling method is used to generate the final inference result. Furthermore, we propose a novel loss function to alleviate the false-negative problem during training. Experiments on two real-world KGs demonstrate the effectiveness of our method. The source code and data of this paper can be obtained from https://github.com/CCIIPLab/CET.
Published: 2021

40. OSTOD: One-Step Task-Oriented Dialogue with activated state and retelling response

Author: Huang, Heyan, Yang, Puhai, Wei, Wei, Shi, Shumin, and Mao, Xian-Ling
Published: 2024
Full Text: View/download PDF

41. XLM-E: Cross-lingual Language Model Pre-training via ELECTRA

Author: Chi, Zewen, Huang, Shaohan, Dong, Li, Ma, Shuming, Zheng, Bo, Singhal, Saksham, Bajaj, Payal, Song, Xia, Mao, Xian-Ling, Huang, Heyan, and Wei, Furu
Subjects: Computer Science - Computation and Language
Abstract: In this paper, we introduce ELECTRA-style tasks to cross-lingual language model pre-training. Specifically, we present two pre-training tasks, namely multilingual replaced token detection, and translation replaced token detection. Besides, we pretrain the model, named as XLM-E, on both multilingual and parallel corpora. Our model outperforms the baseline models on various cross-lingual understanding tasks with much less computation cost. Moreover, analysis shows that XLM-E tends to obtain better cross-lingual transferability., Comment: ACL-2022
Published: 2021

42. Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment

Author: Chi, Zewen, Dong, Li, Zheng, Bo, Huang, Shaohan, Mao, Xian-Ling, Huang, Heyan, and Wei, Furu
Subjects: Computer Science - Computation and Language
Abstract: The cross-lingual language models are typically pretrained with masked language modeling on multilingual text or parallel sentences. In this paper, we introduce denoising word alignment as a new cross-lingual pre-training task. Specifically, the model first self-labels word alignments for parallel sentences. Then we randomly mask tokens in a bitext pair. Given a masked token, the model uses a pointer network to predict the aligned token in the other language. We alternately perform the above two steps in an expectation-maximization manner. Experimental results show that our method improves cross-lingual transferability on various datasets, especially on the token-level tasks, such as question answering, and structured prediction. Moreover, the model can serve as a pretrained word aligner, which achieves reasonably low error rates on the alignment benchmarks. The code and pretrained parameters are available at https://github.com/CZWin32768/XLM-Align., Comment: ACL 2021
Published: 2021

43. Global Context Enhanced Graph Neural Networks for Session-based Recommendation

Author: Wang, Ziyang, Wei, Wei, Cong, Gao, Li, Xiao-Li, Mao, Xian-Ling, and Qiu, Minghui
Subjects: Computer Science - Information Retrieval
Abstract: Session-based recommendation (SBR) is a challenging task, which aims at recommending items based on anonymous behavior sequences. Almost all the existing solutions for SBR model user preference only based on the current session without exploiting the other sessions, which may contain both relevant and irrelevant item-transitions to the current session. This paper proposes a novel approach, called Global Context Enhanced Graph Neural Networks (GCE-GNN) to exploit item transitions over all sessions in a more subtle manner for better inferring the user preference of the current session. Specifically, GCE-GNN learns two levels of item embeddings from session graph and global graph, respectively: (i) Session graph, which is to learn the session-level item embedding by modeling pairwise item-transitions within the current session; and (ii) Global graph, which is to learn the global-level item embedding by modeling pairwise item-transitions over all sessions. In GCE-GNN, we propose a novel global-level item representation learning layer, which employs a session-aware attention mechanism to recursively incorporate the neighbors' embeddings of each node on the global graph. We also design a session-level item representation learning layer, which employs a GNN on the session graph to learn session-level item embeddings within the current session. Moreover, GCE-GNN aggregates the learnt item representations in the two levels with a soft attention mechanism. Experiments on three benchmark datasets demonstrate that GCE-GNN outperforms the state-of-the-art methods consistently., Comment: arXiv admin note: substantial text overlap with arXiv:2011.10173
Published: 2021
Full Text: View/download PDF

44. Read, Listen, and See: Leveraging Multimodal Information Helps Chinese Spell Checking

Author: Xu, Heng-Da, Li, Zhongli, Zhou, Qingyu, Li, Chao, Wang, Zizhen, Cao, Yunbo, Huang, Heyan, and Mao, Xian-Ling
Subjects: Computer Science - Computation and Language
Abstract: Chinese Spell Checking (CSC) aims to detect and correct erroneous characters for user-generated text in the Chinese language. Most of the Chinese spelling errors are misused semantically, phonetically or graphically similar characters. Previous attempts noticed this phenomenon and try to use the similarity for this task. However, these methods use either heuristics or handcrafted confusion sets to predict the correct character. In this paper, we propose a Chinese spell checker called ReaLiSe, by directly leveraging the multimodal information of the Chinese characters. The ReaLiSe model tackles the CSC task by (1) capturing the semantic, phonetic and graphic information of the input characters, and (2) selectively mixing the information in these modalities to predict the correct output. Experiments on the SIGHAN benchmarks show that the proposed model outperforms strong baselines by a large margin., Comment: In ACL Findings 2021
Published: 2021

45. Comprehensive Study: How the Context Information of Different Granularity Affects Dialogue State Tracking?

Author: Yang, Puhai, Huang, Heyan, and Mao, Xian-Ling
Subjects: Computer Science - Computation and Language
Abstract: Dialogue state tracking (DST) plays a key role in task-oriented dialogue systems to monitor the user's goal. In general, there are two strategies to track a dialogue state: predicting it from scratch and updating it from previous state. The scratch-based strategy obtains each slot value by inquiring all the dialogue history, and the previous-based strategy relies on the current turn dialogue to update the previous dialogue state. However, it is hard for the scratch-based strategy to correctly track short-dependency dialogue state because of noise; meanwhile, the previous-based strategy is not very useful for long-dependency dialogue state tracking. Obviously, it plays different roles for the context information of different granularity to track different kinds of dialogue states. Thus, in this paper, we will study and discuss how the context information of different granularity affects dialogue state tracking. First, we explore how greatly different granularities affect dialogue state tracking. Then, we further discuss how to combine multiple granularities for dialogue state tracking. Finally, we apply the findings about context granularity to few-shot learning scenario. Besides, we have publicly released all codes., Comment: Accepted as long paper at main conference of ACL 2021
Published: 2021

46. A Robust and Domain-Adaptive Approach for Low-Resource Named Entity Recognition

Author: Yu, Houjin, Mao, Xian-Ling, Chi, Zewen, Wei, Wei, and Huang, Heyan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Recently, it has attracted much attention to build reliable named entity recognition (NER) systems using limited annotated data. Nearly all existing works heavily rely on domain-specific resources, such as external lexicons and knowledge bases. However, such domain-specific resources are often not available, meanwhile it's difficult and expensive to construct the resources, which has become a key obstacle to wider adoption. To tackle the problem, in this work, we propose a novel robust and domain-adaptive approach RDANER for low-resource NER, which only uses cheap and easily obtainable resources. Extensive experiments on three benchmark datasets demonstrate that our approach achieves the best performance when only using cheap and easily obtainable resources, and delivers competitive results against state-of-the-art methods which use difficultly obtainable domainspecific resources. All our code and corpora can be found on https://github.com/houking-can/RDANER., Comment: Best Student Paper of 2020 IEEE International Conference on Knowledge Graph (ICKG)
Published: 2021
Full Text: View/download PDF

47. Self-attention Comparison Module for Boosting Performance on Retrieval-based Open-Domain Dialog Systems

Author: Lan, Tian, Mao, Xian-Ling, Zhao, Zhipeng, Wei, Wei, and Huang, Heyan
Subjects: Computer Science - Computation and Language
Abstract: Since the pre-trained language models are widely used, retrieval-based open-domain dialog systems, have attracted considerable attention from researchers recently. Most of the previous works select a suitable response only according to the matching degree between the query and each individual candidate response. Although good performance has been achieved, these recent works ignore the comparison among the candidate responses, which could provide rich information for selecting the most appropriate response. Intuitively, better decisions could be made when the models can get access to the comparison information among all the candidate responses. In order to leverage the comparison information among the candidate responses, in this paper, we propose a novel and plug-in Self-attention Comparison Module for retrieval-based open-domain dialog systems, called SCM. Extensive experiment results demonstrate that our proposed self-attention comparison module effectively boosts the performance of the existing retrieval-based open-domain dialog systems. Besides, we have publicly released our source codes for future research.
Published: 2020

48. Ultra-Fast, Low-Storage, Highly Effective Coarse-grained Selection in Retrieval-based Chatbot by Using Deep Semantic Hashing

Author: Lan, Tian, Mao, Xian-Ling, Gao, Xiaoyan, Wei, Wei, and Huang, Heyan
Subjects: Computer Science - Computation and Language
Abstract: We study the coarse-grained selection module in retrieval-based chatbot. Coarse-grained selection is a basic module in a retrieval-based chatbot, which constructs a rough candidate set from the whole database to speed up the interaction with customers. So far, there are two kinds of approaches for coarse-grained selection module: (1) sparse representation; (2) dense representation. To the best of our knowledge, there is no systematic comparison between these two approaches in retrieval-based chatbots, and which kind of method is better in real scenarios is still an open question. In this paper, we first systematically compare these two methods from four aspects: (1) effectiveness; (2) index stoarge; (3) search time cost; (4) human evaluation. Extensive experiment results demonstrate that dense representation method significantly outperforms the sparse representation, but costs more time and storage occupation. In order to overcome these fatal weaknesses of dense representation method, we propose an ultra-fast, low-storage, and highly effective Deep Semantic Hashing Coarse-grained selection method, called DSHC model. Specifically, in our proposed DSHC model, a hashing optimizing module that consists of two autoencoder models is stacked on a trained dense representation model, and three loss functions are designed to optimize it. The hash codes provided by hashing optimizing module effectively preserve the rich semantic and similarity information in dense vectors. Extensive experiment results prove that, our proposed DSHC model can achieve much faster speed and lower storage than sparse representation, with limited performance loss compared with dense representation. Besides, our source codes have been publicly released for future research.
Published: 2020

49. Exploiting Group-level Behavior Pattern forSession-based Recommendation

Author: Wang, Ziyang, Wei, Wei, Mao, Xian-Ling, Li, Xiao-Li, and Feng, Shanshan
Subjects: Computer Science - Information Retrieval
Abstract: Session-based recommendation (SBR) is a challenging task, which aims to predict users' future interests based on anonymous behavior sequences. Existing methods leverage powerful representation learning approaches to encode sessions into a low-dimensional space. However, despite such achievements, all the existing studies focus on the instance-level session learning, while neglecting the group-level users' preference, which is significant to model the users' behavior. To this end, we propose a novel Repeat-aware Neural Mechanism for Session-based Recommendation (RNMSR). In RNMSR, we propose to learn the user preference from both instance-level and group-level, respectively: (i) instance-level, which employs GNNs on a similarity-based item-pairwise session graph to capture the users' preference in instance-level. (ii) group-level, which converts sessions into group-level behavior patterns to model the group-level users' preference. In RNMSR, we combine instance-level user preference and group-level user preference to model the repeat consumption of users, \ie whether users take repeated consumption and which items are preferred by users. Extensive experiments are conducted on three real-world datasets, \ie Diginetica, Yoochoose, and Nowplaying, demonstrating that the proposed method consistently achieves state-of-the-art performance in all the tests.
Published: 2020

50. Overview of NLPCC Shared Task 2: Multi-perspective Scientific Machine Reading Comprehension

Author: Zhang, Xiao, Zheng, Heqi, Nie, Yuxiang, Mao, Xian-Ling, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Fei, editor, Duan, Nan, editor, Xu, Qingting, editor, and Hong, Yu, editor
Published: 2023
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

337 results on '"Mao, Xian-ling"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources