Author: "Deng, Shumin" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Deng, Shumin"' showing total 274 results

Start Over Author "Deng, Shumin"

274 results on '"Deng, Shumin"'

1. Knowledge Mechanisms in Large Language Models: A Survey and Perspective

Author: Wang, Mengru, Yao, Yunzhi, Xu, Ziwen, Qiao, Shuofei, Deng, Shumin, Wang, Peng, Chen, Xiang, Gu, Jia-Chen, Jiang, Yong, Xie, Pengjun, Huang, Fei, Chen, Huajun, and Zhang, Ningyu
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Human-Computer Interaction, Computer Science - Machine Learning
Abstract: Understanding knowledge mechanisms in Large Language Models (LLMs) is crucial for advancing towards trustworthy AGI. This paper reviews knowledge mechanism analysis from a novel taxonomy including knowledge utilization and evolution. Knowledge utilization delves into the mechanism of memorization, comprehension and application, and creation. Knowledge evolution focuses on the dynamic progression of knowledge within individual and group LLMs. Moreover, we discuss what knowledge LLMs have learned, the reasons for the fragility of parametric knowledge, and the potential dark knowledge (hypothesis) that will be challenging to address. We hope this work can help understand knowledge in LLMs and provide insights for future research., Comment: Ongoing work (v2); add Section 5: Application of Knowledge Mechanism; revise Section 6 and 7; fix typos
Published: 2024

2. Knowledge Circuits in Pretrained Transformers

Author: Yao, Yunzhi, Zhang, Ningyu, Xi, Zekun, Wang, Mengru, Xu, Ziwen, Deng, Shumin, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: The remarkable capabilities of modern large language models are rooted in their vast repositories of knowledge encoded within their parameters, enabling them to perceive the world and engage in reasoning. The inner workings of how these models store knowledge have long been a subject of intense interest and investigation among researchers. To date, most studies have concentrated on isolated components within these models, such as the Multilayer Perceptrons and attention head. In this paper, we delve into the computation graph of the language model to uncover the knowledge circuits that are instrumental in articulating specific knowledge. The experiments, conducted with GPT2 and TinyLLAMA, has allowed us to observe how certain information heads, relation heads, and Multilayer Perceptrons collaboratively encode knowledge within the model. Moreover, we evaluate the impact of current knowledge editing techniques on these knowledge circuits, providing deeper insights into the functioning and constraints of these editing methodologies. Finally, we utilize knowledge circuits to analyze and interpret language model behaviors such as hallucinations and in-context learning. We believe the knowledge circuit holds potential for advancing our understanding of Transformers and guiding the improved design of knowledge editing. Code and data are available in https://github.com/zjunlp/KnowledgeCircuits., Comment: Work in progress, 25 pages
Published: 2024

3. Decompose and Aggregate: A Step-by-Step Interpretable Evaluation Framework

Author: Li, Minzhi, Liu, Zhengyuan, Deng, Shumin, Joty, Shafiq, Chen, Nancy F., and Kan, Min-Yen
Subjects: Computer Science - Computation and Language
Abstract: The acceleration of Large Language Models (LLMs) research has opened up new possibilities for evaluating generated texts. They serve as scalable and economical evaluators, but the question of how reliable these evaluators are has emerged as a crucial research question. Prior research efforts in the meta-evaluation of LLMs as judges limit the prompting of an LLM to a single use to obtain a final evaluation decision. They then compute the agreement between LLMs' outputs and human labels. This lacks interpretability in understanding the evaluation capability of LLMs. In light of this challenge, we propose Decompose and Aggregate, which breaks down the evaluation process into different stages based on pedagogical practices. Our experiments illustrate that it not only provides a more interpretable window for how well LLMs evaluate, but also leads to improvements up to 39.6% for different LLMs on a variety of meta-evaluation benchmarks.
Published: 2024

4. Agent Planning with World Knowledge Model

Author: Qiao, Shuofei, Fang, Runnan, Zhang, Ningyu, Zhu, Yuqi, Chen, Xiang, Deng, Shumin, Jiang, Yong, Xie, Pengjun, Huang, Fei, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Multiagent Systems
Abstract: Recent endeavors towards directly using large language models (LLMs) as agent models to execute interactive planning tasks have shown commendable results. Despite their achievements, however, they still struggle with brainless trial-and-error in global planning and generating hallucinatory actions in local planning due to their poor understanding of the ''real'' physical world. Imitating humans' mental world knowledge model which provides global prior knowledge before the task and maintains local dynamic knowledge during the task, in this paper, we introduce parametric World Knowledge Model (WKM) to facilitate agent planning. Concretely, we steer the agent model to self-synthesize knowledge from both expert and sampled trajectories. Then we develop WKM, providing prior task knowledge to guide the global planning and dynamic state knowledge to assist the local planning. Experimental results on three complex real-world simulated datasets with three state-of-the-art open-source LLMs, Mistral-7B, Gemma-7B, and Llama-3-8B, demonstrate that our method can achieve superior performance compared to various strong baselines. Besides, we analyze to illustrate that our WKM can effectively alleviate the blind trial-and-error and hallucinatory action issues, providing strong support for the agent's understanding of the world. Other interesting findings include: 1) our instance-level task knowledge can generalize better to unseen tasks, 2) weak WKM can guide strong agent model planning, and 3) unified WKM training has promising potential for further development. Code will be available at https://github.com/zjunlp/WKM., Comment: Work in progress
Published: 2024

5. Detoxifying Large Language Models via Knowledge Editing

Author: Wang, Mengru, Zhang, Ningyu, Xu, Ziwen, Xi, Zekun, Deng, Shumin, Yao, Yunzhi, Zhang, Qishen, Yang, Linyi, Wang, Jindong, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Human-Computer Interaction, Computer Science - Machine Learning
Abstract: This paper investigates using knowledge editing techniques to detoxify Large Language Models (LLMs). We construct a benchmark, SafeEdit, which covers nine unsafe categories with various powerful attack prompts and equips comprehensive metrics for systematic evaluation. We conduct experiments with several knowledge editing approaches, indicating that knowledge editing has the potential to detoxify LLMs with a limited impact on general performance efficiently. Then, we propose a simple yet effective baseline, dubbed Detoxifying with Intraoperative Neural Monitoring (DINM), to diminish the toxicity of LLMs within a few tuning steps via only one instance. We further provide an in-depth analysis of the internal mechanism for various detoxifying approaches, demonstrating that previous methods like SFT and DPO may merely suppress the activations of toxic parameters, while DINM mitigates the toxicity of the toxic parameters to a certain extent, making permanent adjustments. We hope that these insights could shed light on future work of developing detoxifying approaches and the underlying knowledge mechanisms of LLMs. Code and benchmark are available at https://github.com/zjunlp/EasyEdit., Comment: ACL 2024. Project website: https://zjunlp.github.io/project/SafeEdit Benchmark: https://huggingface.co/datasets/zjunlp/SafeEdit
Published: 2024

6. Editing Conceptual Knowledge for Large Language Models

Author: Wang, Xiaohan, Mao, Shengyu, Zhang, Ningyu, Deng, Shumin, Yao, Yunzhi, Shen, Yue, Liang, Lei, Gu, Jinjie, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Databases, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Recently, there has been a growing interest in knowledge editing for Large Language Models (LLMs). Current approaches and evaluations merely explore the instance-level editing, while whether LLMs possess the capability to modify concepts remains unclear. This paper pioneers the investigation of editing conceptual knowledge for LLMs, by constructing a novel benchmark dataset ConceptEdit and establishing a suite of new metrics for evaluation. The experimental results reveal that, although existing editing methods can efficiently modify concept-level definition to some extent, they also have the potential to distort the related instantial knowledge in LLMs, leading to poor performance. We anticipate this can inspire further progress in better understanding LLMs. Our project homepage is available at https://zjunlp.github.io/project/ConceptEdit., Comment: Work in progress. Code: https://github.com/zjunlp/EasyEdit Dataset: https://huggingface.co/datasets/zjunlp/ConceptEdit
Published: 2024

7. KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents

Author: Zhu, Yuqi, Qiao, Shuofei, Ou, Yixin, Deng, Shumin, Zhang, Ningyu, Lyu, Shiwei, Shen, Yue, Liang, Lei, Gu, Jinjie, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction, Computer Science - Machine Learning, Computer Science - Multiagent Systems
Abstract: Large Language Models (LLMs) have demonstrated great potential in complex reasoning tasks, yet they fall short when tackling more sophisticated challenges, especially when interacting with environments through generating executable actions. This inadequacy primarily stems from the lack of built-in action knowledge in language agents, which fails to effectively guide the planning trajectories during task solving and results in planning hallucination. To address this issue, we introduce KnowAgent, a novel approach designed to enhance the planning capabilities of LLMs by incorporating explicit action knowledge. Specifically, KnowAgent employs an action knowledge base and a knowledgeable self-learning strategy to constrain the action path during planning, enabling more reasonable trajectory synthesis, and thereby enhancing the planning performance of language agents. Experimental results on HotpotQA and ALFWorld based on various backbone models demonstrate that KnowAgent can achieve comparable or superior performance to existing baselines. Further analysis indicates the effectiveness of KnowAgent in terms of planning hallucinations mitigation. Code is available in https://github.com/zjunlp/KnowAgent., Comment: Work in progress. Project page: https://zjunlp.github.io/project/KnowAgent/ Code: https://github.com/zjunlp/KnowAgent
Published: 2024

8. KnowPhish: Large Language Models Meet Multimodal Knowledge Graphs for Enhancing Reference-Based Phishing Detection

Author: Li, Yuexin, Huang, Chengyu, Deng, Shumin, Lock, Mei Lin, Cao, Tri, Oo, Nay, Lim, Hoon Wei, and Hooi, Bryan
Subjects: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Phishing attacks have inflicted substantial losses on individuals and businesses alike, necessitating the development of robust and efficient automated phishing detection approaches. Reference-based phishing detectors (RBPDs), which compare the logos on a target webpage to a known set of logos, have emerged as the state-of-the-art approach. However, a major limitation of existing RBPDs is that they rely on a manually constructed brand knowledge base, making it infeasible to scale to a large number of brands, which results in false negative errors due to the insufficient brand coverage of the knowledge base. To address this issue, we propose an automated knowledge collection pipeline, using which we collect a large-scale multimodal brand knowledge base, KnowPhish, containing 20k brands with rich information about each brand. KnowPhish can be used to boost the performance of existing RBPDs in a plug-and-play manner. A second limitation of existing RBPDs is that they solely rely on the image modality, ignoring useful textual information present in the webpage HTML. To utilize this textual information, we propose a Large Language Model (LLM)-based approach to extract brand information of webpages from text. Our resulting multimodal phishing detection approach, KnowPhish Detector (KPD), can detect phishing webpages with or without logos. We evaluate KnowPhish and KPD on a manually validated dataset, and a field study under Singapore's local context, showing substantial improvements in effectiveness and efficiency compared to state-of-the-art baselines., Comment: Accepted by USENIX Security 2024
Published: 2024

9. A Comprehensive Study of Knowledge Editing for Large Language Models

Author: Zhang, Ningyu, Yao, Yunzhi, Tian, Bozhong, Wang, Peng, Deng, Shumin, Wang, Mengru, Xi, Zekun, Mao, Shengyu, Zhang, Jintian, Ni, Yuansheng, Cheng, Siyuan, Xu, Ziwen, Xu, Xin, Gu, Jia-Chen, Jiang, Yong, Xie, Pengjun, Huang, Fei, Liang, Lei, Zhang, Zhiqiang, Zhu, Xiaowei, Zhou, Jun, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Human-Computer Interaction, Computer Science - Machine Learning
Abstract: Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. However, a primary limitation lies in the significant computational demands during training, arising from their extensive parameterization. This challenge is further intensified by the dynamic nature of the world, necessitating frequent updates to LLMs to correct outdated information or integrate new knowledge, thereby ensuring their continued relevance. Note that many applications demand continual model adjustments post-training to address deficiencies or undesirable behaviors. There is an increasing interest in efficient, lightweight methods for on-the-fly model modifications. To this end, recent years have seen a burgeoning in the techniques of knowledge editing for LLMs, which aim to efficiently modify LLMs' behaviors within specific domains while preserving overall performance across various inputs. In this paper, we first define the knowledge editing problem and then provide a comprehensive review of cutting-edge approaches. Drawing inspiration from educational and cognitive research theories, we propose a unified categorization criterion that classifies knowledge editing methods into three groups: resorting to external knowledge, merging knowledge into the model, and editing intrinsic knowledge. Furthermore, we introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches. Additionally, we provide an in-depth analysis of knowledge location, which can give a deeper understanding of the knowledge structures inherent within LLMs. Finally, we discuss several potential applications of knowledge editing, outlining its broad and impactful implications., Comment: Ongoing work; 52 pages, 282 citations; benchmark is available at https://huggingface.co/datasets/zjunlp/KnowEdit code is available at https://github.com/zjunlp/EasyEdit paper list is available at https://github.com/zjunlp/KnowledgeEditingPapers
Published: 2024

10. Towards A Unified View of Answer Calibration for Multi-Step Reasoning

Author: Deng, Shumin, Zhang, Ningyu, Oo, Nay, and Hooi, Bryan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Large Language Models (LLMs) employing Chain-of-Thought (CoT) prompting have broadened the scope for improving multi-step reasoning capabilities. We generally divide multi-step reasoning into two phases: path generation to generate the reasoning path(s); and answer calibration post-processing the reasoning path(s) to obtain a final answer. However, the existing literature lacks systematic analysis on different answer calibration approaches. In this paper, we summarize the taxonomy of recent answer calibration techniques and break them down into step-level and path-level strategies. We then conduct a thorough evaluation on these strategies from a unified view, systematically scrutinizing step-level and path-level answer calibration across multiple paths. Experimental results reveal that integrating the dominance of both strategies tends to derive optimal outcomes. Our study holds the potential to illuminate key insights for optimizing multi-step reasoning with answer calibration., Comment: Accepted by NLRSE@ACL2024
Published: 2023

11. Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View

Author: Zhang, Jintian, Xu, Xin, Zhang, Ningyu, Liu, Ruibo, Hooi, Bryan, and Deng, Shumin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Computer Science - Machine Learning, Computer Science - Multiagent Systems
Abstract: As Natural Language Processing (NLP) systems are increasingly employed in intricate social environments, a pressing query emerges: Can these NLP systems mirror human-esque collaborative intelligence, in a multi-agent society consisting of multiple large language models (LLMs)? This paper probes the collaboration mechanisms among contemporary NLP systems by melding practical experiments with theoretical insights. We fabricate four unique `societies' comprised of LLM agents, where each agent is characterized by a specific `trait' (easy-going or overconfident) and engages in collaboration with a distinct `thinking pattern' (debate or reflection). Through evaluating these multi-agent societies on three benchmark datasets, we discern that certain collaborative strategies not only outshine previous top-tier approaches, but also optimize efficiency (using fewer API tokens). Moreover, our results further illustrate that LLM agents manifest human-like social behaviors, such as conformity and consensus reaching, mirroring foundational social psychology theories. In conclusion, we integrate insights from social psychology to contextualize the collaboration of LLM agents, inspiring further investigations into the collaboration mechanism for LLMs. We commit to sharing our code and datasets\footnote{\url{https://github.com/zjunlp/MachineSoM}.}, hoping to catalyze further research in this promising avenue., Comment: ACL 2024 Main Conference. 64 pages (8 main), 70 figures, 37 tables. Blog: https://www.zjukg.org/project/MachineSoM
Published: 2023

12. When Do Program-of-Thoughts Work for Reasoning?

Author: Bi, Zhen, Zhang, Ningyu, Jiang, Yinuo, Deng, Shumin, Zheng, Guozhou, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Software Engineering
Abstract: In the realm of embodied artificial intelligence, the reasoning capabilities of Large Language Models (LLMs) play a pivotal role. Although there are effective methods like program-of-thought prompting for LLMs which uses programming language to tackle complex reasoning tasks, the specific impact of code data on the improvement of reasoning capabilities remains under-explored. To address this gap, we propose complexity-impacted reasoning score (CIRS), which combines structural and logical attributes, to measure the correlation between code and reasoning abilities. Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity by considering the difficulty and the cyclomatic complexity. Through an empirical analysis, we find not all code data of complexity can be learned or understood by LLMs. Optimal level of complexity is critical to the improvement of reasoning abilities by program-aided prompting. Then we design an auto-synthesizing and stratifying algorithm, and apply it to instruction generation for mathematical reasoning and code data filtering for code generation tasks. Extensive results demonstrates the effectiveness of our proposed approach. Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct., Comment: AAAI 2024
Published: 2023

13. From Sky to the Ground: A Large-scale Benchmark and Simple Baseline Towards Real Rain Removal

Author: Guo, Yun, Xiao, Xueyao, Chang, Yi, Deng, Shumin, and Yan, Luxin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Learning-based image deraining methods have made great progress. However, the lack of large-scale high-quality paired training samples is the main bottleneck to hamper the real image deraining (RID). To address this dilemma and advance RID, we construct a Large-scale High-quality Paired real rain benchmark (LHP-Rain), including 3000 video sequences with 1 million high-resolution (1920*1080) frame pairs. The advantages of the proposed dataset over the existing ones are three-fold: rain with higher-diversity and larger-scale, image with higher-resolution and higher-quality ground-truth. Specifically, the real rains in LHP-Rain not only contain the classical rain streak/veiling/occlusion in the sky, but also the \textbf{splashing on the ground} overlooked by deraining community. Moreover, we propose a novel robust low-rank tensor recovery model to generate the GT with better separating the static background from the dynamic rain. In addition, we design a simple transformer-based single image deraining baseline, which simultaneously utilize the self-attention and cross-layer attention within the image and rain layer with discriminative feature representation. Extensive experiments verify the superiority of the proposed dataset and deraining method over state-of-the-art., Comment: Accepted by ICCV 2023
Published: 2023

14. LLMs for knowledge graph construction and reasoning: recent capabilities and future opportunities

Author: Zhu, Yuqi, Wang, Xiaohan, Chen, Jing, Qiao, Shuofei, Ou, Yixin, Yao, Yunzhi, Deng, Shumin, Chen, Huajun, and Zhang, Ningyu
Published: 2024
Full Text: View/download PDF

15. SPEECH: Structured Prediction with Energy-Based Event-Centric Hyperspheres

Author: Deng, Shumin, Mao, Shengyu, Zhang, Ningyu, and Hooi, Bryan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Event-centric structured prediction involves predicting structured outputs of events. In most NLP cases, event structures are complex with manifold dependency, and it is challenging to effectively represent these complicated structured events. To address these issues, we propose Structured Prediction with Energy-based Event-Centric Hyperspheres (SPEECH). SPEECH models complex dependency among event structured components with energy-based modeling, and represents event classes with simple but effective hyperspheres. Experiments on two unified-annotated event datasets indicate that SPEECH is predominant in event detection and event-relation extraction tasks., Comment: Accepted by ACL 2023 Main Conference. Code is released at \url{https://github.com/zjunlp/SPEECH}
Published: 2023

16. Editing Large Language Models: Problems, Methods, and Opportunities

Author: Yao, Yunzhi, Wang, Peng, Tian, Bozhong, Cheng, Siyuan, Li, Zhoubo, Deng, Shumin, Chen, Huajun, and Zhang, Ningyu
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Despite the ability to train capable LLMs, the methodology for maintaining their relevancy and rectifying errors remains elusive. To this end, the past few years have witnessed a surge in techniques for editing LLMs, the objective of which is to efficiently alter the behavior of LLMs within a specific domain without negatively impacting performance across other inputs. This paper embarks on a deep exploration of the problems, methods, and opportunities related to model editing for LLMs. In particular, we provide an exhaustive overview of the task definition and challenges associated with model editing, along with an in-depth empirical analysis of the most progressive methods currently at our disposal. We also build a new benchmark dataset to facilitate a more robust evaluation and pinpoint enduring issues intrinsic to existing techniques. Our objective is to provide valuable insights into the effectiveness and feasibility of each editing technique, thereby assisting the community in making informed decisions on the selection of the most appropriate method for a specific task or context. Code and datasets are available at https://github.com/zjunlp/EasyEdit., Comment: EMNLP 2023. Updated with new experiments
Published: 2023

17. LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities

Author: Zhu, Yuqi, Wang, Xiaohan, Chen, Jing, Qiao, Shuofei, Ou, Yixin, Yao, Yunzhi, Deng, Shumin, Chen, Huajun, and Zhang, Ningyu
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Databases, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: This paper presents an exhaustive quantitative and qualitative evaluation of Large Language Models (LLMs) for Knowledge Graph (KG) construction and reasoning. We engage in experiments across eight diverse datasets, focusing on four representative tasks encompassing entity and relation extraction, event extraction, link prediction, and question-answering, thereby thoroughly exploring LLMs' performance in the domain of construction and inference. Empirically, our findings suggest that LLMs, represented by GPT-4, are more suited as inference assistants rather than few-shot information extractors. Specifically, while GPT-4 exhibits good performance in tasks related to KG construction, it excels further in reasoning tasks, surpassing fine-tuned models in certain cases. Moreover, our investigation extends to the potential generalization ability of LLMs for information extraction, leading to the proposition of a Virtual Knowledge Extraction task and the development of the corresponding VINE dataset. Based on these empirical findings, we further propose AutoKG, a multi-agent-based approach employing LLMs and external sources for KG construction and reasoning. We anticipate that this research can provide invaluable insights for future undertakings in the field of knowledge graphs. The code and datasets are in https://github.com/zjunlp/AutoKG., Comment: World Wide Web Journal
Published: 2023

18. Reasoning with Language Model Prompting: A Survey

Author: Qiao, Shuofei, Ou, Yixin, Zhang, Ningyu, Chen, Xiang, Yao, Yunzhi, Deng, Shumin, Tan, Chuanqi, Huang, Fei, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Reasoning, as an essential ability for complex problem-solving, can provide back-end support for various real-world applications, such as medical diagnosis, negotiation, etc. This paper provides a comprehensive survey of cutting-edge research on reasoning with language model prompting. We introduce research works with comparisons and summaries and provide systematic resources to help beginners. We also discuss the potential reasons for emerging such reasoning abilities and highlight future research directions. Resources are available at https://github.com/zjunlp/Prompt4ReasoningPapers (updated periodically)., Comment: ACL 2023, 24 pages, add references of theoretical analysis
Published: 2022

19. Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction

Author: Yao, Yunzhi, Mao, Shengyu, Zhang, Ningyu, Chen, Xiang, Deng, Shumin, Chen, Xi, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Databases, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: With the development of pre-trained language models, many prompt-based approaches to data-efficient knowledge graph construction have been proposed and achieved impressive performance. However, existing prompt-based learning methods for knowledge graph construction are still susceptible to several potential limitations: (i) semantic gap between natural language and output structured knowledge with pre-defined schema, which means model cannot fully exploit semantic knowledge with the constrained templates; (ii) representation learning with locally individual instances limits the performance given the insufficient features, which are unable to unleash the potential analogical capability of pre-trained language models. Motivated by these observations, we propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP), for data-efficient knowledge graph construction. It can dynamically leverage schema and knowledge inherited from human-annotated and weak-supervised data as a prompt for each sample, which is model-agnostic and can be plugged into widespread existing approaches. Experimental results demonstrate that previous methods integrated with RAP can achieve impressive performance gains in low-resource settings on five datasets of relational triple extraction and event extraction for knowledge graph construction. Code is available in https://github.com/zjunlp/RAP., Comment: Accepted by SIGIR 2023
Published: 2022
Full Text: View/download PDF

20. Multimodal Analogical Reasoning over Knowledge Graphs

Author: Zhang, Ningyu, Li, Lei, Chen, Xiang, Liang, Xiaozhuan, Deng, Shumin, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Multimedia
Abstract: Analogical reasoning is fundamental to human cognition and holds an important place in various fields. However, previous studies mainly focus on single-modal analogical reasoning and ignore taking advantage of structure knowledge. Notably, the research in cognitive psychology has demonstrated that information from multimodal sources always brings more powerful cognitive transfer than single modality sources. To this end, we introduce the new task of multimodal analogical reasoning over knowledge graphs, which requires multimodal reasoning ability with the help of background knowledge. Specifically, we construct a Multimodal Analogical Reasoning dataSet (MARS) and a multimodal knowledge graph MarKG. We evaluate with multimodal knowledge graph embedding and pre-trained Transformer baselines, illustrating the potential challenges of the proposed task. We further propose a novel model-agnostic Multimodal analogical reasoning framework with Transformer (MarT) motivated by the structure mapping theory, which can obtain better performance. Code and datasets are available in https://github.com/zjunlp/MKG_Analogy., Comment: Accepted by ICLR 2023. The project website is https://zjunlp.github.io/project/MKG_Analogy/introduction.html
Published: 2022

21. Construction and Applications of Billion-Scale Pre-Trained Multimodal Business Knowledge Graph

Author: Deng, Shumin, Wang, Chengming, Li, Zhoubo, Zhang, Ningyu, Dai, Zelin, Chen, Hehong, Xiong, Feiyu, Yan, Ming, Chen, Qiang, Chen, Mosha, Chen, Jiaoyan, Pan, Jeff Z., Hooi, Bryan, and Chen, Huajun
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Business Knowledge Graphs (KGs) are important to many enterprises today, providing factual knowledge and structured data that steer many products and make them more intelligent. Despite their promising benefits, building business KG necessitates solving prohibitive issues of deficient structure and multiple modalities. In this paper, we advance the understanding of the practical challenges related to building KG in non-trivial real-world systems. We introduce the process of building an open business knowledge graph (OpenBG) derived from a well-known enterprise, Alibaba Group. Specifically, we define a core ontology to cover various abstract products and consumption demands, with fine-grained taxonomy and multimodal facts in deployed applications. OpenBG is an open business KG of unprecedented scale: 2.6 billion triples with more than 88 million entities covering over 1 million core classes/concepts and 2,681 types of relations. We release all the open resources (OpenBG benchmarks) derived from it for the community and report experimental results of KG-centric tasks. We also run up an online competition based on OpenBG benchmarks, and has attracted thousands of teams. We further pre-train OpenBG and apply it to many KG- enhanced downstream tasks in business scenarios, demonstrating the effectiveness of billion-scale multimodal knowledge for e-commerce. All the resources with codes have been released at \url{https://github.com/OpenBGBenchmark/OpenBG}., Comment: OpenBG. Accepted by ICDE 2023. The project is released at https://github.com/OpenBGBenchmark/OpenBG . Website: https://kg.alibaba.com/ , Leaderboard: https://tianchi.aliyun.com/dataset/dataDetail?dataId=122271
Published: 2022

22. Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning

Author: Chen, Xiang, Li, Lei, Zhang, Ningyu, Liang, Xiaozhuan, Deng, Shumin, Tan, Chuanqi, Huang, Fei, Si, Luo, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Prompt learning approaches have made waves in natural language processing by inducing better few-shot performance while they still follow a parametric-based learning paradigm; the oblivion and rote memorization problems in learning may encounter unstable generalization issues. Specifically, vanilla prompt learning may struggle to utilize atypical instances by rote during fully-supervised training or overfit shallow patterns with low-shot data. To alleviate such limitations, we develop RetroPrompt with the motivation of decoupling knowledge from memorization to help the model strike a balance between generalization and memorization. In contrast with vanilla prompt learning, RetroPrompt constructs an open-book knowledge-store from training instances and implements a retrieval mechanism during the process of input, training and inference, thus equipping the model with the ability to retrieve related contexts from the training corpus as cues for enhancement. Extensive experiments demonstrate that RetroPrompt can obtain better performance in both few-shot and zero-shot settings. Besides, we further illustrate that our proposed RetroPrompt can yield better generalization abilities with new datasets. Detailed analysis of memorization indeed reveals RetroPrompt can reduce the reliance of language models on memorization; thus, improving generalization for downstream tasks. Code is available in https://github.com/zjunlp/PromptKG/tree/main/research/RetroPrompt., Comment: NeurIPS 2022 (Spotlight)
Published: 2022

23. Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction

Author: Chen, Xiang, Zhang, Ningyu, Li, Lei, Yao, Yunzhi, Deng, Shumin, Tan, Chuanqi, Huang, Fei, Si, Luo, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Multimedia
Abstract: Multimodal named entity recognition and relation extraction (MNER and MRE) is a fundamental and crucial branch in information extraction. However, existing approaches for MNER and MRE usually suffer from error sensitivity when irrelevant object images incorporated in texts. To deal with these issues, we propose a novel Hierarchical Visual Prefix fusion NeTwork (HVPNeT) for visual-enhanced entity and relation extraction, aiming to achieve more effective and robust performance. Specifically, we regard visual representation as pluggable visual prefix to guide the textual representation for error insensitive forecasting decision. We further propose a dynamic gated aggregation strategy to achieve hierarchical multi-scaled visual features as visual prefix for fusion. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our method, and achieve state-of-the-art performance. Code is available in https://github.com/zjunlp/HVPNeT., Comment: Accepted by NAACL 2022
Published: 2022

24. Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion

Author: Chen, Xiang, Zhang, Ningyu, Li, Lei, Deng, Shumin, Tan, Chuanqi, Xu, Changliang, Huang, Fei, Si, Luo, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Multimedia
Abstract: Multimodal Knowledge Graphs (MKGs), which organize visual-text factual knowledge, have recently been successfully applied to tasks such as information retrieval, question answering, and recommendation system. Since most MKGs are far from complete, extensive knowledge graph completion studies have been proposed focusing on the multimodal entity, relation extraction and link prediction. However, different tasks and modalities require changes to the model architecture, and not all images/objects are relevant to text input, which hinders the applicability to diverse real-world scenarios. In this paper, we propose a hybrid transformer with multi-level fusion to address those issues. Specifically, we leverage a hybrid transformer architecture with unified input-output for diverse multimodal knowledge graph completion tasks. Moreover, we propose multi-level fusion, which integrates visual and text representation via coarse-grained prefix-guided interaction and fine-grained correlation-aware fusion modules. We conduct extensive experiments to validate that our MKGformer can obtain SOTA performance on four datasets of multimodal link prediction, multimodal RE, and multimodal NER. Code is available in https://github.com/zjunlp/MKGformer., Comment: Accepted by SIGIR 2022. Fix a severe bug
Published: 2022
Full Text: View/download PDF

25. Information Extraction in Low-Resource Scenarios: Survey and Perspective

Author: Deng, Shumin, Ma, Yubo, Zhang, Ningyu, Cao, Yixin, and Hooi, Bryan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Information Extraction (IE) seeks to derive structured information from unstructured texts, often facing challenges in low-resource scenarios due to data scarcity and unseen classes. This paper presents a review of neural approaches to low-resource IE from \emph{traditional} and \emph{LLM-based} perspectives, systematically categorizing them into a fine-grained taxonomy. Then we conduct empirical study on LLM-based methods compared with previous state-of-the-art models, and discover that (1) well-tuned LMs are still predominant; (2) tuning open-resource LLMs and ICL with GPT family is promising in general; (3) the optimal LLM-based technical solution for low-resource IE can be task-dependent. In addition, we discuss low-resource IE with LLMs, highlight promising applications, and outline potential research directions. This survey aims to foster understanding of this field, inspire new ideas, and encourage widespread applications in both academia and industry., Comment: Work in Progress. Paper List: \url{https://github.com/zjunlp/Low-resource-KEPapers}; Data and Code: \url{ https://github.com/mayubo2333/LLM_project}
Published: 2022

26. From Discrimination to Generation: Knowledge Graph Completion with Generative Transformer

Author: Xie, Xin, Zhang, Ningyu, Li, Zhoubo, Deng, Shumin, Chen, Hui, Xiong, Feiyu, Chen, Mosha, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Databases, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Knowledge graph completion aims to address the problem of extending a KG with missing triples. In this paper, we provide an approach GenKGC, which converts knowledge graph completion to sequence-to-sequence generation task with the pre-trained language model. We further introduce relation-guided demonstration and entity-aware hierarchical decoding for better representation learning and fast inference. Experimental results on three datasets show that our approach can obtain better or comparable performance than baselines and achieve faster inference speed compared with previous methods with pre-trained language models. We also release a new large-scale Chinese knowledge graph dataset AliopenKG500 for research purpose. Code and datasets are available in https://github.com/zjunlp/PromptKG/tree/main/GenKGC., Comment: Accepted by WWW 2022 Poster
Published: 2022
Full Text: View/download PDF

27. Ontology-enhanced Prompt-tuning for Few-shot Learning

Author: Ye, Hongbin, Zhang, Ningyu, Deng, Shumin, Chen, Xiang, Chen, Hui, Xiong, Feiyu, Chen, Xi, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Few-shot Learning (FSL) is aimed to make predictions based on a limited number of samples. Structured data such as knowledge graphs and ontology libraries has been leveraged to benefit the few-shot setting in various tasks. However, the priors adopted by the existing methods suffer from challenging knowledge missing, knowledge noise, and knowledge heterogeneity, which hinder the performance for few-shot learning. In this study, we explore knowledge injection for FSL with pre-trained language models and propose ontology-enhanced prompt-tuning (OntoPrompt). Specifically, we develop the ontology transformation based on the external knowledge graph to address the knowledge missing issue, which fulfills and converts structure knowledge to text. We further introduce span-sensitive knowledge injection via a visible matrix to select informative knowledge to handle the knowledge noise issue. To bridge the gap between knowledge and text, we propose a collective training algorithm to optimize representations jointly. We evaluate our proposed OntoPrompt in three tasks, including relation extraction, event extraction, and knowledge graph completion, with eight datasets. Experimental results demonstrate that our approach can obtain better few-shot performance than baselines., Comment: Accepted by WWW2022
Published: 2022
Full Text: View/download PDF

28. OntoProtein: Protein Pretraining With Gene Ontology Embedding

Author: Zhang, Ningyu, Bi, Zhen, Liang, Xiaozhuan, Cheng, Siyuan, Hong, Haosen, Deng, Shumin, Lian, Jiazhang, Zhang, Qiang, and Chen, Huajun
Subjects: Quantitative Biology - Biomolecules, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Self-supervised protein language models have proved their effectiveness in learning the proteins representations. With the increasing computational power, current protein language models pre-trained with millions of diverse sequences can advance the parameter scale from million-level to billion-level and achieve remarkable improvement. However, those prevailing approaches rarely consider incorporating knowledge graphs (KGs), which can provide rich structured knowledge facts for better protein representations. We argue that informative biology knowledge in KGs can enhance protein representation with external knowledge. In this work, we propose OntoProtein, the first general framework that makes use of structure in GO (Gene Ontology) into protein pre-training models. We construct a novel large-scale knowledge graph that consists of GO and its related proteins, and gene annotation texts or protein sequences describe all nodes in the graph. We propose novel contrastive learning with knowledge-aware negative sampling to jointly optimize the knowledge graph and protein embedding during pre-training. Experimental results show that OntoProtein can surpass state-of-the-art methods with pre-trained protein language models in TAPE benchmark and yield better performance compared with baselines in protein-protein interaction and protein function prediction. Code and datasets are available in https://github.com/zjunlp/OntoProtein., Comment: Accepted by ICLR 2022
Published: 2022

29. DeepKE: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population

Author: Zhang, Ningyu, Xu, Xin, Tao, Liankuan, Yu, Haiyang, Ye, Hongbin, Qiao, Shuofei, Xie, Xin, Chen, Xiang, Li, Zhoubo, Li, Lei, Liang, Xiaozhuan, Yao, Yunzhi, Deng, Shumin, Wang, Peng, Zhang, Wen, Zhang, Zhenru, Tan, Chuanqi, Chen, Qiang, Xiong, Feiyu, Huang, Fei, Zheng, Guozhou, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: We present an open-source and extensible knowledge extraction toolkit DeepKE, supporting complicated low-resource, document-level and multimodal scenarios in the knowledge base population. DeepKE implements various information extraction tasks, including named entity recognition, relation extraction and attribute extraction. With a unified framework, DeepKE allows developers and researchers to customize datasets and models to extract information from unstructured data according to their requirements. Specifically, DeepKE not only provides various functional modules and model implementation for different tasks and scenarios but also organizes all components by consistent frameworks to maintain sufficient modularity and extensibility. We release the source code at GitHub in https://github.com/zjunlp/DeepKE with Google Colab tutorials and comprehensive documents for beginners. Besides, we present an online system in http://deepke.openkg.cn/EN/re_doc_show.html for real-time extraction of various tasks, and a demo video., Comment: Accepted by EMNLP 2022 System Demonstrations and the project website is http://deepke.zjukg.cn/
Published: 2022

30. Knowledge Graph Embedding in E-commerce Applications: Attentive Reasoning, Explanations, and Transferable Rules

Author: Zhang, Wen, Deng, Shumin, Chen, Mingyang, Wang, Liang, Chen, Qiang, Xiong, Feiyu, Liu, Xiangwen, and Chen, Huajun
Subjects: Computer Science - Artificial Intelligence
Abstract: Knowledge Graphs (KGs), representing facts as triples, have been widely adopted in many applications. Reasoning tasks such as link prediction and rule induction are important for the development of KGs. Knowledge Graph Embeddings (KGEs) embedding entities and relations of a KG into continuous vector spaces, have been proposed for these reasoning tasks and proven to be efficient and robust. But the plausibility and feasibility of applying and deploying KGEs in real-work applications has not been well-explored. In this paper, we discuss and report our experiences of deploying KGEs in a real domain application: e-commerce. We first identity three important desiderata for e-commerce KG systems: 1) attentive reasoning, reasoning over a few target relations of more concerns instead of all; 2) explanation, providing explanations for a prediction to help both users and business operators understand why the prediction is made; 3) transferable rules, generating reusable rules to accelerate the deployment of a KG to new systems. While non existing KGE could meet all these desiderata, we propose a novel one, an explainable knowledge graph attention network that make prediction through modeling correlations between triples rather than purely relying on its head entity, relation and tail entity embeddings. It could automatically selects attentive triples for prediction and records the contribution of them at the same time, from which explanations could be easily provided and transferable rules could be efficiently produced. We empirically show that our method is capable of meeting all three desiderata in our e-commerce application and outperform typical baselines on datasets from real domain applications., Comment: Accepted at IJCKG2021
Published: 2021

31. LOGEN: Few-shot Logical Knowledge-Conditioned Text Generation with Self-training

Author: Deng, Shumin, Yang, Jiacheng, Ye, Hongbin, Tan, Chuanqi, Chen, Mosha, Huang, Songfang, Huang, Fei, Chen, Huajun, and Zhang, Ningyu
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Natural language generation from structured data mainly focuses on surface-level descriptions, suffering from uncontrollable content selection and low fidelity. Previous works leverage logical forms to facilitate logical knowledge-conditioned text generation. Though achieving remarkable progress, they are data-hungry, which makes the adoption for real-world applications challenging with limited data. To this end, this paper proposes a unified framework for logical knowledge-conditioned text generation in the few-shot setting. With only a few seeds logical forms (e.g., 20/100 shot), our approach leverages self-training and samples pseudo logical forms based on content and structure consistency. Experimental results demonstrate that our approach can obtain better few-shot performance than baselines., Comment: Accepted by IEEE/ACM Transactions on Audio Speech and Language Processing
Published: 2021

32. Molecular Contrastive Learning with Chemical Element Knowledge Graph

Author: Fang, Yin, Zhang, Qiang, Yang, Haihong, Zhuang, Xiang, Deng, Shumin, Zhang, Wen, Qin, Ming, Chen, Zhuo, Fan, Xiaohui, and Chen, Huajun
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Quantitative Biology - Quantitative Methods
Abstract: Molecular representation learning contributes to multiple downstream tasks such as molecular property prediction and drug design. To properly represent molecules, graph contrastive learning is a promising paradigm as it utilizes self-supervision signals and has no requirements for human annotations. However, prior works fail to incorporate fundamental domain knowledge into graph semantics and thus ignore the correlations between atoms that have common attributes but are not directly connected by bonds. To address these issues, we construct a Chemical Element Knowledge Graph (KG) to summarize microscopic associations between elements and propose a novel Knowledge-enhanced Contrastive Learning (KCL) framework for molecular representation learning. KCL framework consists of three modules. The first module, knowledge-guided graph augmentation, augments the original molecular graph based on the Chemical Element KG. The second module, knowledge-aware graph representation, extracts molecular representations with a common graph encoder for the original molecular graph and a Knowledge-aware Message Passing Neural Network (KMPNN) to encode complex information in the augmented molecular graph. The final module is a contrastive objective, where we maximize agreement between these two views of molecular graphs. Extensive experiments demonstrated that KCL obtained superior performances against state-of-the-art baselines on eight molecular datasets. Visualization experiments properly interpret what KCL has learned from atoms and attributes in the augmented molecular graphs. Our codes and data are available at https://github.com/ZJU-Fangyin/KCL., Comment: Accepted in AAAI 2022 Main track
Published: 2021

33. Learning to Ask for Data-Efficient Event Argument Extraction

Author: Ye, Hongbin, Zhang, Ningyu, Bi, Zhen, Deng, Shumin, Tan, Chuanqi, Chen, Hui, Huang, Fei, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Event argument extraction (EAE) is an important task for information extraction to discover specific argument roles. In this study, we cast EAE as a question-based cloze task and empirically analyze fixed discrete token template performance. As generating human-annotated question templates is often time-consuming and labor-intensive, we further propose a novel approach called "Learning to Ask," which can learn optimized question templates for EAE without human annotations. Experiments using the ACE-2005 dataset demonstrate that our method based on optimized questions achieves state-of-the-art performance in both the few-shot and supervised settings., Comment: work in progress
Published: 2021

34. LightNER: A Lightweight Tuning Paradigm for Low-resource NER via Pluggable Prompting

Author: Chen, Xiang, Li, Lei, Deng, Shumin, Tan, Chuanqi, Xu, Changliang, Huang, Fei, Si, Luo, Chen, Huajun, and Zhang, Ningyu
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Databases, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Most NER methods rely on extensive labeled data for model training, which struggles in the low-resource scenarios with limited training data. Existing dominant approaches usually suffer from the challenge that the target domain has different label sets compared with a resource-rich source domain, which can be concluded as class transfer and domain transfer. In this paper, we propose a lightweight tuning paradigm for low-resource NER via pluggable prompting (LightNER). Specifically, we construct the unified learnable verbalizer of entity categories to generate the entity span sequence and entity categories without any label-specific classifiers, thus addressing the class transfer issue. We further propose a pluggable guidance module by incorporating learnable parameters into the self-attention layer as guidance, which can re-modulate the attention and adapt pre-trained weights. Note that we only tune those inserted module with the whole parameter of the pre-trained language model fixed, thus, making our approach lightweight and flexible for low-resource scenarios and can better transfer knowledge across domains. Experimental results show that LightNER can obtain comparable performance in the standard supervised setting and outperform strong baselines in low-resource settings. Code is in https://github.com/zjunlp/DeepKE/tree/main/example/ner/few-shot., Comment: Accepted by COLING 2022
Published: 2021

35. Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners

Author: Zhang, Ningyu, Li, Luoqiu, Chen, Xiang, Deng, Shumin, Bi, Zhen, Tan, Chuanqi, Huang, Fei, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Large-scale pre-trained language models have contributed significantly to natural language processing by demonstrating remarkable abilities as few-shot learners. However, their effectiveness depends mainly on scaling the model parameters and prompt design, hindering their implementation in most real-world applications. This study proposes a novel pluggable, extensible, and efficient approach named DifferentiAble pRompT (DART), which can convert small language models into better few-shot learners without any prompt engineering. The main principle behind this approach involves reformulating potential natural language processing tasks into the task of a pre-trained language model and differentially optimizing the prompt template as well as the target label with backpropagation. Furthermore, the proposed approach can be: (i) Plugged to any pre-trained language models; (ii) Extended to widespread classification tasks. A comprehensive evaluation of standard NLP tasks demonstrates that the proposed approach achieves a better few-shot performance. Code is available in https://github.com/zjunlp/DART., Comment: Accepted by ICLR 2022
Published: 2021

36. Document-level Relation Extraction as Semantic Segmentation

Author: Zhang, Ningyu, Chen, Xiang, Xie, Xin, Deng, Shumin, Tan, Chuanqi, Chen, Mosha, Huang, Fei, Si, Luo, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Document-level relation extraction aims to extract relations among multiple entity pairs from a document. Previously proposed graph-based or transformer-based models utilize the entities independently, regardless of global information among relational triples. This paper approaches the problem by predicting an entity-level relation matrix to capture local and global information, parallel to the semantic segmentation task in computer vision. Herein, we propose a Document U-shaped Network for document-level relation extraction. Specifically, we leverage an encoder module to capture the context information of entities and a U-shaped segmentation module over the image-style feature map to capture global interdependency among triples. Experimental results show that our approach can obtain state-of-the-art performance on three benchmark datasets DocRED, CDR, and GDA., Comment: Accepted by IJCAI 2021
Published: 2021

37. AliCG: Fine-grained and Evolvable Conceptual Graph Construction for Semantic Search at Alibaba

Author: Zhang, Ningyu, Jia, Qianghuai, Deng, Shumin, Chen, Xiang, Ye, Hongbin, Chen, Hui, Tou, Huaixiao, Huang, Gang, Wang, Zhao, Hua, Nengwei, and Chen, Huajun
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Databases, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Conceptual graphs, which is a particular type of Knowledge Graphs, play an essential role in semantic search. Prior conceptual graph construction approaches typically extract high-frequent, coarse-grained, and time-invariant concepts from formal texts. In real applications, however, it is necessary to extract less-frequent, fine-grained, and time-varying conceptual knowledge and build taxonomy in an evolving manner. In this paper, we introduce an approach to implementing and deploying the conceptual graph at Alibaba. Specifically, We propose a framework called AliCG which is capable of a) extracting fine-grained concepts by a novel bootstrapping with alignment consensus approach, b) mining long-tail concepts with a novel low-resource phrase mining approach, c) updating the graph dynamically via a concept distribution estimation method based on implicit and explicit user behaviors. We have deployed the framework at Alibaba UC Browser. Extensive offline evaluation as well as online A/B testing demonstrate the efficacy of our approach., Comment: Accepted by KDD 2021 (Applied Data Science Track)
Published: 2021

38. OntoED: Low-resource Event Detection with Ontology Embedding

Author: Deng, Shumin, Zhang, Ningyu, Li, Luoqiu, Chen, Hui, Tou, Huaixiao, Chen, Mosha, Huang, Fei, and Chen, Huajun
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Event Detection (ED) aims to identify event trigger words from a given text and classify it into an event type. Most of current methods to ED rely heavily on training instances, and almost ignore the correlation of event types. Hence, they tend to suffer from data scarcity and fail to handle new unseen event types. To address these problems, we formulate ED as a process of event ontology population: linking event instances to pre-defined event types in event ontology, and propose a novel ED framework entitled OntoED with ontology embedding. We enrich event ontology with linkages among event types, and further induce more event-event correlations. Based on the event ontology, OntoED can leverage and propagate correlation knowledge, particularly from data-rich to data-poor event types. Furthermore, OntoED can be applied to new unseen event types, by establishing linkages to existing ones. Experiments indicate that OntoED is more predominant and robust than previous approaches to ED, especially in data-scarce scenarios., Comment: Accepted to appear at the ACL 2021 main conference. Add the description of evaluation metrics
Published: 2021

39. MLBiNet: A Cross-Sentence Collective Event Detection Network

Author: Lou, Dongfang, Liao, Zhilin, Deng, Shumin, Zhang, Ningyu, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: We consider the problem of collectively detecting multiple events, particularly in cross-sentence settings. The key to dealing with the problem is to encode semantic information and model event inter-dependency at a document-level. In this paper, we reformulate it as a Seq2Seq task and propose a Multi-Layer Bidirectional Network (MLBiNet) to capture the document-level association of events and semantic information simultaneously. Specifically, a bidirectional decoder is firstly devised to model event inter-dependency within a sentence when decoding the event tag vector sequence. Secondly, an information aggregation module is employed to aggregate sentence-level semantic and event tag information. Finally, we stack multiple bidirectional decoders and feed cross-sentence information, forming a multi-layer bidirectional tagging architecture to iteratively propagate information across sentences. We show that our approach provides significant improvement in performance compared to the current state-of-the-art results., Comment: Accepted by ACL 2021
Published: 2021

40. KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction

Author: Chen, Xiang, Zhang, Ningyu, Xie, Xin, Deng, Shumin, Yao, Yunzhi, Tan, Chuanqi, Huang, Fei, Si, Luo, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Recently, prompt-tuning has achieved promising results for specific few-shot classification tasks. The core idea of prompt-tuning is to insert text pieces (i.e., templates) into the input and transform a classification task into a masked language modeling problem. However, for relation extraction, determining an appropriate prompt template requires domain expertise, and it is cumbersome and time-consuming to obtain a suitable label word. Furthermore, there exists abundant semantic and prior knowledge among the relation labels that cannot be ignored. To this end, we focus on incorporating knowledge among relation labels into prompt-tuning for relation extraction and propose a Knowledge-aware Prompt-tuning approach with synergistic optimization (KnowPrompt). Specifically, we inject latent knowledge contained in relation labels into prompt construction with learnable virtual type words and answer words. Then, we synergistically optimize their representation with structured constraints. Extensive experimental results on five datasets with standard and low-resource settings demonstrate the effectiveness of our approach. Our code and datasets are available in https://github.com/zjunlp/KnowPrompt for reproducibility., Comment: Accepted by WWW2022
Published: 2021
Full Text: View/download PDF

41. Disentangled Contrastive Learning for Learning Robust Textual Representations

Author: Chen, Xiang, Xie, Xin, Bi, Zhen, Ye, Hongbin, Deng, Shumin, Zhang, Ningyu, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Although the self-supervised pre-training of transformer models has resulted in the revolutionizing of natural language processing (NLP) applications and the achievement of state-of-the-art results with regard to various benchmarks, this process is still vulnerable to small and imperceptible permutations originating from legitimate inputs. Intuitively, the representations should be similar in the feature space with subtle input permutations, while large variations occur with different meanings. This motivates us to investigate the learning of robust textual representation in a contrastive manner. However, it is non-trivial to obtain opposing semantic instances for textual samples. In this study, we propose a disentangled contrastive learning method that separately optimizes the uniformity and alignment of representations without negative sampling. Specifically, we introduce the concept of momentum representation consistency to align features and leverage power normalization while conforming the uniformity. Our experimental results for the NLP benchmarks demonstrate that our approach can obtain better results compared with the baselines, as well as achieve promising improvements with invariance tests and adversarial attacks. The code is available in https://github.com/zxlzr/DCL., Comment: Accepted by CICAI 2021
Published: 2021

42. Text-guided Legal Knowledge Graph Reasoning

Author: Li, Luoqiu, Bi, Zhen, Ye, Hongbin, Deng, Shumin, Chen, Hui, and Tou, Huaixiao
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Recent years have witnessed the prosperity of legal artificial intelligence with the development of technologies. In this paper, we propose a novel legal application of legal provision prediction (LPP), which aims to predict the related legal provisions of affairs. We formulate this task as a challenging knowledge graph completion problem, which requires not only text understanding but also graph reasoning. To this end, we propose a novel text-guided graph reasoning approach. We collect amounts of real-world legal provision data from the Guangdong government service website and construct a legal dataset called LegalLPP. Extensive experimental results on the dataset show that our approach achieves better performance compared with baselines. The code and dataset are available in \url{https://github.com/zxlzr/LegalPP} for reproducibility.
Published: 2021

43. Normal vs. Adversarial: Salience-based Analysis of Adversarial Samples for Relation Extraction

Author: Li, Luoqiu, Chen, Xiang, Bi, Zhen, Xie, Xin, Deng, Shumin, Zhang, Ningyu, Tan, Chuanqi, Chen, Mosha, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Databases, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Recent neural-based relation extraction approaches, though achieving promising improvement on benchmark datasets, have reported their vulnerability towards adversarial attacks. Thus far, efforts mostly focused on generating adversarial samples or defending adversarial attacks, but little is known about the difference between normal and adversarial samples. In this work, we take the first step to leverage the salience-based method to analyze those adversarial samples. We observe that salience tokens have a direct correlation with adversarial perturbations. We further find the adversarial perturbations are either those tokens not existing in the training set or superficial cues associated with relation labels. To some extent, our approach unveils the characters against adversarial samples. We release an open-source testbed, "DiagnoseAdv" in https://github.com/zjunlp/DiagnoseAdv., Comment: IJCKG 2021
Published: 2021

44. ZJUKLAB at SemEval-2021 Task 4: Negative Augmentation with Language Model for Reading Comprehension of Abstract Meaning

Author: Xie, Xin, Chen, Xiangnan, Chen, Xiang, Wang, Yong, Zhang, Ningyu, Deng, Shumin, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: This paper presents our systems for the three Subtasks of SemEval Task4: Reading Comprehension of Abstract Meaning (ReCAM). We explain the algorithms used to learn our models and the process of tuning the algorithms and selecting the best model. Inspired by the similarity of the ReCAM task and the language pre-training, we propose a simple yet effective technology, namely, negative augmentation with language model. Evaluation results demonstrate the effectiveness of our proposed approach. Our models achieve the 4th rank on both official test sets of Subtask 1 and Subtask 2 with an accuracy of 87.9% and an accuracy of 92.8%, respectively. We further conduct comprehensive model analysis and observe interesting error cases, which may promote future researches., Comment: Accepted by SemEval-2021 Workshop, ACL-IJCNLP 2021
Published: 2021

45. Bridging Text and Knowledge with Multi-Prototype Embedding for Few-Shot Relational Triple Extraction

Author: Yu, Haiyang, Zhang, Ningyu, Deng, Shumin, Ye, Hongbin, Zhang, Wei, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Databases, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Current supervised relational triple extraction approaches require huge amounts of labeled data and thus suffer from poor performance in few-shot settings. However, people can grasp new knowledge by learning a few instances. To this end, we take the first step to study the few-shot relational triple extraction, which has not been well understood. Unlike previous single-task few-shot problems, relational triple extraction is more challenging as the entities and relations have implicit correlations. In this paper, We propose a novel multi-prototype embedding network model to jointly extract the composition of relational triples, namely, entity pairs and corresponding relations. To be specific, we design a hybrid prototypical learning mechanism that bridges text and knowledge concerning both entities and relations. Thus, implicit correlations between entities and relations are injected. Additionally, we propose a prototype-aware regularization to learn more representative prototypes. Experimental results demonstrate that the proposed method can improve the performance of the few-shot triple extraction., Comment: accepted at COLING 2020
Published: 2020

46. The Devil is the Classifier: Investigating Long Tail Relation Classification with Decoupling Analysis

Author: Yu, Haiyang, Zhang, Ningyu, Deng, Shumin, Yuan, Zonggang, Jia, Yantao, and Chen, Huajun
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Databases, Computer Science - Information Retrieval, Statistics - Machine Learning
Abstract: Long-tailed relation classification is a challenging problem as the head classes may dominate the training phase, thereby leading to the deterioration of the tail performance. Existing solutions usually address this issue via class-balancing strategies, e.g., data re-sampling and loss re-weighting, but all these methods adhere to the schema of entangling learning of the representation and classifier. In this study, we conduct an in-depth empirical investigation into the long-tailed problem and found that pre-trained models with instance-balanced sampling already capture the well-learned representations for all classes; moreover, it is possible to achieve better long-tailed classification ability at low cost by only adjusting the classifier. Inspired by this observation, we propose a robust classifier with attentive relation routing, which assigns soft weights by automatically aggregating the relations. Extensive experiments on two datasets demonstrate the effectiveness of our proposed approach. Code and datasets are available in https://github.com/zjunlp/deepke.
Published: 2020

47. Contrastive Triple Extraction with Generative Transformer

Author: Ye, Hongbin, Zhang, Ningyu, Deng, Shumin, Chen, Mosha, Tan, Chuanqi, Huang, Fei, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Databases, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Triple extraction is an essential task in information extraction for natural language processing and knowledge graph construction. In this paper, we revisit the end-to-end triple extraction task for sequence generation. Since generative triple extraction may struggle to capture long-term dependencies and generate unfaithful triples, we introduce a novel model, contrastive triple extraction with a generative transformer. Specifically, we introduce a single shared transformer module for encoder-decoder-based generation. To generate faithful results, we propose a novel triplet contrastive training object. Moreover, we introduce two mechanisms to further improve model performance (i.e., batch-wise dynamic attention-masking and triple-wise calibration). Experimental results on three datasets (i.e., NYT, WebNLG, and MIE) show that our approach achieves better performance than that of baselines., Comment: Accepted by AAAI 2021
Published: 2020

48. On Robustness and Bias Analysis of BERT-based Relation Extraction

Author: Li, Luoqiu, Chen, Xiang, Ye, Hongbin, Bi, Zhen, Deng, Shumin, Zhang, Ningyu, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Databases, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Fine-tuning pre-trained models have achieved impressive performance on standard natural language processing benchmarks. However, the resultant model generalizability remains poorly understood. We do not know, for example, how excellent performance can lead to the perfection of generalization models. In this study, we analyze a fine-tuned BERT model from different perspectives using relation extraction. We also characterize the differences in generalization techniques according to our proposed improvements. From empirical experimentation, we find that BERT suffers a bottleneck in terms of robustness by way of randomizations, adversarial and counterfactual tests, and biases (i.e., selection and semantic). These findings highlight opportunities for future improvements. Our open-sourced testbed DiagnoseRE is available in \url{https://github.com/zjunlp/DiagnoseRE}., Comment: work in progress
Published: 2020

49. Relation Adversarial Network for Low Resource Knowledge Graph Completion

Author: Zhang, Ningyu, Deng, Shumin, Sun, Zhanlin, Chen, Jiaoayan, Zhang, Wei, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Databases, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Knowledge Graph Completion (KGC) has been proposed to improve Knowledge Graphs by filling in missing connections via link prediction or relation extraction. One of the main difficulties for KGC is a low resource problem. Previous approaches assume sufficient training triples to learn versatile vectors for entities and relations, or a satisfactory number of labeled sentences to train a competent relation extraction model. However, low resource relations are very common in KGs, and those newly added relations often do not have many known samples for training. In this work, we aim at predicting new facts under a challenging setting where only limited training instances are available. We propose a general framework called Weighted Relation Adversarial Network, which utilizes an adversarial procedure to help adapt knowledge/features learned from high resource relations to different but related low resource relations. Specifically, the framework takes advantage of a relation discriminator to distinguish between samples from different relations, and help learn relation-invariant features more transferable from source relations to target relations. Experimental results show that the proposed approach outperforms previous methods regarding low resource settings for both link prediction and relation extraction., Comment: WWW2020
Published: 2019

50. Meta-Learning with Dynamic-Memory-Based Prototypical Network for Few-Shot Event Detection

Author: Deng, Shumin, Zhang, Ningyu, Kang, Jiaojian, Zhang, Yichi, Zhang, Wei, and Chen, Huajun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Event detection (ED), a sub-task of event extraction, involves identifying triggers and categorizing event mentions. Existing methods primarily rely upon supervised learning and require large-scale labeled event datasets which are unfortunately not readily available in many real-life applications. In this paper, we consider and reformulate the ED task with limited labeled data as a Few-Shot Learning problem. We propose a Dynamic-Memory-Based Prototypical Network (DMB-PN), which exploits Dynamic Memory Network (DMN) to not only learn better prototypes for event types, but also produce more robust sentence encodings for event mentions. Differing from vanilla prototypical networks simply computing event prototypes by averaging, which only consume event mentions once, our model is more robust and is capable of distilling contextual information from event mentions for multiple times due to the multi-hop mechanism of DMNs. The experiments show that DMB-PN not only deals with sample scarcity better than a series of baseline models but also performs more robustly when the variety of event types is relatively large and the instance quantity is extremely small., Comment: Accepted by WSDM 2020
Published: 2019
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

274 results on '"Deng, Shumin"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources