Author: "Yu, Bowen" / Topic: fos: computer and information sciences - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Yu, Bowen"' showing total 18 results

Start Over Author "Yu, Bowen" Topic fos: computer and information sciences

18 results on '"Yu, Bowen"'

1. Causal Document-Grounded Dialogue Pre-training

Author: Zhao, Yingxiu, Yu, Bowen, Yu, Haiyang, Li, Bowen, Li, Jinyang, Wang, Chao, Huang, Fei, Li, Yongbin, and Zhang, Nevin L.
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: The goal of document-grounded dialogue (DocGD) is to generate a response by grounding the evidence in a supporting document in accordance with the dialogue context. This process involves four variables that are causally connected. Recently, task-specific pre-training has greatly boosted performances on many downstream tasks. Existing DocGD methods, however, continue to rely on general pre-trained language models without a specifically tailored pre-training approach that explicitly captures the causal relationships. To tackle this issue, we are the first to present a causally-complete dataset construction strategy for building million-level DocGD pre-training corpora. To better capture causality, we further propose a causally-perturbed pre-training strategy, which introduces causal perturbations on the variables and optimizes the overall causal effect. Experiments on three benchmark datasets demonstrate that our causal pre-training achieves considerable and consistent improvements under fully-supervised, low-resource, few-shot, and zero-shot settings., Work in progress
Published: 2023

2. Domain Incremental Lifelong Learning in an Open World

Author: Dai, Yi, Lang, Hao, Zheng, Yinhe, Yu, Bowen, Huang, Fei, and Li, Yongbin
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computation and Language (cs.CL), Machine Learning (cs.LG)
Abstract: Lifelong learning (LL) is an important ability for NLP models to learn new tasks continuously. Architecture-based approaches are reported to be effective implementations for LL models. However, it is non-trivial to extend previous approaches to domain incremental LL scenarios since they either require access to task identities in the testing phase or cannot handle samples from unseen tasks. In this paper, we propose \textbf{Diana}: a \underline{d}ynam\underline{i}c \underline{a}rchitecture-based lifelo\underline{n}g le\underline{a}rning model that tries to learn a sequence of tasks with a prompt-enhanced language model. Four types of hierarchically organized prompts are used in Diana to capture knowledge from different granularities. Specifically, we dedicate task-level prompts to capture task-specific knowledge to retain high LL performances and maintain instance-level prompts to learn knowledge shared across input samples to improve the model's generalization performance. Moreover, we dedicate separate prompts to explicitly model unseen tasks and introduce a set of prompt key vectors to facilitate knowledge sharing between tasks. Extensive experiments demonstrate that Diana outperforms state-of-the-art LL models, especially in handling unseen tasks. We release the code and data at \url{https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/diana}., ACL2023 Findings Long Paper. arXiv admin note: substantial text overlap with arXiv:2208.14602
Published: 2023

3. API-Bank: A Benchmark for Tool-Augmented LLMs

Author: Li, Minghao, Song, Feifan, Yu, Bowen, Yu, Haiyang, Li, Zhoujun, Huang, Fei, and Li, Yongbin
Subjects: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computation and Language (cs.CL)
Abstract: Recent research has shown that Large Language Models (LLMs) can utilize external tools to improve their contextual processing abilities, moving away from the pure language modeling paradigm and paving the way for Artificial General Intelligence. Despite this, there has been a lack of systematic evaluation to demonstrate the efficacy of LLMs using tools to respond to human instructions. This paper presents API-Bank, the first benchmark tailored for Tool-Augmented LLMs. API-Bank includes 53 commonly used API tools, a complete Tool-Augmented LLM workflow, and 264 annotated dialogues that encompass a total of 568 API calls. These resources have been designed to thoroughly evaluate LLMs' ability to plan step-by-step API calls, retrieve relevant APIs, and correctly execute API calls to meet human needs. The experimental results show that GPT-3.5 emerges the ability to use the tools relative to GPT3, while GPT-4 has stronger planning performance. Nevertheless, there remains considerable scope for further improvement when compared to human performance. Additionally, detailed error analysis and case studies demonstrate the feasibility of Tool-Augmented LLMs for daily use, as well as the primary challenges that future research needs to address.
Published: 2023

4. Preference Ranking Optimization for Human Alignment

Author: Song, Feifan, Yu, Bowen, Li, Minghao, Yu, Haiyang, Huang, Fei, Li, Yongbin, and Wang, Houfeng
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computation and Language (cs.CL)
Abstract: Large language models (LLMs) often contain misleading content, emphasizing the need to align them with human values to ensure secur AI systems. Reinforcement learning from human feedback (RLHF) has been employed to achieve this alignment by combining a reward model, typically based on Bradley-Terry paired comparison, with an RL algorithm such as Proximal Policy Optimization (PPO) to optimize LLM responses. However, RLHF exhibits complexity, instability, and sensitivity to hyperparameters. In this paper, we propose Preference Ranking Optimization (PRO) as an alternative to PPO for directly aligning LLMs with the Bradley-Terry comparison. PRO extends the pairwise Bradley-Terry comparison to accommodate preference rankings of any length. By iteratively contrasting the likelihood of generating responses, PRO instructs the LLM to prioritize the best response while progressively ranking the remaining responses. In this manner, PRO effectively transforms human alignment into aligning the probability ranking of $n$ responses generated by LLM with the preference ranking of humans towards these responses. Experiments have shown that PRO outperforms existing alignment algorithms, achieving comparable results to ChatGPT and human responses through automatic-based, reward-based, GPT-4, and human evaluations. Furthermore, we demonstrate that longer, more diverse, and higher-quality preference ranking sequences can consistently enhance the performance of human alignment.
Published: 2023
Full Text: View/download PDF

5. PolyLM: An Open Source Polyglot Large Language Model

Author: Wei, Xiangpeng, Wei, Haoran, Lin, Huan, Li, Tianhao, Zhang, Pei, Ren, Xingzhang, Li, Mei, Wan, Yu, Cao, Zhiwei, Xie, Binbin, Hu, Tianxiang, Li, Shangjie, Hui, Binyuan, Yu, Bowen, Liu, Dayiheng, Yang, Baosong, Huang, Fei, and Xie, Jun
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Large language models (LLMs) demonstrate remarkable ability to comprehend, reason, and generate following nature language instructions. However, the development of LLMs has been primarily focused on high-resource languages, such as English, thereby limiting their applicability and research in other languages. Consequently, we present PolyLM, a multilingual LLM trained on 640 billion (B) tokens, avaliable in two model sizes: 1.7B and 13B. To enhance its multilingual capabilities, we 1) integrate bilingual data into training data; and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage during pre-training. Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning. To assess the model's performance, we collect several existing multilingual tasks, including multilingual understanding, question answering, generation, and translation. Extensive experiments show that PolyLM surpasses other open-source models such as LLaMA and BLOOM on multilingual tasks while maintaining comparable performance in English. Our models, alone with the instruction data and multilingual benchmark, are available at: \url{https://modelscope.cn/models/damo/nlp_polylm_13b_text_generation}.
Published: 2023
Full Text: View/download PDF

6. ID-MixGCL: Identity Mixup for Graph Contrastive Learning

Author: Zhang, Gehang, Yu, Bowen, Cao, Jiangxia, Zhang, Xinghua, Liu, Tingwen, and Zhou, Chuan
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Machine Learning (cs.LG)
Abstract: Recently developed graph contrastive learning (GCL) approaches compare two different "views" of the same graph in order to learn node/graph representations. The core assumption of these approaches is that by graph augmentation, it is possible to generate several structurally different but semantically similar graph structures, and therefore, the identity labels of the original and augmented graph/nodes should be identical. However, in this paper, we observe that this assumption does not always hold, for example, any perturbation to nodes or edges in a molecular graph will change the graph labels to some degree. Therefore, we believe that augmenting the graph structure should be accompanied by an adaptation of the labels used for the contrastive loss. Based on this idea, we propose ID-MixGCL, which allows for simultaneous modulation of both the input graph and the corresponding identity labels, with a controllable degree of change, leading to the capture of fine-grained representations from unlabeled graphs. Experimental results demonstrate that ID-MixGCL improves performance on graph classification and node classification tasks, as demonstrated by significant improvements on the Cora, IMDB-B, and IMDB-M datasets compared to state-of-the-art techniques, by 3-29% absolute points., Comment: 9 pages, 6 figures
Published: 2023
Full Text: View/download PDF

7. Unified Language Representation for Question Answering over Text, Tables, and Images

Author: Yu, Bowen, Fu, Cheng, Yu, Haiyang, Huang, Fei, and Li, Yongbin
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: When trying to answer complex questions, people often rely on multiple sources of information, such as visual, textual, and tabular data. Previous approaches to this problem have focused on designing input features or model structure in the multi-modal space, which is inflexible for cross-modal reasoning or data-efficient training. In this paper, we call for an alternative paradigm, which transforms the images and tables into unified language representations, so that we can simplify the task into a simpler textual QA problem that can be solved using three steps: retrieval, ranking, and generation, all within a language space. This idea takes advantage of the power of pre-trained language models and is implemented in a framework called Solar. Our experimental results show that Solar outperforms all existing methods by 10.6-32.3 pts on two datasets, MultimodalQA and MMCoQA, across ten different metrics. Additionally, Solar achieves the best performance on the WebQA leaderboard, Comment: Findings of ACL 2023
Published: 2023
Full Text: View/download PDF

8. Towards Generalized Open Information Extraction

Author: Yu, Bowen, Zhang, Zhenyu, Li, Jingyang, Yu, Haiyang, Liu, Tingwen, Sun, Jian, Li, Yongbin, and Wang, Bin
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Open Information Extraction (OpenIE) facilitates the open-domain discovery of textual facts. However, the prevailing solutions evaluate OpenIE models on in-domain test sets aside from the training corpus, which certainly violates the initial task principle of domain-independence. In this paper, we propose to advance OpenIE towards a more realistic scenario: generalizing over unseen target domains with different data distributions from the source training domains, termed Generalized OpenIE. For this purpose, we first introduce GLOBE, a large-scale human-annotated multi-domain OpenIE benchmark, to examine the robustness of recent OpenIE models to domain shifts, and the relative performance degradation of up to 70% implies the challenges of generalized OpenIE. Then, we propose DragonIE, which explores a minimalist graph expression of textual fact: directed acyclic graph, to improve the OpenIE generalization. Extensive experiments demonstrate that DragonIE beats the previous methods in both in-domain and out-of-domain settings by as much as 6.0% in F1 score absolutely, but there is still ample room for improvement., EMNLP 2022
Published: 2022

9. Prompt Conditioned VAE: Enhancing Generative Replay for Lifelong Learning in Task-Oriented Dialogue

Author: Zhao, Yingxiu, Zheng, Yinhe, Tian, Zhiliang, Gao, Chang, Yu, Bowen, Yu, Haiyang, Li, Yongbin, Sun, Jian, and Zhang, Nevin L.
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computation and Language (cs.CL), Machine Learning (cs.LG)
Abstract: Lifelong learning (LL) is vital for advanced task-oriented dialogue (ToD) systems. To address the catastrophic forgetting issue of LL, generative replay methods are widely employed to consolidate past knowledge with generated pseudo samples. However, most existing generative replay methods use only a single task-specific token to control their models. This scheme is usually not strong enough to constrain the generative model due to insufficient information involved. In this paper, we propose a novel method, prompt conditioned VAE for lifelong learning (PCLL), to enhance generative replay by incorporating tasks' statistics. PCLL captures task-specific distributions with a conditional variational autoencoder, conditioned on natural language prompts to guide the pseudo-sample generation. Moreover, it leverages a distillation process to further consolidate past knowledge by alleviating the noise in pseudo samples. Experiments on natural language understanding tasks of ToD systems demonstrate that PCLL significantly outperforms competitive baselines in building LL models., EMNLP2022 Long Paper (Main Track)
Published: 2022

10. Document-Level Event Extraction via Human-Like Reading Process

Author: Cui, Shiyao, Cong, Xin, Yu, Bowen, Liu, Tingwen, Wang, Yucheng, and Shi, Jinqiao
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Document-level Event Extraction (DEE) is particularly tricky due to the two challenges it poses: scattering-arguments and multi-events. The first challenge means that arguments of one event record could reside in different sentences in the document, while the second one reflects one document may simultaneously contain multiple such event records. Motivated by humans' reading cognitive to extract information of interests, in this paper, we propose a method called HRE (Human Reading inspired Extractor for Document Events), where DEE is decomposed into these two iterative stages, rough reading and elaborate reading. Specifically, the first stage browses the document to detect the occurrence of events, and the second stage serves to extract specific event arguments. For each concrete event role, elaborate reading hierarchically works from sentences to characters to locate arguments across sentences, thus the scattering-arguments problem is tackled. Meanwhile, rough reading is explored in a multi-round manner to discover undetected events, thus the multi-events problem is handled. Experiment results show the superiority of HRE over prior competitive methods., Comment: To apper in ICASSP2022
Published: 2022

11. A Survey on Neural Open Information Extraction: Current Status and Future Directions

Author: Zhou, Shaowen, Yu, Bowen, Sun, Aixin, Long, Cheng, Li, Jingyang, Yu, Haiyang, Sun, Jian, and Li, Yongbin
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Open Information Extraction (OpenIE) facilitates domain-independent discovery of relational facts from large corpora. The technique well suits many open-world natural language understanding scenarios, such as automatic knowledge base construction, open-domain question answering, and explicit reasoning. Thanks to the rapid development in deep learning technologies, numerous neural OpenIE architectures have been proposed and achieve considerable performance improvement. In this survey, we provide an extensive overview of the-state-of-the-art neural OpenIE models, their key design decisions, strengths and weakness. Then, we discuss limitations of current solutions and the open issues in OpenIE problem itself. Finally we list recent trends that could help expand its scope and applicability, setting up promising directions for future research in OpenIE. To our best knowledge, this paper is the first review on this specific topic., Comment: Accepted by IJCAI22 survey track
Published: 2022
Full Text: View/download PDF

12. Profiling and Evolution of Intellectual Property

Author: Yu, Bowen, Shao, Yingxia, and Li, Ang
Subjects: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Information Retrieval (cs.IR), Computer Science - Information Retrieval
Abstract: In recent years, with the rapid growth of Internet data, the number and types of scientific and technological resources are also rapidly expanding. However, the increase in the number and category of information data will also increase the cost of information acquisition. For technology-based enterprises or users, in addition to general papers, patents, etc., policies related to technology or the development of their industries should also belong to a type of scientific and technological resources. The cost and difficulty of acquiring users. Extracting valuable science and technology policy resources from a huge amount of data with mixed contents and providing accurate and fast retrieval will help to break down information barriers and reduce the cost of information acquisition, which has profound social significance and social utility. This article focuses on the difficulties and problems in the field of science and technology policy, and introduces related technologies and developments., Comment: There are some problems in the conclusions and analysis of this paper, and we need to withdraw this paper
Published: 2022
Full Text: View/download PDF

13. Web Page Content Extraction Based on Multi-feature Fusion

Author: Yu, Bowen, Du, Junping, and Shao, Yingxia
Subjects: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Information Retrieval (cs.IR), Computer Science - Information Retrieval
Abstract: With the rapid development of Internet technology, people have more and more access to a variety of web page resources. At the same time, the current rapid development of deep learning technology is often inseparable from the huge amount of Web data resources. On the other hand, NLP is also an important part of data processing technology, such as web page data extraction. At present, the extraction technology of web page text mainly uses a single heuristic function or strategy, and most of them need to determine the threshold manually. With the rapid growth of the number and types of web resources, there are still problems to be solved when using a single strategy to extract the text information of different pages. This paper proposes a web page text extraction algorithm based on multi-feature fusion. According to the text information characteristics of web resources, DOM nodes are used as the extraction unit to design multiple statistical features, and high-order features are designed according to heuristic strategies. This method establishes a small neural network, takes multiple features of DOM nodes as input, predicts whether the nodes contain text information, makes full use of different statistical information and extraction strategies, and adapts to more types of pages. Experimental results show that this method has a good ability of web page text extraction and avoids the problem of manually determining the threshold., Comment: Part of the content is for other purposes
Published: 2022
Full Text: View/download PDF

14. Programming Matrices as Staged Sparse Rows to Generate Efficient Matrix-free Differential Equation Solver

Author: Cao, Huanqi, Tang, Shizhi, Yu, Bowen, and Chen, Wenguang
Subjects: FOS: Computer and information sciences, Computer Science - Distributed, Parallel, and Cluster Computing, MathematicsofComputing_NUMERICALANALYSIS, Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Solving differential equations is a critical task in scientific computing. Domain-specific languages (DSLs) have been a promising direction in achieving performance and productivity, but the current state of the art only supports stencil computation, leaving solvers requiring loop-carried dependencies aside. Alternatively, sparse matrices can represent such equation solvers and are more general than existing DSLs, but the performance is sacrificed. This paper points out that sparse matrices can be represented as programs instead of data, having both the generality from the matrix-based representation and the performance from program optimizations. Based on the idea, we propose the Staged Sparse Row (SSR) sparse matrix representation that can efficiently cover applications on structured grids. With SSR representation, users can intuitively define SSR matrices using generator functions and use SSR matrices through a concise object-oriented interface. SSR matrices can then be chained and applied to construct the algorithm, including those with loop-carried dependences. We then apply a set of dedicated optimizations, and ultimately simplify the SSR matrix-based codes into straightforward matrix-free ones, which are efficient and friendly for further analysis. Implementing BT pseudo application in the NAS Parallel Benchmark, with less than $10\%$ lines of code compared with the matrix-free reference FORTRAN implementation, we achieved up to $92.8\%$ performance. Implementing a matrix-free variant for the High-Performance Conjugate Gradient benchmark, we achieve $3.29\times$ performance compared with the reference implementation, while our implementation shares the same algorithm on the same programming abstraction, which is sparse matrices.
Published: 2022
Full Text: View/download PDF

15. Semi-Supervised Lifelong Language Learning

Author: Zhao, Yingxiu, Zheng, Yinhe, Yu, Bowen, Tian, Zhiliang, Lee, Dongkyu, Sun, Jian, Yu, Haiyang, Li, Yongbin, and Zhang, Nevin L.
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Lifelong learning aims to accumulate knowledge and alleviate catastrophic forgetting when learning tasks sequentially. However, existing lifelong language learning methods only focus on the supervised learning setting. Unlabeled data, which can be easily accessed in real-world scenarios, are underexplored. In this paper, we explore a novel setting, semi-supervised lifelong language learning (SSLL), where a model learns sequentially arriving language tasks with both labeled and unlabeled data. We propose an unlabeled data enhanced lifelong learner to explore SSLL. Specially, we dedicate task-specific modules to alleviate catastrophic forgetting and design two modules to exploit unlabeled data: (1) a virtual supervision enhanced task solver is constructed on a teacher-student framework to mine the underlying knowledge from unlabeled data; and (2) a backward augmented learner is built to encourage knowledge transfer from newly arrived unlabeled data to previous tasks. Experimental results on various language tasks demonstrate our model's effectiveness and superiority over competitive baselines under the new setting SSLL., Comment: EMNLP Findings 2022 Long Paper
Published: 2022
Full Text: View/download PDF

16. Label Enhanced Event Detection with Heterogeneous Graph Attention Networks

Author: Cui, Shiyao, Yu, Bowen, Cong, Xin, Liu, Tingwen, Li, Quangang, and Shi, Jinqiao
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Event Detection (ED) aims to recognize instances of specified types of event triggers in text. Different from English ED, Chinese ED suffers from the problem of word-trigger mismatch due to the uncertain word boundaries. Existing approaches injecting word information into character-level models have achieved promising progress to alleviate this problem, but they are limited by two issues. First, the interaction between characters and lexicon words is not fully exploited. Second, they ignore the semantic information provided by event labels. We thus propose a novel architecture named Label enhanced Heterogeneous Graph Attention Networks (L-HGAT). Specifically, we transform each sentence into a graph, where character nodes and word nodes are connected with different types of edges, so that the interaction between words and characters is fully reserved. A heterogeneous graph attention networks is then introduced to propagate relational message and enrich information interaction. Furthermore, we convert each label into a trigger-prototype-based embedding, and design a margin loss to guide the model distinguish confusing event labels. Experiments on two benchmark datasets show that our model achieves significant improvement over a range of competitive baseline methods.
Published: 2020

17. Inductive Unsupervised Domain Adaptation for Few-Shot Classification via Clustering

Author: Cong, Xin, Yu, Bowen, Liu, Tingwen, Cui, Shiyao, Tang, Hengzhu, and Wang, Bin
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, ComputingMethodologies_PATTERNRECOGNITION, Computation and Language (cs.CL)
Abstract: Few-shot classification tends to struggle when it needs to adapt to diverse domains. Due to the non-overlapping label space between domains, the performance of conventional domain adaptation is limited. Previous work tackles the problem in a transductive manner, by assuming access to the full set of test data, which is too restrictive for many real-world applications. In this paper, we set out to tackle this issue by introducing a inductive framework, DaFeC, to improve Domain adaptation performance for Few-shot classification via Clustering. We first build a representation extractor to derive features for unlabeled data from the target domain (no test data is necessary) and then group them with a cluster miner. The generated pseudo-labeled data and the labeled source-domain data are used as supervision to update the parameters of the few-shot classifier. In order to derive high-quality pseudo labels, we propose a Clustering Promotion Mechanism, to learn better features for the target domain via Similarity Entropy Minimization and Adversarial Distribution Alignment, which are combined with a Cosine Annealing Strategy. Experiments are performed on the FewRel 2.0 dataset. Our approach outperforms previous work with absolute gains (in classification accuracy) of 4.95%, 9.55%, 3.99% and 11.62%, respectively, under four few-shot settings., Comment: Accepted by ECML-PKDD 2020
Published: 2020
Full Text: View/download PDF

18. Joint Extraction of Entities and Relations Based on a Novel Decomposition Strategy

Author: Yu, Bowen, Zhang, Zhenyu, Shu, Xiaobo, Wang, Yubin, Liu, Tingwen, Wang, Bin, and Li, Sujian
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Joint extraction of entities and relations aims to detect entity pairs along with their relations using a single model. Prior work typically solves this task in the extract-then-classify or unified labeling manner. However, these methods either suffer from the redundant entity pairs, or ignore the important inner structure in the process of extracting entities and relations. To address these limitations, in this paper, we first decompose the joint extraction task into two interrelated subtasks, namely HE extraction and TER extraction. The former subtask is to distinguish all head-entities that may be involved with target relations, and the latter is to identify corresponding tail-entities and relations for each extracted head-entity. Next, these two subtasks are further deconstructed into several sequence labeling problems based on our proposed span-based tagging scheme, which are conveniently solved by a hierarchical boundary tagger and a multi-span decoding algorithm. Owing to the reasonable decomposition strategy, our model can fully capture the semantic interdependency between different steps, as well as reduce noise from irrelevant entity pairs. Experimental results show that our method outperforms previous work by 5.2%, 5.9% and 21.5% (F1 score), achieving a new state-of-the-art on three public datasets, Accepted by ECAI 2020. Code and data are available at https://github.com/yubowen-ph/JointER
Published: 2019

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

18 results on '"Yu, Bowen"'

1. Causal Document-Grounded Dialogue Pre-training

2. Domain Incremental Lifelong Learning in an Open World

3. API-Bank: A Benchmark for Tool-Augmented LLMs

4. Preference Ranking Optimization for Human Alignment

5. PolyLM: An Open Source Polyglot Large Language Model

6. ID-MixGCL: Identity Mixup for Graph Contrastive Learning

7. Unified Language Representation for Question Answering over Text, Tables, and Images

8. Towards Generalized Open Information Extraction

9. Prompt Conditioned VAE: Enhancing Generative Replay for Lifelong Learning in Task-Oriented Dialogue

10. Document-Level Event Extraction via Human-Like Reading Process

11. A Survey on Neural Open Information Extraction: Current Status and Future Directions

12. Profiling and Evolution of Intellectual Property

13. Web Page Content Extraction Based on Multi-feature Fusion

14. Programming Matrices as Staged Sparse Rows to Generate Efficient Matrix-free Differential Equation Solver

15. Semi-Supervised Lifelong Language Learning

16. Label Enhanced Event Detection with Heterogeneous Graph Attention Networks

17. Inductive Unsupervised Domain Adaptation for Few-Shot Classification via Clustering

18. Joint Extraction of Entities and Relations Based on a Novel Decomposition Strategy

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Database

Publisher

18 results on '"Yu, Bowen"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources