Entity Insertion in Multilingual Linked Corpora: The Case of Wikipedia

Authors :
Feith, Tomás
Arora, Akhil
Gerlach, Martin
Paul, Debjit
West, Robert
Publication Year :
2024

Abstract

Links are a fundamental part of information networks, turning isolated pieces of knowledge into a network of information that is much richer than the sum of its parts. However, adding a new link to the network is not trivial: it requires not only the identification of a suitable pair of source and target entities but also the understanding of the content of the source to locate a suitable position for the link in the text. The latter problem has not been addressed effectively, particularly in the absence of text spans in the source that could serve as anchors to insert a link to the target entity. To bridge this gap, we introduce and operationalize the task of entity insertion in information networks. Focusing on the case of Wikipedia, we empirically show that this problem is both relevant and challenging for editors. We compile a benchmark dataset in 105 languages and develop a framework for entity insertion called LocEI (Localized Entity Insertion) and its multilingual variant XLocEI. We show that XLocEI outperforms all baseline models (including state-of-the-art prompt-based ranking with LLMs such as GPT-4) and that it can be applied in a zero-shot manner on languages not seen during training with minimal performance drop. These findings are important for applying entity insertion models in practice, e.g., to support editors in adding links across the more than 300 language versions of Wikipedia.

Comment: EMNLP 2024; 24 pages; 62 figures

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2410.04254
Document Type :
Working Paper