Entity Linking for Historical Documents: Challenges and Solutions

Authors :
Pontes, Elvys Linhares
Cabrera-Diego, Luis Adrián
Moreno, José G.
Boros, Emanuela
Hamdi, Ahmed
Sidère, Nicolas
Coustaty, Mickaël
Doucet, Antoine
Laboratoire Informatique, Image et Interaction - EA 2118 (L3I)
Université de La Rochelle (ULR)
Recherche d’Information et Synthèse d’Information (IRIT-IRIS)
Institut de recherche en informatique de Toulouse (IRIT)
Université Toulouse 1 Capitole (UT1)
Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3)
Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP)
Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1)
Université Fédérale Toulouse Midi-Pyrénées
Source :
Digital Libraries at Times of Massive Societal Transition: 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020, Kyoto, Japan, November 30 – December 1, 2020, Proceedings. Lecture Notes in Computer Science, vol. 12504, Springer, 2020, pp. 215-231. ISBN 978-3-030-64451-2, 978-3-030-64452-9. doi:10.1007/978-3-030-64452-9_19
Publication Year :
2020
Publisher :
Springer International Publishing, 2020.

Abstract

Named entities (NEs) are among the most relevant types of information that can be used to efficiently index and retrieve digital documents. Furthermore, using Entity Linking (EL) to disambiguate NEs and relate them to knowledge bases provides supplementary information that can help differentiate ambiguous elements such as geographical locations and people's names. In historical documents, the detection and disambiguation of NEs is a challenge. Most historical documents are converted into plain text using an optical character recognition (OCR) system, at the expense of some noise. Documents in digital libraries will therefore be indexed with errors that may hinder their accessibility. OCR errors affect not only document indexing but also the detection, disambiguation, and linking of NEs. This paper analyses the performance of different EL approaches on two multilingual historical corpora, CLEF HIPE 2020 (English, French, German) and NewsEye (Finnish, French, German, Swedish), and proposes several techniques for alleviating the impact of historical data problems on the EL task. Our findings indicate that the proposed approaches not only outperform the baseline on both corpora but also considerably reduce the impact of historical document issues across different subjects and languages.

Details

ISBN :
978-3-030-64451-2
978-3-030-64452-9
ISSN :
0302-9743 and 1611-3349
Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....96661764359000eda0bca3d4466bc978