Entity Linking for Historical Documents: Challenges and Solutions
- Source :
- Digital Libraries at Times of Massive Societal Transition: 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020, Kyoto, Japan, November 30 – December 1, 2020, Proceedings. Lecture Notes in Computer Science, 12504, Springer, pp. 215-231, 2020. ISBN: 9783030644512, 978-3-030-64452-9. ⟨10.1007/978-3-030-64452-9_19⟩
- Publication Year :
- 2020
- Publisher :
- Springer International Publishing, 2020.
Abstract
- Named entities (NEs) are among the most relevant types of information that can be used to efficiently index and retrieve digital documents. Furthermore, the use of Entity Linking (EL) to disambiguate and relate NEs to knowledge bases provides supplementary information that can be useful for differentiating ambiguous elements such as geographical locations and people's names. In historical documents, the detection and disambiguation of NEs is a challenge. Most historical documents are converted into plain text using an optical character recognition (OCR) system, at the expense of some noise. Documents in digital libraries will, therefore, be indexed with errors that may hinder their accessibility. OCR errors affect not only document indexing but also the detection, disambiguation, and linking of NEs. This paper analyses the performance of different EL approaches on two multilingual historical corpora, CLEF HIPE 2020 (English, French, German) and NewsEye (Finnish, French, German, Swedish), and proposes several techniques for alleviating the impact of historical data problems on the EL task. Our findings indicate that the proposed approaches not only outperform the baseline on both corpora but also considerably reduce the impact of historical document issues across different subjects and languages.
- Subjects :
- Historical data
Computer science
InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL
02 engineering and technology
computer.software_genre
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Entity linking
[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
[INFO.INFO-DL]Computer Science [cs]/Digital Libraries [cs.DL]
[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC]
Digital libraries
business.industry
Plain text
Search engine indexing
Deep learning
Optical character recognition
computer.file_format
Digital library
Clef
[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing
Index (publishing)
[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]
ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
020201 artificial intelligence & image processing
Artificial intelligence
business
computer
Historical document
Natural language processing
Details
- ISBN :
- 978-3-030-64451-2
978-3-030-64452-9
- ISSN :
- 03029743 and 16113349
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....96661764359000eda0bca3d4466bc978