Ensemble Named Entity Recognition (NER): Evaluating NER Tools in the Identification of Place Names in Historical Corpora
- Source :
- Frontiers in Digital Humanities, Vol 5 (2018)
- Publication Year :
- 2018
- Publisher :
- Frontiers Media SA, 2018.
Abstract
- The field of Spatial Humanities has advanced substantially in recent years. The identification and extraction of toponyms and spatial information mentioned in historical text collections have allowed their use in innovative ways, making possible the application of spatial analysis and the mapping of these places with Geographic Information Systems. For instance, automated place name identification is now possible with Named Entity Recognition (NER) systems. Statistical NER methods based on supervised learning, in particular, are highly successful with modern datasets. However, there are still major challenges to address when dealing with historical corpora. These challenges include language change over time, spelling variations, transliterations, OCR errors, and sources written in multiple languages, among others. In this article, considering a task of place name recognition over two collections of historical correspondence, we report an evaluation of five NER systems and an approach that combines these through a voting system. We found that although the individual performance of each NER system was corpus dependent, the ensemble combination was able to achieve consistent measures of precision and recall, outperforming the individual NER systems. Additionally, the results showed that these NER systems are not strongly dependent on pre-processing and translation to modern English.
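The abstract does not describe how the voting system works, so the following is only a minimal sketch of one common form of ensemble voting: a token counts as a place name when at least a chosen number of systems tag it as one. The function name `vote_place_tokens`, the `min_votes` threshold, the sample tokens, and the per-system outputs are all hypothetical and are not taken from the article.

```python
from typing import List, Sequence


def vote_place_tokens(
    system_labels: Sequence[Sequence[bool]], min_votes: int
) -> List[bool]:
    """Return True for each token that at least `min_votes` systems tagged as a place."""
    if not system_labels:
        return []
    n_tokens = len(system_labels[0])
    # Every system must have labelled the same token sequence.
    if any(len(labels) != n_tokens for labels in system_labels):
        raise ValueError("all systems must label the same token sequence")
    return [
        sum(labels[i] for labels in system_labels) >= min_votes
        for i in range(n_tokens)
    ]


# Hypothetical example: five systems, True means "tagged as a place name".
tokens = ["Sent", "from", "Lancaster", "to", "Kendal"]
system_labels = [
    [False, False, True, False, True],
    [False, False, True, False, False],
    [True, False, True, False, True],
    [False, False, False, False, True],
    [False, False, True, False, False],
]

# Assumed threshold: a simple majority of the five systems.
voted = vote_place_tokens(system_labels, min_votes=3)
places = [token for token, keep in zip(tokens, voted) if keep]
print(places)
```

In a sketch like this, raising `min_votes` would be expected to favour precision and lowering it to favour recall; the threshold the authors actually used is not stated in this record.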
- Subjects :
- Geographic information system
Computer science
0211 other engineering and technologies
Spatial Humanities
toponym recognition
02 engineering and technology
computer.software_genre
lcsh:QA75.5-76.95
Digital Humanities
historical corpora
Named-entity recognition
lcsh:AZ20-999
natural language processing
Spatial analysis
021101 geological & geomatics engineering
060201 languages & linguistics
Modern English
business.industry
Supervised learning
06 humanities and the arts
lcsh:History of scholarship and learning. The humanities
Field (geography)
language.human_language
Identification (information)
0602 languages and literature
Early-Modern English
language
lcsh:Electronic computers. Computer science
Artificial intelligence
Precision and recall
business
computer
Natural language processing
Details
- ISSN :
- 2297-2668
- Volume :
- 5
- Database :
- OpenAIRE
- Journal :
- Frontiers in Digital Humanities
- Accession number :
- edsair.doi.dedup.....b812993907c2af7ef7c1244260bbe03e