Learning Advanced Similarities and Training Features for Toponym Interlinking
- Source :
- Lecture Notes in Computer Science ISBN: 9783030454388, ECIR (1)
- Publication Year :
- 2020
- Publisher :
- Springer International Publishing, 2020.
Abstract
- Interlinking of spatio-textual entities is an open and quite challenging research problem, with applications in several commercial fields, including geomarketing, navigation and social networks. It comprises the process of identifying, across different data sources, entity descriptions that refer to the same real-world entity. In this work, we focus on toponym interlinking, that is, we handle spatio-textual entities that are exclusively represented by their name; additional properties, such as categories and coordinates, are considered either absent or of too low quality to be exploited in this setting. Toponyms are inherently heterogeneous entities; quite often several alternative names exist for the same toponym, with varying degrees of similarity between these names. State-of-the-art approaches mostly adopt generic, domain-agnostic similarity functions and either use them as is or incorporate them as training features within classifiers for toponym interlinking. We claim that capturing the specificities of toponyms and exploiting them in elaborate meta-similarity functions and derived training features can significantly increase the effectiveness of interlinking methods. To this end, we propose the LGM-Sim meta-similarity function and a series of novel, similarity-based and statistical training features that can be utilized in similarity-based and classification-based interlinking settings, respectively. We demonstrate that the proposed methods achieve large increases in accuracy in both settings, compared to several methods from the literature, on the widely used Geonames toponym dataset.
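The abstract contrasts LGM-Sim with the generic, domain-agnostic string similarities that earlier approaches apply directly or feed to a classifier as training features. The sketch below illustrates only that baseline idea; it is not LGM-Sim and not the paper's proposed features. The chosen measures (normalized Levenshtein similarity, character-trigram Jaccard, and the standard-library `difflib.SequenceMatcher` ratio) and the function names `levenshtein`, `trigrams` and `similarity_features` are illustrative assumptions.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def trigrams(s: str) -> set[str]:
    """Character trigrams with padding, so even short names yield at least one."""
    padded = f"  {s} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity_features(name_a: str, name_b: str) -> list[float]:
    """Hypothetical baseline: generic similarities of two toponym names, each in [0, 1].

    In a classification-based setting, such a vector would be one training
    example for a match / non-match classifier.
    """
    a, b = name_a.lower().strip(), name_b.lower().strip()
    longest = max(len(a), len(b)) or 1
    lev_sim = 1.0 - levenshtein(a, b) / longest
    ta, tb = trigrams(a), trigrams(b)
    jaccard = len(ta & tb) / len(ta | tb)  # union is never empty due to padding
    ratio = SequenceMatcher(None, a, b).ratio()
    return [lev_sim, jaccard, ratio]


if __name__ == "__main__":
    print(similarity_features("Athens", "Athina"))
    print(similarity_features("Athens", "Tokyo"))
```

Alternative names of the same place (for example "Athens" and "Athina") score only partially under such generic measures, which is the kind of toponym-specific variation the abstract argues these functions fail to capture.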
- Subjects :
- Information retrieval
010504 meteorology & atmospheric sciences
Computer science
Process (engineering)
Feature extraction
0211 other engineering and technologies
02 engineering and technology
01 natural sciences
Focus (linguistics)
Similarity (psychology)
Quality (business)
String metric
Function (engineering)
Geomarketing
021101 geological & geomatics engineering
0105 earth and related environmental sciences
Details
- ISBN :
- 978-3-030-45438-8
- ISBNs :
- 9783030454388
- Database :
- OpenAIRE
- Journal :
- Lecture Notes in Computer Science ISBN: 9783030454388, ECIR (1)
- Accession number :
- edsair.doi...........32784d9bbddb1db97a59452677ef691d
- Full Text :
- https://doi.org/10.1007/978-3-030-45439-5_8