Event Geoparser with Pseudo-Location Entity Identification and Numerical Argument Extraction Implementation and Evaluation in Indonesian News Domain

Authors :
Agung Dewandaru
Dwi Hendratmo Widyantoro
Saiful Akbar
Source :
ISPRS International Journal of Geo-Information, Vol 9, Iss 12, p 712 (2020)
Publication Year :
2020
Publisher :
MDPI AG, 2020.

Abstract

A geoparser is a fundamental component of a Geographic Information Retrieval (GIR) system; it performs toponym recognition, disambiguation, and geographic coordinate resolution on unstructured text. However, news articles that report several events across many place mentions in one document are not yet adequately handled by regular geoparsers, whose scope of resolution is either toponym-level or document-level. The capacity to detect multiple events and geolocate their true coordinates along with their numerical arguments is still missing from modern geoparsers, let alone for Indonesian news corpora. We propose an event geoparser model with three stages of processing, which tightly integrates an event extraction model into geoparsing and provides a precise event-level resolution scope. The model casts geotagging and event extraction as sequence labeling and uses an LSTM-CRF inferencer equipped with features derived using an Aggregated Topic Model from a large corpus to improve generalizability. Through the proposed workflow and features, the geoparser significantly improves the identification of pseudo-location entities, resulting in a 23.43% increase in weighted F1 score compared to baseline gazetteer and POS tag features. As a side effect of event extraction, various numerical arguments are also extracted, and the output can be readily projected onto a rich choropleth map from a single news document.
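
To illustrate what casting geotagging and event extraction as sequence labeling can look like in practice, the following is a minimal sketch, not the authors' implementation. It assumes a BIO tagging scheme and the hypothetical labels EVENT, LOC, and NUM; the abstract does not specify the paper's actual tag set, and the function name `decode_bio` and the sample sentence are invented for illustration. The sketch only shows how per-token tags, such as those an LSTM-CRF tagger might emit, can be grouped into labeled spans.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) spans.

    The tag names are assumptions for illustration, not the paper's tag set.
    """
    spans: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span begins; close any span that is still open.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continuation of the currently open span.
            current_tokens.append(token)
        else:
            # "O" tag, or an "I-" tag that does not continue the open span:
            # close the open span (if any) and treat this token as outside.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []

    # Close a span that runs to the end of the sentence.
    if current_label is not None:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


# Hypothetical Indonesian headline: "Flood hits East Jakarta, 12 people dead".
tokens = ["Banjir", "melanda", "Jakarta", "Timur", ",", "12", "orang", "tewas"]
tags = ["B-EVENT", "O", "B-LOC", "I-LOC", "O", "B-NUM", "O", "O"]
spans = decode_bio(tokens, tags)
print(spans)
```

In an event-level geoparser, spans of this kind would then be linked so that each event is associated with its own location and numerical arguments rather than with the document as a whole; that linking step is outside this sketch.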

Details

Language :
English
ISSN :
2220-9964
Volume :
9
Issue :
12
Database :
Directory of Open Access Journals
Journal :
ISPRS International Journal of Geo-Information
Publication Type :
Academic Journal
Accession number :
edsdoj.53d7cf705ba4498bb6791969dc469855
Document Type :
article
Full Text :
https://doi.org/10.3390/ijgi9120712