Don't Annotate, but Validate: a Data-to-Text Method for Capturing Event Data

Authors :
Vossen, Piek
Ilievski, Filip
Postma, Marten
Segers, R.H.
Isahara, Hitoshi
Maegaard, Bente
Piperidis, Stelios
Cieri, Christopher
Declerck, Thierry
Hasida, Koiti
Mazo, Helene
Choukri, Khalid
Goggi, Sara
Mariani, Joseph
Moreno, Asuncion
Calzolari, Nicoletta
Odijk, Jan
Tokunaga, Takenobu
Language
Network Institute
Source :
[Proceedings of the] Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 3034-3042
Publication Year :
2018
Publisher :
LREC, 2018.

Abstract

In this paper, we present a new method to obtain large volumes of high-quality text corpora with event data for studying identity and reference relations. We report on the current methods to create event reference data by annotating texts and deriving the event data a posteriori. Our method starts from event registries in which event data is defined a priori. From this data, we extract so-called Microworlds of referential data with the Reference Texts that report on these events. This makes it possible to easily establish referential relations with high precision and at a large scale. In a pilot, we successfully obtained data from these resources with extreme ambiguity and variation, while maintaining the identity and reference relations and without having to annotate large quantities of texts word-by-word. The data from this pilot was annotated using an annotation tool created specifically in order to validate our method and to enrich the reference texts with event coreference annotations. This annotation process resulted in the Gun Violence Corpus, whose development process and outcome are described in this paper.

Details

Language :
English
Database :
OpenAIRE
Journal :
[Proceedings of the] Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 3034-3042
Accession number :
edsair.narcis........be5885c774ec677956b7a84f649cdc0b