
Closed-domain event extraction for hard news event monitoring: a systematic study

Authors :
David Dukić
Filip Karlo Došilović
Domagoj Pluščec
Jan Šnajder
Source :
PeerJ Computer Science, Vol 10, p e2355 (2024)
Publication Year :
2024
Publisher :
PeerJ Inc., 2024.

Abstract

News event monitoring systems allow real-time monitoring of a large number of events reported in the news, including the urgent and critical events comprising the so-called hard news. These systems heavily rely on natural language processing (NLP) to perform automatic event extraction at scale. While state-of-the-art event extraction models are readily available, integrating them into a news event monitoring system is not as straightforward as it seems due to practical issues related to model selection, robustness, and scale. To address this gap, we present a study on the practical use of event extraction models for news event monitoring. Our study focuses on the key task of closed-domain main event extraction (CDMEE), which aims to determine the type of the story’s main event and extract its arguments from the text. We evaluate a range of state-of-the-art NLP models for this task, including those based on pre-trained language models. Aiming for a more realistic evaluation than those reported in the literature, we introduce a new dataset manually labeled with event types and their arguments. Additionally, we assess the scalability of CDMEE models and analyze the trade-off between accuracy and inference speed. Our results give insights into the performance of state-of-the-art NLP models on the CDMEE task and provide recommendations for developing effective, robust, and scalable news event monitoring systems.
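To make the shape of the CDMEE task concrete, the sketch below represents one prediction as a single main event per story: an event type from a closed inventory plus a mapping from argument roles to extracted strings. The class name MainEvent, the function argument_f1, the event type "Attack", and the roles "Attacker" and "Place" are all hypothetical illustrations, not taken from the paper. Likewise, exact-match F1 over (role, text) pairs is only one common way to score arguments and is an assumption here, not necessarily the evaluation protocol used in the study.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MainEvent:
    """Hypothetical container for one CDMEE output: the story's main event."""
    event_type: str  # label from a closed, predefined event type inventory
    arguments: dict[str, set[str]] = field(default_factory=dict)  # role -> argument strings


def argument_f1(gold: MainEvent, pred: MainEvent) -> float:
    """Exact-match F1 over (role, text) pairs (an assumed scoring scheme)."""
    # Flatten role -> {texts} into a set of (role, text) pairs for comparison.
    gold_pairs = {(role, text) for role, texts in gold.arguments.items() for text in texts}
    pred_pairs = {(role, text) for role, texts in pred.arguments.items() for text in texts}
    if not gold_pairs and not pred_pairs:
        return 1.0  # nothing to extract and nothing predicted
    true_pos = len(gold_pairs & pred_pairs)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_pairs)
    recall = true_pos / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


# Usage with hypothetical labels: the type matches, one of two arguments matches.
gold = MainEvent("Attack", {"Attacker": {"rebels"}, "Place": {"Aleppo"}})
pred = MainEvent("Attack", {"Attacker": {"rebels"}, "Place": {"Syria"}})
type_correct = gold.event_type == pred.event_type
score = argument_f1(gold, pred)
```

In this example the event type is correct and the argument F1 is 0.5 (precision 1/2, recall 1/2). Scoring the event type and the arguments separately mirrors the two parts of the task as the abstract describes them.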

Details

Language :
English
ISSN :
2376-5992
Volume :
10
Database :
Directory of Open Access Journals
Journal :
PeerJ Computer Science
Publication Type :
Academic Journal
Accession number :
edsdoj.fed871bbaa54f0da3eb0d918469a58f
Document Type :
article
Full Text :
https://doi.org/10.7717/peerj-cs.2355