Some statistical methods for evaluating information extraction systems

Authors :: Gary King
Will Lowe
Source :: Proceedings of the EACL 2003 Workshop on Evaluation Initiatives in Natural Language Processing: are evaluation methods, metrics and resources reusable? - Evalinitiatives '03.
Publication Year :: 2003
Publisher :: Association for Computational Linguistics, 2003.
Abstract :: We present new statistical methods for evaluating information extraction systems. The methods were developed to evaluate a system used by political scientists to extract event information from news leads about international politics. The nature of this data presents two problems for evaluators: 1) the frequency distribution of event types in international event data is strongly skewed, so a random sample of news leads will typically fail to contain any low-frequency events; 2) the manual information extraction necessary to create evaluation sets is costly, and most effort is wasted coding high-frequency categories. We present an evaluation scheme that overcomes these problems with considerably less manual effort than traditional methods, and that also allows us to interpret an information extraction system as an estimator (in the statistical sense) and to estimate its bias.

Subjects :: Information extraction
Event data
Computer science
Estimator
Data mining
Coding (social sciences)

Database :: OpenAIRE
Journal :: Proceedings of the EACL 2003 Workshop on Evaluation Initiatives in Natural Language Processing: are evaluation methods, metrics and resources reusable? - Evalinitiatives '03
Accession number :: edsair.doi...........fe4e3264ff7f5271457eb3dc21e052df
Full Text :: https://doi.org/10.3115/1641396.1641400