The Trade-off between Quantity and Quality. Comparing a Large Crawled Corpus and a Small Focused Corpus for Medical Terminology Extraction.

Authors :
Hoste, Veronique
Vanopstal, Klaar
Terryn, Ayla Rigouts
Lefever, Els
Source :
Across Languages & Cultures; 2019, Vol. 20 Issue 2, p197-211, 15p
Publication Year :
2019

Abstract

We investigate the cost-effectiveness of special-purpose crawled corpora versus more focused corpora for automatic terminology extraction (ATE). Our focus is on medical terminology on heart failure for two languages, viz. English, for which we have more web and specialized resources at our disposal, and the less-resourced Dutch. We show that, although term density in the dedicated corpora is larger for both languages, the potential for term extraction is higher in the crawled corpora than in the dedicated corpora. Furthermore, in a set of experiments in which we evaluate both types of corpora while keeping size constant, we observe that more Gold Standard (GS) terms are covered by the "noisy" crawled corpus than by a dedicated corpus of the same size. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1585-1923
Volume :
20
Issue :
2
Database :
Complementary Index
Journal :
Across Languages & Cultures
Publication Type :
Academic Journal
Accession number :
139118379
Full Text :
https://doi.org/10.1556/084.2019.20.2.3