The Colorado Richly Annotated Full Text (CRAFT) Corpus: Multi-Model Annotation In The Biomedical Domain

Authors :
Kevin Bretonnel Cohen
Karin Verspoor
Karën Fort
Christopher Funk
Michael Bada
Martha Palmer
Lawrence Hunter
University of Colorado [Boulder]
University of Melbourne
Sens, Texte, Informatique, Histoire (STIH)
Université Paris-Sorbonne (UP4)
Source :
Handbook of Linguistic Annotation, 2016, HAL
Publication Year :
2016
Publisher :
HAL CCSD, 2016.

Abstract

A major question in linguistics is whether theoretical accounts of general language hold for specific domains. Similarly, in natural language processing, it is clear that general-domain solutions often fail when applied to specialized domains. One such specialized domain, which is increasingly seen as crucial to understanding human biology and disease, is the biomedical domain. For this reason, biomedical corpus construction has been an area of considerable activity in recent years, with multiple examples in just the past five years. Historically, the great majority of work in biomedical natural language processing has been done using abstracts of journal articles. In contrast, the Colorado Richly Annotated Full Text (CRAFT) corpus consists entirely of full-text journal articles. The primary motivation for the annotation project was the accumulating body of evidence indicating that the bodies of journal articles contain much information that is not present in the abstracts, and that the textual and structural characteristics of article bodies differ from those of abstracts [8, 26, 90, 84, 18, 2, 48, 51, 13]. When we began the project, there was no large resource of full-text journal articles for system building or evaluation; other than the CRAFT corpus, this continues to be the case. Earlier collections of full-text biomedical journal articles are typically not manually annotated, and none that we are aware of includes annotation of linguistic structure.

Details

Language :
English
Database :
OpenAIRE
Journal :
Handbook of Linguistic Annotation, 2016, HAL
Accession number :
edsair.dedup.wf.001..4cded1793d75ff0a2898d7c87644dc54