1. Anaphoric reference in clinical reports: Characteristics of an annotated corpus
- Author
Rebecca S. Crowley, Wendy W. Chapman, Melissa Tharp, Guergana Savova, and Jiaping Zheng
- Subjects
Computer science; Health Informatics; 02 engineering and technology; 03 medical and health sciences; 0302 clinical medicine; 0202 electrical engineering, electronic engineering, information engineering; Data Mining; Electronic Health Records; Humans; 030212 general & internal medicine; Set (psychology); Referring expression; Information retrieval; Anaphora (linguistics); Natural language processing; Clinical reports; Resolution (logic); Noun phrase; Semantics; Computer Science Applications; Antecedent (grammar); Domain knowledge; 020201 artificial intelligence & image processing; Artificial intelligence; Algorithms; Sentence
- Abstract
Highlights: Annotated 180 clinical reports to indicate anaphor-antecedent pairs. Identity was the most frequent relation, with set/subset and part/whole too. Accurate resolution will require extensive domain knowledge. Annotations can be used to develop anaphoric resolution algorithms.

Motivation: Expressions that refer to a real-world entity already mentioned in a narrative are often considered anaphoric. For example, in the sentence "The pain comes and goes," the expression "the pain" is probably referring to a previous mention of pain. Interpretation of meaning involves resolving the anaphoric reference: deciding which expression in the text is the correct antecedent of the referring expression, also called an anaphor. We annotated a set of 180 clinical reports (surgical pathology, radiology, discharge summaries, and emergency department) from two institutions to indicate all anaphor-antecedent pairs.

Objective: The objective of this study is to describe the characteristics of the corpus in terms of the frequency of anaphoric relations, the syntactic and semantic nature of the members of the pairs, and the types of anaphoric relations that occur. Understanding how anaphoric reference is exhibited in clinical reports is critical to developing reference resolution algorithms and to identifying peculiarities of clinical text that may alter the features and methodologies that will be successful for automated anaphora resolution.

Results: We found that anaphoric reference is prevalent in all types of clinical reports, that annotations of noun phrases, semantic type, and section headings may be especially important for automated resolution of anaphoric reference, and that separate modules for reference resolution may be required for different report types, different institutions, and different types of anaphors. Accurate resolution will probably require extensive domain knowledge, especially for pathology and radiology reports with more part/whole and set/subset relations.

Conclusion: We hope researchers will leverage the annotations in this corpus to develop automated algorithms and will add to the annotations to generate a more extensive corpus.
- Published
- 2012