A method for the development of disease-specific reference standards vocabularies from textual biomedical literature resources.

Authors :
Wang L
Bray BE
Shi J
Del Fiol G
Haug PJ
Source :
Artificial intelligence in medicine [Artif Intell Med] 2016 Mar; Vol. 68, pp. 47-57. Date of Electronic Publication: 2016 Feb 27.
Publication Year :
2016

Abstract

Objective: Disease-specific vocabularies are fundamental to many knowledge-based intelligent systems and applications such as text annotation, cohort selection, disease diagnostic modeling, and therapy recommendation. Reference standards are critical in the development and validation of automated methods for disease-specific vocabularies. The goal of the present study is to design and test a generalizable method for the development of vocabulary reference standards from expert-curated, disease-specific biomedical literature resources.

Methods: We formed disease-specific corpora from literature resources such as textbooks, evidence-based synthesized online sources, clinical practice guidelines, and journal articles. Medical experts annotated and adjudicated disease-specific terms in four classes (i.e., causes or risk factors, signs or symptoms, diagnostic tests or results, and treatment). Annotations were mapped to UMLS concepts. We assessed source variation, the contribution of each source to building disease-specific vocabularies, the saturation of the vocabularies with respect to the number of sources used, and the generalizability of the method across different diseases.

Results: The study resulted in 2588 string-unique annotations for heart failure in four classes, and 193 and 425 for pulmonary embolism and rheumatoid arthritis, respectively, in the treatment class. Approximately 80% of the annotations were mapped to UMLS concepts. The agreement among heart failure sources ranged between 0.28 and 0.46. The contribution of these sources to the final vocabulary ranged between 18% and 49%. With the sources explored, the heart failure vocabulary reached near saturation in all four classes with the inclusion of a minimum of six sources (or four to seven sources when counting only terms that occurred in two or more sources). It took fewer sources to reach near saturation for the other two diseases in the treatment class.

Conclusions: We developed a method for the development of disease-specific reference vocabularies. Expert-curated biomedical literature resources are a substantial source for acquiring disease-specific medical knowledge. It is feasible to reach near saturation in a disease-specific vocabulary using a relatively small number of literature sources.

(Published by Elsevier B.V.)
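
The source-contribution and saturation measures mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not define how contribution or saturation were computed, so the definitions below (share of the pooled vocabulary covered by a source, and cumulative unique terms as sources are added) are assumptions, and the source names, terms, and function names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy data: each source maps to the set of terms annotated in it.
toy_sources: Dict[str, Set[str]] = {
    "textbook": {"dyspnea", "edema", "fatigue"},
    "guideline": {"dyspnea", "orthopnea"},
    "review": {"edema", "fatigue", "orthopnea", "cachexia"},
}


def contribution(sources: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of the pooled vocabulary found in each source (assumed definition)."""
    vocabulary: Set[str] = set().union(*sources.values())
    if not vocabulary:
        return {name: 0.0 for name in sources}
    return {name: len(terms) / len(vocabulary) for name, terms in sources.items()}


def saturation_curve(sources: Dict[str, Set[str]], order: List[str]) -> List[int]:
    """Cumulative count of unique terms as sources are added in the given order."""
    seen: Set[str] = set()
    curve: List[int] = []
    for name in order:
        seen |= sources[name]
        curve.append(len(seen))
    return curve


def shared_terms(sources: Dict[str, Set[str]], min_sources: int = 2) -> Set[str]:
    """Terms that occur in at least `min_sources` of the sources."""
    counts: Dict[str, int] = {}
    for terms in sources.values():
        for term in terms:
            counts[term] = counts.get(term, 0) + 1
    return {term for term, n in counts.items() if n >= min_sources}


if __name__ == "__main__":
    print(contribution(toy_sources))
    print(saturation_curve(toy_sources, ["textbook", "guideline", "review"]))
    print(sorted(shared_terms(toy_sources)))
```

A curve that flattens as further sources are added would correspond to the "near saturation" the abstract reports; the order in which sources are added affects the shape of the curve, so a fuller analysis would likely compare several orderings.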

Subjects

Subjects :
Humans
Vocabulary, Controlled

Details

Language :
English
ISSN :
1873-2860
Volume :
68
Database :
MEDLINE
Journal :
Artificial intelligence in medicine
Publication Type :
Academic Journal
Accession number :
26971304
Full Text :
https://doi.org/10.1016/j.artmed.2016.02.003