Back to Search Start Over

EHR-based phenotyping: Bulk learning and evaluation.

Authors :
Chiu PH
Hripcsak G
Source :
Journal of biomedical informatics [J Biomed Inform] 2017 Jun; Vol. 70, pp. 35-51. Date of Electronic Publication: 2017 Apr 12.
Publication Year :
2017

Abstract

In data-driven phenotyping, a core computational task is to identify medical concepts and their variations from sources of electronic health records (EHR) to stratify phenotypic cohorts. A conventional analytic framework for phenotyping largely uses a manual knowledge engineering approach or a supervised learning approach where clinical cases are represented by variables encompassing diagnoses, medicinal treatments and laboratory tests, among others. In such a framework, tasks associated with feature engineering and data annotation remain a tedious and expensive exercise, resulting in poor scalability. In addition, certain clinical conditions, such as those that are rare and acute in nature, may never accumulate sufficient data over time, which poses a challenge to establishing accurate and informative statistical models. In this paper, we use infectious diseases as the domain of study to demonstrate a hierarchical learning method based on ensemble learning that attempts to address these issues through feature abstraction. We use a sparse annotation set to train and evaluate many phenotypes at once, which we call bulk learning. In this batch-phenotyping framework, disease cohort definitions can be learned from within the abstract feature space established by using multiple diseases as a substrate and diagnostic codes as surrogates. In particular, using surrogate labels for model training renders possible its subsequent evaluation using only a sparse annotated sample. Moreover, statistical models can be trained and evaluated, using the same sparse annotation, from within the abstract feature space of low dimensionality that encapsulates the shared clinical traits of these target diseases, collectively referred to as the bulk learning set.<br /> (Copyright © 2017 Elsevier Inc. All rights reserved.)

Details

Language :
English
ISSN :
1532-0480
Volume :
70
Database :
MEDLINE
Journal :
Journal of biomedical informatics
Publication Type :
Academic Journal
Accession number :
28410982
Full Text :
https://doi.org/10.1016/j.jbi.2017.04.009