Enhancing Predictive Power of Cluster-Boosted Regression With Text-Based Indexing

Authors :
Wutthipong Kongburan
Mark Chignell
Nipon Charoenkitkarn
Jonathan H. Chan
Source :
IEEE Access, Vol 7, Pp 43394-43405 (2019)
Publication Year :
2019
Publisher :
IEEE, 2019.

Abstract

Clustering prior to regression analysis improves the accuracy of prediction in clinical decision making. However, most previously described methods focused on numerical data only. This paper investigated how well textual features can improve the accuracy of regression predictions. Preliminary diagnosis, diagnosis summary, and drug names used in prescriptions, as provided in the MIMIC II dataset, were used to derive textual features. We proposed the bag-of-entities indexing method, which relies on named entity recognition, a machine learning technique for locating words and classifying them into predefined classes. The proposed technique captured meaningful phrases from texts in health records and represented them in numerical vector format. Dimensionality of the data space was reduced using principal component analysis. The additional well-tuned textual features were then combined with existing numerical features in cluster-boosted regression to predict patient mortality in the ICU. The experimental results showed that adding textual features improved prediction over the use of numerical features only. We found that the proposed indexing method outperformed traditional word-vector representation approaches (bag-of-words and bag-of-bigrams) as well as a state-of-the-art approach (Doc2vec) in terms of accuracy in predicting death status. Moreover, instead of being interpreted directly, the identifiable individual features were grouped into types and summarized. The summarized, de-identified textual features handled by the proposed framework can support predictive classification while also reducing privacy concerns. Grouping similar patients based on their electronic health records also benefits physicians through improved differential diagnosis and effective treatment planning.
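
The bag-of-entities idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon, the entity types, and the function names below are hypothetical, and a simple dictionary lookup stands in for a trained named entity recognition model. The sketch also assumes the vector counts entities per type; the paper may index the recognized phrases differently.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a trained NER model.
# Phrases and entity types are illustrative, not taken from the paper or MIMIC II.
ENTITY_LEXICON = {
    "congestive heart failure": "DISEASE",
    "pneumonia": "DISEASE",
    "chest pain": "SYMPTOM",
    "furosemide": "DRUG",
    "aspirin": "DRUG",
}
ENTITY_TYPES = ["DISEASE", "DRUG", "SYMPTOM"]  # fixed order of vector dimensions


def extract_entities(text):
    """Return (phrase, type) pairs found by longest-match lexicon lookup."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    max_len = max(len(phrase.split()) for phrase in ENTITY_LEXICON)
    found, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first so multi-word entities win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in ENTITY_LEXICON:
                found.append((phrase, ENTITY_LEXICON[phrase]))
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return found


def bag_of_entities(text):
    """Map a clinical note to a vector of entity counts, one slot per type."""
    counts = Counter(entity_type for _, entity_type in extract_entities(text))
    return [counts[t] for t in ENTITY_TYPES]


note = "Congestive heart failure with chest pain. Started furosemide, aspirin."
vector = bag_of_entities(note)
print(vector)  # [1, 2, 1] -> one DISEASE, two DRUG, one SYMPTOM
```

Because the vector records only how many entities of each type occur, it carries no identifying phrase from the note, which is in the spirit of the grouping into types that the abstract describes. In a fuller pipeline these vectors would be reduced with principal component analysis and joined with the numerical features before clustering and regression.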

Details

Language :
English
ISSN :
21693536
Volume :
7
Database :
Directory of Open Access Journals
Journal :
IEEE Access
Publication Type :
Academic Journal
Accession number :
edsdoj.7b7bde3dda92479c9d0a5008a1734640
Document Type :
article
Full Text :
https://doi.org/10.1109/ACCESS.2019.2908032