
An Unsupervised Error Detection Methodology for Detecting Mislabels in Healthcare Analytics

Authors :
Pei-Yuan Zhou
Faith Lum
Tony Jiecao Wang
Anubhav Bhatti
Surajsinh Parmar
Chen Dan
Andrew K. C. Wong
Source :
Bioengineering, Vol 11, Iss 8, p 770 (2024)
Publication Year :
2024
Publisher :
MDPI AG, 2024.

Abstract

Medical datasets may be imbalanced and contain errors due to subjective test results and clinical variability. The poor quality of the original data affects classification accuracy and reliability. Hence, detecting abnormal samples in the dataset can help clinicians make better decisions. In this study, we propose an unsupervised error detection method using patterns discovered by the Pattern Discovery and Disentanglement (PDD) model, developed in our earlier work. Applied to a large dataset, the eICU Collaborative Research Database, for sepsis risk assessment, the proposed algorithm can effectively discover statistically significant association patterns, generate an interpretable knowledge base, cluster samples in an unsupervised manner, and detect abnormal samples in the dataset. In the experimental results, our method outperformed K-Means by 38% on the full dataset and 47% on the reduced dataset for unsupervised clustering. After abnormal samples were removed by the proposed error detection approach, the accuracy of multiple supervised classifiers improved by an average of 4%. Therefore, the proposed algorithm provides a robust and practical solution for unsupervised clustering and error detection in healthcare data.
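
The abstract does not spell out how abnormal samples are identified, so the following is only a minimal sketch of one common cluster-based heuristic, not the authors' PDD procedure. It assumes each sample already has an unsupervised cluster assignment and a recorded class label, and it flags samples whose label disagrees with the majority label of their cluster. The function name `flag_suspect_labels`, the label strings, and the toy data are all hypothetical.

```python
from collections import Counter


def flag_suspect_labels(cluster_ids, labels):
    """Return indices of samples whose label differs from the majority
    label of their cluster (a simple heuristic, NOT the PDD method)."""
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have the same length")

    # Tally the recorded labels inside each cluster.
    counts = {}
    for cluster, label in zip(cluster_ids, labels):
        counts.setdefault(cluster, Counter())[label] += 1

    # Majority label per cluster; ties go to the label seen first.
    majority = {c: tally.most_common(1)[0][0] for c, tally in counts.items()}

    return [
        i
        for i, (cluster, label) in enumerate(zip(cluster_ids, labels))
        if label != majority[cluster]
    ]


# Toy data (hypothetical): two clusters, one disagreeing label in each.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
labels = ["sepsis", "sepsis", "sepsis", "no_sepsis",
          "no_sepsis", "no_sepsis", "sepsis", "no_sepsis"]

suspects = flag_suspect_labels(cluster_ids, labels)
# Indices retained for training a downstream supervised classifier.
kept = [i for i in range(len(labels)) if i not in suspects]
```

In a workflow like the one the abstract describes, the flagged indices would be reviewed or removed before the supervised classifiers are trained on the remaining samples. A flagged sample is a candidate mislabel rather than a confirmed error.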

Details

Language :
English
ISSN :
23065354
Volume :
11
Issue :
8
Database :
Directory of Open Access Journals
Journal :
Bioengineering
Publication Type :
Academic Journal
Accession number :
edsdoj.f6b6e9dbed4947f18549ee1f2e8e3bdd
Document Type :
article
Full Text :
https://doi.org/10.3390/bioengineering11080770