Utility Preservation of Clinical Text After De-Identification

Authors :
Vakili, Thomas
Dalianis, Hercules
Publication Year :
2022

Abstract

Electronic health records contain valuable information about symptoms, diagnosis, treatment and outcomes of the treatments of individual patients. However, the records may also contain information that can reveal the identity of the patients. Removing these identifiers - the Protected Health Information (PHI) - can protect the identity of the patient. Automatic de-identification is a process which employs machine learning techniques to detect and remove PHI. However, automatic techniques are imperfect in their precision and introduce noise into the data. This study examines the impact of this noise on the utility of Swedish de-identified clinical data by using human evaluators and by training and testing BERT models. Our results indicate that de-identification does not harm the utility for clinical NLP and that human evaluators are less sensitive to noise from de-identification than expected.

Details

Database :
OAIster
Notes :
English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1356421735
Document Type :
Electronic Resource
Full Text :
https://doi.org/10.18653/v1/2022.bionlp-1.38