De-Identifying Swedish EHR Text Using Public Resources in the General Domain.

Authors :
CHOMUTARE, Taridzo
YIGZAW, Kassaye Yitbarek
BUDRIONIS, Andrius
MAKHLYSHEVA, Alexandra
GODTLIEBSEN, Fred
DALIANIS, Hercules
Source :
Studies in Health Technology & Informatics; 2020, Vol. 270, p148-152, 5p, 1 Chart, 1 Graph
Publication Year :
2020

Abstract

Sensitive data is normally required to develop rule-based models or train machine learning-based models for de-identifying electronic health record (EHR) clinical notes, and this presents important problems for patient privacy. In this study, we add non-sensitive public datasets to EHR training data: (i) scientific medical text and (ii) Wikipedia word vectors. The data, all in Swedish, is used to train a deep learning model using recurrent neural networks. Tests on pseudonymized Swedish EHR clinical notes showed improved precision and recall, from 55.62% and 80.02% with the base EHR embedding layer to 85.01% and 87.15% when Wikipedia word vectors are added. These results suggest that non-sensitive text from the general domain can be used to train robust models for de-identifying Swedish clinical text, and this could be useful in cases where the data is both sensitive and in low-resource languages. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0926-9630
Volume :
270
Database :
Complementary Index
Journal :
Studies in Health Technology & Informatics
Publication Type :
Academic Journal
Accession number :
144555221
Full Text :
https://doi.org/10.3233/SHTI200140