An Empirical Test of GRUs and Deep Contextualized Word Representations on De-Identification.

Authors :: Lee, Kahyun
Filannino, Michele
Uzuner, Özlem
Source :: Studies in Health Technology & Informatics; 2019, Vol. 264, p218-222, 5p, 8 Charts
Publication Year :: 2019
Abstract: De-identification aims to remove 18 categories of protected health information from electronic health records. Ideally, de-identification systems should be reliable and generalizable. Previous research has focused on improving performance but has not examined generalizability. This paper investigates both performance and generalizability. To improve on current state-of-the-art performance based on long short-term memory (LSTM) units, we introduce a system that uses gated recurrent units (GRUs) and deep contextualized word representations, neither of which has previously been applied to de-identification. We measure the performance and generalizability of each system using the 2014 i2b2/UTHealth and 2016 CEGS N-GRID de-identification datasets. We show that deep contextualized word representations improve state-of-the-art performance, while the benefit of replacing LSTM units with GRUs is not significant. The generalizability of de-identification systems improves significantly with deep contextualized word representations; in addition, the LSTM-based system is more generalizable than the GRU-based system. [ABSTRACT FROM AUTHOR]
