Automatic detection of protected health information from clinic narratives.
- Source :
- Journal of biomedical informatics [J Biomed Inform] 2015 Dec; Vol. 58 Suppl, pp. S30-S38. Date of Electronic Publication: 2015 Jul 29.
- Publication Year :
- 2015
Abstract
- This paper presents a natural language processing (NLP) system that was designed to participate in the 2014 i2b2 de-identification challenge. The challenge task aims to identify and classify seven main Protected Health Information (PHI) categories and 25 associated sub-categories. A hybrid model was proposed which combines machine learning techniques with keyword-based and rule-based approaches to deal with the complexity inherent in PHI categories. Our proposed approaches exploit a rich set of linguistic features, both syntactic and word surface-oriented, which are further enriched by task-specific features and regular expression template patterns to characterize the semantics of various PHI categories. Our system achieved promising accuracy on the challenge test data, with an overall micro-averaged F-measure of 93.6%, and ranked first in this de-identification challenge. (Copyright © 2015 Elsevier Inc. All rights reserved.)
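For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how a micro-averaged F-measure is typically computed: true positives, false positives, and false negatives are pooled across all categories before precision and recall are taken. The function name `micro_f_measure`, the category labels, and the counts are illustrative assumptions, not taken from the paper or from the challenge's official scorer.

```python
def micro_f_measure(counts):
    """Micro-averaged F-measure from per-category counts.

    counts maps a category name to a (tp, fp, fn) tuple.
    The categories and numbers passed in are hypothetical examples.
    """
    # Pool the counts over all categories (this is what "micro" means).
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of the pooled precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for two PHI categories.
example_counts = {"NAME": (90, 10, 5), "DATE": (45, 5, 10)}
score = micro_f_measure(example_counts)
```

Because counts are pooled, frequent categories weigh more heavily in the result than rare ones, which is the usual reason a micro-average is reported alongside per-category scores.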
Details
- Language :
- English
- ISSN :
- 1532-0480
- Volume :
- 58 Suppl
- Database :
- MEDLINE
- Journal :
- Journal of biomedical informatics
- Publication Type :
- Academic Journal
- Accession number :
- 26231070
- Full Text :
- https://doi.org/10.1016/j.jbi.2015.06.015