The pattern of name tokens in narrative clinical text and a comparison of five systems for redacting them
- Source :
- Journal of the American Medical Informatics Association : JAMIA
- Publication Year :
- 2013
- Publisher :
- BMJ Publishing Group, 2013.
Abstract
- Objective: To understand the factors that influence success in scrubbing personal names from narrative text.
- Materials and methods: We developed a scrubber, the NLM Name Scrubber (NLM-NS), to redact personal names from narrative clinical reports, hand tagged words in a set of gold standard narrative reports as personal names or not, and measured the scrubbing success of NLM-NS and that of four other scrubbing/name recognition tools (MIST, MITdeid, LingPipe, and ANNIE/GATE) against the gold standard reports. We ran three comparisons that used increasingly large name lists.
- Results: The test reports contained more than 1 million words, of which 2388 were patient and 20 160 were provider name tokens. NLM-NS failed to scrub only 2 of the 2388 instances of patient name tokens. Its sensitivity was 0.999 on both patient and provider name tokens, and it missed fewer instances of patient name tokens than the other scrubbers in all comparisons. MIST produced the best all-token specificity and F-measure for name instances in our most relevant study (study 2), with values of 0.997 and 0.938, respectively. In that same comparison, NLM-NS was second best, with values of 0.986 and 0.748, respectively, and MITdeid was a close third, with values of 0.985 and 0.796, respectively. With the addition of the Clinical Center name list to their native name lists, LingPipe, MITdeid, MIST, and ANNIE/GATE all improved substantially. MITdeid and LingPipe gained the most, reaching patient name sensitivity of 0.995 (F-measure=0.705) and 0.989 (F-measure=0.386), respectively.
- Discussion: The privacy risk due to the two name tokens missed by NLM-NS was statistically negligible, since neither individual could be distinguished among more than 150 000 people listed in the US Social Security Registry.
- Conclusions: The nature and size of name lists have substantial influences on scrubbing success. The use of very large name lists with frequency statistics accounts for much of the scrubbing success of NLM-NS.
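The token-level measures quoted in the abstract can be illustrated with a minimal sketch. It assumes the standard confusion-matrix definitions and a balanced F-measure (F1); the abstract does not state which F-measure weighting the authors used, and the function names are hypothetical. Only the patient-name sensitivity is computed from figures given in the abstract (2 of 2388 tokens missed); the counts needed for specificity and F-measure are not reported there, so none are supplied.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true name tokens that were scrubbed (recall)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-name tokens that were left unscrubbed."""
    return tn / (tn + fp)


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1); the weighting is an assumption here."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Figures from the abstract: 2388 patient name tokens, 2 missed by NLM-NS.
patient_tokens = 2388
missed = 2
patient_sensitivity = sensitivity(tp=patient_tokens - missed, fn=missed)
print(round(patient_sensitivity, 3))  # 0.999
```

The rounded value agrees with the patient-name sensitivity of 0.999 reported for NLM-NS.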
- Subjects :
- Computer science
Electronic Medical Records
Health Informatics
computer.software_genre
Security token
Research and Applications
PHI
Patient name
World Wide Web
Electronic Health Records
Humans
Names
Narrative
Set (psychology)
Natural Language Processing
De-Identification
Chart Research
National Library of Medicine (U.S.)
business.industry
Narrative text
De-identification
United States
Test (assessment)
Artificial intelligence
business
computer
Natural language processing
Confidentiality
Details
- Language :
- English
- ISSN :
- 1527974X and 10675027
- Volume :
- 21
- Issue :
- 3
- Database :
- OpenAIRE
- Journal :
- Journal of the American Medical Informatics Association : JAMIA
- Accession number :
- edsair.doi.dedup.....21577fed80bc6f293e5945506c8b4dce