151. Computer-assisted de-identification of free text in the MIMIC II database
- Author
- Gari D. Clifford, George B. Moody, Roger G. Mark, Margaret M. Douglass, and Andrew T. Reisner
- Subjects
- Information privacy, Database, Java, Computer science, Medical record, De-identification, False positive paradox, Gold standard (test), Personally identifiable information, Protected health information
- Abstract
Medical researchers are legally required to protect patients' privacy by removing personally identifiable information from medical records before sharing the data with other researchers. We present an evaluation of methods for computer-assisted removal and replacement of protected health information (PHI) from free-text nursing notes collected in the intensive care unit as part of the MIMIC II project. A semiautomated method was developed to allow clinicians to highlight PHI on the screen of a tablet PC and to compare and combine the selections of different experts reading the same notes. An analysis of the performance of three human expert de-identifiers and of an automated system is presented. Expert adjudication demonstrated that inter-human variability was high, with few false positives and many false negatives. The sensitivity of human experts working alone ranged from 0.63 to 0.93, with an average of 0.81, and the average positive predictive value was 0.98. An algorithm generated few false negatives but many false positives. Its sensitivity was 0.85, but its positive predictive value was only 0.37. The de-identified database of nursing notes was re-identified with realistic surrogate (but unprotected) dates, serial numbers, names, and phrases to provide a gold standard database of over 2600 notes (approximately 340,000 words) with over 1700 instances of PHI. This reference gold standard database of nursing notes and the Java source code used to evaluate algorithm performance will be made freely available on PhysioNet in order to facilitate the development and validation of future de-identification algorithms.
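The two figures reported for each de-identifier are sensitivity, TP / (TP + FN), and positive predictive value, TP / (TP + FP), measured against the gold standard. The Java sketch below shows how they can be computed and how combining several annotators' selections changes them. It is a minimal illustration under stated assumptions, not the evaluation code described above: each PHI instance is assumed to be identified by an integer position (for example a token index), experts are assumed to be combined by a simple set union, and the class `DeidMetrics`, its methods, and the toy data are all hypothetical. Requires Java 9+ for `Set.of` and `List.of`.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not the released MIMIC II evaluation code.
// Assumption: each PHI instance is identified by an integer position
// (e.g. a token index), so gold and predicted annotations compare as sets.
class DeidMetrics {

    // Combine several annotators' selections by taking their union
    // (an assumed, simple combination rule used here for illustration).
    static Set<Integer> union(List<Set<Integer>> selections) {
        Set<Integer> combined = new HashSet<>();
        for (Set<Integer> s : selections) {
            combined.addAll(s);
        }
        return combined;
    }

    // True positives: predicted positions that also appear in the gold standard.
    static int truePositives(Set<Integer> gold, Set<Integer> predicted) {
        Set<Integer> hits = new HashSet<>(predicted);
        hits.retainAll(gold);
        return hits.size();
    }

    // Sensitivity = TP / (TP + FN); TP + FN is the size of the gold standard.
    static double sensitivity(Set<Integer> gold, Set<Integer> predicted) {
        if (gold.isEmpty()) return Double.NaN; // undefined when there is no gold PHI
        return (double) truePositives(gold, predicted) / gold.size();
    }

    // Positive predictive value = TP / (TP + FP); TP + FP is the number of predictions.
    static double ppv(Set<Integer> gold, Set<Integer> predicted) {
        if (predicted.isEmpty()) return Double.NaN; // undefined when nothing is flagged
        return (double) truePositives(gold, predicted) / predicted.size();
    }

    // Locale.ROOT keeps the decimal separator a '.' regardless of system locale.
    static String report(String name, Set<Integer> gold, Set<Integer> predicted) {
        return String.format(Locale.ROOT, "%s: sensitivity=%.2f PPV=%.2f",
                name, sensitivity(gold, predicted), ppv(gold, predicted));
    }

    public static void main(String[] args) {
        // Toy data invented for illustration; these are not results from the study.
        Set<Integer> gold = Set.of(1, 2, 3, 4, 5);
        Set<Integer> expertA = Set.of(1, 2, 3);
        Set<Integer> expertB = Set.of(3, 4);
        Set<Integer> algorithm = Set.of(1, 2, 3, 4, 6, 7, 8, 9);

        // prints "expert A: sensitivity=0.60 PPV=1.00"
        System.out.println(report("expert A", gold, expertA));
        // prints "experts A+B: sensitivity=0.80 PPV=1.00"
        System.out.println(report("experts A+B", gold, union(List.of(expertA, expertB))));
        // prints "algorithm: sensitivity=0.80 PPV=0.50"
        System.out.println(report("algorithm", gold, algorithm));
    }
}
```

In this toy example the single expert misses PHI but flags nothing spurious, while the algorithm finds more PHI at the cost of false positives. Taking a union can only add true positives, so the combined sensitivity is never lower than that of any one annotator, but any false positive from either annotator is carried into the union as well.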
- Published
- 2005