De-identifying Spanish medical texts - named entity recognition applied to radiology reports
- Source :
- Journal of Biomedical Semantics, Vol 12, Iss 1, Pp 1-13 (2021); r-CIPF: Repositorio Institucional Producción Científica del Centro de Investigación Principe Felipe (CIPF); r-FISABIO: Repositorio Institucional de Producción Científica, Fundación para el Fomento de la Investigación Sanitaria y Biomédica de la Comunitat Valenciana (FISABIO); CEU Repositorio Institucional, Fundación Universitaria San Pablo CEU (FUSPCEU)
- Publication Year :
- 2021
- Publisher :
- BioMed Central, 2021.
Abstract
- Background: Medical texts such as radiology reports and electronic health records are a powerful source of data for researchers. Anonymization methods must be developed to de-identify documents containing personal information about both patients and medical staff. Several anonymization strategies currently exist for English, but they are language-dependent. Here, we introduce a named entity recognition strategy for Spanish medical texts that is translatable to other languages. Results: We tested 4 neural networks on our radiology reports dataset, achieving a recall of 97.18% of the identifying entities. In addition, we developed a randomization algorithm that substitutes the detected entities with new ones from the same category, making it virtually impossible to differentiate real data from synthetic data. The three best architectures were then tested on the MEDDOCAN challenge dataset of electronic health records as an external test, achieving a recall of 69.18%. Conclusions: The proposed strategy, which combines named entity recognition with randomization of entities, is suitable for Spanish radiology reports. It does not require a large training corpus, so it could be easily extended to other languages and medical texts, such as electronic health records.
- Subjects :
- Medical staff
020205 medical informatics
Computer science
Radiología
02 engineering and technology
English language
030204 cardiovascular system & hematology
Health records
Spanish
computer.software_genre
030218 nuclear medicine & medical imaging
0302 clinical medicine
0202 electrical engineering, electronic engineering, information engineering
Electronic Health Records
Natural Language Processing | Named Entity Recognition | radiology reports | medical texts | Spanish
Language
Proceso de lenguaje natural
0303 health sciences
Diagnosis, Radioscopic
Artificial neural network
Computer Science Applications
Test (assessment)
lcsh:R858-859.7
Radiology
Personally identifiable information
Information Systems
medicine.medical_specialty
Computer Networks and Communications
Medical texts
Health Informatics
Diagnóstico radiológico
Data protection
lcsh:Computer applications to medicine. Medical informatics
Natural language processing
Synthetic data
03 medical and health sciences
Protección de datos personales
Named-entity recognition
medicine
Humans
030304 developmental biology
Recall
Research
Natural language processing
Named entity recognition
Radiology reports
Neural Networks, Computer
computer
Details
- ISSN :
- 20411480
- Database :
- OpenAIRE
- Journal :
- Journal of Biomedical Semantics, Vol 12, Iss 1, Pp 1-13 (2021)
- Accession number :
- edsair.doi.dedup.....86afc9e027f4e30512d3a28c61bfe4d4