1. Contributions to Clinical Named Entity Recognition in Portuguese
- Author
- Cesar Teixeira, Hugo Gonçalo Oliveira, and Fábio Lopes
- Subjects
- Medical informatics; Computer science; Engineering and technology; Task (project management); Information extraction; Named-entity recognition; Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Artificial intelligence; Portuguese; Word (computer architecture); Natural language processing
- Abstract
Bearing in mind that different languages may present different challenges, this paper presents the following contributions to Information Extraction from clinical text in Portuguese: a collection of 281 clinical texts in this language, with manually annotated named entities; word embeddings trained on a larger collection of similar texts; and results of using BiLSTM-CRF neural networks for named entity recognition on the annotated collection, including a comparison of in-domain and out-of-domain word embeddings in this task. Although the in-domain embeddings were learned from much less data, performance is higher when using them. When tested on 20 independent clinical texts, the model using in-domain embeddings achieved better results than a model using larger out-of-domain embeddings.
- Published
- 2019