GRAM-CNN: a deep learning approach with local context for named entity recognition in biomedical text

Authors :
Cécile Pereira
Xiaolin Li
Qile Zhu
Ana Conesa
Source :
Bioinformatics, r-CIPF: Repositorio Institucional Producción Científica del Centro de Investigación Principe Felipe (CIPF)
Publication Year :
2017
Publisher :
Oxford University Press, 2017.

Abstract

Motivation: The best-performing named entity recognition (NER) methods for biomedical literature are based on hand-crafted features or task-specific rules, which are costly to produce and difficult to generalize to other corpora. End-to-end neural networks achieve state-of-the-art performance without hand-crafted features and task-specific knowledge in non-biomedical NER tasks. However, in the biomedical domain, using the same architecture does not yield competitive performance compared with conventional machine learning models.

Results: We propose a novel end-to-end deep learning approach for biomedical NER tasks that leverages local context based on n-gram character and word embeddings via a Convolutional Neural Network (CNN). We call this approach GRAM-CNN. To automatically label a word, this method uses the local information around that word. Therefore, the GRAM-CNN method does not require any specific knowledge or feature engineering and can, in theory, be applied to a wide range of existing NER problems. The GRAM-CNN approach was evaluated on three well-known biomedical datasets containing different BioNER entities. It obtained an F1-score of 87.26% on the BioCreative II dataset, 87.26% on the NCBI dataset and 72.57% on the JNLPBA dataset. These results put GRAM-CNN in the lead among biological NER methods. To the best of our knowledge, we are the first to apply CNN-based structures to BioNER problems.

Availability and implementation: The GRAM-CNN source code, datasets and pre-trained model are available online at: https://github.com/valdersoul/GRAM-CNN.

Supplementary information: Supplementary data are available at Bioinformatics online.
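
The idea of labelling a word from n-gram windows of embeddings around it can be illustrated with a minimal sketch. This is not the authors' implementation (that is in the repository linked in the abstract). The function name `ngram_conv_features`, the filter widths, the zero padding at sentence boundaries, the ReLU activation and all numbers are assumptions chosen for illustration only.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def ngram_conv_features(
    embeddings: Sequence[Vector],
    filters: Dict[int, Vector],
) -> List[Vector]:
    """Compute one local feature per filter width for every token.

    `filters` maps an odd window width n to a flat weight list of
    length n * dim. Each filter is applied to the window centred on
    the current token; positions outside the sentence are zero-padded
    (an assumption of this sketch).
    """
    dim = len(embeddings[0])
    zero = [0.0] * dim
    features = []
    for i in range(len(embeddings)):
        token_features = []
        for width, weights in sorted(filters.items()):
            half = width // 2
            window: Vector = []
            for j in range(i - half, i + half + 1):
                inside = 0 <= j < len(embeddings)
                window.extend(embeddings[j] if inside else zero)
            activation = sum(w * x for w, x in zip(weights, window))
            token_features.append(max(0.0, activation))  # ReLU (assumed)
        features.append(token_features)
    return features


# Toy input: three tokens with made-up 2-dimensional embeddings.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical filters: a 1-gram filter and a 3-gram filter.
toy_filters = {
    1: [1.0, 1.0],
    3: [1.0, 0.0, 0.0, 1.0, -1.0, 0.0],
}
local_features = ngram_conv_features(sentence, toy_filters)
```

Each token ends up with one feature per n-gram width, computed only from its neighbours. In a full tagger these per-token vectors would feed a sequence labelling layer, which is outside the scope of this sketch.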

Details

Language :
English
ISSN :
1367-4811 and 1367-4803
Volume :
34
Issue :
9
Database :
OpenAIRE
Journal :
Bioinformatics
Accession number :
edsair.doi.dedup.....f0b90ff6c05120a4eff5027fe1c56883