Optimising chemical named entity recognition with pre-processing analytics, knowledge-rich features and heuristics

Authors :: Batista-Navarro, Riza
Rak, Rafal
Ananiadou, Sophia
Source :: Batista-Navarro, R, Rak, R & Ananiadou, S 2015, 'Optimising chemical named entity recognition with pre-processing analytics, knowledge-rich features and heuristics', Journal of Cheminformatics, vol. 7, suppl. 1, S6. https://doi.org/10.1186/1758-2946-7-S1-S6
Publication Year :: 2015
Publisher :: BioMed Central, 2015.
Abstract:
Background: The development of robust methods for chemical named entity recognition (NER), a challenging natural language processing task, was previously hindered by the lack of publicly available, large-scale, gold standard corpora. The recent public release of a large chemical entity-annotated corpus for the CHEMDNER track of the Fourth BioCreative Challenge Evaluation (BioCreative IV) workshop greatly alleviated this problem and allowed us to develop a chemical entity recogniser based on conditional random fields. To optimise its performance, we customised several aspects of our solution: the selection of specialised pre-processing analytics, the incorporation of chemistry knowledge-rich features in the training and application of the statistical model, and the addition of post-processing rules.

Results: Our evaluation shows that the chemical entity recogniser performs best when all of these customisations are integrated. Compared with state-of-the-art methods under comparable experimental settings, our solution performs competitively. We also show that our recogniser, using a model trained on the CHEMDNER corpus, is suitable for recognising chemical names in a wide range of corpora, consistently outperforming two popular chemical NER tools.

Conclusion: The contributions of this work are two-fold. Firstly, we present the details of a chemical entity recognition methodology whose performance is competitive with, if not superior to, that of state-of-the-art methods. Secondly, the developed suite of solutions has been made publicly available as a configurable workflow in the interoperable text mining workbench Argo. This allows interested users to conveniently apply and evaluate our solutions in the context of other chemical text mining tasks.

Subjects :: Text mining
Sequence labelling
Feature engineering
Research
Configurable workflows
Physical and Theoretical Chemistry
Library and Information Sciences
Conditional random fields
Chemical named entity recognition
Computer Graphics and Computer-Aided Design
Workflow optimisation
Computer Science Applications

Details

Language :: English
ISSN :: 17582946
Volume :: 7
Issue :: Suppl 1
Database :: OpenAIRE
Journal :: Journal of Cheminformatics
Accession number :: edsair.pmid.dedup....c59d6db4d3f091b92b3ee2fdf60c39c1
Full Text :: https://doi.org/10.1186/1758-2946-7-S1-S6
