Application of machine reading comprehension techniques for named entity recognition in materials science

Authors :
Zihui Huang
Liqiang He
Yuhang Yang
Andi Li
Zhiwen Zhang
Siwei Wu
Yang Wang
Yan He
Xujie Liu
Source :
Journal of Cheminformatics, Vol 16, Iss 1, Pp 1-10 (2024)
Publication Year :
2024
Publisher :
BMC, 2024.

Abstract

Materials science is an interdisciplinary field that studies the properties, structures, and behaviors of different materials. The scientific literature contains a large amount of knowledge about materials, but manually analyzing these papers to find material-related data is a daunting task. In information processing, named entity recognition (NER) plays a crucial role because it can automatically extract materials science entities, which are valuable for tasks such as building knowledge graphs. The sequence labeling methods typically used for named entity recognition in materials science (MatNER) often fail to fully utilize the semantic information in the dataset and cannot effectively extract nested entities. Herein, we propose converting the sequence labeling task into a machine reading comprehension (MRC) task. The MRC method can effectively solve the challenge of extracting multiple overlapping entities by transforming it into answering multiple independent questions. Moreover, by integrating prior knowledge from queries, the MRC framework allows for a more comprehensive understanding of the contextual information and semantic relationships within materials science literature. The MRC approach achieved state-of-the-art (SOTA) performance on the Matscholar, BC4CHEMD, NLMChem, SOFC, and SOFC-Slot datasets, with F1-scores of 89.64%, 94.30%, 85.89%, 85.95%, and 71.73%, respectively. By effectively utilizing semantic information and extracting nested entities, this approach holds great significance for knowledge extraction and data analysis in materials science and can thus accelerate the development of the field.
Scientific contribution: We have developed an innovative NER method that enhances the efficiency and accuracy of automatic entity extraction in the field of materials science by transforming the sequence labeling task into an MRC task. This approach provides robust support for constructing knowledge graphs and other data analysis tasks.
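
To make the reformulation described in the abstract concrete, the following is a minimal sketch of how span annotations might be turned into one independent question per entity type. It is an illustration under stated assumptions, not the authors' implementation: the query wording, the two labels, the example sentence, its annotation, and the names MRCExample and to_mrc_examples are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical natural-language queries, one per entity type.
# Illustrative only; these are not the queries used in the paper.
QUERIES = {
    "MAT": "Which materials are mentioned in the text?",
    "PRO": "Which material properties are mentioned in the text?",
}


@dataclass
class MRCExample:
    label: str
    query: str
    context: List[str]
    answers: List[Tuple[int, int]]  # inclusive (start, end) token spans


def to_mrc_examples(tokens, spans):
    """Build one (query, context, answers) example per entity type.

    spans is a list of (start, end, label) tuples with inclusive token
    indices. Spans may overlap or nest, because each entity type is
    answered by its own question over the same context.
    """
    examples = []
    for label, query in QUERIES.items():
        answers = sorted((s, e) for s, e, span_label in spans if span_label == label)
        examples.append(MRCExample(label, query, list(tokens), answers))
    return examples


# Made-up sentence and annotation: "zinc oxide" is nested inside
# "zinc oxide thin films", which a single tag per token cannot encode.
tokens = ["The", "band", "gap", "of", "zinc", "oxide", "thin", "films", "was", "measured"]
spans = [(4, 5, "MAT"), (4, 7, "MAT"), (1, 2, "PRO")]

examples = to_mrc_examples(tokens, spans)
for ex in examples:
    print(ex.label, [" ".join(ex.context[s:e + 1]) for s, e in ex.answers])
```

In a sketch like this, each example would be passed to a span-extraction model that predicts answer start and end positions for its query, so nested and overlapping entities are recovered as separate answers rather than competing for one label per token.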

Details

Language :
English
ISSN :
17582946
Volume :
16
Issue :
1
Database :
Directory of Open Access Journals
Journal :
Journal of Cheminformatics
Publication Type :
Academic Journal
Accession number :
edsdoj.39d66da1f66841ed9fc82acc8a585694
Document Type :
article
Full Text :
https://doi.org/10.1186/s13321-024-00874-5