A study of BERT-based methods for formal citation identification of scientific data.

Authors :
Yang, Ning
Zhang, Zhiqiang
Huang, Feihu
Source :
Scientometrics; Nov2023, Vol. 128 Issue 11, p5865-5881, 17p
Publication Year :
2023

Abstract

Studying scientific data citation is crucial for promoting data sharing and underpins the measurement and analysis of scientific data. To this end, data reference information must be identified and labeled. Many supervised methods exist for entity recognition and relation extraction of diseases, drugs, proteins, symptoms, etc., but their effectiveness for recognizing scientific data has not been discussed. To fill this gap, this study discusses the effectiveness of classical machine learning models and deep learning models in recognizing scientific data citations. In the experiments, the study took the full text of scientific and technical papers as the research object and formed a dataset by annotating and classifying citations using rules and manual recognition of the papers' references. The empirical results showed that: (1) the methods used in this paper can automatically identify and extract data citations and can address the problem of automating the construction of citation relationships between scientific and technical literature and scientific data; (2) the BERT-based models are the most effective in the scientific data citation recognition task, especially BioBERT and SciBERT; (3) full-text information has a crucial impact on the recognition results. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0138-9130
Volume :
128
Issue :
11
Database :
Complementary Index
Journal :
Scientometrics
Publication Type :
Academic Journal
Accession number :
173237488
Full Text :
https://doi.org/10.1007/s11192-023-04833-z