A study of BERT-based methods for formal citation identification of scientific data.
- Source :
- Scientometrics; Nov 2023, Vol. 128 Issue 11, p5865-5881, 17p
- Publication Year :
- 2023
Abstract
- Studying scientific data citation is crucial for promoting data sharing and underpins the measurement and analysis of scientific data. To this end, data references must be identified and labeled. Many supervised methods currently exist for entity recognition and relation extraction involving diseases, drugs, proteins, symptoms, etc., but their effectiveness at recognizing scientific data has not been examined. To fill this gap, this study evaluates how effectively classical machine learning models and deep learning models recognize scientific data citations. In the experiments, the full texts of scientific and technical papers served as the research object, and their references were classified and annotated through a combination of rules and manual identification to build a dataset. The empirical results show that (1) the methods used in this paper can automatically identify and extract data citations, thereby automating the construction of citation relationships between scientific and technical literature and scientific data; (2) BERT-based models, especially BioBERT and SciBERT, are the most effective at recognizing scientific data citations; and (3) full-text information has a crucial impact on the recognition results. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 0138-9130
- Volume :
- 128
- Issue :
- 11
- Database :
- Complementary Index
- Journal :
- Scientometrics
- Publication Type :
- Academic Journal
- Accession number :
- 173237488
- Full Text :
- https://doi.org/10.1007/s11192-023-04833-z