'Paper, Meet Code': A Deep Learning Approach to Linking Scholarly Articles With GitHub Repositories

Authors :
Prahyat Puangjaktha
Morakot Choetkiertikul
Suppawong Tuarob
Source :
IEEE Access, Vol 12, Pp 68410-68426 (2024)
Publication Year :
2024
Publisher :
IEEE, 2024.

Abstract

Computer scientists often publish their source code accompanying their publications, prominently using code repositories across various domains. Despite the concurrent existence of scholarly articles and their associated official code repositories, explicit references linking the two are often missing. Traditionally, identifying whether scholarly content and code repositories pertain to the same research project requires manual inspection, a time-consuming task. This paper proposes a deep learning-based algorithm for automatically matching scholarly articles with their corresponding official code repositories. Our findings indicate that the most common linking information includes the paper title and BibTeX entries, typically found in the repository’s readme document. In this study, we employed SPECTER for vector embedding of paper and repository metadata. Utilizing these embedding representations with the Light Gradient Boosting Machine (LGBM), our method achieved an F1 score of 0.94. Moreover, combining our best model with a rule-based approach improved performance by 5.31%. This study successfully delineates a connection between academic papers and associated official code repositories, minimizing reliance on explicit bibliographic information in repositories.
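The abstract names the paper title and BibTeX entries in the readme as the most common linking information, and reports a gain from combining the learned model with a rule-based approach. It does not say how the two are combined. The sketch below is one plausible reading, not the authors' implementation: a title-in-readme rule is checked first, and a learned score is used only when the rule does not fire. The names `normalize`, `title_in_readme`, `is_match`, the `model_score` callable (standing in for a SPECTER-plus-LGBM classifier), and the 0.5 threshold are all illustrative assumptions.

```python
import re
from typing import Callable


def normalize(text: str) -> str:
    """Lowercase, turn every non-alphanumeric run into a space, collapse spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def title_in_readme(paper_title: str, readme: str) -> bool:
    """Rule (assumed): the normalized title occurs as whole words in the readme.

    Because punctuation is stripped, a title inside a BibTeX entry such as
    title={...} is matched the same way as a plain-text mention.
    """
    title = normalize(paper_title)
    return bool(title) and f" {title} " in f" {normalize(readme)} "


def is_match(
    paper_title: str,
    readme: str,
    model_score: Callable[[str, str], float],  # hypothetical classifier score in [0, 1]
    threshold: float = 0.5,                    # assumed cut-off, not from the paper
) -> bool:
    """Hybrid decision (assumed): accept if the rule fires, else defer to the model."""
    if title_in_readme(paper_title, readme):
        return True
    return model_score(paper_title, readme) >= threshold
```

In this arrangement the rule handles repositories that cite the paper explicitly, while the model covers repositories that carry no explicit bibliographic information, which is the case the abstract highlights.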

Details

Language :
English
ISSN :
21693536
Volume :
12
Database :
Directory of Open Access Journals
Journal :
IEEE Access
Publication Type :
Academic Journal
Accession number :
edsdoj.46d62805cdfd4adf85d5baa6202d1826
Document Type :
article
Full Text :
https://doi.org/10.1109/ACCESS.2024.3399767