'Paper, Meet Code': A Deep Learning Approach to Linking Scholarly Articles With GitHub Repositories
- Source :
- IEEE Access, Vol 12, Pp 68410-68426 (2024)
- Publication Year :
- 2024
- Publisher :
- IEEE, 2024.
Abstract
- Computer scientists often publish the source code accompanying their publications, most prominently in code repositories across various domains. Despite the concurrent existence of scholarly articles and their associated official code repositories, explicit references linking the two are often missing. Traditionally, identifying whether scholarly content and code repositories pertain to the same research project requires manual inspection, a time-consuming task. This paper proposes a deep learning-based algorithm for automatically matching scholarly articles with their corresponding official code repositories. Our findings indicate that the most common linking information includes the paper title and BibTeX entries, typically found in the repository’s readme document. In this study, we employed SPECTER for vector embedding of paper and repository metadata. Utilizing these embedding representations with the Light Gradient Boosting Machine (LGBM), our method achieved an F1 score of 0.94. Moreover, combining our best model with a rule-based approach improved performance by 5.31%. This study successfully delineates a connection between academic papers and associated official code repositories, minimizing reliance on explicit bibliographic information in repositories.
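The abstract names the paper title and BibTeX entries in the readme as the most common linking cues, but this record does not spell out the rules used. The following is therefore only a minimal sketch of what a rule-based check of that kind might look like, not the authors' method. The function names (`normalize`, `bibtex_titles`, `link_cues`) and the regex-based BibTeX handling are assumptions made for illustration.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse every run of non-alphanumeric characters into one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


# Assumption: BibTeX title fields sit on a single line, e.g. `title={Some Title},`.
# The \b keeps `booktitle` from matching. This is a simple regex, not a full parser.
_BIBTEX_TITLE = re.compile(
    r'\btitle\s*=\s*[{"]+(.+?)[}"]+\s*,?\s*$',
    re.IGNORECASE | re.MULTILINE,
)


def bibtex_titles(readme: str) -> list[str]:
    """Return the raw title strings of BibTeX entries embedded in a readme."""
    return [m.group(1) for m in _BIBTEX_TITLE.finditer(readme)]


def link_cues(paper_title: str, readme: str) -> dict[str, bool]:
    """Report which hypothetical linking cues a readme shows for a given paper title."""
    title = normalize(paper_title)
    if not title:
        return {"title_in_text": False, "title_in_bibtex": False}
    # Pad with spaces so the title only matches on whole-word boundaries.
    in_text = f" {title} " in f" {normalize(readme)} "
    in_bibtex = any(normalize(t) == title for t in bibtex_titles(readme))
    return {"title_in_text": in_text, "title_in_bibtex": in_bibtex}


if __name__ == "__main__":
    sample_readme = (
        "# Demo\n"
        "Official code for 'Paper, Meet Code': A Deep Learning Approach.\n\n"
        "@article{x,\n"
        "  title={Another Work},\n"
        "  booktitle={Proc}\n"
        "}\n"
    )
    print(link_cues("Paper, Meet Code: A Deep Learning Approach", sample_readme))
    print(link_cues("Another Work", sample_readme))
```

In a combined setup such as the one the abstract describes, cues like these could either short-circuit the decision when they fire or be passed to the learned classifier as extra features. Which of the two the paper uses is not stated in this record.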
Details
- Language :
- English
- ISSN :
- 2169-3536
- Volume :
- 12
- Database :
- Directory of Open Access Journals
- Journal :
- IEEE Access
- Publication Type :
- Academic Journal
- Accession number :
- edsdoj.46d62805cdfd4adf85d5baa6202d1826
- Document Type :
- article
- Full Text :
- https://doi.org/10.1109/ACCESS.2024.3399767