1. Mining and Creating a Software Repositories Dataset
- Author
Thai-Bao Do, Bao-Linh L. Mai, Vu Nguyen, and Huu-Nghia H. Nguyen
- Subjects
Thesaurus (information retrieval), Software, Future studies, business.industry, Computer science, business, Data science, Mining software repositories, Derived Data
- Abstract
Mining software repositories to extract meaningful information from them has become an important topic in software engineering. This paper presents our study to mine a very large dataset consisting of over three million software repositories across many version control systems and create derived data for future studies. Through this study, we propose a method for detecting forks and duplicates in repositories. We also preliminarily investigate the possible correlations between forking patterns, software health and risks, and success indicators.
- Published
2020