1. ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction
- Author
-
Keshavarz, Hossein and Nagappan, Meiyappan
- Subjects
Computer Science - Software Engineering ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning ,Computer Science - Programming Languages - Abstract
In this paper, we present ApacheJIT, a large dataset for Just-In-Time defect prediction. ApacheJIT consists of clean and bug-inducing software changes in popular Apache projects. ApacheJIT has a total of 106,674 commits (28,239 bug-inducing and 78,435 clean commits). Having a large number of commits makes ApacheJIT a suitable dataset for machine learning models, especially deep learning models that require large training sets to effectively generalize the patterns present in the historical data to future data.
- Published
- 2022