Boosting Commit Classification Based on Multivariate Mixed Features and Heterogeneous Classifier Selection.
- Source :
- International Journal of Software Engineering & Knowledge Engineering; Dec2024, Vol. 34 Issue 12, p1949-1970, 22p
- Publication Year :
- 2024
Abstract
- Commit classification plays a crucial role in software maintenance, as it permits developers to make informed decisions regarding resource allocation and code review. There are several approaches for automatic commit classification, yet they do not sufficiently explore the features related to commits and consider the advantages of ensemble models over individual models. Therefore, there is some room for improvement. In this paper, we propose MuheCC, a commit classification approach based on multivariate mixed features and heterogeneous classifier selection to address these challenges. It mainly consists of three phases: (1) Multivariate mixed feature extraction, which extracts features from commit messages, changed code and handcrafted features to construct comprehensive mixed features; (2) Hyperparameter tuning based on genetic algorithm, which utilizes genetic algorithm to optimize candidate traditional models and ensemble models; (3) Heterogeneous classifier selection, which selects the optimal combinations of traditional and ensemble models, respectively, to build a heterogeneous classifier for commit classification. To evaluate this approach, we extend an existing dataset with code changes for each commit and compare MuheCC with three baselines on this real-world dataset. The results show that MuheCC outperforms all baselines, especially improving the best baseline by 7.25% for accuracy, 6.88% for precision, 7.25% for recall and 7.06% for F1-score. Furthermore, the ablation experiments validate that the performance advantage of MuheCC is mainly attributed to the multivariate features (e.g. 12.55% contributions to accuracy) and the heterogeneous classifier (e.g. 12.26% contributions to accuracy). We further discuss the impact of hyperparameter tuning and heterogeneous classifier selection on the performance of MuheCC. These results prove the superiority and potential practical value of MuheCC. [ABSTRACT FROM AUTHOR]
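Illustrative note (not part of the record): the abstract's third phase, selecting a combination of candidate models to form one classifier, can be pictured with a toy sketch. The abstract does not say how MuheCC scores or combines its candidates, so everything here is an assumption: the plurality-vote combiner, the exhaustive subset search, the model names (`model_a` to `model_d`), and the commit labels (`fix`, `feature`, `refactor`) are all hypothetical, and the sketch does not cover feature extraction or genetic-algorithm tuning.

```python
from collections import Counter
from itertools import combinations


def majority_vote(predictions):
    """Combine per-model label lists into one label list by plurality vote."""
    voted = []
    for labels in zip(*predictions):
        # On ties, most_common keeps first-seen order, so the
        # earliest-listed model's label wins.
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


def accuracy(predicted, expected):
    """Fraction of positions where the predicted label matches."""
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


def select_combination(candidates, expected, size):
    """Return (names, accuracy) of the best-scoring subset of `size` models."""
    best_names, best_score = None, -1.0
    for names in combinations(sorted(candidates), size):
        voted = majority_vote([candidates[n] for n in names])
        score = accuracy(voted, expected)
        if score > best_score:  # strict: the first best subset is kept
            best_names, best_score = names, score
    return best_names, best_score


# Hypothetical validation labels and per-model predictions (assumed data).
expected = ["fix", "feature", "refactor", "fix", "feature", "fix"]
candidates = {
    "model_a": ["fix", "feature", "fix", "fix", "feature", "fix"],
    "model_b": ["fix", "fix", "refactor", "fix", "feature", "feature"],
    "model_c": ["feature", "feature", "refactor", "fix", "fix", "fix"],
    "model_d": ["fix", "fix", "fix", "fix", "fix", "fix"],
}

best_names, best_score = select_combination(candidates, expected, size=3)
print(best_names, best_score)
```

In this toy data no single model is fully correct, yet the three models whose errors fall on different commits vote their way to the right label everywhere, which is the general motivation for combining heterogeneous classifiers.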
Details
- Language :
- English
- ISSN :
- 02181940
- Volume :
- 34
- Issue :
- 12
- Database :
- Complementary Index
- Journal :
- International Journal of Software Engineering & Knowledge Engineering
- Publication Type :
- Academic Journal
- Accession number :
- 181553471
- Full Text :
- https://doi.org/10.1142/S021819402450044X