
Boosting Commit Classification Based on Multivariate Mixed Features and Heterogeneous Classifier Selection.

Authors :
Wu, Yuhan
Li, Yingling
Wang, Ziao
Tan, Qushan
Liu, Jing
Jiang, Yuao
Source :
International Journal of Software Engineering & Knowledge Engineering; Dec 2024, Vol. 34 Issue 12, p1949-1970, 22p
Publication Year :
2024

Abstract

Commit classification plays a crucial role in software maintenance, as it permits developers to make informed decisions regarding resource allocation and code review. There are several approaches for automatic commit classification, yet they do not sufficiently explore the features related to commits, nor do they consider the advantages of ensemble models over individual models. Therefore, there is some room for improvement. In this paper, we propose MuheCC, a commit classification approach based on multivariate mixed features and heterogeneous classifier selection, to address these challenges. It mainly consists of three phases: (1) Multivariate mixed feature extraction, which extracts features from commit messages, changed code and handcrafted features to construct comprehensive mixed features; (2) Hyperparameter tuning based on a genetic algorithm, which utilizes the genetic algorithm to optimize candidate traditional models and ensemble models; (3) Heterogeneous classifier selection, which selects the optimal combinations of traditional and ensemble models, respectively, to build a heterogeneous classifier for commit classification. To evaluate this approach, we extend an existing dataset with code changes for each commit and compare MuheCC with three baselines on this real-world dataset. The results show that MuheCC outperforms all baselines, notably improving on the best baseline by 7.25% for accuracy, 6.88% for precision, 7.25% for recall and 7.06% for F1-score. Furthermore, the ablation experiments validate that the performance advantage of MuheCC is mainly attributed to the multivariate features (e.g. a 12.55% contribution to accuracy) and the heterogeneous classifier (e.g. a 12.26% contribution to accuracy). We further discuss the impact of hyperparameter tuning and heterogeneous classifier selection on the performance of MuheCC. These results prove the superiority and potential practical value of MuheCC. [ABSTRACT FROM AUTHOR]
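The abstract does not say how the "optimal combinations" in phase (3) are searched or how the selected models are combined into one classifier. The Python sketch below shows one plausible reading under stated assumptions. It exhaustively scores every non-empty subset of a traditional-model pool and of an ensemble-model pool by the accuracy of a plurality (majority) vote on validation predictions. It keeps the best subset from each pool, then lets all selected members vote together. The voting rule, the exhaustive search, the functions `majority_vote`, `best_subset` and `heterogeneous_classifier`, the model names, the three maintenance labels and the prediction lists are all hypothetical illustrations, not MuheCC's published implementation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sketch: voting-based subset selection is an assumption,
# not the method published for MuheCC.


def majority_vote(predictions):
    """Plurality vote over several models' label lists (one list per model).

    Ties go to the label seen first among the voters, because
    Counter.most_common orders equal counts by first occurrence.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy(predicted, truth):
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def best_subset(pool, truth):
    """Score every non-empty subset of a model pool by the accuracy of its
    majority vote; keep the first subset reaching the highest accuracy."""
    best_names, best_acc = (), -1.0
    names = list(pool)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            acc = accuracy(majority_vote([pool[n] for n in combo]), truth)
            if acc > best_acc:
                best_names, best_acc = combo, acc
    return best_names, best_acc


def heterogeneous_classifier(traditional, ensemble, truth):
    """Select the best traditional subset and the best ensemble subset
    separately, then let all selected members vote together."""
    trad_names, _ = best_subset(traditional, truth)
    ens_names, _ = best_subset(ensemble, truth)
    members = trad_names + ens_names
    pool = {**traditional, **ensemble}
    return members, majority_vote([pool[n] for n in members])


C, A, P = "corrective", "adaptive", "perfective"  # illustrative labels only
truth = [C, A, P, C, A, P]
traditional = {  # hypothetical validation predictions
    "svm":         [C, A, P, C, C, C],
    "naive_bayes": [C, C, P, C, A, A],
    "logreg":      [A, A, P, P, A, P],
}
ensemble = {  # hypothetical validation predictions
    "random_forest":     [C, A, C, C, A, P],
    "gradient_boosting": [C, A, P, A, A, P],
}
members, voted = heterogeneous_classifier(traditional, ensemble, truth)
print(members)                 # ('svm', 'naive_bayes', 'logreg', 'random_forest')
print(accuracy(voted, truth))  # 1.0
```

In this toy run, no single model exceeds 5/6 accuracy, but the mixed vote of the three traditional models and the random forest labels all six commits correctly. Because the subsets here are chosen on the same predictions they are scored on, a real evaluation would select on a validation split and report results on a separate test split. Exhaustive search is also exponential in pool size, so it is only practical for a handful of candidate models.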

Details

Language :
English
ISSN :
0218-1940
Volume :
34
Issue :
12
Database :
Complementary Index
Journal :
International Journal of Software Engineering & Knowledge Engineering
Publication Type :
Academic Journal
Accession number :
181553471
Full Text :
https://doi.org/10.1142/S021819402450044X