Software Fault Prediction Using Optimal Classifier Selection: An Ensemble Approach.

Authors :
Agrawalla, Bikash
Reddy, B Ramachandra
Source :
Procedia Computer Science; 2024, Vol. 235, p2965-2974, 10p
Publication Year :
2024

Abstract

Software fault prediction is the process of using data analysis and machine learning models to anticipate potential defects or faults in a software system. Relying only on base machine learning models for software fault prediction causes several problems: limited performance, difficulty handling non-linear relationships and imbalanced data, inadequate feature representation, and limited ability to handle complexity. To overcome these challenges, this paper proposes a new technique for selecting the classifiers that form a heterogeneous ensemble. The main goal is to remove, or trim, the classifiers that perform poorly compared with the other base classifiers, which can produce a more effective ensemble and better results. The proposed algorithm finds a set of classifiers that performs better than the full set of classifiers. The main challenge is identifying the poorly performing classifiers; this is addressed experimentally by trying different threshold values to choose the trimmed set of classifiers. The proposed model is evaluated on 8 benchmark software fault datasets taken from the PROMISE and Apache repositories, with AUC as the performance measure. The experimental results demonstrate the effectiveness of the algorithm compared with the traditional approach, which uses all the base classifiers. There is a significant increase in the AUC values for 6 of the 8 datasets; with the average of probabilities and majority voting, an improvement is seen in 7 of the 8 datasets. With the average of probabilities, the best-performing dataset is ARC, where the AUC increases from 0.6505 to 0.694; with majority voting, the best-performing dataset is XALAN, where the AUC increases from 0.5455 to 0.679.
These results show that the proposed ensemble approach achieves higher AUC values on the tested datasets than the base machine learning classifiers. [ABSTRACT FROM AUTHOR]
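The abstract describes three steps: trim the base classifiers that score poorly relative to the others using a threshold, combine the survivors by averaging their probabilities or by majority voting, and compare ensembles by AUC. The abstract does not give the exact trimming rule, the base classifiers, or the threshold values, so the Python sketch below is only an illustration under stated assumptions. The rule "keep a classifier whose validation AUC is within `threshold` of the best one", the function names, the 0.5 voting cutoff, and the toy scores are all hypothetical and are not taken from the paper. AUC is computed with the standard Mann-Whitney formulation.

```python
from typing import Dict, List, Sequence

Probs = Dict[str, List[float]]  # classifier name -> predicted fault probabilities


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic: the probability that a random
    faulty module (label 1) scores above a random clean one (label 0); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one faulty and one clean module")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def trim_classifiers(val_probs: Probs, val_labels: Sequence[int], threshold: float) -> List[str]:
    """HYPOTHETICAL trimming rule: keep a classifier only if its validation AUC
    is at least (best base-classifier AUC - threshold)."""
    scores = {name: auc(val_labels, p) for name, p in val_probs.items()}
    best = max(scores.values())
    return [name for name, s in scores.items() if s >= best - threshold]


def average_of_probabilities(probs: Probs, members: List[str]) -> List[float]:
    """Soft combination: mean fault probability over the selected classifiers."""
    if not members:
        raise ValueError("ensemble needs at least one member")
    n = len(probs[members[0]])
    return [sum(probs[m][i] for m in members) / len(members) for i in range(n)]


def majority_voting(probs: Probs, members: List[str], cutoff: float = 0.5) -> List[int]:
    """Hard combination: each classifier votes 'faulty' when its probability is
    >= cutoff (assumed 0.5); a module is faulty only with a strict majority,
    so ties (possible with an even number of members) go to 'clean'."""
    if not members:
        raise ValueError("ensemble needs at least one member")
    n = len(probs[members[0]])
    labels = []
    for i in range(n):
        votes = sum(1 for m in members if probs[m][i] >= cutoff)
        labels.append(1 if 2 * votes > len(members) else 0)
    return labels


if __name__ == "__main__":
    # Toy data invented for illustration; it is NOT one of the paper's datasets.
    y = [1, 0, 1, 0, 1, 0]
    probs = {
        "clf_a": [0.9, 0.2, 0.8, 0.3, 0.6, 0.55],
        "clf_b": [0.7, 0.4, 0.6, 0.5, 0.45, 0.3],
        "clf_c": [0.2, 0.9, 0.3, 0.8, 0.1, 0.9],  # deliberately poor classifier
    }
    full = list(probs)
    kept = trim_classifiers(probs, y, threshold=0.2)  # -> ['clf_a', 'clf_b']
    for name, members in (("all", full), ("trimmed", kept)):
        avg_auc = auc(y, average_of_probabilities(probs, members))
        vote_auc = auc(y, majority_voting(probs, members))
        print(name, round(avg_auc, 3), round(vote_auc, 3))
```

On this toy data, dropping `clf_c` raises the averaged-probability AUC from about 0.556 to 1.0 and the majority-vote AUC from 0.5 to about 0.833. For brevity the example trims and evaluates on the same labels; a real experiment would choose the trimmed set on validation data and report AUC on a separate test split.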

Details

Language :
English
ISSN :
1877-0509
Volume :
235
Database :
Supplemental Index
Journal :
Procedia Computer Science
Publication Type :
Academic Journal
Accession number :
177603860
Full Text :
https://doi.org/10.1016/j.procs.2024.04.280