
A novel attribute reduction method with constraints on empirical risk and decision rule length.

Authors :
Zhang, Xiaoxia
Zhang, Penghao
Liu, Yanjun
Wang, Guoyin
Source :
Information Sciences. Jun2024, Vol. 670, pN.PAG-N.PAG. 1p.
Publication Year :
2024

Abstract

Attribute reduction is a central problem in rough set theory and has been applied in various fields. It aims to remove useless or redundant features from data and to extract precise rules. Traditional attribute reduction methods have an obvious limitation: they guarantee classification performance only on visible objects, not the generalization ability of the reduced decision rules. To overcome this issue and improve classification generalization without increasing rule complexity, we propose a novel attribute reduction method based on the Structural Risk Minimization principle, a classic machine learning approach to balancing model complexity and performance. The improved rough set attribute reduction method seeks a trade-off between empirical error and the number of features, wherein the former is defined by rule confidence and the latter is explored through the mutual information between the condition attribute subset and the decision attribute. In other words, model accuracy serves as an empirical error term that measures the model's classification or prediction performance, and model complexity serves as a penalty term that characterizes the size of the reduct. To implement this approach, a genetic algorithm is employed as a heuristic search technique to obtain the optimal reduct. Several comparative experiments are performed on ten UCI datasets to evaluate the model's classification accuracy and the size of the reduct. The experimental results indicate that the proposed method has better generalization ability than other traditional algorithms when the reducts are of equal length.
• We propose a new attribute reduction method that considers model accuracy and complexity simultaneously.
• We use rule confidence and mutual information to define the model's accuracy and complexity, respectively.
• We employ a genetic algorithm as a heuristic search technique to obtain the optimal reduct. [ABSTRACT FROM AUTHOR]
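
The abstract does not give the exact formulas, so the following is only a minimal sketch of the kind of objective it describes, under stated assumptions. The empirical error is assumed to be one minus the weighted confidence of the majority rule in each equivalence class, the penalty is assumed to be the fraction of attributes kept, and `lam` is a hypothetical trade-off weight. Exhaustive enumeration stands in for the genetic algorithm so the example stays short; all function names and the toy decision table are illustrative and do not come from the paper.

```python
from collections import Counter
from itertools import combinations
from math import log2


def blocks(rows, subset, decisions):
    """Group decision values by the equivalence class each row falls into
    when only the attributes in `subset` are observed."""
    grouped = {}
    for row, d in zip(rows, decisions):
        grouped.setdefault(tuple(row[i] for i in subset), []).append(d)
    return grouped


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def mutual_information(rows, subset, decisions):
    """I(B; D) = H(D) - H(D | B) for condition attribute subset B."""
    n = len(rows)
    cond = sum(len(ds) / n * entropy(ds)
               for ds in blocks(rows, subset, decisions).values())
    return entropy(decisions) - cond


def empirical_error(rows, subset, decisions):
    """Assumed error term: share of objects not covered by the
    highest-confidence (majority) rule of their equivalence class."""
    n = len(rows)
    covered = sum(max(Counter(ds).values())
                  for ds in blocks(rows, subset, decisions).values())
    return 1 - covered / n


def objective(rows, subset, decisions, lam=0.1):
    """Assumed SRM-style objective: empirical error plus a size penalty."""
    return empirical_error(rows, subset, decisions) + lam * len(subset) / len(rows[0])


def best_subset(rows, decisions, lam=0.1):
    """Exhaustive search over non-empty subsets (stand-in for the GA)."""
    n_attrs = len(rows[0])
    candidates = (s for k in range(1, n_attrs + 1)
                  for s in combinations(range(n_attrs), k))
    return min(candidates, key=lambda s: objective(rows, s, decisions, lam))


# Toy decision table (hypothetical): the decision equals attribute 0.
rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
decisions = [0, 0, 1, 1]
print(best_subset(rows, decisions))
print(mutual_information(rows, (0,), decisions))
```

On this toy table both attribute 0 alone and the pair of attributes 1 and 2 classify every object correctly, and the size penalty is what makes the search prefer the shorter subset. Exhaustive search is only feasible for a handful of attributes, which is presumably why a heuristic such as a genetic algorithm is used in practice.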

Details

Language :
English
ISSN :
0020-0255
Volume :
670
Database :
Academic Search Index
Journal :
Information Sciences
Publication Type :
Periodical
Accession number :
177026781
Full Text :
https://doi.org/10.1016/j.ins.2024.120552