Reconstructing the training data set based on reducing boundary complexity.

Authors :: Ghaffari, Hamidreza; Rafeie, Farzaneh
Source :: Computing; Apr2024, Vol. 106 Issue 4, p1099-1119, 21p
Publication Year :: 2024
Abstract: One of the most important tasks in classification is to design a high-precision classifier with short training and testing times. The support vector machine is an efficient and well-known classifier; however, it has performance problems on large data sets. Another major problem with most data sets is high boundary complexity among classes, which reduces generalizability. To address these problems, a novel strategy for large-scale data classification is proposed. A simple and useful method is presented to reconstruct the training data set, with the goal of speeding up the classifier with minimal impact on performance. The method has three phases: first, appropriate representatives are selected for the data; second, the data that cause boundary complexity are removed; third, classification is performed on the new data set. The proposed method is tested on 23 data sets and compared against five of the most successful instance-based condensation algorithms. Experiments show that, despite its simplicity, the proposed method performs better than the other methods presented in the research literature. [ABSTRACT FROM AUTHOR]
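The three phases described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how representatives are chosen or how boundary-complicating data are detected, so every concrete choice here is an assumption made for illustration only: representatives are per-class means of grid cells (hypothetical parameter `cell`), a representative is dropped when its nearest other representative carries a different label (an edited-nearest-neighbour style rule), and a 1-nearest-neighbour classifier stands in for the final classifier in place of a support vector machine. All function names are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict

def select_representatives(X, y, cell=2.0):
    """Phase 1 (assumed rule): one representative per class per grid cell,
    computed as the mean of the points falling in that cell."""
    reps = []
    for label in sorted(set(y)):
        cells = defaultdict(list)
        for x, t in zip(X, y):
            if t == label:
                key = tuple(math.floor(v / cell) for v in x)
                cells[key].append(x)
        for pts in cells.values():
            mean = tuple(sum(c) / len(pts) for c in zip(*pts))
            reps.append((mean, label))
    return reps

def remove_boundary_points(reps):
    """Phase 2 (assumed rule): drop a representative when its nearest
    other representative belongs to a different class."""
    kept = []
    for i, (p, label) in enumerate(reps):
        others = [r for j, r in enumerate(reps) if j != i]
        if not others:
            kept.append((p, label))  # nothing to compare against
            continue
        nearest = min(others, key=lambda r: math.dist(p, r[0]))
        if nearest[1] == label:
            kept.append((p, label))
    return kept

def predict(train, x):
    """Phase 3 (stand-in): 1-nearest-neighbour on the reduced set.
    The paper's setting uses a classifier such as an SVM instead."""
    return min(train, key=lambda r: math.dist(x, r[0]))[1]

# Toy data: class 0 in the lower left, class 1 in the upper right,
# plus one class-0 point, (4.5, 7.9), lying next to class 1.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (3, 1), (4.5, 7.9),
     (6, 6), (6, 7), (7, 6), (7, 7), (8, 6), (8, 7)]
y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

reps = select_representatives(X, y)   # 13 points condensed to representatives
kept = remove_boundary_points(reps)   # stray class-0 representative dropped
print(len(X), len(reps), len(kept))
print(predict(kept, (5.0, 7.5)))
```

On this toy data, the stray class-0 point survives phase 1 as its own representative and is removed in phase 2, so a query near it is assigned to the surrounding class. The grid size and the removal rule would need tuning on real data, and neither should be read as the authors' procedure.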