Reconstructing the training data set based on reducing boundary complexity.

Authors :
Ghaffari, Hamidreza
Rafeie, Farzaneh
Source :
Computing; Apr2024, Vol. 106 Issue 4, p1099-1119, 21p
Publication Year :
2024

Abstract

One of the most important tasks in classification is to design a high-precision classifier with short training and testing times. The support vector machine is an efficient and well-known classifier, but its performance suffers on large data sets. Another major problem with most data sets is high boundary complexity among classes, which reduces generalizability. To address these problems, a novel strategy for large-scale data classification is proposed: a simple and useful method for reconstructing the training data set. The goal is to speed up the classifier with minimal impact on performance. The method has three phases: in the first, appropriate representatives are selected for the data; in the second, the data causing boundary complexity are removed; in the third, classification is performed on the new data set. The proposed method is tested on 23 data sets and compared against five of the most successful instance-based condensation algorithms. Experiments showed that, despite its simplicity, the proposed method performs better than the other methods presented in the research literature. [ABSTRACT FROM AUTHOR]
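
The three phases named in the abstract can be illustrated with a minimal sketch. The abstract does not say how representatives are chosen or how boundary-complexity points are detected, so everything below is an assumption made for illustration and not the authors' algorithm: per-class k-means centroids stand in for the representatives, a representative is dropped when its nearest other representative carries a different label, and a 1-nearest-neighbour rule is swapped in for the final classifier in place of a support vector machine. All function names and parameters are hypothetical.

```python
import math
import random


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the centroids of the non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, min(k, len(points)))
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[j].append(p)
        centers = [mean(c) for c in clusters if c]  # drop empty clusters
    return centers


def select_representatives(X, y, k_per_class):
    """Phase 1 (assumed): replace each class by up to k_per_class centroids."""
    reps = []
    for label in sorted(set(y)):
        members = [x for x, t in zip(X, y) if t == label]
        reps += [(c, label) for c in kmeans(members, k_per_class)]
    return reps


def remove_boundary_points(reps):
    """Phase 2 (assumed): drop representatives whose nearest other
    representative belongs to a different class."""
    if len(reps) < 2:
        return list(reps)
    kept = []
    for i, (p, label) in enumerate(reps):
        others = [r for j, r in enumerate(reps) if j != i]
        nearest = min(others, key=lambda r: math.dist(p, r[0]))
        if nearest[1] == label:
            kept.append((p, label))
    return kept


def predict(reps, x):
    """Phase 3 (swapped-in classifier): 1-nearest-neighbour on the reduced set."""
    return min(reps, key=lambda r: math.dist(x, r[0]))[1]


if __name__ == "__main__":
    # Synthetic two-class data, invented purely for this demonstration.
    rng = random.Random(1)
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
    X += [(rng.gauss(6, 1), rng.gauss(6, 1)) for _ in range(100)]
    y = [0] * 100 + [1] * 100
    reduced = remove_boundary_points(select_representatives(X, y, 10))
    print(len(X), "training points reduced to", len(reduced))
    print("label for (0, 0):", predict(reduced, (0.0, 0.0)))
```

The reduced set returned by `remove_boundary_points` could be passed to any classifier in the third phase; the nearest-neighbour rule is used here only to keep the sketch free of external dependencies.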

Subjects

Subjects :
BIG data
SUPPORT vector machines

Details

Language :
English
ISSN :
0010-485X
Volume :
106
Issue :
4
Database :
Complementary Index
Journal :
Computing
Publication Type :
Academic Journal
Accession number :
176250337
Full Text :
https://doi.org/10.1007/s00607-022-01124-y