Reconstructing the training data set based on reducing boundary complexity.
- Source :
- Computing; Apr2024, Vol. 106 Issue 4, p1099-1119, 21p
- Publication Year :
- 2024
Abstract
- One of the most important tasks in classification is to design a high-precision classifier with short training and testing times. The support vector machine is an efficient and well-known classifier; on large data sets, however, it suffers from performance problems. Another major problem with most data sets is high boundary complexity among classes, which reduces generalizability. To address these problems, a novel strategy for large-scale data classification is proposed: a simple method that reconstructs the training data set, with the goal of speeding up the classifier with minimal impact on performance. The method has three phases: in the first, appropriate representatives are selected for the data; in the second, the data that cause boundary complexity are removed; in the third, classification is performed on the new data set. The proposed method is tested on 23 data sets and compared against five of the most successful instance-based condensation algorithms. Experiments showed that, despite its simplicity, the proposed method performs better than the other methods presented in the research literature. [ABSTRACT FROM AUTHOR]
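The three phases named in the abstract can be illustrated with a minimal sketch. The abstract does not say how representatives are chosen, how boundary-complexity data are identified, or which settings are used, so everything below is an assumption made for illustration: phase 1 uses a few rounds of per-class k-means (centroids as representatives), phase 2 drops any representative whose nearest other representative carries a different label (a nearest-neighbour editing rule), and phase 3 uses a 1-nearest-neighbour rule as a lightweight stand-in for the final classifier, where a support vector machine could be trained on the reduced set instead. The function names and the toy data are hypothetical and are not the authors' method.

```python
import math
import random

def select_representatives(X, y, per_class, seed=0, rounds=10):
    """Phase 1 (assumed): per-class k-means; centroids serve as representatives."""
    rng = random.Random(seed)
    reps, labels = [], []
    for c in sorted(set(y)):
        pts = [x for x, lab in zip(X, y) if lab == c]
        k = min(per_class, len(pts))
        centers = rng.sample(pts, k)
        for _ in range(rounds):
            groups = [[] for _ in range(k)]
            for p in pts:
                nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
                groups[nearest].append(p)
            # Recompute each centroid; keep the old one if its group is empty.
            centers = [
                tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
                for i, g in enumerate(groups)
            ]
        reps.extend(centers)
        labels.extend([c] * k)
    return reps, labels

def remove_boundary_points(reps, labels):
    """Phase 2 (assumed): keep a representative only if its nearest
    other representative has the same label."""
    keep = []
    for i, p in enumerate(reps):
        others = [j for j in range(len(reps)) if j != i]
        if not others:
            keep.append(i)
            continue
        nearest = min(others, key=lambda j: math.dist(p, reps[j]))
        if labels[nearest] == labels[i]:
            keep.append(i)
    return [reps[i] for i in keep], [labels[i] for i in keep]

def predict(train_X, train_y, x):
    """Phase 3 (stand-in): 1-nearest-neighbour on the reduced set."""
    nearest = min(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))
    return train_y[nearest]

# Toy data: two partly overlapping 2-D Gaussian classes (hypothetical).
rng = random.Random(1)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
X += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(100)]
y = [0] * 100 + [1] * 100

reps, rep_labels = select_representatives(X, y, per_class=8)
Xr, yr = remove_boundary_points(reps, rep_labels)
print(len(X), "training points ->", len(reps), "representatives ->", len(Xr), "kept")
print(predict(Xr, yr, (-3.0, -3.0)), predict(Xr, yr, (6.0, 6.0)))
```

The sketch only shows the shape of the pipeline: the classifier in the last phase sees far fewer points than the original training set, which is where a speed-up would come from. How much accuracy is retained depends on the selection and removal rules, which here are placeholders.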
- Subjects :
- BIG data
- SUPPORT vector machines
Details
- Language :
- English
- ISSN :
- 0010485X
- Volume :
- 106
- Issue :
- 4
- Database :
- Complementary Index
- Journal :
- Computing
- Publication Type :
- Academic Journal
- Accession number :
- 176250337
- Full Text :
- https://doi.org/10.1007/s00607-022-01124-y