Distributed independent vector machine for big data classification problems.
- Author
- Almaspoor, Mohammad Hassan; Safaei, Ali A.; Salajegheh, Afshin; Minaei-Bidgoli, Behrouz
- Subjects
- BIG data, PATTERN recognition systems, MNEMONICS, CLASSIFICATION algorithms, CLASSIFICATION
- Abstract
In recent years, various studies have been conducted on SVMs and their applications, and they have been developed significantly in many areas. SVM is one of the most robust classification and regression algorithms and plays a significant role in pattern recognition. However, SVM has not been developed as far in some areas, such as large-scale datasets, unbalanced datasets, and multiclass classification. Efficient SVM training on large-scale datasets is of great importance in the big data era: as the number of samples increases, the time and memory required to train an SVM grow, making it impractical even for medium-sized problems. With the emergence of big data, this problem becomes more pronounced. This paper presents a novel distributed method for SVM training in which only a very small subset of the training samples is used for classification, reducing the problem size and thus the required memory and computational resources; the solution of this reduced problem almost converges to that of standard SVM. The method comprises three steps: first, detecting a subset of the distributed training samples; second, creating local SVM models and obtaining partial vectors; and finally, combining the partial vectors to obtain the global vector and the final model. In addition, for datasets that suffer from an unbalanced number of samples and are biased toward the majority class, the proposed method balances the samples of the two classes, so it can also be applied to unbalanced datasets. The empirical results show that this method is efficient for large-scale problems. [ABSTRACT FROM AUTHOR]
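The three-step scheme in the abstract (train local SVM models on data partitions, extract partial weight vectors, combine them into a global model) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes linear SVMs, stands in a Pegasos-style subgradient solver for the local training step, skips the paper's sample-subset-detection step by taking the partitions as given, and combines partial vectors by simple averaging, which may differ from the paper's combination rule. The function names `train_linear_svm` and `distributed_svm` are hypothetical.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Local training step (step 2): fit a linear SVM on one partition
    with Pegasos-style stochastic subgradient descent. Labels y must be
    in {-1, +1}. Returns the partition's partial weight vector."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            if y[i] * (X[i] @ w) < 1:
                # margin violated: shrink w and step toward the sample
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:
                # margin satisfied: only apply regularization shrinkage
                w = (1 - eta * lam) * w
    return w

def distributed_svm(partitions, lam=0.01):
    """Steps 2-3: train one local model per partition, then combine the
    partial vectors into a global weight vector (here by averaging;
    an assumption, since the paper's exact rule is not given)."""
    partial = [train_linear_svm(X, y, lam) for X, y in partitions]
    return np.mean(partial, axis=0)

# Usage: two well-separated Gaussian classes, split across two partitions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 1.0, (100, 2)),
               rng.normal(-2.0, 1.0, (100, 2))])
y = np.array([1] * 100 + [-1] * 100)
parts = [(X[::2], y[::2]), (X[1::2], y[1::2])]
w_global = distributed_svm(parts)
accuracy = np.mean(np.sign(X @ w_global) == y)
```

Because each partition is solved independently, only the d-dimensional partial vectors travel between nodes, which is what keeps the memory and communication footprint small relative to training one SVM on all samples.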
- Published
- 2024