1. Fast semi-supervised self-training algorithm based on data editing.
- Author
- Li, Bing; Wang, Jikui; Yang, Zhengguo; Yi, Jihai; Nie, Feiping
- Subjects
- DATA editing; GENOME editing; SUPERVISED learning; MACHINE learning; ALGORITHMS
- Abstract
Self-training is a commonly used semi-supervised learning framework, and selecting high-confidence samples is a crucial step for algorithms built on it. To alleviate the impact of noisy data, researchers have proposed many data editing methods that improve the selection quality of high-confidence samples. However, state-of-the-art data editing methods have high time complexity, no less than O(n²), where n denotes the number of samples. To improve training speed while preserving the quality of the selected high-confidence samples, and inspired by the Ball-k-means algorithm, we propose a fast semi-supervised self-training algorithm based on data editing (EBSA), which defines ball-cluster partition and editing to improve the quality of high-confidence samples. The time complexity of EBSA is O(t²kn + n log n + n + k²), where k denotes the number of centers and t the number of iterations. Since k is far less than n, EBSA has linear time complexity with respect to n. Extensive experiments on 20 benchmark data sets show that the proposed algorithm not only runs faster but also achieves better classification performance than the comparison algorithms. [ABSTRACT FROM AUTHOR]
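As a rough illustration of the generic self-training framework the abstract builds on (not the authors' EBSA procedure), the sketch below iteratively pseudo-labels high-confidence unlabeled samples with a base classifier. The classifier choice, the fixed confidence threshold, and all names are assumptions made for illustration only.

```python
# Minimal self-training sketch (illustrative; not the EBSA algorithm from the paper).
# Assumed: a scikit-learn style base classifier and a fixed confidence threshold.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_training(X_labeled, y_labeled, X_unlabeled, threshold=0.95, max_iter=10):
    """Iteratively add high-confidence pseudo-labeled samples to the labeled set."""
    X_l, y_l = X_labeled.copy(), y_labeled.copy()
    X_u = X_unlabeled.copy()
    clf = LogisticRegression(max_iter=1000)
    for _ in range(max_iter):
        if len(X_u) == 0:
            break
        clf.fit(X_l, y_l)
        proba = clf.predict_proba(X_u)       # class probabilities for the unlabeled pool
        conf = proba.max(axis=1)             # confidence of each predicted class
        mask = conf >= threshold             # keep only high-confidence samples
        if not mask.any():
            break                            # nothing confident enough; stop early
        pseudo = proba[mask].argmax(axis=1)  # pseudo-label indices for selected samples
        X_l = np.vstack([X_l, X_u[mask]])
        y_l = np.concatenate([y_l, clf.classes_[pseudo]])
        X_u = X_u[~mask]                     # shrink the unlabeled pool
    clf.fit(X_l, y_l)                        # final model on the enlarged labeled set
    return clf
```

In the paper's setting, a data editing step such as the ball-cluster partition and editing described in the abstract would filter the selected samples before they are added to the labeled set; the point of EBSA is to make that editing step run in time linear in n.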
- Published
- 2023