
Staleness-Reduction Mini-Batch K-Means

Authors :
Zhu, Xueying
Sun, Jie
He, Zhenhao
Jiang, Jiantong
Wang, Zeke
Source :
IEEE Transactions on Neural Networks and Learning Systems; October 2024, Vol. 35, Issue 10, pp. 14424-14436, 13 pp.
Publication Year :
2024

Abstract

K-means (km) is a clustering algorithm that has been widely adopted due to its simple implementation and high clustering quality. However, the standard km suffers from high computational complexity and is therefore time-consuming. Accordingly, the mini-batch (mbatch) km was proposed to significantly reduce computational costs by updating centroids after performing distance computations on just a mini-batch, rather than a full batch, of samples. Even though the mbatch km converges faster, it yields lower convergence quality because it introduces staleness during iterations. To this end, in this article, we propose the staleness-reduction mbatch (srmbatch) km, which achieves the best of both worlds: low computational costs like the mbatch km and high clustering quality like the standard km. Moreover, srmbatch still exposes massive parallelism that can be efficiently exploited on multicore CPUs and many-core GPUs. The experimental results show that srmbatch converges up to 40x-130x faster than mbatch when reaching the same target loss, and srmbatch reaches a 0.2%-1.7% lower final loss than mbatch.
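For context, the following is a minimal sketch of the baseline mini-batch k-means update the abstract describes (in the style of Sculley's web-scale k-means), not the authors' srmbatch algorithm; the function name and parameters are illustrative, and the per-centroid learning rate is one common choice. It shows where staleness arises: centroids not touched by a given mini-batch keep their old positions.

```python
import numpy as np

def mini_batch_kmeans(X, k, batch_size=256, n_iters=100, seed=0):
    """Baseline mini-batch k-means sketch (not srmbatch).

    Each iteration assigns only a random mini-batch to its nearest
    centroids, then nudges those centroids with a decaying per-centroid
    step size 1/count. Centroids missed by the batch stay stale, which
    is the staleness the paper aims to reduce.
    """
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)].copy()
    counts = np.zeros(k)  # how many samples each centroid has absorbed

    for _ in range(n_iters):
        batch = X[rng.choice(len(X), batch_size, replace=False)]
        # Distance computation on the mini-batch only, not the full set.
        d2 = ((batch[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)  # nearest centroid per batch sample
        for i, c in enumerate(assign):
            counts[c] += 1
            eta = 1.0 / counts[c]  # decaying per-centroid learning rate
            centroids[c] += eta * (batch[i] - centroids[c])
    return centroids

# Usage example on synthetic data:
# centroids = mini_batch_kmeans(np.random.rand(10000, 8), k=16)
```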

Details

Language :
English
ISSN :
2162-237X and 2162-2388
Volume :
35
Issue :
10
Database :
Supplemental Index
Journal :
IEEE Transactions on Neural Networks and Learning Systems
Publication Type :
Periodical
Accession number :
ejs67665938
Full Text :
https://doi.org/10.1109/TNNLS.2023.3279122