Big-Data Clustering: K-Means or K-Indicators?
- Publication Year :
- 2019
Abstract
- The K-means algorithm is arguably the most popular data clustering method, commonly applied to processed datasets in some "feature spaces", as in spectral clustering. However, K-means is highly sensitive to initialization and encounters a scalability bottleneck with respect to the number of clusters K as this number grows in big data applications. In this work, we promote a closely related model called the K-indicators model and construct an efficient, semi-convex-relaxation algorithm that requires no randomized initializations. We present extensive empirical results to show the advantages of the new algorithm when K is large. In particular, using the new algorithm to start the K-means algorithm, without any replication, can significantly outperform the standard K-means with a large number of currently state-of-the-art random replications.
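The two K-means workflows contrasted in the abstract can be illustrated with a minimal sketch: (a) many randomly initialized replications, keeping the best result, and (b) a single run started from centers supplied by some deterministic method. This is not the K-indicators algorithm, which the record does not specify. The helper names (`lloyd`, `kmeans_random_replications`), the toy data, and the hand-picked `warm_start` centers are all hypothetical stand-ins chosen for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, centers, max_iter=100):
    """Run Lloyd's K-means iterations from the given starting centers.

    Returns (final_centers, objective), where the objective is the sum of
    squared distances from each point to its nearest center.
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(x) / len(members) for x in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(old)  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    objective = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, objective


def kmeans_random_replications(points, k, n_rep, seed=0):
    """Best of n_rep Lloyd runs, each started from k randomly sampled points."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_rep):
        init = rng.sample(points, k)
        result = lloyd(points, init)
        if best is None or result[1] < best[1]:
            best = result
    return best


# Toy data: three well-separated groups in the plane (illustrative only).
points = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
    (5.0, 5.0), (5.1, 5.2), (5.2, 5.1),
    (10.0, 0.0), (10.1, 0.2), (10.2, 0.1),
]

# (a) Random replications: one run versus the best of twenty.
_, single_obj = kmeans_random_replications(points, 3, n_rep=1)
_, best_obj = kmeans_random_replications(points, 3, n_rep=20)

# (b) One run from supplied centers. These are hand-picked here; in the
# setting the abstract describes they would come from a deterministic
# initializer instead.
warm_start = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
warm_centers, warm_obj = lloyd(points, warm_start)

print(single_obj, best_obj, warm_obj)
```

The cost of workflow (a) grows with the number of replications, while workflow (b) runs Lloyd's iterations once; whether (b) matches or beats (a) depends entirely on the quality of the supplied starting centers.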
- Subjects :
- FOS: Computer and information sciences
Computer Science - Machine Learning
Optimization and Control (math.OC)
Statistics - Machine Learning
Computer Vision and Pattern Recognition (cs.CV)
FOS: Mathematics
Computer Science - Computer Vision and Pattern Recognition
Machine Learning (stat.ML)
Mathematics - Optimization and Control
Machine Learning (cs.LG)
Details
- Language :
- English
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....48bb786ced4ba4031c9b30b7573f2a3f