201. Reuse-centric k-means configuration
- Author
- Hui Guan, Xipeng Shen, Hamid Krim, Lijun Zhang, and Yufei Ding
- Subjects
- Set (abstract data type), Speedup, Computer engineering, Hardware and Architecture, Computer science, Computation, Data classification, k-means clustering, Process (computing), Feature (machine learning), Reuse, Software, Information Systems
- Abstract
- K-means configuration is the task of finding a configuration of k-means (e.g., the number of clusters or the feature set) that maximizes some objective. It is a time-consuming process due to the iterative nature of k-means. This paper proposes reuse-centric k-means configuration to accelerate it. The approach is based on the observation that the explorations of different configurations share many common or similar computations. Effectively reusing the computations from prior trials of different configurations can substantially shorten the configuration time. To materialize the idea, the paper presents a set of novel techniques, including reuse-based filtering, center reuse, and a two-phase design, to capitalize on the reuse opportunities at three levels: validation, number of clusters, and feature sets. Experiments on k-means–based data classification tasks show that reuse-centric k-means configuration can speed up a heuristic search-based configuration process by a factor of 5.8, and a uniform search-based attainment of classification error surfaces by a factor of 9.1. The paper also provides some important insights on how to apply the acceleration techniques effectively to realize their full potential.
- Published
- 2021