Distributed Density Peaks Clustering Revisited.

Authors :: Lu, Jing
Zhao, Yuhai
Tan, Kian-Lee
Wang, Zhengkui
Source :: IEEE Transactions on Knowledge & Data Engineering; Aug2022, Vol. 34 Issue 8, p3714-3726, 13p
Publication Year :: 2022
Abstract: Density Peaks (DP) clustering organizes data into clusters by finding peaks in dense regions. This involves computing the density ($\rho$) and distance ($\delta$) of every point. As such, although DP has been very effective in producing high-quality clusters, its complexity is $O(N^2)$, where $N$ is the number of data points. In this paper, we propose a fast distributed density peaks clustering algorithm, FDDP, based on the z-value index. In FDDP, we first employ the z-value index to map multi-dimensional data points into one-dimensional space, and then range-partition the data according to the z-value to balance the load across the processing nodes. We ensure minimal overlapping range to handle computations at the boundary points. We also propose FC, an efficient algorithm that employs a forward computing strategy to calculate $\rho$ linearly. Additionally, we propose another algorithm, CB, which uses a caching and efficient searching strategy to compute $\delta$. Moreover, FDDP is able to reduce the time complexity from $O(N^2)$ to $O(N \cdot \log(N))$. We provide a theoretical analysis of FDDP and evaluate it empirically. Our experimental results show that FDDP outperforms the state-of-the-art algorithms significantly. [ABSTRACT FROM AUTHOR]
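
Illustrative note: the z-value mapping and range partitioning mentioned in the abstract can be sketched roughly as follows. This is a minimal single-machine sketch and not the authors' FDDP implementation. It assumes non-negative integer coordinates, and the names `z_value`, `range_partition`, and `bits` are hypothetical. The overlapping boundary ranges and the FC and CB algorithms are not shown.

```python
def z_value(coords, bits=8):
    """Map a point to one dimension by interleaving coordinate bits.

    Assumes each coordinate is a non-negative integer below 2**bits.
    """
    z = 0
    dims = len(coords)
    for bit in range(bits):
        for d, c in enumerate(coords):
            # Bit `bit` of dimension `d` lands at position bit * dims + d.
            z |= ((c >> bit) & 1) << (bit * dims + d)
    return z


def range_partition(points, num_parts, bits=8):
    """Sort points by z-value and cut them into contiguous, near-equal ranges."""
    ordered = sorted(points, key=lambda p: z_value(p, bits))
    size, extra = divmod(len(ordered), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        # The first `extra` partitions take one additional point each.
        end = start + size + (1 if i < extra else 0)
        parts.append(ordered[start:end])
        start = end
    return parts


if __name__ == "__main__":
    pts = [(3, 3), (0, 0), (1, 1), (0, 1)]
    for part in range_partition(pts, 2):
        print(part)
```

Cutting the sorted list into equal-sized contiguous ranges is one simple way to balance load; points that are close in z-value are often, though not always, close in the original space, which is why some overlap at partition boundaries is needed in practice.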