1. Cauchy Kernel-Based Density Peaks Clustering Algorithm for Categorical Data.
- Author
SHENG Jinchao, DU Mingjing, LI Yurui, and SUN Jiarui
- Subjects
DISTRIBUTION (Probability theory), PROBABILITY measures, DENSITY, ALGORITHMS, QUADRATIC assignment problem
- Abstract
The density peak clustering algorithm struggles to produce good clustering results on categorical data. This article traces the difficulty to two causes: the overlap problem in distance calculation and the aggregation problem in density calculation. To address both, it proposes a density peak clustering algorithm for categorical data, referred to as CDPCD. The algorithm identifies the ordinal feature (the order relationship between attribute values of categorical data), which is rarely considered in existing distance metrics for categorical data, and proposes a weighted ordered distance measure based on probability distribution to alleviate the overlap problem. Building on a shared nearest neighbor density peak clustering algorithm with improved density calculation and quadratic assignment, it re-evaluates the data density values using the Cauchy kernel function, which enhances the density diversity and reduces the impact of the aggregation problem. Experimental results on several real datasets show that CDPCD achieves better clustering results than traditional partition-based and density-based clustering algorithms. [ABSTRACT FROM AUTHOR]
- Published
2022
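The abstract's central idea, scoring each point's density with a Cauchy kernel before looking for density peaks, can be illustrated with a minimal sketch. This is not the CDPCD method itself: it assumes a plain Hamming (mismatch-fraction) distance in place of the article's weighted ordered distance, omits the shared nearest neighbor step and the assignment stage, and the function names and the `bandwidth` parameter are hypothetical choices made for illustration.

```python
import numpy as np

def hamming_distances(X):
    """Pairwise fraction of mismatching attributes (assumed stand-in distance)."""
    X = np.asarray(X)
    # Compare every row with every other row, attribute by attribute.
    return (X[:, None, :] != X[None, :, :]).mean(axis=2)

def cauchy_density(D, bandwidth=0.5):
    """Density of each point as a sum of Cauchy kernel values over the others.

    `bandwidth` is an assumed scale parameter, not a value from the article.
    """
    K = 1.0 / (1.0 + (D / bandwidth) ** 2)
    np.fill_diagonal(K, 0.0)  # a point does not contribute to its own density
    return K.sum(axis=1)

def nearest_denser_distance(D, rho):
    """Standard density-peaks delta: distance to the nearest denser point."""
    delta = np.empty(len(rho))
    for i in range(len(rho)):
        denser = rho > rho[i]
        # The densest point has no denser neighbor; use its largest distance.
        delta[i] = D[i, denser].min() if denser.any() else D[i].max()
    return delta
```

Points with both high density and a large distance to any denser point are the candidate cluster centers. Because the Cauchy kernel decays slowly, distant points still contribute small, distinct amounts to the density, which is one plausible reading of how it helps when categorical distances take only a few discrete values.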