A self-adaptive density-based clustering algorithm for varying densities datasets with strong disturbance factor.

Authors :
Cai, Zihao
Gu, Zhaodong
He, Kejing
Source :
Data & Knowledge Engineering. Sep2024, Vol. 153, pN.PAG-N.PAG. 1p.
Publication Year :
2024

Abstract

Clustering is a fundamental task in data mining, aiming to group similar objects together based on their features or attributes. With the rapid increase in data analysis volume and the growing complexity of high-dimensional data distribution, clustering has become increasingly important in numerous applications, including image analysis, text mining, and anomaly detection. DBSCAN is a powerful tool for clustering analysis and is widely used in density-based clustering algorithms. However, DBSCAN and its variants encounter challenges when confronted with datasets exhibiting clusters of varying densities in intricate high-dimensional spaces affected by significant disturbance factors. A typical example is multi-density clustering connected by a few data points with strong internal correlations, a scenario commonly encountered in the analysis of crowd mobility. To address these challenges, we propose a Self-adaptive Density-Based Clustering Algorithm for Varying Densities Datasets with Strong Disturbance Factor (SADBSCAN). This algorithm comprises a data block splitter, a local clustering module, a global clustering module, and a data block merger to obtain adaptive clustering results. We conduct extensive experiments on both artificial and real-world datasets to evaluate the effectiveness of SADBSCAN. The experimental results indicate that SADBSCAN significantly outperforms several strong baselines across different metrics, demonstrating the high adaptability and scalability of our algorithm. [ABSTRACT FROM AUTHOR]
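The varying-density difficulty mentioned in the abstract can be illustrated with a toy example. The sketch below is not SADBSCAN and is not taken from the paper; it is a minimal textbook-style DBSCAN on one-dimensional points, with hypothetical names (`dbscan_1d`, `region_query`) and made-up data. Two dense groups sit 5 units apart, while the points of a sparse group are 10 units apart, so any single radius `eps` large enough to link the sparse group also merges the two dense groups.

```python
def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if abs(points[i] - q) <= eps]


def dbscan_1d(points, eps, min_pts):
    """Plain DBSCAN on 1-D data; returns one label per point, -1 for noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels


# Illustrative data only: two dense groups (spacing 1, gap 5) and one sparse group (spacing 10).
points = [0, 1, 2, 3, 4, 9, 10, 11, 12, 13, 50, 60, 70, 80, 90]

small = dbscan_1d(points, eps=1, min_pts=3)
large = dbscan_1d(points, eps=10, min_pts=3)

print(small)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1]
print(large)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

With the small radius the sparse group is discarded as noise; with the large radius the two dense groups collapse into one cluster. Neither setting recovers all three groups, which is the kind of limitation that adaptive, per-region parameter selection is meant to address.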

Details

Language :
English
ISSN :
0169-023X
Volume :
153
Database :
Academic Search Index
Journal :
Data & Knowledge Engineering
Publication Type :
Academic Journal
Accession number :
179793419
Full Text :
https://doi.org/10.1016/j.datak.2024.102345