A self-adaptive density-based clustering algorithm for varying densities datasets with strong disturbance factor.
- Source :
- Data & Knowledge Engineering. Sep 2024, Vol. 153, pN.PAG-N.PAG. 1p.
- Publication Year :
- 2024
Abstract
- Clustering is a fundamental task in data mining, aiming to group similar objects together based on their features or attributes. With the rapid increase in data analysis volume and the growing complexity of high-dimensional data distribution, clustering has become increasingly important in numerous applications, including image analysis, text mining, and anomaly detection. DBSCAN is a powerful tool for clustering analysis and is widely used in density-based clustering algorithms. However, DBSCAN and its variants encounter challenges when confronted with datasets exhibiting clusters of varying densities in intricate high-dimensional spaces affected by significant disturbance factors. A typical example is multi-density clustering connected by a few data points with strong internal correlations, a scenario commonly encountered in the analysis of crowd mobility. To address these challenges, we propose a Self-adaptive Density-Based Clustering Algorithm for Varying Densities Datasets with Strong Disturbance Factor (SADBSCAN). This algorithm comprises a data block splitter, a local clustering module, a global clustering module, and a data block merger to obtain adaptive clustering results. We conduct extensive experiments on both artificial and real-world datasets to evaluate the effectiveness of SADBSCAN. The experimental results indicate that SADBSCAN significantly outperforms several strong baselines across different metrics, demonstrating the high adaptability and scalability of our algorithm. [ABSTRACT FROM AUTHOR]
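The varying-density difficulty the abstract attributes to DBSCAN can be illustrated with a small sketch. This is a textbook DBSCAN written for illustration only, not the SADBSCAN algorithm, whose splitter, local and global clustering modules, and merger are not specified in this record. The toy dataset, the function names, and the parameter values `eps` and `min_pts` are all assumptions chosen for the example.

```python
from math import dist

NOISE = -1  # label used for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN (illustrative, O(n^2)); returns one label per point."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


# Hypothetical toy data: two dense groups close together and one sparse group.
dense_a = [(i / 10, 0.0) for i in range(5)]        # spacing 0.1
dense_b = [(0.9 + i / 10, 0.0) for i in range(5)]  # spacing 0.1, gap 0.5 from dense_a
sparse_c = [(10.0 + i, 0.0) for i in range(5)]     # spacing 1.0
points = dense_a + dense_b + sparse_c

# A small radius separates the dense groups but marks the sparse group as noise.
small_eps_labels = dbscan(points, eps=0.15, min_pts=3)

# A large radius recovers the sparse group but merges the two dense groups.
large_eps_labels = dbscan(points, eps=1.1, min_pts=3)

print(small_eps_labels)
print(large_eps_labels)
```

In this toy setting neither radius yields the three intended groups, which is the kind of situation a density-adaptive method is meant to handle.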
Details
- Language :
- English
- ISSN :
- 0169023X
- Volume :
- 153
- Database :
- Academic Search Index
- Journal :
- Data & Knowledge Engineering
- Publication Type :
- Academic Journal
- Accession number :
- 179793419
- Full Text :
- https://doi.org/10.1016/j.datak.2024.102345