
EDDS: An Enhanced Density-Based Method for Clustering Data Streams

Authors :
Hongbo Du
Sabah Jassim
Ammar Al Abd Alazeez
Source :
ICPP Workshops
Publication Year :
2017
Publisher :
IEEE, 2017.

Abstract

Data stream clustering is an active area of research in big data. It refers to clustering continuously arriving data records and updating existing clusters and outliers as new data arrive. Density-based algorithms for this problem promise to find arbitrarily shaped clusters and detect anomalies without prior knowledge of the number of clusters. In this paper, a new incremental algorithm, Enhanced Density-based Data Stream (EDDS), is developed to overcome limitations of existing solutions. The algorithm detects clusters and outliers in an incoming data chunk, merges new clusters from the chunk with the existing clusters, and filters out new outliers for the next round. It modifies the traditional DBSCAN algorithm to summarise each cluster as a set of surface-core points. It uses DBSCAN's density-reachability concept as its merging strategy and prunes internal core points with a heuristic. It also removes aged core points and outliers according to a fading function. The paper investigates three versions of the algorithm, corresponding to three possible cluster representations: all core points are maintained (EDDS-I), only the core points of new clusters from the incoming chunk are kept (EDDS-II), or only the surface-core points of the cluster shapes are kept (EDDS-III). Together these versions examine the trade-off between the efficiency gained by the algorithm and the overhead time spent pruning internal core points. The algorithm was evaluated on selected datasets using various quality measures. The experimental results indicate improvements in clustering correctness, with comparable time complexity, over existing solutions to the same kind of problem.
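The chunk-by-chunk loop described in the abstract (cluster a chunk, merge via density reachability, carry outliers forward, fade old points) can be illustrated with the minimal Python sketch below. It is not the authors' implementation; every detail beyond the abstract is an assumption: Euclidean distance, plain DBSCAN on each chunk plus the previous round's outliers, a merge rule under which a new cluster joins any existing cluster that has a core point within `eps` of one of its core points, and a DenStream-style fading function 2^(-lam * age) with hypothetical parameters `lam` and `min_weight`. It keeps all core points (loosely like EDDS-I) and does not implement the surface-core pruning heuristic, whose details the abstract does not give.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, ...]
Stamped = Tuple[Point, int]  # (coordinates, arrival time)


def dbscan(points: List[Point], eps: float, min_pts: int):
    """Plain DBSCAN: returns (labels, is_core); label -1 marks noise."""
    n = len(points)
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in nbrs]
    labels = [None] * n
    cid = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if not is_core[i]:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        labels[i] = cid
        queue = list(nbrs[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cid
            if is_core[j]:
                queue.extend(nbrs[j])  # only core points expand the cluster
        cid += 1
    return labels, is_core


@dataclass
class StreamState:
    # Hypothetical summary: each cluster is stored as its time-stamped core points.
    clusters: Dict[int, List[Stamped]] = field(default_factory=dict)
    outliers: List[Stamped] = field(default_factory=list)
    next_id: int = 0


def weight(ts: int, now: int, lam: float) -> float:
    """Assumed fading function 2^(-lam * age), as in DenStream-style methods."""
    return 2.0 ** (-lam * (now - ts))


def process_chunk(state: StreamState, chunk: List[Point], now: int,
                  eps: float, min_pts: int, lam: float = 0.5,
                  min_weight: float = 0.25) -> None:
    # 1. Re-examine last round's outliers together with the new chunk.
    pts: List[Stamped] = state.outliers + [(p, now) for p in chunk]
    labels, is_core = dbscan([p for p, _ in pts], eps, min_pts)

    # 2. Summarise each new cluster by its core points only.
    new_clusters: Dict[int, List[Stamped]] = {}
    for item, lab, core in zip(pts, labels, is_core):
        if lab >= 0 and core:
            new_clusters.setdefault(lab, []).append(item)

    # 3. Merge: a new cluster joins every existing cluster that has a core
    #    point within eps of one of its core points (density reachability).
    for cores in new_clusters.values():
        hits = [cid for cid, old in state.clusters.items()
                if any(math.dist(p, q) <= eps for p, _ in cores for q, _ in old)]
        if hits:
            target = min(hits)
            for cid in hits:
                if cid != target:
                    state.clusters[target].extend(state.clusters.pop(cid))
            state.clusters[target].extend(cores)
        else:
            state.clusters[state.next_id] = list(cores)
            state.next_id += 1

    # 4. Noise points are carried to the next round as outliers.
    state.outliers = [item for item, lab in zip(pts, labels) if lab == -1]

    # 5. Fade: drop aged core points and outliers, then empty clusters.
    def fresh(item: Stamped) -> bool:
        return weight(item[1], now, lam) >= min_weight

    state.outliers = [o for o in state.outliers if fresh(o)]
    for cid in list(state.clusters):
        state.clusters[cid] = [c for c in state.clusters[cid] if fresh(c)]
        if not state.clusters[cid]:
            del state.clusters[cid]
```

In this sketch, border points of a new cluster are discarded once its core points are stored, which mirrors the idea of summarising clusters by core points; the EDDS-II and EDDS-III variants would keep even fewer points by pruning further.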

Details

Database :
OpenAIRE
Journal :
2017 46th International Conference on Parallel Processing Workshops (ICPPW)
Accession number :
edsair.doi...........672a39949012071306c4bb3d4528f6ff
Full Text :
https://doi.org/10.1109/icppw.2017.27