EDDS: An Enhanced Density-Based Method for Clustering Data Streams
- Source :
- ICPP Workshops
- Publication Year :
- 2017
- Publisher :
- IEEE, 2017.
Abstract
- Data stream clustering is an active area of research in big data. It refers to clustering continuously arriving data records and updating existing cluster patterns and outliers in light of the newly arrived data. Density-based algorithms are promising for this problem because they can find arbitrarily shaped clusters and detect anomalies without prior knowledge of the number of clusters. In this paper, a new incremental algorithm known as Enhanced Density-based Data Stream (EDDS) is developed to overcome limitations of existing solutions. The algorithm detects clusters and outliers in an incoming data chunk, merges new clusters from the chunk with the existing clusters, and filters out new outliers for the next round. It modifies the traditional DBSCAN algorithm to summarise each cluster in terms of a set of surface-core points. The algorithm applies the density-reachable concept of DBSCAN as its merging strategy and prunes the internal core points using a heuristic solution. It also removes aged core points and outliers according to a fading function. The paper investigates three versions of the algorithm, one for each of three possible cluster representations: all core points are maintained (EDDS-I), only core points of the new clusters from the incoming chunk are kept (EDDS-II), or only the surface-core points of the cluster shapes are kept (EDDS-III). The comparison examines the balance between the efficiency gained and the overhead time spent pruning internal core points. The algorithm was evaluated on selected datasets using various quality measures. The experimental results indicate improved clustering correctness, with time complexity comparable to that of existing solutions to the same kind of problem.
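The merge-and-fade cycle described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not give the form of the fading function, so the exponential decay `2 ** (-decay * age)`, the `decay` and `min_weight` parameters, and all function names here are assumptions. The merge test is also simplified to "some core point of one cluster lies within `eps` of a core point of the other", as a stand-in for DBSCAN's density-reachable relation.

```python
import math

# Each point is a (coords, timestamp) pair; a cluster is a list of core points.

def fading_weight(age, decay=0.25):
    """Weight of a point of the given age. The exponential form is an assumption."""
    return 2.0 ** (-decay * age)

def prune_aged(points, now, decay=0.25, min_weight=0.1):
    """Keep only points whose faded weight is still at least min_weight."""
    return [(p, t) for p, t in points
            if fading_weight(now - t, decay) >= min_weight]

def clusters_touch(core_a, core_b, eps):
    """True if some core point of A lies within eps of some core point of B.
    Simplified stand-in for the density-reachable test."""
    return any(math.dist(p, q) <= eps for p, _ in core_a for q, _ in core_b)

def merge_chunk(existing, new_clusters, eps):
    """Merge each new cluster with every existing cluster it touches."""
    merged = [list(c) for c in existing]
    for new in new_clusters:
        new = list(new)
        untouched = []
        for old in merged:
            if clusters_touch(old, new, eps):
                new = old + new          # absorb the touching cluster
            else:
                untouched.append(old)
        untouched.append(new)
        merged = untouched
    return merged

# Hypothetical usage: one new cluster bridges two existing ones.
existing = [[((0.0, 0.0), 0)], [((2.0, 0.0), 0)]]
new_clusters = [[((1.0, 0.0), 1)]]
result = merge_chunk(existing, new_clusters, eps=1.0)
```

In this sketch every core point is kept, which corresponds loosely to the EDDS-I representation; pruning internal core points down to surface-core points (EDDS-III) would be an extra step that the abstract describes only as a heuristic.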
- Subjects :
- Data stream
DBSCAN
Fuzzy clustering
Data stream mining
Computer science
Correlation clustering
OPTICS algorithm
02 engineering and technology
Determining the number of clusters in a data set
Data stream clustering
SUBCLU
CURE data clustering algorithm
020204 information systems
Nearest-neighbor chain algorithm
Outlier
0202 electrical engineering, electronic engineering, information engineering
Canopy clustering algorithm
Affinity propagation
020201 artificial intelligence & image processing
Algorithm design
Data mining
Cluster analysis
k-medians clustering
Details
- Database :
- OpenAIRE
- Journal :
- 2017 46th International Conference on Parallel Processing Workshops (ICPPW)
- Accession number :
- edsair.doi...........672a39949012071306c4bb3d4528f6ff
- Full Text :
- https://doi.org/10.1109/icppw.2017.27