Ocean: Online Clustering and Evolution Analysis for Dynamic Streaming Data

Authors :
Feng, Chunhui
Fang, Junhua
Xia, Yue
Chao, Pingfu
Zhao, Pengpeng
Xu, Jiajie
Zhou, Xiaofang
Publication Year :
2024

Abstract

With the popularization of mobile applications and the timely acquisition of fresh data, real-time clustering and its evolution analysis have become primary operations for data processing and knowledge discovery. Such continuous queries on massive objects are computation-intensive tasks in dynamic scenarios. However, existing clustering techniques fail to achieve decent performance when computation-intensive operations occur frequently in streaming scenarios, owing to two challenges: (i) uncertainty of the clustering frequency; (ii) unpredictable distribution evolution. Hence, it is critical to find a lightweight model that can cluster high-speed dynamic instances while exploiting the evolution among different clustering results. This paper addresses the problem of real-time clustering on streaming data in computation-intensive and highly dynamic tasks through a framework, Ocean, consisting of the Online clustering algorithm and evolution analysis. In particular, the framework introduces a flexible composite window to augment knowledge mining, achieving a proper real-time response in various scenarios. The evolution analysis supports full life-cycle detection, improving adaptability to dynamic concept drifts and multiple patterns. Inspired by the grid partition strategy, the framework adopts grid feature vectors to capture significant changes in streaming data. Furthermore, we propose an optimization that promptly removes sparse grids and performs the online clustering adaptively for space and time efficiency. It is proven to be effective both theoretically and experimentally. This strategy enables real-time clustering for dynamic streaming data without degrading the clustering quality or increasing the computation cost. Experiments on real and synthetic datasets verify the accuracy and effectiveness of Ocean compared to state-of-the-art approaches, as well as the superior ability to perform clustering in a real-time

Details

Database :
OAIster
Notes :
English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1452723278
Document Type :
Electronic Resource