1. LTC: A Fast Algorithm to Accurately Find Significant Items in Data Streams
- Author
-
Shiyu Cheng, Tong Yang, Haowei Zhang, Bin Cui, and Dongsheng Yang
- Subjects
Computer science ,Data stream mining ,Denial-of-service attack ,computer.software_genre ,Fast algorithm ,Computer Science Applications ,Task (project management) ,Test (assessment) ,Computational Theory and Mathematics ,Approximation error ,Key (cryptography) ,Data mining ,computer ,Information Systems - Abstract
Finding top-k frequent items has been a hot issue in databases. Finding top-k persistent items is a new issue, and has attracted increasing attention in recent years. In practice, users often want to know which items are significant, i.e., not only frequent but also persistent. No prior art can address both of the above two issues at the same time. Also, for high-speed data streams, prior art cannot achieve high accuracy when the memory is tight. In this paper, we define a new issue, named finding significant items, and propose a novel algorithm namely LTC to address this issue. It includes two key techniques, Long-tail Restoring and CLOCK, as well as three optimizations. In addition, LTC is extended to support finding significant items with thresholds. We theoretically derive the correct rate and error bound, and conduct extensive experiments on three real datasets to test the performance of LTC. Our experimental results show that LTC can achieve ${10^{5}}$ times higher accuracy in terms of average relative error than other related algorithms. Lastly, LTC is applied to a DDoS detection task and it shows that finding significant items is more powerful than finding frequent items.
- Published
- 2022