Online anomaly detection over Big Data streams
- Source :
- IEEE BigData, Applied Data Science ISBN: 9783030118204
- Publication Year :
- 2015
- Publisher :
- IEEE, 2015.
Abstract
- Data quality is a challenging problem in many real-world application domains. While much attention has been given to detecting anomalies in data at rest, detecting anomalies in streaming applications remains largely an open problem. For applications involving several data streams, the challenge has become harder over time, as data can evolve dynamically and in subtle ways following changes in the underlying infrastructure. In this paper, we describe and empirically evaluate an online anomaly detection pipeline that satisfies two key conditions: generality and scalability. Our technique works on numerical as well as categorical data and makes no assumptions about the underlying data distributions. We implement two metrics, relative entropy and Pearson correlation, to detect anomalies dynamically. Together, these metrics provide efficient and effective detection of anomalies over high-velocity streams of events. We describe the design and implementation of our approach in a Big Data scenario using state-of-the-art streaming components. Specifically, we build on Kafka queues and Spark Streaming to realize our approach while satisfying the generality and scalability requirements given above. We show how a combination of the two metrics can be applied to detect several types of anomalies (such as infrastructure failures, hardware misconfiguration, or user-driven anomalies) in large-scale telecommunication networks. We also discuss the merits and limitations of the resulting architecture and empirically evaluate its scalability on a real deployment over live streams capturing events from millions of mobile devices.
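The two metrics named in the abstract can be illustrated with a minimal sketch. This record does not include the paper's implementation, so everything below is an assumption made for illustration: the function names `relative_entropy` and `pearson`, the `smoothing` constant, and the use of plain in-memory Python lists in place of Kafka queues and Spark Streaming windows. Relative entropy compares the value distribution of a current window against a baseline window (which also works for categorical values), while Pearson correlation checks whether two numerical streams still move together.

```python
import math
from collections import Counter


def relative_entropy(current, baseline, smoothing=1e-9):
    """Relative entropy (KL divergence) D(current || baseline).

    Both arguments are sequences of observed values, numerical or
    categorical. The smoothing term is an assumption of this sketch;
    it avoids division by zero when a value appears in only one window.
    """
    p_counts, q_counts = Counter(current), Counter(baseline)
    support = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + smoothing * len(support)
    q_total = sum(q_counts.values()) + smoothing * len(support)
    divergence = 0.0
    for value in support:
        p = (p_counts[value] + smoothing) / p_total
        q = (q_counts[value] + smoothing) / q_total
        divergence += p * math.log(p / q)
    return divergence


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant series.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical windows of event types: the error share jumps from 5% to 40%.
baseline_window = ["ok"] * 95 + ["error"] * 5
current_window = ["ok"] * 60 + ["error"] * 40
drift = relative_entropy(current_window, baseline_window)

# Hypothetical per-interval event counts from two related streams.
stream_a = [100, 120, 130, 150, 170]
stream_b = [52, 61, 64, 77, 85]
coupling = pearson(stream_a, stream_b)
```

In a detector along these lines, a window might be flagged when `drift` rises above a chosen threshold or when `coupling` between normally correlated streams drops. Any such thresholds would need tuning per deployment and are not taken from the paper.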
Details
- ISBN :
- 978-3-030-11820-4
- ISBNs :
- 9783030118204
- Database :
- OpenAIRE
- Journal :
- 2015 IEEE International Conference on Big Data (Big Data)
- Accession number :
- edsair.doi.dedup.....b225f0b495a76feab46b5f3663b77ec3
- Full Text :
- https://doi.org/10.1109/bigdata.2015.7363865