Online anomaly detection over Big Data streams

Authors :
Mourad Khayati
Laura Rettig
Michal Piorkowski
Philippe Cudré-Mauroux
Source :
IEEE BigData, Applied Data Science ISBN: 9783030118204
Publication Year :
2015
Publisher :
IEEE, 2015.

Abstract

Data quality is a challenging problem in many real-world application domains. While a lot of attention has been given to detecting anomalies in data at rest, detecting anomalies in streaming applications remains largely an open problem. For applications involving several data streams, the challenge of detecting anomalies has become harder over time, as data can dynamically evolve in subtle ways following changes in the underlying infrastructure. In this paper, we describe and empirically evaluate an online anomaly detection pipeline that satisfies two key conditions: generality and scalability. Our technique works on numerical data as well as on categorical data and makes no assumptions about the underlying data distributions. We implement two metrics, relative entropy and Pearson correlation, to dynamically detect anomalies. Together, these two metrics provide efficient and effective detection of anomalies over high-velocity streams of events. We describe the design and implementation of our approach in a Big Data scenario using state-of-the-art streaming components. Specifically, we build on Kafka queues and Spark Streaming to realize our approach while satisfying the generality and scalability requirements given above. We show how a combination of the two metrics we put forward can be applied to detect several types of anomalies (such as infrastructure failures, hardware misconfiguration, or user-driven anomalies) in large-scale telecommunication networks. We also discuss the merits and limitations of the resulting architecture and empirically evaluate its scalability on a real deployment over live streams capturing events from millions of mobile devices.
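
The two metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it is plain Python over in-memory batches rather than Kafka and Spark Streaming, and the function names, the additive smoothing constant, and the choice of base-2 logarithm are all assumptions made for illustration. The abstract does not state how windows are formed or which thresholds flag an anomaly, so none are shown.

```python
import math
from collections import Counter


def relative_entropy(window, baseline, smoothing=1e-9):
    """Relative entropy (Kullback-Leibler divergence) D(P || Q) in bits.

    P is the empirical distribution of `window`, Q that of `baseline`.
    `smoothing` is an assumed additive constant that keeps categories
    missing from one batch from producing log(0) or division by zero.
    """
    p_counts, q_counts = Counter(window), Counter(baseline)
    categories = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + smoothing * len(categories)
    q_total = sum(q_counts.values()) + smoothing * len(categories)
    divergence = 0.0
    for category in categories:
        p = (p_counts[category] + smoothing) / p_total
        q = (q_counts[category] + smoothing) / q_total
        divergence += p * math.log2(p / q)
    return divergence


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan  # correlation is undefined for a constant series
    return covariance / math.sqrt(var_x * var_y)
```

In a setting like the one the abstract describes, a relative entropy that drifts away from zero between a recent batch of events and a reference batch would suggest that the event distribution has shifted, and a drop in correlation between two streams that normally move together would suggest that one of them is misbehaving. How the two signals are combined is a design decision the abstract leaves unspecified.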

Details

ISBN :
978-3-030-11820-4
Database :
OpenAIRE
Journal :
2015 IEEE International Conference on Big Data (Big Data)
Accession number :
edsair.doi.dedup.....b225f0b495a76feab46b5f3663b77ec3
Full Text :
https://doi.org/10.1109/bigdata.2015.7363865