
A Proposal: High-Throughput Robust Architecture for Log Analysis and Data Stream Mining

Authors :
Adnan Rashid Hussain
Mohd Abdul Hameed
Sana Fatima
Source :
Advances in Intelligent Systems and Computing ISBN: 9789811004179
Publication Year :
2016
Publisher :
Springer Singapore, 2016.

Abstract

Various data mining approaches are now available that help in handling large static data sets despite limited computational resources. However, these approaches fall short when mining high-speed, unbounded streams: their learning procedures, though simple, require the entire training process to be repeated for each newly arriving instance. The main challenges in dealing with continuous data streams are that they are many times larger than the available memory, they arrive in real time, each new instance should be inspected at most once, and predictions must still be made. Another issue with continuous real-time data is that concepts change over time, which is often called concept drift. This paper addresses these problems by proposing a real-time, scalable, and robust architecture. It is a general-purpose architecture, based on online machine learning, that efficiently logs and mines stream data in a fault-tolerant manner. It consists of two frameworks: (1) an event aggregation framework, which reliably collects events and messages from multiple sources and ships them to a destination for processing; and (2) a real-time computation framework, which processes streams online to extract information patterns. It guarantees reliable processing of billions of messages per second. Furthermore, it facilitates the evaluation of stream learning algorithms and offers change detection strategies to detect concept drift.
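
The constraints named in the abstract (inspect each instance at most once, predict before learning, react to concept drift) can be illustrated with a minimal Python sketch of test-then-train evaluation. This is not the architecture proposed in the paper: the `MajorityClassLearner`, the `WindowDriftDetector`, the `prequential` function, and the `window` and `threshold` parameters are all hypothetical names chosen for illustration, and the drift check is a deliberately simple comparison of two sliding-window error rates rather than any detector the paper may use.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Tuple


class MajorityClassLearner:
    """Toy online learner (assumption): predicts the most frequent label so far."""

    def __init__(self) -> None:
        self.counts: Dict[int, int] = {}

    def predict(self, x: object) -> int:
        if not self.counts:
            return 0  # arbitrary default before any label has been seen
        return max(self.counts, key=self.counts.get)

    def learn(self, x: object, y: int) -> None:
        # Constant-time update; the instance itself is never stored.
        self.counts[y] = self.counts.get(y, 0) + 1


class WindowDriftDetector:
    """Simplified detector (assumption): signals drift when the error rate in
    the most recent window exceeds the rate in the preceding window by more
    than `threshold`."""

    def __init__(self, window: int = 50, threshold: float = 0.3) -> None:
        self.window = window
        self.threshold = threshold
        self.reference: Deque[int] = deque(maxlen=window)
        self.recent: Deque[int] = deque(maxlen=window)

    def update(self, error: int) -> bool:
        if len(self.recent) == self.window:
            # The oldest recent outcome ages into the reference window.
            self.reference.append(self.recent[0])
        self.recent.append(error)
        if len(self.reference) < self.window:
            return False  # not enough history to compare yet
        reference_rate = sum(self.reference) / self.window
        recent_rate = sum(self.recent) / self.window
        return recent_rate - reference_rate > self.threshold

    def reset(self) -> None:
        self.reference.clear()
        self.recent.clear()


def prequential(
    stream: Iterable[Tuple[object, int]], window: int = 50, threshold: float = 0.3
) -> Tuple[float, List[int]]:
    """Test-then-train loop: each instance is read once, used first for a
    prediction and then for learning. Returns (accuracy, drift positions)."""
    learner = MajorityClassLearner()
    detector = WindowDriftDetector(window, threshold)
    correct = 0
    seen = 0
    drift_points: List[int] = []
    for i, (x, y) in enumerate(stream):
        error = int(learner.predict(x) != y)  # test first
        correct += 1 - error
        seen += 1
        if detector.update(error):
            drift_points.append(i)
            learner = MajorityClassLearner()  # discard the outdated model
            detector.reset()
        learner.learn(x, y)  # then train on the same instance
    return (correct / seen if seen else 0.0), drift_points


if __name__ == "__main__":
    # Synthetic stream (assumption): the label flips from 0 to 1 halfway through.
    synthetic = [(None, 0)] * 200 + [(None, 1)] * 200
    accuracy, drifts = prequential(synthetic)
    print(f"accuracy={accuracy:.3f}, drift detected at positions {drifts}")
```

Memory use stays bounded by the two fixed-size windows and the label counts, regardless of stream length, which is the property the abstract asks of any stream learner. A production system would replace the toy learner and the window comparison with a proper incremental model and an established change detection strategy.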

Details

ISBN :
978-981-10-0417-9
ISBNs :
9789811004179
Database :
OpenAIRE
Journal :
Advances in Intelligent Systems and Computing ISBN: 9789811004179
Accession number :
edsair.doi...........432155546737b9409b8f8f5de600298d