
Streaming traffic classification: a hybrid deep learning and big data approach.

Authors :: Seydali, Mehdi; Khunjush, Farshad; Dogani, Javad
Source :: Cluster Computing. Jul 2024, Vol. 27 Issue 4, p5165-5193. 29p.
Publication Year :: 2024
Abstract: The exponential growth of applications generates massive amounts of real-time streaming network data. Analyzing patterns in the resulting flow traffic helps reduce traffic congestion, enhance network management, and improve quality-of-service management. Processing these massive volumes of traffic becomes even more challenging when the traffic is encrypted. Real-time classification of encrypted network traffic with deep learning networks has received attention because of their excellent performance. The large volume of incoming packets, arriving at high speed and with wide variety, places real-time traffic classification among big data problems. Classifying traffic with both high speed and high accuracy is therefore a significant challenge in the era of big data. The real-time nature of traffic intensifies the demands on deep learning networks, which require a considerable number of parameters, layers, and resources for optimal training. To date, various datasets have been used to evaluate previous methods for classifying encrypted traffic, and the primary objective has been to improve accuracy, precision, and F1-measure. Currently, reported classification performance depends on pre-existing datasets: learning and testing are performed offline, and more research is needed to assess the feasibility of these methods in real-world scenarios. This paper examines the tradeoff among the model's effectiveness, execution time, and processing-resource utilization when classifying stream-based input data. We aim to determine whether such a tradeoff can be established and to identify optimal parameter settings. The public ISCX VPN-Non VPN 2016 dataset was used to evaluate the proposed method, with all of its packets streamed continuously through Apache Kafka to the classification framework. Numerous experiments were designed to demonstrate the efficacy of the proposed method. The experimental results show that the proposed method outperforms the baseline methods in F1-measure by 11% with two workers and by 25% with 32 workers. [ABSTRACT FROM AUTHOR]
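The abstract compares methods by F1-measure (the harmonic mean of precision and recall, equivalently 2TP / (2TP + FP + FN) per class) while packets are spread across a varying number of workers. The sketch below shows one way such a score could be maintained over a stream. It assumes labels arrive one record at a time as (true, predicted) pairs, that each worker keeps its own counts, and that the score is macro-averaged over classes. The abstract does not say which averaging the authors used. The class `StreamingF1` and the labels "chat", "email", and "voip" are hypothetical illustrations, not the authors' implementation or a Kafka API.

```python
from collections import Counter
from typing import Hashable, Set


class StreamingF1:
    """Hypothetical accumulator: per-class TP/FP/FN counts from a label stream,
    reporting the macro-averaged F1-measure on demand."""

    def __init__(self) -> None:
        self.tp: Counter = Counter()  # true positives per class
        self.fp: Counter = Counter()  # false positives per (predicted) class
        self.fn: Counter = Counter()  # false negatives per (true) class
        self.classes: Set[Hashable] = set()

    def update(self, y_true: Hashable, y_pred: Hashable) -> None:
        """Consume one (true label, predicted label) pair from the stream."""
        self.classes.update((y_true, y_pred))
        if y_true == y_pred:
            self.tp[y_true] += 1
        else:
            self.fp[y_pred] += 1
            self.fn[y_true] += 1

    def merge(self, other: "StreamingF1") -> None:
        """Add another worker's counts into this accumulator."""
        self.tp.update(other.tp)
        self.fp.update(other.fp)
        self.fn.update(other.fn)
        self.classes |= other.classes

    def macro_f1(self) -> float:
        """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
        if not self.classes:
            return 0.0
        scores = []
        for c in self.classes:
            denom = 2 * self.tp[c] + self.fp[c] + self.fn[c]
            scores.append(2 * self.tp[c] / denom if denom else 0.0)
        return sum(scores) / len(scores)


if __name__ == "__main__":
    # Two hypothetical workers, each seeing part of the stream.
    worker_a, worker_b = StreamingF1(), StreamingF1()
    for t, p in [("chat", "chat"), ("chat", "email")]:
        worker_a.update(t, p)
    for t, p in [("email", "email"), ("voip", "voip")]:
        worker_b.update(t, p)
    worker_a.merge(worker_b)
    print(round(worker_a.macro_f1(), 4))  # 0.7778
```

The design merges raw counts rather than per-worker F1 scores. Averaging each worker's F1 would generally differ from the F1 over the whole stream, whereas summed counts give the same result regardless of how many workers processed the data.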