1. Preprocessing Method for Encrypted Traffic Based on Semisupervised Clustering
- Author
- Niu Weina, Liang Liu, Shan Liao, Kai Li, Jiayong Liu, and Rongfeng Zheng
- Subjects
- DBSCAN, Science (General), Transport Layer Security, Article Subject, Computer Networks and Communications, business.industry, Computer science, ComputerSystemsOrganization_COMPUTER-COMMUNICATIONNETWORKS, computer.software_genre, Flow network, Encryption, Q1-390, ComputingMethodologies_PATTERNRECOGNITION, T1-995, Malware, Preprocessor, Noise (video), Data mining, Cluster analysis, business, computer, Technology (General), Information Systems
- Abstract
The explosive growth in network traffic in recent times has resulted in increased processing pressure on network intrusion detection systems. In addition, there is a lack of reliable methods for preprocessing network traffic generated by benign applications that do not steal users’ data from their devices. To alleviate these problems, this study analyzed the differences between benign and malicious traffic produced by benign applications and malware, respectively. To fully express these differences, this study proposed a new set of statistical features for training a clustering model. Furthermore, to mine the communication channels generated by benign applications in batches, a semisupervised clustering method was adopted. Using a small number of labeled samples, our method aggregated historical network traffic into two types of clusters. The cluster that did not contain labeled malicious samples was regarded as a benign traffic cluster. The experimental results were compared using four types of clustering algorithms. The density-based spatial clustering of applications with noise (DBSCAN) clustering algorithm was selected to mine benign communication channels. We also compared our method with two other methods, and the results demonstrated that the benign channels mined through our method were more reliable. Finally, using our method, 1,811 benign transport layer security (TLS) channels were mined from 18,357 TLS communication channels. The number of flows carried by these benign channels comprised 65.37% of the entire network flows, and no malicious flow was included in our results, which proves the effectiveness of our method.
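The labeling rule described in the abstract (cluster the traffic, then treat any cluster with no labeled malicious sample as benign) can be illustrated with a minimal sketch. This is not the authors' implementation: the two-dimensional feature vectors, the `eps` and `min_pts` values, the helper names `dbscan` and `benign_clusters`, and the choice to exclude noise points from the benign set are all assumptions made for illustration. The paper's actual statistical features are not listed in the abstract.

```python
from math import dist

NOISE = -1  # label for points that belong to no dense region


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one cluster id per point, NOISE for outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(reach)
    return labels


def benign_clusters(labels, malicious_idx):
    """Cluster ids that contain no labeled malicious sample (noise excluded)."""
    tainted = {labels[i] for i in malicious_idx}
    return {c for c in set(labels) if c != NOISE and c not in tainted}


# Hypothetical per-channel feature vectors (two made-up features per channel).
channels = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # dense group A
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),  # dense group B
    (9.0, 0.0),                                      # isolated channel
]
malicious_idx = [5]  # the small set of labeled malicious samples (assumed)

labels = dbscan(channels, eps=0.5, min_pts=3)
benign = benign_clusters(labels, malicious_idx)
benign_channels = [i for i, c in enumerate(labels) if c in benign]
print(labels, benign, benign_channels)
```

In this toy input, group B contains the one labeled malicious sample, so only the channels in group A are reported as benign. The isolated channel is left unclassified rather than assumed benign, which is a conservative choice made for this sketch.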
- Published
- 2020