The Weighted Word2vec Paragraph Vectors for Anomaly Detection Over HTTP Traffic

Authors :
Jieling Li
Hao Zhang
Zhiqiang Wei
Source :
IEEE Access, Vol 8, Pp 141787-141798 (2020)
Publication Year :
2020
Publisher :
IEEE, 2020.

Abstract

Anomaly detection over HTTP traffic has attracted much attention in recent years because it plays a vital role in many domains. This article proposes an efficient machine learning approach for detecting anomalous HTTP traffic that addresses problems of existing methods, such as data redundancy and high training complexity. The algorithm draws on natural language processing (NLP) techniques. It uses the Word2vec algorithm to bridge the semantic gap, and it applies a Term Frequency-Inverse Document Frequency (TF-IDF) weighted mapping to HTTP traffic to construct low-dimensional paragraph vector representations that reduce training complexity. We then employ two boosting algorithms, Light Gradient Boosting Machine (LightGBM) and Categorical Boosting (CatBoost), to build an efficient and accurate anomaly detection model. The proposed method is tested on several artificial data sets, including HTTP DATASET CSIC 2010, UNSW-NB15, and Malicious-URLs. Experimental results show that both boosting algorithms achieve high detection accuracy, a high true positive rate, and a low false positive rate. Compared with other anomaly detection methods, the proposed algorithms require relatively short running time and have low CPU and memory consumption.
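The abstract's core representation step, turning an HTTP request into one TF-IDF weighted average of its tokens' Word2vec vectors, can be illustrated with a minimal pure-Python sketch. Several details are assumptions, not the authors' method: the tokenizer (`tokenize`, splitting on URL delimiters), the smoothed IDF formula (borrowed from scikit-learn's default), normalization by the sum of weights, and the toy `word_vectors` dictionary, which in practice would come from a trained Word2vec model. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(request: str) -> list[str]:
    """Hypothetical tokenizer: lowercase and split an HTTP request on URL delimiters."""
    return [t for t in re.split(r"[/?&=\s.]+", request.lower()) if t]


def idf_table(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def paragraph_vector(doc: list[str], word_vectors: dict[str, list[float]],
                     idf: dict[str, float], dim: int) -> list[float]:
    """TF-IDF weighted average of the word vectors of one tokenized request."""
    tf = Counter(doc)
    total = len(doc)
    acc = [0.0] * dim
    weight_sum = 0.0
    for tok, count in tf.items():
        vec = word_vectors.get(tok)
        if vec is None:  # skip tokens with no Word2vec embedding
            continue
        w = (count / total) * idf.get(tok, 0.0)  # term frequency times IDF
        for i in range(dim):
            acc[i] += w * vec[i]
        weight_sum += w
    if weight_sum == 0.0:  # no known tokens: return the zero vector
        return acc
    return [x / weight_sum for x in acc]


if __name__ == "__main__":
    requests = ["GET /shop/item?id=42", "GET /shop/item?id=1' OR '1'='1"]
    docs = [tokenize(r) for r in requests]
    idf = idf_table(docs)
    # Toy 2-dimensional embeddings standing in for a trained Word2vec model.
    word_vectors = {"get": [1.0, 0.0], "shop": [0.0, 1.0], "item": [0.5, 0.5],
                    "id": [0.2, 0.8], "or": [0.9, 0.1]}
    for r, d in zip(requests, docs):
        print(r, paragraph_vector(d, word_vectors, idf, 2))
```

Each request becomes a fixed-length vector whose size equals the embedding dimension, regardless of how many tokens the request contains. Those vectors could then serve as feature rows for a gradient-boosting classifier such as LightGBM or CatBoost, as the abstract describes. The training step is not shown here.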

Details

Language :
English
ISSN :
2169-3536
Volume :
8
Database :
Directory of Open Access Journals
Journal :
IEEE Access
Publication Type :
Academic Journal
Accession number :
edsdoj.0a119f10d7804a0f8ebbcfc692e62f14
Document Type :
article
Full Text :
https://doi.org/10.1109/ACCESS.2020.3013849