1. Building Efficient Regular Expression Matchers Through GA Optimization With ML Surrogates
- Author
-
Johan Garcia, Anders Waldenborg, and Jonathan Hillblom
- Subjects
Finite-state machine ,Optimization problem ,Matching (graph theory) ,Computer science ,business.industry ,Intrusion detection system ,Evaluation function ,Machine learning ,computer.software_genre ,Set (abstract data type) ,Traffic classification ,Artificial intelligence ,Regular expression ,business ,computer - Abstract
Important network functions such as traffic classification and intrusion detection often depend on high-throughput regular expression matching. To achieve high performance, regular expressions can be represented as state machines, which are then merged. However, determining which individual state machines should ideally be merged together is a challenging optimization problem. We address this problem by using genetic algorithms with novel problem-specific operators. To allow large scale evaluation of the new operators, we devise two ML-based surrogate models for the expensive fitness evaluation function. Our results from a set of production scale regular expressions show that using the most appropriate operations provides large gains over a naive baseline, but also that no universal best combination of operators exist. We provide some insights into which operators perform best for different objectives, and show the variation between TCP- and UDP-specific regular expressions.
- Published
- 2021
- Full Text
- View/download PDF