On Runtime and Classification Performance of the Discretize-Optimize (DISCO) Classification Approach
- Authors
Topi Korhonen and Johan Garcia
- Subjects
Operations research, Discretization, Computer Networks and Communications, Computer science, Other engineering and technologies, Decision tree, Pareto principle, Feature selection, Engineering and technology, Machine learning, Random forest, Hardware and Architecture, Information systems, Electrical engineering, electronic engineering, information engineering, Combinatorial optimization, Artificial intelligence, Classifier, Software
- Abstract
Using machine learning in high-speed networks for tasks such as flow classification typically requires very resource-efficient classification approaches, large amounts of computational resources, or specialized hardware. Here we sketch the discretize-optimize (DISCO) approach, which can construct an extremely efficient classifier for low-dimensional problems by combining feature selection, efficient discretization, novel bin placement, and lookup. Because the feature selection and discretization parameters are crucial, appropriate combinatorial optimization is an important aspect of the approach. We evaluate performance on a YouTube classification task using a cellular traffic data set. Initial results show that DISCO can shift the Pareto boundary of the trade-off between classification performance and runtime by up to an order of magnitude compared with runtime-optimized random forest and decision tree classifiers.
- Published
2019
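The abstract describes classification as discretization followed by a lookup, with the feature subset and discretization parameters chosen by combinatorial optimization. The sketch below illustrates that general idea only and is not the authors' DISCO implementation. It assumes equal-frequency (quantile) bin edges, a dense table that stores the majority training label per cell, and an exhaustive search over small feature subsets and bin counts as a simple stand-in for the combinatorial optimization; it does not reproduce DISCO's novel bin placement. The names `LookupClassifier` and `search`, and the synthetic data, are hypothetical.

```python
# Hypothetical sketch of a discretize-then-lookup classifier (not the DISCO code).
import itertools
from collections import Counter

import numpy as np


class LookupClassifier:
    """Assumes quantile bins per selected feature and a majority-label table."""

    def __init__(self, features, n_bins):
        self.features = list(features)  # indices of the selected feature columns
        self.n_bins = n_bins            # number of bins per selected feature

    def _cell_index(self, X):
        # Map each row to one flat table index (row-major over selected features).
        idx = np.zeros(len(X), dtype=np.int64)
        for f, edges in zip(self.features, self.edges_):
            b = np.searchsorted(edges, X[:, f], side="right")  # bin in [0, n_bins-1]
            idx = idx * self.n_bins + b
        return idx

    def fit(self, X, y):
        qs = np.linspace(0, 1, self.n_bins + 1)[1:-1]  # interior quantile levels
        self.edges_ = [np.quantile(X[:, f], qs) for f in self.features]
        cells = self._cell_index(X)
        # Cells with no training samples fall back to the overall majority label.
        default = Counter(y.tolist()).most_common(1)[0][0]
        self.table_ = np.full(self.n_bins ** len(self.features), default, dtype=y.dtype)
        for c in np.unique(cells):
            self.table_[c] = Counter(y[cells == c].tolist()).most_common(1)[0][0]
        return self

    def predict(self, X):
        # One binary search per selected feature, then a single array lookup.
        return self.table_[self._cell_index(X)]


def search(X_tr, y_tr, X_val, y_val, max_features=2, bin_options=(2, 4, 8)):
    """Exhaustive search over feature subsets and bin counts, maximizing
    validation accuracy (a stand-in; DISCO's optimizer may differ)."""
    best = None
    for k in range(1, max_features + 1):
        for feats in itertools.combinations(range(X_tr.shape[1]), k):
            for nb in bin_options:
                clf = LookupClassifier(feats, nb).fit(X_tr, y_tr)
                acc = float(np.mean(clf.predict(X_val) == y_val))
                if best is None or acc > best[0]:
                    best = (acc, feats, nb)
    return best


# Synthetic example: the label depends only on features 1 and 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
y = ((X[:, 1] > 0) & (X[:, 3] > 0)).astype(np.int64)
best = search(X[:1000], y[:1000], X[1000:], y[1000:])
print(best)  # (validation accuracy, selected features, bins per feature)
```

Once fitted, predicting a sample costs one binary search per selected feature plus one array read, which is the kind of runtime property the abstract trades off against classification performance. A fuller sketch would also record table size or prediction time for each candidate so that a Pareto front, rather than a single best configuration, could be reported.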