1. Developing Interval-Based Cost-Sensitive Classifiers by Genetic Programming for Binary High-Dimensional Unbalanced Classification [Research Frontier]
- Author
- Mengjie Zhang, Lin Shang, Wenbin Pei, and Bing Xue
- Subjects
Computer science, business.industry, Small number, Binary number, Genetic programming, 02 engineering and technology, Interval (mathematics), Function (mathematics), Construct (python library), Machine learning, computer.software_genre, Theoretical Computer Science, Statistical classification, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Domain knowledge, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer
- Abstract
Cost-sensitive learning is a popular approach to addressing the problem of class imbalance for many classification algorithms in machine learning. However, most cost-sensitive algorithms are dependent on manually designed cost matrices. Unfortunately, in many cases, it is often not easy for humans, even experts, to accurately specify misclassification costs for different mistakes due to the lack of domain knowledge related to actual situations in some complex unbalanced problems. As a result, these cost-sensitive algorithms cannot be directly applied. This paper proposes a new genetic programming-based approach to developing cost-sensitive classifiers that are independent of manually designed cost matrices. The proposed method is able to construct classifiers and learn cost intervals automatically and simultaneously. In the proposed method, a tree representation, terminal sets and function sets are designed and developed. We examine the effectiveness of the proposed method on ten high-dimensional unbalanced datasets. The experimental results show that the proposed method often outperforms compared methods for high-dimensional unbalanced classification. Furthermore, according to the analysis of evolved trees, the constructed classifiers often only need a small number of features to achieve a good classification performance.
- Published
- 2021
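
The abstract's point about cost matrices can be made concrete with a minimal sketch. It is not the paper's genetic programming method: it only shows the standard expected-cost decision rule for a binary problem, plus one hypothetical way a cost interval (instead of a single cost) could affect a decision. The function names, the convention that class 1 is the minority class, and the choice to return `None` for undecided cases are all assumptions made for illustration.

```python
from typing import Optional


def cost_threshold(cost_fn: float, cost_fp: float) -> float:
    """Probability threshold that minimizes expected misclassification cost.

    Predicting class 1 costs (1 - p) * cost_fp in expectation, predicting
    class 0 costs p * cost_fn, so class 1 is cheaper when
    p >= cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def predict(p_pos: float, cost_fn: float, cost_fp: float) -> int:
    """Cost-sensitive decision for one instance with P(class 1) = p_pos."""
    return 1 if p_pos >= cost_threshold(cost_fn, cost_fp) else 0


def predict_with_interval(
    p_pos: float, cost_fn_low: float, cost_fn_high: float, cost_fp: float = 1.0
) -> Optional[int]:
    """Hypothetical decision when the false-negative cost is only known to
    lie in [cost_fn_low, cost_fn_high].

    Returns 1 or 0 when every cost in the interval gives the same answer,
    and None when the answer depends on where the true cost falls.
    """
    t_low = cost_threshold(cost_fn_high, cost_fp)   # highest cost, lowest threshold
    t_high = cost_threshold(cost_fn_low, cost_fp)   # lowest cost, highest threshold
    if p_pos >= t_high:
        return 1
    if p_pos < t_low:
        return 0
    return None


if __name__ == "__main__":
    # With equal costs the threshold is 0.5; a 9:1 cost ratio lowers it to 0.1.
    print(cost_threshold(1.0, 1.0), cost_threshold(9.0, 1.0))
    print(predict(0.3, cost_fn=9.0, cost_fp=1.0))
    print(predict_with_interval(0.15, cost_fn_low=4.0, cost_fn_high=9.0))
```

A hand-specified cost matrix fixes one threshold, which is why a poorly chosen cost can misplace the decision boundary. An interval leaves a band of probabilities where the decision stays open, and how the method described in the abstract learns and uses its intervals is not shown here.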