On explaining machine learning models by evolving crucial and compact features
- Source :
- Swarm and Evolutionary Computation, 53:100640. Elsevier BV
- Publication Year :
- 2020
Abstract
- Feature construction can substantially improve the accuracy of Machine Learning (ML) algorithms. Genetic Programming (GP) has been proven effective at this task by evolving non-linear combinations of input features. GP additionally has the potential to improve ML explainability, since explicit expressions are evolved. Yet, in most GP works the complexity of evolved features is not explicitly bound or minimized, even though this is arguably key for explainability. In this article, we assess to what extent GP still performs favorably at feature construction when constructing features that are (1) small enough in number to enable visualization of the behavior of the ML model; (2) small enough in size to enable interpretability of the features themselves; (3) sufficiently informative to retain or even improve the performance of the ML algorithm. We consider a simple feature construction scheme using three different GP algorithms, as well as random search, to evolve features for five ML algorithms, including support vector machines and random forest. Our results on 21 datasets pertaining to classification and regression problems show that constructing only two compact features can be sufficient to rival the use of the entire original feature set. We further find that a modern GP algorithm, GP-GOMEA, performs best overall. These results, combined with the examples we provide of readable constructed features and of 2D visualizations of ML behavior, lead us to the positive conclusion that GP-based feature construction still works well when explicitly searching for compact features, making it extremely helpful for explaining ML models.

Comment: We included more experiments: a high-dimensional dataset and the machine learning algorithm XGBoost are now considered. We also repeated the experiments using the Naive Bayes classifier, because we discovered that the implementation we relied on had issues (see https://github.com/mlpack/mlpack/issues/2017).
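The feature construction scheme the abstract describes can be approximated with off-the-shelf tools. Below is a minimal sketch, not the authors' GP-GOMEA implementation, using the third-party gplearn library to evolve two compact features with a small function set and a parsimony penalty, then comparing a random forest trained on those two features against one trained on the full original feature set. The dataset and hyperparameters are illustrative assumptions.

```python
# Minimal sketch of GP-based feature construction, assuming the third-party
# gplearn library (https://gplearn.readthedocs.io); this is NOT the authors'
# GP-GOMEA code, and the dataset/hyperparameters are illustrative only.
from gplearn.genetic import SymbolicTransformer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Evolve exactly two features; a small function set plus a parsimony
# penalty keeps the expressions short enough to read and to plot in 2D.
gp = SymbolicTransformer(
    n_components=2,
    function_set=('add', 'sub', 'mul', 'div'),
    parsimony_coefficient=0.01,
    generations=20,
    population_size=1000,
    random_state=0,
)
Z_tr = gp.fit_transform(X_tr, y_tr)
Z_te = gp.transform(X_te)

# Train on the two constructed features only, versus all original features.
rf_two = RandomForestClassifier(random_state=0).fit(Z_tr, y_tr)
rf_all = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print('two constructed features:', accuracy_score(y_te, rf_two.predict(Z_te)))
print('all original features   :', accuracy_score(y_te, rf_all.predict(X_te)))
print(gp)  # shows the evolved expressions, e.g. div(mul(X3, X7), X21)
```

The parsimony penalty here plays the role of the explicit size bound the abstract argues is key for explainability: it drives evolution toward short, readable expressions rather than arbitrarily deep trees.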
- Subjects :
- Computer Science - Machine Learning (cs.LG)
Computer Science - Neural and Evolutionary Computing (cs.NE)
Genetic programming
Machine learning
Interpretable machine learning
Interpretability
Feature construction
Feature (machine learning)
Random search
GOMEA
Random forest
Support vector machine
Visualization
Artificial intelligence
Details
- Language :
- English
- ISSN :
- 2210-6502
- Database :
- OpenAIRE
- Journal :
- Swarm and Evolutionary Computation, 53:100640. Elsevier BV
- Accession number :
- edsair.doi.dedup.....bce2a4d2f3c6e72eb23f1cfdfd61dacd
- Full Text :
- https://doi.org/10.1016/j.swevo.2019.100640