Back to Search
Start Over
Exact and Consistent Interpretation for Piecewise Linear Neural Networks: A Closed Form Solution
- Publication Year :
- 2018
-
Abstract
- Strong intelligent machines powered by deep neural networks are increasingly deployed as black boxes to make decisions in risk-sensitive domains, such as finance and medical. To reduce potential risk and build trust with users, it is critical to interpret how such machines make their decisions. Existing works interpret a pre-trained neural network by analyzing hidden neurons, mimicking pre-trained models or approximating local predictions. However, these methods do not provide a guarantee on the exactness and consistency of their interpretation. In this paper, we propose an elegant closed form solution named $OpenBox$ to compute exact and consistent interpretations for the family of Piecewise Linear Neural Networks (PLNN). The major idea is to first transform a PLNN into a mathematically equivalent set of linear classifiers, then interpret each linear classifier by the features that dominate its prediction. We further apply $OpenBox$ to demonstrate the effectiveness of non-negative and sparse constraints on improving the interpretability of PLNNs. The extensive experiments on both synthetic and real world data sets clearly demonstrate the exactness and consistency of our interpretation.<br />Comment: KDD 2018
Details
- Database :
- arXiv
- Publication Type :
- Report
- Accession number :
- edsarx.1802.06259
- Document Type :
- Working Paper