Exact and Consistent Interpretation for Piecewise Linear Neural Networks: A Closed Form Solution

Authors :: Chu, Lingyang
Hu, Xia
Hu, Juhua
Wang, Lanjun
Pei, Jian
Publication Year :: 2018
Abstract: Strong intelligent machines powered by deep neural networks are increasingly deployed as black boxes to make decisions in risk-sensitive domains, such as finance and medicine. To reduce potential risk and build trust with users, it is critical to interpret how such machines make their decisions. Existing works interpret a pre-trained neural network by analyzing hidden neurons, mimicking pre-trained models, or approximating local predictions. However, these methods do not provide a guarantee on the exactness and consistency of their interpretation. In this paper, we propose an elegant closed form solution named $OpenBox$ to compute exact and consistent interpretations for the family of Piecewise Linear Neural Networks (PLNN). The main idea is to first transform a PLNN into a mathematically equivalent set of linear classifiers, then interpret each linear classifier by the features that dominate its prediction. We further apply $OpenBox$ to demonstrate the effectiveness of non-negative and sparse constraints on improving the interpretability of PLNNs. Extensive experiments on both synthetic and real-world data sets clearly demonstrate the exactness and consistency of our interpretation.
Comment: KDD 2018
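
The idea of turning a PLNN into an equivalent set of linear classifiers can be illustrated with a minimal sketch. It assumes a small fully connected ReLU network with random weights and is not the authors' OpenBox implementation; the function names `relu_forward` and `local_linear_classifier` are hypothetical. For a given input, the on/off pattern of the hidden units fixes one linear region, and on that region the network's output equals a single affine map `A @ x + c`, whose rows give the per-feature coefficients for each class.

```python
import numpy as np

def relu_forward(x, weights, biases):
    """Forward pass of a ReLU MLP; returns the output-layer logits."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(W @ h + b, 0.0)
    return weights[-1] @ h + biases[-1]

def local_linear_classifier(x, weights, biases):
    """Return (A, c) such that logits == A @ x + c on the linear region of x."""
    A = np.eye(x.shape[0])      # invariant: hidden state h == A @ x + c
    c = np.zeros(x.shape[0])
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        z = W @ h + b
        mask = (z > 0).astype(float)   # activation pattern of this layer
        A = mask[:, None] * (W @ A)    # zero out rows of inactive units
        c = mask * (W @ c + b)
        h = np.maximum(z, 0.0)
    return weights[-1] @ A, weights[-1] @ c + biases[-1]

# Hypothetical toy network: 4 inputs, two hidden layers, 2 output classes.
rng = np.random.default_rng(0)
sizes = [4, 5, 3, 2]
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]
x = rng.normal(size=4)

A, c = local_linear_classifier(x, weights, biases)
# The affine map reproduces the network output exactly (up to float error).
print(np.allclose(A @ x + c, relu_forward(x, weights, biases)))
```

Each row of `A` holds the coefficients that the local linear classifier assigns to the input features for one class, which is the kind of quantity the abstract refers to when it speaks of the features that dominate a prediction.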

Subjects :: Computer Science - Computer Vision and Pattern Recognition
Computer Science - Artificial Intelligence
