1. Understanding classifier errors by examining influential neighbors
- Author
- Kristin Branson, Alice A. Robie, and Mayank Kabra
- Subjects
- Training set, Boosting (machine learning), Computer science, Machine learning, Margin classifier, Metric (mathematics), Classifier (linguistics), Artificial intelligence, Data mining, Classifier (UML)
- Abstract
- Modern supervised learning algorithms can learn very accurate and complex discriminating functions. But when these classifiers fail, this complexity can also be a drawback because there is no easy, intuitive way to diagnose why they are failing and remedy the problem. This important question has received little attention. To address this problem, we propose a novel method to analyze and understand a classifier's errors. Our method centers around a measure of how much influence a training example has on the classifier's prediction for a test example. To understand why a classifier is mispredicting the label of a given test example, the user can find and review the most influential training examples that caused this misprediction, allowing them to focus their attention on relevant areas of the data space. This will aid the user in determining if and how the training data is inconsistently labeled or lacking in diversity, or if the feature representation is insufficient. As computing the influence of each training example is computationally impractical, we propose a novel distance metric to approximate influence for boosting classifiers that is fast enough to be used interactively. We also show several novel use paradigms of our distance metric. Through experiments, we show that it can be used to find incorrectly or inconsistently labeled training examples, to find specific areas of the data space that need more training data, and to gain insight into which features are missing from the current representation.
- Published
- 2015