Explainable AI improves task performance in human–AI collaboration
- Authors
Julian Senoner, Simon Schallmoser, Bernhard Kratzwald, Stefan Feuerriegel, and Torbjørn Netland
- Subjects
Explainable AI, Task performance, Decision-making, Human-centered AI, Human–AI collaboration, Medicine, Science
- Abstract
Artificial intelligence (AI) provides considerable opportunities to assist human work. However, one crucial challenge of human–AI collaboration is that many AI algorithms operate in a black-box manner, where how the AI makes its predictions remains opaque. This makes it difficult for humans to validate AI predictions against their own domain knowledge. For this reason, we hypothesize that augmenting humans with explainable AI improves task performance in human–AI collaboration. To test this hypothesis, we implement explainable AI in the form of visual heatmaps in inspection tasks conducted by domain experts. Visual heatmaps have the advantage that they are easy to understand and help to localize the relevant parts of an image. We then compare participants who were supported by either (a) black-box AI or (b) explainable AI, where the latter helps them follow AI predictions when these are accurate and overrule the AI when its predictions are wrong. We conducted two preregistered experiments with representative, real-world visual inspection tasks from manufacturing and medicine. The first experiment was conducted with factory workers from an electronics factory, who performed N = 9,600 assessments of whether electronic products have defects. The second experiment was conducted with radiologists, who performed N = 5,650 assessments of chest X-ray images to identify lung lesions. The results of our experiments with domain experts performing real-world tasks show that task performance improves when participants are supported by explainable AI with heatmaps instead of black-box AI. We find that explainable AI as a decision aid improved task performance by 7.7 percentage points (95% confidence interval [CI]: 3.3% to 12.0%, P = 0.001) in the manufacturing experiment and by 4.7 percentage points (95% CI: 1.1% to 8.3%, P = 0.010) in the medical experiment, compared to black-box AI. These gains represent a significant improvement in task performance.
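The abstract describes the explanations only as visual heatmaps that localize relevant parts of the inspected image. The sketch below shows one common way such a heatmap could be produced with a Grad-CAM-style attribution for a convolutional image classifier; the attribution method, the resnet18 backbone, and the input handling are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of a Grad-CAM-style heatmap explanation for an image classifier.
# Assumption: the paper only says "visual heatmaps"; the method and model here
# are illustrative, not the study's actual pipeline.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()  # placeholder, untrained network

activations, gradients = {}, {}

def save_activation(module, inputs, output):
    # Capture the feature maps of the hooked layer and their gradients.
    activations["value"] = output
    output.register_hook(lambda grad: gradients.update(value=grad))

# Hook the last convolutional block of the ResNet.
model.layer4.register_forward_hook(save_activation)

def gradcam_heatmap(image: torch.Tensor, target_class: int) -> torch.Tensor:
    """Return an HxW heatmap in [0, 1] for a single 3xHxW input image."""
    logits = model(image.unsqueeze(0))
    model.zero_grad()
    logits[0, target_class].backward()

    acts = activations["value"]                       # (1, C, h, w) feature maps
    grads = gradients["value"]                        # (1, C, h, w) gradients
    weights = grads.mean(dim=(2, 3), keepdim=True)    # per-channel importance
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))  # weighted sum
    cam = F.interpolate(cam, size=image.shape[1:],
                        mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
    return cam[0, 0].detach()

# Example: explain the predicted class for a dummy 224x224 image.
x = torch.rand(3, 224, 224)
pred = model(x.unsqueeze(0)).argmax(dim=1).item()
heatmap = gradcam_heatmap(x, pred)
```

In the study's setup, such a heatmap would be overlaid on the product photo or chest X-ray so the inspector can check whether the highlighted region actually supports the AI's verdict, which is what lets them follow accurate predictions and overrule wrong ones.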
- Published
2024