6 results for "Akula, Arjun R"
Search Results
2. Attention cannot be an Explanation
- Author
- Akula, Arjun R and Zhu, Song-Chun
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Human-Computer Interaction, Human-Computer Interaction (cs.HC), Machine Learning (cs.LG)
- Abstract
Attention-based explanations (viz. saliency maps), by providing interpretability to black-box models such as deep neural networks, are assumed to improve human trust and reliance in the underlying models. Recently, it has been shown that attention weights are frequently uncorrelated with gradient-based measures of feature importance. Motivated by this, we ask a follow-up question: "Assuming that we only consider the tasks where attention weights correlate well with feature importance, how effective are these attention-based explanations in increasing human trust and reliance in the underlying models?" In other words, can we use attention as an explanation? We perform extensive human-study experiments that aim to qualitatively and quantitatively assess the degree to which attention-based explanations are suitable for increasing human trust and reliance. Our experimental results show that attention cannot be used as an explanation. (arXiv admin note: substantial text overlap with arXiv:2109.01401, arXiv:1909.06907.)
- Published
- 2022
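The entry above turns on the observation that attention weights are often uncorrelated with gradient-based measures of feature importance. The toy sketch below, which is not from the paper and whose model, tensor names, and dimensions are all invented, illustrates one common way that disagreement is quantified: rank-correlating a model's attention weights with the gradient of its output with respect to each input token.

```python
# Hypothetical sketch: compare attention weights with gradient-based importance
# for a toy additive-attention classifier (assumes torch and scipy are installed).
import torch
import torch.nn.functional as F
from scipy.stats import kendalltau

torch.manual_seed(0)
seq_len, dim = 12, 16
tokens = torch.randn(seq_len, dim, requires_grad=True)  # toy token embeddings

w_att = torch.randn(dim)   # attention scoring vector
w_out = torch.randn(dim)   # output head

scores = tokens @ w_att                   # one score per token
attn = F.softmax(scores, dim=0)           # attention weights over tokens
pooled = (attn.unsqueeze(1) * tokens).sum(dim=0)
logit = pooled @ w_out                    # scalar prediction

# Gradient-based importance: |d logit / d token|, summed over embedding dims.
logit.backward()
grad_importance = tokens.grad.abs().sum(dim=1)

# Rank correlation between the two importance orderings.
tau, _ = kendalltau(attn.detach().numpy(), grad_importance.numpy())
print(f"Kendall tau between attention and gradient importance: {tau:.3f}")
```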
3. Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal Grounding
- Author
- Akula, Arjun R.
- Subjects
FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
- Abstract
Visual question answering (VQA) is the multi-modal task of answering natural language questions about an input image. Through cross-dataset adaptation methods, it is possible to transfer knowledge from a source dataset with a larger number of training samples to a target dataset whose training set is limited. When a VQA model trained on one dataset's training set fails to adapt to another, it is hard to identify the underlying cause of the domain mismatch, as there could exist a multitude of reasons, such as image distribution mismatch and question distribution mismatch. At UCLA, we are working on a VQG module that facilitates automatically generating OOD shifts, which aid in systematically evaluating the cross-dataset adaptation capabilities of VQA models.
- Published
- 2022
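Entry 3 describes evaluating how well a VQA model trained on a source dataset holds up under cross-dataset (OOD) shifts. The sketch below is a hypothetical, plain-Python harness for that kind of comparison: it computes source accuracy and the accuracy drop on generated target splits, keyed by an assumed shift label; it is not the paper's VQG module.

```python
# Hypothetical sketch of a cross-dataset evaluation harness for a VQA model.
# The prediction/answer lists and shift names are illustrative stand-ins.

def accuracy(predictions, answers):
    """Fraction of questions answered exactly correctly."""
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)

def adaptation_gaps(source_eval, target_evals):
    """source_eval: (preds, answers) on the source test set.
    target_evals: {shift_name: (preds, answers)} for generated OOD splits."""
    src_acc = accuracy(*source_eval)
    gaps = {shift: src_acc - accuracy(preds, answers)
            for shift, (preds, answers) in target_evals.items()}
    return src_acc, gaps

# Toy usage with made-up predictions.
source = (["cat", "2", "red"], ["cat", "3", "red"])
targets = {
    "image_shift":    (["dog", "1"], ["dog", "4"]),
    "question_shift": (["blue", "no"], ["green", "no"]),
}
src_acc, gaps = adaptation_gaps(source, targets)
print(f"source accuracy: {src_acc:.2f}")
for shift, gap in gaps.items():
    print(f"accuracy drop under {shift}: {gap:+.2f}")
```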
4. CX-ToM: Counterfactual explanations with theory-of-mind for enhancing human trust in image recognition models
- Author
- Akula, Arjun R., Wang, Keze, Liu, Changsong, Saba-Sadiya, Sari, Lu, Hongjing, Todorovic, Sinisa, Chai, Joyce, and Zhu, Song-Chun
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial intelligence, Multidisciplinary, Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Science, Computer Science - Computer Vision and Pattern Recognition, Computer science, Article, Human-computer interaction, Machine Learning (cs.LG), Artificial Intelligence (cs.AI)
- Abstract
We propose CX-ToM, short for counterfactual explanations with theory-of-mind, a new explainable AI (XAI) framework for explaining decisions made by a deep convolutional neural network (CNN). In contrast to the current methods in XAI that generate explanations as a single-shot response, we pose explanation as an iterative communication process, i.e., a dialogue between the machine and the human user. More concretely, our CX-ToM framework generates a sequence of explanations in a dialogue by mediating the differences between the minds of the machine and the human user. To do this, we use Theory of Mind (ToM), which helps us explicitly model the human's intention, the machine's mind as inferred by the human, as well as the human's mind as inferred by the machine. Moreover, most state-of-the-art XAI frameworks provide attention (or heat map) based explanations. In our work, we show that these attention-based explanations are not sufficient for increasing human trust in the underlying CNN model. In CX-ToM, we instead use counterfactual explanations called fault-lines, which we define as follows: given an input image I for which a CNN classification model M predicts class c_pred, a fault-line identifies the minimal semantic-level features (e.g., stripes on a zebra), referred to as explainable concepts, that need to be added to or deleted from I to alter the classification category of I by M to another specified class c_alt. Extensive experiments verify our hypotheses, demonstrating that our CX-ToM significantly outperforms the state-of-the-art XAI models.
Highlights: attention is not a good explanation; explanation is an interactive communication process; we introduce a new XAI framework based on Theory-of-Mind and counterfactual explanations.
- Published
- 2021
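Entry 4 defines a fault-line as the minimal set of explainable concepts that must be added to or deleted from an input so that the model's prediction flips to a specified class. The sketch below illustrates that definition with a toy concept-level classifier and a brute-force search over small edit sets; it is an assumption-laden stand-in, not the CX-ToM implementation.

```python
# Hypothetical sketch of the fault-line idea: smallest concept-level edit that
# flips a (toy) classifier's prediction to a requested class.
from itertools import combinations

CONCEPT_VOCAB = ["stripes", "mane", "hooves", "tail", "spots"]

def toy_classifier(concepts):
    """Stand-in for a CNN scored at the level of explainable concepts."""
    if "stripes" in concepts and "hooves" in concepts:
        return "zebra"
    if "mane" in concepts and "hooves" in concepts:
        return "horse"
    return "other"

def fault_line(concepts, model, target_class, max_edits=3):
    """Return the smallest add/delete edit set making `model` output `target_class`."""
    concepts = set(concepts)
    candidates = ([("add", c) for c in CONCEPT_VOCAB if c not in concepts] +
                  [("delete", c) for c in sorted(concepts)])
    for k in range(1, max_edits + 1):
        for edits in combinations(candidates, k):
            edited = set(concepts)
            for op, c in edits:
                if op == "add":
                    edited.add(c)
                else:
                    edited.discard(c)
            if model(edited) == target_class:
                return edits
    return None

# "Why horse and not zebra?" -> the minimal edit that flips the prediction.
image_concepts = {"mane", "hooves", "tail"}
print(toy_classifier(image_concepts))                       # horse
print(fault_line(image_concepts, toy_classifier, "zebra"))  # (('add', 'stripes'),)
```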
5. X-ToM: Explaining with Theory-of-Mind for Gaining Justified Human Trust
- Author
- Akula, Arjun R., Liu, Changsong, Saba-Sadiya, Sari, Lu, Hongjing, Todorovic, Sinisa, Chai, Joyce Y., and Zhu, Song-Chun
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Computer Science - Human-Computer Interaction, Human-Computer Interaction (cs.HC), Machine Learning (cs.LG)
- Abstract
We present a new explainable AI (XAI) framework aimed at increasing justified human trust and reliance in the AI machine through explanations. We pose explanation as an iterative communication process, i.e., a dialog between the machine and the human user. More concretely, the machine generates a sequence of explanations in a dialog which takes into account three important aspects at each dialog turn: (a) the human's intention (or curiosity); (b) the human's understanding of the machine; and (c) the machine's understanding of the human user. To do this, we use Theory of Mind (ToM), which helps us explicitly model the human's intention, the machine's mind as inferred by the human, as well as the human's mind as inferred by the machine. In other words, these explicit mental representations in ToM are incorporated to learn an optimal explanation policy that takes into account the human's perception and beliefs. Furthermore, we also show that ToM facilitates quantitatively measuring justified human trust in the machine by comparing all three mental representations. We applied our framework to three visual recognition tasks, namely, image classification, action recognition, and human body pose estimation. We argue that our ToM-based explanations are practical and more natural for both expert and non-expert users to understand the internal workings of complex machine learning models. To the best of our knowledge, this is the first work to derive explanations using ToM. Extensive human-study experiments verify our hypotheses, showing that the proposed explanations significantly outperform the state-of-the-art XAI methods in terms of all the standard quantitative and qualitative XAI evaluation metrics, including human trust, reliance, and explanation satisfaction. (A short version of this work was presented at the CVPR 2019 Workshop on Explainable AI.)
- Published
- 2019
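Entry 5 argues that justified human trust can be quantified by comparing the mental representations that ToM makes explicit. Below is a minimal illustration under the strong simplifying assumption that each "mind" reduces to a dictionary of per-case beliefs; the scoring rule and all names are invented for illustration and are not the X-ToM framework.

```python
# Hypothetical sketch: score "justified trust" as the fraction of cases where
# the human correctly anticipates the machine AND the machine is actually right.

def justified_trust(machine_mind, human_model_of_machine, ground_truth):
    justified = 0
    for case, truth in ground_truth.items():
        human_expects = human_model_of_machine.get(case)
        machine_says = machine_mind.get(case)
        if human_expects == machine_says and machine_says == truth:
            justified += 1
    return justified / len(ground_truth)

ground_truth           = {"img1": "cat", "img2": "dog", "img3": "zebra"}
machine_mind           = {"img1": "cat", "img2": "dog", "img3": "horse"}
human_model_of_machine = {"img1": "cat", "img2": "wolf", "img3": "horse"}

score = justified_trust(machine_mind, human_model_of_machine, ground_truth)
print(f"justified trust: {score:.2f}")  # 0.33 -- only img1 is both predicted and anticipated correctly
```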
6. Discourse Parsing in Videos: A Multi-modal Approach
- Author
- Akula, Arjun R. and Zhu, Song-Chun
- Subjects
FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computing Methodologies - Image Processing and Computer Vision, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Text-level discourse parsing aims to unmask how two sentences in a text are related to each other. We propose the task of Visual Discourse Parsing, which requires understanding discourse relations among scenes in a video. Here we use the term scene to refer to a subset of video frames that can better summarize the video. In order to collect a dataset for learning discourse cues from videos, one needs to manually identify the scenes from a large pool of video frames and then annotate the discourse relations between them. This is clearly a time-consuming, expensive, and tedious task. In this work, we propose an approach to identify discourse cues from videos without the need to explicitly identify and annotate the scenes. We also present a novel dataset containing 310 videos and the corresponding discourse cues to evaluate our approach. We believe that many multi-disciplinary AI problems, such as Visual Dialog and Visual Storytelling, would greatly benefit from the use of visual discourse cues. (Accepted at the CVPR 2019 Workshop on Language and Vision, oral presentation.)
- Published
- 2019
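Entry 6 frames visual discourse parsing as predicting discourse relations between pairs of scenes in a video. The sketch below shows one plausible way such annotations could be represented and sanity-checked; the relation labels and schema are illustrative assumptions, not the paper's dataset format.

```python
# Hypothetical sketch of a discourse-cue annotation schema over video scenes.
from dataclasses import dataclass
from collections import Counter

@dataclass
class DiscourseCue:
    video_id: str
    scene_a: tuple   # frame indices summarizing the first scene
    scene_b: tuple   # frame indices summarizing the second scene
    relation: str    # e.g. "temporal", "causal", "contrast" (assumed labels)

annotations = [
    DiscourseCue("vid_001", (10, 11, 12), (45, 46), "causal"),
    DiscourseCue("vid_001", (45, 46), (90,), "temporal"),
    DiscourseCue("vid_002", (3, 4), (60, 61), "contrast"),
]

# Simple corpus statistics of the kind used to sanity-check such a dataset.
relation_counts = Counter(cue.relation for cue in annotations)
videos = {cue.video_id for cue in annotations}
print(f"{len(videos)} videos, {len(annotations)} discourse cues")
print(relation_counts)
```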