Visual Dialog.

Authors :
Das, Abhishek
Kottur, Satwik
Gupta, Khushi
Singh, Avi
Yadav, Deshraj
Lee, Stefan
Moura, Jose M. F.
Parikh, Devi
Batra, Dhruv
Source :
IEEE Transactions on Pattern Analysis & Machine Intelligence. May 2019, Vol. 41 Issue 5, p1242-1256. 15p.
Publication Year :
2019

Abstract

We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a question about the image, the agent has to ground the question in image, infer context from history, and answer the question accurately. Visual Dialog is disentangled enough from a specific downstream task so as to serve as a general test of machine intelligence, while being sufficiently grounded in vision to allow objective evaluation of individual responses and benchmark progress. We develop a novel two-person real-time chat data-collection protocol to curate a large-scale Visual Dialog dataset (VisDial). VisDial v0.9 has been released and consists of $\sim$1.2M dialog question-answer pairs from 10-round, human-human dialogs grounded in $\sim$120k images from the COCO dataset. We introduce a family of neural encoder-decoder models for Visual Dialog with 3 encoders (Late Fusion, Hierarchical Recurrent Encoder and Memory Network, optionally with attention over image features) and 2 decoders (generative and discriminative), which outperform a number of sophisticated baselines. We propose a retrieval-based evaluation protocol for Visual Dialog where the AI agent is asked to sort a set of candidate answers and evaluated on metrics such as mean-reciprocal-rank and recall@$k$ of human response. We quantify the gap between machine and human performance on the Visual Dialog task via human studies. Putting it all together, we demonstrate the first ‘visual chatbot’! Our dataset, code, pretrained models and visual chatbot are available on https://visualdialog.org. [ABSTRACT FROM AUTHOR]
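
The retrieval-based evaluation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the tie-handling rule (ties resolved in favor of the human response), and the example scores are all assumptions made for illustration. Given the scores a model assigns to a set of candidate answers, the sketch finds the rank of the human response and then aggregates ranks into mean reciprocal rank and recall@k.

```python
from typing import Sequence


def rank_of_human_response(scores: Sequence[float], gt_index: int) -> int:
    """Return the 1-based rank of the human (ground-truth) answer.

    Candidates are ordered by descending score. Assumption: ties are
    resolved in favor of the ground truth (optimistic ranking).
    """
    gt_score = scores[gt_index]
    # Count candidates scored strictly higher than the ground truth.
    return 1 + sum(1 for s in scores if s > gt_score)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all evaluated questions."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of questions whose human answer is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical model scores for three questions, each with a few candidates.
# The second element of each pair is the index of the human response.
examples = [
    ([0.9, 0.1, 0.3], 0),        # human answer scored highest
    ([0.2, 0.7, 0.5], 2),        # one candidate scored higher
    ([0.8, 0.6, 0.4, 0.1], 3),   # three candidates scored higher
]

ranks = [rank_of_human_response(s, gt) for s, gt in examples]
print("ranks:", ranks)
print("MRR:", mean_reciprocal_rank(ranks))
print("recall@1:", recall_at_k(ranks, 1))
print("recall@2:", recall_at_k(ranks, 2))
```

Higher values are better for both metrics. The number of candidates per question and the choice of k are set by the benchmark and are not specified in this record.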

Details

Language :
English
ISSN :
01628828
Volume :
41
Issue :
5
Database :
Academic Search Index
Journal :
IEEE Transactions on Pattern Analysis & Machine Intelligence
Publication Type :
Academic Journal
Accession number :
135773544
Full Text :
https://doi.org/10.1109/TPAMI.2018.2828437