1. A survey on VQA: Datasets and Approaches
- Author
Qiyu Xie and Yeyun Zou
- Subjects
FOS: Computer and information sciences, Feature fusion, Computer Science - Machine Learning, Information retrieval, Computer Applications, Computer science, Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 020206 networking & telecommunications, Cognition, 02 engineering and technology, Field (computer science), Machine Learning (cs.LG), Task (project management), Visualization, Artificial Intelligence (cs.AI), Knowledge extraction, 0202 electrical engineering, electronic engineering, information engineering, Question answering, 020201 artificial intelligence & image processing
- Abstract
Visual question answering (VQA) is a task that combines techniques from computer vision and natural language processing. It requires models to answer a text-based question according to the information contained in a visual input. In recent years, the research field of VQA has expanded. Research on VQA that examines reasoning ability, as well as VQA on scientific diagrams, has also received more attention. Meanwhile, more multimodal feature fusion mechanisms have been proposed. This paper reviews and analyzes existing datasets, metrics, and models proposed for the VQA task.
10 pages
- Published
2021