1. EarthVQANet: Multi-task visual question answering for remote sensing image understanding.
- Author
- Wang, Junjue; Ma, Ailong; Chen, Zihang; Zheng, Zhuo; Wan, Yuting; Zhang, Liangpei; Zhong, Yanfei
- Subjects
- Surface of the earth; Semantics; Urban planning; Hurricane Harvey, 2017; Human settlements
- Abstract
- Monitoring and managing Earth's surface resources is critical to human settlements, encompassing essential tasks such as city planning and disaster assessment. To accurately recognize the categories and locations of geographical objects and reason about their spatial or semantic relations, we propose a multi-task framework named EarthVQANet, which jointly addresses segmentation and visual question answering (VQA) tasks. EarthVQANet contains a hierarchical pyramid network for segmentation and semantic-guided attention for VQA, in which the segmentation network generates pixel-level visual features and high-level object semantics, and the semantic-guided attention performs effective interactions between visual and language features for relational modeling. For accurate relational reasoning, we design an adaptive numerical loss that incorporates distance sensitivity for counting questions and mines hard and easy samples for classification questions, balancing the optimization. Experimental results on the EarthVQA dataset (city planning for Wuhan, Changzhou, and Nanjing in China), the RSVQA dataset (basic statistics for general objects), and the FloodNet dataset (disaster assessment for Texas, United States, struck by Hurricane Harvey) show that EarthVQANet surpasses 11 general and remote sensing VQA methods. EarthVQANet simultaneously achieves segmentation and reasoning, providing a solid benchmark for various remote sensing applications. Data is available at http://rsidea.whu.edu.cn/EarthVQA.htm
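
The abstract names two ingredients of the adaptive numerical loss: distance sensitivity for counting questions and hard-easy sample mining for classification questions. The paper's exact formulation is not reproduced in this record; the PyTorch sketch below only illustrates those two ideas under stated assumptions: counts are posed as classification over candidate values 0..C-1, the counting term penalizes the expected numerical distance of the predicted distribution from the true count, and a focal-style weight stands in for the hard-easy mining. The function names and the `gamma` parameter are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def counting_loss(logits, target_counts):
    """Distance-sensitive counting loss (illustrative, not the paper's exact form).

    logits: (B, C) scores over candidate counts 0..C-1
    target_counts: (B,) ground-truth integer counts
    """
    probs = F.softmax(logits, dim=1)                                   # (B, C)
    counts = torch.arange(logits.size(1),
                          device=logits.device, dtype=probs.dtype)     # 0..C-1
    # |candidate count - true count| for every candidate, per sample
    dist = (counts.unsqueeze(0)
            - target_counts.unsqueeze(1).to(probs.dtype)).abs()        # (B, C)
    # Expected numerical distance: an off-by-one guess costs far less
    # than a wildly wrong one, unlike plain cross-entropy
    return (probs * dist).sum(dim=1).mean()

def classification_loss(logits, targets, gamma=2.0):
    """Focal-style reweighting as a stand-in for hard-easy sample mining:
    well-classified (easy) samples are down-weighted by (1 - p_t)^gamma."""
    log_p = F.log_softmax(logits, dim=1)                               # (B, C)
    log_p_t = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)         # (B,)
    p_t = log_p_t.exp()
    return (-((1.0 - p_t) ** gamma) * log_p_t).mean()

# Minimal usage example with random inputs
count_logits = torch.randn(4, 11)               # scores over counts 0..10
true_counts = torch.tensor([3, 0, 10, 7])
cls_logits = torch.randn(4, 5)                  # 5 answer classes
cls_targets = torch.tensor([1, 0, 4, 2])
loss = counting_loss(count_logits, true_counts) \
     + classification_loss(cls_logits, cls_targets)
```
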
- Published
- 2024