Scene-text visual question answering model based on a multimodal reasoning graph neural network.
- Source :
- Application Research of Computers / Jisuanji Yingyong Yanjiu. Jan2022, Vol. 39 Issue 1, p280-302. 6p.
- Publication Year :
- 2022
Abstract
- Poor text reading ability and inadequate visual reasoning are the main reasons existing visual question answering models perform poorly. To address these problems, this paper proposes a multimodal reasoning graph neural network (MRGNN) model. The model uses several forms of information in an image to help understand scene text content: it preprocesses the scene text image into a visual object graph and a text graph, and filters redundant information using a question self-attention module. An attention-based aggregator then refines node features across the subgraphs and fuses information from the different modalities. The updated nodes use context information from the different modules to better support the answering module. The paper validates MRGNN on the ST-VQA and TextVQA datasets. The experimental results show that MRGNN achieves good results compared with several classical models for this task. [ABSTRACT FROM AUTHOR]
Details
- Language :
- Chinese
- ISSN :
- 1001-3695
- Volume :
- 39
- Issue :
- 1
- Database :
- Academic Search Index
- Journal :
- Application Research of Computers / Jisuanji Yingyong Yanjiu
- Publication Type :
- Academic Journal
- Accession number :
- 154623795
- Full Text :
- https://doi.org/10.19734/j.issn.1001-3695.2021.06.0197