
Cross-Modal Retrieval with Improved Graph Convolution.

Authors :
ZHANG Hongtu
HUA Chunjian
JIANG Yi
YU Jianfeng
CHEN Ying
Source :
Journal of Computer Engineering & Applications; 6/1/2024, Vol. 60 Issue 11, p95-104, 10p
Publication Year :
2024

Abstract

To address the difficulty that existing image-text cross-modal retrieval methods have in fully exploiting intra-modal local consistency in the common subspace, a cross-modal retrieval method based on improved graph convolution is proposed. To improve local consistency within each modality, a modal graph is constructed with each individual sample as a node, fully mining the interaction information among features. To overcome the limitation that graph convolutional networks can only learn shallowly, initial residual connections and identity-mapped weights are added to each graph convolution layer to alleviate this limitation. So that higher-order and lower-order neighbor information jointly updates the central node's features, an improvement is proposed that reduces the number of neighbor nodes while increasing the number of graph convolution layers. To learn a common representation with high local consistency and semantic consistency, the weights of the common-representation learning layer are shared, and intra-modal semantic constraints and inter-modal invariance constraints are jointly optimized in the common subspace. Experimental results show that, on the Wikipedia and Pascal Sentence cross-modal datasets, the average mAP values across different retrieval tasks are 2.2%~42.1% and 3.0%~54.0% higher, respectively, than those of 11 existing methods. [ABSTRACT FROM AUTHOR]
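The "initial residual connection plus identity-mapped weights" idea the abstract describes matches the well-known GCNII-style propagation rule. Below is a minimal numpy sketch of one such layer, assuming the common formulation H^(l+1) = ReLU(((1-α)·P·H^(l) + α·H^(0)) · ((1-β)·I + β·W)); the function names, toy graph, and hyperparameter values are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def normalize_adj(A):
    # symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcnii_layer(H, H0, P, W, alpha=0.1, beta=0.5):
    # initial residual: mix propagated features with the layer-0 features H0,
    # so deep stacks keep a direct link to the input representation
    support = (1 - alpha) * (P @ H) + alpha * H0
    # identity mapping: shrink the learned weight matrix toward the identity
    Z = support @ ((1 - beta) * np.eye(W.shape[0]) + beta * W)
    return np.maximum(Z, 0.0)  # ReLU

# toy modal graph: 4 sample nodes connected in a path (hypothetical data)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = normalize_adj(A)
rng = np.random.default_rng(0)
H0 = rng.standard_normal((4, 8))
H = H0.copy()
for layer in range(16):          # a deep stack, as the abstract advocates
    W = np.eye(8)                # placeholder weights; learned in practice
    H = gcnii_layer(H, H0, P, W)
```

Because each layer re-injects H0, stacking many layers does not collapse all node features to a single value, which is what allows the method to trade wider neighborhoods for deeper propagation.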

Details

Language :
Chinese
ISSN :
1002-8331
Volume :
60
Issue :
11
Database :
Complementary Index
Journal :
Journal of Computer Engineering & Applications
Publication Type :
Academic Journal
Accession number :
178099709
Full Text :
https://doi.org/10.3778/j.issn.1002-8331.2302-0064