
Adaptive Text Denoising Network for Image Caption Editing.

Authors :
Mengqi Yuan
Bing-Kun Bao
Zhiyi Tan
Changsheng Xu
Source :
ACM Transactions on Multimedia Computing, Communications & Applications; 2023, Suppl. 1, Vol. 19, pp. 1-18
Publication Year :
2023

Abstract

Image caption editing, which aims at correcting inaccurate descriptions of images, is an interdisciplinary task of computer vision and natural language processing. Because the task requires simultaneously encoding an image and its corresponding inaccurate caption, then decoding to generate an accurate caption, the encoder-decoder framework is widely adopted for image caption editing. However, existing methods mostly focus on the decoder and ignore a major challenge on the encoder side: the semantic inconsistency between image and caption. To this end, we propose a novel Adaptive Text Denoising Network (ATD-Net) to filter out noise at the word level and improve the model's robustness at the sentence level. Specifically, at the word level, we design a cross-attention mechanism, the Textual Attention Mechanism (TAM), to identify misdescriptive words; TAM encodes the inaccurate caption word by word, conditioned on the content of both the image and the caption. At the sentence level, to minimize the influence of misdescriptive words on the semantics of the entire caption, we introduce a Bidirectional Encoder that extracts the correct semantic representation from the raw caption. The Bidirectional Encoder models the global semantics of the raw caption, which enhances the robustness of the framework. We extensively evaluate our approach on the MS-COCO image captioning dataset and demonstrate its effectiveness compared with state-of-the-art methods.
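The TAM described in the abstract is a cross-attention between caption words and image content used to flag misdescriptive words. The following PyTorch sketch shows one plausible reading of such a mechanism; it is an illustration only, not the authors' implementation, and all names, feature dimensions, and the sigmoid gating design are assumptions.

import torch
import torch.nn as nn

class TextualAttention(nn.Module):
    """Hypothetical sketch of a TAM-style word-level denoiser.

    Each caption word attends over image region features; a per-word
    gate then blends the word embedding with its visual context, so
    words poorly supported by the image are downweighted.
    """

    def __init__(self, word_dim=512, region_dim=2048, hidden_dim=512):
        super().__init__()
        self.q = nn.Linear(word_dim, hidden_dim)    # queries from caption words
        self.k = nn.Linear(region_dim, hidden_dim)  # keys from image regions
        self.v = nn.Linear(region_dim, hidden_dim)  # values from image regions
        self.gate = nn.Linear(word_dim + hidden_dim, 1)  # per-word noise gate

    def forward(self, words, regions):
        # words:   (batch, n_words, word_dim)      caption word embeddings
        # regions: (batch, n_regions, region_dim)  image region features
        q, k, v = self.q(words), self.k(regions), self.v(regions)
        attn = torch.softmax(q @ k.transpose(1, 2) / q.size(-1) ** 0.5, dim=-1)
        visual = attn @ v  # visual context aligned to each word
        # Gate in [0, 1]: low values suggest a likely misdescriptive word.
        g = torch.sigmoid(self.gate(torch.cat([words, visual], dim=-1)))
        return g * words + (1 - g) * visual, g

# Usage sketch: 5 caption words attending over 36 detected regions.
tam = TextualAttention()
denoised, gates = tam(torch.randn(1, 5, 512), torch.randn(1, 36, 2048))

In this reading, the gate values double as a soft indicator of which words are inconsistent with the image, which the decoder could then exploit when rewriting the caption.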

Details

Language :
English
ISSN :
1551-6857
Volume :
19
Database :
Complementary Index
Journal :
ACM Transactions on Multimedia Computing, Communications & Applications
Publication Type :
Academic Journal
Accession number :
161733400
Full Text :
https://doi.org/10.1145/3532627