Author: "Rita Parada Ramos" / Publisher: acm - Searchworks@Jio Institute Digital Library Search Results

Searchworks

Author: Bruno Martins and Rita Parada Ramos
Subjects: Closed captioning, Sequence, Semantic similarity, Computer science, Embedding, Natural language generation, Aerial image, Word (computer architecture), Remote sensing, Image (mathematics)
Abstract: Remote sensing image captioning involves generating a concise textual description for an input aerial image. Most previous methods are based on neural encoder-decoder models trained to generate a sequence of discrete outputs with the standard token-level cross-entropy loss. This paper explores an alternative method based on continuous outputs, generating sequences of embedding vectors instead of directly predicting discrete word tokens. We argue that continuous outputs can facilitate the optimization of semantic similarity, as opposed to exact word-by-word matches. They also facilitate the use of loss functions that compare different views of the data. This includes comparing representations for individual tokens and for entire captions, and also comparing captions against intermediate image representations. We experimentally compared discrete and continuous output methods on the RSICD dataset, which is extensively used in the area. Results show that continuous outputs can indeed lead to better results, and our approach performs competitively with the state-of-the-art model in the area.
Published: 2021
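
Illustration (not part of the catalog record or the paper): the abstract contrasts exact token matching with losses computed on embedding vectors. The minimal sketch below shows that general idea only. The cosine-based loss, the nearest-neighbour decoding step, the function names, and the toy embedding values are all assumptions made for illustration; the abstract does not specify which loss functions or embeddings the authors use.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_loss(predicted, target):
    """Mean (1 - cosine similarity) over the positions of one caption."""
    pairs = list(zip(predicted, target))
    return sum(1.0 - cosine_similarity(p, t) for p, t in pairs) / len(pairs)


def nearest_word(vector, embeddings):
    """Decode a predicted vector to the vocabulary word closest to it."""
    return max(embeddings, key=lambda w: cosine_similarity(vector, embeddings[w]))


# Hypothetical 3-dimensional word embeddings (made-up values).
EMBEDDINGS = {
    "river": [0.9, 0.1, 0.0],
    "stream": [0.8, 0.2, 0.1],
    "airport": [0.0, 0.1, 0.9],
}

# A one-token "caption" predicted as a continuous vector.
predicted = [[0.9, 0.12, 0.02]]

loss_river = cosine_loss(predicted, [EMBEDDINGS["river"]])
loss_stream = cosine_loss(predicted, [EMBEDDINGS["stream"]])
loss_airport = cosine_loss(predicted, [EMBEDDINGS["airport"]])
decoded = [nearest_word(v, EMBEDDINGS) for v in predicted]
```

With these toy values, a reference of "stream" is penalised only slightly more than "river", while "airport" is penalised heavily. A token-level cross-entropy loss would instead treat "stream" and "airport" as equally wrong substitutes for "river".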
