1. Remote Sensing Image Captioning with Continuous Output Neural Models
- Author
Bruno Martins and Rita Parada Ramos
- Subjects
Closed captioning, Sequence, Semantic similarity, Computer science, Embedding, Natural language generation, Aerial image, Word (computer architecture), Remote sensing, Image (mathematics)
- Abstract
Remote sensing image captioning involves generating a concise textual description for an input aerial image. Most previous methods are based on neural encoder-decoder models trained to generate a sequence of discrete outputs with the standard cross-entropy token-level loss. This paper explores an alternative method based on continuous outputs, generating sequences of embedding vectors instead of directly predicting discrete word tokens. We argue that continuous outputs can facilitate the optimization of semantic similarity, as opposed to exact word-by-word matches. They also facilitate the use of loss functions that compare different views of the data. This includes comparing representations for individual tokens and for entire captions, and also comparing captions against intermediate image representations. We experimentally compared discrete and continuous output methods on the RSICD dataset, which is extensively used in the area. Results show that continuous outputs can indeed lead to better results, and our approach performs competitively with the state-of-the-art model in the area.
- Published
- 2021