Multiscale Methods for Optical Remote-Sensing Image Captioning
- Author
- Ma Xiaofeng, Rui Zhao, and Zhenwei Shi
- Subjects
- Closed captioning, Computer science, Feature extraction, Pattern recognition, Geotechnical Engineering and Engineering Geology, Feature (computer vision), Benchmark (computing), Task analysis, Artificial intelligence, Electrical and Electronic Engineering
- Abstract
- Recently, the optical remote-sensing image-captioning task has become a research hotspot because of its application prospects in the military and civil fields. Many methods, along with data sets, have been proposed. Among them, models following the encoder–decoder framework perform better in many respects, such as generating more accurate and flexible sentences. However, almost all of these methods use a single fixed receptive field and pay little attention to capturing multiscale information, which leads to incomplete image representations. In this letter, we address the multiscale problem and propose two multiscale methods, named the multiscale attention (MSA) method and the multifeat attention (MFA) method, to obtain better representations for the captioning task in the remote-sensing field. The MSA method extracts features from different layers and uses the multihead attention mechanism to obtain a context feature for each of them. The MFA method enriches the context feature by combining target-level and scene-level features, using target detection as an auxiliary task. The experimental results demonstrate that both methods outperform the benchmark method on metrics such as BLEU, METEOR, ROUGE_L, and CIDEr.
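The core of the MSA idea described above is to let a decoder query attend, with multiple heads, over feature vectors drawn from different encoder layers (i.e., different scales). The sketch below is a minimal, hypothetical NumPy illustration of that mechanism, not the paper's implementation: random matrices stand in for learned projection weights, each encoder layer is assumed to be already pooled and projected to a common `d_model`, and the function name `multiscale_attention` is our own.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multiscale_attention(query, layer_feats, n_heads=2, seed=0):
    """Hypothetical sketch of multiscale multihead attention.

    query:       (d_model,) decoder state
    layer_feats: (n_layers, d_model), one pooled feature per encoder layer
    Returns a (d_model,) context vector and the per-head attention
    weights over the layers (scales).
    """
    d_model = query.shape[-1]
    assert d_model % n_heads == 0
    d_head = d_model // n_heads
    rng = np.random.default_rng(seed)
    # Random projections stand in for learned Wq/Wk/Wv (illustration only).
    Wq, Wk, Wv = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                  for _ in range(3)]
    q = (query @ Wq).reshape(n_heads, d_head)           # (h, d_head)
    k = (layer_feats @ Wk).reshape(-1, n_heads, d_head) # (L, h, d_head)
    v = (layer_feats @ Wv).reshape(-1, n_heads, d_head)
    # scaled dot-product scores of the query against each layer, per head
    scores = np.einsum('hd,lhd->hl', q, k) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)                     # weights over layers
    ctx = np.einsum('hl,lhd->hd', attn, v).reshape(d_model)
    return ctx, attn

# Toy usage: three "layers" (scales) with d_model = 8.
layer_feats = np.stack([np.full(8, s) for s in (1.0, 2.0, 3.0)])
query = np.ones(8)
ctx, attn = multiscale_attention(query, layer_feats)
```

Each head forms its own weighting over the scales, so the resulting context vector can mix coarse and fine features differently per head, which is the intuition behind attending across layers rather than over a single fixed receptive field.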
- Published
- 2021