Transformer with sparse self-attention mechanism for image captioning
- Source :
- Electronics Letters. 56:764-766
- Publication Year :
- 2020
- Publisher :
- Institution of Engineering and Technology (IET), 2020.
Abstract
- Recently, the transformer has been applied to image captioning, with a convolutional neural network and the transformer encoder together acting as the image encoder and the transformer decoder acting as the caption decoder. However, because its self-attention mechanism is dense, the transformer can be distracted by non-critical objects in a scene and may have difficulty fully capturing image information. In this Letter, to address this issue, the authors propose a novel transformer model with decreasing attention gates and an attention fusion module. Specifically, they first use attention gates to help the transformer overcome interference from non-critical objects and capture object information more efficiently, by truncating all attention weights that are smaller than the gate threshold. Second, by having each network layer inherit the attention matrix of the previous layer, the attention fusion module enables each layer to consider other objects without losing the most critical ones. Their method is evaluated on the benchmark Microsoft COCO dataset and achieves better performance than state-of-the-art methods.
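The abstract describes the two mechanisms only in words, so the following is a minimal NumPy sketch of one plausible reading, not the authors' implementation. Assumptions (none stated in the record): truncated weights are set to zero and each row is renormalized; the largest weight in each row is always kept so no row becomes empty; "decreasing" gates means the threshold shrinks with depth; fusion is a fixed convex combination with a hypothetical mixing coefficient `alpha`; and a single head with identity Q/K/V projections, no residuals and no feed-forward layers is used. The function names `gated_attention_weights` and `sparse_attention_stack` are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def gated_attention_weights(q: np.ndarray, k: np.ndarray, threshold: float) -> np.ndarray:
    """Scaled dot-product attention weights with a hard attention gate (hypothetical).

    Weights below `threshold` are zeroed; the row maximum is always kept
    (assumption) so every row keeps at least one key, then rows are
    renormalized to sum to 1 (assumption).
    """
    d = q.shape[-1]
    w = softmax(q @ k.T / np.sqrt(d))                       # dense (n_q, n_k) weights
    keep = (w >= threshold) | (w == w.max(axis=-1, keepdims=True))
    gated = np.where(keep, w, 0.0)                          # truncate small weights
    return gated / gated.sum(axis=-1, keepdims=True)        # renormalize each row


def sparse_attention_stack(x: np.ndarray, thresholds, alpha: float = 0.5):
    """Stack of gated self-attention layers with attention fusion (hypothetical).

    From the second layer on, the layer's gated attention matrix is mixed with
    the matrix inherited from the previous layer:
        fused = alpha * current + (1 - alpha) * previous
    Both inputs are row-stochastic, so the fused matrix is too.
    """
    prev = None
    maps = []
    for t in thresholds:                    # e.g. decreasing thresholds per layer
        w = gated_attention_weights(x, x, t)
        if prev is not None:
            w = alpha * w + (1.0 - alpha) * prev   # attention fusion
        x = w @ x                            # identity value projection (assumption)
        maps.append(w)
        prev = w
    return x, maps


# Usage: 5 hypothetical region features of dimension 8, three layers
# whose gate thresholds decrease with depth.
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 8))
out, maps = sparse_attention_stack(feats, thresholds=[0.3, 0.2, 0.1])
```

Keeping the row maximum is a safety choice for the sketch: with `n` keys the largest softmax weight is at least `1/n`, so a threshold above that could otherwise zero out a whole row. Because the fused matrix can reintroduce keys that the current gate truncated, later layers can still attend to objects dropped earlier, which is one way to read the abstract's claim that fusion lets each layer "consider other objects without losing the most critical ones".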
- Subjects :
- Artificial neural network
business.industry
Computer science
020208 electrical & electronic engineering
02 engineering and technology
Convolutional neural network
Object detection
0202 electrical engineering, electronic engineering, information engineering
Computer vision
Artificial intelligence
Electrical and Electronic Engineering
business
Encoder
Image retrieval
Decoding methods
Transformer (machine learning model)
Details
- ISSN :
- 1350-911X and 0013-5194
- Volume :
- 56
- Database :
- OpenAIRE
- Journal :
- Electronics Letters
- Accession number :
- edsair.doi...........1b0fd95ea47968b34afcb551eeac5c75