1. SwinHCST: a deep learning network architecture for scene classification of remote sensing images based on improved CNN and Transformer.
- Author
Song, Jiayin, Fan, Yiming, Song, Wenlong, Zhou, Hongwei, Yang, Liusong, Huang, Qiqi, Jiang, Zhuoyuan, Wang, Chuangqi, and Liao, Ting
- Subjects
DEEP learning, REMOTE sensing, TRANSFORMER models, IMAGE recognition (Computer vision), CLASSIFICATION, INFORMATION sharing
- Abstract
Remote sensing image scene classification is a fundamental task in the intelligent interpretation of remote sensing images. Although Transformers possess a powerful attention mechanism, they require lengthy training procedures to achieve good performance. To address this issue, this paper proposes a novel deep learning network model, named SwinHCST, that combines a CNN and the Swin Transformer. Firstly, the model uses a Weighted Normalized CNN to quickly extract low-level features of the image. Secondly, the Receptive Field Block module facilitates multi-scale information fusion. Thirdly, the Information Fusion Transformer further extracts the deep-level features of the image. Furthermore, this paper designs a plug-and-play Cross Spatial Information Fusion Block, which encodes dimensional information and extracts global information to enhance information exchange. The scene classification experiments show that the proposed model outperforms other methods on the three selected datasets and can achieve excellent performance without requiring large amounts of data and training. Specifically, the classification accuracy of the proposed method on the three datasets is 93.76%, 93.60%, and 98.10%, which is 1.7% to 3.71% higher than ResNet50 and 3.7% to 5.7% higher than Swin Transformer. [ABSTRACT FROM AUTHOR]
- Published
2023