CoupleUNet: Swin Transformer coupling CNNs makes strong contextual encoders for VHR image road extraction.
- Source :
- International Journal of Remote Sensing; Sep2023, Vol. 44 Issue 18, p5788-5813, 26p
- Publication Year :
- 2023
Abstract
- Accurately segmenting roads is challenging due to substantial intra-class variations, indistinct inter-class distinctions, and occlusions caused by shadows, trees, and buildings. To address these challenges, it is crucial to simultaneously attend to important texture details and perceive global geometric context. Recent research has shown that CNN-Transformer hybrid structures outperform architectures that use a CNN or a Transformer alone: CNNs excel at extracting local detail features, while Transformers naturally perceive global context. In this paper, we propose a dual-branch network module called ConSwin, which combines the advantages of ResNet and the Swin Transformer for better road information extraction. Taking ConSwin as the basic encoding module, we build an hourglass-shaped network with an encoder-decoder structure. To better transfer texture and structural detail information between the encoder and decoder, we also design two novel connection blocks. We conduct comparative experiments on the Massachusetts and CHN6-CUG datasets, and the proposed method outperforms state-of-the-art methods in terms of overall accuracy, IoU, and F1 metrics. Further experiments verify the effectiveness of our method, while visualization results demonstrate its ability to obtain better road representations. [ABSTRACT FROM AUTHOR]
- Subjects :
- TRANSFORMER models
- CONVOLUTIONAL neural networks
- DATA mining
Details
- Language :
- English
- ISSN :
- 0143-1161
- Volume :
- 44
- Issue :
- 18
- Database :
- Complementary Index
- Journal :
- International Journal of Remote Sensing
- Publication Type :
- Academic Journal
- Accession number :
- 172868715
- Full Text :
- https://doi.org/10.1080/01431161.2023.2255353