
Semantic Segmentation Method for Remote Sensing Urban Scenes Based on Swin-Transformer and Convolutional Neural Network


Abstract:

The task of semantic segmentation of urban scenes in remote sensing has extensive applications in land cover mapping, urban change detection, and environmental protection. However, due to the significant intra-class heterogeneity, inter-class similarity, and the presence of small-scale objects in remote sensing urban scenes, convolutional neural networks (CNNs) often struggle to fully utilize contextual information, resulting in low segmentation accuracy, incomplete segmentation, and misclassification of similar classes. To address these issues, we propose Swin-MDFF for semantic segmentation of remote sensing urban scenes. This method employs an encoder-decoder structure that combines the Swin-Transformer and CNNs. The Swin-Transformer serves as the encoder to extract multi-scale semantic features and contextual information, while a Multi-Scale Dilated Feature Fusion (MDFF) CNN serves as the decoder to aggregate the multi-scale semantic features, effectively exploiting both local and global context. The method has been tested on the Vaihingen and Potsdam datasets, outperforming mainstream networks with a mean Intersection over Union (mIoU) of 84.32% and 87.82%, and a mean F1-score (mF1) of 91.35% and 93.40%, respectively. The experimental results demonstrate the effectiveness of this method for the semantic segmentation of remote sensing urban scenes.
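The exact design of the MDFF decoder is not specified in this abstract, but its core ingredient, convolution at multiple dilation rates, can be illustrated with a minimal sketch. The function names `dilated_conv2d_same` and `mdff_fuse`, the choice of rates (1, 2, 4), and the averaging fusion are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def dilated_conv2d_same(x, kernel, dilation=1):
    """'Same'-padded 2D convolution of a single-channel image with a
    dilated kernel: the tap spacing grows with the dilation rate, so
    the receptive field enlarges without adding parameters."""
    kh, kw = kernel.shape
    eff_h = (kh - 1) * dilation + 1   # effective kernel height
    eff_w = (kw - 1) * dilation + 1   # effective kernel width
    xp = np.pad(x, ((eff_h // 2, eff_h // 2), (eff_w // 2, eff_w // 2)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Sample the padded input at dilated (strided) tap positions.
            patch = xp[i:i + eff_h:dilation, j:j + eff_w:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

def mdff_fuse(x, kernel, dilations=(1, 2, 4)):
    """Run parallel branches of the same kernel at several dilation
    rates and fuse them by averaging (one simple aggregation choice;
    real decoders typically concatenate and apply a learned 1x1 conv)."""
    branches = [dilated_conv2d_same(x, kernel, d) for d in dilations]
    return np.mean(branches, axis=0)

# Example: fuse three dilation scales of a 3x3 averaging kernel.
x = np.arange(36, dtype=float).reshape(6, 6)
fused = mdff_fuse(x, np.full((3, 3), 1.0 / 9.0))  # shape (6, 6)
```

Because each branch sees a different effective receptive field (3x3, 5x5, and 9x9 here), the fused map combines fine local detail with wider context, which is the intuition behind aggregating multi-scale features in the decoder.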
Date of Conference: 20-22 September 2024
Date Added to IEEE Xplore: 04 November 2024
Conference Location: Chongqing, China

I. Introduction

With the continuous advancement of sensor technology, a large number of high-resolution remote sensing images are being captured and used for semantic segmentation tasks in urban scenes. Remote sensing urban scene semantic segmentation can effectively address various issues in urban planning [1], promote the rational use of urban land [2], and accurately monitor changes in urban buildings [3]. Additionally, it is crucial for monitoring urban road traffic facilities [4], green space planning [5], and environmental monitoring [6].

References
[1] E. Maggiori, Y. Tarabalka, G. Charpiat and P. Alliez, "Convolutional neural networks for large-scale remote-sensing image classification", IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 2, pp. 645-657, 2016.
[2] J. Liu, Z.X. Zhang and S.W. Zhang, "Review and prospect of remote sensing research on land use change in China: based on the guidance of Shupeng's academic thought", Journal of Geoinformation Science, vol. 22, no. 4, pp. 680-687, 2020.
[3] M. Vakalopoulou, K. Karantzalos, N. Komodakis and N. Paragios, "Building detection in very high resolution multispectral data with deep learning features", 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2015.
[4] P. Shamsolmoali, M. Zareapoor and H. Zhou, "Road Segmentation for Remote Sensing Images Using Adversarial Spatial Pyramid Networks", IEEE Transactions on Geoscience and Remote Sensing, vol. 59, pp. 4673-4688, 2020.
[5] J.X. Gao and H.W. Wang, "Construction and application of remote sensing evaluation method for large-scale ecological quality", Journal of Remote Sensing, vol. 27, no. 12, pp. 2860-2872, 2024.
[6] H.C. Liu and L. Zhang, "Type feature oriented adaptive threshold change detection in remote sensing images", Journal of Remote Sensing, vol. 24, no. 6, 2020.
[7] J. Long, E. Shelhamer and T. Darrell, "Fully Convolutional Networks for Semantic Segmentation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, pp. 640-651, 2015.
[8] O. Ronneberger, P. Fischer and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation", International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Springer, 2015.
[9] C. Peng, X. Zhang and G. Yu, "Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network", 2017.
[10] L.C. Chen, G. Papandreou and F. Schroff, "Rethinking Atrous Convolution for Semantic Image Segmentation", arXiv preprint, 2017.
[11] L. Wang, C. Zhang and R. Li, "Scale-aware Neural Network for Semantic Segmentation of Multi-resolution Remotely Sensed Images", Remote Sensing, 2021.
[12] L. Chen, Y. Zhu and G. Papandreou, "Encoder-decoder with atrous separable convolution for semantic image segmentation", Proceedings of the European Conference on Computer Vision (ECCV), 2018.
[13] H. Zhao, J. Shi and X. Qi, "Pyramid scene parsing network", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[14] A. Vaswani, N. Shazeer and N. Parmar, "Attention is all you need", Advances in Neural Information Processing Systems, vol. 30, 2017.
[15] A. Dosovitskiy, L. Beyer and A. Kolesnikov, "An image is worth 16×16 words: Transformers for image recognition at scale", arXiv preprint, 2020.
[16] S. Zheng, J. Lu and H. Zhao, "Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[17] Z. Liu, Y. Lin and Y. Cao, "Swin Transformer: Hierarchical vision transformer using shifted windows", Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
[18] R. Niu, X. Sun and Y. Tian, "Hybrid multiple attention network for semantic segmentation in aerial images", IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1-18, 2021.
[19] Q. Zhang and Y.B. Yang, "ResT: An efficient transformer for visual recognition", Advances in Neural Information Processing Systems, vol. 34, pp. 15475-15485, 2021.
[20] L. Wang, R. Li and D. Wang, "Transformer meets convolution: A bilateral awareness network for semantic segmentation of very fine resolution urban scene images", Remote Sensing, vol. 13, no. 16, p. 3065, 2021.
[21] D. Wang, J. Zhang and B. Du, "An empirical study of remote sensing pretraining", IEEE Transactions on Geoscience and Remote Sensing, 2022.
[22] L. Wang, R. Li and C. Duan, "A novel transformer based semantic segmentation scheme for fine-resolution remote sensing images", IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1-5, 2022.
[23] T. Panboonyuen, K. Jitkajornwanich and S. Lawawirojwong, "Transformer-based decoder designs for semantic segmentation on remotely sensed images", Remote Sensing, vol. 13, no. 24, p. 5100, 2021.
[24] J. Zhang, L. Zhao, H. Jiang, S. Shen and J. Wang, "Hyperspectral Image Classification Based on Dense Pyramidal Convolution and Multi-Feature Fusion", Remote Sensing, vol. 15, no. 12, p. 2990, 2023.
[25] H. Wang, S. Sun, X. Bai, J. Wang and P. Ren, "A Reinforcement Learning Paradigm of Configuring Visual Enhancement for Object Detection in Underwater Scenes", IEEE Journal of Oceanic Engineering, vol. 48, no. 2, pp. 443-461, 2023.