
Semantic Segmentation Method for Remote Sensing Urban Scenes Based on Swin-Transformer and Convolutional Neural Network


Abstract:

The task of semantic segmentation of urban scenes in remote sensing has extensive applications in land cover mapping, urban change detection, and environmental protection. However, due to the significant intra-class heterogeneity, inter-class similarity, and the presence of small-scale objects in remote sensing urban scenes, convolutional neural networks (CNNs) often struggle to fully utilize contextual information, resulting in low segmentation accuracy, incomplete segmentation, and misclassification of similar classes. To address these issues, we propose Swin-MDFF for semantic segmentation of remote sensing urban scenes. This method employs an encoder-decoder structure that combines the Swin-Transformer and CNNs. The Swin-Transformer serves as the encoder to extract multi-scale semantic features and contextual information, while Multi-Scale Dilated Feature Fusion (MDFF) CNNs serve as the decoder to aggregate multi-scale semantic features, effectively considering both local and global contextual information. This method has been tested on the Vaihingen and Potsdam datasets, outperforming mainstream networks with mean Intersection over Union (mIoU) scores of 84.32% and 87.82%, and mean F1-scores (mF1) of 91.35% and 93.40%, respectively. The experimental results demonstrate the effectiveness of this method in the semantic segmentation of remote sensing urban scenes.
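The abstract reports results in mIoU and mF1, the standard evaluation metrics for this task. As context, the following is a minimal sketch of how these metrics are typically computed from a pixel-level confusion matrix; the function names are illustrative and not taken from the paper.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Accumulate per-class pixel counts: rows = ground truth, cols = prediction."""
    mask = (target >= 0) & (target < num_classes)  # ignore out-of-range labels
    return np.bincount(
        num_classes * target[mask] + pred[mask],
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)

def miou_mf1(conf):
    """Mean IoU and mean F1 over classes.

    Per class: IoU = TP / (TP + FP + FN), F1 = 2*TP / (2*TP + FP + FN).
    Note: a class absent from both prediction and ground truth yields NaN
    (zero denominator); such classes would normally be excluded.
    """
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou.mean(), f1.mean()
```

In practice, the confusion matrix is accumulated over all test tiles before the means are taken, so that rare classes are not dominated by per-image averaging.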
Date of Conference: 20-22 September 2024
Date Added to IEEE Xplore: 04 November 2024
Conference Location: Chongqing, China

I. Introduction

With the continuous advancement of sensor technology, a large number of high-resolution remote sensing images are being captured and used for semantic segmentation tasks in urban scenes. Remote sensing urban scene semantic segmentation can effectively address various issues in urban planning [1], promote the rational use of urban land [2], and accurately monitor changes in urban buildings [3]. Additionally, it is crucial for monitoring urban road traffic facilities [4], green space planning [5], and environmental monitoring [6].
