E-UNetFormer for High Accuracy Semantic Segmentation of Urban Remote Sensing Images | IEEE Conference Publication | IEEE Xplore

Abstract:

Semantic segmentation of urban remote sensing imagery is useful in many applications, including strategic urban planning and the management of land resources. Conventional architectures based on machine learning can integrate the self-attention mechanism of transformers to capture the global context within images. However, there are still substantial challenges in bridging the semantic relationships across disparate image regions, which can reduce segmentation accuracy and cause pronounced errors at object edges. To address these issues, this paper introduces E-UNetFormer, in which an Enhanced Feature Refinement Head (E-FRH) is designed to re-weight the features in the channel dimension, narrowing the semantic gap between shallow and deep features to enhance multi-scale feature extraction. Furthermore, an Edge Guided Context Module (EGCM) is proposed to augment the extraction of edge-region information through edge detection techniques. Empirical results indicate that E-UNetFormer achieves a mean Intersection over Union (mIoU) of 53.5% on the LoveDA dataset and 69.0% on the UAVid dataset, two datasets with different resolutions. In particular, on the LoveDA dataset, the mIoU for building segmentation exceeds that of UNetFormer by 4.7%. The proposed model outperforms UNetFormer and other methods on several datasets, which validates its efficacy.
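
The mIoU metric reported above can be illustrated with a minimal sketch. It assumes flattened lists of per-pixel integer class labels and averages the per-class IoU over the classes that occur in either the prediction or the ground truth. The function name `mean_iou` and the toy labels are hypothetical, and the evaluation protocol actually used for LoveDA and UAVid (for example, which classes are ignored) may differ.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean Intersection over Union over classes present in pred or target.

    Illustrative helper only; not taken from the paper.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    # confusion[t][p] counts pixels whose true class is t and predicted class is p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fp = sum(confusion[t][c] for t in range(num_classes)) - tp  # predicted c, truly another class
        fn = sum(confusion[c]) - tp  # truly c, predicted as another class
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels (made-up labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

Per-class IoU is TP / (TP + FP + FN), so a class is penalized both for pixels it misses and for pixels it wrongly claims, which is why errors along object edges lower the score.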
Date of Conference: 12-14 July 2024
Date Added to IEEE Xplore: 04 October 2024
Conference Location: Xi’an, China

I. Introduction

In recent years, rapid progress in sensor technology has substantially increased the resolution of urban remote sensing imagery. The intelligent interpretation of the spatial detail and latent semantic content within this imagery has unlocked a wide array of applications across various domains. Notably, these include high-resolution Earth observation, strategic urban construction planning, and the management of land resources [1]–[3]. However, high-precision semantic segmentation of urban remote sensing imagery is challenging due to the inherent diversity of terrestrial objects, fluctuations in lighting conditions, the presence of shadows, and the impact of occlusions. Moreover, high-resolution urban remote sensing imagery entails a substantial volume of data. To this end, efficient computational algorithms capable of real-time processing and analysis have gained significant interest [4].
