I. Introduction
With the continuous advancement of sensor technology, a large number of high-resolution remote sensing images are being captured and used for semantic segmentation tasks in urban scenes. Remote sensing urban scene semantic segmentation can effectively address various issues in urban planning [1], promote the rational use of urban land [2], and accurately monitor changes in urban buildings [3]. Additionally, it is crucial for monitoring urban road traffic facilities [4], green space planning [5], and environmental monitoring [6].