I. Introduction
Semantic segmentation performs pixelwise prediction on remote sensing images, assigning a class label to every pixel. It supports applications such as thematic mapping [1], vehicle extraction [2], and high-definition (HD) maps for autonomous driving [3]. Consequently, semantic segmentation is one of the fundamental topics in the remote sensing field, especially for very high-resolution (VHR) remote sensing images [4]. VHR images contain fine-grained spatial details but only a few spectral channels. The fine-grained details lead to relatively small between-class spectral variability and relatively large within-class variability. Meanwhile, the small number of spectral channels yields less distinctive spectral signatures [4]. These two problems pose serious challenges to the semantic segmentation of VHR images.