I. Introduction
With the rapid development of remote sensing technology, an increasing number of remote sensing images with high resolution (e.g., 5–10 cm) can be obtained. In these images, small objects, such as cars and trees, can be clearly observed, which makes pixel-level semantic segmentation possible. Image semantic segmentation allocates pixelwise semantic labels in an image, segmenting the image into several regions based on semantic units. Semantic segmentation of remote sensing images has a wide range of applications, including environmental monitoring [1], postdisaster reconstruction [2], agriculture [3], forestry [4], and urban planning [5], [6]. Traditional image semantic segmentation methods [7]–[10] manually design feature classifiers but do not achieve satisfactory performance due to their limited feature extraction and data fitting abilities. These shortcomings have prompted researchers to search for more stable and efficient methods.