I. Introduction
Currently road segmentation in remote sensing (RS) images has become one of the crucial tasks in many urban applications such as traffic management, urban planning, and road monitoring. Meanwhile, it is tremendously time-consuming to manually label roads from the high-resolution images. Unsupervised models, which are based on the predefined features, achieved low accuracy and failed on heterogeneous regions. However, supervised deep learning models, have achieved high performance in most of computer vision tasks, such as object detection [1]–[3], semantic segmentation [4]–[6], and skeleton extraction [7]. With the improvement of convolution neural networks, road detection from RS images tends to be an efficient and effective process.