1. INTRODUCTION
Road extraction from remote sensing imagery plays an important role in many applications, such as urban planning, navigation, autonomous driving, and geographic information updating. With the rapid development of sensing platforms, high-resolution remote sensing imagery offers a promising avenue for automatic road network extraction, but the task remains challenging. Existing road network extraction methods can be categorized as segmentation-based or graph-based. Segmentation-based methods classify each pixel as road or non-road [1] [2] [3]; manual interpretation or complex post-processing is then applied to the segmentation results to produce the road network. Although existing studies report promising results using convolutional neural networks for segmentation and extraction [4] [5] [6], pixel-based road segmentation still suffers from the occlusion, noise, and complex backgrounds found in high-resolution remote sensing imagery. Graph-based methods [7] [8] construct the road network iteratively: they start from a random node and apply rules based on prior experience to grow the whole road network as a graph. Graph-based methods can generate accurate road network maps, but they usually require considerable prior knowledge and human interaction, which makes them less robust to domain variations. In addition, because graph generation is usually iterative, the neural network tends to focus on local information. To avoid local optima, recent work [9] proposes a sequential generative model that makes the network aware of global information.
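As a toy illustration of the iterative graph construction described above, the following Python sketch grows a road graph from a seed pixel on a binary road mask. It is not the method of [7], [8], or [9]: the function name `trace_road_graph`, the `mask` layout, and the 4-neighbour expansion rule are assumptions made only for this example, standing in for the learned or hand-crafted rules that real graph-based methods use.

```python
from collections import deque


def trace_road_graph(mask, start):
    """Grow a graph from `start` by stepping to adjacent road pixels.

    `mask` is a list of rows of 0/1 values (1 = road); `start` is a
    (row, col) tuple assumed to lie on a road pixel. Both are
    hypothetical inputs chosen for this sketch.
    """
    rows, cols = len(mask), len(mask[0])
    nodes = {start}
    edges = set()
    frontier = deque([start])
    while frontier:
        r, c = frontier.popleft()
        # Each expansion step only inspects the immediate neighbourhood.
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nxt = (r + dr, c + dc)
            inside = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if not inside or not mask[nxt[0]][nxt[1]]:
                continue
            # Store each undirected edge once, in sorted order.
            edges.add(tuple(sorted(((r, c), nxt))))
            if nxt not in nodes:
                nodes.add(nxt)
                frontier.append(nxt)
    return nodes, edges


# A small cross-shaped junction.
cross = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
cross_nodes, cross_edges = trace_road_graph(cross, start=(1, 1))

# A straight road with a one-pixel gap (for example, an occluded pixel).
occluded = [[1, 1, 0, 1, 1]]
occluded_nodes, _ = trace_road_graph(occluded, start=(0, 0))
```

In this sketch the junction yields five nodes and four edges, while the road with a gap is traced only up to the missing pixel. Because every step relies on local evidence alone, a single occluded pixel is enough to truncate the graph, which is one way to picture why purely local iterative generation can miss global structure.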