I. Introduction
Road segmentation in the remote sensing images is the hot research issue due to its vital applications in cartography, traffic planning, vehicle navigation, intelligent transportation system, and so on. Despite a huge number of works dealing with this problem, the results are still limited and not generic enough to solve the lack of information due to the existence of artifacts such as overcast of building shadows, roadside trees, or cars [1]. With the development of computer vision, deep learning technologies have been widely used in the field of road segmentation. However, training these deep learning models demands massive training samples, while annotating such a large number of samples is a tough, boring, and expensive task. In order to reduce the label cost of training data, weakly supervised semantic segmentation has become a research hotspot in recent years.