1. INTRODUCTION
Scene classification and segmentation play increasingly significant roles in the remote sensing images (RSI) interpretation field, which provide basic information for many high-level applications [1], [2], [3], [4], [5], such as urban planning, disaster response and management, forest monitoring, agricultural management, etc. Due to high capacity and effective way to learn semantic features, supervised deep learning algorithms almost dominate the filed of scene classification and segmentation [6], [7], which rely on requires tremendous labels. Driven by the rapidly development of remote sensing devices and platforms, the RSI with different sources and resolutions are constantly generated. However, Only a some portion of RSI were labeled with high quality. To this end, leveraging unlabeled data to improve model performance has gained more attention in the remote sensing community.