1. INTRODUCTION
Semantic segmentation (SS) is a technique that assigns each pixel in an image to its corresponding category, making it widely applicable in RSIs [1][2][3]. In the field of RSI SS, deep learning methods based on supervised learning have demonstrated remarkable performance by leveraging their robust feature extraction capabilities [4][5]. However, with the increasing resolution of optical RSIs, there has been a significant rise in both human and time costs associated with data annotation. Therefore, an increasing number of researchers are turning their attention to weakly supervised methods and exploring deep learning-based approaches for processing RSI with limited annotations [6][7][8]. These methods aim to train models using weak annotations, thereby reducing the human and time costs associated with annotation.