I. Introduction
Thanks to the rapid development of satellite sensors, multimodal satellite images are widely used in military systems, environmental monitoring, surveying, and mapping services [1], [2], [3], [4]. At the same time, fine-resolution land cover and land use mapping, change detection, and multimodal classification have received increasing attention [5], [6], [7], [8]. Owing to the constraints of satellite imaging systems and other factors, most remote sensing satellites provide pairs of complementary images: multispectral (MS) images with high spectral resolution but low spatial resolution (LR), and panchromatic (PAN) images with high spatial resolution (HR) but low spectral resolution [9], [10]. Because of its high spatial resolution, the PAN image can precisely characterize the geometric details of ground objects, which is very helpful for remote sensing interpretation, whereas the richer spectral information of the MS image is typically used to identify the categories of ground objects [11], [12], [13].