I. Introduction
Remote sensing images offer a large observation range and fast information acquisition. They provide important information services for resource exploration, environmental monitoring, land surveying, and other applications [1], [2]. With the development of remote sensing technology, more and more types of remote sensing data have become available to improve the accuracy of landcover classification [3], [4], [5]. Among them, hyperspectral images (HSIs) are the most widely used data type for land classification, because they carry abundant spectral information and can capture subtle spectral differences among materials [6], [7].
Research on HSIs is extensive. For example, in [8], a cycle-consistency unmixing network (CyCU-Net) was proposed to address the loss of material information in traditional autoencoders for hyperspectral unmixing. Specifically, a self-perception loss consisting of two spectral reconstruction terms and one abundance reconstruction term was introduced to achieve cycle consistency. To solve the generalized bilinear model (GBM) for nonlinear hyperspectral unmixing, Gao et al. [9] proposed an effective method using low-rank abundance maps and nonlinear interaction abundance maps. Su et al. [10] developed a method based on normalized spectral clustering with kernel-based learning (NSCKL) for HSI classification, in which a kernel-based iterative filter (KIF) was designed to establish connections between pixels. In [11], a novel blind-spot self-supervised learning network (BS3LNet) was proposed for hyperspectral anomaly detection (HAD). The network reconstructs the HSI using a blind-spot architecture; that is, when reconstructing a pixel, its receptive field excludes the pixel itself and uses only information from adjacent pixels.
Later, in [12], a chessboard topology-based anomaly detection (CTAD) method was proposed for HAD, which can simultaneously mine high-dimensional data features and dissect the HSI adaptively. For unsupervised multispectral-aided hyperspectral image super-resolution (MS-aided HS-SR), Gao et al. [13] proposed an enhanced unmixing-inspired unsupervised network with attention-embedded degradation learning (EU2ADL) to improve the feature representation ability of unsupervised networks. The method uses two coupled autoencoders to decompose the input images into abundances and corresponding endmembers, and trains the network with a mixed model-constrained loss, comprising a perceptual abundance term and a degradation-guided term, to eliminate image distortion.
However, HSIs are prone to inconsistency between spectra and ground objects, which degrades classification performance. Meanwhile, light detection and ranging (LiDAR) data have no advantage in spectral detection, but they offer an excellent ability to acquire structural information [14], [15], which is valuable for complementing the description of a scene captured by optical sensors alone. Other available data types, including multispectral images (MSIs) and synthetic aperture radar (SAR) data, each have their own advantages and disadvantages in landcover classification. Therefore, making appropriate use of the information from multiple remote sensors can improve classification performance.