I. Introduction
Brain-computer interfaces (BCIs) based on the motor imagery (MI) paradigm recognize human motion intention by decoding electroencephalogram (EEG) signals, and they play an increasingly vital role in neurological rehabilitation. However, the non-stationary nature of EEG signals, which exhibit significant variability across sessions and subjects, presents a substantial challenge: it is difficult to develop an efficient MI decoding method that works reliably across different subjects and sessions [1].

With the development of deep learning (DL), many studies have been devoted to designing adaptable models that mitigate the effects of EEG variability by learning robust features. For example, Altaheri et al. [2] employed an attention temporal convolutional network for cross-session MI classification. Zhang et al. [3] used a bidirectional recurrent neural network (RNN) to distinguish brain states. Additionally, Shi et al. [4] proposed a multiband EEG Transformer that employs temporal self-attention and spatial self-attention to decode brain states.

Despite these advances, DL models remain sensitive to their training data and prone to overfitting, which limits their generalization to unseen data. This sensitivity becomes particularly problematic when models are applied to new subjects or sessions, referred to as domains, where domain shift leads to performance degradation. Consequently, although DL models have made significant strides in MI decoding, cross-session and cross-subject brain-state decoding remains one of the central challenges in the EEG field.
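To make the attention mechanisms mentioned above concrete, the following is a minimal NumPy sketch of applying self-attention along the temporal and spatial (channel) axes of a single EEG trial. This is an illustrative sketch only, not the implementation of [2]-[4]: the channel count (22) and sample count (256) are hypothetical, and the query/key/value projections are left as identities for brevity, whereas a real model would use learned projection matrices.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    # Scaled dot-product self-attention over the rows of X (tokens x features).
    # Identity Q/K/V projections keep the sketch parameter-free (assumption).
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)        # (tokens, tokens) pairwise similarity
    return softmax(scores, axis=-1) @ X  # attention-weighted mixture of rows

rng = np.random.default_rng(0)
eeg = rng.standard_normal((22, 256))     # hypothetical trial: 22 channels x 256 samples

# Spatial self-attention: channels act as tokens, time samples as features.
spatial_out = self_attention(eeg)        # (22, 256)

# Temporal self-attention: transpose so time samples act as tokens.
temporal_out = self_attention(eeg.T).T   # (22, 256)

print(spatial_out.shape, temporal_out.shape)  # → (22, 256) (22, 256)
```

The two passes differ only in which axis is treated as the token axis; Transformer-based EEG decoders combine such temporal and spatial attention (with learned projections, multiple heads, and normalization) to capture dependencies across both time and electrodes.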