I. Introduction
In recent years, feature recognition systems have become an important tool in biological signal processing research for solving complex problems in medical treatment, health monitoring, emotion recognition, and related fields. However, biological signals are often non-stationary and noisy, and their feature distributions vary, which poses great challenges for traditional feature recognition methods. Self-supervised learning has been introduced into biological signal processing, where it improves the robustness and accuracy of feature extraction by generating pseudo-labels and using multi-task learning. Recent research on biological signal processing has focused on how to extract key features from signals effectively and improve system efficiency and accuracy while reducing the amount of data required. Reference [1] proposed a biological signal feature recognition method based on multimodal fusion, which achieves more accurate feature extraction by combining multi-source data, effectively addressing the problem of insufficient features from a single signal source. However, this method relies on a large amount of labeled data, so its annotation cost is high. To address this, reference [2] proposed a feature extraction method based on deep autoencoders. It extracts features through unsupervised learning, but it handles high-noise signals poorly and is prone to inaccurate feature extraction. Reference [3] introduced a generative adversarial network (GAN) for data augmentation. By generating pseudo-samples, it improves the generalization ability of the model, which partially alleviates the difficulty of feature extraction under small-sample conditions. However, GANs are difficult to train in practice and require substantial computing resources. Reference [4] proposed a time-series signal feature extraction method based on self-supervised learning.
Using data perturbation and a contrastive learning framework, this method improves the model's feature extraction ability on unlabeled data and achieves good recognition results. However, it is less effective when processing multimodal signals. To address this, reference [5] applied self-supervised learning within a multi-task learning framework. This approach exploits the intrinsic relationships in the data to improve the robustness of feature recognition, reduces the dependence on manually labeled data, and improves the generalization ability and stability of the model. In addition, the self-supervised learning model proposed in reference [6] has shown strong applicability in medical image processing and effectively improves the recognition accuracy of lesion areas. These studies provide important references for applying self-supervised learning to biological signal processing, but feature extraction accuracy and system efficiency remain inadequate for biological signal processing tasks with high diversity and high noise.