I. Introduction
Semantic segmentation, a fundamental problem in computer vision, aims to predict a class label for every pixel of an image. FCN replaces the fully connected layers in CNNs with convolutional layers and uses upsampling operations to restore image resolution. Later, U-Net adopts an encoder-decoder structure and uses skip connections to pass multi-level encoder features to the decoder, enriching the input of its convolutional layers. More recently, DeepLab V3+ explores a feature fusion strategy common in object detection, improving both segmentation speed and segmentation performance. Although these methods have advanced semantic segmentation, they require manual pixel-level labels for every training sample and cannot segment categories unseen during training. Researchers have therefore begun to investigate few-shot semantic segmentation (FSS) methods.