1. Introduction
Training high-performance semantic segmentation models [1], [3], [18], [24], [42] based on convolutional neural networks [12], [27], [38], [47] typically requires large amounts of human-annotated training data. In particular, pixel-level annotations are essential for training a satisfactory segmentation model. However, human annotation is usually costly and labor-intensive. Moreover, these models almost always fail to segment novel (unseen) objects when only a few annotated training images, or even a single one, are available. To address these issues, few-shot semantic segmentation (FSS) [25] has become an active research topic. Conventional zero- and few-shot classification models [28], [36], [37] mitigate the data annotation and novel object recognition issues in the high-level semantic category space. FSS aims to alleviate the same issues in the low-level image pixel space, in the object segmentation scenario.