I. Introduction
The astonishing performance of deep learning relies on large amounts of data, which are not always available. Humans, by contrast, can learn new tasks much more quickly by leveraging prior experience to relate knowledge across tasks. Inspired by this property of human intelligence, meta-learning (also known as learning to learn) [1] acquires transferable knowledge from existing tasks in the form of embedding functions [2], [3], [4], initial parameters [5], [6], optimization strategies [7], [8], or models that directly map training samples to network parameters [9], [10]. Recent developments adopt more advanced techniques such as transductive inference [11], [12] and causal intervention [13] to achieve further improvements. Although meta-learning has shown success in fields such as few-shot image classification and cold-start recommendation, most existing methods assume that all tasks are drawn from a single distribution, and they struggle to handle tasks that come from different underlying distributions, a problem known as task heterogeneity [14], [15], [16].