I. Introduction
Artificial intelligence (AI) applications are widely deployed on edge devices or clients, e.g., classifying image objects [1], [2], translating text [3], and recognizing patterns in social graphs [4]. Federated learning (FL) [5] enables edge clients to collaboratively learn a global model by aggregating their local models without exchanging raw data. Federated continuous learning (FCL) extends FL to scenarios where clients incrementally train models over a sequence of tasks, each characterized by its own data distribution. For example, Fig. 1(a) illustrates an FCL scenario with n edge clients in an image classification application. Each client learns its own sequence of heterogeneous tasks, i.e., each task has distinct class labels (e.g., cars, ships, and houses on client 1 versus cats, dogs, and horses on client 2) and its own subspace of the input feature distribution. Fig. 1(c) further shows that such heterogeneous tasks are common in edge AI applications across data modalities, such as graph [4], text [3], and multi-modal data [1], [2]. It is exceedingly challenging to learn AI models that generalize over such heterogeneous tasks, which vary across clients and over time, especially on resource-constrained edge devices.