I. Introduction
In recent years, edge devices such as mobile and IoT devices have become the most popular platforms for providing intelligent services, owing to their rapidly growing computing and communication capacity [1], [2]. Unlike traditional computing platforms, edge devices are typically located at the edge of the network and generate large volumes and diverse types of big data [3]–[5]. In the traditional cloud computing paradigm, data are collected from end devices, then shuffled and evenly distributed to computing nodes in the cloud center; the training data on each node are thus essentially a random sample of the whole data set and are therefore independent and identically distributed (IID) [6], [7]. In the distributed computing paradigm, by contrast, users tend to keep raw data locally, both to protect their privacy and to reduce the bandwidth consumed by transmitting data to a remote server [8]–[10].

Owing to these advantages, Federated Learning has emerged as a popular approach for future AI applications: training is conducted locally on end devices instead of in a centralized fashion [11]–[13]. Specifically, training on local data contributes to the optimization of a shared model that enables intelligent services [14], [15]. However, the data sets of different end users may not be IID; for example, photographs taken at different locations may have entirely different characteristics, which complicates ML applications such as target detection. This non-IID issue is inevitable and must be accounted for in Federated Learning algorithms [16].
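To make the contrast between IID and non-IID client data concrete, the following minimal sketch (hypothetical sizes, NumPy only, not taken from any specific Federated Learning system) builds both an IID partition and a label-skewed non-IID partition of a toy labeled data set:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labeled data set: 1000 samples with 10 classes (hypothetical sizes).
num_samples, num_classes, num_clients = 1000, 10, 5
labels = rng.integers(0, num_classes, size=num_samples)

# IID partition: shuffle globally, then split evenly, so each client's
# label distribution approximates the global one.
perm = rng.permutation(num_samples)
iid_parts = np.array_split(perm, num_clients)

# Non-IID (label-skew) partition: sort by label before splitting, so each
# client only sees a narrow slice of the label space.
order = np.argsort(labels)
noniid_parts = np.array_split(order, num_clients)

def classes_seen(part):
    """Number of distinct classes present in one client's shard."""
    return len(np.unique(labels[part]))

print([classes_seen(p) for p in iid_parts])     # each shard covers most classes
print([classes_seen(p) for p in noniid_parts])  # each shard covers few classes
```

Under the IID partition each client's shard is a random sample of the global distribution, whereas under the label-skew partition each client holds only a few classes, which is the kind of heterogeneity that degrades naive federated averaging.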