I. Introduction
With the breakthrough of machine learning (ML), intelligent services and applications have witnessed thriving development in recent years. However, centralized ML requires transmitting large amounts of data to the cloud for model training, which leads to significant resource consumption and privacy concerns [1]. Hence, federated learning (FL) has received widespread attention from both academia and industry. In particular, FL enables multiple devices to collaboratively train a shared ML model by exchanging model parameters rather than large amounts of private data, thus effectively reducing resource consumption and enhancing privacy protection [2]. Despite these benefits, FL still faces challenges in data heterogeneity [3], [4] and communication overhead [5], [6], which can significantly degrade FL performance.
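To make the parameter-exchange idea concrete, the following is a minimal sketch of one FL training loop in the FedAvg style: each client runs local gradient steps on its private data, and the server aggregates only the resulting model parameters, weighted by local dataset size. The linear least-squares model, client data, and all function names here are illustrative assumptions, not part of any specific system described above.

```python
import numpy as np

def client_update(weights, data, labels, lr=0.1, epochs=1):
    # One client's local training: gradient descent on a least-squares loss.
    # Raw (data, labels) never leave the client; only weights are returned.
    w = weights.copy()
    for _ in range(epochs):
        grad = data.T @ (data @ w - labels) / len(labels)
        w -= lr * grad
    return w

def federated_round(global_w, clients):
    # One FL round: every client trains locally, then the server averages
    # the returned parameters, weighted by each client's dataset size.
    updates, sizes = [], []
    for data, labels in clients:
        updates.append(client_update(global_w, data, labels))
        sizes.append(len(labels))
    total = float(sum(sizes))
    return sum(w * (n / total) for w, n in zip(updates, sizes))

# Synthetic setup: three clients holding private samples of the same
# underlying linear relationship (illustrative only).
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + 0.01 * rng.normal(size=50)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(50):
    w = federated_round(w, clients)
```

After enough rounds the aggregated model approaches the underlying parameters even though the server never sees any client's data; with heterogeneous (non-IID) client distributions, this simple averaging is exactly where the data-heterogeneity challenge mentioned above arises.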