I. Introduction
The surge of smartphones, Internet of Things (IoT) devices, and other endpoints has ushered in the era of big data [1]. Deep learning provides an effective means of processing such massive data [2], for example, managing large volumes of patient records for disease prediction or performing independent security audits over system logs. However, centralized deep learning often exposes users' data and raises a series of privacy problems. Federated learning (FL) [3] has been proposed to resolve this dilemma: it allows users to participate in global training without sharing their private sample data, thereby protecting data privacy. Specifically, each user trains the global model on its private dataset and uploads only the updated parameters (i.e., weights and biases) to the central cloud server for aggregation; this process is repeated until the model converges (a minimal sketch of one such round is given below).

However, as more users participate in training and increasingly complex deep learning models are used, the parameters uploaded by users grow ever larger, which inevitably causes bandwidth contention and communication delay [4]. Communication compression methods such as sketched updates [5] relieve this pressure by compressing the uploaded gradients, but compression discards gradient information and degrades model accuracy. To further alleviate the communication and computing burden, FL has gradually evolved from an end-cloud to an end-edge-cloud architecture.
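To make the end-cloud training loop described above concrete, the following is a minimal sketch of one FL round assuming FedAvg-style weighted averaging on the server [3]; the function names, the toy least-squares local objective, the learning rate, and the client sizes are illustrative assumptions, not details of any specific system discussed in this paper.

import numpy as np

def local_update(global_weights, local_data, lr=0.1):
    # Placeholder client step: one gradient step on a local least-squares objective.
    # Private data (X, y) never leaves the client; only the updated weights are returned.
    X, y = local_data
    grad = X.T @ (X @ global_weights - y) / len(y)
    return global_weights - lr * grad

def aggregate(client_weights, client_sizes):
    # Server-side aggregation: average client parameters weighted by local dataset size.
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Toy setup: three clients, each holding a private dataset of a different size.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
clients = []
for n in (30, 50, 20):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.01 * rng.normal(size=n)
    clients.append((X, y))

global_w = np.zeros(2)
for _ in range(50):  # in practice, rounds repeat until the global model converges
    local_models = [local_update(global_w, data) for data in clients]
    global_w = aggregate(local_models, [len(y) for _, y in clients])
print(global_w)  # approaches [1.0, -2.0] without any raw data being shared

Note that every round transmits the full parameter vector from each client, which is exactly the per-round communication cost that grows with model size and client count.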