I. Introduction
According to Cisco’s forecast, there will be 500 billion devices connected to the Internet by 2030 [1]. These devices, equipped with versatile sensors, generate massive amounts of data at the network edge, opening up new horizons for data-driven learning methods. Federated learning (FL) is an emerging distributed paradigm that enables multiple edge devices to train a global model without sharing their local training data [2]. FL-empowered mobile edge computing systems are recognized as a promising solution for realizing ubiquitous intelligence [3]. In many real-world scenarios, however, mobile devices are strictly constrained by computing capability, channel conditions, and battery lifetime [4]. To improve the efficiency of resource utilization, many researchers have proposed compressing the local model update before uploading it to the parameter server.
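To make the compressed-update idea concrete, the following is a minimal sketch of one federated learning round in which each client sparsifies its model update before upload. This is an illustrative example only: the function names, the use of top-k magnitude sparsification, and the simple averaging rule are assumptions for exposition, not the specific scheme of any cited work.

```python
import numpy as np

def top_k_sparsify(update, k):
    """Keep only the k largest-magnitude entries of a model update,
    zeroing the rest (a common compression heuristic, assumed here)."""
    flat = update.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.reshape(update.shape)

def fedavg_round(global_model, local_datasets, local_step, k):
    """One illustrative FL round: each client trains locally, compresses
    its update (local minus global model), and the server averages the
    compressed updates into the new global model."""
    updates = []
    for data in local_datasets:
        local_model = local_step(global_model.copy(), data)  # client-side training
        updates.append(top_k_sparsify(local_model - global_model, k))
    return global_model + np.mean(updates, axis=0)
```

With a toy `local_step` that adds a client's gradient-like vector to the model, `fedavg_round` transmits only k nonzero values per client instead of the full dense update, which is the bandwidth saving the compression literature targets.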