I. Introduction
The massive number of wireless edge devices (e.g., mobile phones and sensors), with their growing computation and communication capabilities and the huge volumes of data they generate, enable various intelligent applications in wireless networks through the collaborative training of machine learning (ML) models [1]. Federated learning (FL), a promising distributed ML paradigm, was recently proposed to allow the participating terminal devices to exchange only model parameters with a parameter server (PS) while keeping the raw data local, thereby protecting privacy and security [2]. Nonetheless, when implemented in wireless scenarios, FL consumes a vast amount of communication resources to serve a large number of terminal devices, since the desired ML model is generally high-dimensional and the number of update rounds in the training process is thus considerably large [3].
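To make the FL workflow concrete, the following is a minimal toy sketch (our illustration, not the system model considered in this paper): each client runs local gradient descent on a scalar linear model, and the PS aggregates only the resulting parameters by averaging, in the spirit of FedAvg. All function names and data values here are hypothetical.

```python
# Toy FedAvg-style sketch: clients fit y = w * x on local data and send
# only the updated parameter w to the parameter server (PS); the raw
# data pairs (x, y) never leave the client.

def local_update(w, data, lr=0.1, steps=10):
    """One client's local training: gradient descent on squared error."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w  # only the model parameter is communicated

def fedavg_round(w_global, client_datasets):
    """PS aggregation: unweighted average of the clients' parameters."""
    local_ws = [local_update(w_global, d) for d in client_datasets]
    return sum(local_ws) / len(local_ws)

# Two clients whose local data both follow y = 2x (kept on-device).
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (0.5, 1.0)]]
w = 0.0
for _ in range(20):  # communication rounds between clients and the PS
    w = fedavg_round(w, clients)
print(round(w, 2))  # converges near the true parameter 2.0
```

Each communication round transmits one scalar per client; for a real high-dimensional model, every round would instead carry the full parameter vector, which is exactly the communication burden the text describes.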