I. Introduction
As an emerging and popular technique in edge AI, federated learning (FL) has been proposed to train a globally shared model through collaboration among workers (e.g., IoT devices) in a data-parallel fashion [1]–[6]. Under the coordination of a parameter server (PS), participating workers periodically train deep learning (DL) models on their local datasets and then push the updated models to the PS for global aggregation, without exposing their raw data. FL has been leveraged by Google to develop the Gboard application, improving user experience in a privacy-preserving manner [7]. To boost the performance of AI applications and services, it is usually practical and effective to enlarge the parameter scale of DL models [8], [9]. However, training large-scale models is challenging for resource-constrained workers due to their limited CPU and memory capacity [10]–[12]. Moreover, transmitting large-scale models between the workers and the PS incurs significant communication latency, and exchanging entire models may compromise model privacy [13], [14].
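To make the worker–PS interaction described above concrete, the following is a minimal sketch of one FedAvg-style communication round: each worker refines the global model on its own data and the PS aggregates the results weighted by local dataset sizes. This is illustrative only and is not the scheme proposed in this paper; the function names (local_update, fed_avg), the toy quadratic loss, and the use of NumPy arrays as model parameters are assumptions made for exposition.

```python
import numpy as np

def local_update(global_params, local_data, lr=0.01, epochs=1):
    """Worker side: start from the global model and train on local data.
    The 'training' here is a placeholder gradient step on a quadratic loss."""
    params = global_params.copy()
    for _ in range(epochs):
        for x, y in local_data:
            grad = 2 * (params @ x - y) * x  # gradient of (params . x - y)^2
            params -= lr * grad
    return params

def fed_avg(worker_params, worker_sizes):
    """PS side: aggregate local models weighted by local dataset sizes."""
    total = sum(worker_sizes)
    return sum(w * (n / total) for w, n in zip(worker_params, worker_sizes))

# Simulate a few rounds with three workers holding private local datasets.
rng = np.random.default_rng(0)
global_params = np.zeros(5)
datasets = [[(rng.normal(size=5), rng.normal()) for _ in range(20)] for _ in range(3)]
for _ in range(10):  # communication rounds
    local_models = [local_update(global_params, d) for d in datasets]
    global_params = fed_avg(local_models, [len(d) for d in datasets])
print(global_params)
```

Note that only model parameters cross the network in this sketch; the raw data in each worker's dataset never leaves the worker, which is the privacy property FL relies on.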