I. Introduction
The advancement of computing techniques and deep learning algorithms has enabled success in various applications, e.g., computer vision [1], face recognition [2], speech-to-text systems [3], and natural language processing [4]. In general, a high-performance deep learning model requires a large volume of training data drawn from many companies. However, companies are often reluctant to share their data, mainly due to privacy concerns or commercial secrecy. Federated learning (FL) has therefore been proposed to improve data utilization and alleviate the "data island" problem.

Most existing FL frameworks are synchronous [5]: the server must wait for local updates from all participants before completing a parameter aggregation. However, synchronous FL is vulnerable to the straggler problem [6]. Specifically, owing to participants' heterogeneous computing capabilities or unpredictable communication delays, some participants may submit their updates later than others in each iteration, seriously delaying the global model update. In addition, synchronous FL is difficult to deploy in real applications because it requires all participants to be coordinated under a perfectly synchronized common clock.

Asynchronous federated learning (AFL) [7] aims to break this bottleneck by allowing the server to update the global model immediately upon receiving any single participant's update, without waiting for the others. These advantages have led AFL to be widely integrated with deep learning frameworks such as PyTorch [8] and TensorFlow [9].
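To make the contrast with synchronous aggregation concrete, the following is a minimal sketch of an asynchronous server-side update: each arriving client model is mixed into the global model immediately, with a staleness-damped weight. This is only an illustration of the general AFL update pattern, not the method proposed in this paper or a framework API; the function names, the mixing rule, and the parameter values are our own assumptions.

```python
# Minimal sketch (illustrative only): a server that mixes in each client
# update as soon as it arrives, instead of waiting for all clients.
import numpy as np

def async_server_update(global_model, client_model, staleness, base_mixing=0.6):
    """Blend a single client's update into the global model immediately.

    The mixing weight is damped by staleness (how many global versions have
    elapsed since the client downloaded the model), a common heuristic in
    asynchronous FL; the specific damping rule here is an assumed example.
    """
    alpha = base_mixing / (1.0 + staleness)  # stale updates count less
    return (1.0 - alpha) * global_model + alpha * client_model

# Toy usage: three clients finish at different times and with different staleness.
global_w = np.zeros(4)        # global model parameters
version = 0                   # global model version counter
arrivals = [                  # (client weights, global version the client started from)
    (np.array([1.0, 1.0, 1.0, 1.0]), 0),
    (np.array([0.5, 0.2, 0.1, 0.9]), 0),
    (np.array([2.0, 1.5, 0.3, 0.7]), 1),
]
for client_w, start_version in arrivals:
    staleness = version - start_version
    global_w = async_server_update(global_w, client_w, staleness)
    version += 1              # no barrier: the global model advances per arrival
    print(f"v{version}: {np.round(global_w, 3)}")
```

In a synchronous setting, the loop body would instead accumulate all client models and aggregate once per round; here the global version advances on every arrival, which is what removes the wait on stragglers.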