I. Introduction
With the rapid growth of intelligent surveillance and new appliances in smart grids in recent years, the number of Internet of Things (IoT) devices has increased dramatically, causing an explosion of data at the network edge. However, due to limited network resources and privacy concerns, it is impractical to upload all data from IoT devices to a cloud server for further processing. Under such circumstances, traditional machine learning approaches that rely heavily on centralized data collection cannot be applied effectively, and how to allocate computation power in smart grids becomes a further problem. Federated learning (FL) has been proposed as a new deep learning framework that allows each device to train a model on its own data. Specifically, [1] designs the well-known FL scheme FedAvg, which alternates multiple iterations of local updates at the clients with global aggregation at a server.

However, the device nodes in an edge network are heterogeneous, so slow nodes delay the training progress. One approach to this problem, taken in [2], converts the empirical loss minimization into a weight-error criterion by adjusting the local updates and achieves a dynamically varying number of iterations by allowing inexact local solutions. In addition, asynchronous federated learning (AFL) [3] has been used to address this problem; as a typical asynchronous algorithm, [4] uses a staleness function to weight models during aggregation. Current AFL research focuses on maximizing the resource utilization of heterogeneous devices through client selection [5], weighted aggregation, and clustered FL to reduce the impact of stale local models. In [6], clients are assigned trust values based on their historical activities. Reinforcement learning (RL) has also been adopted for client selection: [7] applies DDPG to select FL client devices, while [8], [9] utilize different algorithms to adjust the number of local rounds performed by client devices.
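To make the contrast between synchronous and asynchronous aggregation concrete, the following is a minimal sketch, assuming [1] refers to FedAvg-style weighted averaging and [4] to a staleness-weighted scheme in the spirit of FedAsync; the symbols $n_k$ (local sample count of client $k$), $n$ (total sample count), $\alpha$ (base mixing rate), $s(\cdot)$ (staleness function), and $\tau$ (the global round on which the arriving local model was based) are introduced here only for illustration:
$$w_{t+1} \;=\; \sum_{k=1}^{K} \frac{n_k}{n}\, w_{t+1}^{k} \qquad \text{(synchronous FedAvg aggregation)},$$
$$w_{t} \;=\; (1-\alpha_t)\, w_{t-1} + \alpha_t\, w_{\text{new}}, \qquad \alpha_t = \alpha \cdot s(t-\tau) \qquad \text{(staleness-weighted asynchronous update)}.$$
Under the synchronous rule the server must wait for all selected clients before averaging, so the slowest node dictates the round time; under the asynchronous rule each arriving local model $w_{\text{new}}$ is blended in immediately, with its contribution discounted according to its staleness $t-\tau$.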