I. Introduction
Federated Learning (FL), as introduced by McMahan et al. [1], enables a massive number of unreliable devices, coordinated by a central server, to collaboratively train learning models over unbalanced and non-i.i.d. (independent and identically distributed) data partitions without sharing the actual data. In practice, data samples are generated through each device's usage, such as interactions with applications, which results in such statistical heterogeneity. Consequently, related works primarily focus on improving model performance by tackling these data properties, i.e., the statistical challenges in FL [1], [2]. Notably, in the initial work [1], the authors show that their proposed Federated Averaging (FedAvg) algorithm empirically works well with non-i.i.d. data. However, the accuracy of FedAvg varies across datasets, as observed in existing methods [2], [3], and depends on how client selection is performed [3]–[5]. For instance, the authors in [3], [4], [6] discussed the impact of heterogeneous clients, given the time requirements for per-round training execution, on decentralized model training over unreliable wireless networks. In doing so, more devices are packed into a training round to improve model performance; however, this leads to excessive consumption of communication resources and a larger number of communication rounds to attain a given level of global model accuracy. Moreover, all received local updates are directly aggregated during model aggregation [1], [3], [6], [7], thereby ignoring their individual contributions and the rationale behind selecting them. In line with that, the authors in [1]–[3] revealed that adding local computation can dramatically increase communication efficiency and improve the trained model performance. However, this additional computational load may be prohibitive for some devices.
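To make the direct aggregation step concrete, the following is a brief sketch of the FedAvg update rule from [1], with the notation (selected client set $S_t$, local sample counts $n_k$) introduced here only for illustration; each received local model is weighted solely by its client's data share, independent of how informative the update is or why the client was selected:
\[
  w_{t+1} = \sum_{k \in S_t} \frac{n_k}{n}\, w_{t+1}^{k}, \qquad n = \sum_{k \in S_t} n_k,
\]
where $w_{t+1}^{k}$ denotes the local model returned by client $k$ after its local training in round $t$, and $n_k$ is the number of data samples held by client $k$.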