I. Introduction
Big data-driven artificial intelligence (AI) is expected to permeate many aspects of daily life, including medical care, agriculture, and transportation systems. At the same time, the rapid growth of Internet-of-Things (IoT) applications calls for secure and reliable data mining and learning in distributed systems [1]–[3]. When integrating AI into IoT applications, distributed machine learning (ML) is preferred for many data processing tasks, with parametrized functions from inputs to outputs defined as compositions of basic building blocks [4], [5]. Federated learning (FL) is a recent advance in distributed ML in which data are acquired and processed locally on the client side, and only the updated ML parameters are transmitted to a central server for aggregation [6]–[8]. The goal of FL is to fit a model that minimizes an empirical risk minimization (ERM) objective. However, FL also poses several key challenges, such as private-information leakage, expensive communication between servers and clients, and device variability [9]–[15].
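The FL workflow just described (local client updates, server-side aggregation of parameters, and an ERM objective weighted by local sample counts) can be sketched as a minimal federated-averaging loop. The linear model, the local gradient-descent step, and the sample-count weighting below are illustrative assumptions for this sketch, not details taken from the text:

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    # Client-side step: run a few epochs of gradient descent on the
    # local squared loss (1/2n)||Xw - y||^2. Raw data (X, y) never
    # leaves the device; only the updated parameters are returned.
    w = w.copy()
    n = len(y)
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / n
        w -= lr * grad
    return w

def federated_averaging(clients, w0, rounds=20):
    # Server-side loop: broadcast the global model, collect client
    # updates, and average them weighted by local sample counts,
    # matching an ERM objective of the form sum_k (n_k / n) f_k(w).
    w = w0.copy()
    total = sum(len(y) for _, y in clients)
    for _ in range(rounds):
        w = sum((len(y) / total) * local_update(w, X, y)
                for X, y in clients)
    return w

# Toy example: two clients holding disjoint shards of noiseless
# linear data generated from the same underlying parameters.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])

def make_shard(n):
    X = rng.normal(size=(n, 2))
    return X, X @ w_true

clients = [make_shard(30), make_shard(50)]
w = federated_averaging(clients, np.zeros(2))  # w approaches w_true
```

Because only parameter vectors cross the network, the communication cost per round scales with the model size rather than the dataset size, which is the property the challenges listed above (communication expense, leakage through shared updates) are concerned with.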