I. Introduction
The advent of the Internet of Things (IoT) and the widespread adoption of 5G have resulted in an exponential increase in the amount of data generated by user devices. To accommodate this surge, Internet service providers (ISPs) have established many data centers (DCs) worldwide to collect and store large datasets from users [1], [2]. Such a vast data reservoir provides ample impetus for the large-scale deployment of machine learning (ML) systems to support AI-driven applications. The conventional method for training an ML model aggregates the data scattered across different DCs into a single designated DC and then trains the model there in a centralized manner [3]. However, this approach suffers from critical disadvantages such as high communication latency and heavy traffic load. Additionally, data security concerns and privacy restrictions continue to impede the adoption of centralized training [4]. To address these challenges, a novel ML paradigm, Geo-Distributed Machine Learning (Geo-DML), has been proposed. Geo-DML spans multiple DCs to enable privacy-preserving, multi-party cooperative learning over Wide Area Networks (WANs) [5].