
Fast Parameter Synchronization for Distributed Learning with Selective Multicast


Abstract:

Recent advances in distributed machine learning show, theoretically and empirically, that for many models, provided all workers eventually participate in synchronization, i) training still converges even if only p workers take part in each round of synchronization, and ii) a larger p generally leads to a faster rate of convergence. These findings shed light on eliminating the bottleneck effects of parameter synchronization in large-scale data-parallel distributed training and have motivated several optimization designs. In this paper, we focus on optimizing parameter synchronization for peer-to-peer distributed learning, in which workers generally broadcast or multicast their updated parameters to others for synchronization, and propose SELMCAST, an expressive and Pareto-optimal multicast receiver selection algorithm, to achieve this goal. Compared with the state-of-the-art design, which randomly selects exactly p receivers for each worker's multicast in a bandwidth-agnostic way, SELMCAST chooses receivers based on a global view of their available bandwidth and loads, yielding two advantages. First, it optimizes the bottleneck sending rate, thus cutting down the time cost of parameter synchronization. Second, when more than p receivers have sufficient bandwidth, as many of them as possible are selected, which benefits the convergence of training. Extensive evaluations show that SELMCAST is efficient and always achieves near-optimal performance.
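To make the idea concrete, the following is a minimal, hypothetical sketch of bandwidth-aware receiver selection in the spirit described above; it is not the authors' SELMCAST algorithm. The function name, the `bw_threshold` parameter, and the per-receiver `bw`/`load` bookkeeping are assumptions introduced for illustration only.

```python
# Hypothetical sketch (not SELMCAST itself): pick multicast receivers
# using each candidate's available bandwidth and current load, instead
# of picking exactly p receivers at random.

def select_receivers(sender, candidates, bw, load, p, bw_threshold):
    """Pick at least p receivers for `sender`'s multicast.

    Receivers with more spare bandwidth and fewer already-assigned
    multicasts are preferred, so the bottleneck (slowest) receiver is
    as fast as possible. Any extra receiver whose available bandwidth
    exceeds `bw_threshold` is also included, since synchronizing with
    more peers tends to help convergence.
    """
    peers = [r for r in candidates if r != sender]
    # Sort by spare bandwidth (descending), then by current load (ascending).
    peers.sort(key=lambda r: (-bw[r], load[r]))

    selected = peers[:p]            # the mandatory p receivers
    for r in peers[p:]:             # optional extras with ample bandwidth
        if bw[r] >= bw_threshold:
            selected.append(r)

    for r in selected:              # track how many multicasts each receiver serves
        load[r] += 1
    return selected


# Example usage with made-up bandwidth figures (Gbps):
bw = {"w1": 10.0, "w2": 4.0, "w3": 9.0, "w4": 1.0}
load = {r: 0 for r in bw}
print(select_receivers("w1", list(bw), bw, load, p=2, bw_threshold=5.0))
# ['w3', 'w2'] -- w4 is skipped as an extra because its bandwidth is below the threshold
```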
Date of Conference: 16-20 May 2022
Date Added to IEEE Xplore: 11 August 2022
Conference Location: Seoul, Korea, Republic of


I. Introduction

Over the past decade, machine learning techniques have achieved huge success and have been widely employed for various applications such as email filtering, advertising recommendation, speech recognition, machine translation, and computer vision [1]–[4]. With the increasing popularity of machine learning and the rapid development of new technologies, the realistic quantities of training data for a learning task have grown from GBs to TBs and PBs. Data-parallel distributed training has become the key to obtaining the resulting model over such massive volumes of data within a reasonable time [1]–[3].

