Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast

Abstract:

Broadcast is a widely used operation in many streaming and deep learning applications to disseminate large amounts of data on emerging heterogeneous High-Performance Computing (HPC) systems. However, traditional broadcast schemes do not fully utilize hardware features for Graphics Processing Unit (GPU)-based applications. In this paper, a model-oriented analysis is presented to identify performance bottlenecks of existing broadcast schemes on GPU clusters. Next, streaming-based broadcast schemes are proposed to exploit InfiniBand hardware multicast (IB-MCAST) and NVIDIA GPUDirect technology for efficient message transmission. The proposed designs are evaluated in the context of using Message Passing Interface (MPI) based benchmarks and applications. The experimental results indicate improved scalability and up to 82 percent reduction of latency compared to the state-of-the-art solutions in the benchmark-level evaluation. Furthermore, compared to the state-of-the-art, the proposed design yields stable higher throughput for a synthetic streaming workload, and 1.3x faster training time for a deep learning framework.

Published in: IEEE Transactions on Parallel and Distributed Systems (Volume: 30, Issue: 3, 01 March 2019)

Page(s): 575 - 588

Date of Publication: 26 August 2018

DOI: 10.1109/TPDS.2018.2867222

1 Introduction

Emerging high-performance computing (HPC) systems are marked by two trends: 1) the use of accelerators such as general-purpose graphics processing units (GPGPUs) to boost computing capability, and 2) the use of high-performance commodity interconnects such as InfiniBand (IB) to push the frontiers of performance and scalability. As a result, numerous HPC applications, runtimes, and frameworks are adopting GPUs for their massively parallel computing power [1], [2], [3], [4], [5], [6].
