
Accelerating MPI AllReduce Communication with Efficient GPU-Based Compression Schemes on Modern GPU Clusters


Abstract:

With the growing scale of High-Performance Computing (HPC) and Deep Learning (DL) applications driven by GPU adoption, efficient communication of data stored on GPUs has become a critical factor in overall application performance. AllReduce is a collective communication operation commonly used in HPC applications and in distributed DL training, especially under Data Parallelism, a common strategy in which parallel GPUs each hold a replica of the DL model and process a partition of the training dataset. However, the AllReduce operation on large GPU data still performs poorly due to the limited interconnect bandwidth between GPU nodes. Strategies such as Gradient Quantization or Sparse AllReduce, which modify the Stochastic Gradient Descent (SGD) algorithm, may not support all training scenarios. Recent research shows that integrating GPU-based compression into MPI libraries is an efficient way to achieve faster data transmission. In this paper, we propose optimized Recursive-Doubling and Ring AllReduce algorithms that incorporate efficient collective-level GPU-based compression schemes into a state-of-the-art GPU-Aware MPI library. At the microbenchmark level, the proposed Recursive-Doubling and Ring algorithms with compression support achieve benefits of up to 75.3% and 85.5%, respectively, over the baseline, and 24.8% and 66.1%, respectively, over naive point-to-point compression on modern GPU clusters. For distributed DL training with PyTorch-DDP, the two approaches yield up to 32.3% and 35.7% faster training than the baseline while maintaining similar accuracy.
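The setting described in the abstract can be illustrated with a minimal data-parallel training sketch using PyTorch-DDP: each rank holds a full model replica, processes its own partition of the data, and the gradient AllReduce triggered during the backward pass is the communication step that collective-level GPU-based compression targets. The model, data, and hyperparameters below are hypothetical placeholders and are not taken from the paper.

```python
# Minimal PyTorch-DDP sketch (hypothetical model/data), launched e.g. with torchrun.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # NCCL (or a GPU-aware MPI backend) moves gradient data directly between GPUs.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)

    # Data parallelism: every rank keeps a replica of the model.
    model = torch.nn.Linear(1024, 1024).cuda()
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    for step in range(10):
        # Each rank processes its own partition of the training data
        # (random tensors stand in for a real partitioned dataset).
        inputs = torch.randn(32, 1024, device="cuda")
        targets = torch.randn(32, 1024, device="cuda")

        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(ddp_model(inputs), targets)
        # backward() triggers AllReduce on gradient buckets; this is the
        # inter-GPU communication that compressed AllReduce aims to speed up.
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```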
Date of Conference: 12-16 May 2024
Date Added to IEEE Xplore: 10 May 2024
Electronic ISBN: 978-3-9826336-0-2
Conference Location: Hamburg, Germany
