Scalable MPI Collectives using SHARP: Large Scale Performance Evaluation on the TACC Frontera System


Abstract:

The Message-Passing Interface (MPI) is the de-facto standard for designing and executing applications on massively parallel hardware. MPI collectives provide a convenient abstraction for multiple processes/threads to communicate with one another. Mellanox's HDR InfiniBand switches provide Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) capabilities to offload collective communication to the network and reduce CPU involvement in the process. In this paper, we design and implement SHARP-based solutions for MPI_Reduce and MPI_Barrier in MVAPICH2-X. We evaluate the impact that proposed and existing SHARP-based solutions for the MPI_Allreduce, MPI_Reduce, and MPI_Barrier operations have on the performance of these collectives on the 8th-ranked TACC Frontera HPC system. Our experimental evaluation of the SHARP-based designs shows up to a 5.4X reduction in latency for Reduce, 5.1X for Allreduce, and 7.1X for Barrier at the full system scale of 7,861 nodes over a host-based solution.
Date of Conference: 13 November 2020
Date Added to IEEE Xplore: 04 January 2021
Conference Location: Atlanta, GA, USA

I. Introduction

Supercomputing systems have grown in size and scale over the last decade. Two key drivers fueling this growth are the current trends in multi-/many-core architectures and the availability of commodity, RDMA-enabled, high-performance interconnects such as InfiniBand (IB) [1]. Such HPC systems allow scientists and engineers to tackle grand challenges in various scientific domains. Users of HPC systems rely on parallel programming models to parallelize their applications and obtain performance improvements.
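To make the collectives under study concrete, the sketch below is a minimal MPI micro-benchmark in C that times the three operations evaluated in this paper: MPI_Barrier, MPI_Reduce, and MPI_Allreduce. It is illustrative only and does not itself implement the SHARP offload; with MVAPICH2-X, the offload is typically selected at run time (for example, via the MV2_ENABLE_SHARP environment variable described in the MVAPICH2 user guide), which is an assumption about deployment rather than a detail taken from this paper.

/*
 * Illustrative sketch: times one invocation of each collective studied in the
 * paper. Latency numbers are per-call and unaveraged; a real benchmark would
 * iterate and report minimum/average latency across many calls.
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = (double)rank, global = 0.0;

    /* Barrier: pure synchronization, no data movement. */
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    MPI_Barrier(MPI_COMM_WORLD);
    double t_barrier = MPI_Wtime() - t0;

    /* Reduce: every rank contributes; only the root receives the result. */
    t0 = MPI_Wtime();
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    double t_reduce = MPI_Wtime() - t0;

    /* Allreduce: every rank contributes and every rank receives the result. */
    t0 = MPI_Wtime();
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double t_allreduce = MPI_Wtime() - t0;

    if (rank == 0)
        printf("ranks=%d barrier=%.3f us reduce=%.3f us allreduce=%.3f us\n",
               size, t_barrier * 1e6, t_reduce * 1e6, t_allreduce * 1e6);

    MPI_Finalize();
    return 0;
}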
