MPI4Spark Meets YARN: Enhancing MPI4Spark through YARN support for HPC

Abstract:

The MPI4Spark effort reconciled disparities between High-Performance Computing (HPC) environments and Big Data stacks by adopting an MPI-based solution inside Apache Spark’s Netty communication layer, which better utilizes high-speed interconnects such as InfiniBand (IB), Intel Omni-Path (OPA), and HPE Slingshot across a variety of HPC systems. Apache Spark supports several cluster managers, such as YARN, Mesos, and Kubernetes, besides its internal standalone cluster manager. MPI4Spark, however, does not support the YARN cluster manager and relies only on Spark’s internal standalone cluster manager. The YARN cluster manager is designed for running large-scale clusters of up to hundreds of nodes and provides better scalability in an HPC environment. MPI4Spark therefore needs YARN support to offer a solution better suited to HPC in terms of scalability; this paper addresses that problem. We present a new design for MPI4Spark that supports both YARN and the internal standalone cluster manager. The architectural framework of MPI4Spark remains the same in the new YARN design, with an MPI-based Netty layer at its core. The new YARN design for MPI4Spark outperforms both regular Spark and RDMA-Spark. We evaluated the new YARN design on two HPC systems, TACC Frontera and TACC Stampede2. On Frontera, for SortByTest weak scaling on a cluster of 64 NodeManagers (3584 cores, 896 GB), MPI4Spark outperforms Spark by 4.52x and RDMA-Spark by 2.33x in total execution time. For GroupByTest strong scaling on a cluster of 128 NodeManagers (7168 cores, 1344 GB), MPI4Spark performs better than Spark by 3.29x and RDMA-Spark by 2.32x. In Intel HiBench evaluations on Frontera with a cluster of 32 NodeManagers (1792 cores), MPI4Spark fares better than Spark by 1.91x on the Logistic Regression (LR) benchmark.
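The cluster sizes quoted above can be sanity-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: the helper names `per_node` and `baseline_fraction` are hypothetical, and it assumes that "outperforms by Sx" means the baseline's execution time divided by MPI4Spark's execution time equals S.

```python
# Cluster figures as quoted in the abstract (Frontera runs).
# memory_gb is None where the abstract does not report it.
configs = {
    "SortByTest weak scaling": {"node_managers": 64, "cores": 3584, "memory_gb": 896},
    "GroupByTest strong scaling": {"node_managers": 128, "cores": 7168, "memory_gb": 1344},
    "HiBench LR": {"node_managers": 32, "cores": 1792, "memory_gb": None},
}


def per_node(cfg):
    """Return (cores, memory in GB) per NodeManager; memory is None if not reported."""
    n = cfg["node_managers"]
    cores = cfg["cores"] / n
    mem = cfg["memory_gb"] / n if cfg["memory_gb"] is not None else None
    return cores, mem


def baseline_fraction(speedup):
    """ASSUMPTION: 'outperforms by Sx' means T_baseline / T_mpi4spark = S,
    so MPI4Spark's runtime is 1/S of the baseline's runtime."""
    return 1.0 / speedup


if __name__ == "__main__":
    for name, cfg in configs.items():
        print(name, per_node(cfg))
    print("Runtime relative to Spark at 4.52x:", round(baseline_fraction(4.52), 3))
```

Under these figures every configuration works out to the same number of cores per NodeManager, while the memory per NodeManager differs between the SortByTest and GroupByTest runs.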
Published in: 2023 IEEE International Conference on Big Data (BigData)

Date of Conference: 15-18 December 2023

Date Added to IEEE Xplore: 22 January 2024

DOI: 10.1109/BigData59044.2023.10386120

Conference Location: Sorrento, Italy

I. Introduction

The intersection of Big Data and High-Performance Computing (HPC) is becoming more pronounced as data continues to grow rapidly. Big Data frameworks such as Apache Spark [1] address the challenges of processing and managing large data sets in a parallel and distributed fashion, but they do not fully utilize the main features that characterize HPC systems and environments, such as high-speed interconnects and the Message Passing Interface (MPI) [2] programming model, the lingua franca for developing parallel scientific and engineering applications on HPC.
