I. Introduction
The intersection of Big Data and High-Performance Computing (HPC) is becoming more pronounced as data volumes continue to grow rapidly. Big Data frameworks such as Apache Spark [1] address the challenges of processing and managing large datasets in a parallel and distributed fashion, but they do not fully exploit the features that characterize HPC systems and environments, such as high-speed interconnects and the Message Passing Interface (MPI) [2], the lingua franca programming model for developing parallel scientific and engineering applications on HPC systems.