I. Introduction
Supercomputing systems have grown in size and scale over the last decade. Two key drivers of this growth are the trend toward multi-/many-core architectures and the availability of commodity, RDMA-enabled, high-performance interconnects such as InfiniBand (IB) [1]. Such HPC systems allow scientists and engineers to tackle grand challenges in various scientific domains. Users of HPC systems rely on parallel programming models to parallelize their applications and obtain performance improvements.