I. Introduction
Commodity clusters have helped shape the landscape of high-performance computing for more than a decade. By leveraging high-volume, off-the-shelf components, commodity clusters provide an excellent cost-performance ratio. Of the several programming models available for scientific computing applications, MPI [1] is the most widely used. A large body of high-performance scientific applications is written using MPI [2], [3], and more continue to be written. Modern MPI libraries use high-speed networks to communicate with remote processes and shared memory to communicate with processes local to a compute node.