1 Introduction
Current trends in high-performance computing point to an exponential increase in core count and a commensurate decrease in memory bandwidth per core. Similar bandwidth shortages are already observed for I/O, for inter-node communication, and between CPU and GPU memory. These trends suggest that the performance of future computations will be dictated in large part by the amount of data movement. Moreover, because large data sets are often generated remotely, e.g., on shared compute clusters or in the cloud, the cost of transferring the results of a computation for visual exploration, quantitative analysis, and archival storage can be substantial.