1 Introduction
Today's data centers widely employ cluster computing frameworks (e.g., MapReduce [1], Dryad [2], CIEL [3], and Spark [4]) to meet ever-growing data processing and analysis demands. In these frameworks, a data-intensive job is divided into multiple successive data-parallel computation stages, and a succeeding stage cannot start until all of its required inputs, i.e., the outputs of the preceding stage, are in place. Recent studies [5], [6], [7] have shown that this intermediate data transmission is a non-negligible phase of job execution; for example, it accounts for 33 percent of the job running time in Facebook's system [5]. Accordingly, speeding up data transfers between computation stages accelerates job completion and improves resource utilization [5], [6], [7].
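To make the stage dependency concrete, the following is a minimal sketch of a two-stage job written against Spark's Python API; the input path and application name are hypothetical. The shuffle triggered by reduceByKey is precisely the intermediate data transfer discussed above: no reduce task can run until every map task has materialized its output.

```python
# Minimal two-stage PySpark job (hypothetical input path) illustrating
# the stage barrier: the reduce stage cannot begin until all map-side
# shuffle outputs are in place.
from pyspark import SparkContext

sc = SparkContext("local[*]", "stage-barrier-example")

counts = (
    sc.textFile("hdfs:///tmp/input.txt")       # stage 1: map side
      .flatMap(lambda line: line.split())      # tokenize each line
      .map(lambda word: (word, 1))             # emit (word, 1) pairs
      # reduceByKey forces a shuffle: all intermediate map outputs must
      # be transferred over the network before stage 2 can start.
      .reduceByKey(lambda a, b: a + b)         # stage 2: reduce side
)

print(counts.collect())
sc.stop()
```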