
Joint Online Coflow Routing and Scheduling in Data Center Networks


Abstract:

A coflow is a collection of related parallel flows that occur typically between two stages of a multi-stage computing task in a network, such as shuffle flows in MapReduce. The coflow abstraction allows applications to convey their semantics to the network so that application-level requirements can be better satisfied. In this paper, we study the routing and scheduling of multiple coflows to minimize the total weighted coflow completion time (CCT). We first propose a rounding-based randomized approximation algorithm, called OneCoflow, for single coflow routing and scheduling. The multiple coflow problem is more challenging as coexisting coflows will compete for the same network resources, such as link bandwidth. To minimize the total weighted CCT, we derive an online multiple coflow routing and scheduling algorithm, called OMCoflow. We then derive a competitive ratio bound of our problem and prove that the competitive ratio of OMCoflow is nearly tight. To the best of our knowledge, this is the first online algorithm with theoretical performance guarantees which considers routing and scheduling simultaneously for multi-coflows. Compared with existing methods, OMCoflow runs more efficiently and avoids frequently rerouting the flows. Extensive simulations on a Facebook data trace show that OMCoflow outperforms the state-of-the-art heuristic schemes significantly (e.g., reducing the total weighted CCT by up to 41.8% and the execution time by up to 99.2% against RAPIER).
Published in: IEEE/ACM Transactions on Networking (Volume: 27, Issue: 5, October 2019)
Page(s): 1771 - 1786
Date of Publication: 14 August 2019


I. Introduction

Distributed computing frameworks such as MapReduce [3], Dryad [4] and Spark [5] are very popular among cloud applications. In these frameworks, data flows for one job may share a common performance goal, such as minimizing the completion time of the slowest flow. However, such application-level requirements are often largely overlooked when cloud providers aim at optimizing network-level metrics such as (individual) flow completion time.
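To make the distinction between flow-level and coflow-level metrics concrete, the following minimal Python sketch computes a coflow completion time (CCT) as the finish time of the slowest flow, and the total weighted CCT as the weighted sum over coflows. It is an illustration only and not the paper's algorithm: the `Coflow` class, its field names, and the sample numbers are hypothetical, and every coflow is assumed to arrive at time 0.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Coflow:
    # Hypothetical container; names are illustrative, not from the paper.
    weight: float                   # relative importance of the coflow
    flow_finish_times: List[float]  # finish time of each flow (seconds)


def coflow_completion_time(coflow: Coflow) -> float:
    """A coflow completes only when its slowest flow completes."""
    # Assumes the coflow arrived at time 0, so finish time equals CCT.
    return max(coflow.flow_finish_times)


def total_weighted_cct(coflows: List[Coflow]) -> float:
    """Weighted sum of the completion times of all coflows."""
    return sum(c.weight * coflow_completion_time(c) for c in coflows)


# Two hypothetical shuffle stages with made-up finish times.
shuffle_a = Coflow(weight=2.0, flow_finish_times=[1.0, 4.0, 2.5])
shuffle_b = Coflow(weight=1.0, flow_finish_times=[3.0, 3.5])

print(coflow_completion_time(shuffle_a))           # 4.0
print(total_weighted_cct([shuffle_a, shuffle_b]))  # 2.0 * 4.0 + 1.0 * 3.5 = 11.5
```

In this sketch, finishing the 1.0-second flow of `shuffle_a` any earlier would improve its individual flow completion time but leave the CCT at 4.0, which is why optimizing per-flow metrics alone can miss the application-level goal.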
