I. Introduction
Work stealing is a well-known approach to task distribution that elegantly balances task-based parallelism across multiple worker threads [10], [30]. In a work-stealing runtime, each worker thread enqueues tasks onto, and dequeues tasks from, the tail of its own task queue. When a worker finds its queue empty, it attempts to steal a task from the head of another worker thread's task queue. Work stealing has been shown to perform well, with modest space requirements and low communication overhead, in both theory [8] and practice [7], [22]. Optimizing work-stealing runtimes remains a rich research area [4], [12], [13], [15], [18], [4]–7, and work stealing is a critical component in many popular concurrency platforms, including Intel's Cilk++, Intel's C++ Threading Building Blocks (TBB), Microsoft's .NET Task Parallel Library, Java's Fork/Join Framework, X10, and OpenMP. Most past research and current implementations use asymmetry-oblivious work-stealing runtimes. In this work, we propose asymmetry-aware work-stealing (AAWS) runtimes, which exploit both static asymmetry (e.g., different core microarchitectures) and dynamic asymmetry (e.g., per-core dynamic voltage/frequency scaling) to improve the performance and energy efficiency of multicore processors.
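The queue discipline described above (the owner pushes and pops at the tail, a thief steals from the head) can be illustrated with a minimal Python sketch. It is not the runtime studied in this work and is asymmetry-oblivious. The names `Worker` and `next_task` are hypothetical, a plain lock stands in for the concurrent deques that production runtimes typically use, and scanning victims in list order is an assumption made only to keep the example deterministic.

```python
import threading
from collections import deque


class Worker:
    """One worker's task queue (hypothetical name, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.tasks = deque()          # left end = head, right end = tail
        self.lock = threading.Lock()  # assumption: a simple lock guards the queue

    def push(self, task):
        # The owner enqueues new tasks at the tail.
        with self.lock:
            self.tasks.append(task)

    def pop(self):
        # The owner dequeues from the tail (most recently pushed task first).
        with self.lock:
            return self.tasks.pop() if self.tasks else None

    def steal(self):
        # A thief removes a task from the head (the oldest task).
        with self.lock:
            return self.tasks.popleft() if self.tasks else None


def next_task(worker, workers):
    """Return a task from the worker's own queue; if it is empty, try to steal."""
    task = worker.pop()
    if task is not None:
        return task
    # Assumption: victims are scanned in list order; the source does not
    # specify a victim-selection policy.
    for victim in workers:
        if victim is not worker:
            task = victim.steal()
            if task is not None:
                return task
    return None  # no work available anywhere
```

For example, if worker `w0` pushes tasks `"a"`, `"b"`, and `"c"` in that order, `next_task(w0, [w0, w1])` returns `"c"` from the tail of its own queue, while an idle `w1` calling `next_task(w1, [w0, w1])` steals `"a"` from the head of `w0`'s queue.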