I. Introduction
Reducing energy consumption has become an increasingly important goal, because it translates directly into lower electricity costs in data centers and supercomputing centers and into longer battery life in mobile devices. At the chip multiprocessor (CMP) level, common energy-reduction techniques include integrating asymmetric cores (i.e., static asymmetry) to execute different workloads efficiently [1], and leveraging dynamic voltage and frequency scaling (DVFS) (i.e., dynamic asymmetry) to match each core's performance to the program's requirements [2]. However, as core counts increase, scaling DVFS and asymmetry to many cores complicates hardware design. One way to reduce this complexity is to organize the cores into clusters. This strategy has been used in monolithic CMPs [3] and chiplet-based CMPs [4], [5], and is expected to continue with the trend toward heterogeneous integration [6]. In such designs, all cores in a cluster are of the same type and operate at the same DVFS setting.