I. Introduction
Microprocessors have enjoyed decades of steady performance gains, due in large part to progress in CMOS device integration and increased instruction-level parallelism. However, as a result of diminishing returns from traditional performance-scaling techniques and practical power limitations, modern chip design is shifting focus away from continued advances in uniprocessor performance and toward processor-level parallelism. Chip multiprocessors (CMPs), which exploit parallelism in program execution by integrating two or more processor cores onto the same chip, have emerged as the new paradigm [1]–[4]. Consequently, new fields of research are emerging to address the many implementation and scaling challenges brought about by the growing number of cores per chip, challenges that can be fundamentally different from those faced during the uniprocessor era.