I. Introduction
Modern deep-submicrometer fabrication technologies enable very high levels of integration, as illustrated by a recent dual-core, 1.7-billion-transistor chip [1]. A highly promising approach to using these circuit resources efficiently is to integrate multiple processors onto a single chip (called a chip multiprocessor, or CMP) to achieve higher performance through parallel processing. CMPs can also potentially provide increased energy efficiency: during periods when full-rate computation is not needed and conditions permit, the clock frequency and supply voltage can be reduced together to dramatically reduce power dissipation.
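
The potential saving from lowering frequency and voltage together can be made concrete with the standard first-order model of CMOS dynamic power. This is an illustrative sketch rather than a result of this work: the symbols \alpha (switching activity), C (switched capacitance), and s (a common scaling factor) are introduced here only for illustration, leakage power is ignored, and the assumption that the supply voltage can be lowered in proportion to the clock frequency is an idealization.

\[
P_{\mathrm{dyn}} = \alpha C V_{dd}^{2} f
\quad\Longrightarrow\quad
\frac{P_{\mathrm{dyn}}(s V_{dd},\, s f)}{P_{\mathrm{dyn}}(V_{dd},\, f)} = s^{3},
\qquad 0 < s \le 1 .
\]

Under these assumptions, halving both quantities (s = 0.5) reduces dynamic power to one eighth of its full-rate value, and the energy per operation (P_dyn / f, which scales as s^2) to one quarter.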