I. Introduction
Semiconductor process scaling has led to explosive growth in silicon capacity, enabling higher chip density, faster speed, and better opportunities for reducing power dissipation. Fueled by these technological advances, research in computer architecture has succeeded in boosting microprocessor performance beyond what process scaling alone can achieve. There is a clear trend towards chip multithreaded processing [1], [2] to exploit the performance benefits offered by future nanometer-scale, billion-transistor integration.