I. Introduction
The need for continued performance growth in computing systems has led to the integration of many CPU cores on a single chip to run many multi-threaded applications concurrently [1]. Although this capacity for massively parallel computation improves overall system performance [2], it poses two problems.