I. Introduction
Many applications, such as digital signal processing, math libraries, and on-chip built-in self-test, rely on data that are determined at design time and remain constant at runtime (which we refer to as static data). Static data may be stored as lookup tables in on-chip read-only memory (ROM). However, storing large amounts of static data in on-chip ROM incurs significant area and power overheads. An alternative is to store static data off-chip. In this case, the processor must fetch the required static data into the on-chip cache during program execution, which degrades performance. The problem is further exacerbated when data from concurrently running programs map to the same cache locations (known as cache thrashing). As a result, the static data used by a program may be evicted from the cache, and the processor has to fetch the evicted data from off-chip memory again before program execution can continue. Hence, realizing an on-chip ROM with minimal overhead allows static data to be stored closer to the processor and can accelerate application execution.