1 Introduction
Continuous CMOS technology scaling makes the design of robust, high-density SRAM-based caches an increasingly challenging task [1]. Faults in SRAM can be either parametric or catastrophic defects, or transient soft errors, and both become more serious as the technology feature size shrinks. In conventional design practice, memory defects are handled by using spare (or redundant) rows, columns, and/or words to repair (i.e., replace) the defective ones, while soft errors are handled by error-correcting codes (ECC), such as the single-error-correcting, double-error-detecting (SEC-DED) codes widely used in the L2 caches of modern microprocessors [2], [3]. As technology continues to scale down, increasingly severe process variability is expected to subject future SRAM to random parametric defect rates of 0.1 percent or even higher [4]. As a result, the traditional repair-only defect tolerance strategy may no longer ensure sufficiently high yield, which has motivated recent work on extending the role of ECC to compensate for both soft errors and defects in cache memories [5], [6]. In [5], the authors developed techniques that allow existing SEC-DED codes to handle defects in cache blocks containing a single defect while maintaining soft error tolerance, at the cost of reduced memory bandwidth and, hence, noticeable instructions-per-cycle (IPC) degradation. In [6], 2D array codes (or product codes) [7] are used to handle clustered soft errors and/or defects. However, since a single 2D array codeword protects many cache blocks together, array codes may incur significant energy cost and IPC degradation in the presence of a large number of random defects.
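To make the 0.1 percent defect rate more concrete, the short sketch below estimates how often a codeword or a cache block contains at least one defective cell. It is an illustration under stated assumptions, not a model taken from [4]–[6]: defects are assumed to be independent and uniformly distributed, each 64-bit data word is assumed to be protected by a 72-bit SEC-DED codeword (64 data bits plus 8 check bits), and a cache block is assumed to be 64 bytes, i.e., eight such codewords. The helper name `prob_at_least` and the constants are hypothetical.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Binomial tail: probability that at least k of n independent cells are defective."""
    # Subtract the probability of having 0, 1, ..., k-1 defects from 1.
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


# Illustrative assumptions (hypothetical, not from the cited works):
P_DEFECT = 0.001             # 0.1 percent random defect rate per cell
WORD_BITS = 72               # assumed (72,64) SEC-DED codeword: 64 data + 8 check bits
BLOCK_BITS = 8 * WORD_BITS   # assumed 64-byte cache block covered by eight codewords

if __name__ == "__main__":
    # A single defect already consumes SEC-DED's one-bit correction capability,
    # so a later soft error in the same codeword could only be detected.
    print(f"P(codeword has >=1 defect)    = {prob_at_least(1, WORD_BITS, P_DEFECT):.4f}")
    # Two or more defects exceed what SEC-DED can correct at all.
    print(f"P(codeword has >=2 defects)   = {prob_at_least(2, WORD_BITS, P_DEFECT):.4f}")
    # Fraction of blocks a repair-only scheme would need to replace.
    print(f"P(cache block has >=1 defect) = {prob_at_least(1, BLOCK_BITS, P_DEFECT):.4f}")
```

Under these assumptions, roughly 7 percent of codewords and about 44 percent of 64-byte blocks contain at least one defect. This helps explain why relying on spare rows and columns alone becomes difficult, and why using ECC for defects raises the question of how to keep soft error protection for codewords that already hold a defect.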