
A 4+2T SRAM for Searching and In-Memory Computing With 0.3-V V_DDmin



Abstract:

This paper presents a 4+2T SRAM for embedded searching and in-memory-computing applications. The proposed SRAM cell uses the n-well as the write wordline to perform write operations and eliminate the write access transistors, achieving 15% area saving compared with conventional 8T SRAM. The decoupled differential read paths significantly improve read noise margin, and therefore reliable multi-word activation can be enabled to perform in-memory Boolean logic functions. Reconfigurable differential sense amplifiers are employed to realize fast normal read or multi-functional logic operations. Moreover, the proposed 4+2T SRAM can be reconfigured as binary content-addressable memory (BCAM) or ternary content-addressable memory (TCAM) for searching operations, achieving 0.13 fJ/search/bit at 0.35 V. The chip is fabricated in 55-nm deeply depleted channel technology. The area efficiency is 65% for a 128 × 128 pushed-rule array including all peripherals such as column-wise sense amplifiers for read/logic and row-wise sense amplifiers for BCAM/TCAM operations. Forty dies across five wafers in different corners are measured, showing a worst-case read/write V_DDmin of 0.3 V.
Published in: IEEE Journal of Solid-State Circuits ( Volume: 53, Issue: 4, April 2018)
Page(s): 1006 - 1015
Date of Publication: 11 December 2017



I. Introduction

Conventional von Neumann (CVN) architectures continuously transfer data between memory banks and compute elements, incurring energy and latency costs that can dominate system power and performance. As shown in Fig. 1, performing a Boolean logic function on two words in a CVN architecture involves three major steps: 1) two memory read cycles; 2) data movement from the memory hierarchy to the computing element's registers; and 3) computation within the ALU. Both power consumption and latency are largely dominated by the two memory accesses and the data movement (Fig. 2) [1]–[3]. To minimize this power and latency, in-memory computing (IMC) has recently been proposed to process data directly inside an on-chip memory macro [4], [5]. As shown in Fig. 1, IMC activates multiple rows simultaneously and computes logic functions directly on the bitline (BL). The computation result is available at the end of a single memory access, so only one cycle is required instead of several [Fig. 2(a)]. Consequently, IMC reduces the energy of both data movement and memory access, as shown in Fig. 2(b). Moreover, since memory banks are typically very wide (many words per wordline), IMC inherently offers highly parallel computation.
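The bitline behavior behind this scheme can be sketched in software. The model below is an illustration under common assumptions for multi-row SRAM activation (it is not taken from the paper's circuit): when two rows are activated on the same precharged column, the bitline (BL) stays high only if every activated cell stores 1, giving a wired-AND, while the complementary bitline (BLB) stays high only if every cell stores 0, giving a wired-NOR; a sense amplifier combining the two can then recover XOR.

```python
# Illustrative software model (an assumption, not the paper's circuit) of
# bitline-based in-memory Boolean logic with two rows activated at once.
def imc_bitline_logic(word_a, word_b):
    """Model one IMC access on two stored words (lists of 0/1 bits).

    Per column:
      BL  stays precharged high only if every activated cell stores 1 -> AND.
      BLB stays precharged high only if every activated cell stores 0 -> NOR.
      NOR(BL, BLB) is high only when the bits differ -> XOR.
    """
    bl = [a & b for a, b in zip(word_a, word_b)]                 # A AND B on BL
    blb = [(1 - a) & (1 - b) for a, b in zip(word_a, word_b)]    # A NOR B on BLB
    xor = [1 - (x | y) for x, y in zip(bl, blb)]                 # NOR(BL, BLB) = A XOR B
    return bl, blb, xor

bl, blb, xor = imc_bitline_logic([1, 0, 1, 0], [1, 1, 0, 0])
# bl  -> [1, 0, 0, 0]  (AND)
# blb -> [0, 0, 0, 1]  (NOR)
# xor -> [0, 1, 1, 0]  (XOR)
```

All three results are produced by a single modeled "access," mirroring the one-cycle latency claim: no word is read out to registers before the logic is evaluated.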

Fig. 1. Data flow required for reading two words to perform a logic operation in a CVN architecture and an IMC system.

Fig. 2. Comparison between CVN and IMC in terms of (a) latency and (b) energy for a two-word Boolean operation.

