I. Introduction
Conventional von Neumann (CVN) architecture continuously transfers data between memory banks and compute elements, incurring substantial energy and latency costs that can dominate system power and performance. As shown in Fig. 1, to perform a Boolean logic function on two words, a CVN architecture involves three major steps: 1) two memory read cycles; 2) data movement from the memory hierarchy to the computing element's registers; and 3) computation within the ALU. Both power consumption and latency are largely dominated by the two memory accesses and the data movement (Fig. 2) [1]–[3]. To minimize the power and latency, in-memory computing (IMC) has recently been proposed to process data directly inside an on-chip memory macro [4], [5]. As shown in Fig. 1, IMC activates multiple rows simultaneously and computes the logical functions directly on the bitline (BL). Computation results are available immediately at the end of a memory access, so only one cycle is required instead of multiple cycles of latency [Fig. 2(a)]. As a result, IMC can reduce the energy costs associated with both data movement and memory access, as shown in Fig. 2(b). Since memory banks are typically very wide (many words per word line), IMC also offers inherently parallel computation across all columns of the activated rows.
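The single-access bitline computation described above can be sketched in software. The model below is illustrative and not from the paper: it assumes a common 6T-SRAM IMC scheme in which activating two rows makes the bitline (BL) stay high only where both cells store 1 (a wired-AND), while the complementary bitline (BLB) stays high only where both cells store 0 (a NOR); the function and parameter names are hypothetical.

```python
def imc_read_two_rows(word_a: int, word_b: int, width: int = 8):
    """Model one IMC access with two word lines activated simultaneously.

    Returns the (AND, NOR) results as sensed on BL and BLB, respectively.
    In hardware both results arrive in a single memory cycle; a CVN machine
    would instead need two reads, data movement, and an ALU operation.
    """
    mask = (1 << width) - 1
    bl = word_a & word_b             # BL discharges unless both cells hold 1
    blb = ~(word_a | word_b) & mask  # BLB discharges unless both cells hold 0
    return bl, blb

if __name__ == "__main__":
    a, b = 0b1100_1010, 0b1010_0110
    and_res, nor_res = imc_read_two_rows(a, b)
    print(f"AND = {and_res:08b}")  # 10000010
    print(f"NOR = {nor_res:08b}")  # 00010001
```

Because every column has its own BL/BLB pair, a real macro evaluates this logic on all bit positions of the activated rows in parallel, which is the source of the width-wise parallelism noted above.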
Fig. 1. Data flow required for reading two words to perform a logic operation in a CVN architecture and an IMC system.
Fig. 2. Comparison between CVN and IMC in terms of (a) latency and (b) energy for a two-word Boolean operation.