
MC2-RAM: An In-8T-SRAM Computing Macro Featuring Multi-Bit Charge-Domain Computing and ADC-Reduction Weight Encoding


Abstract:

In-memory computing (IMC) is a promising hardware architecture for circumventing the memory wall in data-intensive applications such as deep learning. Among various memory technologies, static random-access memory (SRAM) stands out thanks to its high computing accuracy, reliability, and scalability to advanced technology nodes. This paper presents a novel multi-bit capacitive convolution in-SRAM computing macro for high-accuracy, high-throughput, and high-efficiency deep learning inference. It realizes fully parallel charge-domain multiply-and-accumulate (MAC) within compact 8-transistor 1-capacitor (8T1C) SRAM arrays whose cells are only 41% larger than standard 6T cells. It performs MAC with multi-bit activations without the conventional digital bit-serial shift-and-add scheme, drastically improving throughput for high-precision CNN models. An ADC-reduction encoding scheme complements the compact SRAM design by halving the number of ADCs needed, saving both energy and area. A 576×130 macro with 64 ADCs is evaluated in 65 nm with post-layout simulations, showing 4.60 TOPS/mm² compute density and 59.7 TOPS/W energy efficiency with 4/4-bit activations/weights. MC2-RAM also achieves excellent linearity, with an output-voltage standard deviation of only 0.14 mV (4.5% of the LSB) in Monte Carlo simulations.
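The charge-domain MAC idea described in the abstract can be sketched with a simple behavioral model: each cell contributes charge proportional to its local product onto a shared line, and charge sharing across equal unit capacitors averages the cell voltages. The sketch below is illustrative only; the parameter `v_lsb` and the linear product-to-voltage mapping are assumptions for the example, not details of the MC2-RAM circuit.

```python
# Illustrative behavioral model of charge-domain MAC via capacitor
# charge sharing. A simplified sketch under assumed parameters
# (v_lsb, equal unit capacitors), not the actual MC2-RAM circuit.

def charge_domain_mac(activations, weights, v_lsb=0.003):
    """Return the shared-line voltage after charge sharing.

    Each cell is modeled as a unit capacitor charged to a voltage
    proportional to its local product a_i * w_i; shorting equal
    capacitors together yields the mean of the cell voltages, so the
    output voltage encodes the MAC result divided by the array size.
    """
    assert len(activations) == len(weights)
    cell_v = [a * w * v_lsb for a, w in zip(activations, weights)]
    return sum(cell_v) / len(cell_v)

acts = [3, 1, 2, 0]   # example multi-bit activations
wts  = [2, 3, 1, 1]   # example multi-bit weights
v_out = charge_domain_mac(acts, wts)
# The digital MAC value can be recovered as v_out * N / v_lsb.
```

Because the averaging happens in a single charge-sharing step, all products accumulate in parallel rather than through bit-serial shift-and-add cycles, which is the throughput advantage the abstract claims.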
Date of Conference: 26-28 July 2021
Date Added to IEEE Xplore: 04 August 2021
Conference Location: Boston, MA, USA

I. Introduction

Deep convolutional neural networks (CNNs) have achieved unprecedented success in artificial intelligence (AI) over the past decade. However, the intensive computation required even for inference makes it challenging to deploy pre-trained models on resource-constrained edge devices. The essential and computationally dominant operation in CNN models, the convolution, requires an overwhelming number of multiply-and-accumulate (MAC) operations with excessive on-/off-chip memory access. It is well known that the energy bottleneck of such computation lies in data movement rather than in the arithmetic itself, leading to the so-called memory wall [1].
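The scale of the MAC workload is easy to quantify: each output element of a convolutional layer needs a k×k×C_in dot product, repeated across every spatial position and output channel. The helper below and its example dimensions (a ResNet-style 3×3 layer) are illustrative assumptions, not figures from this paper.

```python
# Back-of-the-envelope MAC count for one convolutional layer,
# illustrating why convolution dominates CNN inference cost.
# The example layer shape is a hypothetical ResNet-like 3x3 conv.

def conv_macs(h_out, w_out, c_out, k, c_in):
    # Each of the h_out * w_out * c_out output elements requires
    # a k * k * c_in dot product, i.e. that many MAC operations.
    return h_out * w_out * c_out * k * k * c_in

# 3x3 conv, 56x56 output, 64 -> 64 channels:
macs = conv_macs(56, 56, 64, 3, 64)  # 115,605,504 MACs for one layer
```

With hundreds of millions of MACs per layer, and each operand traditionally fetched from off-chip memory, the data-movement energy quickly dwarfs the arithmetic energy, which is the motivation for in-memory computing.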
