
A Multiply-Less Approximate SRAM Compute-In-Memory Macro for Neural-Network Inference


Abstract:

Compute-in-memory (CIM) is promising for reducing data-movement energy and providing large bandwidth for matrix-vector multiplies (MVMs). However, existing work still faces various challenges, such as the digital-logic overhead caused by multiply-add operations (OPs) and structural sparsity. This article presents a 2-to-8-b scalable approximate digital SRAM-based CIM macro co-designed with a multiply-less neural-network (NN) approach. It incorporates dynamic-logic-based approximate circuits that save logic area and energy by eliminating multiplications. A prototype fabricated in 28-nm CMOS technology achieves a peak multiply-accumulate (MAC)-level energy efficiency of 102 TOPS/W for 8-b operations. The NN model deployment flow is demonstrated on CIFAR-10 and ImageNet classification with ResNet-20- and ResNet-50-style multiply-less models, achieving accuracies of 91.74% and 74.8%, respectively, with 8-b weights and activations.
Published in: IEEE Journal of Solid-State Circuits (Volume: 60, Issue: 2, February 2025)
Page(s): 695 - 706
Date of Publication: 05 August 2024


I. Introduction

Artificial intelligence based on neural networks (NNs) has enabled various emerging applications ranging from edge to cloud computing, such as computer vision, language processing, and molecular discovery for scientific applications [1], [2], [3], [4]. A key attribute of these applications is that they rely heavily on computation over large volumes of data, such as high-dimensional matrices and tensors. Thus, the efficiency and performance of NN inference, which is dominated by matrix-vector multiplies (MVMs), are limited by memory access and I/O bandwidth [5]. Because of the high-dimensional computation in MVMs, von Neumann architecture-based computing platforms are not well suited to NN inference, and new hardware solutions are needed to advance modern AI development.
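
To make the multiply-less idea concrete, the following minimal Python sketch contrasts a conventional MAC-based MVM with an addition-only variant in the spirit of AdderNet-style multiply-less models. The sum-of-absolute-differences form and the function names are illustrative assumptions, not the exact operation implemented by the macro described in this article.

import numpy as np

def mvm_mac(W, x):
    """Conventional MVM: one multiply plus one add per weight."""
    return W @ x

def mvm_multiply_less(W, x):
    """Addition-only analogue (assumed AdderNet-style): the negated L1
    distance between each weight row and the input vector replaces every
    multiply with a subtract, an absolute value, and an add."""
    return -np.abs(W - x[None, :]).sum(axis=1)

rng = np.random.default_rng(0)
W = rng.integers(-128, 128, size=(4, 8))   # 8-b signed weights
x = rng.integers(-128, 128, size=8)        # 8-b signed activations

print(mvm_mac(W, x))             # standard MVM outputs
print(mvm_multiply_less(W, x))   # multiply-less similarity scores

Either form touches every weight once per input vector, which is why CIM macros that keep the weights resident in SRAM and stream only the activations can cut data-movement energy for both kinds of operation.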
