
A Multiply-Less Approximate SRAM Compute-In-Memory Macro for Neural-Network Inference



Abstract:

Compute-in-memory (CIM) is promising in reducing data-movement energy and providing large bandwidth for matrix-vector multiplies (MVMs). However, existing work still faces various challenges, such as the digital logic overhead caused by multiply-add operations (OPs) and structural sparsity. This article presents a 2-to-8-b scalable approximate digital SRAM-based CIM macro co-designed with a multiply-less neural-network (NN) approach. It incorporates dynamic-logic-based approximate circuits that eliminate multiplications, saving logic area and energy. A prototype fabricated in 28-nm CMOS technology achieves a peak multiply-accumulate (MAC)-level energy efficiency of 102 TOPS/W for 8-b operations. An NN model deployment flow is used to demonstrate CIFAR-10 and ImageNet classification with ResNet-20- and ResNet-50-style multiply-less models, achieving accuracies of 91.74% and 74.8%, respectively, with 8-bit weights and activations.
Published in: IEEE Journal of Solid-State Circuits (Volume: 60, Issue: 2, February 2025)
Page(s): 695-706
Date of Publication: 05 August 2024


I. Introduction

Artificial intelligence based on neural networks (NNs) has enabled various emerging applications, ranging from edge to cloud computing, such as computer vision, language processing, and molecular discovery for scientific applications [1], [2], [3], [4]. A key attribute of these applications is that they rely heavily on computation over huge volumes of data, such as high-dimensional matrices and tensors. Thus, the efficiency and performance of NN inference, which is dominated by matrix-vector multiplies (MVMs), are limited by memory access and I/O bandwidth [5]. Because von Neumann architectures separate memory from computation, every MVM operand must cross this bandwidth-limited boundary, so conventional computing platforms are not well suited to NN inference. New hardware solutions are needed to advance modern AI development.
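To make the multiply-less approach concrete: adder networks [28] replace the inner product of a conventional MVM with a negated L1 distance between the activation and weight vectors, so each output requires only subtractions, absolute values, and accumulations rather than multiplies. The NumPy sketch below is purely illustrative (the function names are ours, and this software model omits the macro's dynamic-logic approximation); it contrasts the two MAC formulations for 8-b operands:

    import numpy as np

    def mac_multiply(x, w):
        # Conventional MAC: one multiply per weight, then accumulate.
        return int(np.sum(x.astype(np.int32) * w.astype(np.int32)))

    def mac_multiply_less(x, w):
        # AdderNet-style L1 "MAC" [28]: negated sum of absolute differences,
        # realizable with subtractors and adders only (no multiplier array).
        return int(-np.sum(np.abs(x.astype(np.int32) - w.astype(np.int32))))

    rng = np.random.default_rng(seed=0)
    x = rng.integers(-128, 128, size=64, dtype=np.int8)  # 8-b activations
    w = rng.integers(-128, 128, size=64, dtype=np.int8)  # 8-b weights
    print(mac_multiply(x, w), mac_multiply_less(x, w))

Because |x - w| reduces to a subtraction plus a sign-based select, the multiplier array and its associated area and energy overhead disappear, which is the property the macro exploits at the circuit level.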

References
[1] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet classification with deep convolutional neural networks", Commun. ACM, vol. 60, no. 6, pp. 84-90, Jun. 2017.
[2] K. He, X. Zhang, S. Ren and J. Sun, "Deep residual learning for image recognition", Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 770-778, Jun. 2016.
[3] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You only look once: Unified real-time object detection", Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 779-788, Jun. 2016.
[4] A. Vaswani et al., "Attention is all you need", Proc. NIPS, pp. 6000-6010, 2017.
[5] M. Horowitz, "1.1 Computing's energy problem (and what we can do about it)", IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, pp. 10-14, Feb. 2014.
[6] J. Zhang, Z. Wang and N. Verma, "In-memory computation of a machine-learning classifier in a standard 6T SRAM array", IEEE J. Solid-State Circuits, vol. 52, no. 4, pp. 915-924, Apr. 2017.
[7] M. Kang, S. K. Gonugondla, A. Patil and N. R. Shanbhag, "A multi-functional in-memory inference processor using a standard 6T SRAM array", IEEE J. Solid-State Circuits, vol. 53, no. 2, pp. 642-655, Feb. 2018.
[8] W.-S. Khwa et al., "A 65nm 4Kb algorithm-dependent computing-in-memory SRAM unit-macro with 2.3ns and 55.8TOPS/W fully parallel product-sum operation for binary DNN edge processors", IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, pp. 496-498, Feb. 2018.
[9] X. Si et al., "24.5 A twin-8T SRAM computation-in-memory macro for multiple-bit CNN-based machine learning", IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, pp. 396-398, Feb. 2019.
[10] H. Valavi, P. J. Ramadge, E. Nestler and N. Verma, "A 64-tile 2.4-Mb in-memory-computing CNN accelerator employing charge-domain compute", IEEE J. Solid-State Circuits, vol. 54, no. 6, pp. 1789-1799, Jun. 2019.
[11] Z. Jiang, S. Yin, J.-S. Seo and M. Seok, "C3SRAM: An in-memory-computing SRAM macro based on robust capacitive coupling computing mechanism", IEEE J. Solid-State Circuits, vol. 55, no. 7, pp. 1888-1897, Jul. 2020.
[12] H. Jia et al., "Scalable and programmable neural network inference accelerator based on in-memory computing", IEEE J. Solid-State Circuits, vol. 57, no. 1, pp. 198-211, Jan. 2022.
[13] H. Wang, R. Liu, R. Dorrance, D. Dasalukunte, D. Lake and B. Carlton, "A charge domain SRAM compute-in-memory macro with C-2C ladder-based 8-bit MAC unit in 22-nm FinFET process for edge inference", IEEE J. Solid-State Circuits, vol. 58, no. 4, pp. 1037-1050, Apr. 2023.
[14] H. Kim, T. Yoo, T. T. Kim and B. Kim, "Colonnade: A reconfigurable SRAM-based digital bit-serial compute-in-memory macro for processing neural networks", IEEE J. Solid-State Circuits, vol. 56, no. 7, pp. 2221-2233, Jul. 2021.
[15] D. Wang, C. T. Lin, G. K. Chen, P. Knag, R. K. Krishnamurthy and M. Seok, "DIMC: 2219TOPS/W 2569F²/b digital in-memory computing macro in 28nm based on approximate arithmetic hardware", Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 266-268, Feb. 2022.
[16] Y.-D. Chih et al., "16.4 An 89TOPS/W and 16.3TOPS/mm² all-digital SRAM-based full-precision compute-in memory macro in 22nm for machine-learning edge applications", Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), vol. 64, pp. 252-254, Feb. 2021.
[17] H. Fujiwara et al., "A 5-nm 254-TOPS/W 221-TOPS/mm² fully-digital computing-in-memory macro supporting wide-range dynamic-voltage-frequency scaling and simultaneous MAC and write operations", Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 1-3, Feb. 2022.
[18] H. Mori et al., "A 4nm 6163-TOPS/W/b 4790-TOPS/mm²/b SRAM based digital-computing-in-memory macro supporting bit-width flexibility and simultaneous MAC and weight update", Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 132-134, Feb. 2023.
[19] Y. He et al., "7.3 A 28nm 38-to-102-TOPS/W 8b multiply-less approximate digital SRAM compute-in-memory macro for neural-network inference", Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 130-132, Feb. 2023.
[20] B. Yan et al., "A 1.041-Mb/mm² 27.38-TOPS/W signed-INT8 dynamic-logic-based ADC-less SRAM compute-in-memory macro in 28nm with reconfigurable bitwise operation for AI and embedded applications", Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), vol. 65, pp. 188-190, Feb. 2022.
[21] J. Yue et al., "A 28nm 16.9–300TOPS/W computing-in-memory processor supporting floating-point NN inference/training with intensive-CIM sparse-digital architecture", Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 1-3, Feb. 2023.
[22] S. Liu et al., "16.2 A 28nm 53.8TOPS/W 8b sparse transformer accelerator with in-memory butterfly zero skipper for unstructured-pruned NN and CIM-based local-attention-reusable engine", Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 250-252, Feb. 2023.
[23] H. Diao et al., "A 28nm 128TFLOPS/W computing-in-memory engine supporting one-shot floating-point NN inference and on-device fine-tuning for edge AI", Proc. IEEE Custom Integr. Circuits Conf. (CICC), pp. 1-2, Apr. 2024.
[24] J. Yue et al., "STICKER-IM: A 65 nm computing-in-memory NN processor using block-wise sparsity optimization and inter/intra-macro data reuse", IEEE J. Solid-State Circuits, vol. 57, no. 8, pp. 2560-2573, Aug. 2022.
[25] J.-W. Su et al., "16.3 A 28nm 384kb 6T-SRAM computation-in-memory macro with 8b precision for AI edge chips", IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, vol. 64, pp. 250-252, Feb. 2021.
[26] P.-C. Wu et al., "A 28nm 1Mb time-domain computing-in-memory 6T-SRAM macro with a 6.6ns latency 1241GOPS and 37.01TOPS/W for 8b-MAC operations for edge-AI devices", Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), vol. 65, pp. 1-3, Feb. 2022.
[27] J. Song et al., "A 4-bit calibration-free computing-in-memory macro with 3T1C current-programed dynamic-cascode Multi-Level-Cell eDRAM", IEEE J. Solid-State Circuits, vol. 59, no. 3, pp. 842-854, Mar. 2024.
[28] H. Chen et al., "AdderNet: Do we really need multiplications in deep learning?", Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 1465-1474, Jun. 2020.
[29] H. Shu, J. Wang, H. Chen, L. Li, Y. Yang and Y. Wang, "Adder attention for vision transformer", Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 34, pp. 19899-19909, 2021.
[30] X. Chen, C. Xu, M. Dong, C. Xu and Y. Wang, "An empirical study of adder neural networks for object detection", Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 34, pp. 6894-6905, 2021.
