
A Multiply-Less Approximate SRAM Compute-In-Memory Macro for Neural-Network Inference



Abstract:

Compute-in-memory (CIM) is promising in reducing data-movement energy and providing large bandwidth for matrix-vector multiplies (MVMs). However, existing work still faces various challenges, such as the digital logic overhead caused by multiply-add operations (OPs) and structural sparsity. This article presents a 2-to-8-b scalable approximate digital SRAM-based CIM macro co-designed with a multiply-less neural-network (NN) approach. It incorporates dynamic-logic-based approximate circuits that eliminate multiplications, saving logic area and energy. A prototype fabricated in 28-nm CMOS technology achieves a peak multiply-accumulate (MAC)-level energy efficiency of 102 TOPS/W for 8-b operations. An NN model deployment flow is used to demonstrate CIFAR-10 and ImageNet classification with ResNet-20- and ResNet-50-style multiply-less models, achieving accuracies of 91.74% and 74.8%, respectively, with 8-bit weights and activations.
Published in: IEEE Journal of Solid-State Circuits (Volume: 60, Issue: 2, February 2025)
Page(s): 695-706
Date of Publication: 05 August 2024


I. Introduction

Artificial intelligence based on neural networks (NNs) has enabled various emerging applications, ranging from edge to cloud computing, such as computer vision, language processing, and molecular discovery for scientific applications [1], [2], [3], [4]. A key attribute of these applications is that they rely heavily on computation over huge volumes of data, such as high-dimensional matrices and tensors. Thus, the efficiency and performance of NN inference, which is dominated by matrix-vector multiplies (MVMs), are limited by memory access and I/O bandwidth [5]. Because von Neumann architectures separate memory from computation, every MVM operand must cross this bandwidth-limited boundary, so conventional computing platforms are not well suited to NN inference. New hardware solutions are needed to advance modern AI development.
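To make the multiply-less approach concrete: adder networks [28] replace the inner product of a conventional MVM with a negated L1 distance between the activation and weight vectors, so each output requires only subtractions, absolute values, and accumulations rather than multiplies. The NumPy sketch below is purely illustrative (the function names are ours, and this software model omits the macro's dynamic-logic approximation); it contrasts the two MAC formulations for 8-b operands:

    import numpy as np

    def mac_multiply(x, w):
        # Conventional MAC: one multiply per weight, then accumulate.
        return int(np.sum(x.astype(np.int32) * w.astype(np.int32)))

    def mac_multiply_less(x, w):
        # AdderNet-style L1 "MAC" [28]: negated sum of absolute differences,
        # realizable with subtractors and adders only (no multiplier array).
        return int(-np.sum(np.abs(x.astype(np.int32) - w.astype(np.int32))))

    rng = np.random.default_rng(seed=0)
    x = rng.integers(-128, 128, size=64, dtype=np.int8)  # 8-b activations
    w = rng.integers(-128, 128, size=64, dtype=np.int8)  # 8-b weights
    print(mac_multiply(x, w), mac_multiply_less(x, w))

Because |x - w| reduces to a subtraction plus a sign-based select, the multiplier array and its associated area and energy overhead disappear, which is the property the macro exploits at the circuit level.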

References
[1] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet classification with deep convolutional neural networks", Commun. ACM, vol. 60, no. 6, pp. 84-90, Jun. 2017.
[2] K. He, X. Zhang, S. Ren and J. Sun, "Deep residual learning for image recognition", Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 770-778, Jun. 2016.
[3] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You only look once: Unified real-time object detection", Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 779-788, Jun. 2016.
[4] A. Vaswani et al., "Attention is all you need", Proc. NIPS, pp. 6000-6010, 2017.
[5] M. Horowitz, "1.1 Computing's energy problem (and what we can do about it)", IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, pp. 10-14, Feb. 2014.
[6] J. Zhang, Z. Wang and N. Verma, "In-memory computation of a machine-learning classifier in a standard 6T SRAM array", IEEE J. Solid-State Circuits, vol. 52, no. 4, pp. 915-924, Apr. 2017.
[7] M. Kang, S. K. Gonugondla, A. Patil and N. R. Shanbhag, "A multi-functional in-memory inference processor using a standard 6T SRAM array", IEEE J. Solid-State Circuits, vol. 53, no. 2, pp. 642-655, Feb. 2018.
[8] W.-S. Khwa et al., "A 65nm 4Kb algorithm-dependent computing-in-memory SRAM unit-macro with 2.3ns and 55.8TOPS/W fully parallel product-sum operation for binary DNN edge processors", IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, pp. 496-498, Feb. 2018.
[9] X. Si et al., "24.5 A twin-8T SRAM computation-in-memory macro for multiple-bit CNN-based machine learning", IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, pp. 396-398, Feb. 2019.
[10] H. Valavi, P. J. Ramadge, E. Nestler and N. Verma, "A 64-tile 2.4-Mb in-memory-computing CNN accelerator employing charge-domain compute", IEEE J. Solid-State Circuits, vol. 54, no. 6, pp. 1789-1799, Jun. 2019.
[11] Z. Jiang, S. Yin, J.-S. Seo and M. Seok, "C3SRAM: An in-memory-computing SRAM macro based on robust capacitive coupling computing mechanism", IEEE J. Solid-State Circuits, vol. 55, no. 7, pp. 1888-1897, Jul. 2020.
[12] H. Jia et al., "Scalable and programmable neural network inference accelerator based on in-memory computing", IEEE J. Solid-State Circuits, vol. 57, no. 1, pp. 198-211, Jan. 2022.
[13] H. Wang, R. Liu, R. Dorrance, D. Dasalukunte, D. Lake and B. Carlton, "A charge domain SRAM compute-in-memory macro with C-2C ladder-based 8-bit MAC unit in 22-nm FinFET process for edge inference", IEEE J. Solid-State Circuits, vol. 58, no. 4, pp. 1037-1050, Apr. 2023.
[14] H. Kim, T. Yoo, T. T. Kim and B. Kim, "Colonnade: A reconfigurable SRAM-based digital bit-serial compute-in-memory macro for processing neural networks", IEEE J. Solid-State Circuits, vol. 56, no. 7, pp. 2221-2233, Jul. 2021.
[15] D. Wang, C. T. Lin, G. K. Chen, P. Knag, R. K. Krishnamurthy and M. Seok, "DIMC: 2219TOPS/W 2569F²/b digital in-memory computing macro in 28nm based on approximate arithmetic hardware", Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 266-268, Feb. 2022.
[16] Y.-D. Chih et al., "16.4 An 89TOPS/W and 16.3TOPS/mm² all-digital SRAM-based full-precision compute-in memory macro in 22nm for machine-learning edge applications", Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), vol. 64, pp. 252-254, Feb. 2021.
[17] H. Fujiwara et al., "A 5-nm 254-TOPS/W 221-TOPS/mm² fully-digital computing-in-memory macro supporting wide-range dynamic-voltage-frequency scaling and simultaneous MAC and write operations", Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 1-3, Feb. 2022.
[18] H. Mori et al., "A 4nm 6163-TOPS/W/b 4790-TOPS/mm²/b SRAM based digital-computing-in-memory macro supporting bit-width flexibility and simultaneous MAC and weight update", Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 132-134, Feb. 2023.
[19] Y. He et al., "7.3 A 28nm 38-to-102-TOPS/W 8b multiply-less approximate digital SRAM compute-in-memory macro for neural-network inference", Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 130-132, Feb. 2023.
[20] B. Yan et al., "A 1.041-Mb/mm² 27.38-TOPS/W signed-INT8 dynamic-logic-based ADC-less SRAM compute-in-memory macro in 28nm with reconfigurable bitwise operation for AI and embedded applications", Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), vol. 65, pp. 188-190, Feb. 2022.
[21] J. Yue et al., "A 28nm 16.9–300TOPS/W computing-in-memory processor supporting floating-point NN inference/training with intensive-CIM sparse-digital architecture", Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 1-3, Feb. 2023.
[22] S. Liu et al., "16.2 A 28nm 53.8TOPS/W 8b sparse transformer accelerator with in-memory butterfly zero skipper for unstructured-pruned NN and CIM-based local-attention-reusable engine", Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 250-252, Feb. 2023.
[23] H. Diao et al., "A 28nm 128TFLOPS/W computing-in-memory engine supporting one-shot floating-point NN inference and on-device fine-tuning for edge AI", Proc. IEEE Custom Integr. Circuits Conf. (CICC), pp. 1-2, Apr. 2024.
[24] J. Yue et al., "STICKER-IM: A 65 nm computing-in-memory NN processor using block-wise sparsity optimization and inter/intra-macro data reuse", IEEE J. Solid-State Circuits, vol. 57, no. 8, pp. 2560-2573, Aug. 2022.
[25] J.-W. Su et al., "16.3 A 28nm 384kb 6T-SRAM computation-in-memory macro with 8b precision for AI edge chips", IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, vol. 64, pp. 250-252, Feb. 2021.
[26] P.-C. Wu et al., "A 28nm 1Mb time-domain computing-in-memory 6T-SRAM macro with a 6.6ns latency 1241GOPS and 37.01TOPS/W for 8b-MAC operations for edge-AI devices", Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), vol. 65, pp. 1-3, Feb. 2022.
[27] J. Song et al., "A 4-bit calibration-free computing-in-memory macro with 3T1C current-programed dynamic-cascode Multi-Level-Cell eDRAM", IEEE J. Solid-State Circuits, vol. 59, no. 3, pp. 842-854, Mar. 2024.
[28] H. Chen et al., "AdderNet: Do we really need multiplications in deep learning?", Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 1465-1474, Jun. 2020.
[29] H. Shu, J. Wang, H. Chen, L. Li, Y. Yang and Y. Wang, "Adder attention for vision transformer", Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 34, pp. 19899-19909, 2021.
[30] X. Chen, C. Xu, M. Dong, C. Xu and Y. Wang, "An empirical study of adder neural networks for object detection", Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 34, pp. 6894-6905, 2021.
