
A 40nm 1Mb 35.6 TOPS/W MLC NOR-Flash Based Computation-in-Memory Structure for Machine Learning



Abstract:

Computation-in-memory (CIM) is a promising approach to overcoming the "Von-Neumann bottleneck", offering high throughput and energy efficiency. In this paper, we propose a 1Mb Multi-Level Cell (MLC) NOR-Flash-based CIM (MLFlash-CIM) structure in a 40nm technology node. A multi-bit readout circuit is proposed to realize adaptive quantization; it comprises a current interface circuit, a multi-level analog shift amplifier (AS-Amp), and an 8-bit SAR-ADC. When applied to a modified 16-layer VGG-16 network, the proposed MLFlash-CIM achieves 92.73% inference accuracy on the CIFAR-10 dataset. This CIM structure also achieves a peak throughput of 3.277 TOPS and an energy efficiency of 35.6 TOPS/W for 4-bit multiply-and-accumulate (MAC) operations.
Date of Conference: 22-28 May 2021
Date Added to IEEE Xplore: 27 April 2021
Print ISBN: 978-1-7281-9201-7
Print ISSN: 2158-1525
Conference Location: Daegu, Korea
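The readout path described in the abstract (current interface, AS-Amp, 8-bit SAR-ADC) can be sketched behaviorally. This is a minimal model under idealized assumptions; the function names and parameter values below (`readout`, `shift_gain`, `r_sense`) are illustrative and not taken from the paper's actual circuit:

```python
def sar_adc_8bit(v_in, v_ref=1.0):
    """Ideal 8-bit SAR ADC: successive-approximation (binary search) quantization."""
    code = 0
    for bit in range(7, -1, -1):
        trial = code | (1 << bit)
        if v_in >= trial * v_ref / 256:  # keep the bit if the input clears the trial threshold
            code = trial
    return code

def readout(i_bl, shift_gain, r_sense=1e3, v_ref=1.0):
    """Hypothetical readout chain: bitline current -> sense voltage -> gain stage -> 8-bit code."""
    v = i_bl * r_sense * shift_gain    # current interface + analog shift amplifier, modeled as a gain
    v = min(v, v_ref * 255 / 256)      # clamp to the ADC full-scale range
    return sar_adc_8bit(v, v_ref)
```

The AS-Amp's "adaptive quantization" would correspond to choosing `shift_gain` per layer so the bitline current range maps onto the ADC's full scale.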

I. Introduction

With the rapid development of artificial intelligence (AI) algorithms, research on and applications of convolutional neural networks (CNNs) have become increasingly extensive. In the conventional Von-Neumann architecture, however, memories and computing units are connected by a bus of limited bandwidth, and the frequent transfer of data between them incurs substantial energy consumption. This severely limits the deployment of CNNs, which involve large amounts of data and high computational density [1]. To overcome these limitations, the Computation-in-Memory (CIM) architecture was proposed and has become a promising field in both academia and industry [2]-[6]. A CIM architecture embeds computing circuits in the memory, so it can perform certain calculations in place while still serving as an ordinary memory. Computing in memory greatly reduces data movement and the energy cost of memory accesses, and increases calculation speed [2]-[14]. CIM is therefore considered one of the mainstream trends for hardware acceleration of AI algorithms.
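As a minimal sketch of the in-memory MAC idea (an idealized model; `g_unit` and `v_read` are illustrative assumptions, not values from the paper), weights can be viewed as multi-level cell conductances and inputs as scaled read voltages, with the shared bitline summing the products as current:

```python
import numpy as np

def cim_mac(inputs, weights, g_unit=1e-6, v_read=0.1):
    """Idealized analog in-memory MAC on one bitline."""
    g = weights * g_unit            # each multi-level cell stores its weight as a conductance
    v = inputs * v_read             # each input scales the read voltage on its row
    i_bl = np.dot(v, g)             # Kirchhoff's law sums the products as bitline current
    return i_bl / (g_unit * v_read)  # normalize back to the equivalent digital MAC value

x = np.array([1, 2, 3])  # example 4-bit activations
w = np.array([4, 5, 6])  # example 4-bit weights
print(np.isclose(cim_mac(x, w), x @ w))  # the analog model matches the digital dot product
```

In a real MLC NOR-Flash array the conductance levels are discrete and noisy, which is why the multi-bit readout and quantization circuits described in this paper matter.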

References

[1] K. Bong, S. Choi, C. Kim, S. Kang, Y. Kim and H. Yoo, "14.6 A 0.62mW Ultra-Low-Power Convolutional-Neural-Network Face-Recognition Processor and a CIS Integrated with Always-On Haar-Like Face Detector", 2017 IEEE International Solid-State Circuits Conference (ISSCC), pp. 248-249, 2017.
[2] Z. Zhang et al., "A 55nm 1-to-8 bit Configurable 6T SRAM based Computing-in-Memory Unit-Macro for CNN-based AI Edge Processors", 2019 IEEE Asian Solid-State Circuits Conference (ASSCC), pp. 217-218, 2019.
[3] R. Liu et al., "Parallelizing SRAM Arrays with Customized Bit-Cell for Binary Neural Networks", 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), pp. 1-6, 2018.
[4] X. Si et al., "24.5 A Twin-8T SRAM Computation-In-Memory Macro for Multiple-Bit CNN-Based Machine Learning", 2019 IEEE International Solid-State Circuits Conference (ISSCC), pp. 396-398, 2019.
[5] J. Su et al., "15.2 A 28nm 64Kb Inference-Training Two-Way Transpose Multibit 6T SRAM Compute-in-Memory Macro for AI Edge Chips", 2020 IEEE International Solid-State Circuits Conference (ISSCC), pp. 240-242, 2020.
[6] X. Si et al., "15.5 A 28nm 64Kb 6T SRAM Computing-in-Memory Macro with 8b MAC Operation for AI Edge Chips", 2020 IEEE International Solid-State Circuits Conference (ISSCC), pp. 246-248, 2020.
[7] M.-A. Lebdeh et al., "Memristive Device Based Circuits for Computation-in-Memory Architectures", 2019 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-5, 2019.
[8] P. Chi et al., "PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory", 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pp. 27-39, 2016.
[9] C.-X. Xue et al., "A CMOS-Integrated Compute-in-Memory Macro based on Resistive Random-Access Memory for AI Edge Devices", Nature Electronics, vol. 4, pp. 81-90, Jan. 2021.
[10] W.-H. Chen et al., "CMOS-Integrated Memristive Non-Volatile Computing-in-Memory for AI Edge Processors", Nature Electronics, vol. 2, no. 9, pp. 420-428, Sep. 2019.
[11] C.-X. Xue et al., "15.4 A 22nm 2Mb ReRAM Compute-in-Memory Macro with 121-28TOPS/W for Multibit MAC Computing for Tiny AI Edge Devices", 2020 IEEE International Solid-State Circuits Conference (ISSCC), pp. 244-246, 2020.
[12] W.-H. Chen et al., "A 65nm 1Mb Nonvolatile Computing-in-Memory ReRAM Macro with Sub-16ns Multiply-and-Accumulate for Binary DNN AI Edge Processors", 2018 IEEE International Solid-State Circuits Conference (ISSCC), pp. 494-496, 2018.
[13] C. Xue et al., "24.1 A 1Mb Multibit ReRAM Computing-In-Memory Macro with 14.6ns Parallel MAC Computing Time for CNN Based AI Edge Processors", 2019 IEEE International Solid-State Circuits Conference (ISSCC), pp. 388-390, 2019.
[14] X. Guo et al., "Temperature-Insensitive Analog Vector-by-Matrix Multiplier based on 55 nm NOR Flash Memory Cells", 2017 IEEE Custom Integrated Circuits Conference (CICC), pp. 1-4, 2017.
[15] M.-F. Chang et al., "Nonvolatile Circuits-Devices Interaction for Memory Logic and Artificial Intelligence", 2018 IEEE Symposium on VLSI Technology, pp. 171-172, 2018.
[16] M.-F. Chang et al., "An Asymmetric-Voltage-Biased Current-Mode Sensing Scheme for Fast-Read Embedded Flash Macros", IEEE Journal of Solid-State Circuits, vol. 50, no. 9, pp. 2188-2198, Sept. 2015.
[17] M.-F. Chang et al., "A Process Variation Tolerant Embedded Split-Gate Flash Memory Using Pre-Stable Current Sensing Scheme", IEEE Journal of Solid-State Circuits, vol. 44, no. 3, pp. 987-994, March 2009.
[18] X. Bi, Z. Gu and Q. Xu, "Analysis and Design of Ultra-Large Dynamic Range CMOS Transimpedance Amplifier With Automatically-Controlled Multi-Current-Bleeding Paths", IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 66, no. 9, pp. 3266-3278, Sept. 2019.
[19] R. Liu et al., "Parallelizing SRAM Arrays with Customized Bit-Cell for Binary Neural Networks", 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), pp. 1-6, 2018.
[20] Y. Cai et al., "Low Bit-Width Convolutional Neural Network on RRAM", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 7, pp. 1414-1427, July 2020.
[21] R. Han et al., "A Novel Convolution Computing Paradigm Based on NOR Flash Array With High Computing Speed and Energy Efficiency", IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 66, no. 5, pp. 1692-1703, May 2019.
[22] Y. C. Xiang et al., "Analog Deep Neural Network Based on NOR Flash Computing Array for High Speed/Energy Efficiency Computation", 2019 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-4, 2019.
