I. Introduction
Artificial intelligence based on neural networks (NNs) has enabled a wide range of emerging applications, from edge devices to cloud computing, including computer vision, language processing, and molecular discovery for scientific applications [1], [2], [3], [4]. A key attribute of these applications is their heavy reliance on computation over large volumes of data, typically in the form of high-dimensional matrices and tensors. NN inference is dominated by matrix-vector multiplications (MVMs), so its efficiency and performance are limited by memory access and I/O bandwidth [5]. Because von Neumann architectures separate memory from processing, every high-dimensional MVM requires moving large amounts of data between the two, which makes such computing platforms poorly suited to NN inference. New hardware solutions are therefore needed to advance modern AI development.