I. Introduction
Conventional von Neumann (CVN) architecture continuously transfers data between memory banks and compute elements, incurring substantial energy and latency costs that can dominate system power and performance. As shown in Fig. 1, to perform a Boolean logic function on two words, a CVN architecture involves three major steps: 1) two memory read cycles; 2) data movement from the memory hierarchy to the computing element's registers; and 3) computation within the ALU. Both power consumption and latency are largely dominated by the two memory accesses and the data movement (Fig. 2) [1]–[3]. To minimize the power and latency, in-memory computing (IMC) has recently been proposed to process data directly inside an on-chip memory macro [4], [5]. As shown in Fig. 1, IMC activates multiple rows simultaneously and computes the logical functions directly on the bitline (BL). Computation results are available immediately at the end of a memory access, so only one cycle is required instead of multiple cycles of latency [Fig. 2(a)]. As a result, IMC can reduce the energy costs associated with both data movement and memory access, as shown in Fig. 2(b). Since memory banks are typically very wide (many words per word line), IMC also offers inherently parallel computation across all columns of the activated rows.
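The single-access bitline computation described above can be sketched in software. The model below is illustrative and not from the paper: it assumes a common 6T-SRAM IMC scheme in which activating two rows makes the bitline (BL) stay high only where both cells store 1 (a wired-AND), while the complementary bitline (BLB) stays high only where both cells store 0 (a NOR); the function and parameter names are hypothetical.

```python
def imc_read_two_rows(word_a: int, word_b: int, width: int = 8):
    """Model one IMC access with two word lines activated simultaneously.

    Returns the (AND, NOR) results as sensed on BL and BLB, respectively.
    In hardware both results arrive in a single memory cycle; a CVN machine
    would instead need two reads, data movement, and an ALU operation.
    """
    mask = (1 << width) - 1
    bl = word_a & word_b             # BL discharges unless both cells hold 1
    blb = ~(word_a | word_b) & mask  # BLB discharges unless both cells hold 0
    return bl, blb

if __name__ == "__main__":
    a, b = 0b1100_1010, 0b1010_0110
    and_res, nor_res = imc_read_two_rows(a, b)
    print(f"AND = {and_res:08b}")  # 10000010
    print(f"NOR = {nor_res:08b}")  # 00010001
```

Because every column has its own BL/BLB pair, a real macro evaluates this logic on all bit positions of the activated rows in parallel, which is the source of the width-wise parallelism noted above.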
Fig. 1. Data flow required for reading two words to perform a logic operation in a CVN architecture and an IMC system.
Fig. 2. Comparison between CVN and IMC in terms of (a) latency and (b) energy for a two-word Boolean operation.