I. Introduction
Resistive random access memory (ReRAM)-based processing-in-memory (PIM) architectures can perform in-situ computation within the memory devices, and they have demonstrated great potential in accelerating deep neural network (DNN) training and inference [1], [2]. However, the manufacturing technology of ReRAM devices is still at an early stage, and many challenges to its practical adoption remain [3]. Most existing works on ReRAM-based DNN accelerators overlook practical considerations and rely on idealized assumptions regarding the ReRAM devices and the associated analog-to-digital converter (ADC) overhead. They assume that it is possible to activate all the rows and columns of a crossbar array simultaneously within a single clock cycle without impacting computational accuracy [4], [5]. However, several challenges render this assumption impractical. The major problem is the conductance variation of the ReRAM devices. Since ReRAM crossbar arrays leverage Kirchhoff's current law to perform vector-matrix multiplication (VMM) operations, the conductance variation accumulated along the bitlines (BLs) is proportional to the number of concurrently activated wordlines (WLs) [6]. Activating too many WLs simultaneously also leads to a high BL current, which induces significant IR drop, i.e., voltage drop across the wire resistance, and causes nonuniform voltage and current distribution along the crossbar [7]. Therefore, to achieve high-accuracy computation, the number of WLs that can be activated simultaneously within a crossbar array should be limited. Another challenge is that, in practical ReRAM-based DNN accelerators, the number of ADCs per crossbar array must be restricted because ADCs consume a significant amount of power and area [5], [8]. As such, it is necessary to share one ADC among multiple BLs.
Given that an ADC can only convert the signal of one BL in a single clock cycle, the number of BLs that can be activated simultaneously should match the number of ADCs in each crossbar [3]. For a practical ReRAM-based DNN accelerator, the VMM on the crossbar arrays should therefore operate at a much finer granularity, termed an operation unit (OU), rather than at the subarray granularity [3], [9], [10]. Several recent studies have demonstrated that, for a practical ReRAM-based DNN accelerator to attain an acceptable level of inference accuracy, only nine WLs and eight BLs can be turned on concurrently [3], [11], [12].
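
To make the OU granularity concrete, the following is a minimal sketch of a VMM accumulated one OU at a time. It is an illustration under stated assumptions, not the design of any cited accelerator: the devices are treated as ideal (no conductance variation or IR drop), exactly one OU is activated per cycle, and the function names `ou_cycles` and `ou_vmm` are hypothetical. The 9 x 8 OU size follows the figures quoted above.

```python
import math


def ou_cycles(rows, cols, ou_rows=9, ou_cols=8):
    """Number of OU activations needed to cover a rows x cols crossbar.

    Assumes one OU (ou_rows WLs x ou_cols BLs) is activated per cycle.
    """
    return math.ceil(rows / ou_rows) * math.ceil(cols / ou_cols)


def ou_vmm(x, g, ou_rows=9, ou_cols=8):
    """Compute y = x . g by accumulating partial sums one OU at a time.

    x: list of input values, one per wordline (WL).
    g: 2-D list of cell values, g[r][c] at WL r and bitline (BL) c.
    Returns the output vector and the number of OU activations used.
    Ideal devices are assumed, so the result equals the full VMM.
    """
    rows, cols = len(g), len(g[0])
    y = [0] * cols
    cycles = 0
    for r0 in range(0, rows, ou_rows):        # WLs activated together
        r1 = min(r0 + ou_rows, rows)
        for c0 in range(0, cols, ou_cols):    # BLs sensed together
            c1 = min(c0 + ou_cols, cols)
            for c in range(c0, c1):
                # Partial BL sum over the active WLs only.
                y[c] += sum(x[r] * g[r][c] for r in range(r0, r1))
            cycles += 1
    return y, cycles
```

Under these assumptions, a hypothetical 128 x 128 crossbar (a size chosen here only for illustration) would need ceil(128/9) x ceil(128/8) = 15 x 16 = 240 OU activations per input vector instead of a single fully parallel activation, which illustrates the accuracy-versus-throughput trade-off that OU-based operation implies.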