I. Introduction
Many state-of-the-art machine learning systems [1]–[2] achieve strong performance on various recognition tasks, such as image classification, speech recognition, and object detection and tracking. However, these systems are limited by memory, both in the energy they require and in the performance they can achieve [3]. For example, deep neural networks (DNNs) deliver substantial accuracy improvements but consume large amounts of computational and memory resources [2]. Furthermore, the heavy data movement between memory and processing units incurs long latency and high power consumption [4]. Therefore, many researchers have studied how to reduce the data movement and computational power consumption of neural networks. Recently, an emerging architecture that performs computation inside the memory array, known as computing-in-memory (CIM), has been widely adopted for DNNs to reduce data movement across the memory hierarchy [3]–[6]. While most prior work has focused on designing inference engines, tasks that demand real-time adaptation to changing environments suggest that on-chip training could be extremely important.