# Stochastic Learning in Neuromorphic Hardware via Spike Timing Dependent Plasticity With RRAM Synapses

Giacomo Pedretti, *Student Member, IEEE*, Valerio Milo, *Student Member, IEEE*, Stefano Ambrogio, *Member, IEEE*, Roberto Carboni, *Student Member, IEEE*, Stefano Bianchi, Alessandro Calderoni, Nirmal Ramaswamy, *Senior Member, IEEE*, Alessandro S. Spinelli, *Senior Member, IEEE*, and Daniele Ielmini, *Senior Member, IEEE*

Abstract—Hardware processors for neuromorphic computing are gaining significant interest as they offer the possibility of real in-memory computing, thus bypassing the limitations of speed and energy consumption of the von Neumann architecture. One of the major limitations of current neuromorphic technology is the lack of bio-realistic and scalable devices to improve the current design of artificial synapses and neurons. To overcome these limitations, the emerging technology of resistive switching memory has attracted wide interest as a nanoscaled synaptic element. This paper describes the implementation of a perceptron-like neuromorphic hardware capable of spike-timing dependent plasticity (STDP), and its operation under stochastic learning conditions. The learning algorithm of a single or multiple patterns, consisting of either static or dynamic visual input data, is described. The impact of noise is studied with respect to learning efficiency (false fire, true fire) and learning time. Finally, the impact of the stochastic learning rule, such as the inversion of the time dependence of potentiation and depression in STDP, is considered. Overall, the work provides a proof of concept for unsupervised learning by STDP in memristive networks, providing insight into the dynamics of stochastic learning and supporting the understanding and design of neuromorphic networks with emerging memory devices.

*Index Terms*—Resistive switching memory (RRAM), artificial synapse, neuromorphic network, memristive device, pattern learning.

# I. INTRODUCTION

NEUROMORPHIC circuits are gaining momentum as a new computing platform for brain-inspired learning and inference with widespread applications including robotics,

Manuscript received June 14, 2017; revised October 6, 2017; accepted November 8, 2017. Date of publication November 13, 2017; date of current version April 3, 2018. This work was supported by the European Research Council under Grant ERC-2014-CoG-648635-RESCUE. This paper was recommended by Guest Editor T. Karnik. (Corresponding author: Daniele Ielmini.)

- G. Pedretti, V. Milo, R. Carboni, S. Bianchi, A. S. Spinelli, and D. Ielmini are with the Dipartimento di Elettronica, Informazione e Bioingegneria and Italian Universities Nanoelectronics Team, Politecnico di Milano, 20133 Milano, Italy (e-mail: daniele.ielmini@polimi.it).
- S. Ambrogio was with the Dipartimento di Elettronica, Informazione e Bioingegneria and Italian Universities Nanoelectronics Team, Politecnico di Milano, 20133 Milano, Italy. He is now with IBM Research-Almaden, San Jose, CA 95120 USA.
- A. Calderoni and N. Ramaswamy are with Micron Technology Inc., Boise, ID 83707 USA.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JETCAS.2017.2773124

automotive, Internet of things (IoT) and big data. The main characteristic feature of neuromorphic hardware is the non-von Neumann architecture, where memory and processing units are not physically separated, according to the in-memory computing paradigm [1]. A neuromorphic processor consists of a network of neurons connected by synapses, the latter storing information about the strength (weight) of the connections within the network. Information is generally carried by spikes: neurons integrate incoming input spikes and generate an output spike as the internal potential exceeds a certain firing threshold [2]. This event-driven computing scheme is the basis of the high energy efficiency of neuromorphic hardware, as energy is consumed only when and where information is processed [3]. This makes neuromorphic hardware highly competitive for complex tasks, such as real-time processing of visual data [4], [5], pattern detection [6], [7], probabilistic inference [8], and constraint satisfaction problems [9].

One of the key limitations of neuromorphic hardware is the typical area of neuron and synaptic circuits, which limits the maximum number of neurons within the chip. For instance, IBM's TrueNorth contains 1 million neurons and 256 million synapses within an area of 4.3 cm<sup>2</sup> [3]. Considering that this is fabricated in 28-nm CMOS process technology, it is hard to upscale the neuron and synapse density to a level comparable to the human brain (about $10^{11}$ neurons and about $10^{14}$ synapses), unless a new technology with better scalability and higher functionality is adopted. Recently, emerging nonvolatile memory devices including resistive switching memory (RRAM) and phase change memory (PCM) have been considered as potential neuron and synapse technology in neuromorphic circuits [1]. A distinctive advantage of RRAM and PCM is that they can store multivalued resistance levels within an extremely small area, in the range of just a few F<sup>2</sup>, where F is the minimum lithographic feature size. Also, RRAM and PCM can natively feature plasticity rules, which are essential to enable on-line learning in neuromorphic hardware. For instance, spike-timing dependent plasticity (STDP), one of the weight update processes in biological neural networks, was demonstrated in both PCM [10], [11] and RRAM [12]–[15] by overlapping pulses at the 2 terminals of the memory devices. Both individual memory devices [10], [12] and hybrid combinations of RRAM and CMOS transistors, such as one-transistor/one-resistor (1T1R) structures [13], [14] and 2-transistor/one-



Fig. 1. (a) Structure of the STDP synapse, (b) 3D-color map of STDP characteristics, namely the measured change of conductance in logarithmic scale  $\eta$  as a function of  $R_0$  and  $\Delta t$ , and (c) measured and calculated STDP curves as a function of  $\Delta t$  for three different initial states  $R_0$  corresponding to LRS (blue), an intermediate resistance state (orange) and HRS (red).

resistor (2T1R) structures [15], were used to implement plastic synapses.

While STDP has been extensively demonstrated for individual synapses, learning and recognition at the level of a spiking neural network have been experimentally explored in only a few cases and with a limited number of hardware synapses [16]–[18]. Most importantly, previous works did not explore the details of stochastic learning, such as the impact of noise spikes on learning efficiency and speed. This work addresses stochastic learning in a neuromorphic network with RRAM synapses capable of STDP. The network consists of a fully-connected feed-forward perceptron with up to 16 pre-synaptic neurons (PREs) in the first layer and up to 2 post-synaptic neurons (POSTs) in the second layer. Visual pattern learning is demonstrated by presenting input pattern spikes alternated with stochastic noise spikes to the PRE layer. The integration and fire activity in the POST and the resulting STDP in the synaptic network lead to on-line learning of the pattern. The impact of random noise and of the shape of the STDP characteristic on learning is comprehensively analyzed by simulations and experiments. Although our results have been obtained for a relatively small scale prototype, a larger scale spiking neural network might take advantage of the same learning schemes. Brain-inspired circuits generally consist of reconfigurable networks of spiking neurons and plastic synapses, where STDP is a key learning scheme [19]. For instance, STDP is the enabling mechanism for attractor formation in recurrent networks, which are the basis for associative memories and context-dependent decision making [20]. Further up-scaling of these neuromorphic circuits and processors might require nanoscale devices such as RRAM or PCM elements [21]. Thus, controlling and optimizing STDP-based unsupervised learning in hybrid synapses are essential steps toward the development of large-scale spiking neural networks capable of plasticity.

# II. SYNAPSE CIRCUIT AND STDP CHARACTERISTICS

Fig. 1(a) shows the structure of the 1T1R synapse adopted in this work. In the 1T1R synapse, the 1R consists of a bipolar switching RRAM device with a  $HfO_x$  dielectric layer, a Ti top electrode (TE) and a TiN bottom electrode (BE) [22]. The application of a positive TE voltage  $V_{TE}$  to the RRAM caused set transition from the high resistance state (HRS) to the low

resistance state (LRS), while a negative  $V_{TE}$  caused reset transition from LRS to HRS. An integrated transistor (1T) allowed for the selection of the RRAM element by the gate voltage  $V_G$ , and for the limitation of the maximum compliance current I<sub>C</sub> during the set transition to control the resistance of the LRS. Typically, I<sub>C</sub> was kept in the range between 10  $\mu$ A and 50  $\mu$ A to minimize power consumption during STDP. The PRE spike drives  $V_G$ , while the resulting synaptic current flows into the POST through the BE of the RRAM. The POST also applies a  $V_{TE}$  bias to induce the synaptic current in correspondence of the PRE spike. As the integrated spikes at the POST reach the fire threshold, a POST spike is generated, and a feedback spike with a positive pulse followed by a negative pulse is applied to  $V_{TE}$  [14]. As a result, if the POST spike follows the PRE spike, or equivalently the PRE/POST spike delay  $\Delta t$ is positive, the overlap between the  $V_G$  pulse and the positive part of the  $V_{TE}$  pulse will cause set transition, or synaptic potentiation. On the other hand, if the POST spike precedes the PRE spike, or equivalently  $\Delta t$  is negative, the overlap between the  $V_G$  pulse and the negative part of the  $V_{TE}$  pulse will cause reset transition, or synaptic depression. Fig. 1(b) shows the measured change of conductance  $\eta = R_0/R$ , where R<sub>0</sub> and R are the synapse resistance values before and after the overlap event, respectively. The figure shows  $\eta$  as a function of  $\Delta t$  and  $R_0$ , evidencing resistance-dependent potentiation and depression for positive and negative  $\Delta t$ , respectively [16], [23]. This result is also supported by measured and calculated STDP curves as a function of  $\Delta t$  for three different initial resistance R<sub>0</sub> shown in Fig. 1(c) [14]. 
Note that no overlap takes place for $|\Delta t| > 10$ ms, corresponding to the pulse-width of the PRE spike; therefore, the synaptic conductance does not change in this delay time range. Also, note that the change of conductance depends on R<sub>0</sub>: for instance, only potentiation can take place for synapses in the HRS ($R_0 \approx 1 \text{ M}\Omega$), while only depression can take place for synapses in the LRS ($R_0 \approx 25 \text{ k}\Omega$). This is due to the adoption of a 1T1R architecture, which forces the synapse resistance always to fall within a fixed window between a minimum value of LRS, dictated by I<sub>C</sub>, and a maximum value of HRS, which depends on the RRAM device [14].
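The sign and window of the STDP rule described above can be summarized in a minimal Python sketch. It assumes, for illustration only, that any set or reset event switches the device fully to the LRS or HRS, whereas the measured characteristic in Fig. 1 is gradual and depends on R<sub>0</sub>. The function names and the binary-switching simplification are not taken from the paper; the resistance values and the 10 ms window are the approximate figures quoted in the text.

```python
LRS_OHM = 25e3     # approximate LRS resistance quoted in the text
HRS_OHM = 1e6      # approximate HRS resistance quoted in the text
WINDOW_S = 10e-3   # PRE pulse width: no overlap beyond this delay


def stdp_update(r0_ohm, delta_t_s):
    """Return the resistance after one PRE/POST spike pair.

    Simplifying assumption: a set or reset event switches the device fully
    to LRS or HRS; the measured change is gradual and R0-dependent.
    """
    if 0 < delta_t_s < WINDOW_S:
        return LRS_OHM    # POST follows PRE: set transition (potentiation)
    if -WINDOW_S < delta_t_s < 0:
        return HRS_OHM    # POST precedes PRE: reset transition (depression)
    return r0_ohm         # no pulse overlap: conductance unchanged


def eta(r0_ohm, delta_t_s):
    """Conductance change eta = R0 / R, as defined in the text."""
    return r0_ohm / stdp_update(r0_ohm, delta_t_s)
```

In this simplified picture a synapse already in the LRS cannot be potentiated further (`eta` stays at 1 for positive delays), and a synapse in the HRS cannot be depressed further, which mirrors the fixed resistance window of the 1T1R structure.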

# III. NEUROMORPHIC NETWORK

We developed a perceptron-like neuromorphic network where neurons were connected by 1T1R synapses described



Fig. 2. (a) Schematic illustration of the perceptron-like neuromorphic network with one POST, (b) electrical implementation of the circuit, and (c) picture of the circuit implemented on a PCB.

in Sec. II. Fig. 2(a) schematically illustrates the neuromorphic network, consisting of a first layer of N PREs fully connected to one POST by N synapses. Fig. 2(b) shows the schematic circuit that was used to realize the neuromorphic network, with N = 16 neurons and synapses. An Arduino Due microcontroller ( $\mu$ C) was used to send the input spikes of voltage  $V_G$  to the gate of 16 synapses via the PRE switches, and to provide the common  $V_{TE}$  bias to the TE of the synapses. All synaptic currents were collected and sent to a trans-impedance amplifier (TIA), for conversion to a voltage and submission to the  $\mu$ C for numerical integration. At fire, the  $\mu$ C switched the voltage applied to the synaptic TEs by controlling the multiplexer (MUX), to send the feedback spike causing potentiation/depression. During fire, the integration of the synaptic currents was disabled by the fire switch. Fig. 2(c) shows a picture of the printed circuit board (PCB) used to implement the neuromorphic circuit. STDP in the circuit resulted from the overlap between the PRE spikes applied to the transistor gates and the feedback spikes delivered to the RRAM TEs.

## A. On-Line Learning of a Single Pattern

To demonstrate learning of a single pattern in the neuromorphic hardware of Fig. 2, we adopted a stochastic algorithm where the visual pattern was alternated with random noise [8], [11], [14]. All input data were submitted at time steps (epochs) every 10 ms, corresponding to the PRE pulse-width. At each epoch, either the full pattern or a random noise with average relative pixel density N = 5% was submitted to the synaptic network. Each PRE which produced a spike at a given epoch was inhibited for the following epoch, as if it was affected by a refractory time. Fig. 3(a) shows an example of the presented input spikes, where red and blue dots correspond to the pattern and noise input, respectively. In the specific case, the pattern initially consisted of the 4 pixels in the top row of the  $4 \times 4$ matrix, which was submitted for 500 epochs. Fig. 3(b) shows the evolution of the synaptic weights as a function of the epochs, while Fig. 3(c) shows the submitted patterns. The synaptic weights were initially prepared in a random state of resistance continuously and uniformly distributed between the LRS and HRS, as shown in Fig. 3(d). After 500 epochs,



Fig. 3. (a) Input spikes as a function of the epochs, showing input pattern as red dots and input noise as blue dots, (b) corresponding evolution of the synaptic weights, showing pattern synapses as red lines and background synapses as blue lines, (c) submitted patterns and (d) corresponding synaptic weights, showing (from left to right) the weights at epoch 1, epoch 500, epoch 1000, epoch 1500, and epoch 2000.

the synaptic weights corresponding to the submitted pattern were all potentiated (see the red lines in Fig. 3(b)), whereas all other synaptic weights, which will be referred to as the background synapses, were all depressed (see the blue lines in Fig. 3(b)). The different change of conductance in the pattern and background can be explained as follows: as the full pattern is submitted, there is a high probability of fire, which induces a feedback spike following the pattern PRE spike. This corresponds to a positive delay at pattern synapses, hence potentiation according to the STDP characteristic in Fig. 1(b). The refractory time (i.e., any neuron spiking at epoch i cannot also spike at epoch i+1) prevents the presentation of the pattern in 2 consecutive epochs, thus the presentation of a pattern is always followed by an input noise occurring at background synapses. The sequence pattern-fire-noise corresponds to a negative spike delay at background synapses, and therefore leads to their depression. As a result, pattern and background synapses undergo potentiation and depression, respectively. Note that, in the case of 2 subsequent



Fig. 4. (a) Schematic illustration of the perceptron-like neuromorphic network with 2 POSTs, (b) pattern presented in the stochastic training during the first phase (epochs 1-1000) and the second phase (epochs 1001-2000), and (c) measured weights of synapses connected to either POST1 or POST2 at the end of the first phase (epoch 1000) and at the end of the second phase (epoch 2000).

noise presentations, the same channels cannot be active in both epochs due to the refractory time. Therefore, noise spikes can appear at pattern synapses, which gives a certain probability of random, noise-induced depression of individual pattern synapses.

The capability of background depression makes it possible to overwrite any pattern in the synaptic array, which is useful for on-line learning of dynamic (instead of static) visual patterns. To demonstrate on-line learning, the submitted pattern was changed at epoch 501 by a one-position shift along the anticlockwise direction as shown in Fig. 3(c). As a result, the synaptic weights were updated to match the new pattern at epoch 1000, as shown in Fig. 3(d). Similarly, the pattern was changed again at epochs 1001 and 1501 by one-step shifts along the anticlockwise direction, again resulting in on-line learning of the newly presented pattern after 500 epochs. Note that the adjustment of weights in response to incoming spikes can be suitably disabled by avoiding the feedback spike in Fig. 1(a) and the consequent STDP process. In this 'recognition' mode, the presentation of a new pattern does not affect the stored weights in the network.
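The stochastic training scheme of this subsection can be sketched as a toy simulation in Python. This is not the authors' simulator, which relies on an analytical RRAM model; it is a minimal illustration under several assumptions that are not in the text: binary switching between normalized HRS and LRS conductances, a POST potential that accumulates without leakage, an arbitrary firing threshold, a 50% pattern probability, and integration disabled during the epoch that follows a fire. All names (`train`, `W_MIN`, `W_MAX`, `p_pattern`, `threshold`) are hypothetical.

```python
import random

W_MIN, W_MAX = 0.025, 1.0   # HRS and LRS conductance, normalized to the LRS


def train(pattern, n_pre=16, epochs=500, noise_density=0.05,
          p_pattern=0.5, threshold=2.5, seed=0):
    """Return the synaptic weights after stochastic STDP training (toy model)."""
    rng = random.Random(seed)
    weights = [rng.uniform(W_MIN, W_MAX) for _ in range(n_pre)]
    last_spikes = set()        # PREs that spiked in the previous epoch
    last_was_pattern = False
    fired_prev = False         # POST fired at the end of the previous epoch
    potential = 0.0

    for _ in range(epochs):
        # Pattern and noise alternate at random; a pattern is never shown
        # twice in a row because its PREs would be refractory.
        show_pattern = (not last_was_pattern) and rng.random() < p_pattern
        if show_pattern:
            candidates = set(pattern)
        else:
            candidates = {i for i in range(n_pre)
                          if rng.random() < noise_density}
        spikes = candidates - last_spikes   # refractory PREs stay silent

        if fired_prev:
            # PRE spike after the POST spike (negative delay): depression.
            for i in spikes:
                weights[i] = W_MIN
            fired_prev = False   # assumed: no integration during feedback
        else:
            potential += sum(weights[i] for i in spikes)
            if potential >= threshold:
                # POST spike after the PRE spike (positive delay): potentiation.
                for i in spikes:
                    weights[i] = W_MAX
                potential = 0.0
                fired_prev = True

        last_spikes = spikes
        last_was_pattern = show_pattern
    return weights
```

For example, `train([0, 1, 2, 3])` trains on the top row of a 4 × 4 array indexed row by row. In this toy model the pattern-fire-noise sequence depresses background synapses, while a noise-fire-pattern sequence depresses pattern synapses, so individual runs can show the same kind of instability discussed in the text.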

## B. On-Line Learning of Multiple Patterns

A similar learning was demonstrated for the case of multiple patterns by including 2 POSTs as shown in Fig. 4(a). Here, both POST1 and POST2 were fully connected to the first layer by plastic excitatory synapses, as well as being mutually connected by inhibitory synapses, where the internal potential of a POST was inhibited, i.e., reduced by a given percentage, in correspondence of fire in the other POST. Inhibitory synapses allow for winner-take-all (WTA) dynamics, where the 2 POSTs specialize to either of the 2 presented patterns, thus maximizing the information storage efficiency within the

neuromorphic network [18], [24]. To test on-line learning of multiple patterns, 2 patterns and random noise were stochastically submitted with the same probabilities to the first layer of the network. Fig. 4(b) shows the 2 presented patterns in a $3 \times 3$ PRE array, corresponding to the top and bottom rows during the first 1000 epochs, followed by the left and right columns during the last 1000 epochs, within a total sequence of 2000 epochs. Fig. 4(c) shows the final synaptic weights measured at epoch 1000 and epoch 2000 for the synapses connecting to POST1 and POST2.

Fig. 5 shows the submitted patterns (a) and the resulting synaptic weights as a function of the epoch for POST1 (b) and POST2 (c). The synaptic weights were initially prepared in random states. Note that synaptic weights are readily potentiated and depressed at epoch 1000, as the submitted pattern is changed, thus confirming the ability for on-line learning.
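The lateral inhibition behind the WTA dynamics can be illustrated with a short Python sketch. It assumes that at most one POST fires per epoch (the one with the largest internal potential, if it reaches the threshold) and that the potential of every other POST is then reduced by a fixed fraction. The function name, the threshold and the inhibition fraction are hypothetical; the text only states that the potential is reduced by a given percentage.

```python
def wta_step(potentials, currents, threshold=1.0, inhibition=0.5):
    """Integrate one epoch of synaptic current and apply lateral inhibition.

    Returns (new_potentials, fired), where fired[k] is True for the POST
    that fired in this epoch (at most one POST fires in this sketch).
    """
    new = [v + i for v, i in zip(potentials, currents)]
    fired = [False] * len(new)
    winner = max(range(len(new)), key=lambda k: new[k])
    if new[winner] >= threshold:
        fired[winner] = True
        for k in range(len(new)):
            if k == winner:
                new[k] = 0.0                  # the firing POST is reset
            else:
                new[k] *= (1.0 - inhibition)  # the other POST is inhibited
    return new, fired
```

Because the losing POST is pushed away from its threshold whenever the winner fires, it is less likely to fire on, and therefore to learn, the same pattern, which is what lets the 2 POSTs specialize to different patterns.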

# IV. IMPACT OF NOISE ON LEARNING

Noise is essential for on-line learning, as it induces depression on background synapses, thus making it possible to remove, or 'forget', the previously learnt pattern when a new one is submitted [11], [18]. Depression in fact relies on the pattern-fire-noise sequence to realize the condition of negative delay at randomly selected background synapses, which are thus depressed. On the other hand, excessive noise can induce an unwanted noise-fire-pattern sequence, where a randomly submitted noise induces a fire followed by the presentation of the pattern, thus realizing a negative delay condition at pattern synapses, which are then depressed. There is thus a tradeoff in the random noise density N, which must be carefully selected to result in stable and fast learning.

## A. Impact on Learning Efficiency

To study the impact of noise on learning and to evaluate the optimized N for stable learning, we conducted experiments and simulations of a neuromorphic network of  $4 \times 4$  PREs and one POST, i.e., similar to the one shown in Fig. 2(a) and characterized in Fig. 3. Simulations were conducted with our analytical model for RRAM, based on voltage-controlled filament growth and retraction [25]. For the best accuracy of the simulations, the statistical distribution of resistance in LRS and HRS was also considered, to realistically describe the random variations in the number of defects and in the shape of the conductive filament [26]. In the simulations, we assumed the same sequence of pattern and noise that was experimentally submitted, and the same initial distribution of synaptic weights.

Fig. 6 shows the submitted spikes (top), the measured synaptic weights for pattern and background synapses (center), and the corresponding calculated weights (bottom), for three values of noise density, namely N=5% (a), N=10% (b), and N=15% (c). The noise density is defined as the average number of activated pixels during noise submission, divided by the total number of pixels, i.e., 16 in the present case. Experimental results show that learning is stable for N=5%, and becomes increasingly unstable as N is increased to 10% and 15%.



Fig. 5. (a) Input data presented to the perceptron-like neuromorphic network with 2 POSTs shown in Fig. 4 and corresponding evolution of weights connected to (b) POST1 and (c) POST2 as a function of epochs. Both show potentiation of pattern synapses and depression of background synapses.



Fig. 6. (top) Input spikes, (center) measured synaptic weights for pattern and background synapses, and (bottom) corresponding calculated weights, for three values of noise density, namely (a) N = 5%, (b) N = 10%, and (c) N = 15%. Learning becomes increasingly unstable as N increases from (a) to (c).

This occurs because the increasing noise causes noise-fire-pattern sequences, thus leading to the sudden depression of the pattern. Simulations account well for the observed phenomena, confirming the presence of a tradeoff between learning speed and stability in the stochastic learning algorithm.

Fig. 7 shows the learning efficiency $P_{learn}$ (a) and the error probability $P_{err}$ (b), as a function of N. To measure $P_{learn}$ and $P_{err}$, we trained the synaptic network with a 4-pixel pattern, i.e., the diagonal of the $4 \times 4$ PRE array, for 1000 epochs and collected the synaptic weights over the whole duration of the training phase. Then, we made an off-line evaluation of the response of the network to the application of a reference pattern, for any epoch during the training phase. The learning efficiency $P_{learn}$, or 'true fire' probability, is defined as the probability of fire in response to a reference pattern equal to the one used during training. The error probability $P_{err}$, or 'false fire' probability, is instead defined as the probability of fire in response to all reference patterns having the same pixel density as the submitted pattern, i.e., a density P = 25% in this case. Simulations were carried out following the same procedure for recognition after training. From the experimental results, $P_{learn}$ decreases at increasing N as a result of the unstable learning evidenced in Fig. 6. On the other hand, $P_{err}$ first decreases at increasing N, reaching a minimum

value around $N_{opt} = 3\%$, which is due to the beneficial effect of noise to induce depression. In fact, if no input noise is presented (N = 0%), background synapses are not depressed, thus causing possible false fires. As N increases, depression of background synapses improves, thus suppressing the probability of false fire. However, increasing N above $N_{opt}$ results in unstable learning as shown in Fig. 6, thus inducing an increase of the probability of pattern depression and background potentiation, which consequently raises $P_{err}$. These results indicate that noise is beneficial for the efficiency of on-line learning, provided that N does not exceed $N_{opt}$. Although this result has been obtained with a relatively small scale network, the same conclusion can be drawn for larger networks, although the exact value of $N_{opt}$ may change with the network size.
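The off-line evaluation of $P_{learn}$ and $P_{err}$ can be sketched in Python as follows. The sketch assumes that the POST fires whenever the summed weight of the active synapses reaches a threshold, and that the reference patterns counted for $P_{err}$ are all patterns with the same number of active pixels except the trained one; both choices, as well as the function names and the threshold, are assumptions made for illustration.

```python
from itertools import combinations


def fires(weights, pixels, threshold):
    """True if presenting `pixels` drives the POST to the firing threshold."""
    return sum(weights[i] for i in pixels) >= threshold


def learning_metrics(weight_history, pattern, threshold):
    """Return (P_learn, P_err) averaged over all stored weight snapshots."""
    pattern = frozenset(pattern)
    n_pre = len(weight_history[0])
    # All reference patterns with the same pixel density, trained one excluded.
    others = [c for c in combinations(range(n_pre), len(pattern))
              if frozenset(c) != pattern]
    true_fires = 0
    false_fires = 0
    for weights in weight_history:
        true_fires += fires(weights, pattern, threshold)
        false_fires += sum(fires(weights, c, threshold) for c in others)
    n_snapshots = len(weight_history)
    return (true_fires / n_snapshots,
            false_fires / (n_snapshots * len(others)))
```

Here `weight_history` is a list of weight vectors, one per training epoch. For a 4-pixel pattern in a 16-pixel array, `others` contains the remaining 1819 of the 1820 possible 4-pixel patterns.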

## B. Impact on Learning Time

Noise also affects the learning time $\tau_{learn}$ to converge to the desired synaptic weights. This is illustrated in Fig. 8, showing the average synaptic weights for the pattern and the background during training as a function of the epoch, for increasing N = 1% (a), 2% (b) and 5% (c). While potentiation is almost instantaneous, nearly realizing one-shot learning





Fig. 7. (a) Measured and calculated learning efficiency  $P_{learn}$  and (b) error probability  $P_{err}$ , namely false fire probability, as a function of N.

in most cases, depression takes place within a larger number of epochs, due to the uncorrelated nature of noise and the relatively small N. Therefore, $\tau_{learn}$ can be approximated as the time to depress background synapses to a certain level, which was quantified as 66 k$\Omega$ in our experiments (see dashed horizontal line in Fig. 8). As N increases, $\tau_{learn}$ decreases as background depression becomes faster. To a first approximation, $\tau_{learn}$ is inversely proportional to the probability of background depression, which is proportional to the noise pixel density N, thus $\tau_{learn} \propto N^{-1}$. Fig. 9 shows the measured and calculated $\tau_{learn}$ as a function of N, indicating a substantial decrease according to the hyperbolic behavior $\tau_{learn} \propto N^{-1}$. In considering these results, one should recall that $P_{err}$ in Fig. 7(b) increases substantially above N = 5%, thus resulting in a tradeoff between learning speed and efficiency. Simulation results also agree very well with the experiments, suggesting a strong reproducibility and predictability of STDP-based learning in our neuromorphic hardware.
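The $N^{-1}$ scaling can be motivated by a simple waiting-time argument. The sketch below assumes that a single pattern-fire-noise event is enough to depress a given background synapse, and that the per-epoch probability $p_{seq}$ that a fire is followed by a noise presentation does not depend on N. The symbols $p_{seq}$, $p_{dep}$, $n_{learn}$ and $T_{epoch}$ are introduced here for illustration only, with $T_{epoch} = 10$ ms being the epoch duration.

$$
p_{dep} \approx p_{seq}\,N, \qquad
\langle n_{learn} \rangle = \sum_{k=1}^{\infty} k\,p_{dep}\,(1-p_{dep})^{k-1} = \frac{1}{p_{dep}}, \qquad
\tau_{learn} \approx T_{epoch}\,\langle n_{learn} \rangle = \frac{T_{epoch}}{p_{seq}\,N} \propto N^{-1}
$$

Here $p_{dep}$ is the per-epoch probability that the synapse is depressed and $\langle n_{learn} \rangle$ is the mean number of epochs before the first depression event, which follows a geometric distribution. Requiring all background synapses to be depressed changes the prefactor but, under the same assumptions, not the dependence on N.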

## C. Effects of Noise-Pattern Mixing

The stochastic training adopted so far consists of separate submission of pattern and noise, where pattern is not corrupted by noise at any epoch. A more realistic case, however, is the coexistence of noise and pattern, thus resulting in a noise-pattern mixing, or noise-induced pattern corruption. To test more realistic training with noise/pattern intermixing,



Fig. 8. Average synaptic weights for pattern and background synapses during training as a function of the epoch, for increasing (a) N=1%, (b) 2% and (c) 5%.



Fig. 9. Measured and calculated  $\tau_{learn}$  as a function of N. The learning time decreases at increasing N, thanks to the enhanced probability of background depression.

we stochastically submitted the pattern and the noise with uniform probability of 50%. As a result, we obtained 4 possible cases, namely (i) void (no pattern and no noise), (ii) pattern only, (iii) noise only, and (iv) both pattern and noise. In the latter case, pattern was corrupted by noise, which might either turn a background pixel on, or turn a pattern pixel off, depending on the position of the noise pixels. These input data are illustrated in Fig. 10, showing the uncorrupted pattern, namely a  $4\times 4$  diagonal (a), a typical input noise with N=5% (b), and a corrupted pattern (c). After training for 1000 epochs, the average synaptic weights calculated over the whole training sequence were found to adapt to the input pattern as shown in Fig. 10(d) (experimental results) and Fig. 10(e) (simulation results), despite the corruption of the pattern by



Fig. 10. (a) Uncorrupted pattern, (b) typical input noise with N=5%, (c) corrupted pattern, (d) average of measured synaptic weights, (e) average of calculated synaptic weights, (f) submitted input spikes, and (g) evolution of the measured mean synaptic weights for the pattern and the background as a function of epochs.

the additive noise. Fig. 10(f) shows the submitted input spikes, including equal amounts of pattern and noise with pixel density N. Fig. 10(g) shows the measured evolution of the synaptic weights for the pattern and the background as a function of epochs, indicating a stochastic behavior during training. This can be understood by the insertion of noise in the submitted pattern, which results in a random potentiation of background synapses at the submission of a corrupted pattern. At the same time, pattern synapses remain substantially constant after potentiation, thanks to the refractory period which prevents the occurrence of a pattern-fire-pattern sequence, which would cause pattern depression. Despite the stochastic potentiation of random background synapses, their average weight remains significantly lower than that of the pattern synapses, by about a factor of 2, thus allowing for discrimination and recognition of the pattern.
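The four input cases of the mixed scheme can be generated with a few lines of Python. The sketch assumes that pattern and noise are drawn independently with 50% probability each, and that a noise pixel toggles the corresponding input pixel, so that it turns a background pixel on or a pattern pixel off. The function name and arguments are hypothetical.

```python
import random


def mixed_input(pattern, n_pre=16, noise_density=0.05, p_on=0.5, rng=random):
    """Draw one epoch of input: void, pattern only, noise only, or both.

    Returns (active_pixels, show_pattern, show_noise).
    """
    show_pattern = rng.random() < p_on
    show_noise = rng.random() < p_on
    active = set(pattern) if show_pattern else set()
    if show_noise:
        noise = {i for i in range(n_pre) if rng.random() < noise_density}
        # Symmetric difference: noise switches background pixels on
        # and pattern pixels off (pattern corruption).
        active ^= noise
    return active, show_pattern, show_noise
```

Passing a seeded `random.Random` instance as `rng` makes a sequence of epochs reproducible, which is convenient when the same input sequence has to be replayed in an experiment and in a simulation.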

# V. ANTI-STDP LEARNING

The STDP characteristics can be mapped by the shape of the PRE and feedback spikes that overlap across the 1T1R synapse [11], [13], [14]. As a result, the STDP characteristics can be easily changed or even reconfigured by changing the shape of the feedback spike [23], [27]. To explore the impact of the STDP characteristic shape on learning, we replaced the STDP feedback spike (Fig. 11(a)) with the anti-STDP feedback spike (Fig. 11(b)), where the positive and negative pulses are exchanged, thus resulting in potentiation for  $\Delta t < 0$  and depression for  $\Delta t > 0$ . In our neuromorphic hardware, reconfiguration of the feedback spike is as easy as the change of  $V_{TE+}$  and  $V_{TE-}$  connected to the MUX in Fig. 2(b). To achieve anti-STDP behavior, we simply exchanged the position of  $V_{TE+}$  and  $V_{TE-}$  to invert the shape of the feedback spike, as shown in Fig. 11(b).

Fig. 11(c) shows the measured synaptic weights during an experiment with anti-STDP, indicating random switching of pattern and background synapses with no net learning. The figure also shows the measured synaptic weights at epoch 1 (Fig. 11(d)) and epoch 1000 (Fig. 11(e)), and their average during the 1000 epochs of the training phase (Fig. 11(f)). Even the latter average weights indicate a substantial absence



Fig. 11. Schematic illustration of the feedback spike waveform  $V_{TE}$  for (a) STDP and (b) anti-STDP. (c) Evolution of measured synaptic weights as a function of epoch, (d) measured weight maps at epoch 1 and (e) epoch 1000, and (f) average weight map obtained during the 1000 epochs of the training phase.

of learning, which can be explained based on the anti-STDP shape in Fig. 11(b): as soon as pattern synapses are randomly potentiated, their enhanced weights increase the probability of fire in response to the presentation of a pattern as input data. Any resulting sequence pattern-fire, however, would lead to pattern depression, instead of potentiation, since a positive delay  $\Delta t > 0$  causes depression according to the anti-STDP learning rule. As a result, the pattern weights would be readily depressed after potentiation. The same happens for any other random group of synapses that are potentiated as a result of the submission of random noise. Anti-STDP thus results in stochastic oscillatory behavior of synaptic weights.
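The effect of exchanging the two feedback pulses can be made explicit with a small Python sketch. It models the feedback spike as an ordered pair of pulses and assumes that a PRE spike preceding the POST spike overlaps the first pulse, while a PRE spike following it overlaps the second one. The labels reuse the $V_{TE+}$ and $V_{TE-}$ names from the text; the function and constant names are hypothetical.

```python
STDP_SPIKE = ("V_TE+", "V_TE-")        # positive pulse first, then negative
ANTI_STDP_SPIKE = ("V_TE-", "V_TE+")   # the two MUX inputs exchanged


def plasticity(delta_t_s, feedback=STDP_SPIKE, window_s=10e-3):
    """Return the weight change caused by one PRE/POST spike pair."""
    if delta_t_s == 0 or abs(delta_t_s) >= window_s:
        return "none"                   # no overlap with the feedback spike
    # Positive delay: the PRE spike overlaps the first feedback pulse.
    overlapped = feedback[0] if delta_t_s > 0 else feedback[1]
    # A positive TE voltage causes the set transition, i.e., potentiation.
    return "potentiation" if overlapped == "V_TE+" else "depression"
```

With `ANTI_STDP_SPIKE`, a pattern-fire sequence (positive delay) returns `"depression"`, which is the mechanism that undoes any potentiation of the pattern synapses in the anti-STDP experiment.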

# VI. CONCLUSIONS

We presented a neuromorphic hardware with RRAM synapses which is capable of unsupervised learning by STDP. The system is implemented on a PCB connecting an Arduino Due  $\mu$ C and discrete RRAM elements with 1T1R integrated structure. Thanks to a stochastic learning scheme with a randomly alternated presentation of pattern and noise, the system is capable of on-line learning for both static and dynamic patterns. Noise is shown to improve learning time and decrease the number of false fires, although exceeding a certain noise density causes unstable learning. The impact of learning algorithms, i.e., mixing noise and patterns, or exchanging the time-dependence of potentiation and depression in an anti-STDP fashion, is finally discussed. Despite the small scale of our network, these results provide helpful guidelines to optimize unsupervised learning conditions, in view of larger scale integrated circuits for neuromorphic computation.

#### REFERENCES

- G. Indiveri and S.-C. Liu, "Memory and information processing in neuromorphic systems," *Proc. IEEE*, vol. 103, no. 8, pp. 1379–1397, Aug. 2015.
- [2] E. Chicca, F. Stefanini, C. Bartolozzi, and G. Indiveri, "Neuromorphic electronic circuits for building autonomous cognitive systems," *Proc. IEEE*, vol. 102, no. 9, pp. 1367–1388, Sep. 2014.
- [3] P. A. Merolla *et al.*, "A million spiking-neuron integrated circuit with a scalable communication network and interface," *Science*, vol. 345, no. 6197, pp. 668–673, Aug. 2014.
- [4] R. Serrano-Gotarredona et al., "CAVIAR: A 45 k neuron, 5 M synapse, 12 G connects/s AER hardware sensory–processing–learning–actuating system for high-speed visual object recognition and tracking," *IEEE Trans. Neural Netw.*, vol. 20, no. 9, pp. 1417–1438, Sep. 2009.
- [5] M. Giulioni, F. Corradi, V. Dante, and P. del Giudice, "Real time unsupervised learning of visual stimuli in neuromorphic VLSI systems," *Sci. Rep.*, vol. 5, p. 14730, Oct. 2015.
- [6] T. Masquelier, R. Guyonneau, and S. Thorpe, "Spike timing dependent plasticity finds the start of repeating patterns in continuous spike trains," *PLoS ONE*, vol. 3, p. e1377, Jan. 2008.
- [7] S. Sheik, S. Paul, C. Augustine, and G. Cauwenberghs, "Membrane-dependent neuromorphic learning rule for unsupervised spike pattern detection," in *Proc. IEEE Biomed. Circuits Syst. Conf. (BioCAS)*, Oct. 2016, pp. 164–167.
- [8] W. Maass, "Noise as a resource for computation and learning in networks of spiking neurons," *Proc. IEEE*, vol. 102, no. 5, pp. 860–880, May 2014.
- [9] H. Mostafa, L. K. Müller, and G. Indiveri, "An event-based architecture for solving constraint satisfaction problems," *Nature Commun.*, vol. 6, p. 8941, Dec. 2015.
- [10] D. Kuzum, R. G. D. Jeyasingh, B. Lee, and H.-S. P. Wong, "Nanoelectronic programmable synapses based on phase change materials for brain-inspired computing," *Nano Lett.*, vol. 12, no. 5, pp. 2179–2186, 2012.
- [11] S. Ambrogio et al., "Unsupervised learning by spike timing dependent plasticity in phase change memory (PCM) synapses," Frontiers Neurosci., vol. 10, p. 56, Mar. 2016.
- [12] S. Yu, Y. Wu, R. Jeyasingh, D. Kuzum, and H.-S. P. Wong, "An electronic synapse device based on metal oxide resistive switching memory for neuromorphic computation," *IEEE Trans. Electron Devices*, vol. 58, no. 8, pp. 2729–2737, Aug. 2011.
- [13] S. Ambrogio, S. Balatti, F. Nardi, S. Facchinetti, and D. Ielmini, "Spike-timing dependent plasticity in a transistor-selected resistive switching memory," *Nanotechnology*, vol. 24, no. 38, p. 384012, Sep. 2013.
- [14] S. Ambrogio et al., "Neuromorphic learning and recognition with one-transistor-one-resistor synapses and bistable metal oxide RRAM," IEEE Trans. Electron Devices, vol. 63, no. 4, pp. 1508–1515, Apr. 2016.
- [15] Z.-Q. Wang, S. Ambrogio, S. Balatti, and D. Ielmini, "A 2-transistor/1resistor artificial synapse capable of communication and stochastic learning in neuromorphic systems," *Frontiers Neurosci.*, vol. 8, p. 438, Jan. 2015.
- [16] V. Milo et al., "Demonstration of hybrid CMOS/RRAM neural networks with spike time/rate-dependent plasticity," in *IEDM Tech. Dig.*, Dec. 2016, pp. 440–443.
- [17] A. Serb, J. Bill, A. Khiat, R. Berdan, R. Legenstein, and T. Prodromakis, "Unsupervised learning in probabilistic neural networks with multi-state metal-oxide memristive synapses," *Nature Commun.*, vol. 7, p. 12611, Sep. 2016.
- [18] G. Pedretti et al., "Memristive neural network for on-line learning and tracking with brain-inspired spike timing dependent plasticity," Sci. Rep., vol. 7, no. 1, p. 5288, 2017.
- [19] N. Qiao et al., "A reconfigurable on-line learning spiking neuromorphic processor comprising 256 neurons and 128 K synapses," Frontiers Neurosci., vol. 9, p. 141, 2015.
- [20] E. Neftci, J. Binas, U. Rutishauser, E. Chicca, G. Indiveri, and R. J. Douglas, "Synthesizing cognition in neuromorphic electronic systems," *Proc. Nat. Acad. Sci. USA*, vol. 110, no. 37, pp. E3468–E3476, 2013.
- [21] T. Serrano-Gotarredona, T. Masquelier, T. Prodromakis, G. Indiveri, and B. Linares-Barranco, "STDP and STDP variations with memristors for spiking neuromorphic learning systems," *Frontiers Neurosci.*, vol. 7, no. 2, pp. 1–15, 2013.
- [22] A. Calderoni, S. Sills, and N. Ramaswamy, "Performance comparison of O-based and Cu-based ReRAM for high-density applications," in *Proc. IMW*, 2014, pp. 1–4.

- [23] M. Prezioso, F. M. Bayat, B. Hoskins, K. Likharev, and D. Strukov, "Self-adaptive spike-time-dependent plasticity of metal-oxide memristors," Sci. Rep., vol. 6, p. 21331, Feb. 2016.
- [24] T. Masquelier, R. Guyonneau, and S. J. Thorpe, "Competitive STDP-based spike pattern learning," *Neural Comput.*, vol. 21, no. 5, pp. 1259–1276, 2009.
- [25] S. Ambrogio, S. Balatti, D. C. Gilmer, and D. Ielmini, "Analytical modeling of oxide-based bipolar resistive memories and complementary resistive switches," *IEEE Trans. Electron Devices*, vol. 61, no. 7, pp. 2378–2386, Jul. 2014.
- [26] S. Ambrogio, S. Balatti, A. Cubeta, A. Calderoni, N. Ramaswamy, and D. Ielmini, "Statistical fluctuations in HfO<sub>x</sub> resistive-switching memory Part I—Set/Reset variability," *IEEE Trans. Electron Devices*, vol. 61, no. 8, pp. 2912–2919, Aug. 2014.
- [27] C. Zamarreño-Ramos, L. A. Camuñas-Mesa, J. A. Pérez-Carrasco, T. Masquelier, T. Serrano-Gotarredona, and B. Linares-Barranco, "On spike-timing-dependent-plasticity, memristive devices, and building a self-learning visual cortex," *Frontiers Neurosci.*, vol. 5, p. 26, Mar. 2011.



**Giacomo Pedretti** (S'17) received the B.S. and M.S. degrees in electrical engineering from the Politecnico di Milano, Milan, Italy, in 2013 and 2016, respectively, where he is currently pursuing the Ph.D. degree in electrical engineering. His main research interests are the design and characterization of neuromorphic networks for beyond-CMOS computing systems.



**Valerio Milo** (S'17) received the B.S. and M.S. degrees in electrical engineering from the Politecnico di Milano, Milan, Italy, in 2012 and 2015, respectively, where he is currently pursuing the Ph.D. degree in electrical engineering. His main research interests are the modeling and neuromorphic applications of resistive switching memories.



**Stefano Ambrogio** (M'16) received the B.S. degree, the M.S. degree (*cum laude*), and the Ph.D. degree in electrical engineering from the Politecnico di Milano, Milan, Italy, in 2010, 2012, and 2016, respectively. He is currently a Post-Doctoral Researcher with IBM Research-Almaden, San Jose, CA, USA. His current research interests include non-volatile memory and cognitive computing. He received the IEEE EDS Rappaport Award in 2015.



**Roberto Carboni** (S'16) received the B.S. and M.S. degrees in electrical engineering from the Politecnico di Milano, Milan, Italy, in 2013 and 2016, respectively, where he is currently pursuing the Ph.D. degree in electrical engineering. His main research interests are the characterization and modeling of resistive switching and magnetoresistive memories.



**Stefano Bianchi** received the B.S. and M.S. degrees (*cum laude*) in electrical engineering from the Politecnico di Milano, Milan, Italy, in 2015 and 2017, respectively, where he is currently pursuing the Ph.D. degree in electrical engineering. His main research interests are the modeling and design of neuromorphic networks with resistive switching memories.



**Alessandro Calderoni** received the Laurea degree (*cum laude*) in electrical engineering from the Politecnico di Milano, Milan, Italy, in 2006. He is currently with the Emerging Memory Cell Technology Team, Micron Technology Inc., Boise, ID, USA, as a Senior Device Engineer. His current research interests include the characterization of various emerging memory devices and selectors for high-density applications.



**Alessandro S. Spinelli** (M'99–SM'07) is currently a Full Professor of electronics with the Politecnico di Milano, Milan, Italy. His current research interests include the experimental characterization and modeling of nonvolatile memories.



**Nirmal Ramaswamy** (M'07–SM'09) received the bachelor's degree in metallurgical engineering from IIT Madras, Chennai, India, and the M.S. and Ph.D. degrees in material science and engineering from Arizona State University, Phoenix, AZ, USA. He has been with Micron Technology Inc., Boise, ID, USA, since 2002, where he is currently the Manager of the Emerging Memory Cell Technology Team. His current research interests include various emerging memory technologies for high-density applications.



**Daniele Ielmini** (SM'09) received the Ph.D. degree in nuclear engineering from the Politecnico di Milano, Italy, in 2000. He is currently a Professor with the Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Italy. He conducts research on emerging nanoelectronic devices, such as phase change memory and resistive switching memory. He received the Intel Outstanding Researcher Award in 2013, the ERC Consolidator Grant in 2014, and the IEEE EDS Rappaport Award in 2014.