# **A 16 Kb Spin-Transfer Torque Random Access Memory With Self-Enable Switching and Precharge Sensing Schemes**

Li Zhang<sup>1,2</sup>, Weisheng Zhao<sup>3,4</sup>, Yiqi Zhuang<sup>1,2</sup>, Junlin Bao<sup>1,2</sup>, Gefei Wang<sup>4</sup>, Hualian Tang<sup>1,2</sup>, Cong Li<sup>1,2</sup>, and Beilei Xu<sup>1,2</sup>

<sup>1</sup>School of Microelectronic, Xi'dian University, Xi'an 710071, China <sup>2</sup>Ministry of Education Key Laboratory of Wide Band-Gap Semiconductor Materials and Device, Xi'dian University, Xi'an 710071, China <sup>3</sup>Institut d'Electronique Fondamentale, University of Paris-Sud, Orsay F-91405, France <sup>4</sup>School of Electrical Engineering, Beihang University, Beijing 100191, China

**Spin-transfer torque magnetic random access memory (STT-MRAM) is considered one of the most promising non-volatile memory candidates thanks to its excellent performance in terms of access speed, endurance, and compatibility with CMOS. However, the conventional STT-MRAM writing circuit requires a high power supply voltage, which results in high power consumption (e.g., ∼10 pJ/bit). In addition, STT-MRAM suffers from stochastic switching behavior and from process, voltage, and temperature variations. These issues make power-efficient and reliable write/read circuits a critical challenge. In this paper, we present novel circuits and architectures to build a 16 kb STT-MRAM design with low power and high reliability. For example, the self-enable switching scheme effectively reduces the power consumption, and the fore-placed sense amplifier improves the robustness to process variation. Using an accurate compact model of 65 nm STT-MRAM and a commercial CMOS design kit, mixed transient and statistical simulations have been performed to validate this design.**

*Index Terms***— High density, high reliability, Monte Carlo simulation, precharge sensing, self-enable switching, spin-transfer torque magnetic random access memory (STT-MRAM).**

### I. INTRODUCTION

**RECENTLY**, magnetic random access memory (MRAM) has been considered one of the most promising candidates for building a universal memory. It has many advantages, such as non-volatility, fast read/write speed, unlimited endurance, and compatibility with CMOS [1]–[3]. As shown in Fig. 1(a), an MRAM storage element consists of one magnetic tunnel junction (MTJ) nanopillar and one word selection transistor driven by one word line. An MTJ acts as a resistor with a low ($R_P$) or high ($R_{AP}$) value depending on the relative magnetization of the two ferromagnetic layers, either parallel or antiparallel. These low and high resistances allow it to be used as non-volatile binary storage. The tunneling magnetoresistance ratio $\mathrm{TMR} = (R_{AP} - R_P)/R_P$ characterizes the amplitude of this resistance change, while the resistance-area (RA) product, based on $R_P$, characterizes the resistivity of the tunnel barrier. In practical samples used for MRAM, MgO is used for the tunnel barrier [1]–[6], and the TMR rises up to 200% for RA between 5 and 30 $\Omega \cdot \mu\text{m}^2$.

The conventional MRAM switching approach, which is based on inductive magnetic field writing, is the first generation of MRAM [4], [5]. It faces a severe limitation: as the device size scales down, the MTJ needs a higher current (>10 mA) to reverse the magnetization of the free layer. This causes many problems, such as increased power dissipation, low storage density, and low reliability.

Manuscript received March 21, 2013; revised July 17, 2013, September 4, 2013, and October 18, 2013; accepted November 6, 2013. Date of publication November 14, 2013; date of current version April 4, 2014. Corresponding author: W. Zhao (e-mail: weisheng.zhao@u-psud.fr).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMAG.2013.2291222

Fig. 1. Structure of STT-MRAM sensing. (a) Storage element of STT-MRAM. An MTJ is composed of two ferromagnetic layers (e.g., CoFeB) separated by a thin oxide barrier (e.g., MgO). (b) Magnetization direction of one ferromagnetic layer is pinned, while that of the other layer is free to take two directions for binary storage.

Spin-transfer torque MRAM (STT-MRAM) is the second generation of MRAM [5], [6]. Using the STT mechanism, a low bidirectional current ($I_{\text{write}}$) passing through the MTJ can switch its configuration. Here, $I_{\text{write}}$ is larger than the switching threshold current ($I_C$) [1], [2]. The read operation of STT-MRAM passes a low current ($I_{\text{read}}$) through the MTJ to detect the resistance difference between $R_{AP}$ and $R_P$, and then compares it with the current ($I_{\text{readref}}$) passing through the reference cell [7], [8], as shown in Fig. 1(b). Unfortunately, the read current flowing through the MTJ ($I_{\text{read}}$) can switch the MTJ state by thermal activation, which can be described by the Néel–Brown model shown in (1)–(3) [9]. This is a critical issue for MTJs with in-plane magnetic anisotropy, which have a low thermal activation energy $E$. Using the perpendicular magnetic anisotropy (PMA) MTJ with high $E$, this read disturbance




behavior can be reduced [7], [8], [10], [11]. Moreover, the PMA MTJ retains a high TMR ratio, fast write/read speed, and a low threshold current because of the small damping factor of the MTJ storage layer.

$$
\text{Pr} = 1 - \exp\left(-\frac{\text{Duration}}{\tau}\right) \tag{1}
$$

$$
\tau = \tau_0 \exp\left(\frac{E}{k_B T} \left(1 - \frac{I_{\text{read}}}{I_C}\right)\right) \tag{2}
$$

$$
E = \frac{\mu_0 M_S V H_K}{2} \tag{3}
$$

where Pr is the switching probability, Duration is the drive pulse duration, $\tau_0 \approx 1$ ns is the attempt period, $E$ is the thermal activation energy, which determines the thermal stability of the MTJ, $k_B$ is the Boltzmann constant, $T$ is the temperature, $I_C$ is the switching threshold current, $\mu_0$ is the permeability of free space, $M_S$ is the saturation magnetization, $H_K$ is the anisotropy field, and $V$ is the volume of the free layer.
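As an illustration of how (1)–(3) can be evaluated, the following minimal Python sketch computes the thermal activation energy from the default MTJ parameters of Table I and the resulting read disturbance probability for a few read currents. The operating point (300 K, a 10 ns read pulse, and a threshold current of 75 μA) and all function names are assumptions chosen for illustration; they are not taken from the simulations reported in this paper.

```python
import math

MU_0 = 4e-7 * math.pi   # permeability of free space, T*m/A
K_B = 1.380649e-23      # Boltzmann constant, J/K


def activation_energy(m_s, h_k, volume):
    """Eq. (3): E = mu_0 * M_S * V * H_K / 2, in joules."""
    return MU_0 * m_s * volume * h_k / 2.0


def read_disturb_probability(duration, i_read, i_c, energy, temp_k, tau_0=1e-9):
    """Eqs. (1)-(2): probability that one read pulse switches the MTJ."""
    tau = tau_0 * math.exp(energy / (K_B * temp_k) * (1.0 - i_read / i_c))
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-duration / tau)


# Default values from Table I (circular MTJ, 65 nm diameter, 1.3 nm free layer).
M_S = 458.0e3                                  # A/m
H_K = 113.0e3                                  # A/m
VOLUME = math.pi / 4 * 65e-9 * 65e-9 * 1.3e-9  # m^3

# Assumed operating point (illustrative only): 300 K, 10 ns pulse, I_C = 75 uA.
TEMP_K = 300.0
PULSE = 10e-9
I_C = 75e-6

E = activation_energy(M_S, H_K, VOLUME)
DELTA = E / (K_B * TEMP_K)  # thermal stability factor E / (k_B * T)

if __name__ == "__main__":
    print(f"E = {E:.3e} J, E/(k_B T) = {DELTA:.1f}")
    for ratio in (0.1, 0.3, 0.5):
        p = read_disturb_probability(PULSE, ratio * I_C, I_C, E, TEMP_K)
        print(f"I_read/I_C = {ratio:.1f}: Pr = {p:.3e}")
```

Under these assumptions, the sketch shows the exponential dependence of the disturbance probability on the ratio $I_{\text{read}}/I_C$, which is why a low read current is desirable.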

STT-MRAM can overcome the scaling limitation of conventional MRAM [4]. Many STT-MRAM test chips have been fabricated to demonstrate its advantages [11]–[14]. However, the conventional STT-MRAM writing circuit needs a high power supply voltage to ensure that the desired data are stored correctly [8], [12]. This high supply voltage results in high writing power consumption. In addition, STT-MRAM suffers from stochastic switching behavior [15]–[19], which may lead to erroneous switching of storage cells and low reliability. In this paper, we present novel write/read circuits and an architecture to build a 16 kb STT-MRAM design with low power and high reliability. This circuit can be used as a basic page to build higher density memory chips (e.g., 16 Mb).

This paper is organized as follows. Section II compares the performance of different architectures of STT-MRAM. Section III presents three novel structures to reduce power consumption and to improve reliability, including the writing circuit with low supply voltage, the self-enable switching scheme, and the reading circuit with fore-placed precharge sense amplifier (SA). Section IV reports mixed simulation results. Finally, the conclusion is provided in Section V.

## II. CIRCUIT STRUCTURE FOR STT-MRAM

STT-MRAM mainly consists of a memory cell array and peripheral circuits [4], as shown in Fig. 2(a). The 16 kb STT-MRAM is arranged as an array of 256 rows by 64 columns, and every word has 16 bits. The writing/reading operation is driven by the control circuit. The word line and bit/source line decoders are used to select the active storage cells. The write driver provides a bidirectional current passing through the MTJ. The SA is used to sense the data stored in the MTJ (i.e., $R_{AP}$ or $R_P$).
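The array organization above implies 256 × 64 = 16 384 bits, i.e., 1024 words of 16 bits with four words per row. The following Python sketch checks this arithmetic and shows one possible word-address decoding. The mapping itself (row index in the upper address bits, words stored in adjacent columns) is a hypothetical choice made for illustration; the paper does not specify how the decoders map addresses to columns.

```python
ROWS = 256
COLUMNS = 64
WORD_BITS = 16

TOTAL_BITS = ROWS * COLUMNS            # 16 384 bits = 16 kb
WORDS_PER_ROW = COLUMNS // WORD_BITS   # 4 words share one word line
TOTAL_WORDS = TOTAL_BITS // WORD_BITS  # 1024 addressable words


def decode_word_address(address):
    """Hypothetical decoding: return (row, list of selected columns).

    Assumes the upper address bits select the row (word line) and the
    lower bits select one group of 16 adjacent columns.
    """
    if not 0 <= address < TOTAL_WORDS:
        raise ValueError("word address out of range")
    row, word_in_row = divmod(address, WORDS_PER_ROW)
    first_column = word_in_row * WORD_BITS
    return row, list(range(first_column, first_column + WORD_BITS))


if __name__ == "__main__":
    print(TOTAL_BITS, WORDS_PER_ROW, TOTAL_WORDS)
    print(decode_word_address(5))
```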

Fig. 2 shows three different structures of STT-MRAM cell array. These structures present different performances in terms of power, area, and reliability. As shown in Fig. 2(a), one storage bit is represented by a couple of complementary MTJs. Fast speed and good sensing reliability can be obtained benefiting from the complementary storage, which allows the maximum sensing margin to be used for data sensing. However, the complementary MTJs both need driving circuit



Fig. 2. Brief block diagram of STT-MRAM with three types of cell arrays. (a) Cell array in which one storage bit is represented by a couple of complementary MTJs. (b) Cell array in which one storage bit is represented by one MTJ, and one reference cell is associated with every storage cell. (c) Cell array in which one storage bit is represented by one MTJ, and every column shares one reference cell.

to program, which results in further area overhead and a less power-efficient writing operation. Fig. 2(b) shows the second cell array structure, in which one storage bit is represented by one MTJ and every storage cell has one reference cell. In this structure, only one MTJ needs to be programmed for each storage bit, which keeps the writing circuit small and its power low. Moreover, sensing reliability is still high since each storage MTJ has its own reference MTJ. However, its drawback is that the cell array area is large.

Fig. 2(c) shows another cell array structure, in which one storage bit is still represented by one MTJ, but all the storage cells in a column share one reference cell. This cell array structure reduces the cell array area. However, the sensing reliability is limited because the parasitic resistances of the storage cell branch and the reference cell branch are asymmetric. If the length, width, and sheet resistance (*sheet\_res*) of the interconnect metal are known, the difference in parasitic resistance can be calculated according to the expression *R* = *sheet\_res* × *length*/*width*. For CMOS 130 nm technology, the *sheet\_res* of the metal line is ∼0.057 Ω per square, and the minimum metal width is 0.13 μm. If the difference in interconnect metal length is 0.5 mm, which is long enough for one cell array, the difference in parasitic resistance is only 219 Ω. According to [8], the $R_P$ of the MTJ is ∼3000 Ω and the TMR is 0.6 in the worst case. The sum of $R_P$ and the difference in parasitic resistance between the two branches is therefore ∼3219 Ω, which is far less than $R_{AP}$ (∼4800 Ω). Therefore, the reduction in sensing reliability is nearly negligible.
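The parasitic resistance estimate above can be reproduced with a few lines of Python. The sketch below uses only the values quoted in the text; the variable and function names are illustrative.

```python
def parasitic_resistance(sheet_res, length, width):
    """R = sheet_res * length / width (sheet_res in ohms per square)."""
    return sheet_res * length / width


SHEET_RES = 0.057     # ohms per square, 130 nm CMOS metal line
WIDTH = 0.13e-6       # minimum metal width, m
DELTA_LENGTH = 0.5e-3 # difference in interconnect length, m

R_P = 3000.0          # parallel-state resistance, ohms [8]
TMR = 0.6             # worst-case TMR ratio [8]

delta_r = parasitic_resistance(SHEET_RES, DELTA_LENGTH, WIDTH)  # about 219 ohms
r_ap = R_P * (1.0 + TMR)                                        # about 4800 ohms
margin = r_ap - (R_P + delta_r)   # resistance gap left after the mismatch

if __name__ == "__main__":
    print(f"parasitic difference = {delta_r:.1f} ohm")
    print(f"R_P + difference     = {R_P + delta_r:.1f} ohm")
    print(f"R_AP                 = {r_ap:.1f} ohm, margin = {margin:.1f} ohm")
```

Even with the full 0.5 mm length mismatch, the parallel-state branch remains well below $R_{AP}$, which supports the conclusion that the asymmetry has little effect on sensing.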

#### III. NOVEL CIRCUITS PROPOSED FOR STT-MRAM

To reduce the write power consumption and improve the STT-MRAM reliability, this paper explores several novel circuits: the writing circuit with low supply voltage, the self-enable switching circuit, and the fore-placed SA scheme.

#### *A. Writing Circuit With Low Supply Voltage*

As described previously, the write driver provides a bidirectional current passing through the MTJ. Depending on the direction of the writing current, the magnetization of the free layer can be set either parallel or antiparallel to that of the pinned layer, so that data 0 or 1 is stored in the MTJ [3], [4]. The switching model can be described by (4) and (5) [8]. According to this model, the average switching time $t$ becomes shorter as $I_{\text{write}}$ becomes larger, and for a fixed write pulsewidth the switching probability of the MTJ is higher with a larger writing current.

$$
I_{C0} = \alpha \frac{\gamma e}{k_B g} (\mu_0 M_S) H_K V = 2\alpha \frac{\gamma e}{k_B g} E \tag{4}
$$

$$
\frac{1}{t} = \left[\frac{2}{C + \ln\left(\frac{\pi^2 E}{4k_B T}\right)}\right] \frac{\mu_B P}{e m \left(1 + P^2\right)} \left(I_{\text{write}} - I_{C0}\right) \tag{5}
$$

where  $\alpha$  is the magnetic damping constant,  $\gamma$  is the gyromagnetic ratio,  $e$  is the elementary charge,  $k_B$  is Boltzmann constant,  $t$  is the average switching time, and  $P$  is the spin polarization.

The dynamic power consumption ($P_{\text{dynamic}}$) in the writing operation can be computed by (6). If we do not consider the writing frequency ($f_{\text{operation}}$), we obtain the energy of each switching operation ($E_{\text{operation}}$), as described by (7).

$$
P_{\text{dynamic}} = f_{\text{operation}} \times \int_{0}^{T} V_{D} \times I_{\text{write}}(t)\, dt \tag{6}
$$

$$
E_{\text{operation}} = \int_{0}^{T} V_{D} \times I_{\text{write}}(t)\, dt \tag{7}
$$

where $V_D$ is the writing voltage, $T$ is the write-enable pulse duration, and $I_{\text{write}}(t)$ is the current passing through the MTJ. During the write-enable pulse, the $I_{\text{write}}$ value is different before and after the MTJ state is reversed.

The conventional STT writing circuit is shown in Fig. 3 [4]. The W-enable signal controls whether the writing current flows through the MTJ. Transistors (N1–N4) controlled by the W-enable and En-read signals are used to avoid interference between the write and sense currents. Transistors (N5 and N6) controlled by column-s are used to select one column. The $I_{\text{write}}$ value depends strongly on both the transistor resistance and the supply voltage of the writing branch. All the column-select transistors and the transistors controlled by W-enable are in the writing branch, which makes the total resistance high. According to (4) and (5), a low $I_{\text{write}}$ needs a long



Fig. 3. Conventional STT writing circuit with high supply voltage.



Fig. 4. Writing circuit with low supply voltage.

switching time to ensure that the desired data are stored correctly, which lowers the write speed of STT-MRAM. Using a higher write current is a possible solution. To generate a high write current, it is necessary to decrease the resistance and increase the supply voltage of the writing branch. Therefore, the column-select logic is composed of a parallel connection of larger pMOS and nMOS transistors to obtain a smaller resistance, and a high supply voltage ($V_{dda}$) is used in the writing branch to generate a higher current, where $V_{dda}$ (e.g., $V_{dda} = 1.8$ V) is higher than the logic voltage $V_{dd}$ (e.g., $V_{dd} = 1.2$ V). Using (7) and the data given in [4], the calculated energy of one switching operation is up to 36 pJ/bit, which makes the writing power consumption high.

To decrease power consumption, we propose a writing circuit with low supply voltage, as shown in Fig. 4. Unlike the conventional writing circuit, separate column-select gates are used for the writing and reading operations. In writing mode, transistors (N1 and N2) are controlled by the AND logic of column-s and W-enable. These transistors not only isolate writing from reading, but also select one column. The number of transistors in the writing branch is thereby reduced, which makes the resistance of the writing branch small. In addition, the critical current density of the model is ∼3 × 10<sup>6</sup> A/cm<sup>2</sup> [16], which makes the write current as small as 100 μA in a small MTJ (e.g., 65 × 65 nm<sup>2</sup>). Therefore, with a writing supply voltage as low as the logic voltage $V_{dd}$, we can still obtain the same writing current value and switching time as those of the conventional writing circuit shown in Fig. 3. Because the supply voltage of the writing branch is lower while the switching current and time are almost the same, the power consumption is significantly decreased according to (6).
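The write current of about 100 μA quoted above follows from the critical current density and the MTJ area. The short Python sketch below reproduces this estimate; the circular cross-section is taken from Table I, and the square case is included only for comparison. Names are illustrative.

```python
import math

J_C = 3e6           # critical current density, A/cm^2 [16]
SIDE_CM = 65e-7     # 65 nm expressed in cm

area_square = SIDE_CM ** 2                  # 65 nm x 65 nm square, cm^2
area_circle = math.pi / 4 * SIDE_CM ** 2    # circle of 65 nm diameter, cm^2

i_square = J_C * area_square   # about 127 uA
i_circle = J_C * area_circle   # about 100 uA

if __name__ == "__main__":
    print(f"square MTJ:   {i_square * 1e6:.1f} uA")
    print(f"circular MTJ: {i_circle * 1e6:.1f} uA")
```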

In the reading branch, there are also column-select transistors (N3 and N4). As these transistors are used only to select one column, their size can be small, which makes the total resistance large and the reading current low. According to (1)–(3), this low reading current significantly reduces the read disturbance probability [20], [21]. The added logic increases the number of transistors in the proposed circuit. However, the total area is not larger than that of the conventional circuit, because the transistors in the conventional circuit must be larger to obtain a small resistance in the writing circuit.

## *B. Self-Enable Switching Circuit*

To further reduce write power consumption and improve STT-MRAM reliability, a self-enable switching circuit can be adopted in the STT-MRAM design. Although STT switching has been shown to be as fast as subnanosecond, it is stochastic, and some input data may not be stored correctly within one limited writing pulse [15], [17]. The most common solutions to this problem are to extend the writing pulse duration or to increase the $I_{\text{write}}$ value, which result in slow speed, high power consumption, and a short MTJ lifetime. By taking advantage of the stochastic switching, the self-enable switching circuit proposed in [17] can overcome both the power and reliability problems of classical circuits.

Fig. 5(a) and (b) show, respectively, the improved self-enable switching circuit and its operation timing. Because switching is stochastic, the MTJ state can be switched just after one short write pulse [15], as shown in Fig. 5(b). Therefore, the fixed long W-enable signal in the conventional switching circuit shown in Fig. 3 or 4 can be replaced by one short write pulse [i.e., the AND of W-enable and self-enable shown in Fig. 5(a)]. This short write pulse includes both switching and sensing operations. When the external W-enable signal of the STT-MRAM is active, the SA detects the MTJ state and outputs the logic data under the control of the short periodic sense signal. Then, the output data are compared with the input data. Depending on the comparison result, a self-enable logic level is obtained (i.e., 1 or 0 when the output and input data are different or the same, respectively). In addition, the number of MTJ switching events can be reduced because the MTJ state does not need to be changed if the input data are the same as the output data of the preceding stage. In that case, the self-enable signal is 0 and no current flows through the MTJ. Thereby, both the write power efficiency and the lifetime of the MTJ can be greatly improved because of the shortened switching duration and the reduced number of switching events [17], [23], [24].
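The sense-compare-write sequence described above can be summarized by a simple behavioral model. The Python sketch below is not a circuit simulation: it assumes that each short write pulse switches the MTJ with a fixed probability `p_switch` (a hypothetical parameter standing in for the stochastic switching behavior) and that the SA always senses the stored bit correctly.

```python
import random


def self_enable_write(input_bit, stored_bit, p_switch, max_pulses, rng):
    """Behavioral sketch of self-enable switching.

    Returns (final stored bit, number of write pulses actually applied).
    """
    pulses = 0
    for _ in range(max_pulses):
        output_bit = stored_bit                    # SA senses the MTJ state
        self_enable = int(output_bit != input_bit) # 1 only if data differ
        if not self_enable:
            break                                  # no current through the MTJ
        pulses += 1                                # one short write pulse
        if rng.random() < p_switch:                # stochastic switching event
            stored_bit = input_bit
    return stored_bit, pulses


if __name__ == "__main__":
    rng = random.Random(1)
    # Data already stored: no write pulse is applied at all.
    print(self_enable_write(1, 1, 0.5, 10, rng))
    # Data differ: pulses are applied only until the MTJ has switched.
    print(self_enable_write(1, 0, 0.5, 10, rng))
```

In this model, the write current stops as soon as the sensed output matches the input, which is the origin of both the energy saving and the reduced number of switching events.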

Starting from the self-enable switching circuit proposed in [17], we add a DELAY logic block to adapt it to the STT-MRAM circuit, as shown in Fig. 5(a). When input data 0 is different from output data 1, the self-enable signal will be 1. This output data is the output of the precharged SA. Then, the En-read signal will go low if



Fig. 5. Self-enable switching circuit and operation timing. (a) Improved scheme of self-enable switching circuit. (b) Operation timing diagram of the self-enable and conventional switching circuits.

there is no delay logic. As a result, the sensing operation could not be performed and the writing operation could not be stopped. The added DELAY logic overcomes this issue. It provides enough time to sense the MTJ state each time the sense signal becomes 1. The time delay and area overhead depend on the speed of the SA.

To further reduce power and area consumption, the high-speed, low-power precharge SA presented in [18], [19], and [22] is adopted.

## *C. Fore-Placed SA*

According to [12], [17], and [18], the sensing operation is highly sensitive to process, voltage, and temperature variations. Although the writing circuit with low supply voltage further decreases power and area consumption, the Monte Carlo statistical analysis of the sensing output becomes worse because of the high resistance in the sensing circuit. To improve the reliability of the reading circuit, we propose another novel circuit, as shown in Fig. 6.

The major difference between the proposed and the conventional sensing structures is the location of the column-select transistor in the reading operation. The column-select transistor is moved to the output side of the SA, so that the SA output is passed on through the column-select transistor. The reduced number of transistors in the sensing circuit decreases the sensitivity to process variations. Unfortunately, each column of the cell array needs one SA, which



Fig. 6. Reading circuit with fore-placed SA.

TABLE I PARAMETERS AND VARIABLES INCLUDED IN MTJ SIMULATION

| Parameter or variable | Description                 | Default value             |
|-----------------------|-----------------------------|---------------------------|
| $\alpha$              | Gilbert damping coefficient | 0.027                     |
| $\gamma$              | Gyromagnetic constant       | $2.2 \times 10^5$ m/(A·s) |
| $H_K$                 | Anisotropy field            | $113.0 \times 10^3$ A/m   |
| $M_S$                 | Saturation magnetization    | $458.0 \times 10^3$ A/m   |
| tox                   | Oxide barrier thickness     | 0.85 nm                   |
| ts                    | Free layer thickness        | 1.3 nm                    |
| ts1                   | Fixed layer thickness       | 2 nm                      |
| Shape                 | MTJ shape                   | Circle                    |
| Surface               | MTJ area                    | 65 nm × 65 nm × π/4       |
| TMR(0)                | TMR ratio with 0 $V_{bias}$ | 120%                      |
| $V$                   | Volume of free layer        | Surface × 1.3 nm          |

results in area overhead. For the 16 kb STT-MRAM with an array of 256 rows and 64 columns, the area overhead is ∼33 μm<sup>2</sup>, which is 10% of the storage array. This area overhead can be considered negligible because the sensing circuit is often widely shared by many elements of the memory array.

# IV. MIXED SIMULATION RESULTS

Using an accurate 65 nm STT-MTJ compact model [8] and a commercial CMOS design kit with a nominal $V_{dd}$ of 1.2 V, hybrid simulations and calculations have been performed for these circuits. Compared with conventional circuits, the proposed circuits exhibit better performance in simulation.

The MTJ compact model integrates physical models and a number of experimental parameters presented in [8] (see Table I). Because the temperature $T$ has a very important impact on data retention [8], $T$ is drawn randomly from a uniform distribution between −25 °C and 75 °C in our MTJ model to take this temperature dependence into account. Considering the impact of temperature on data retention, the switching time of a memory cell is ∼9 ns when the critical current is 75 μA.
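To illustrate the temperature sampling described above, the following Python sketch draws $T$ uniformly between −25 °C and 75 °C and evaluates the thermal stability factor $E/(k_B T)$, with $E$ computed from (3) and the Table I defaults. It is a stand-alone illustration and not the compact model of [8]; the sample count, the random seed, and the function names are arbitrary assumptions.

```python
import math
import random

MU_0 = 4e-7 * math.pi   # permeability of free space, T*m/A
K_B = 1.380649e-23      # Boltzmann constant, J/K

# Table I defaults: M_S, H_K, circular 65 nm MTJ with 1.3 nm free layer.
M_S = 458.0e3
H_K = 113.0e3
VOLUME = math.pi / 4 * 65e-9 * 65e-9 * 1.3e-9

E = MU_0 * M_S * VOLUME * H_K / 2.0   # Eq. (3), joules


def thermal_stability(temp_celsius):
    """Return E / (k_B * T) for a temperature given in degrees Celsius."""
    return E / (K_B * (temp_celsius + 273.15))


def sample_stability(n_samples, rng):
    """Draw T uniformly in [-25, 75] degC and return the stability factors."""
    temps = [rng.uniform(-25.0, 75.0) for _ in range(n_samples)]
    return [thermal_stability(t) for t in temps]


if __name__ == "__main__":
    deltas = sample_stability(1000, random.Random(0))
    print(f"min = {min(deltas):.1f}, max = {max(deltas):.1f}")
```

With these parameters the stability factor varies between roughly 29 (at 75 °C) and 41 (at −25 °C), which indicates how strongly retention depends on the sampled temperature.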

### *A. Simulations of the Write Circuit With Low Supply Voltage*

Fig. 7 shows the transient simulations of writing circuit with high and low supply voltages. If W-enable is activated, input data will begin to be written. As shown in Fig. 7(e) and (h), at ∼200 ns, the state of MTJs in the writing circuits



Fig. 7. Transient simulation of writing circuit with high supply voltage and low supply voltage. (a) Input. (b) W-enable signal. (c) SL voltage level of storage element with high supply voltage (SL-hw). (d) Writing current of MTJ with high supply voltage ($I_{\text{write}}$-hw). (e) State of MTJ with high supply voltage (state-hw). (f) SL voltage level of storage element with low supply voltage (SL-lw). (g) Writing current of MTJ with low supply voltage ($I_{\text{write}}$-lw). (h) State of MTJ with low supply voltage (state-lw).



Fig. 8. Comparison of transient simulation between the self-enable switching circuit and the conventional switching circuit. (a) Input. (b) Enable signal of conventional switching circuit (W-enable). (c) MTJ state of the conventional switching circuit (state0). (d) MTJ writing current of the conventional switching circuit ($I_{\text{write}}$). (e) Self-enable signal of self-enable switching circuit. (f) MTJ state of self-enable switching circuit (state0-self). (g) MTJ writing current of self-enable switching circuit ($I_{\text{write}}$-self).

with high supply voltage and low supply voltage both change after almost the same switching time, during which the $I_{\text{write}}$ values are almost the same [see Fig. 7(d) and (g)]. However, the supply voltage is different, which can be seen from the SL voltage level [1.62 and 0.8 V at 195 ns in Fig. 7(c) and (f), respectively]. During one W-enable pulse of 45 ns (from 180 to 225 ns in Fig. 7), the energy of one switching operation can be calculated using (7); it is 4.0 pJ/bit in the writing circuit with low supply voltage and 6.0 pJ/bit in the conventional writing circuit. Therefore, the writing circuit with low supply voltage saves up to 33% of the switching energy per bit.
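The order of magnitude of these energies can be checked with a constant-current approximation of (7). The Python sketch below assumes that the writing voltage equals the supply voltage (1.2 V and 1.8 V, the example values of $V_{dd}$ and $V_{dda}$ given in Section III-A) and that the write current stays at about 75 μA during the whole 45 ns pulse. Both are simplifying assumptions: in the simulated circuit the current differs before and after the MTJ state is reversed.

```python
def switching_energy(v_supply, i_write, pulse_width):
    """Eq. (7) with constant V_D and I_write: E = V_D * I_write * T."""
    return v_supply * i_write * pulse_width


I_WRITE = 75e-6    # assumed constant write current, A
T_PULSE = 45e-9    # W-enable pulse width, s (180 ns to 225 ns)

e_low = switching_energy(1.2, I_WRITE, T_PULSE)    # about 4.05 pJ
e_high = switching_energy(1.8, I_WRITE, T_PULSE)   # about 6.08 pJ
saving = 1.0 - e_low / e_high                      # one third, about 33%

if __name__ == "__main__":
    print(f"low supply:  {e_low * 1e12:.2f} pJ/bit")
    print(f"high supply: {e_high * 1e12:.2f} pJ/bit")
    print(f"saving:      {saving * 100:.1f}%")
```

Under these assumptions, the estimate is close to the simulated values of 4.0 and 6.0 pJ/bit, and the saving depends only on the ratio of the two supply voltages.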

### *B. Simulations of the Self-Enable Switching Circuit*

Fig. 8 shows the transient simulations of the self-enable and conventional switching circuits. In the conventional switching circuit, if the W-enable signal is activated, input data will begin to be written regardless of whether the input is the same as or



Fig. 9. Reading error rate according to different TMR ratios and different locations of SA.

different from the output (i.e., State0 signal), and the writing current is ∼75 μA [see Fig. 8(d) between 180 and 225 ns]. Fig. 8(e) and (g) between 180 and 225 ns show the case of the self-enable switching circuit. During the W-enable pulse, the writing operation is not activated and the $I_{\text{write}}$ value is almost zero [see Fig. 8(e)] when the input is the same as the output (e.g., at 185 ns). When the input is different from the output (e.g., at 200 ns), the writing operation is activated and the $I_{\text{write}}$ value is almost 75 μA [see Fig. 8(g)]. Fig. 8(g) also shows that after the MTJ state is reversed (e.g., at 214 ns), the sensed output is the same as the input, which makes self-enable become 0. The writing operation is thereby stopped. During a W-enable pulse, the energy of one switching operation can be calculated using (7); it is 1.5 pJ/bit for the self-enable switching circuit and 4.0 pJ/bit for the conventional writing circuit. The self-enable switching circuit therefore saves up to 62% of the switching energy per bit.
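The saving quoted above follows directly from the two energy values. The Python sketch below computes it and, as a rough plausibility check, the equivalent conduction time implied by 1.5 pJ. The conduction time assumes a constant 1.2 V writing voltage and a constant 75 μA write current, which is a simplification and not a result reported in the paper.

```python
def energy_saving(e_new, e_reference):
    """Fraction of energy saved relative to the reference circuit."""
    return 1.0 - e_new / e_reference


E_SELF_ENABLE = 1.5e-12   # J/bit, self-enable switching circuit
E_CONVENTIONAL = 4.0e-12  # J/bit, writing circuit without self-enable

saving = energy_saving(E_SELF_ENABLE, E_CONVENTIONAL)   # 0.625

# Assumed constant operating point (illustrative): 1.2 V and 75 uA.
V_WRITE = 1.2
I_WRITE = 75e-6
equivalent_on_time = E_SELF_ENABLE / (V_WRITE * I_WRITE)  # about 16.7 ns

if __name__ == "__main__":
    print(f"saving: {saving * 100:.1f}%")
    print(f"equivalent conduction time: {equivalent_on_time * 1e9:.1f} ns")
```

Under these assumptions, the write current flows for roughly 17 ns instead of the full 45 ns W-enable pulse, which is consistent with the writing operation being stopped shortly after the MTJ state is reversed.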

#### *C. Simulations of the Fore-Placed SA*

Fig. 9 shows the reading error rates of the reading circuits with different locations of the SA for different TMR ratios, obtained through Monte Carlo simulation. To reduce the reading error rate, a large TMR is required [22]; TMR can reach up to 604% at room temperature in laboratory devices [25]. Thus, the simulations are performed by changing the value of the TMR ratio. When the TMR is fixed to 1.5, the reading error rates of the post- and the fore-placed SA are 13% and 0.6%, respectively. Therefore, if the reading circuit with fore-placed SA is adopted, the sensitivity to process variations is greatly reduced because of the increased sensing current. As a result, the reading circuit with fore-placed SA is more suitable for highly reliable STT-MRAM.

### *D. Simulations of the New STT-MRAM*

Fig. 10 shows the transient simulations of writing and reading operations of the STT-MRAM with two columns,



Fig. 10. Simulations of writing and reading operation of the STT-MRAM with more storage cells. (a) Input. (b) Enable signal of switching circuit (W-enable). (c) Self-enable signal of self-enable switching circuit. (d) MTJ writing current of one column ($I_{\text{write}}$-c1). (e) MTJ state of one column (state0-c1). (f) MTJ writing current of the other column ($I_{\text{write}}$-c2). (g) MTJ state of the other column (state0-c2). (h) Output of the SA (output).

consisting of c1 and c2. To validate the column selection function clearly, each column here has just one storage cell, i.e., MTJ-c1 and MTJ-c2. Before 200 ns, column c1 is selected and the state of MTJ-c1 changes with the input data [see Fig. 10(a) and (e)]. After 200 ns, the other column c2 is selected; the state of MTJ-c2 then changes with the input data [see Fig. 10(a) and (g)], and the output data change with the state of the selected MTJ [see Fig. 10(h)]. In brief, the simulations show the correct operation of the column selection, STT-MRAM writing, and reading circuits.

#### V. CONCLUSION

In this paper, we presented several design solutions to reduce the power consumption and improve the reliability of STT-MRAM. A performance comparison based on the new design and theoretical calculations has been performed. Compared with the switching energy of 6.0 pJ/bit of the conventional writing circuit in [4], the switching energy of 4.0 pJ/bit of the writing circuit with low supply voltage effectively reduces the overall power consumption. Combining this with the self-enable switching circuit, which benefits from the stochastic switching, the switching energy of each bit is only 1.5 pJ/bit. This saves up to 62% of the 4.0 pJ/bit of the writing circuit without the self-enable mechanism. Moreover, the reliability of STT-MRAM is improved because the lifetime of the MTJ in the self-enable switching circuit is greatly extended. In addition, the reading circuit with fore-placed SA improves the robustness to process variation. These novel circuits and architectures build a 16 kb STT-MRAM design with low power and high reliability. The combination of these reliability improvement techniques with an error correction code for STT-MRAM [26] is under development to achieve full memory functions.

#### ACKNOWLEDGMENT

This work was supported in part by the Wide Band Gap Semiconductor and Micro/Nano Electronics 111 Project of Xidian University and in part by the Fundamental Research Funds for the Central Universities of China under Grant K5051225017.

#### **REFERENCES**

- [1] C. Chappert, A. Fert, and F. Van Dau, "The emergence of spin electronics in data storage," *Nature Mater.*, vol. 6, pp. 813–823, Nov. 2007.
- [2] B. N. Engel, J. Akerman, B. Butcher, R. W. Dave, M. De Herrera, M. Durlam, *et al.*, "A 4-Mb toggle MRAM based on a novel bit and switching method," *IEEE Trans. Magn.*, vol. 41, no. 1, pp. 132–136, Jan. 2005.
- [3] C. J. Lin, S. H. Kang, Y. J. Wang, K. Lee, X. Zhu, W. C. Chen, *et al.*, "45 nm low power CMOS logic compatible embedded STT MRAM utilizing a reverse-connection 1T/1MTJ cell," in *Proc. IEEE IEDM*, Dec. 2009, pp. 279–282.
- [4] T. Kawahara, R. Takemura, K. Miura, J. Hayakawa, S. Ikeda, Y. M. Lee, *et al.*, "2 Mb SPRAM (SPin-transfer torque RAM) with bit-by-bit bi-directional current write and parallelizing-direction current read," *IEEE J. Solid-State Circuits*, vol. 43, no. 1, pp. 109–118, Jan. 2008.
- [5] K. Abe, H. Noguchi, E. Kitagawa, N. Shimomura, J. Ito, and S. Fujita, "Novel hybrid DRAM/MRAM design for reducing power of high performance mobile CPU," in *Proc. IEEE IEDM*, Dec. 2012, pp. 243–246.
- [6] H. Yoda, S. Fujita, N. Shimomura, E. Kitagawa, K. Abe, K. Nomura, *et al.*, "Progress of STT-MRAM technology and the effect on normally-off computing systems," in *Proc. IEEE IEDM*, Dec. 2012, pp. 259–262.
- [7] E. Kitagawa, S. Fujita, K. Nomura, H. Noguchi, K. Abe, K. Ikegami, *et al.*, "Impact of ultra low power and fast write operation of advanced perpendicular MTJ on power reduction for high-performance mobile CPU," in *Proc. IEEE IEDM*, Dec. 2012, pp. 677–680.
- [8] Y. Zhang, W. S. Zhao, Y. Lakys, J.-O. Klein, J.-V. Kim, D. Ravelosona, *et al.*, "Compact modeling of perpendicular-anisotropy CoFeB/MgO magnetic tunnel junctions," *IEEE Trans. Electron. Devices*, vol. 59, no. 3, pp. 819–826, Mar. 2012.
- [9] L.-B. Faber, W. S. Zhao, J.-O. Klein, T. Devolder, and C. Chappert, "Dynamic compact model of spin-transfer torque based magnetic tunnel junction (MTJ)," in *Proc. IEEE DTIS*, Apr. 2009, pp. 130–135.
- [10] M. Nakayama, T. Kai, N. Shimomura, M. Amano, E. Kitagawa, T. Nagase, *et al.*, "Spin transfer switching in TbCoFe/CoFeB/MgO/CoFeB/TbCoFe magnetic tunnel junctions with perpendicular magnetic anisotropy," *J. Appl. Phys.*, vol. 103, no. 7, pp. 07A710-1–07A710-3, Apr. 2008.
- [11] W. S. Zhao, Y. Zhang, T. Devolder, J.-O. Klein, D. Ravelosona, C. Chappert, *et al.*, "Failure and reliability analysis of STT-MRAM," *Microelectron. Rel.*, vol. 52, nos. 9–10, pp. 1848–1852, Sep./Oct. 2012.
- [12] W. S. Zhao, T. Devolder, Y. Lakys, J.-O. Klein, C. Chappert, and P. Mazoyer, "Design considerations and strategies for high-reliable STT-MRAM," *Microelectron. Rel.*, vol. 51, nos. 9–11, pp. 1454–1458, Sep./Nov. 2011.
- [13] J. Nahas, T. Andre, C. Subramanian, B. Garni, H. Lin, A. Omair, *et al.*, "A 4 Mb 0.18 μm 1T1MTJ toggle MRAM memory," in *Proc. IEEE ISSCC*, vol. 1. Feb. 2004, pp. 44–51.
- [14] Y. Chen, H. Li, X. Wang, W. Zhu, and T. Zhang, "A 130 nm 1.2 V/3.3 V 16 Kb spin transfer torque random access memory with nondestructive self-reference sensing scheme," *IEEE J. Solid-State Circuits*, vol. 47, no. 2, pp. 560–572, Feb. 2012.
- [15] T. Devolder, J. Hayakawa, K. Ito, H. Takahashi, S. Ikeda, P. Crozat, *et al.*, "Single-shot time-resolved measurement of nanosecond-scale spin-transfer induced switching: Stochastic versus deterministic aspects," *Phys. Rev. Lett.*, vol. 100, no. 5, pp. 057206-1–057206-4, Feb. 2008.
- [16] J. J. Nowak, R. P. Robertazzi, J. Z. Sun, G. Hu, D. W. Abraham, P. L. Trouilloud, *et al.*, "Demonstration of ultralow bit error rates for spin-torque magnetic random-access memory with perpendicular magnetic anisotropy," *IEEE Magn. Lett.*, vol. 2, article no. 3000204, Jun. 2011.
- [17] Y. Lakys, W. S. Zhao, T. Devolder, Y. Zhang, J.-O. Klein, D. Ravelosona, *et al.*, "Self-enabled 'error-free' switching circuit for spin transfer torque MRAM and logic," *IEEE Trans. Magn.*, vol. 48, no. 9, pp. 2403–2406, Sep. 2012.
- [18] W. S. Zhao, C. Chappert, V. Javerliac, and J.-P. Nozière, "High speed, high stability and low power sensing amplifier for MTJ/CMOS hybrid logic circuits," *IEEE Trans. Magn.*, vol. 45, no. 10, pp. 3784–3787, Oct. 2009.
- [19] E. K. S. Au, W.-H. Ki, W. H. Mow, S. T. Hung, and C. Y. Wong, "A novel current-mode sensing scheme for magnetic tunnel junction MRAM," *IEEE Trans. Magn.*, vol. 40, no. 2, pp. 483–488, Mar. 2004.
- [20] K. Miura, T. Kawahara, R. Takemura, J. Hayakawa, S. Ikeda, R. Sasaki, *et al.*, "A novel SPRAM (SPin-transfer torque RAM) with a synthetic ferri magnetic free layer for higher immunity to read disturbance and reducing write-current dispersion," in *Proc. IEEE Symp. VLSI Technol.*, Jul. 2007, pp. 234–235.
- [21] K. Ryu, J. Kim, J. Kim, S. H. Kang, and S.-O. Jung, "A magnetic tunnel junction based zero standby leakage current retention flip-flop," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 20, no. 11, pp. 2044–2053, Nov. 2012.
- [22] Y. Lakys, W. S. Zhao, J.-O. Klein, and C. Chappert, "Low power, high reliability magnetic flip-flop," *Electron. Lett.*, vol. 46, no. 22, pp. 1493–1494, Oct. 2010.
- [23] Q. Chen, T. Min, T. Torng, C. Horng, D. Tang, and P. Wang, "Study of dielectric breakdown distributions in magnetic tunneling junction with MgO barrier," *J. Appl. Phys.*, vol. 105, pp. 07C931-1–07C931-3, Mar. 2009.
- [24] G. Panagopoulos, C. Augustine, and K. Roy, "Modeling of dielectric breakdown-induced time-dependent STT-MRAM performance degradation," in *Proc. 69th Annu. DRC*, Jun. 2011, pp. 125–126.
- [25] S. Ikeda, J. Hayakawa, Y. Ashizawa, Y. M. Lee, K. Miura, H. Hasegawa, *et al.*, "Tunnel magnetoresistance of 604% at 300 K by suppression of Ta diffusion in CoFeB/MgO/CoFeB pseudo-spin-valves annealed at high temperature," *Appl. Phys. Lett.*, vol. 93, pp. 082508-1–082508-3, Aug. 2008.
- [26] W. Kang, W. S. Zhao, Z. H. Wang, Y. Zhang, J.-O. Klein, Y. Zhang, *et al.*, "A low-cost built-in error correction circuit design for STT-MRAM reliability improvement," *Microelectron. Rel.*, vol. 53, nos. 9–11, pp. 1224–1229, Sep./Nov. 2013.