

Received 26 September 2023, accepted 18 October 2023, date of publication 23 October 2023, date of current version 1 November 2023. Digital Object Identifier 10.1109/ACCESS.2023.3327261

# **RESEARCH ARTICLE**

# A Fully Non-Volatile Reconfigurable Magnetic **Arithmetic Logic Unit Based on Majority Logic**

# SREEVATSAN RANGAPRASAD AND VINOD KUMAR JOSHI, (Senior Member, IEEE)

Department of Electronics and Communication Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104, India

Corresponding author: Vinod Kumar Joshi (vinodkumar.joshi@manipal.edu)

**ABSTRACT** Spintronics has been garnering great success in resolving the shortcomings of conventional charge-based electronics and the von-Neumann architecture by offering novel computational paradigms almost devoid of the leakage effects and volatility issues of traditional CMOS systems. In particular, hybrid CMOS/MTJ circuits, which integrate the benefits of spintronics with well-evolved CMOS technology, have been extensively investigated as candidates for building next-generation processors. Motivated by the importance of magnetic processors, this work proposes a novel fully non-volatile reconfigurable magnetic arithmetic logic unit (NVRMALU) based on majority logic. This paper covers the prospects of NVRMALU at both the architecture level and the circuit level by discussing the multi-context hybrid CMOS/MTJ LIM architecture and the operation of the circuit. Furthermore, the simulation results of the proposed fully non-volatile reconfigurable magnetic full adder (NVRMFA) making up NVRMALU reveal a total power reduction of around six- to forty-seven-fold compared to the contemporary magnetic full adders (MFAs) discussed here. Also, NVRMALU consumes six times less power than the double pass transistor clocked CMOS (DPTLCMOS) ALU, thus qualifying it as an excellent normally-OFF, instant-ON digital system. A four-bit extension of NVRMALU is presented to show the feasibility of the design for multi-bit applications. The transient analysis demonstrates the nonvolatile and dynamically reconfigurable traits of NVRMALU in addition to verifying its functionality. Additionally, variability analysis has been performed to study the factors controlling the read and write performance of hybrid circuits from a device-level perspective.

**INDEX TERMS** Magnetic tunnel junction (MTJ), hybrid CMOS/MTJ, magnetic ALU, majority logic, fully non-volatile, logic-in-memory (LIM), spin transfer torque (STT), reconfigurable.

#### **I. INTRODUCTION**

Excellent computational and data storage capabilities with optimal power, speed, and area have always been the mantra of the semiconductor industry. However, the growth of classical charge-based electronics in the past few years has been curbed at both the device and architecture levels. The saturated scaling down of transistors, owing to physical limitations and secondary effects such as leakage, in conjunction with the memory-wall issue at the architecture level, poses severe challenges in meeting the growing demands of emerging data-centric applications like Artificial Intelligence, image processing, and the Internet of Things. This has led to an increasing emphasis on spintronics, an emerging nanotechnology that harnesses the intrinsic spin of an electron for computation and storage. In particular, the Magnetic Tunnel Junction (MTJ), the spin device making up Magnetic Random Access Memory (MRAM), has been extensively investigated by both industrial and academic communities for its promising features [1], [2].

The associate editor coordinating the review of this manuscript and approving it for publication was Yiming Huo.

The latencies induced by long interconnects between logic and memory, together with the associated power consumption, constitute the memory-wall issue of the traditional von-Neumann architecture and are overcome by adopting the Logic-In-Memory (LIM) architecture. In addition to addressing the leakage effect and power overheads by employing non-volatile memory devices like MTJ, LIM paves the way for the close integration of memory and logic into a single entity. Furthermore, the coalescence of conventional CMOS technology, with its high operating speed and reliability, and the perks of MTJ such as nonvolatility, zero leakage power, large endurance, fast reading capability, high-density integration, and scalability [3], in the form of hybrid CMOS/MTJ circuits following smart architectures like LIM and In-Memory Computing (IMC) (Fig.1), serves as a successful alternative to charge-based electronics [4]. Consequently, several hybrid CMOS/MTJ logic designs, ranging from magnetic flip-flops [5], non-volatile basic gates [6], and magnetic decoders [7] to magnetic full adders [8], [9], [10], [11], [12] and non-volatile magnetic arithmetic logic units [13], [14], have been reported in the literature.

The magnetic processor, an extension of such hybrid CMOS/MTJ logic paradigms that leverages the superior traits of MTJs, the sophistication of the LIM architecture, and mature CMOS technology, has become the need of the hour [15], [16]. Its potential to alleviate the cache coherence, power, and area overheads of current-day processors by offering complete utilization of the memory bandwidth and nonvolatility has triggered numerous designs in the literature [17], [18], [19], [20]. However, in [17] and [19] processing is still performed using CMOS, and MTJs are used only as non-volatile storage units, thus resembling a logic near memory (LNM) architecture rather than LIM. All spin logic (ASL), used in [15] and [18], offers all the advantages of MTJ and eliminates CMOS transistors for logic and storage operations, but it has serious limitations such as high short-circuit power overheads and the need for complex control of clocking, spin diffusion length, and spin channel [21], [22]. Thus, hybrid CMOS/MTJ employing the Spin Transfer Torque (STT) switching mechanism, similar to MRAM cells, is chosen instead of ASL for our design of the arithmetic logic unit (ALU), the heart of a processor. Furthermore, STT-MTJ devices have reached a mature state in terms of commercialization compared to ASL [18] and can be integrated with existing CMOS technology thanks to advancements in 3-D fabrication techniques in the back-end-of-line (BEOL) process. Nevertheless, the majority logic paradigm, an inherent feature of ASL [18], is adopted in our application for its ability to realize complex/intensive Boolean functions with fewer gates compared to NAND/NOR logic [23].

Majority logic is a special case of threshold logic, where the output is evaluated as true or high ("1") when more than half of the "N" inputs (here, N > 1 and N is odd) are true [24]. It can be expressed in terms of AND/OR as MAJ(a, b, c) = a.b + b.c + a.c, where a, b, and c are the inputs of a three-input majority gate. Additionally, majority logic together with an inverter is functionally complete, i.e., capable of realizing any Boolean expression [23]. This has spurred the design of non-volatile full adders using emerging nanotechnologies such as magnetic quantum dot cellular automata [25], ReRAM [23], ASL [26], nanomagnetic logic [27], and single-electron tunneling devices [28], whose primary logic primitive is majority logic. However, to the best of our knowledge, not many hybrid CMOS/MTJ full adder designs using majority logic and following the LIM architecture are reported in the literature. Furthermore, the existing full adders [8], [9], [10], [11], [12] based on LIM are either partially non-volatile, where only a few of the input operands are made non-volatile, or they suffer from high power and area consumption. Therefore, encouraged by the foregoing discussions, a Fully Non-Volatile Reconfigurable Magnetic Full Adder (NVRMFA) (Figs.3,4,5) based on majority logic using a hybrid CMOS/MTJ structure with LIM architecture is presented in this article as the building block of a Fully Non-Volatile Reconfigurable Magnetic Arithmetic Logic Unit (NVRMALU). Full non-volatility is incorporated in our designs by employing a modified multi-context hybrid CMOS/MTJ LIM architecture (Fig.1) [22], where all the input operands are stored in the MTJs. Such a setup eliminates the need for backing up data during power loss scenarios, as required by partially non-volatile counterparts [8], [9], [10], [11].
Also, to our understanding, the digital comparator functionality, a crucial operation in the processor datapath, is not included in the existing magnetic ALU designs reported in the literature [13], [14]. Therefore, a novel two-bit magnetic magnitude comparator and equality detector is integrated in NVRMALU, making it the first of its kind in the literature. Furthermore, the novel usage of the multi-context hybrid CMOS/MTJ structure with LIM architecture, incorporating the benefits of both LIM and IMC architectures, together with the ultra-low power performance of the proposed designs, distinguishes them from their contemporaries.

The remainder of this paper is organized as follows: Section II discusses the fundamentals of MTJ, the STT switching mechanism, and the multi-context hybrid CMOS/MTJ architecture, including their advantages over their counterparts, along with a brief review of contemporary full adder designs using hybrid CMOS/MTJ LIM architecture. Section III presents the proposed NVRMALU design and describes the various functionalities of NVRMALU and the working of the circuit, along with the four-bit extension of the ALU design. In Section IV, a comparative analysis of the proposed designs with their contemporaries and a variability analysis using Monte Carlo (MC) simulations are performed. Finally, Section V concludes with the key findings of the preceding sections, insights on fabrication challenges, and the scope for future work.

#### **II. BACKGROUND AND RELATED WORK**

#### A. MTJ BASICS

Spintronics is an amalgamation of magnetism and electronics, exploiting the innate magnetic properties of an electron in spin devices such as MTJ for computational nanoelectronics. MTJ is a multilayered nanopillar comprising a non-magnetic dielectric layer sandwiched by two ferromagnetic layers, namely the reference layer (RL) and the free layer (FL).

A very thin MgO layer is employed as the dielectric layer to ensure high tunnel magnetoresistance (TMR) [29], a quantum mechanical phenomenon, the primary physics underlying the working of MTJ. The magnetic orientation of RL is fixed, while the relative magnetic orientation of FL with respect to RL decides the resistance of the device, making MTJ a programmable resistor due to the spin-dependent tunneling of electrons. For our application, Rap, the high resistance state of MTJ, when FL is anti-parallel to RL is used to store logic high or "1", while the low resistance state, Rp, when FL is parallel to RL is used to denote logic low or "0". This work utilizes p-MTJ (perpendicular magnetic tunnel junction) instead of i-MTJ (in-plane magnetic tunnel junction) for the low power dissipation, high thermal stability, low current density, and scalability traits of p-MTJ over i-MTJ [30].

The writing of data values into an MTJ by switching the state of the device from Rp to Rap and vice versa can be achieved by various switching mechanisms that run the gamut from Field Induced Magnetic Switching (FIMS) [31] and Thermally Assisted Switching (TAS) [32] to Spin Orbit Torque (SOT) based switching [33] and Voltage Controlled Magnetic Anisotropy (VCMA) [34]. However, STT emerged as the optimal choice for our application given its simplicity and commercial maturity, while the other switching techniques suffer from limitations. For instance, FIMS and TAS suffer from high power dissipation and scalability issues [8], SOT-MRAM is very sensitive to process variations, difficult to scale, and possesses lower memory density compared to STT [33], and VCMA involves complex voltage control for write operations [35]. Moreover, these techniques are yet to mature commercially, which is required for a magnetic processor associated with MRAM. The nominal values of the vital MTJ parameters considered for this work [8] are summarized in Table 1. The write circuit from [36] is adopted for our design, as shown in Fig.5(C). The four transistors (P1, P2, N1, N2) form an H-bridge structure (Fig.5(C)) that provides a bidirectional current (denoted by red and blue arrows in Fig.5(C)) for switching the MTJ state from P to AP and vice versa using the STT effect.

TABLE 1. MTJ parameters used for simulation [8].

| Parameters | Description              | Nominal values          |
|------------|--------------------------|-------------------------|
| a, b       | Surface axes             | a = b = 32 nm           |
| RA         | Resistance-area product  | $5\ \Omega\cdot\mu m^2$ |
| Rp         | Parallel resistance      | 6.21 kΩ                 |
| TMR(0)     | TMR ratio with zero bias | 200%                    |
| $t_{sl}$   | Free layer thickness     | 1.3 nm                  |
| $t_{ox}$   | Barrier thickness        | 0.85 nm                 |
| $\phi$     | Energy barrier height    | 0.4 eV                  |
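As a quick consistency check on the Table 1 values, the parallel resistance can be recovered from the resistance-area product and the junction cross-section, and the anti-parallel resistance from the zero-bias TMR ratio. The sketch below assumes an elliptical cross-section of area πab/4 and the usual definition TMR = (Rap − Rp)/Rp. Neither assumption is stated explicitly in the text, the bias dependence of TMR is ignored, and the function names are illustrative only.

```python
import math


def parallel_resistance(ra_ohm_um2: float, a_nm: float, b_nm: float) -> float:
    """Rp in ohms from the RA product (assumes an elliptical cross-section)."""
    area_um2 = math.pi / 4.0 * (a_nm * 1e-3) * (b_nm * 1e-3)  # nm -> um
    return ra_ohm_um2 / area_um2


def antiparallel_resistance(rp_ohm: float, tmr: float) -> float:
    """Rap in ohms at zero bias (assumes TMR = (Rap - Rp) / Rp)."""
    return rp_ohm * (1.0 + tmr)


rp = parallel_resistance(ra_ohm_um2=5.0, a_nm=32.0, b_nm=32.0)
rap = antiparallel_resistance(rp, tmr=2.0)  # TMR(0) = 200 %
print(f"Rp  = {rp / 1e3:.2f} kOhm")
print(f"Rap = {rap / 1e3:.2f} kOhm")
```

Under these assumptions Rp evaluates to roughly 6.2 kΩ, in line with the value listed in Table 1, and Rap is three times Rp for TMR(0) = 200%.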

### B. MULTI-CONTEXT HYBRID CMOS/MTJ LIM ARCHITECTURE

Various smart architectures such as LIM [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], LNM [17], [19], [37], and In-Memory Computing (IMC)/Computation in Memory (CIM) [38], [39] have been proposed as solutions to the von-Neumann bottleneck. Advancements in 3-D stacked integrated circuit technology have made close positioning of logic and storage units possible, as done in LNM structures, alleviating interconnect delays and power and widening the memory bandwidth [40]. However, LNM cannot be deemed true IMC, since the computation and storage units still remain fundamentally distinct entities. IMC and LIM serve as enhanced versions of LNM, enabling in-situ computation while offering nonvolatility. The fine distinction between IMC and LIM is that in the former, the memory array structure (MRAM, DRAM, SRAM) is exploited to perform computation in addition to storage by utilizing the innate analog quantities of memory devices, like the variable resistance of MTJ, along with peripheral circuitry such as sense amplifiers and write circuits. In LIM, by contrast, the computational ability is embedded into memory cells by adding a plane of distributed logic units, thus modifying the memory entities to provide *in-situ* computation [8], [40]. Unlike IMC, LIM can be used at all levels of the memory hierarchy, not limited to main memory and cache, which aids in addressing big data challenges where datasets are too large to fit in main memory [41]. However, LIM inhibits the maximum usage of memory as a storage unit, as the additional transistors in the modified memory cells introduce power and area overheads during the inherent storage functionality [41]. Therefore, we have adopted a modified multi-context hybrid CMOS/MTJ LIM architecture [22] (Fig.1) that provides the flexibility to be used with IMC, as shown in Fig.1(B), according to the application, thus serving as a bridge between LIM and IMC architectures. The same structure as in Fig.1(A) can be realized using the array architecture in Fig.1(B) by configuring the bit lines, source lines, and word lines accordingly.
Multi-context or multiple-bit hybrid logic architecture (Fig.1(A)) has multiple non-volatile cells forming a configuration plane for fast switching between contexts [22], [42].

FIGURE 1. Demonstration of resemblance and flexibility to switch between multi-context hybrid CMOS/MTJ LIM architecture shown in (A) and In-memory computing using array architecture shown in (B). In (B), SL1 – SLN, which serve as inputs to the sense amplifier, and BL1 – BLN are the source lines and bit lines, respectively. They control the direction of write current (shown by red and blue arrows) and En1 – EnN act as word lines controlling MOS<sub>1</sub> 1-MOS<sub>1</sub> N & MOS<sub>2</sub> 1-MOS<sub>2</sub> N for selecting corresponding MTJs, similar to (A).

Such an architecture facilitates the simultaneous usage of memory cells for computation and storage, similar to [41], by effectively controlling the enable signals En1 to EnN (here, N is a natural number) corresponding to the NMOS switch block $(MOS_L 1 - MOS_L N \& MOS_R 1 - MOS_R N)$. Logic computation is carried out by making all enable signals En1 to EnN high along with CLK. The two memory operations, writing into an MTJ and reading the MTJ state, are performed by selecting only the enable signal corresponding to the desired MTJ when CLK is low and high, respectively. Also, the symmetric structure of the design (Fig.1) enhances the circuit performance by mitigating the impact of sneak currents [22], masking the functionality of the circuit layout post fabrication [43], and improving data security through the provision for retrieval of lost data from neighboring MTJ cells containing the same data in either true or complementary form [22]. The separated precharge sense amplifier proposed in [44] is employed as the read circuit (Figs.3,4,5) for its higher sensing reliability and lower power consumption, although it consumes more area than the precharge sense amplifier. Together with the write circuit and MTJ logic tree, it constitutes the three primary blocks of the LIM architecture [6].
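The selection of compute, read, and write operations through CLK and the enable signals En1 to EnN, as described for the multi-context architecture, can be summarized as a small decision function. This is only a behavioral sketch: the function name, the string labels, the `idle` fallback for combinations the text does not describe, and the restriction to at least two contexts are all assumptions made for illustration.

```python
from typing import Sequence


def operating_mode(clk: bool, enables: Sequence[bool]) -> str:
    """Classify the cell operation from CLK and En1..EnN (N >= 2 assumed)."""
    if len(enables) < 2:
        raise ValueError("sketch assumes at least two contexts")
    active = sum(bool(e) for e in enables)
    if clk and active == len(enables):
        return "compute"  # all contexts enabled while CLK is high
    if active == 1:
        # exactly one MTJ selected: CLK high reads it, CLK low writes it
        return "read" if clk else "write"
    return "idle"  # hypothetical label: combination not described in the text


print(operating_mode(True, [True, True, True, True]))
print(operating_mode(False, [False, True, False, False]))
print(operating_mode(True, [False, True, False, False]))
```

The three calls correspond to logic computation, writing into the second MTJ, and reading the second MTJ, respectively.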

#### C. RELATED WORK

This section offers a brief qualitative review of existing magnetic adders and ALU designs using hybrid CMOS/MTJ LIM architecture. In [8], [9], [10], [11], and [35], the adders are partially non-volatile and use MTJs as ancillary devices for storing only one or a few of the input operands (only B stored in MTJs [9], [10], [11], [35] and only $C_{in}$ stored in MTJs [8]), while CMOS transistors perform logic evaluation. This prohibits their usage during power loss without data backup systems, which incur additional power (especially static dissipation) and area overheads. Such designs are also prone to input scheduling issues, requiring precise synchronization of input signals and CLK during the evaluation phase for proper output [45]. Hence, recognizing the need for full nonvolatility, researchers in [12], [46], and [47] have reported fully non-volatile adders that employ magnetic flip-flops or additional MTJs to store all the input operands. In [46], magnetic flip-flops are integrated with a CMOS full adder circuit to provide nonvolatility to the inputs, but at the cost of increased area and power. In [47], the additional MTJs that are part of the self-terminated write circuit are used for storing inputs A and B; however, much like [12], which uses 2MTJ cells for storing each input (A, B), the logic computation is still done using CMOS transistors. Thus, they do not harness the full benefits of MTJ, and the power and area of such circuits are exacerbated when extending to multiple bits/inputs. These issues are addressed in [45], [48], and [49], where fully non-volatile full adders with MTJs as part of the logic tree performing computation are reported. However, the criss-cross arrangement and series connection of MTJs in these designs induce complexity in the writing process, as a single write circuit will not suffice to write different input values, thus increasing write power and area.
Furthermore, these designs possess the advantage of increased read margin due to higher resistances owing to the series connection of MTJs, but at the cost of increased read latency [50]. As for ALU designs following hybrid CMOS/MTJ LIM architecture, the circuits reported in [13] and [14] are partially non-volatile. The discussion of ALU designs using other nanotechnologies such as ASL and domain wall is outside the scope of this work.

#### **III. PROPOSED MAGNETIC ARITHMETIC LOGIC UNIT**

In this section, a novel fully non-volatile magnetic arithmetic logic unit, coupling the prowess of the multi-context hybrid CMOS/MTJ LIM architecture and the majority logic paradigm, is presented. Such an ALU finds use in applications like the data feature extraction process in edge computing/detection for big data applications needing low data precision, as demonstrated in Fig.2 [51]. This enables the main processor to operate on the already processed/cooked data from NVRMALU for high-precision computations. In addition to easing the von-Neumann bottleneck (reduced data transfer between main processor and memory), it introduces parallel processing, where the main processor and the proposed NVRMALU can function simultaneously, increasing the performance. This is made possible by the nonvolatile trait paired with the simultaneous usage of NVRMALU as memory and for computations, allowing the main processor to perform a different operation on the same data operands stored in NVRMALU. Furthermore, the nonvolatility of the presented ALU reduces standby power consumption by making it possible to switch off the passive/unused portions of the nonvolatile memory.



FIGURE 2. Block diagram depicting the data transfer between the main processor and NVRMALU, a part of the non-volatile memory unit (NV memory unit). The yellow arrows denote the conventional communication between processor and NV memory unit. In contrast, the green arrow illustrates the transfer of cooked data by NVRMALU from the NV memory unit to the main processor for further computations [51].

Since binary addition operation is the central pillar of an ALU, the design of a 1-bit NVRMFA is presented first in the following subsection. Dynamic reconfigurability, the ability of a circuit to realize multiple logic functionalities during run-time using an external trigger such as control signals [52], is embedded into the NVRMFA by virtue of design. This reconfigurability trait is harnessed to extend the proposed NVRMFA (Figs.4,5) to NVRMALU by the addition of peripheral circuitry like Multiplexers, XNOR gates, AND gates, and inverters. Furthermore, the reconfigurability feature of the proposed design qualifies it as a polymorphic gate, which can be applied to realize hardware security primitives such as the prevention of IC tampering during fabrication, IC fingerprinting, and IC watermarking [43].

#### A. 1-BIT MAGNETIC ADDER DESIGN

In this section, a novel 1-bit two-input fully non-volatile magnetic full adder is presented to address the shortcomings of its predecessors in terms of power, power delay product (PDP), and non-volatility. The circuit works on dynamic logic, where the sense amplifier acts as a current comparator performing logic computation based on the difference in currents in the left and right branches (Figs.4,5), caused by their corresponding resistance difference, $|Rl - Rr|$ ($\Delta$). Here, Rl and Rr denote the net resistance of the MTJ logic tree in the left and right branches, respectively. The adder circuit comprises a Carry sub-circuit ($C_{ckt}$) and a Sum sub-circuit ($S_{ckt}$), corresponding to the two primary outputs of an adder. Similar to [50] and [53], the Carry function is realized using a three-input majority logic (M3), while a five-input majority logic (M5) underlies the working of the $S_{ckt}$. The governing Boolean equations for the carry and sum functions are given by,

$$Carry\,(C_{out}) = AB + AC_{in} + BC_{in} = M3(A, B, C_{in}). \tag{1}$$

$$\begin{aligned}
Sum &= ABC_{in} + \overline{A}\,\overline{B}\,C_{in} + \overline{A}\,B\,\overline{C_{in}} + A\,\overline{B}\,\overline{C_{in}} \\
&= A \oplus B \oplus C_{in} = ABC_{in} + \overline{M3(A, B, C_{in})}\cdot(A + B + C_{in}) \\
&= ABC_{in} + \overline{C_{out}}\cdot(A + B + C_{in}) \\
&= M5(A, B, C_{in}, \overline{C_{out}}, \overline{C_{out}}).
\end{aligned} \tag{2}$$

Here in Eq.1 and Eq.2, A, B, and  $C_{in}$  are input operands, whose weights are one, and  $C_{out}$  is the output of M3 with weight = 1.
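The majority-logic forms of Eq.1 and Eq.2 can be checked exhaustively against ordinary binary addition. The following is a minimal sketch of that check at the Boolean level only; the helper names `majority` and `full_adder_majority` are illustrative and do not correspond to any circuit block in the paper.

```python
from itertools import product


def majority(*inputs: int) -> int:
    """Return 1 when more than half of an odd number of binary inputs are 1."""
    if len(inputs) % 2 == 0:
        raise ValueError("majority needs an odd number of inputs")
    return int(sum(inputs) > len(inputs) // 2)


def full_adder_majority(a: int, b: int, c_in: int) -> tuple[int, int]:
    """Carry via M3 (Eq.1) and Sum via M5 (Eq.2); returns (c_out, sum)."""
    c_out = majority(a, b, c_in)
    c_out_bar = 1 - c_out
    total = majority(a, b, c_in, c_out_bar, c_out_bar)
    return c_out, total


# Exhaustive check over all eight input combinations.
for a, b, c_in in product((0, 1), repeat=3):
    c_out, s = full_adder_majority(a, b, c_in)
    assert 2 * c_out + s == a + b + c_in  # matches binary addition
```

The loop completes without raising, confirming that feeding two copies of the complemented carry into M5 reproduces the XOR-based sum for every input combination.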

Figs.3,4,5 show the  $S_{ckt}$ -design 1,  $C_{ckt}$  and  $S_{ckt}$ -design 2, respectively, where the conventional multi-context hybrid CMOS/MTJ structure [42] is modified to achieve adder functionality along with full non-volatility. Here, unlike in [8], [9], [10], and [11] 1-bit of data is stored/represented by a single MTJ, instead of a pair of complementary MTJs. All the MTJs present in the  $C_{ckt}$  and  $S_{ckt}$  are reconfigurable whose state can dynamically be changed using the write circuit. This is in contrast to the conventional multi-context architecture [42], where either the right branch or left branch is composed of fixed MTJs serving as reference resistance. Such a modification is done to achieve a higher  $\Delta$  and read margin by storing all input operands using MTJs in the left branch and their complements in the right branch instead of fixing a reference resistance in the right branch yielding less  $\Delta$ . Also, it is difficult to realize a reference resistance value using the parallel arrangement of MTJs, while preserving the symmetric structure. Additionally, the use of reconfigurable MTJs in both branches aids in the extension of the adder to comparator as shown in subsequent sections, albeit at the cost of increased write energy. Two designs of  $S_{ckt}$  are proposed, one following Eq.2 and the other is the optimized version of the former, referred to as design 1 and design 2, respectively.

#### 1) DESIGN 1

This design follows the conventional 5-input majority logic, where five MTJs (Fig.3) namely A, B, C<sub>in</sub>, C<sub>out</sub> bar 1, and  $C_{out}$  bar 2 in the left branch, and their complements in the right branch are used to store the operands of Eq.2, given the symmetric structure. Thus, for "N" 1-bit inputs, 2N MTJs, N on each branch are required. Here, all MTJs are of TMR=200% and the truth table of sum logic along with corresponding resistances is summarized in Fig.3(B). However, this design bears the disadvantage of increased write energy and area overhead which has been optimized to enhance performance in design-2.



**FIGURE 3.** (A) Circuit diagram of design 1 for S<sub>ckt</sub> with 5 MTJs on each branch with same TMR=200%. Here, MTJs A,B,C<sub>in</sub> constitute Req, shown with a resistor in (C). (B) Truth table corresponding to design 1 with Rl,Rr values for all input combinations along with  $\Delta$ . (C) Demonstration of replacement of 5 MTJ structure with 4 MTJ structure with TMR=600% for MTJ C<sub>out</sub>bar and corresponding equation.



FIGURE 4. Carry sub-circuit of the proposed 1-bit NVRMFA.

### 2) DESIGN 2

The original Eq.2 is modified by replacing its two $C_{out}$bar terms with a single $C_{out}$bar term of weight = 2, as follows:

$$Sum = M5(A, B, C_{in}, \overline{C_{out}}, \overline{C_{out}}) = M4(A, B, C_{in}, \overline{C_{out}}^{*}). \tag{3}$$

Here, the conventional norm of having an odd number of inputs for majority logic is amended by introducing the concept of weight/voter (denoted by \*) [24] status to  $C_{out}$  bar. The modified Eq. 3 is reflected in the hardware by tweaking the resistance value of the MTJ storing  $C_{out}/C_{out}$  bar terms to a higher value compared to other input operand MTJs.
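The weighted form in Eq.3 can be read as a threshold gate in which $C_{out}$bar contributes two votes and each operand contributes one. The sketch below checks, at the Boolean level, that this four-input weighted majority still yields the sum bit. The function names are illustrative, and the weight vector (1, 1, 1, 2) is simply the interpretation of the voter notation given in the text.

```python
from itertools import product


def weighted_majority(inputs, weights) -> int:
    """Threshold gate: 1 when the weighted vote exceeds half the total weight."""
    total_weight = sum(weights)
    vote = sum(w * x for x, w in zip(inputs, weights))
    return int(2 * vote > total_weight)


def sum_m4(a: int, b: int, c_in: int) -> int:
    """Sum via Eq.3: C_out bar carries weight 2, the operands weight 1."""
    c_out = int(a + b + c_in >= 2)  # M3 carry from Eq.1
    return weighted_majority((a, b, c_in, 1 - c_out), (1, 1, 1, 2))


# Exhaustive check against the XOR definition of the sum bit.
for a, b, c_in in product((0, 1), repeat=3):
    assert sum_m4(a, b, c_in) == a ^ b ^ c_in
```

With a total weight of five, the gate fires when the weighted vote reaches three, which is the same threshold as the five-input majority in Eq.2.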

TABLE 2. Comparison between the two proposed Sum subcircuit designs for NVRMFA.

| Parameters            | Design-1 | Design-2 |
|-----------------------|----------|----------|
| Read Energy (fJ/bit)  | 2.53     | 2.21     |
| Read Latency (ps)     | 246.48   | 233.84   |
| Write Energy (fJ/bit) | 1341.25  | 1118.65  |
| Write Latency (ns)    | 2.45     | 2.11     |
| Sense Margin (mV)     | 1.07     | 3.35     |

From Fig.3(C), it can be understood that the resistance of the MTJ replacing MTJs $C_{out}$1 ($C_{out}$bar1) and $C_{out}$2 ($C_{out}$bar2) should be chosen such that the equation Req || $C_{out}$bar1 || $C_{out}$bar2 = Req || $C_{out}$bar holds. Variation of geometric parameters such as MTJ area, as done in [54], will not yield a sufficient change in MTJ resistance to satisfy Eq.3. Thus, similar to [55] and [56], TMR, the cardinal physics behind MTJ, is chosen as the parameter to vary in order to achieve the desired higher resistance for MTJ $C_{out}$bar (or MTJ $C_{out}$). A heuristic approach was adopted in determining the required TMR ratio by calculating Rl and Rr for different combinations. Finally, a TMR ratio of 600% was fixed for MTJ $C_{out}$bar (MTJ $C_{out}$), as supported by the MTJ model library [57] and advancements in MTJ fabrication with MgO as a barrier layer. For instance, S. Ikeda et al. have experimentally demonstrated a TMR of 604% at room temperature for a pseudo spin valve MTJ structure in [58]. Fig.5 shows the optimized $S_{ckt}$ with 4 MTJs on both branches, reducing the energy, latency, and area of the circuit. Table 2 summarizes the performance improvement achieved in design 2 compared to design 1. From Table 2, it can be observed that write energy and latency are reduced for design 2 because it uses fewer MTJs than design 1. For design 2,



**FIGURE 5.** (A) Design 2 of S<sub>ckt</sub>, the optimized version that constitutes Sum sub-circuit of the proposed 1-bit NVRMFA. The carry output and its complement from C<sub>ckt</sub> (Fig.4) is given as inputs to write circuits along with primary inputs (A,B,C<sub>in</sub>) and their complements. (B) Buffer circuitry to shift CLK by 2.5ns to obtain CLK<sub>sum</sub>. (C) STT write circuit schematic [36]. Here, Vdda =1.25 V and widths of transistors P1,P2,N1,N2 are taken as 1 $\mu$ m for reliable writing. Input data controls the direction of current, while the Enable signal turns the circuit ON or OFF using the control circuit.

TABLE 3. Truth table for the proposed 1-bit NVRMFA along with corresponding resistances in left and right branches.

| $C_{in}$ | B | A | $C_{out}$bar | Carry sub-circuit Rl | Carry sub-circuit Rr | Carry sub-circuit $\Delta$ | Sum sub-circuit Rl | Sum sub-circuit Rr | Sum sub-circuit $\Delta$ | Output 1 | Output 2 | Output 3 |
|----------|---|---|--------------|----------------------|----------------------|----------------------------|--------------------|--------------------|--------------------------|----------|----------|----------|
| $0^a$    | 0 | 0 | 1            | 0.333Rp              | Rp                   | 0.66Rp                     | 0.318Rp            | 0.5Rp              | 0.182Rp                  | 0        | 0        | 1        |
| $0^a$    | 0 | 1 | 1            | 0.428Rp              | 0.6Rp                | 0.171Rp                    | 0.403Rp            | 0.375Rp            | 0.028Rp                  | 0        | 1        | 1        |
| $0^a$    | 1 | 0 | 1            | 0.428Rp              | 0.6Rp                | 0.171Rp                    | 0.403Rp            | 0.375Rp            | 0.028Rp                  | 0        | 1        | 1        |
| $0^a$    | 1 | 1 | 0            | 0.6Rp                | 0.428Rp              | 0.171Rp                    | 0.375Rp            | 0.403Rp            | 0.028Rp                  | 1        | 0        | 0        |
| $1^b$    | 0 | 0 | 1            | 0.428Rp              | 0.6Rp                | 0.171Rp                    | 0.403Rp            | 0.375Rp            | 0.028Rp                  | 0        | 1        | 1        |
| $1^b$    | 0 | 1 | 0            | 0.6Rp                | 0.428Rp              | 0.171Rp                    | 0.375Rp            | 0.403Rp            | 0.028Rp                  | 1        | 0        | 0        |
| $1^b$    | 1 | 0 | 0            | 0.6Rp                | 0.428Rp              | 0.171Rp                    | 0.375Rp            | 0.403Rp            | 0.028Rp                  | 1        | 0        | 0        |
| $1^b$    | 1 | 1 | 0            | Rp                   | 0.333Rp              | 0.66Rp                     | 0.5Rp              | 0.318Rp            | 0.182Rp                  | 1        | 1        | 0        |

Note: \* For the Carry operation, $Rl = R_{MTJA} || R_{MTJB} || R_{MTJC_{in}}$ and $Rr = R_{MTJAbar} || R_{MTJBbar} || R_{MTJC_{in}bar}$; for the Sum logic, $Rl = R_{MTJA} || R_{MTJB} || R_{MTJC_{in}} || R_{MTJC_{out}bar}$ and $Rr = R_{MTJAbar} || R_{MTJBbar} || R_{MTJC_{in}bar} || R_{MTJC_{out}}$.

due to increased Rl and Rr (Table 3) compared to design 1 (Fig.3(B)), a dip in sensing current ($I_{sensing}$) reduces the read energy. Also, the read latency is improved owing to a predominantly smaller $\Delta$ value for design 2 (3) compared to design 1 (Fig.3(B)). Therefore, design 2 is adopted for the NVRMFA design, and in further discussions, design 2 is referred to as $S_{ckt}$. From a fabrication standpoint, however, it should be noted that design 2 places MTJs with different TMR and device parameters in close proximity, which requires specialized, expensive manufacturing techniques with lower yield [59], [60].

#### 3) OPERATION OF THE CIRCUIT

In $C_{ckt}$ (Fig.4), the clock signal, CLK, is the primary control signal, typical of dynamic logic, controlling the write and read operations of the circuit. From Eq.2, it can be inferred that $C_{ckt}$ has to be evaluated first, as its outputs ($C_{out}$/$C_{out}$bar) serve as the inputs for $S_{ckt}$. This mandates two separate clock pulses with an offset/delay of 2.5ns between them, to ensure that the sum logic is evaluated after the carry computation. Fig.5 shows the $S_{ckt}$, where CLK is shifted by 2.5ns using buffer circuitry (Fig.5(B)) consisting of cascaded inverters to obtain $CLK_{sum}$, the primary control signal for $S_{ckt}$. The adder circuit operates in two phases, the precharge phase (P) and the evaluation phase (E). In the precharge phase, when clocks CLK and $CLK_{sum}$ are low or "0", the input values are written into the corresponding MTJs using the write circuit. During the evaluation phase, when CLK and $CLK_{sum}$ are high or "1", the logic computation of Carry and Sum takes place based on the speed of current discharge induced by the difference in Rl and Rr.

The left and right nodes of  $C_{ckt}$  and  $S_{ckt}$  connected to the outputs Carry & Carrybar (Fig.4) and Sum & Sumbar



**FIGURE 6.** Transient waveform for NVRMFA simulated for 100ns, covering all 8 input combinations (1 cycle=12.5ns). Orange and blue rectangular strips denote the evaluation phases of  $C_{ckt}$  and  $S_{ckt}$ , respectively. Example case, A,B, $C_{in} = 0,1,0$  is highlighted by red box.

(Fig.5), respectively, are pulled up to Vdd = 1.2 V during the precharge phase. When CLK=CLK<sub>sum</sub>=0, transistors M1, M4, M9, and M10 are switched ON in $C_{ckt}$ and $S_{ckt}$, pulling nodes X and Y to Vdd. The access transistors, M13 to M18 in $C_{ckt}$ and M13 to M20 in $S_{ckt}$, facilitate the writing of individual input values and their complements into the corresponding MTJs (A, B, $C_{in}$, $C_{out}$bar, Abar, Bbar, $C_{in}$bar, and $C_{out}$) using the enable signals Ena, Enb, Enc and Encout (Figs.4,5). The widths of the access transistors are increased to 1 $\mu$m and 1.5 $\mu$m in C<sub>ckt</sub> and S<sub>ckt</sub>, respectively, to ensure sufficient write current for a reliable write operation. The enable signals Ena to Encout are spaced in the time domain so that the input values are written into the MTJs sequentially. Two levels of parallelism are observed in the write operation: one within each sub-circuit ($C_{ckt}$ & $S_{ckt}$), where the values are written into MTJs simultaneously in both branches using the same set of enable signals, and the other being the simultaneous writing of primary inputs (A, B, $C_{in}$) and their complements in both sub-circuits $C_{ckt}$ and $S_{ckt}$. A duration of 2.5ns, the same as the evaluation time, is required for writing each MTJ, and the evaluation phase of $C_{ckt}$ always overlaps with the last write operation of $S_{ckt}$ for a given cycle. Accordingly, a precharge duration of 10ns is considered with a 2.5ns evaluation phase, yielding an operating frequency of 80MHz for the adder. The remaining transistors are in the cut-off region during the precharge phase. In the evaluation phase, transistors M1, M4, M9, and M10 in $C_{ckt}$ and $S_{ckt}$ are switched OFF, cutting the connection between Vdd and the circuit. Transistors M11 and M12, the isolating transistors (which isolate the sense amplifier from




FIGURE 7. A top view/block diagram of the proposed NVRMALU cell.

 TABLE 4. Opcodes combinations for various arithmetic and logical functionalities of the proposed NVRMALU.

| Operation type | S0 | S1 | Output 1 (CL) | Output 2 (SL/SR) | Output 3 (CR) |
|----------------|----|----|---------------|------------------|---------------|
| Addition       | 0  | 0  | Carry         | Sum              | Carrybar      |
| Subtraction    | 1  | 0  | Borrow        | Difference       | Borrowbar     |
| Logical, Cin=0 | 0  | 1  | AND           | XOR              | NAND          |
| Logical, Cin=1 | 0  | 1  | OR            | XNOR             | NOR           |
| Comparator     | 1  | 1  | A>B           | A=B              | A<B           |

the MTJ logic tree in the precharge phase) are turned ON, enabling the current discharge through the MTJ logic tree. All the access transistors, M13 to M18 in $C_{ckt}$ and M13 to M20 in $S_{ckt}$, are switched ON by making all enable signals high, and the write operation is halted by disabling the write enable signal. There are two discharge paths (Figs.4,5) constituted by two sets of discharge transistors, (M7 & M8 and M19 & M20 in $C_{ckt}$) and (M7 & M8 and M21 & M22 in $S_{ckt}$). The voltage difference between nodes X and Y, $|V_X - V_Y|$, a reflection of the difference in Rl and Rr, is amplified by inverters I1 and I2, whose outputs serve as gate voltages for M7 & M8, controlling the logic evaluation. Here, unlike in conventional LIM architecture [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], there are two tailing discharge transistors, M19 & M20 ($C_{ckt}$) and M21 & M22 ($S_{ckt}$), instead of just one transistor, in order to write distinct input values (excluding complements of inputs) into MTJs, as in the case of the 2-bit comparator design. When two or more of the inputs are "0" or low, Rl<Rr, and vice-versa when two or more of the inputs are "1" or high. Table 3 presents the truth table for the adder along with the Rl and Rr values for all input combinations. For Rl<Rr, the left node discharges faster than the right node and eventually reaches zero while pulling up the right node to Vdd owing to the cross-coupled structure of the sense amplifier, and vice-versa when Rl>Rr. This evaluates the carry logic (left node) and its complement (right node) in $C_{ckt}$ (Fig.4) and the sum (left node) along with its complement (right node) in $S_{ckt}$ (Fig.5). The functionality verification is done by performing transient analysis, which is presented in Fig.6. The working of the circuit can be better understood by considering the example case A, B, $C_{in} = 0,1,0$ highlighted in Fig.6. From Table 3, it can be observed that Rl = 0.428Rp and Rr = 0.6Rp for $C_{ckt}$ (Rl<Rr), while for $S_{ckt}$, Rl = 0.403Rp and Rr = 0.375Rp (Rl>Rr), thus giving carry=0 and sum=1, in line with the foregoing discussion.
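The resistance comparison described above can be sketched numerically. The following is a minimal illustration, not the authors' simulation flow: it works in units of Rp and assumes Rap = 3Rp for the input MTJs and Rap = 7Rp for the $C_{out}$/$C_{out}$bar MTJs. These two ratios are assumptions chosen only because they reproduce the branch resistances listed in Table 3; the function names are hypothetical.

```python
from itertools import product

RP = 1.0          # parallel-state resistance, used as the unit
RAP_INPUT = 3.0   # ASSUMED anti-parallel resistance of the input MTJs (in Rp)
RAP_COUT = 7.0    # ASSUMED anti-parallel resistance of the Cout/Coutbar MTJs (in Rp)


def mtj(bit, rap=RAP_INPUT):
    """Resistance of an MTJ storing `bit`: 0 -> Rp, 1 -> Rap."""
    return rap if bit else RP


def parallel(*resistances):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


def sense(rl, rr):
    """Left-node value after evaluation: the lower-resistance branch discharges to 0."""
    return 0 if rl < rr else 1


def full_adder(a, b, cin):
    """Return (carry, sum, (Rl_carry, Rr_carry, Rl_sum, Rr_sum))."""
    # Carry sub-circuit: inputs in the left branch, complements in the right.
    rl_c = parallel(mtj(a), mtj(b), mtj(cin))
    rr_c = parallel(mtj(1 - a), mtj(1 - b), mtj(1 - cin))
    carry = sense(rl_c, rr_c)
    # Sum sub-circuit: a fourth MTJ holds Coutbar (left) and Cout (right).
    rl_s = parallel(mtj(a), mtj(b), mtj(cin), mtj(1 - carry, RAP_COUT))
    rr_s = parallel(mtj(1 - a), mtj(1 - b), mtj(1 - cin), mtj(carry, RAP_COUT))
    total = sense(rl_s, rr_s)
    return carry, total, (rl_c, rr_c, rl_s, rr_s)


if __name__ == "__main__":
    for a, b, cin in product((0, 1), repeat=3):
        carry, total, res = full_adder(a, b, cin)
        print(a, b, cin, carry, total, [round(r, 3) for r in res])
```

Under these assumptions, the example case A, B, $C_{in}$ = 0,1,0 gives the same ordering of Rl and Rr in both sub-circuits as quoted in the text.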



**FIGURE 8.** Circuit diagram of the proposed NVRMALU. (A) Carry sub-circuit (C<sub>ckt</sub>) with muxes (Mux<sub>carryL</sub> & Mux<sub>carryR</sub>) to configure ALU for different functionalities by providing corresponding inputs to write circuit. (B) Sum sub-circuit (S<sub>ckt</sub>) of NVRMALU with C<sub>ckt</sub> outputs, primary inputs for arithmetic and logic operations and equality detector output, serving as inputs to muxes (Mux<sub>sumL</sub> & Mux<sub>sumR</sub>). Here, S<sub>ckt</sub> performs equality detection operation. (C) Peripheral circuit aiding S<sub>ckt</sub> in realising equality detection operation.

TABLE 5. Truth table for the Subtraction functionality of the proposed NVRMALU, along with corresponding resistances in left and right branches.

| A | Abar | B | $B_{in}$ | $B_{out}$bar | Borrow sub-circuit Rl | Borrow sub-circuit Rr | Difference sub-circuit Rl | Difference sub-circuit Rr | Output 1 | Output 2 | Output 3 |
|---|------|---|----------|--------------|-----------------------|-----------------------|---------------------------|---------------------------|----------|----------|----------|
| 0 | 1    | 0 | 0        | 1            | 0.428Rp               | 0.6Rp                 | 0.403Rp                   | 0.375Rp                   | 0        | 0        | 1        |
| 0 | 1    | 0 | 1        | 0            | 0.6Rp                 | 0.428Rp               | 0.375Rp                   | 0.403Rp                   | 1        | 1        | 0        |
| 0 | 1    | 1 | 0        | 0            | 0.6Rp                 | 0.428Rp               | 0.375Rp                   | 0.403Rp                   | 1        | 1        | 0        |
| 0 | 1    | 1 | 1        | 0            | Rp                    | 0.333Rp               | 0.5Rp                     | 0.318Rp                   | 1        | 0        | 0        |
| 1 | 0    | 0 | 0        | 1            | 0.333Rp               | Rp                    | 0.318Rp                   | 0.5Rp                     | 0        | 1        | 1        |
| 1 | 0    | 0 | 1        | 1            | 0.428Rp               | 0.6Rp                 | 0.403Rp                   | 0.375Rp                   | 0        | 0        | 1        |
| 1 | 0    | 1 | 0        | 1            | 0.428Rp               | 0.6Rp                 | 0.403Rp                   | 0.375Rp                   | 0        | 0        | 1        |
| 1 | 0    | 1 | 1        | 0            | 0.6Rp                 | 0.428Rp               | 0.375Rp                   | 0.403Rp                   | 1        | 1        | 0        |

#### **B. 1-BIT MAGNETIC SUBTRACTOR**

Fig.7 is the top-level block diagram of the proposed 1-bit NVRMALU cell, built using NVRMFA, including the input-output interconnects, while Fig.8 shows its internal schematic. Each sub-circuit (Fig.8(A),(B)) of the NVRMALU comprises two 4:1 multiplexers (Mux<sub>carryL</sub> & Mux<sub>carryR</sub> in C<sub>ckt</sub> and Mux<sub>sumL</sub> & Mux<sub>sumR</sub> in S<sub>ckt</sub>) corresponding to two write circuits, one for each branch. The four functionalities of the NVRMALU, namely addition, subtraction, logical operations (6 basic logic gates), and 2-bit digital comparison, and their corresponding input data values are mapped to the write circuits configuring the MTJs (CL1 - CL3 & CR1 - CR3 in Fig.8(A) and SL1 - SL4 & SR1 - SR4 in Fig.8(B)) using Mux<sub>carryL</sub> & Mux<sub>carryR</sub> and Mux<sub>sumL</sub> & Mux<sub>sumR</sub>, respectively. The 4:1 Mux control signals, S0 and S1, are set according to the opcodes summarized in Table 4 to configure the NVRMALU to perform different functionalities. The output nodes, CL and CR (Fig.8(A)), are mapped to Output 1 and Output 3 of the NVRMALU cell (Fig.7), respectively. The left and right nodes of $S_{ckt}$ (Fig.8(B)), SL and SR, serve as inputs to a 2:1 Mux controlled by the abovementioned control signal, S0, whose output is connected to Output 2 of the NVRMALU cell (Fig.7).
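The opcode mapping of Table 4 can be expressed as a small lookup, which may help when reading the mode changes in the transient figures. This is only an illustrative sketch; the names `decode`, `OPCODES` and `LOGICAL_OUTPUTS` are hypothetical and not part of the proposed design.

```python
# Table 4 as a lookup keyed by the control signals (S0, S1).
# Each entry gives the operation type and the meaning of Outputs 1, 2 and 3.
OPCODES = {
    (0, 0): ("Addition", ("Carry", "Sum", "Carrybar")),
    (1, 0): ("Subtraction", ("Borrow", "Difference", "Borrowbar")),
    (1, 1): ("Comparator", ("A>B", "A=B", "A<B")),
}

# For the logical mode (S0=0, S1=1), Cin selects between the two gate sets.
LOGICAL_OUTPUTS = {
    0: ("AND", "XOR", "NAND"),
    1: ("OR", "XNOR", "NOR"),
}


def decode(s0, s1, cin=0):
    """Return (operation type, (Output 1, Output 2, Output 3)) per Table 4."""
    if (s0, s1) == (0, 1):
        return "Logical", LOGICAL_OUTPUTS[cin]
    return OPCODES[(s0, s1)]


if __name__ == "__main__":
    for s0, s1 in ((0, 0), (1, 0), (0, 1), (1, 1)):
        print(s0, s1, decode(s0, s1))
```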

The reconfigurability trait of the MTJs and the resemblance between the addition and subtraction functions are exploited to achieve subtractor functionality. Borrow (Fig.8(A)) and difference (Fig.8(B)) are evaluated using the Boolean expressions given by

$$Borrow(B_{out}) = M3(Abar, B, B_{in})$$

$$Difference = \overline{M4(Abar, B, B_{in}, B_{out}bar^*)}. \tag{4}$$

The MTJs CL1, CL2, and CL3 (Fig.8(A)) are configured as Abar, B, and B<sub>in</sub>, respectively, while their complements are written into MTJs CR1 to CR3 (Fig.8(A)) by setting the control signals as S0=1, S1=0 for Mux<sub>carryL</sub> and Mux<sub>carryR</sub> (Table 4). Similarly, MTJs SL1 – SL4 and SR1 – SR4 are configured to store Abar, B, B<sub>in</sub>, and B<sub>out</sub>, the borrow output, and their complements, using Mux<sub>sumL</sub> and Mux<sub>sumR</sub>, respectively. The operation of the circuit is very similar to that of NVRMFA, where in the precharge

TABLE 6. Truth table for the proposed 2-bit comparator along with corresponding resistances in left and right branches.

| A0 | A1 | B0 | B1 | Carry sub-circuit Rl | Carry sub-circuit Rr | Sum sub-circuit Rl | Sum sub-circuit Rr | Output 1 | Output 2 | Output 3 |
|----|----|----|----|----------------------|----------------------|--------------------|--------------------|----------|----------|----------|
| 0  | 0  | 0  | 0  | 0.333Rp              | 0.333Rp              | 0.3Rp              | 0.318Rp            | 0*       | 1        | 0*       |
| 0  | 0  | 1  | 0  | 0.333Rp              | 0.428Rp              | 0.403Rp            | 0.3Rp              | 0        | 0        | 1        |
| 0  | 0  | 0  | 1  | 0.333Rp              | 0.6Rp                | 0.403Rp            | 0.3Rp              | 0        | 0        | 1        |
| 0  | 0  | 1  | 1  | 0.333Rp              | Rp                   | 0.403Rp            | 0.375Rp            | 0        | 0        | 1        |
| 1  | 0  | 0  | 0  | 0.428Rp              | 0.333Rp              | 0.552Rp            | 0.25Rp             | 1        | 0        | 0        |
| 1  | 0  | 1  | 0  | 0.428Rp              | 0.428Rp              | 0.375Rp            | 0.403Rp            | 0*       | 1        | 0*       |
| 1  | 0  | 0  | 1  | 0.428Rp              | 0.6Rp                | 0.552Rp            | 0.3Rp              | 0        | 0        | 1        |
| 1  | 0  | 1  | 1  | 0.428Rp              | Rp                   | 0.552Rp            | 0.375Rp            | 0        | 0        | 1        |
| 0  | 1  | 0  | 0  | 0.6Rp                | 0.333Rp              | 0.552Rp            | 0.25Rp             | 1        | 0        | 0        |
| 0  | 1  | 1  | 0  | 0.6Rp                | 0.428Rp              | 0.552Rp            | 0.3Rp              | 1        | 0        | 0        |
| 0  | 1  | 0  | 1  | 0.6Rp                | 0.6Rp                | 0.375Rp            | 0.403Rp            | 0*       | 1        | 0*       |
| 0  | 1  | 1  | 1  | 0.6Rp                | Rp                   | 0.552Rp            | 0.375Rp            | 0        | 0        | 1        |
| 1  | 1  | 0  | 0  | Rp                   | 0.333Rp              | 0.875Rp            | 0.25Rp             | 1        | 0        | 0        |
| 1  | 1  | 1  | 0  | Rp                   | 0.428Rp              | 0.875Rp            | 0.3Rp              | 1        | 0        | 0        |
| 1  | 1  | 0  | 1  | Rp                   | 0.6Rp                | 0.875Rp            | 0.3Rp              | 1        | 0        | 0        |
| 1  | 1  | 1  | 1  | Rp                   | Rp                   | 0.5Rp              | 0.552Rp            | 0*       | 1        | 0*       |

Note: \* The output nodes do not discharge completely, but enough to be considered as logic low or "0", owing to the differential structure of the sense amplifier.



**FIGURE 9.** Transient response for subtraction operation with an example case,  $A,B,B_{in} = 1,1,0$  illustrated.

phase, the input operands are written into the MTJs while the output nodes are pulled up to Vdd. During the evaluation phase, the borrow and difference values are computed based on Rl and Rr, constituted by the different MTJ resistances arising from the various combinations of inputs written into the MTJs, as summarized in Table 5. The subtractor transient waveform is given in Fig.9, where an example case, A, B, $B_{in} = 1,1,0$, is elucidated. Furthermore, the majority logic primitives can achieve other multi-input Boolean functions outside the framework of the conventional functionalities discussed above. For instance, the IMPLY logic can be implemented by setting $B_{in}=1$ in subtractor mode ($M3(Abar, B, 1) = A \rightarrow B$), without any additional gates/circuitry.
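The subtractor expressions of Eq.4 and the IMPLY special case can be checked exhaustively with a short script. This sketch assumes that the starred operand of M4 carries a weight of 2, so that M4 behaves as a five-vote majority; that reading is inferred from the description of the sum sub-circuit as a 5-input majority and is an assumption here. The function names are hypothetical.

```python
from itertools import product


def m3(a, b, c):
    """3-input majority."""
    return int(a + b + c >= 2)


def m4(a, b, c, d_star):
    """Majority over five votes; ASSUMES the starred operand counts twice."""
    return int(a + b + c + 2 * d_star >= 3)


def subtract(a, b, b_in):
    """Borrow and difference following Eq.4."""
    abar = 1 - a
    borrow = m3(abar, b, b_in)
    difference = 1 - m4(abar, b, b_in, 1 - borrow)
    return borrow, difference


def imply(a, b):
    """IMPLY obtained from the borrow function with B_in fixed to 1."""
    return m3(1 - a, b, 1)


if __name__ == "__main__":
    for a, b, b_in in product((0, 1), repeat=3):
        borrow, difference = subtract(a, b, b_in)
        # A - B - B_in must equal difference - 2 * borrow.
        print(a, b, b_in, borrow, difference, a - b - b_in == difference - 2 * borrow)
```

With the example case A, B, $B_{in}$ = 1,1,0, the sketch yields borrow = 0 and difference = 0, matching the corresponding row of Table 5.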

#### C. BASIC LOGIC GATES

The six basic logic functions (AND, OR, XOR, NAND, NOR, XNOR) are inherently encapsulated within the adder functionality, as the carry function, when expressed in sum-of-products form, is essentially a combination of AND and OR functions, while the sum function stems from the XOR operation. Thus, the full adder function can be modified to realize 2-input logic gates by configuring $C_{in}$ as a control signal that switches the circuit from AND/NAND/XOR functionality to OR/NOR/XNOR functionality (Table 4), as given by Eq.5.

$$M3(A, B, 0) = A.B, \quad M3(A, B, 1) = A + B$$

$$M4(A, B, 0, \overline{M3(A, B, 0)^*}) = A \oplus B. \tag{5}$$

The simultaneous production of true and complementary outputs in $C_{ckt}$ and $S_{ckt}$, owing to the cross-coupled structure and differential nature of the sense amplifier, eliminates the need for additional inverters to produce the complements of the AND, OR, and XOR logic. When $C_{in}=0$, the carry output (CL) (Fig.8(A)) gives the AND logic (Output 1), while CR gives the NAND logic (Output 3), and in $S_{ckt}$ (Fig.8(B)) SL gives the XOR logic and SR gives the XNOR logic (SL or SR is selected as Output 2 according to the S0 signal using the 2:1 Mux in Fig.7). Similarly, when $C_{in}=1$, CL in $C_{ckt}$ gives the OR logic (Output 1) and its complement, NOR (CR), is available at Output 3, while SL in S<sub>ckt</sub> gives XNOR and SR gives XOR (Output 2 is selected between SL and SR using the S0 signal). Fig.10 shows the transient response for all six logic gates along with an example case, A,B = 0,1, highlighted. Additionally, the 5-input majority logic underlying $S_{ckt}$ can be harnessed to achieve 3-input logic operations such as AND/NAND and OR/NOR by configuring MTJ $C_{out}$bar as 0 and 1, respectively, according to Eq.6.

$$M4(A, B, C, 0^*) = A.B.C$$

$$M4(A, B, C, 1^*) = A + B + C. \tag{6}$$
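Eqs. 5 and 6 can be verified by enumeration. The sketch below again assumes that the starred operand of M4 counts twice (a five-vote majority), which is an interpretation rather than something stated explicitly in this section; `gates`, `and3` and `or3` are hypothetical helper names.

```python
from itertools import product


def m3(a, b, c):
    """3-input majority."""
    return int(a + b + c >= 2)


def m4(a, b, c, d_star):
    """Majority over five votes; ASSUMES the starred operand counts twice."""
    return int(a + b + c + 2 * d_star >= 3)


def gates(a, b, c_in):
    """Return (CL, SL, CR) for the 2-input logic mode, with C_in as mode signal."""
    cl = m3(a, b, c_in)            # AND when c_in = 0, OR when c_in = 1
    sl = m4(a, b, c_in, 1 - cl)    # XOR when c_in = 0, XNOR when c_in = 1
    return cl, sl, 1 - cl          # CR is the complement of CL (NAND / NOR)


def and3(a, b, c):
    """3-input AND per Eq.6, with the starred MTJ fixed to 0."""
    return m4(a, b, c, 0)


def or3(a, b, c):
    """3-input OR per Eq.6, with the starred MTJ fixed to 1."""
    return m4(a, b, c, 1)


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(a, b, "Cin=0:", gates(a, b, 0), "Cin=1:", gates(a, b, 1))
```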



**FIGURE 10.** Transient response for six basic logic gates functionality of NVRMALU. Green box shows how  $C_{in}$  is used as mode signal for switching between AND/NAND and OR/NOR modes. XOR & XNOR outputs are produced in both modes. Example cases,  $A_{,B} = 0,1$  for AND/NAND mode and  $A_{,B} = 1,0$  for OR/NOR mode are highlighted.

#### D. 2-BIT MAGNETIC COMPARATOR

Digital comparators are vital datapath elements, instrumental in implementing numerous algorithms of the controller within a processor, for applications ranging from general-purpose computer architecture operations such as memory addressing logic, queue buffers, and test circuits to specific applications such as digital image processing, pattern matching, arithmetic sorting, data compression, and digital neural networks [61]. Hence, the inclusion of a comparator in the NVRMALU is a crucial next step in realizing a complete magnetic processor. Digital comparators are made up of two components: the magnitude comparator (Mag-Comp), which compares the magnitudes of its inputs to assess which is greater or lesser, and the equality detector (Eq-Detec), which checks whether the inputs are equal in magnitude. Since the concept of significant bits (MSB, LSB) is meaningful only for inputs with two or more bits, a 2-bit 2-input digital comparator (Fig.8) is designed and demonstrated in this work. Also, the functionality of a 1-bit comparator reduces to that of a 1-bit zero detector, which could be implemented using a magnetic flip-flop [5]. The intrinsic comparator nature of the sense amplifier is utilized to perform the magnitude comparison using $C_{ckt}$ (Fig.8(A)), instead of the relatively computationally expensive subtraction operation used in conventional magnitude comparators [62]. However, the differential nature of the sense amplifier inhibits the realization of equality detectors without peripheral circuits such as XNOR gates. When the inputs A and B are equal, the left and right nodes (Fig.8) partially discharge to 350 mV



**FIGURE 11.** Transient response for 2-bit comparator functionality of NVRMALU considering all 16 input combinations of A0,A1 and B0,B1. Blue box illustrates how for the equality case, A0,A1 = B0,B1, CL and CR nodes partially discharge to 350 mV. Red boxes highlight the example cases, (A0,A1 = B0,B1 = 1,0), (A0,A1 = 0,1 & B0,B1 = 1,1) and (A0,A1 = 1,1 & B0,B1 = 1,0).

and are susceptible to arbitrarily discharging based on the instantaneous current profile in the left and right branches. Thus, a 2-bit equality detector circuit using  $S_{ckt}$  (Fig.8(B)) and conventional AND and XNOR gates (Fig.8(C)) is designed to realize the complete comparator operation.

In Fig.8, MTJs CL1, CL2 and CL3 store the values A0, A1, and A1, respectively, using Mux<sub>carryL</sub>, where A0 & A1 are the LSB and MSB of input A, respectively. Here, A1 is written twice (MTJs CL2 & CL3) to establish a weight of 2 for the MSB. Similarly, bits B0 and B1 of input B are written into the right-branch MTJs, CR1 - CR3 in $C_{ckt}$, using Mux<sub>carryR</sub> (Fig.8(A) used as Mag-Comp). When A > B, Rl > Rr, thus CL=1, CR=0 (Table 6), and vice-versa for A<B, following the same working as the NVRMFA discussed above. The truth table of the two-bit two-input comparator is given in Table 6, which contains the resistances of each branch for all input combinations. Although the circuit in Fig.8(C) evaluates the inputs for equality, it is still volatile, and thus its output and complement are stored in MTJs SR4 and SL4 using Mux<sub>sumR</sub> and Mux<sub>sumL</sub>, respectively. Having stored the output of the equality detection operation, the design is made fully non-volatile by storing the inputs A0, A1 in MTJs SL1, SL2 and B0, B1 in MTJs SR1, SR2, using Mux<sub>sumL</sub> and Mux<sub>sumR</sub>, respectively (Fig.8(B) used as Eq-Detec). As shown in Table 6, SR outputs "1" or high only when A=B, and to ensure this, MTJs SL3 and SR3 are fixed as 1/Rap and 0/Rp, respectively. The transient response is presented in Fig.11, where example cases for both the A=B and $A \neq B$ conditions are highlighted. Additionally, the



FIGURE 12. Demonstration of nonvolatility and dynamic reconfigurability (shown using the green box, illustrating the change in S0,S1 according to Table 4) of the proposed NVRMALU circuit. Power loss scenario has been simulated from 37.5ns to 45ns and from 70ns to 82.5ns. Data retention process during power loss and evaluation post power resumption has been elucidated using red square box. The black arrow in output signals highlights how evaluated outputs before power loss is recovered upon power resumption. Simulation performed for 100ns, considering example cases for all functionalities of NVRMALU.



FIGURE 13. Block diagram of 4-bit extension of NVRMALU for all but comparator functionality. CLK and CLK<sub>sum</sub> are shifted by 2.5ns for every stage using buffer circuits as shown.

nonvolatility trait of the proposed NVRMALU has been demonstrated in Fig.12 for all functionalities. Furthermore, the dynamic reconfigurability of the proposed design, i.e. its ability to switch between the multiple functionalities of the NVRMALU at run-time using the S0 and S1 signals, has been elucidated in Fig.12.
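The weighted-MSB comparison used by the 2-bit comparator can be sketched with the same parallel-resistance model as the adder. As before, the values Rap = 3Rp for the data MTJs and Rap = 7Rp for the fourth MTJ pair (SL4/SR4) are assumptions chosen only because they reproduce the entries of Table 6; the helper names are hypothetical, and the equality flag is computed in software in place of the peripheral XNOR/AND circuit of Fig.8(C).

```python
import math
from itertools import product

RP = 1.0
RAP = 3.0       # ASSUMED anti-parallel resistance of data MTJs (in Rp)
RAP_EQ = 7.0    # ASSUMED anti-parallel resistance of SL4/SR4 (in Rp)


def mtj(bit, rap=RAP):
    """Resistance of an MTJ storing `bit`: 0 -> Rp, 1 -> Rap."""
    return rap if bit else RP


def parallel(*resistances):
    return 1.0 / sum(1.0 / r for r in resistances)


def mag_comp(a0, a1, b0, b1):
    """Branch resistances of the magnitude comparator; the MSB is written twice."""
    rl = parallel(mtj(a0), mtj(a1), mtj(a1))
    rr = parallel(mtj(b0), mtj(b1), mtj(b1))
    return rl, rr


def eq_detec(a0, a1, b0, b1):
    """Branch resistances of the equality detector and the resulting SR output."""
    eq = int(a0 == b0 and a1 == b1)   # stands in for the XNOR/AND peripheral circuit
    rl = parallel(mtj(a0), mtj(a1), mtj(1), mtj(1 - eq, RAP_EQ))  # SL3 fixed to 1
    rr = parallel(mtj(b0), mtj(b1), mtj(0), mtj(eq, RAP_EQ))      # SR3 fixed to 0
    sr = int(rl < rr)                 # SR goes high when the left branch discharges first
    return rl, rr, sr


def compare(a0, a1, b0, b1):
    """Return (A>B, A=B, A<B) as in Outputs 1 to 3 of Table 6."""
    rl, rr = mag_comp(a0, a1, b0, b1)
    _, _, sr = eq_detec(a0, a1, b0, b1)
    if math.isclose(rl, rr):
        return 0, sr, 0
    return int(rl > rr), sr, int(rl < rr)


if __name__ == "__main__":
    for a0, a1, b0, b1 in product((0, 1), repeat=4):
        print(a0, a1, b0, b1, compare(a0, a1, b0, b1))
```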

#### E. FOUR-BIT EXTENSION OF ALU

Demonstration of the proposed NVRMALU for multiple bits (Fig.13) has been carried out by designing a four-bit NVRMALU (excluding comparator functionality) by cascading



**FIGURE 14.** Transient response for 4-bit NVRMALU for the addition, subtraction and basic logic gates functionalities. Colored bands, A,B,C,D,E highlight the evaluation phases of  $C_{ckt}$  and  $S_{ckt}$  for cell<sub>i</sub>(i=0,1,2,3, respectively). The outputs 1&3 of cell<sub>i</sub>(i=1,2,3) overlaps/available with output 2 of cell<sub>i-1</sub>. Example case, A=13, B=10 taken for 4-bit addition and subtraction operations. Example case, A=6, B=12 is considered for AND/NAND/OR/NOR/XOR/XNOR logic operations.

1-bit NVRMALU cells in a ripple-carry fashion. The four-bit extension for addition, subtraction, and logic gates is shown in Fig.13. The four-bit extension of the comparator is not presented, as it requires additional circuitry and a special architecture [61] other than the ripple-carry structure, which is chosen here for its simplicity. In Fig.13, four 1-bit NVRMALU cells (Fig.7) are connected sequentially, following the rule that an N-bit computation requires N 1-bit NVRMALU cells. The input pairs A0, B0 to A3, B3 are given to the corresponding cells, while the $C_{out}$ of the previous cell serves as input to the next cell. A certain degree of parallelism is also achieved by incorporating a pipelined architecture with four separate sets of clock pulses, enabling the simultaneous operation of all four cells. The CLK and $CLK_{sum}$ signals for cell<sub>i</sub> (i=1,2,3) are each shifted by 2.5ns relative to those of cell<sub>i-1</sub> using the buffer circuit (Fig.13) to obtain a four-stage pipeline structure. Such a setup ensures that all four carry and sum output bits are available after 20ns instead of at the end of the 4th cycle of CLK and CLK<sub>sum</sub> as in the case of [8], which would be 50ns (1 cycle = 12.5 ns). This raises the effective operating frequency to 250 % of the non-pipelined value (50ns/20ns = 2.5 times), albeit at the cost of area compared to a conventional non-pipelined structure. Since all inputs and intermediate values such as $C_{outi}$ (i=1,2,3) are saved in MTJs, the circuit is completely non-volatile and suitable for instant ON-OFF applications and power loss scenarios, while also eliminating the need for registers to store intermediate values [8]. The control

| Parameters                 | DPTLCMOS FA [8], [63] | MFA [8]              | MFA [9]              | MFA [12]             | MFA [10]             | MFA [11]            | Proposed NVRMFA     |
|----------------------------|-----------------------|----------------------|----------------------|----------------------|----------------------|---------------------|---------------------|
| Static Power (nW)*         | 5.39                  | 29.15                | 18.27                | 17.73                | 19.68                | 25.98               | 6.14                |
| Dynamic Power ($\mu$W)*    | 1.01                  | 16.45                | 11.05                | 21.31                | 32.77                | 4.43                | 0.693               |
| Delay (ps)*                | 40.21<sup>c,d</sup>   | 125.75<sup>c</sup>   | 75.49<sup>c</sup>    | 112.74<sup>c</sup>   | 98.29<sup>c</sup>    | 81.35<sup>c</sup>   | 175.90<sup>c</sup>  |
|                            |                       | 145.92<sup>d</sup>   | 109.03<sup>d</sup>   | 121.30<sup>d</sup>   | 100.35<sup>d</sup>   | 93.98<sup>d</sup>   | 233.84<sup>d</sup>  |
| PDP (aJ)<sup>a</sup>       | 40.82<sup>c,d</sup>   | 2072.25<sup>c</sup>  | 835.54<sup>c</sup>   | 2404.48<sup>c</sup>  | 3222.89<sup>c</sup>  | 362.49<sup>c</sup>  | 122.97<sup>c</sup>  |
|                            |                       | 2404.63<sup>d</sup>  | 1206.77<sup>d</sup>  | 2587.05<sup>d</sup>  | 3290.44<sup>d</sup>  | 418.77<sup>d</sup>  | 163.48<sup>d</sup>  |
| # Transistors<sup>b</sup>  | 46                    | 30                   | 30                   | 78                   | 26                   | 23                  | 50                  |
| # MTJs<sup>b</sup>         | 0                     | 4                    | 4                    | 8                    | 4                    | 4                   | 14                  |
| Fully Non-Volatile         | No                    | No                   | No                   | Yes                  | No                   | No                  | Yes                 |

TABLE 7. Performance Comparison amongst 1-bit DPTLCMOS FA, existing 1-bit MFA Designs and the Proposed 1-bit NVRMFA @ 1.2V and 80MHz using 45nm CMOS technology node.

Note: \* Values correspond only to the read/sensing operation; the write operation is excluded. a: Total power, i.e. the sum of static and dynamic powers for the read operation, is used for the PDP calculation. For the carry and sum operations, PDP is calculated using the corresponding delay with the total power of the NVRMFA. b: Write circuit and other peripheral circuitry are excluded. c: Values corresponding to C<sub>ckt</sub>. d: Values corresponding to S<sub>ckt</sub>.
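The PDP rule stated in note (a) can be illustrated with a short calculation. The sketch below simply multiplies total power (static plus dynamic) by the delay and converts to attojoules; the function name `pdp_aj` is hypothetical, and the numerical inputs are taken from Table 7.

```python
def pdp_aj(static_nw, dynamic_uw, delay_ps):
    """Power-delay product in attojoules from static power (nW),
    dynamic power (uW) and delay (ps)."""
    total_power_w = static_nw * 1e-9 + dynamic_uw * 1e-6
    energy_j = total_power_w * delay_ps * 1e-12
    return energy_j / 1e-18


if __name__ == "__main__":
    # Proposed NVRMFA, carry and sum sub-circuits (Table 7).
    print(round(pdp_aj(6.14, 0.693, 175.90), 2))
    print(round(pdp_aj(6.14, 0.693, 233.84), 2))
    # DPTLCMOS FA (Table 7).
    print(round(pdp_aj(5.39, 1.01, 40.21), 2))
```

The results agree with the tabulated PDP values to within rounding of the last reported digit.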

TABLE 8. Performance metrics comparison for various functionalities between the proposed NVRMALU and DPTLCMOS ALU [63].

| Operation type | NVRMALU Power (nW) | NVRMALU Delay (ps) | NVRMALU PDP (aJ)  | DPTLCMOS ALU Power ($\mu$W) | DPTLCMOS ALU Delay (ps) | DPTLCMOS ALU PDP (aJ) |
|----------------|--------------------|--------------------|-------------------|-----------------------------|-------------------------|-----------------------|
| Addition       | 898.26             | Carry=177.41       | Carry=159.36      | 5.55                        | Carry=60.76             | Carry=337.21          |
|                |                    | Sum=244.47         | Sum=219.59        |                             | Sum=60.83               | Sum=337.60            |
| Subtraction    | 887.55             | Borrow=175.43      | Borrow=155.70     | 5.92                        | Borrow=60.98            | Borrow=361.00         |
|                |                    | Difference=238.47  | Difference=211.65 |                             | Difference=61.04        | Difference=361.35     |
| Logical, Cin=0 | 882.57             | AND/NAND=176.89    | AND/NAND=156.11   | 5.17                        | AND/NAND=60.75          | AND/NAND=314.07       |
|                |                    | XOR=238.32         | XOR=210.33        |                             | XOR=60.76               | XOR=314.12            |
| Logical, Cin=1 | 890.42             | OR/NOR=177.13      | OR/NOR=157.72     | 5.21                        | OR/NOR=60.81            | OR/NOR=316.82         |
|                |                    | XNOR=239.68        | XNOR=213.41       |                             | XNOR=60.83              | XNOR=316.92           |
| Comparator     | 828.58             | Mag-Comp=172.76    | Mag-Comp=143.14   | NA                          | NA                      | NA                    |
|                |                    | Eq-Detec=214.18    | Eq-Detec=177.46   |                             |                         |                       |

Note: The DPTLCMOS ALU is discussed in the appendix section.

signals S0 and S1 are common to all cells, configuring the functionality/mode of each cell according to the opcode in Table 4.
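The timing figures quoted for the 1-bit cell and the four-stage pipeline follow from simple arithmetic, sketched below. The sketch assumes that the sum output of cell 0 becomes valid at the end of its first 12.5ns cycle and that each later cell lags the previous one by exactly the 2.5ns clock shift; this alignment is an assumption made for illustration and is not a statement about the simulated waveforms. All names are hypothetical.

```python
WRITE_SLOT_NS = 2.5      # time to write one MTJ
WRITES_PER_CYCLE = 4     # sequential write slots in the precharge phase
EVAL_NS = 2.5            # evaluation phase duration
STAGE_SHIFT_NS = 2.5     # clock shift between consecutive cells
N_CELLS = 4

precharge_ns = WRITE_SLOT_NS * WRITES_PER_CYCLE   # 10 ns
cycle_ns = precharge_ns + EVAL_NS                 # 12.5 ns
frequency_mhz = 1e3 / cycle_ns                    # 80 MHz


def sum_ready_ns(cell):
    """Time at which the sum output of `cell` is valid (ASSUMED alignment)."""
    return cycle_ns + cell * STAGE_SHIFT_NS


pipelined_latency_ns = sum_ready_ns(N_CELLS - 1)    # last cell's sum output
non_pipelined_latency_ns = N_CELLS * cycle_ns       # one full cycle per bit
speedup = non_pipelined_latency_ns / pipelined_latency_ns

if __name__ == "__main__":
    print(cycle_ns, frequency_mhz)
    print(pipelined_latency_ns, non_pipelined_latency_ns, speedup)
```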

#### **IV. RESULTS AND ANALYSIS**

This section assesses the robustness of the proposed circuit design and its resilience to the inevitable process variations introduced during fabrication. To that end, a comparative study between the NVRMFA, around which the working of the NVRMALU revolves, and its contemporaries is performed using performance indicators such as power, delay, and PDP. Following that, a variability analysis of the adder circuit is discussed to evaluate the effects of process variations in the MTJ on the read and write performances. All the simulations were performed using the CMOS 45nm GPDK technology library and PMA\_MTJ\_6.1.5\_Beta5, a Verilog-A-based model library [57], on the Cadence Spectre simulation platform.

#### A. QUALITATIVE ANALYSIS OF PERFORMANCE METRICS

Table 7 presents a qualitative comparison of performance metrics amongst the proposed NVRMFA, various existing hybrid CMOS/MTJ LIM adders, and the double pass transistor clocked CMOS (DPTLCMOS) adder obtained from the standard cell library of the STMicroelectronics design kit [63], [64]. To provide a fair comparison, all adder circuits in Table 7 were simulated using the CMOS 45nm GPDK technology and PMA\_MTJ library with an 80 MHz operating frequency (precharge phase=10ns, evaluation phase=2.5ns) and a supply voltage of 1.2V, thus ensuring the same simulation setup and environmental conditions as for the proposed NVRMFA. Furthermore, the choice of magnetic adders for comparison is restricted to adders based on the STT switching mechanism to enhance the accuracy of the comparison.

From Table 7 it can be seen that a significant power reduction has been achieved for the proposed NVRMFA compared to other MFAs [8], [9], [10], [11], [12]. The static power dissipation is high for MFAs [8], [9], [10], [11], [12], owing to the use of CMOS transistors for computation, which causes significant leakage losses during steady state. The dynamic power for the proposed NVRMFA is also greatly reduced compared to other MFAs [8], [9], [10], [11], [12], because in the proposed design only transistors associated with the CLK and CLK<sub>sum</sub> signals contribute towards dynamic power consumption during the evaluation phase, unlike MFAs [8], [9], [10], [11], [12], where transistors corresponding to inputs also switch when CLK transitions from high to low and vice-versa, in addition to other transistors associated with CLK. Such a large power reduction for NVRMFA is also possible due to the very low current, in the order of nA, passing through the MTJ logic tree owing to the high resistance (in the order of M$\Omega$) set by the write circuits connected in parallel, whose enable line is made low during the evaluation phase,

| Type of  | Min           | Mean          | Max                | <b>STD</b> $(\sigma)$ | $\sigma$ /mean |
|----------|---------------|---------------|--------------------|-----------------------|----------------|
| Adder    | (μ <b>W</b> ) | (μ <b>W</b> ) | $(\mu \mathbf{W})$ | (μ <b>W</b> )         |                |
| MFA [8]  | 8.88          | 16.52         | 26.57              | 3.15                  | 0.19           |
| MFA [9]  | 10.58         | 12.2          | 14.03              | 0.56                  | 0.04           |
| MFA [10] | 21.89         | 32.64         | 41.94              | 3.59                  | 0.11           |
| MFA [11] | 2.72          | 5.25          | 12.02              | 1.86                  | 0.35           |
| MFA [12] | 17.11         | 20.68         | 23.38              | 1.07                  | 0.05           |
| Proposed | 0.577         | 0.655         | 0.746              | 0.029                 | 0.04           |
| NVRMFA   |               |               |                    |                       |                |

 
 TABLE 9. Comparison of variation in Power amongst various MFAs and proposed NVRMFA under  $3\sigma$  process variation using 1000 MC runs.

contrast to other MFAs [8], [9], [10], [11], [12]. Additionally, it is observed that most of the works [8], [9], [10], [11], [12] have not disclosed the integration of the write circuit with the main circuit and its performance indexes like write power and delay. Also given the primary idea of LIM is to use the stored values for computation, the write operation and associated values are excluded in the Table 7, where only values corresponding to the logic evaluation phase are reported. Furthermore, the power metrics discussed in Table 7 correspond to the active mode of the adder circuit. It is to be noted that the standby power mode of the proposed NVRMFA is zero, as power can be completely switched OFF given the full nonvolatility, unlike in the case of partially non-volatile MFAs [8], [9], [10], [11], [12] incurring standby power losses. However, the carry and sum delays for NVRMFA are higher compared to that of MFAs [8], [9], [10], [11], [12], owing to the double tail discharge structure of the adopted sense amplifier (Figs. 4,5). Nevertheless, the marvelous reduction in overall power makes up for it as seen through PDP values in Table 7.

To benchmark the performance of the designed NVRMALU, the DPTLCMOS adder is extended into a DPTLCMOS ALU, as shown in the Appendix, whose performance is compared with that of NVRMALU and summarised in Table 8. The power corresponding to all functionalities of the NVRMALU (addition, subtraction, and logic gates) is found to be reduced by around sixfold, thus demonstrating superior performance. Therefore, the proposed NVRMFA and NVRMALU emerge as excellent instant ON-OFF digital systems for applications where the system is predominantly in standby mode [65], despite suffering from area overheads (high device count).
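The power-delay trade-off described for Table 7 can be made explicit with a short relation. This is only a sketch under the usual definition of PDP as total power multiplied by delay; the symbols $P$ and $t_d$ are introduced here for illustration and do not appear in the original tables.

$$\mathrm{PDP} = P \times t_d, \qquad \frac{\mathrm{PDP}_{NVRMFA}}{\mathrm{PDP}_{MFA}} = \frac{P_{NVRMFA}}{P_{MFA}} \times \frac{t_{d,NVRMFA}}{t_{d,MFA}}.$$

Under this definition, the PDP of NVRMFA remains below that of a given MFA as long as its delay penalty, $t_{d,NVRMFA}/t_{d,MFA}$, is smaller than the corresponding power reduction factor, $P_{MFA}/P_{NVRMFA}$, which is reported as roughly six to forty-seven across the compared MFAs.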

#### **B. VARIABILITY ANALYSIS**

Reliability analysis is imperative to understand the deviation in circuit performance under process variation during manufacturing and to identify remedial solutions for a successful design. To that end, a comparative analysis of the variability in power of the proposed NVRMFA and other MFAs is presented in Table 9. In addition, to evaluate the sensitivity of the designed NVRMALU to variation in MTJ parameters, Fig. 15 has been plotted to identify the dominant MTJ parameter controlling the circuit performance.



**FIGURE 15.** (A) Plot for variation in sense margin (SM for $C_{ckt}$ & $S_{ckt}$) for variation in $t_{ox}$ in steps of 0.05 nm in presence of 3% Gaussian deviation in $t_{sl}$ and TMR, obtained using 1000 MC runs. (B) Plot for variation in sense margin (SM for $C_{ckt}$ & $S_{ckt}$) for variation in $t_{sl}$ in steps of 0.1 nm in presence of 3% Gaussian deviation in $t_{ox}$ and TMR, obtained using 1000 MC runs. (C) Plot showing variation in Rp for stand-alone MTJ against $t_{ox}$ varied in steps of 0.025 nm, while keeping other parameters constant. (D) Plot for variation in $I_{pulse}$ length for different $V_{MTJ}$ values against $t_{ox}$ varied in steps of 0.025 nm, while keeping other parameters constant. Here, the $I_{pulse}$ lengths for the P->AP and AP->P transitions are in close proximity, thus their average values are reported on the Y-axis.

In Table 9, 1000 MC runs have been performed by considering a $3\sigma$ variation in CMOS device parameters such as channel length and threshold voltage, and a 3% ($\sigma = 1\%$, as mentioned in [66]) Gaussian variation in parameters of the MTJ, namely the MgO barrier thickness ($t_{ox}$), the free layer thickness ($t_{sl}$), and TMR. Since the MTJ parameters $t_{ox}$, $t_{sl}$, and TMR are more susceptible to variations during fabrication [67], a 3% Gaussian variation is considered only in these parameters, as supported by the model, while other material parameters are kept constant. Additionally, the effects of Joule heating have been considered to understand the stochastic nature of MTJ switching. In Table 9, the variability coefficient ($\sigma$/mean) of the proposed NVRMFA (0.04) is markedly lower than that of MFAs [8], [10], and [11] and comparable to that of MFAs [9] and [12], suggesting better immunity of the proposed circuit towards process variations.
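The variability coefficient in Table 9 can be recomputed from the tabulated mean and standard deviation. The following is a minimal sketch that only transcribes the Table 9 values; the function and variable names are illustrative and not part of the original work.

```python
# Power statistics in microwatts transcribed from Table 9:
# (mean, standard deviation, reported sigma/mean)
TABLE_9 = {
    "MFA [8]": (16.52, 3.15, 0.19),
    "MFA [9]": (12.2, 0.56, 0.04),
    "MFA [10]": (32.64, 3.59, 0.11),
    "MFA [11]": (5.25, 1.86, 0.35),
    "MFA [12]": (20.68, 1.07, 0.05),
    "Proposed NVRMFA": (0.655, 0.029, 0.04),
}


def variability_coefficient(mean_uw, std_uw):
    """Return sigma/mean, the dimensionless spread of the power distribution."""
    return std_uw / mean_uw


# Recompute the coefficient for every adder from its mean and standard deviation.
computed = {
    name: variability_coefficient(mean, std)
    for name, (mean, std, _) in TABLE_9.items()
}

for name, (_, _, reported) in TABLE_9.items():
    print(f"{name:16s} computed={computed[name]:.3f} reported={reported:.2f}")
```

Because the tabulated mean and standard deviation are themselves rounded, the recomputed ratios can differ from the reported column in the last digit.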

An understanding of the various causes of failure in read and write operations, expressed as the read error rate (RER) and the write error rate (WER), respectively, is crucial in determining the reliability of the designed circuit. Since NVRMALU revolves around NVRMFA, a step-wise approach has been adopted to study the parameters controlling the RER and WER of NVRMFA, and the results are presented in Fig. 15. To illustrate the dominance of MTJ parameters in deciding circuit performance, only MTJ variations are considered in the following discussion, while CMOS parameters are kept constant. Because process variations during fabrication affect all the MTJ parameters, the deviation in a single MTJ parameter cannot be isolated for analysis; hence, all three MTJ parameters are varied simultaneously. However, the analysis is carried out with a focus on $t_{ox}$ and $t_{sl}$ in Fig. 15(A) and Fig. 15(B), respectively, to determine which parameter has the most detrimental effect. It is also to be noted that read disturbance error is eliminated because only the leakage current flows (Figs. 4 and 5), which is less than the critical switching current ($I_{co} = 38.2\,\mu A$) of the MTJ by a large margin.
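The simultaneous variation of the three MTJ parameters can be pictured as independent Gaussian draws with $\sigma = 1\%$ of the nominal value. The sketch below uses Python's standard library rather than the Spectre MC engine used in the paper, and the nominal $t_{ox}$ and $t_{sl}$ values are hypothetical placeholders; only TMR = 600% (quoted for $S_{ckt}$) and the 1000 runs come from the text.

```python
import random
import statistics

# Nominal values: t_ox and t_sl are illustrative placeholders (assumptions),
# TMR = 600 % is the value the text quotes for the sum sub-circuit.
NOMINAL = {"t_ox_nm": 0.85, "t_sl_nm": 1.3, "tmr_percent": 600.0}
SIGMA_REL = 0.01  # sigma = 1 % of nominal, so the 3-sigma spread is 3 %
N_RUNS = 1000     # same number of MC runs as reported in the paper


def sample_mtj(rng):
    """Draw one MTJ instance with all three parameters varied simultaneously."""
    return {key: rng.gauss(value, SIGMA_REL * value) for key, value in NOMINAL.items()}


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_mtj(rng) for _ in range(N_RUNS)]

for key, nominal in NOMINAL.items():
    values = [sample[key] for sample in samples]
    spread = statistics.stdev(values) / nominal
    print(f"{key}: mean={statistics.mean(values):.4f}, sigma/nominal={spread:.4f}")
```

Each sampled parameter set would then be passed to a circuit simulation; that step is outside the scope of this sketch.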

Sense margin (SM) is defined as the difference in node voltages, $|V_X-V_Y|$, which is amplified by the inverters I1 and I2 in Figs. 3, 4, 5, and 8, which control the discharge speed of the M7 and M8 transistors during logic computation. Thus, this voltage difference must be large enough to produce differential discharge rates in the M7 and M8 transistors, enabling reliable deterministic logic computation. SM is obtained as the product of the sensing current ($I_{sensing}$) and the resistive difference between the left and right branches ($\Delta = |R_l - R_r|$). SM can be calculated by simply determining the node voltages at X and Y for the carry and sum sub-circuits of NVRMALU during simulation and then evaluating their difference, which is found to be directly proportional to $\Delta$. It is a crucial parameter indicating the success and efficiency of the read operation. First, SM (given by Eq. 7), which controls the read performance [68], is plotted as a function of $t_{ox}$ in the presence of 3% Gaussian variation in $t_{sl}$ and TMR in Fig. 15(A). Similarly, in Fig. 15(B), SM is plotted as a function of $t_{sl}$ while simultaneously varying $t_{ox}$ and TMR by 3% following a Gaussian distribution. The relation among SM, $\Delta$, and Rp is given by Eq. 7, which, together with the Rp equation [68] given by Eq. 8, explains the growing trend of SM with $t_{ox}$ in Fig. 15(A).

$$|V_X - V_Y| = I_{sensing} \times \Delta = K \times Rp. \tag{7}$$

Here, K is a positive number from Table 3.

$$Rp = \frac{t_{ox}}{F \times A_F \times \sqrt{\phi}} \times e^{(1.025 \times t_{ox} \times \sqrt{\phi})}. \tag{8}$$

Here, F is the fitting parameter associated with the resistance-area product (RA) of the MTJ, $\phi$ is the energy barrier height, and $A_F$ is the cross-sectional area of the MTJ [68]. The steeper trend in SM for $S_{ckt}$ compared to that for $C_{ckt}$ is due to two reasons. The first is the higher resistance values caused by the use of TMR = 600% in $S_{ckt}$. Higher $I_{sensing}$, owing to wider access transistors (M13-M20 in Fig. 5), and smaller $\Delta$ (Table 3) constitute the second reason. Thus, Fig. 15(A) shows improvement in read margin through an increase in Rp values, given the symmetric nature of the circuit where $\Delta$ primarily controls the read decision process. This encourages the use of high-$t_{ox}$ MTJs with high Rp values, as in the case of VCMA-based MTJs [34], to tackle the issue of reduced $\Delta$ in the modified multi-context hybrid architecture with a parallel arrangement of MTJs. Furthermore, Fig. 15(B) shows the minimal effect of $t_{sl}$ on SM, as $t_{sl}$ is not directly related to Rp (Eq. 8), indicating the dominant nature of $t_{ox}$ compared to $t_{sl}$.
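The growth of SM with $t_{ox}$ implied by Eqs. 7 and 8 can be illustrated numerically. The sketch below is not the simulation flow used in the paper: the values of $\phi$, F, and $A_F$ are placeholder assumptions, and $t_{ox}$ is assumed to enter Eq. 8 in ångström with $\phi$ in eV. Reporting SM relative to the first sweep point removes the dependence on K, F, and $A_F$.

```python
import math

# All three defaults below are assumed placeholder values, not taken from the paper.
PHI_EV = 0.4        # energy barrier height phi (eV)
F_FIT = 332.2       # fitting parameter F
AREA_UM2 = 1.26e-3  # MTJ cross-sectional area A_F (um^2), about a 40 nm circular pillar


def rp(t_ox_nm, area=AREA_UM2, phi=PHI_EV, fit=F_FIT):
    """Parallel-state resistance from Eq. 8 (t_ox converted from nm to angstrom)."""
    t_ox = 10.0 * t_ox_nm
    root_phi = math.sqrt(phi)
    return t_ox / (fit * area * root_phi) * math.exp(1.025 * t_ox * root_phi)


def sense_margin(k, t_ox_nm, **kwargs):
    """Sense margin |V_X - V_Y| = K * Rp from Eq. 7; K depends on the circuit."""
    return k * rp(t_ox_nm, **kwargs)


T_OX_SWEEP_NM = [0.80, 0.85, 0.90, 0.95]  # 0.05 nm steps, as in Fig. 15(A)
reference = sense_margin(1.0, T_OX_SWEEP_NM[0])
relative_sm = [sense_margin(1.0, t) / reference for t in T_OX_SWEEP_NM]

for t, r in zip(T_OX_SWEEP_NM, relative_sm):
    print(f"t_ox = {t:.2f} nm -> SM relative to first point = {r:.2f}")
```

The exponential term dominates the sweep, which is consistent with the strong influence of $t_{ox}$ on SM discussed for Fig. 15(A).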

It was observed during MC simulations (Fig. 15(A), (B)) that RER increases with increasing $t_{ox}$ despite increasing SM. This issue can be traced back to the failure to write the desired values into the MTJs for proper logic computation, caused by the increasing $t_{ox}$ value and the consequent rise in Rp (Eq. 8), as shown in Fig. 15(C). Therefore, as the second step, to analyze the factors affecting write performance, Figs. 15(C) and (D) are plotted. The rise in Rp in Fig. 15(C) implies a decline in the current through the MTJ ($I_{MTJ}$) with increasing $t_{ox}$, leading to a reduced probability of deterministic switching of the MTJ state, as given by the stochastic switching equation in [69]. When $I_{MTJ}$ is less than $I_{co}$ (38.2 $\mu$A) or in its proximity, a sufficiently long switching current pulse ($I_{pulse}$) is required to completely switch the MTJ state. The length of $I_{pulse}$ increases with the Rp of the MTJ, as highlighted by Fig. 15(D), where the $I_{pulse}$ length for different values of the voltage applied across the MTJ ($V_{MTJ}$) is plotted against $t_{ox}$. Increasing $V_{MTJ}$ increases $I_{MTJ}$, which speeds up the switching process; hence the inverse relation between $V_{MTJ}$ and $I_{pulse}$ length in Fig. 15(D). In Fig. 15(D), for $V_{MTJ}$ = 500 mV, switching does not occur beyond $t_{ox}$ = 0.925 nm irrespective of the $I_{pulse}$ length, because Rp rises to an extent where $I_{MTJ} < I_{co}$ ($I_{MTJ} \approx 37.6\,\mu A$), as given by Fig. 15(C); this limitation is eliminated for higher $V_{MTJ}$ values. Another interesting trend in Fig. 15(D) is that for $V_{MTJ}$ = 750 mV, and particularly for $V_{MTJ}$ = 1 V, the $I_{pulse}$ length increases like a step function. This is due to different ranges of $I_{MTJ}$ with different threshold values, much like the different electron energy bands at the subatomic scale. For a given range of $I_{MTJ}$, where $I_{MTJ}$ is sufficiently higher than $I_{co}$, enough STT current is produced to keep the $I_{pulse}$ length constant despite varying Rp.
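The first write condition discussed above, $I_{MTJ} > I_{co}$, can be checked with a simple estimate. The sketch below approximates $I_{MTJ}$ by Ohm's law, which ignores the bias dependence of the MTJ resistance and the pulse-length condition; the function names are illustrative, and only $I_{co}$ = 38.2 $\mu$A, the 37.6 $\mu$A figure, and the three $V_{MTJ}$ values come from the text.

```python
I_CO = 38.2e-6  # critical switching current of the MTJ quoted in the text (A)


def write_current(v_mtj, r_mtj):
    """Approximate I_MTJ by Ohm's law; bias dependence of the resistance is ignored."""
    return v_mtj / r_mtj


def can_switch(v_mtj, r_mtj, i_co=I_CO):
    """First write condition from the text: I_MTJ must exceed I_co."""
    return write_current(v_mtj, r_mtj) > i_co


# Resistance at which a 500 mV write bias yields the 37.6 uA quoted in the text.
r_limit = 0.5 / 37.6e-6

for v in (0.5, 0.75, 1.0):
    i_ua = write_current(v, r_limit) * 1e6
    print(f"V_MTJ = {v:.2f} V -> I_MTJ = {i_ua:.1f} uA, switches: {can_switch(v, r_limit)}")
```

Under this simplification, the same resistance that blocks switching at 500 mV still allows it at 750 mV and 1 V, in line with the observation that the limitation disappears for higher $V_{MTJ}$ values.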

Therefore, it can be concluded that $I_{MTJ} > I_{co}$ and a sufficient $I_{pulse}$ length are the two crucial conditions for proper write operation with the STT mechanism, which in turn determines the success of the logic evaluation/read operation. Consequently, it was observed that the WER for $S_{ckt}$ is lower than that for $C_{ckt}$ due to its increased $I_{MTJ}$ (wider access transistors), and that the bit error rate (BER) increases as the WER increases. It was also inferred that the read operation is more prone to process variations in CMOS parameters, as the sense amplifier, the primary component of the read operation, is composed only of CMOS devices, while the write operation depends greatly on MTJ resistance variations. However, these failures in write and read operations are inevitable due to unavoidable process variations during fabrication at the device level. Furthermore, they are aggravated by device scaling to deep submicron technology nodes, where the restricted supply voltage reduces the write current and the switching probability. Thus, circuit-level error correction schemes are mandatory to tackle these issues, such as the dynamic current/charge boosting technique, which increases $I_{MTJ}$ according to process variations [70]; the adaptive write scheme, in which the $I_{pulse}$ length is modified according to the deviation in MTJ parameters [71]; and adaptive read schemes that dynamically change the reference resistance of the sense amplifier [72]. In addition, other circuit-level methods, such as amplification of $|V_X - V_Y|$ using cascaded inverters (I1, I2 in Figs. 4 and 5) to improve SM and increasing the widths of the access transistors (Figs. 4 and 5) and the write circuit (Fig. 5(C)), are adopted to improve the reliability of the circuit at the cost of area and power.

FIGURE 16. 1-bit double pass transistor clocked CMOS ALU based on DPTLCMOS full adder from [63].

## V. CONCLUSION AND SCOPE FOR FUTURE WORK

In this article, as an attempt to design a magnetic processor, a novel ultra-low-power magnetic arithmetic logic unit (NVRMALU) has been presented, whose advantages over its counterparts include full nonvolatility, dynamic reconfigurability, and low power, making it ideal for normally OFF and instant ON applications. The extension of NVRMALU to multi-bit computations has been demonstrated for all functionalities except the comparator; this extension, together with the inclusion of bit-wise functionalities such as logical and arithmetic shift operations for a complete magnetic ALU, constitutes the scope for future work. Also, from the variability analysis, it can be inferred that the resistance of the MTJ, a memristor, is naturally the most crucial parameter. Precise control during manufacturing of parameters such as $t_{ox}$ that determine the MTJ resistance, using advanced fabrication tools and techniques [73], and error correction schemes at the circuit level are mandatory for robust performance. Furthermore, it is to be noted that spin devices such as the MTJ are yet to evolve to the mature and sophisticated stage achieved by CMOS technology, especially in terms of fabrication and reliability. In particular, the fabrication of MTJs with different TMR values and characteristics by employing special techniques, such as localized rapid thermal annealing processes with different annealing temperatures and durations along with different MgO crystal oxidation conditions, remains to be explored [74], [75].

# APPENDIX

# **DOUBLE PASS TRANSISTOR CLOCKED CMOS ALU CIRCUIT**

Kindly refer to Fig. 16.

VOLUME 11, 2023

### REFERENCES

- [1] Z. Guo, J. Yin, Y. Bai, D. Zhu, K. Shi, G. Wang, K. Cao, and W. Zhao, "Spintronics for energy-efficient computing: An overview and outlook," *Proc. IEEE*, vol. 109, no. 8, pp. 1398–1417, Aug. 2021, doi: 10.1109/JPROC.2021.3084997.
- [2] Semiconductor. Accessed: Jul. 21, 2023. [Online]. Available: https://magazine.semiconductordigest.com/html5/reader/production/default.aspx?pubname=&edid=3b6652fd-175e-4694-9693-913fd1f8a6ac
- [3] W. Zhao and G. Prenat, Spintronics-Based Computing. Berlin, Germany: Springer, 2015.
- [4] V. K. Joshi, "Spintronics: A contemporary review of emerging electronics devices," *Eng. Sci. Technol., Int. J.*, vol. 19, no. 3, pp. 1503–1513, Sep. 2016, doi: 10.1016/j.jestch.2016.05.002.
- [5] W. Zhao, E. Belhaire, and C. Chappert, "Spin-MTJ based non-volatile flip-flop," in *Proc. 7th IEEE Conf. Nanotechnol. (IEEE NANO)*, Hong Kong, Aug. 2007, pp. 399–402, doi: 10.1109/NANO.2007.4601218.
- [6] P. Barla, D. Shet, V. K. Joshi, and S. Bhat, "Design and analysis of LIM hybrid MTJ/CMOS logic gates," in *Proc. 5th Int. Conf. Devices, Circuits Syst. (ICDCS)*, Coimbatore, India, Mar. 2020, pp. 41–45, doi: 10.1109/ICDCS48716.2020.243544.
- [7] E. Deng, G. Prenat, L. Anghel, and W. Zhao, "Non-volatile magnetic decoder based on MTJs," *Electron. Lett.*, vol. 52, no. 21, pp. 1774–1776, Oct. 2016, doi: 10.1049/el.2016.2450.
- [8] P. Barla, V. K. Joshi, and S. Bhat, "A novel low power and reduced transistor count magnetic arithmetic logic unit using hybrid STT-MTJ/CMOS circuit," *IEEE Access*, vol. 8, pp. 6876–6889, 2020, doi: 10.1109/ACCESS.2019.2963727.
- [9] P. Shukla, P. Kumar, and P. K. Misra, "An energy efficient, mismatch tolerant offset compensating hybrid MTJ/CMOS magnetic full adder," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 69, no. 11, pp. 4548–4552, Nov. 2022, doi: 10.1109/TCSII.2022.3190455.
- [10] E. Deng, Y. Zhang, J.-O. Klein, D. Ravelosona, C. Chappert, and W. Zhao, "Low power magnetic full-adder based on spin transfer torque MRAM," *IEEE Trans. Magn.*, vol. 49, no. 9, pp. 4982–4987, Sep. 2013, doi: 10.1109/TMAG.2013.2245911.
- [11] H. Thapliyal, F. Sharifi, and S. D. Kumar, "Energy-efficient design of hybrid MTJ/CMOS and MTJ/Nanoelectronics circuits," *IEEE Trans. Magn.*, vol. 54, no. 7, pp. 1–8, Jul. 2018, doi: 10.1109/TMAG.2018.2833431.
- [12] E. Deng, Y. Wang, Z. Wang, J.-O. Klein, B. Dieny, G. Prenat, and W. Zhao, "Robust magnetic full-adder with voltage sensing 2T/2MTJ cell," in *Proc. IEEE/ACM Int. Symp. Nanosc. Architectures (NANOARCH)*, Boston, MA, USA, Jul. 2015, pp. 27–32, doi: 10.1109/NANOARCH.2015.7180582.
- [13] W. Guo, G. Prenat, and B. Dieny, "A novel architecture of non-volatile magnetic arithmetic logic unit using magnetic tunnel junctions," J. Phys. D, Appl. Phys., vol. 47, no. 16, Apr. 2014, Art. no. 165001, doi: 10.1088/0022-3727/47/16/165001.

- [14] B. Lokesh and M. Malathi, "Full adder based reconfigurable spintronic ALU using STT-MTJ," in *Proc. Annu. IEEE India Conf. (INDICON)*, Mumbai, India, Dec. 2013, pp. 1–5, doi: 10.1109/INDCON.2013.6726101.
- [15] H. Meng, J. Wang, and J.-P. Wang, "A spintronics full adder for magnetic CPU," *IEEE Electron Device Lett.*, vol. 26, no. 6, pp. 360–362, Jun. 2005, doi: 10.1109/LED.2005.848129.
- [16] Q. An, S. Le Beux, I. O'Connor, J. O. Klein, and W. Zhao, "Arithmetic logic unit based on all-spin logic devices," in *Proc. 15th IEEE Int. New Circuits Syst. Conf. (NEWCAS)*, Strasbourg, France, Jun. 2017, pp. 317–320, doi: 10.1109/NEWCAS.2017.8010169.
- [17] S. Senni, F. Ouattara, J. Modad, K. Sevin, G. Patrigeon, P. Benoit, P. Nouet, L. Torres, F. Duhem, G. D. Pendina, and G. Prenat, "From spintronic devices to hybrid CMOS/magnetic system on chip," in *Proc. IFIP/IEEE Int. Conf. Very Large Scale Integr. (VLSI-SoC)*, Verona, Italy, Oct. 2018, pp. 188–191, doi: 10.1109/VLSI-SoC.2018.8644875.
- [18] J. Kim, A. Paul, P. A. Crowell, S. J. Koester, S. S. Sapatnekar, J.-P. Wang, and C. H. Kim, "Spin-based computing: Device concepts, current status, and a case study on a high-performance microprocessor," *Proc. IEEE*, vol. 103, no. 1, pp. 106–130, Jan. 2015, doi: 10.1109/JPROC.2014.2361767.
- [19] X. Guo, E. Ipek, and T. Soyata, "Resistive computation: Avoiding the power wall with low-leakage, STT-MRAM based computing," in *Proc.* 37th Annu. Int. Symp. Comput. Archit., Jun. 2010, pp. 371–382, doi: 10.1145/1815961.1816012.
- [20] R. Perricone, I. Ahmed, Z. Liang, M. G. Mankalale, X. S. Hu, C. H. Kim, M. Niemier, S. S. Sapatnekar, and J.-P. Wang, "Advanced spintronic memory and logic for non-volatile processors," in *Proc. Design*, *Autom. Test Eur. Conf. Exhib. (DATE)*, Mar. 2017, pp. 972–977, doi: 10.23919/DATE.2017.7927132.
- [21] M. Sharad, C. Augustine, and K. Roy, "Boolean and non-Boolean computation with spin devices," in *IEDM Tech. Dig.*, San Francisco, CA, USA, Dec. 2012, p. 11, doi: 10.1109/IEDM.2012.6479026.
- [22] E. Deng, "Design and development of low-power and reliable logic circuits based on spin-transfer torque magnetic tunnel junctions," Ph.D. dissertation, Comput. Sci. Microelectron. Tech. Lab. Archit. Integr. Syst. Electron., Electrotechnics, Automat., Signal Process. Doctoral School, Université Grenoble Alpes, Saint-Martin-d'Hères, France, 2017. [Online]. Available: http://dx.doi.org/theses.hal.science/tel-.01643939
- [23] J. Reuben and S. Pechmann, "Accelerated addition in resistive RAM array using parallel-friendly majority gates," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 29, no. 6, pp. 1108–1121, Jun. 2021, doi: 10.1109/TVLSI.2021.3068470.
- [24] B. Parhami, D. Abedi, and G. Jaberipur, "Majority-logic, its applications, and atomic-scale embodiments," *Comput. Electr. Eng.*, vol. 83, May 2020, Art. no. 106562, doi: 10.1016/j.compeleceng.2020.106562.
- [25] G. Jaberipur, B. Parhami, and D. Abedi, "Adapting computer arithmetic structures to sustainable supercomputing in low-power, majority-logic nanotechnologies," *IEEE Trans. Sustain. Comput.*, vol. 3, no. 4, pp. 262–273, Oct. 2018, doi: 10.1109/TSUSC.2018.2811181.
- [26] Q. An, L. Su, J.-O. Klein, S. Le Beux, I. O'Connor, and W. Zhao, "Full-adder circuit design based on all-spin logic device," in *Proc. IEEE/ACM Int. Symp. Nanosc. Architectures (NANOARCH)*, Boston, MA, USA, Jul. 2015, pp. 163–168, doi: 10.1109/NANOARCH.2015.7180606.
- [27] S. Breitkreutz, J. Kiermaier, I. Eichwald, C. Hildbrand, G. Csaba, D. Schmitt-Landsiedel, and M. Becherer, "Experimental demonstration of a 1-bit full adder in perpendicular nanomagnetic logic," *IEEE Trans. Magn.*, vol. 49, no. 7, pp. 4464–4467, Jul. 2013, doi: 10.1109/TMAG.2013.2243704.
- [28] C. Lageweg, S. Cotofana, and S. Vassiliadis, "Binary addition based on single electron tunneling devices," in *Proc. 4th IEEE Conf. Nanotechnol.*, Munich, Germany, Jul. 2004, pp. 327–330, doi: 10.1109/NANO.2004.1392340.
- [29] S. Yuasa and D. D. Djayaprawira, "Giant tunnel magnetoresistance in magnetic tunnel junctions with a crystalline MgO(0 0 1) barrier," *J. Phys. D*, *Appl. Phys.*, vol. 40, no. 21, pp. R337–R354, 2007, doi: 10.1088/0022-3727/40/21/R01.
- [30] V. K. Joshi, P. Barla, S. Bhat, and B. K. Kaushik, "From MTJ device to hybrid CMOS/MTJ circuits: A review," *IEEE Access*, vol. 8, pp. 194105–194146, 2020, doi: 10.1109/ACCESS.2020.3033023.
- [31] S. A. Wolf, D. D. Awschalom, R. A. Buhrman, J. M. Daughton, S. von Molnár, M. L. Roukes, A. Y. Chtchelkanova, and D. M. Treger, "Spintronics: A spin-based electronics vision for the future," *Science*, vol. 294, no. 5546, pp. 1488–1495, Nov. 2001, doi: 10.1126/science.1065389.

- [32] I. L. Prejbeanu, M. Kerekes, R. C. Sousa, H. Sibuet, O. Redon, B. Dieny, and J. P. Nozières, "Thermally assisted MRAM," *J. Phys., Condens. Matter*, vol. 19, no. 16, Apr. 2007, Art. no. 165218, doi: 10.1088/0953-8984/19/16/165218.
- [33] Q. Shao, P. Li, L. Liu, H. Yang, S. Fukami, A. Razavi, H. Wu, K. Wang, F. Freimuth, Y. Mokrousov, M. D. Stiles, S. Emori, A. Hoffmann, J. Åkerman, K. Roy, J.-P. Wang, S.-H. Yang, K. Garello, and W. Zhang, "Roadmap of spin–orbit torques," *IEEE Trans. Magn.*, vol. 57, no. 7, pp. 1–39, Jul. 2021, doi: 10.1109/TMAG.2021.3078583.
- [34] S. Alla, V. K. Joshi, and S. Bhat, "Field-free switching of VG-SOTpMTJ device through the interplay of SOT, exchange bias, and VCMA effects," *J. Appl. Phys.*, vol. 134, no. 1, Jul. 2023, Art. no. 013901, doi: 10.1063/5.0156241.
- [35] J. Talafy, F. Zokaee, H. R. Zarandi, and N. Bagherzadeh, "A high performance, multi-bit output logic-in-memory adder," *IEEE Trans. Emerg. Topics Comput.*, vol. 9, no. 4, pp. 2223–2233, Oct. 2021, doi: 10.1109/TETC.2020.2982951.
- [36] Y. Zhang, "Compact modeling and hybrid circuit design for spintronic devices based on current-induced switching," Ph.D. dissertation, Inst. Fundam. Electron. (IEF), Univ. Paris-Sud/CNRS UMR 8622, Bures-sur-Yvette, France, 2014. [Online]. Available: https://hal.science/tel-01058504/
- [37] H.-S.-P. Wong and S. Salahuddin, "Memory leads the way to better computing," *Nature Nanotechnol.*, vol. 10, no. 3, pp. 191–194, Mar. 2015, doi: 10.1038/nnano.2015.29.
- [38] S. Jain, A. Ranjan, K. Roy, and A. Raghunathan, "Computing in memory with spin-transfer torque magnetic RAM," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 26, no. 3, pp. 470–483, Mar. 2018, doi: 10.1109/TVLSI.2017.2776954.
- [39] M. Zabihi, Z. I. Chowdhury, Z. Zhao, U. R. Karpuzcu, J.-P. Wang, and S. S. Sapatnekar, "In-memory processing on the spintronic CRAM: From hardware design to application mapping," *IEEE Trans. Comput.*, vol. 68, no. 8, pp. 1159–1173, Aug. 2019, doi: 10.1109/TC.2018.2858251.
- [40] G. Santoro, G. Turvani, and M. Graziano, "New logic-in-memory paradigms: An architectural and technological perspective," *Micromachines*, vol. 10, no. 6, p. 368, May 2019, doi: 10.3390/mi10060368.
- [41] S. K. Kingra, V. Parmar, C.-C. Chang, B. Hudec, T.-H. Hou, and M. Suri, "SLIM: Simultaneous logic-in-memory computing exploiting bilayer analog OxRAM devices," *Sci. Rep.*, vol. 10, no. 1, Feb. 2020, doi: 10.1038/s41598-020-59121-0.
- [42] J. Kim, Y. Song, K. Cho, H. Lee, H. Yoon, and E.-Y. Chung, "STT-MRAM-based multicontext FPGA for multithreading computing environment," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 41, no. 5, pp. 1330–1343, May 2022, doi: 10.1109/TCAD.2021.3091440.
- [43] R. Kumar, D. Divyanshu, D. Khan, S. Amara, and Y. Massoud, "Polymorphic hybrid CMOS-MTJ logic gates for hardware security applications," *Electronics*, vol. 12, no. 4, p. 902, Feb. 2023, doi: 10.3390/electronics12040902.
- [44] W. Kang, E. Deng, J.-O. Klein, Y. Zhang, Y. Zhang, C. Chappert, D. Ravelosona, and W. Zhao, "Separated precharge sensing amplifier for deep submicrometer MTJ/CMOS hybrid logic circuits," *IEEE Trans. Magn.*, vol. 50, no. 6, pp. 1–5, Jun. 2014, doi: 10.1109/TMAG.2013.2297393.
- [45] M. Raouf and S. Timarchi, "Non-volatile and high-performance cascadable spintronic full-adder with no sensitivity to input scheduling," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 70, no. 6, pp. 2236–2240, Jun. 2023, doi: 10.1109/TCSII.2023.3237855.
- [46] E. Deng, Y. Zhang, W. Kang, B. Dieny, J.-O. Klein, G. Prenat, and W. Zhao, "Synchronous 8-bit non-volatile full-adder based on spin transfer torque magnetic tunnel junction," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 62, no. 7, pp. 1757–1765, Jul. 2015, doi: 10.1109/TCSI.2015.2423751.
- [47] P. Barla, V. K. Joshi, and S. Bhat, "Fully nonvolatile hybrid full adder based on SHE+STT-MTJ/CMOS LIM architecture," *IEEE Trans. Magn.*, vol. 58, no. 9, pp. 1–11, Sep. 2022, doi: 10.1109/TMAG.2022.3187605.
- [48] A. Amirany and R. Rajaei, "Fully nonvolatile and low power full adder based on spin transfer torque magnetic tunnel junction with spin-Hall effect assistance," *IEEE Trans. Magn.*, vol. 54, no. 12, pp. 1–7, Dec. 2018, doi: 10.1109/TMAG.2018.2869811.
- [49] X. Jin, W. Chen, X. Li, N. Yin, C. Wan, M. Zhao, X. Han, and Z. Yu, "High-reliability, reconfigurable, and fully non-volatile full-adder based on SOT-MTJ for image processing applications," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 70, no. 2, pp. 781–785, Feb. 2023, doi: 10.1109/TCSII.2022.3213747.

- [50] A. Amirany, G. Epperson, A. Patooghy, and R. Rajaei, "Accuracy-adaptive spintronic adder for image processing applications," *IEEE Trans. Magn.*, vol. 57, no. 6, pp. 1–10, Jun. 2021, doi: 10.1109/TMAG.2021.3069161.
- [51] Z. He, S. Angizi, and D. Fan, "Exploring STT-MRAM based in-memory computing paradigm with application of image edge extraction," in *Proc. IEEE Int. Conf. Comput. Design (ICCD)*, Boston, MA, USA, Nov. 2017, pp. 439–446, doi: 10.1109/ICCD.2017.78.
- [52] L. Sekanina, "Principles and applications of polymorphic circuits," in *Evolvable Hardware*. Berlin, Germany: Springer, 2015, pp. 209–224, doi: 10.1007/978-3-662-44616-4\_8.
- [53] S. Angizi, H. Jiang, R. F. DeMara, J. Han, and D. Fan, "Majority-based spin-CMOS primitives for approximate computing," *IEEE Trans. Nanotechnol.*, vol. 17, no. 4, pp. 795–806, Jul. 2018, doi: 10.1109/TNANO.2018.2836918.
- [54] H. Ghanatian, H. Farkhani, Y. Rezaeiyan, T. Böhnert, R. Ferreira, and F. Moradi, "A 3-bit flash spin-orbit torque (SOT)-analog-to-digital converter (ADC)," *IEEE Trans. Electron Devices*, vol. 69, no. 4, pp. 1691–1697, Apr. 2022, doi: 10.1109/TED.2022.3142649.
- [55] A. Amirany, M. H. Moaiyeri, and K. Jafari, "MTMR-SNQM: Multi-tunnel magnetoresistance spintronic non-volatile quaternary memory," in *Proc. IEEE 51st Int. Symp. Multiple-Valued Logic (ISMVL)*, Nur-Sultan, Kazakhstan, May 2021, pp. 172–177, doi: 10.1109/ISMVL51352.2021.00037.
- [56] I. Alibeigi, A. Amirany, R. Rajaei, M. Tabandeh, and S. B. Shouraki, "A low-cost highly reliable spintronic true random number generator circuit for secure cryptography," *SPIN*, vol. 10, no. 1, Mar. 2020, Art. no. 2050003, doi: 10.1142/s2010324720500034.
- [57] Y. Zhang, B. Yan, W. Kang, Y. Cheng, J.-O. Klein, Y. Zhang, Y. Chen, and W. Zhao, "Compact model of subvolume MTJ and its design application at nanoscale technology nodes," *IEEE Trans. Electron Devices*, vol. 62, no. 6, pp. 2048–2055, Jun. 2015, doi: 10.1109/TED.2015.2414721.
- [58] S. Ikeda, J. Hayakawa, Y. Ashizawa, Y. M. Lee, K. Miura, H. Hasegawa, M. Tsunoda, F. Matsukura, and H. Ohno, "Tunnel magnetoresistance of 604% at 300K by suppression of Ta diffusion in CoFeB/MgO/CoFeB pseudo-spin-valves annealed at high temperature," *Appl. Phys. Lett.*, vol. 93, no. 8, 2008, Art. no. 082508, doi: 10.1063/1.2976435.
- [59] M. Natsui and T. Hanyu, "Fabrication of a MTJ-based multilevel resistor towards process-variaton-resilient logic LSI," in *Proc. IEEE 12th Int. New Circuits Syst. Conf. (NEWCAS)*, Trois-Rivieres, QC, Canada, Jun. 2014, p. 468, doi: 10.1109/NEWCAS.2014.6934084.
- [60] Y. Zheng, "Magnetic sensor array with different RA TMR film," U.S. Patent Appl. 16 730 730, Mar. 7, 2023.
- [61] P. Tyagi and R. Pandey, "High-speed and area-efficient scalable N-bit digital comparator," *IET Circuits, Devices Syst.*, vol. 14, no. 4, pp. 450–458, Jul. 2020, doi: 10.1049/iet-cds.2018.5562.
- [62] N. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective, 4th ed. Upper Saddle River, NJ, USA: Pearson, 2010.
- [63] Manual of Design Kit for CMOS 65 nm, STMicroelectron, Geneva, Switzerland, 2009.
- [64] Y. Gang, W. Zhao, J.-O. Klein, C. Chappert, and P. Mazoyer, "A high-reliability, low-power magnetic full adder," *IEEE Trans. Magn.*, vol. 47, no. 11, pp. 4611–4616, Nov. 2011, doi: 10.1109/TMAG.2011.2150238.
- [65] T. Hanyu, T. Endoh, D. Suzuki, H. Koike, Y. Ma, N. Onizawa, M. Natsui, S. Ikeda, and H. Ohno, "Standby-power-free integrated circuits using MTJ-based VLSI computing," *Proc. IEEE*, vol. 104, no. 10, pp. 1844–1863, Oct. 2016, doi: 10.1109/JPROC.2016.2574939.
- [66] Manual of STT-PMA MTJ Model. Accessed: Oct. 16, 2021. [Online]. Available: http://www.spinlib.com/STT\_PMA\_MTJ.html
- [67] S. M. Nair, "Variation analysis, fault modeling and yield improvement of emerging spintronic memories," Ph.D. dissertation, KIT-Bibliothek, KIT Fac. Comput. Sci., Karlsruhe Inst. Technol. (KIT), Germany, 2020. [Online]. Available: http://dx.doi.org/publikationen.bibliothek.kit.edu/1000119696
- [68] J. Song, H. Dixit, B. Behin-Aein, C. H. Kim, and W. Taylor, "Impact of process variability on write error rate and read disturbance in STT-MRAM devices," *IEEE Trans. Magn.*, vol. 56, no. 12, pp. 1–11, Dec. 2020, doi: 10.1109/TMAG.2020.3028045.

- [69] S. Jape, V. K. Joshi, and P. Barla, "Design of a novel non-volatile hybrid spintronic true random number generator," *Int. J. Circuit Theory Appl.*, vol. 50, no. 5, pp. 1487–1501, May 2022, doi: 10.1002/cta.3243.
- [70] S. Motaman, S. Ghosh, and N. Rathi, "Impact of process-variations in STTRAM and adaptive boosting for robustness," in *Proc. Design*, *Autom. Test Eur. Conf. Exhib. (DATE)*, Grenoble, France, Mar. 2015, pp. 1431–1436, doi: 10.7873/DATE.2015.1018.
- [71] S. Wang, H. Lee, C. Grezes, P. K. Amiri, K. L. Wang, and P. Gupta, "Adaptive MRAM write and read with MTJ variation monitor," *IEEE Trans. Emerg. Topics Comput.*, vol. 9, no. 1, pp. 402–413, Jan. 2021, doi: 10.1109/TETC.2018.2866289.
- [72] W. Kang, T. Pang, W. Lv, and W. Zhao, "Dynamic dual-reference sensing scheme for deep submicrometer STT-MRAM," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 64, no. 1, pp. 122–132, Jan. 2017, doi: 10.1109/TCSI.2016.2606438.
- [73] W. Zhao, X. Zhao, B. Zhang, K. Cao, L. Wang, W. Kang, Q. Shi, M. Wang, Y. Zhang, Y. Wang, S. Peng, J.-O. Klein, L. de Barros Naviner, and D. Ravelosona, "Failure analysis in magnetic tunnel junction nanopillar with interfacial perpendicular magnetic anisotropy," *Materials*, vol. 9, no. 1, p. 41, Jan. 2016, doi: 10.3390/ma9010041.
- [74] W.-G. Wang, S. Hageman, M. Li, S. Huang, X. Kou, X. Fan, J. Q. Xiao, and C. L. Chien, "Rapid thermal annealing study of magnetoresistance and perpendicular anisotropy in magnetic tunnel junctions based on MgO and CoFeB," *Appl. Phys. Lett.*, vol. 99, no. 10, Sep. 2011, Art. no. 102502, doi: 10.1063/1.3634026.
- [75] J. Chatterjee, "Engineering of magnetic tunnel junction stacks for improved STT-MRAM performance and development of novel and cost-effective nano-patterning techniques," Doctoral dissertation, Université Grenoble Alpes, Saint-Martin-d'Hères, France, 2018. [Online]. Available: https://theses.hal.science/tel-02373919



**SREEVATSAN RANGAPRASAD** is currently pursuing the Bachelor of Technology degree (Hons.) with the Department of Electronics and Communication Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India. His research interests include VLSI design, spintronics, and nanomagnetism. He is also working on fully non-volatile hybrid MTJ/CMOS logic circuits based on the logic-in-memory structure under the guidance of Prof. Vinod Kumar Joshi.



**VINOD KUMAR JOSHI** (Senior Member, IEEE) received the M.Tech. degree from VIT University, Vellore, India, and the Ph.D. degree from Kumaun University, Nainital, India. He is currently an Additional Professor with the Department of Electronics and Communication Engineering, Manipal Institute of Technology, Manipal, India. His research interests include spintronics-based VLSI and logic-in-memory-based hybrid non-volatile logic circuits for low-power applications.

• • •