# Offset-Canceling Current-Latched Sense Amplifier With Slow Rise Time Control and Reference Voltage Biasing Techniques

Bayartulga Ishdorj, Doyeon Kim, Seongmin Ahn, and Taehui Na, Member, IEEE

Abstract-The current-latched sense amplifier (CLSA) is a promising candidate for detecting stored values in a memory cell. As technology shrinks, however, the input referred offset voltage ($V_{OS}$) of the SA increases, degrading the memory read yield. To obtain a high read yield, $V_{OS}$ reduction and cancellation techniques have become essential in deep-submicrometer technology nodes. In determining the $V_{OS}$ of the CLSA, the threshold voltage mismatch of the input NMOS pair is the dominant factor (~75%), followed by that of the latch NMOS pair (~25%). In this paper, 1) a slow rise time ($T_{RISE}$) control technique for the SA enable signal and 2) a reference voltage ($V_{\text{REF}}$) biasing technique are proposed, and the effectiveness of the proposed techniques is analyzed for the conventional CLSA with footswitch (FS-CLSA) and the offset-canceling CLSA (OC-CLSA). Post-layout HSPICE simulation results using 28 nm model parameters show that the FS-CLSA with size-up strategy (OC-CLSA) achieves a 17.7% (10.5%) reduction of the standard deviation of $V_{OS}$ ($\sigma_{OS}$) when a slow $T_{RISE}$ of 0.6 ns is employed. The measurement results from a 28 nm test chip show that the OC-CLSA with $V_{\text{REF}}$ biasing achieves a 22% reduction of $\sigma_{OS}$ compared to the conventional OC-CLSA.

Index Terms—Current-latched sense amplifier (CLSA), offset-canceling CLSA (OC-CLSA), offset voltage, read yield, reference voltage ($V_{\text{REF}}$) biasing, slow rise time ($T_{\text{RISE}}$) of SA enable signal, threshold voltage ($V_{\text{TH}}$) mismatch.

#### I. INTRODUCTION

When designing a memory, the sense amplifier (SA) is an essential peripheral circuit because it senses the small differential input value and amplifies it to a digital one (1 or 0). This can significantly reduce the required power consumption in a read operation [1]. Because the latch type SA consists of a cross-coupled inverter structure, its positive feedback characteristic enables low power consumption and a high-speed read operation. Therefore, it is widely used in various applications [2], [3], [4]. There are two representative

Manuscript received 2 December 2022; revised 7 March 2023; accepted 2 April 2023. Date of publication 13 April 2023; date of current version 28 June 2023. This work was supported by the National Research Foundation (NRF) funded by the Korea Government (MSIT) under Grant 2022M3F3A2A01073562 and Grant 2022M3I7A2079267. This article was recommended by Associate Editor A. James. (*Bayartulga Ishdorj and Doyeon Kim are co-first authors.*) (*Corresponding author: Taehui Na.*)

The authors are with the Department of Electronics Engineering, Incheon National University, Incheon 22012, South Korea (e-mail: taehui.na@inu.ac.kr).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCSI.2023.3264693.

Digital Object Identifier 10.1109/TCSI.2023.3264693

Fig. 1. Two representative latch type sense amplifiers (SAs) [5]. (a) FSPA-VLSA. (b) FS-CLSA.

latch type SAs, namely, a voltage-latched SA with an NMOS footswitch and PMOS access transistors (FSPA-VLSA) and a current-latched SA with an NMOS footswitch (FS-CLSA), as shown in Fig. 1 [5]. The VLSA senses a small input voltage difference ( $\Delta V$ ) between the bit line voltage ( $V_{BL}$ ) and the bit line bar voltage ( $V_{BLB}$ ). The CLSA senses the current difference flowing through an additional differential input transistor pair (MN3 and MN4 in Fig. 1(b)). The VLSA has better performance in terms of area and speed than the CLSA [5].

However, when the global reference voltage ( $V_{REF}$ ) generator circuit, that shares all the  $V_{REF}$  (=  $V_{BLB}$ ) nodes, is used for power consumption saving [6], [7], the VLSA can be vulnerable to noise from the output nodes, unlike the CLSA. In other words, the noise from the OUTB node to the BLB node causes the global  $V_{REF}$  generator to be a temporary nonconstant voltage, because the VLSA's output nodes are directly connected to its input nodes by the access transistors (MP3 and MP4 in Fig. 1(a)). Thus, when the global  $V_{REF}$  generator is used, the CLSA, with separate input and output nodes, is better than the VLSA.

In CLSA, to successfully detect the stored values in a memory cell during the read operation, the following two conditions must be satisfied: 1)  $V_{BL}$  and  $V_{REF}$  must be greater than the threshold voltage ( $V_{TH}$ ) of MN4 and MN3, respectively. If  $V_{BL}$ and  $V_{REF}$  are smaller than  $V_{TH}$ , then the MN3/MN4 turns off, leading to sensing failure. This input voltage range is called the sensing dead zone of the SA [5]. 2) The voltage difference  $\Delta V$  (= | $V_{BL} - V_{REF}$ |) between  $V_{BL}$  and  $V_{REF}$  must be larger than the input referred offset voltage ( $V_{OS}$ ) of the SA.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/



 $V_{OS}$  is dominantly generated by the  $V_{TH}$  mismatch of transistor pairs [3], [8], [9], [10], which is induced by process variations, such as random dopant fluctuation [11], [12], [13]. Moreover, the read yield can be statistically expressed by these two factors ( $\Delta V$  and  $V_{OS}$ ) modeled by Gaussian distributions. The read yield, represented as the read-access pass yield for a single cell (*RAPY*<sub>CELL</sub>) [14] is expressed as

$$RAPY_{\text{CELL}} = \frac{\mu_{\Delta V} - \mu_{\text{OS}}}{\sqrt{\sigma_{\Delta V}^2 + \sigma_{\text{OS}}^2}} \tag{1}$$

where $\mu_{\Delta V}$ ($\mu_{OS}$) is the mean of $\Delta V$ ($V_{OS}$), and $\sigma_{\Delta V}$ ($\sigma_{\rm OS}$) is the standard deviation of $\Delta V$ ($V_{OS}$). However, as the technology node scales down and the supply voltage ($V_{\text{DD}}$) decreases, the process variation increases significantly, leading to a greater $V_{\rm TH}$ mismatch of the transistor pair. The mismatch ends up having a more significant impact on $V_{\rm OS}$. If $V_{\rm OS}$ is higher, a larger $\Delta V$ is required for accurate sensing, which results in greater power consumption and a delay in correct sensing. Thus, to improve the read yield, $V_{\rm OS}$ must be minimized. The most straightforward way to reduce $V_{OS}$ is to increase the size of the transistors. Another straightforward method is to use a higher $V_{DD}$ for a larger $\Delta V$. However, these two techniques are not desirable in deep-submicrometer technology nodes because of the area overhead and increased power consumption. For this reason, $V_{OS}$ reduction and cancellation techniques have become essential in deep-submicrometer technology nodes.

Recently, numerous VLSA [15], [16], [17], [18], [19], [31], CLSA [20], [21], [22], [23], [24], [25], [26], [27], and hybrid latch type SA [28], [29] designs have been proposed to mitigate the coupling effect [18], power consumption [26], [27], and the $V_{OS}$ problem [15], [16], [17], [20], [21], [22], [23], [24], [25], [28], [29], [31]. Among the $V_{OS}$ related previous works, a few suggested utilizing an external circuit after fabrication for $V_{OS}$ calibration [23], [24], and most proposed internal circuit design modifications to minimize the $V_{OS}$ [15], [16], [17], [20], [21], [22], [25], [28], [29], [31]. In particular, Singh et al. [19] reported a $V_{OS}$ reduction technique that controls the rise time ($T_{RISE}$) of the SAE signal in VLSAs. However, the mechanism of $V_{OS}$ reduction in the VLSA, which uses differential signal injection to increase $\Delta V$, is completely different from that of $V_{OS}$ reduction in the CLSA. Na [20] proposed an offset-canceling CLSA (OC-CLSA) that cancels the $V_{OS}$ caused by the input NMOS pair ($V_{OS\_INPUT}$). However, because of the $V_{OS}$ caused by the latch NMOS pair ($V_{OS\_LATCH}$), the effectiveness of the offset cancellation is limited. To the best of our knowledge, none of the previous works provides sensing dead zone elimination or uses a rise time control technique to mitigate the $V_{OS}$ of the CLSA.

In this paper, we analyze the $V_{OS}$ of the conventional CLSA using I-V curves. Then, 1) we propose the slow $T_{RISE}$ control technique of the SAE signal for CLSA [21], apply it to FS-CLSA and OC-CLSA, and compare $\sigma_{OS}$, area, sensing time, and energy of the two SAs using post-layout simulations. In addition, 2) we propose the $V_{REF}$ biasing technique for



Fig. 2. Input referred offset voltage ( $V_{OS}$ ) of the FS-CLSA according to transistor pairs' mismatch levels, when  $V_{BL}$  is 0.8 V [21].

OC-CLSA and analyze the effectiveness of the proposed technique using the fabricated 28 nm test chip.

The remainder of this paper is organized as follows. Section II describes  $V_{OS}$  analysis and operation of the conventional CLSA and OC-CLSA. Section III introduces the proposed slow  $T_{RISE}$  control technique of the SAE signal for CLSA. Section IV introduces the proposed  $V_{REF}$  biasing technique for OC-CLSA. Section V presents the conclusions.

#### II. V<sub>OS</sub> ANALYSIS AND OPERATION OF CONVENTIONAL CLSA AND OC-CLSA

The CLSA consists of the input NMOS transistor pair (MN3/MN4), the latch NMOS transistor pair (MN1/MN2), the latch PMOS transistor pair (MP1/MP2), the precharge PMOS transistor pair (MP3/MP4), and the NMOS foot switch (MNFOOT), as shown in Fig. 1(b). The sensing operation of the CLSA is as follows: When the SAE signal is low (deactivated), MP3/MP4 is turned on. Then, the OUT and the OUTB nodes are precharged to  $V_{DD}$ , and the differential input voltages ( $V_{BL}$  and  $V_{REF}$ ) are captured.  $V_{BL}$  and  $V_{REF}$  are generated from the BL in a cell array (or sensing circuit), and the global voltage generator, respectively. When the SAE signal becomes high (activated), MP3/MP4 turns off and MNFOOT turns on. Then, as the sensing current begins to flow through MN1/MN2 and MN3/MN4, the voltages of the OUT/OUTB nodes ( $V_{\text{OUT}}$  and  $V_{\text{OUTB}}$ ) start to decrease from  $V_{\text{DD}}$ . The cross-coupled inverter structure (MP1/MN1 for one inverter and MP2/MN2 for the other) begins to compare a small output voltage difference (=  $|V_{OUT} - V_{OUTB}|$ ) caused by the current difference and amplifies it to a rail-to-rail digital value ( $V_{DD}$  or GND). Ideally, the CLSA is symmetric. However, because of the process variation, the sensing current is influenced by the transistor pair's  $V_{\text{TH}}$  mismatch, which leads to the generation of a  $V_{OS}$ . The influence of each transistor pair's  $V_{TH}$  mismatch on the  $V_{\rm OS}$  varies.

Fig. 2 shows the $V_{OS}$ of the CLSA according to each transistor pair's $V_{TH}$ mismatch level when $V_{BL}$ is 0.8 V [21]. The input NMOS pair's $V_{TH}$ mismatch translates most directly into $V_{OS}$ because the mismatch determines the drain current, and because the input NMOS operates in the saturation region, meaning that its small-signal effective resistance ($R_{INPUT}$) is relatively large. The input NMOS thus acts as a current source. The latch NMOS pair's $V_{TH}$ mismatch has a smaller influence on the $V_{OS}$ than that of the input NMOS pair because of the diode-connected configuration in



Fig. 3. The FS-CLSA circuit in its early sensing stage. (a) The realistic representation of the circuit. (b) An equivalent circuit with resistors. (c) I-V curves of input NMOS and latch NMOS when there is an input NMOS pair's  $V_{\rm TH}$  mismatch of 50 mV. (d) I-V curves of input NMOS and latch NMOS when there is a latch NMOS pair's  $V_{\rm TH}$  mismatch of 50 mV.

the early sensing period, meaning that the latch NMOS's small-signal effective resistance ($R_{LATCH}$) is smaller than that of the input NMOS ($R_{INPUT}$). In contrast, the latch PMOS has little influence on the $V_{OS}$ because it does not operate in the early sensing period. The precharge PMOS pair does not affect the $V_{OS}$ at all since it is completely turned off during the sensing operation. Therefore, $V_{OS\_INPUT}$ is the most dominant factor (~75%) in determining the overall $V_{OS}$, followed by $V_{OS\_LATCH}$ (~25%), because both pairs are active during the sensing operation and therefore affect the sensing current. Thus, both must be reduced to minimize the $V_{OS}$. In the early sensing stage, the CLSA circuit shown in Fig. 3(a) can be simply represented as an equivalent circuit with resistors ($R_{LATCH}$ and $R_{\text{INPUT}}$), as shown in Fig. 3(b). Figs. 3(c) and (d) show the I-V curves of the input NMOS (MN3 and MN4) and the latch NMOS (MN1 and MN2) when there is an input NMOS pair's $V_{\text{TH}}$ mismatch of 50 mV and a latch NMOS pair's $V_{\text{TH}}$ mismatch of 50 mV, respectively. As clearly shown in these figures, the sensing current difference ($\Delta I = I_{D1} - I_{D2}$) is directly affected by the input NMOS pair's $V_{\rm TH}$ mismatch ($\Delta I = 5\ \mu$A) because of the large $R_{\text{INPUT}}$, whereas $\Delta I$ is only 1.2 $\mu$A when the same $V_{\text{TH}}$ mismatch exists in the latch NMOS pair because of the small $R_{LATCH}$. Thus, the sensing current from OUT/OUTB to GND can be simply expressed as $V_{\rm DD}/(R_{\rm INPUT} + R_{\rm LATCH})$. Because $R_{\rm INPUT}$ dominates the sensing current, the expression clearly describes why the input NMOS pair's $V_{\text{TH}}$ mismatch dominates $V_{\text{OS}}$.
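The load-line argument of Figs. 3(c) and (d) can be illustrated with a minimal numerical sketch. It is not the HSPICE model used in this work: it assumes a long-channel square-law transistor, an ideal foot switch (input NMOS source at GND), OUT/OUTB still at $V_{DD}$, and hypothetical values for the transconductance factor `k`, the channel-length modulation `lam`, and the nominal $V_{TH}$. The absolute currents therefore do not match Fig. 3; only the trend is of interest.

```python
def input_current(vx, vg, vth, k=200e-6, lam=0.1):
    """Input NMOS current: gate at vg, source at GND, drain at vx (assumed model)."""
    vov = vg - vth
    if vov <= 0.0:
        return 0.0  # sensing dead zone: the transistor is off
    if vx >= vov:
        core = 0.5 * k * vov ** 2              # saturation region
    else:
        core = k * (vov * vx - 0.5 * vx ** 2)  # linear region
    return core * (1.0 + lam * vx)


def latch_current(vx, vth, vdd=1.0, k=200e-6):
    """Latch NMOS current: gate and drain at vdd (diode-like), source at vx."""
    vov = vdd - vx - vth
    return 0.5 * k * vov ** 2 if vov > 0.0 else 0.0


def branch_current(vbl, vth_in, vth_latch, vdd=1.0):
    """Solve for the internal node voltage by bisection and return the branch current."""
    lo, hi = 0.0, vdd
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if latch_current(mid, vth_latch, vdd) > input_current(mid, vbl, vth_in):
            lo = mid  # latch can still supply more current: node voltage rises
        else:
            hi = mid
    return input_current(0.5 * (lo + hi), vbl, vth_in)


def delta_i(vbl, vth=0.35, dvth=0.05):
    """Current change caused by a dvth shift in the input or in the latch NMOS."""
    nominal = branch_current(vbl, vth, vth)
    di_input = nominal - branch_current(vbl, vth + dvth, vth)
    di_latch = nominal - branch_current(vbl, vth, vth + dvth)
    return di_input, di_latch


for vbl in (0.5, 0.8):
    di_in, di_la = delta_i(vbl)
    print(f"V_BL = {vbl:.2f} V: dI(input) = {di_in * 1e6:.3f} uA, "
          f"dI(latch) = {di_la * 1e6:.3f} uA")
```

In this simplified model the same 50 mV shift changes the branch current more when it sits in the input NMOS than when it sits in the latch NMOS, and the relative weight of the latch mismatch grows as $V_{BL}$ rises and the input NMOS leaves the saturation region.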

In the CLSA, $\sigma_{OS}$ is large because it is dominated by the standard deviation of $V_{OS\_INPUT}$ ($\sigma_{OS\_INPUT}$). $\sigma_{OS}$ can be expressed as [5]

$$\sigma_{\rm OS} = \sqrt{\sigma_{\rm OS\_INPUT}^2 + \sigma_{\rm OS\_LATCH}^2} \tag{2}$$

where  $\sigma_{OS\_LATCH}$  is the standard deviation of  $V_{OS\_LATCH}$ . To minimize  $\sigma_{OS}$ , a reduction in  $\sigma_{OS\_INPUT}$  is essential.
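As a small worked example of (2), the sketch below combines two independent offset contributions in quadrature. The two standard deviations are hypothetical values chosen only to make the arithmetic easy to follow; they are not results from this work.

```python
import math


def sigma_os(sigma_input_mv, sigma_latch_mv):
    """Eq. (2): independent offset contributions add in quadrature."""
    return math.hypot(sigma_input_mv, sigma_latch_mv)


# Hypothetical contributions in mV (illustration only).
sigma_input, sigma_latch = 48.0, 14.0

total = sigma_os(sigma_input, sigma_latch)
halved_latch = sigma_os(sigma_input, sigma_latch / 2)
cancelled_input = sigma_os(0.0, sigma_latch)

print(f"both contributions:   {total:.2f} mV")
print(f"latch part halved:    {halved_latch:.2f} mV")
print(f"input part cancelled: {cancelled_input:.2f} mV")
```

With these assumed numbers, halving the smaller (latch) term barely changes the total, whereas removing the larger (input) term leaves only the latch term. This is why the input contribution has to be addressed first.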



Fig. 4. Schematic and timing diagrams of the OC-CLSA [20].

$\sigma_{OS\_INPUT}$ can be reduced using an OC-CLSA, as shown in Fig. 4. The OC-CLSA cancels the offset caused by the mismatch of the input NMOS pair by using a diode-connected configuration. The operation of the OC-CLSA is as follows: Initially, the PRE signal is high (similar to the initial condition of the CLSA with SAE = low), and the IN (INB) node voltage, $V_{\rm IN}$ ($V_{\rm INB}$), is precharged to $V_{DD}$. In S1, the P1 signal is activated. Then, the input NMOS transistors operate as diode-connected transistors. $V_{\rm IN}$ and $V_{\rm INB}$ are gradually discharged through the MNFOOT and settle at the respective threshold voltages ($V_{\rm TH\_IN}$ and $V_{\rm TH\_INB}$). In S2, the P2 and P3 signals are activated. $V_{\rm BL}$ and $V_{\rm REF}$ are transferred to IN\_SC and INB\_VG, respectively. Then, by the capacitive coupling of the $C_{SA}$s, $V_{\rm IN}$ becomes $V_{\rm BL} + V_{\rm TH\_IN}$ and $V_{\rm INB}$ becomes $V_{\rm REF} + V_{\rm TH\_INB}$. Meanwhile, the OUT and OUTB nodes are precharged to $V_{DD}$ for sensing. In S3, as the SAE signal is activated, the MNFOOT is turned on, and the sensing operation proceeds in the same way as in the CLSA. During the sensing stage (S3), the sensing currents flowing through the input NMOS pair are no longer influenced by the $V_{\rm TH}$ mismatch. This is because $V_{\rm IN}$ and $V_{\rm INB}$ are $V_{\rm BL} + V_{\rm TH\_IN}$ and $V_{\rm REF} + V_{\rm TH\_INB}$, respectively, and the drain current is determined by $V_{GS} - V_{TH}$. Thus, the OC-CLSA can effectively cancel $\sigma_{OS\_INPUT}$, and $\sigma_{OS}$ can be remarkably reduced.
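The cancellation in S1 to S3 can be checked with a few lines of arithmetic. The sketch below is idealized: it assumes the gate nodes fully settle to $V_{TH}$ in S1, that the capacitive coupling in S2 is lossless, and that the source of each input NMOS is at GND. The threshold spread (mean 0.35 V, sigma 30 mV) and the function names are hypothetical.

```python
import random


def overdrive_conventional(v_in, vth):
    """FS-CLSA: the input voltage is applied directly to the gate."""
    return v_in - vth


def overdrive_offset_cancelled(v_in, vth):
    """OC-CLSA (idealized): S1 stores vth on the gate node, S2 couples v_in on top."""
    gate = vth + v_in
    return gate - vth


random.seed(0)
V_BL, V_REF = 0.55, 0.50  # 50 mV differential input

# Hypothetical, independently drawn thresholds for the two input transistors.
vth_in = random.gauss(0.35, 0.03)
vth_inb = random.gauss(0.35, 0.03)

conv = (overdrive_conventional(V_BL, vth_in)
        - overdrive_conventional(V_REF, vth_inb))
oc = (overdrive_offset_cancelled(V_BL, vth_in)
      - overdrive_offset_cancelled(V_REF, vth_inb))

print(f"V_TH mismatch:                {1e3 * (vth_in - vth_inb):+.1f} mV")
print(f"overdrive difference, FS-CLSA: {1e3 * conv:+.1f} mV")
print(f"overdrive difference, OC-CLSA: {1e3 * oc:+.1f} mV")
```

Under these assumptions the overdrive difference of the OC-CLSA equals $V_{BL} - V_{REF}$ regardless of the drawn thresholds, while the conventional input stage sees the input difference shifted by the full $V_{TH}$ mismatch. Incomplete settling in S1 or coupling loss in S2 would leave a residual that this sketch does not model.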

#### III. PROPOSED SLOW T<sub>RISE</sub> CONTROL TECHNIQUE OF SAE SIGNAL FOR CLSA

As described in the previous section,  $V_{OS}$  is determined by the  $V_{TH}$  mismatch of the input and the latch NMOS pairs (~75% and ~25% respectively) when  $V_{BL}$  is 0.8 V. Note that if  $V_{BL}$  becomes higher, the saturation current of input NMOS becomes higher and the operation region moves from saturation to linear region, leading to the decrease in  $R_{INPUT}$ at the operating point ( $I_D$  increases and  $V_D$  decreases in Fig. 3(d)). Thus, the latch NMOS pair's  $V_{TH}$  mismatch has a greater effect on the  $V_{OS}$  than before. To minimize the  $V_{OS}$ , the effect of the latch NMOS pair's  $V_{TH}$  mismatch needs to be reduced as well. To this end, a slow  $T_{RISE}$  control technique for the SAE signal is proposed. In addition to the gate voltage ( $V_{BL}/V_{REF}$ ) of the input NMOS pair,  $T_{RISE}$  can also affect



Fig. 5.  $\sigma_{OS}$  according to  $T_{RISE}$  of SAE. (a) FS-CLSA when  $V_{BL}$  is 0.8 V. (b) OC-CLSA when  $V_{BL}$  is 0.55 V.

the operation region of the input NMOS pair. For the fast  $T_{\rm RISE}$ , the COMN node discharges quickly during the initial sensing period, resulting in the input NMOS pair operating on the boundary between the linear and the saturation regions. This means a decrease in  $R_{\text{INPUT}}$  in the same way as a higher  $V_{\rm BL}$ . In contrast, when using the slow  $T_{\rm RISE}$ , the MNFOOT is slowly turned on, which allows the COMN node voltage to drop slowly and maintain a high voltage at the beginning of the sensing operation. Thus, the saturation current of the input NMOS pair can be kept sufficiently low, resulting in the input NMOS pair operating in the saturation region. This means an increase in  $R_{\text{INPUT}}$ . Thus, the sensing current flowing from OUT/OUTB to COMN can be dominantly determined by  $R_{\text{INPUT}}$  and not  $R_{\text{LATCH}}$ . In other words, by employing the slower  $T_{\text{RISE}}$  control technique for the SAE signal, the impact of the latch NMOS pair's  $V_{\text{TH}}$  mismatch on  $\sigma_{\text{OS}}$  can be minimized, leading to a decrease in  $\sigma_{OS}$ .

Because the OC-CLSA can cancel  $\sigma_{OS\_INPUT}$  effectively, the  $\sigma_{OS}$  in the OC-CLSA is dominated by  $\sigma_{OS\_LATCH}$ , and the slow  $T_{RISE}$  of the SAE signal can be applied to the OC-CLSA to effectively mitigate the remaining  $\sigma_{OS\_LATCH}$ . Therefore, the OC-CLSA with the slow  $T_{RISE}$  control technique is suitable for minimizing  $\sigma_{OS}$ .

To verify the proposed slow $T_{\text{RISE}}$ control technique of the SAE signal in the conventional CLSA (FS-CLSA) and OC-CLSA, Monte-Carlo HSPICE simulations were performed using industry-compatible 28-nm model parameters with 1.0 V as the nominal $V_{\text{DD}}$. To fairly compare the effect of each transistor pair's $V_{\text{TH}}$ mismatch on $\sigma_{\text{OS}}$, two pMOSCAPs with a width of 2.0 $\mu$m and a length of 0.05 $\mu$m were used in the OC-CLSA. All the other transistors had a width of 0.1 $\mu$m and a length of 0.03 $\mu$m. $\Delta V$ was set to 20 mV to determine $\sigma_{\text{OS}}$. The pulse widths of the PRE signal ($T_{\text{PRE}}$), P1 signal ($T_{\text{P1}}$), and P2 signal ($T_{\text{P2}}$) were set to 2 ns, 2 ns, and 0.1 ns, respectively. The P3 signal rises with the P2 signal. Note that in an actual application the PRE signal is initially high, which corresponds to the initial (low) state of the SAE signal in the FS-CLSA.

Fig. 5 shows the $\sigma_{OS}$ of the FS-CLSA and OC-CLSA according to the $T_{RISE}$ of the SAE. Generally, the $T_{RISE}$ of an inverter is approximately 0.05 ns. The $T_{RISE}$ can be controlled simply by loading an inverter in the global signal generator with a capacitor, and the simulations were performed by adjusting the size of this capacitor. As the $T_{RISE}$ increases, the $\sigma_{OS}$ tends to gradually decrease and saturates at approximately 0.6 ns in both SAs. For a minimum $\sigma_{OS}$, the $T_{RISE}$ is selected as



Fig. 6.  $\sigma_{OS}$  of the FS-CLSA according to  $V_{BL}$  with/without  $T_{RISE}$  control.



Fig. 7.  $\sigma_{OS}$  of the FS-CLSA (red) and the OC-CLSA (blue) according to  $V_{BL}$  without  $T_{RISE}$  control technique. Yellow line shows  $\sigma_{OS}$  of the FS-CLSA (size-up,  $W_{input} = 4 \ \mu m$ ,  $W_{latch} = 4.6 \ \mu m$ ).

0.6 ns. The $\sigma_{OS}$ of the FS-CLSA (OC-CLSA) is 53.55 mV (18.56 mV) at $T_{RISE} = 0.05$ ns and 51.05 mV (13.61 mV) at $T_{RISE} = 0.6$ ns. Thus, by using the slower $T_{RISE}$, the $\sigma_{OS}$ of the FS-CLSA (OC-CLSA) can be reduced by 4.7% (26.7%), owing to the reduction in $\sigma_{OS\_LATCH}$. The $\sigma_{OS}$ of the FS-CLSA increases with $T_{RISE}$ beyond 0.8 ns because MP3/MP4 are partially turned on during the sensing operation. This phenomenon can be easily eliminated by separating the gate signals of MNFOOT and MP3/MP4, as in the OC-CLSA.
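The quoted reductions follow directly from the four $\sigma_{OS}$ values above. The short check below recomputes them; the helper name `percent_reduction` is ours and is used only for illustration.

```python
def percent_reduction(before_mv, after_mv):
    """Relative reduction in percent when going from before_mv to after_mv."""
    return 100.0 * (before_mv - after_mv) / before_mv


# sigma_OS in mV at T_RISE = 0.05 ns and 0.6 ns, as quoted in the text.
fs_clsa = percent_reduction(53.55, 51.05)
oc_clsa = percent_reduction(18.56, 13.61)

print(f"FS-CLSA: {fs_clsa:.1f}% ({53.55 - 51.05:.2f} mV)")
print(f"OC-CLSA: {oc_clsa:.1f}% ({18.56 - 13.61:.2f} mV)")
```

The absolute reduction is also larger for the OC-CLSA (4.95 mV versus 2.50 mV), consistent with the latch contribution being a larger share of its $\sigma_{OS}$.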

Fig. 6 shows the $\sigma_{OS}$ of the FS-CLSA according to the input voltage ($V_{BL}$) with and without the $T_{RISE}$ control technique. When $V_{BL}$ is in the sensing dead zone ($V_{BL} < V_{TH}$), the input NMOS pair is not turned on and no sensing operation occurs. Fig. 6 clearly shows the efficacy of the $T_{RISE}$ control technique. When $V_{BL}$ is in the 0.4-0.5 V range, the input NMOS pair already operates in the saturation region without the $T_{\text{RISE}}$ control technique. Therefore, the effect of $\sigma_{OS\_LATCH}$ on $\sigma_{OS}$ is negligible. In this case, when applying the slow $T_{\rm RISE}$ at $V_{\rm BL} = 0.4$ V, the $\sigma_{\rm OS}$ decreases by only 0.7%, from 51.36 mV to 51.01 mV. In other words, as $V_{\rm BL}$ decreases, the effect of the $T_{\rm RISE}$ control technique on $\sigma_{\rm OS}$ becomes insignificant. However, as $V_{\rm BL}$ increases, the saturation current of the input NMOS pair increases, leading to the input NMOS pair operating in the linear region. Thus, as $V_{\rm BL}$ increases, $\sigma_{OS\_LATCH}$ increases, and the sensing current is more affected by the mismatch of the latch NMOS pair. When the $T_{\rm RISE}$ control technique is applied at $V_{\rm BL} = 0.7$ V, the $\sigma_{\rm OS}$ decreases by 4.4%, from 52.72 mV to 50.48 mV. In other words, the effect of the $T_{\rm RISE}$ control technique on $\sigma_{\rm OS}$ increases with increasing $V_{\rm BL}$.

Even though  $\sigma_{OS\_LATCH}$  can be reduced by employing the slow  $T_{RISE}$  control technique of the SAE signal,  $\sigma_{OS}$  in the CLSA is still large because it is dominated by  $\sigma_{OS\_INPUT}$ . To minimize  $\sigma_{OS}$ , the OC-CLSA with the  $T_{RISE}$  control



Fig. 8.  $\sigma_{OS}$  of the OC-CLSA according to  $V_{BL}$  with/without the  $T_{RISE}$  control.



Fig. 9. Average  $\sigma_{OS}$  of the OC-CLSA with/without the  $T_{RISE}$  control technique according to (a)  $L_{CSA}$  when  $W_{CSA} = 2.0 \ \mu \text{m}$  and (b)  $T_{P1}$ .



Fig. 10.  $\sigma_{\rm OS}$  of the FS-CLSA according to the width size of the input and the latch NMOS when  $V_{\rm BL}=0.8$  V.

technique is recommended. Fig. 7 shows the $\sigma_{OS}$ of the FS-CLSA and OC-CLSA according to the $V_{BL}$ without the $T_{RISE}$ control technique. Fig. 7 clearly shows that the OC-CLSA (blue line) achieves an average $\sigma_{OS}$ (from $V_{BL} = 0$ V to $V_{BL} = 0.65$ V) of 11.92 mV (minimum $\sigma_{OS} = 7.22$ mV at $V_{BL} = 0.2$ V; maximum $\sigma_{OS} = 21.7$ mV at $V_{BL} = 0.65$ V), which is four times lower than that of the FS-CLSA, 53.23 mV (from $V_{BL} = 0.35$ V to $V_{BL} = 1$ V). This result is because of the significant decrease in $\sigma_{OS\_INPUT}$ by the OC-CLSA. Because of the decrease in $R_{INPUT}$ with increasing $V_{BL}$, the OC-CLSA has a sensing dead zone of $V_{BL} > 0.75$ V. The case of FS-CLSA with size-up (yellow line) will be explained later.

Fig. 8 shows the $\sigma_{OS}$ of the OC-CLSA with/without the $T_{RISE}$ control technique according to the $V_{BL}$. When applying the $T_{RISE}$ control technique to the OC-CLSA, the $\sigma_{OS}$ on average is reduced by 20.6% (0%, from 8.26 mV to 8.26 mV at $V_{BL} = 0$ V; 35.64%, from 17.59 mV to 11.32 mV at $V_{BL} = 0.5$ V). It is noted that in the OC-CLSA, the efficiency (20.6%) of the $T_{RISE}$ control technique for the average $\sigma_{OS}$ improves



Fig. 11. Transient responses of SAs. (a) FS-CLSA. (b) FS-CLSA (size-up). (c) OC-CLSA without slow  $T_{RISE}$ . (d) OC-CLSA with slow  $T_{RISE}$ .

by 5.15 times compared to the FS-CLSA's 4.0% (0.35 V to 1 V). This is because the  $\sigma_{OS}$  of the OC-CLSA is dominated by  $\sigma_{OS\_LATCH}$  owing to the cancellation of  $\sigma_{OS\_INPUT}$ , and the slow  $T_{RISE}$  control technique can effectively mitigate the remaining  $\sigma_{OS\_LATCH}$ .

Fig. 9 shows the average $\sigma_{OS}$ of the OC-CLSA with/without the $T_{RISE}$ control technique according to (a) the length of $C_{SA}$ ($L_{CSA}$) when the width of $C_{SA}$ ($W_{CSA}$) is 2.0 $\mu$m and (b) $T_{P1}$. As the $L_{CSA}$ increases, the effect of the capacitive coupling increases, owing to the capacitance difference between the parasitic capacitance of the input nodes (IN, INB) and $C_{SA}$. The $L_{CSA}$ was selected as 0.05 $\mu$m, considering the area overhead. As $T_{P1}$ increases, $C_{SA}$ becomes more discharged, resulting in a better cancellation of $\sigma_{OS\_INPUT}$. Considering the performance overhead, $T_{P1}$ was set to 2.0 ns.

Fig. 10 shows the $\sigma_{OS}$ of the FS-CLSA according to the widths of the input and the latch NMOS when $V_{\rm BL} = 0.8$ V. According to Pelgrom's research [30], $\sigma_{OS}$ can be reduced by increasing the size of the input and the latch NMOS pairs. For a fair comparison of the FS-CLSA and the OC-CLSA in terms of area, the widths of the input NMOS and the latch NMOS pairs of the FS-CLSA were increased to reduce the average $\sigma_{OS}$. The total pre-layout area of the SA was estimated by the sum of each transistor's area (width $\times$ length). To match the average $\sigma_{OS} = 11.92$ mV of the OC-CLSA without the $T_{RISE}$ control technique, the width of the input (latch) NMOS of the FS-CLSA must be increased to 4 $\mu$m (4.6 $\mu$m). In this case, the total pre-layout area of the FS-CLSA (size-up) was estimated to be 0.531 $\mu$m<sup>2</sup>, whereas the total area of the OC-CLSA was 0.272 $\mu$m<sup>2</sup>. Note that the FS-CLSA (size-up) has a $\sigma_{OS}$ of 11.92 mV when $V_{BL} = 0.8$ V and an average $\sigma_{OS}$ (from $V_{BL} = 0.35$ V to $V_{BL} = 1$ V) of 17.08 mV. The yellow line in Fig. 7 shows the $\sigma_{OS}$ of the FS-CLSA (size-up). Although the OC-CLSA generally uses an area 10.1 times larger than that of the FS-CLSA (0.027 $\mu$m<sup>2</sup>) when the transistors in both circuits are minimum-sized, it uses an area 1.95 times smaller than that of the FS-CLSA (size-up). However, because these calculations are based only on transistor size, layout-based evaluations are required; these are presented later in this section.
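The pre-layout area figures can be reproduced from the transistor dimensions given earlier. The sketch below assumes the nine-transistor FS-CLSA of Fig. 1(b) (input pair, latch NMOS pair, latch PMOS pair, precharge pair, and foot switch); the OC-CLSA area is taken from the text rather than recomputed, because its transistor count is not listed here. The last lines apply Pelgrom's area scaling, $\sigma_{V_{TH}} \propto 1/\sqrt{WL}$, only as a rough first-order estimate.

```python
import math

W_MIN, L_MIN = 0.1, 0.03  # um, minimum-size transistor used in the simulations


def total_area(transistors):
    """Sum of width x length over a list of (W, L) tuples, in um^2."""
    return sum(w * l for w, l in transistors)


# FS-CLSA with all nine transistors at minimum size.
fs_min = total_area([(W_MIN, L_MIN)] * 9)

# FS-CLSA (size-up): input pair at 4.0 um, latch NMOS pair at 4.6 um,
# remaining five transistors at minimum size.
fs_sizeup = total_area([(4.0, L_MIN)] * 2 + [(4.6, L_MIN)] * 2
                       + [(W_MIN, L_MIN)] * 5)

OC_AREA = 0.272  # um^2, value quoted in the text

print(f"FS-CLSA (min):     {fs_min:.3f} um^2")
print(f"FS-CLSA (size-up): {fs_sizeup:.3f} um^2")
print(f"OC-CLSA / FS-CLSA (min):     {OC_AREA / fs_min:.1f}x")
print(f"FS-CLSA (size-up) / OC-CLSA: {fs_sizeup / OC_AREA:.2f}x")

# First-order Pelgrom estimate: relative sigma(V_TH) of the input pair
# after widening from W_MIN to 4.0 um at fixed length.
input_sigma_scale = math.sqrt(W_MIN / 4.0)
print(f"input-pair sigma(V_TH) scale factor: {input_sigma_scale:.3f}")
```

The area-scaling estimate ignores the change of operating point caused by the larger transistors, so it should not be expected to predict the simulated $\sigma_{OS}$ exactly.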

Fig. 11 shows the pre-layout transient responses of the FS-CLSA, the FS-CLSA (size-up), the OC-CLSA without



Fig. 12. Layout when considering the same  $\sigma_{OS}$  based on pre-layout simulations. (a) FS-CLSA (size up). (b) OC-CLSA.



Fig. 13.  $\sigma_{OS}$  of OC-CLSA and FS-CLSA (size-up) with/without the  $T_{RISE}$  control technique according to  $V_{BL}$  (post-layout simulation results).

$T_{\rm RISE}$ control technique, and the OC-CLSA with $T_{\rm RISE}$ control technique. 1000 sets of Monte-Carlo HSPICE simulations were performed with $\Delta V$ (= $|V_{BL} - V_{REF}|$) = 50 mV. In the FS-CLSA, the average (worst-case) sensing time is 0.077 ns (0.128 ns). The FS-CLSA encounters many sensing failures when $\Delta V = 50$ mV, since the $\sigma_{OS}$ of the FS-CLSA is approximately 50 mV, which corresponds to $RAPY_{CELL} = 1\sigma$. In contrast, the $\sigma_{OS}$ of the FS-CLSA (size-up) and of the OC-CLSA is approximately 10 mV, which corresponds to $RAPY_{CELL} = 5\sigma$. Thus, there is no sensing failure in these three cases. Compared to the FS-CLSA, the average and the worst-case sensing times of the FS-CLSA (size-up) increase to 0.437 ns and 0.6 ns, respectively, owing to the loading delay. Compared to the FS-CLSA (size-up), the OC-CLSA has 2 ns of additional sensing time owing to the offset cancellation stages S1 and S2. The $T_{RISE}$ difference between the OC-CLSA with and without the $T_{\rm RISE}$ control technique is 0.55 ns (= 0.6 ns - 0.05 ns). However, the average sensing time difference is only 0.338 ns (= 2.809 ns - 2.471 ns) because of the $\sigma_{OS}$ reduction.
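The $1\sigma$ and $5\sigma$ figures above can be checked against (1). The sketch below assumes a zero-mean offset ($\mu_{OS} = 0$) and a fixed input difference ($\sigma_{\Delta V} = 0$), which is the simplification implied by the rounded numbers in the text. The conversion to a pass probability with the one-sided Gaussian CDF is our addition for illustration and is not taken from the paper.

```python
import math


def rapy_cell(mu_dv, mu_os, sigma_dv, sigma_os):
    """Eq. (1): read-access pass yield of a single cell, in units of sigma."""
    return (mu_dv - mu_os) / math.hypot(sigma_dv, sigma_os)


def pass_probability(rapy):
    """Assumed mapping from sigma units to probability (one-sided Gaussian CDF)."""
    return 0.5 * (1.0 + math.erf(rapy / math.sqrt(2.0)))


DELTA_V = 50.0  # mV, input difference used in the transient simulations

rapy_fs = rapy_cell(DELTA_V, 0.0, 0.0, 50.0)  # FS-CLSA, sigma_OS ~ 50 mV
rapy_oc = rapy_cell(DELTA_V, 0.0, 0.0, 10.0)  # size-up FS-CLSA / OC-CLSA, ~ 10 mV

for name, rapy in (("sigma_OS = 50 mV", rapy_fs), ("sigma_OS = 10 mV", rapy_oc)):
    print(f"{name}: RAPY_CELL = {rapy:.1f} sigma, "
          f"pass probability = {pass_probability(rapy):.6f}")
```

Under this Gaussian assumption a $1\sigma$ margin leaves a substantial fraction of failing samples in a 1000-run Monte-Carlo set, whereas a $5\sigma$ margin makes a failure in 1000 runs very unlikely, in line with the observations above.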

As mentioned previously, layout-based estimations of delay, area overhead, and power consumption are required, since circuit complexity can make the difference between pre-layout and post-layout results large. Figs. 12(a) and (b) show the layouts of the FS-CLSA (size-up) and the OC-CLSA, respectively. Although the pre-layout area of the OC-CLSA was found to be 1.95 times smaller than that of the FS-CLSA (size-up) when considering the same $\sigma_{OS}$, the layouts clearly indicate that this is not the case in reality. Interconnects are the biggest contributors to the area overhead. The post-layout area of the OC-CLSA (24.15 $\mu$m<sup>2</sup>) is larger than that of the FS-CLSA (size-up) (14 $\mu$m<sup>2</sup>).

Fig. 13 shows the  $\sigma_{OS}$  of OC-CLSA and FS-CLSA (size-up) with and without  $T_{RISE}$  according to  $V_{BL}$ , based on post-layout

simulations. When the slow  $T_{\text{RISE}}$  control technique is applied, the average  $\sigma_{\text{OS}}$  of OC-CLSA decreases by 10.5% (15.4 mV to 13.78 mV), while the average  $\sigma_{\text{OS}}$  of FS-CLSA (size-up) decreases by 17.7% (23.9 mV to 19.68 mV).

Table I lists a performance summary and comparison of the conventional FS-CLSAs and OC-CLSAs. The comparative advantages of the proposed slow  $T_{\rm RISE}$  control technique are clearly demonstrated in the post-layout simulation results. The average  $\sigma_{OS}$  of the post-layout based FS-CLSA and OC-CLSA are greater than the pre-layout values because of the parasitic resistances and capacitances introduced by interconnects. The layout area of OC-CLSA (24.15  $\mu$ m<sup>2</sup>) is larger than that of FS-CLSAs (14.0  $\mu$ m<sup>2</sup>) for comparable  $\sigma_{OS}$  (similar minimum  $\sigma_{OS}$  in cases of pre- and post-layout analysis). However, the average energy of OC-CLSA with  $T_{RISE}$  control technique (5.77 fJ) is 31% lower than that of FS-CLSA (size-up) with  $T_{RISE}$  control technique (8.37 fJ). Compared to the FS-CLSA (size-up) with  $T_{RISE}$  control technique, the worst-case sensing time of the OC-CLSA with  $T_{RISE}$  control technique is 2.65 times longer because of the offset cancellation stage (S1, S2). Thus, for low power/energy applications with a moderate performance, the OC-CLSA with  $T_{RISE}$  control technique can be a reasonable choice. For high performance applications without considering energy consumption, the FS-CLSA with size-up strategy and slow  $T_{\rm RISE}$  control technique can be a good choice. The last column in Table I confirms the above analysis results for the case where the layout area is the same.

#### IV. PROPOSED V<sub>REF</sub> BIASING TECHNIQUE FOR OC-CLSA

When an SA is used for memory (e.g., static random access memory), the input voltage difference $\Delta V$ (= $|V_{\rm BL} - V_{\rm REF}|$) should be large enough considering $\sigma_{OS}$. In general, $\Delta V$ is designed to be greater than 200 mV. This means that $V_{\text{REF}}$ should be lower than $V_{DD}$ by at least 200 mV so that $V_{BL}$ at state 1 ($V_{BL1}$) is larger than $V_{REF}$ by 200 mV and $V_{BL}$ at state 0 ($V_{BL0}$) is smaller than $V_{REF}$ by 200 mV. However, because of the cell leakage (or other non-idealities, such as aging, temperature variation, and noise), $V_{BL1}$ cannot remain at $V_{DD}$ but decreases as time elapses. For this reason, the $V_{\rm BL}$ range should span at least 500 mV (i.e., $V_{\rm DD} - 500\ \text{mV} \le V_{\rm BL} \le V_{\rm DD}$). Moreover, as $V_{\text{DD}}$ reduces with technology node shrinkage, the range of $V_{\rm BL}$ decreases with it. Furthermore, because non-volatile memories (e.g., MRAM) generate intermediate voltages between $V_{DD}$ and GND, a wide $V_{\rm BL}$ range is required. Therefore, the operational range of $V_{\rm BL}$ must be addressed in order for the SA to operate effectively and adaptably in diverse $V_{DD}$ regions and applications.

Both OC-CLSA and FS-CLSA designs have limitations on the  $V_{BL}$  range, as was noted in Section III. As shown in Figs. 7 and 13, the FS-CLSA cannot operate properly until  $V_{BL}$ exceeds the threshold voltage of the input NMOS transistors (e.g.,  $V_{BL} > 0.35$  V), and as  $V_{BL}$  increases, the FS-CLSA efficiency declines as well. Although the OC-CLSA was able to mitigate the sensing dead zone problem of the FS-CLSA to some extent, its effectiveness also decreases when the  $V_{BL}$ is raised. To solve the sensing dead zone problem and to improve efficiency of the OC-CLSA, we propose the  $V_{REF}$ biasing technique for the OC-CLSA.

TABLE I

Pre/Post-Layout Performance Summary and Comparison Between Conventional FS-CLSAs and OC-CLSAs With/Without Slow T<sub>RISE</sub> Control Technique

|                                 | FS-CLSA<br>(pre-layout)<br>w/o T<sub>RISE</sub> | FS-CLSA<br>(size-up <sup>1)</sup>,<br>pre-layout)<br>w/o T<sub>RISE</sub> | FS-CLSA<br>(size-up <sup>1)</sup>,<br>post-layout)<br>with T<sub>RISE</sub><br>(w/o T<sub>RISE</sub>) | OC-CLSA<br>(pre-layout)<br>w/o T<sub>RISE</sub> | OC-CLSA<br>(pre-layout)<br>with T<sub>RISE</sub> | OC-CLSA<br>(post-layout)<br>with T<sub>RISE</sub><br>(w/o T<sub>RISE</sub>) | FS-CLSA<br>(size-up <sup>2)</sup>,<br>post-layout)<br>with T<sub>RISE</sub><br>(w/o T<sub>RISE</sub>) |
|---------------------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|
| Average $\sigma_{\rm OS}$<br>[mV] | 53.23 <sup>3)</sup> | 17.08 <sup>3)</sup><br>(11.92 @V<sub>BL</sub>=0.8V) | 19.68 <sup>3)</sup><br>(23.9) | 11.92 <sup>4)</sup> | 9.47 <sup>4)</sup> | 13.78 <sup>4)</sup><br>(15.4) | 18.2 <sup>3)</sup><br>(22.7) |
| Minimum $\sigma_{\rm OS}$<br>[mV] | 51.05 | 7.81 | 7.74<br>(7.79) | 7.22 | 6.87 | 7.7<br>(8.08) | 4.51<br>(4.87) |
| Area [µm<sup>2</sup>] | 0.027 | 0.531 | 14.0 | 0.272 | 0.272 | 24.15 | 24.15 |
| Worst-case<br>sensing time [ns] | 1.606 <sup>3)</sup> | 0.693 <sup>3)</sup> | 2.55 <sup>3)</sup> | 3.497 <sup>4)</sup> | 3.801 <sup>4)</sup> | 6.76 <sup>4)</sup> | 3.79 <sup>3)</sup> |
| Average<br>sensing time [ns] | 0.111 <sup>3)</sup> | 0.457 <sup>3)</sup> | 1.01 <sup>3)</sup> | 2.593 <sup>4)</sup> | 2.923 <sup>4)</sup> | 4.7 <sup>4)</sup> | 2.13 <sup>3)</sup> |
| Average power [µW] | 2.88 <sup>3)</sup> | 16.9 <sup>3)</sup> | 8.29 <sup>3)</sup> | 1.07 <sup>4)</sup> | 0.944 <sup>4)</sup> | 1.23 <sup>4)</sup> | 12.65 <sup>3)</sup> |
| Average energy [fJ] | 0.32 <sup>3)</sup> | 7.75 <sup>3)</sup> | 8.37 <sup>3)</sup> | 2.78 <sup>4)</sup> | 2.76 <sup>4)</sup> | 5.77 <sup>4)</sup> | 26.94 <sup>3)</sup> |

1) The FS-CLSA with increased size to achieve the average  $\sigma_{OS}$  of the OC-CLSA without  $T_{RISE}$  (based on pre-layout analysis).

2) The FS-CLSA with further increased size to achieve the same layout area as the OC-CLSA.

3) The average or worst-case was calculated from  $V_{BL} = 0.35$  V to  $V_{BL} = 1$  V.

4) The average or worst-case was calculated from  $V_{BL} = 0$  V to  $V_{BL} = 0.65$  V.



Fig. 14. Schematic and timing diagrams of the OC-CLSA with the proposed  $V_{\text{REF}}$  biasing technique.

As mentioned in Section III, in S2 of the OC-CLSA (see Fig. 4),  $V_{\rm IN}$  and  $V_{\rm INB}$  are  $V_{\rm BL} + V_{\rm TH\_IN}$  and  $V_{\rm REF} + V_{\rm TH\_INB}$ , respectively. As  $V_{\rm BL}$  increases,  $R_{\rm INPUT}$  decreases, which leads to an increase in  $\sigma_{\rm OS}$ . In S2,  $V_{\rm IN}$  ( $V_{\rm INB}$ ) is raised by the voltage difference between  $V_{\rm BL}$  ( $V_{\rm REF}$ ) and  $V_{\rm IN\_SC}$  ( $V_{\rm INB\_VG}$ ) in S1. Therefore, the operational range of the OC-CLSA can be controlled by reducing this voltage difference.

Fig. 14 shows the schematic and timing diagrams of the OC-CLSA with the proposed  $V_{\text{REF}}$  biasing technique. The concept of the proposed technique is to set  $V_{\text{IN}\_SC}$  and  $V_{\text{INB}\_VG}$  in S1 of the OC-CLSA (in Fig. 4) to  $V_{\text{REF}}$  instead of GND so that the voltage difference between  $V_{\text{BL}}$  ( $V_{\text{REF}}$ ) and  $V_{\text{IN}\_SC}$  ( $V_{\text{INB}\_VG}$ ) in S1 becomes  $\Delta V$  (0 V).  $V_{\text{REF}}$  is adjusted according to the target  $V_{\text{BL}}$  range (e.g.,  $V_{\rm REF}$  of 0.9 V is selected when 0.8 V <  $V_{\rm BL}$  < 1.0 V, and  $V_{\rm REF}$  of 0.1 V is selected when 0.0 V <  $V_{\rm BL}$  < 0.2 V). Consequently, applying the proposed  $V_{\rm REF}$  biasing technique to the OC-CLSA minimizes the voltage difference between  $V_{\rm BL}$  ( $V_{\rm REF}$ ) and  $V_{\rm IN\_SC}$  ( $V_{\rm INB\_VG}$ ) in S1. In Fig. 14, the source node of the MNBIAS transistor is biased to  $V_{\rm REF}$ . Thus,  $V_{\rm IN}$  and  $V_{\rm INB}$  in S2 are decreased to  $\Delta V + V_{\rm TH\_IN}$  and  $V_{\rm TH\_INB}$ , respectively. As a result, the sensing dead zone of the OC-CLSA can be completely eliminated. In addition,  $\Delta V + V_{\rm TH}$  is the optimal gate voltage of the input NMOS for minimizing  $\sigma_{\rm OS}$  in both the FS-CLSA and OC-CLSA. Thus, the average  $\sigma_{\rm OS}$  can be significantly reduced.
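The effect of the biasing choice on the S2 gate voltages can be sketched numerically. The function below is a simplified model that uses only the expressions stated above. It assumes matched threshold voltages, takes  $V_{\rm BL} > V_{\rm REF}$  so that  $\Delta V = V_{\rm BL} - V_{\rm REF}$ , and uses a hypothetical  $V_{\rm TH}$  of 0.35 V. The midpoint rule in `select_vref` is inferred from the two examples in the text and is an assumption, not a rule stated by the authors.

```python
def select_vref(vbl_min, vbl_max):
    """Pick V_REF as the midpoint of the target V_BL range (assumed rule)."""
    return (vbl_min + vbl_max) / 2.0


def s2_gate_voltages(v_bl, v_ref, v_th, vref_biasing):
    """Return (V_IN, V_INB) in S2 for a simplified OC-CLSA model.

    Conventional OC-CLSA: V_IN_SC and V_INB_VG are tied to GND in S1,
    so V_IN = V_BL + V_TH and V_INB = V_REF + V_TH.
    V_REF biasing: both nodes are tied to V_REF in S1,
    so V_IN = (V_BL - V_REF) + V_TH and V_INB = V_TH.
    """
    shift = v_ref if vref_biasing else 0.0
    return v_bl - shift + v_th, v_ref - shift + v_th


v_th = 0.35                      # hypothetical threshold voltage [V]
v_ref = select_vref(0.8, 1.0)    # midpoint of the 0.8 V to 1.0 V range
v_bl = 0.95                      # example bit-line voltage [V]

conventional = s2_gate_voltages(v_bl, v_ref, v_th, vref_biasing=False)
proposed = s2_gate_voltages(v_bl, v_ref, v_th, vref_biasing=True)
print("conventional: V_IN = %.2f V, V_INB = %.2f V" % conventional)
print("proposed:     V_IN = %.2f V, V_INB = %.2f V" % proposed)
```

In this model the differential input  $V_{\rm IN} - V_{\rm INB}$  is the same in both cases; only the common level of the two gate voltages is lowered.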

To further demonstrate the effectiveness of the OC-CLSA with the proposed  $V_{\text{REF}}$  biasing technique, we offer measurement results of the fabricated 28 nm test chip.

Fig. 15(a) shows the test chip structure with a  $32 \times 32$  SA array containing 1024 OC-CLSAs (OC-CLSAs with  $V_{REF}$  biasing technique) and 1024 FS-CLSAs (size-up). Fig. 15(b) shows the die photo and layout of the test chip. As shown in Fig. 15(c), the 1024 OC-CLSAs are designed so that the source node voltage of the MNBIAS transistor can be changed, which allows each of them to be used both as a conventional OC-CLSA and as an OC-CLSA with the  $V_{REF}$  biasing technique. The following signals are generated from the CLK input by the signal generator inside the test chip: the SAE, PRE, P1, P2, and P3 signals for the OC-CLSA and the SAE signal for the FS-CLSA. The test chip also includes multiplexers and a decoder to select the test cell, as well as buffers and a D flip-flop (D-FF) to produce an observable output signal for  $\sigma_{OS}$  testing.
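The text does not spell out how  $\sigma_{OS}$  is extracted from the array outputs. One common approach, sketched below purely as an assumption, is to sweep the input difference, record the fraction of SAs that resolve to 1 at each step, and fit a Gaussian cumulative distribution to those fractions. The function name, the sweep values, and the synthetic data are all hypothetical; only the array size of 1024 SAs comes from the test chip description.

```python
from statistics import NormalDist


def estimate_offset_sigma(sweep):
    """Fit a Gaussian CDF to (delta_v_mV, fraction_of_ones) pairs.

    Each fraction p is mapped to a z-score with the inverse normal CDF,
    then a least-squares line z = a * delta_v + b is fitted.
    The offset mean is -b / a and the standard deviation is 1 / a.
    Points with p == 0 or p == 1 have no finite z-score and are skipped.
    """
    std_normal = NormalDist()
    points = [(dv, std_normal.inv_cdf(p)) for dv, p in sweep if 0.0 < p < 1.0]
    if len(points) < 2:
        raise ValueError("need at least two points with 0 < p < 1")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_z = sum(z for _, z in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in points)
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    return -intercept / slope, 1.0 / slope


# Synthetic example (not measured data): 1024 SAs with offsets ~ N(2 mV, 15 mV).
true_offsets = NormalDist(mu=2.0, sigma=15.0)
sweep = []
for delta_v in range(-40, 41, 10):
    ones = round(1024 * true_offsets.cdf(delta_v))  # SAs that output 1
    sweep.append((delta_v, ones / 1024))

mu_mv, sigma_mv = estimate_offset_sigma(sweep)
print(f"estimated mean offset = {mu_mv:.1f} mV, sigma = {sigma_mv:.1f} mV")
```

The fit recovers values close to the synthetic mean and standard deviation; the residual error comes from quantizing the pass fraction to whole SAs out of 1024.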

Fig. 16 shows the post-layout simulation results for  $\sigma_{OS}$  of the OC-CLSA with  $T_{RISE}$  control technique, OC-CLSA with  $V_{REF}$  biasing technique, and FS-CLSA (size-up) with  $T_{RISE}$  control technique according to  $V_{BL}$ . As clearly shown in Fig. 16, the OC-CLSA with the proposed  $V_{\text{REF}}$  biasing technique successfully eliminates the sensing dead zone problem. Compared to the conventional FS-CLSA (size-up) and OC-CLSA designs with  $T_{\text{RISE}}$  technique, it achieves average  $\sigma_{\text{OS}}$  reductions of 49.7% and 28.2%, respectively. For the OC-CLSA with  $V_{\text{REF}}$  biasing technique, the average or worst-case value was calculated from  $V_{\text{BL}} = 0$  V to  $V_{\text{BL}} = 1$  V because there is no sensing dead zone.

Fig. 15. (a) Test chip structure with  $32 \times 32$  SA array containing 1024 OC-CLSAs (OC-CLSAs with  $V_{\text{REF}}$  biasing technique) and 1024 FS-CLSAs (size-up). (b) Die photo and layout of the test chip implemented in 28 nm CMOS technology. (c) Close-up of the OC-CLSA design modification used to test the proposed  $V_{\text{REF}}$  biasing technique.

Fig. 16.  $\sigma_{OS}$  of the OC-CLSA with  $T_{RISE}$  control technique, OC-CLSA with  $V_{REF}$  biasing technique, and FS-CLSA (size-up) with  $T_{RISE}$  control technique according to  $V_{BL}$  (post-layout simulation results).

Fig. 17 shows the test chip results for  $\sigma_{OS}$  of the OC-CLSA, OC-CLSA with  $V_{REF}$  biasing technique, and FS-CLSA (size-up) according to  $V_{BL}$ . For the  $\sigma_{OS}$  test of the OC-CLSA and OC-CLSA with  $V_{REF}$  biasing technique,  $T_{P1}$  and  $T_{P2}$  were set to 14 ns and 7 ns, respectively, due to the limited resolution of the fabricated chip. The overall test chip results support the post-layout simulation results, albeit with a slight degradation. This degradation can be caused by various extrinsic factors such as voltage drop, noise, and temperature. Compared to the OC-CLSA and FS-CLSA (size-up), the minimum  $\sigma_{OS}$  (test chip) of the OC-CLSA with  $V_{REF}$  biasing technique was 1.3% and 18% higher, respectively. Although the minimum  $\sigma_{OS}$  of the OC-CLSA with  $V_{REF}$  biasing technique was slightly higher than those of the OC-CLSA and FS-CLSA (size-up), the results were comparable. The average  $\sigma_{OS}$  (test chip) of the OC-CLSA with  $V_{REF}$  biasing technique was 22% and 58% lower than those of the OC-CLSA and FS-CLSA (size-up), respectively.

Fig. 17.  $\sigma_{OS}$  of the OC-CLSA, OC-CLSA with  $V_{REF}$  biasing technique, and FS-CLSA (size-up) according to  $V_{BL}$  (test chip measurement results).

Table II shows the overall comparison between the conventional FS-CLSA (size-up), OC-CLSA, OC-CLSA with  $V_{\text{REF}}$  biasing technique, and three state-of-the-art SA designs proposed by Patel et al. [25], Sarfraz et al. [27], and Shen et al. [17] (or Na et al. [31]), based on post-layout simulations and test chip measurement results. As indicated in Table II, to fairly compare our proposed designs with state-of-the-art SAs, we performed post-layout analyses on the differential input body biased SA with predischarge output nodes (DIBBSA-PD) [25], variation-tolerant SA (VTSA) [27], and single-ended offset-canceling SA (SOSA) [17], [31].

TABLE II

Post-Layout/Test Chip Performance Summary and Comparison Between Conventional FS-CLSA (Size-Up), OC-CLSA, State-of-the-Art SAs, and OC-CLSA With V<sub>REF</sub> Biasing Technique

|                                 | FS-CLSA<br>(size-up <sup>1)</sup>)<br>with T<sub>RISE</sub><br>(w/o T<sub>RISE</sub>) | Proposed<br>OC-CLSA<br>with T<sub>RISE</sub><br>(w/o T<sub>RISE</sub>) | DIBBSA-PD [25]<br>same Tr. size<br>(same layout<br>area) | VTSA [27]<br>same layout area | SOSA [17], [31]<br>same Tr. size | Proposed<br>OC-CLSA<br>with V<sub>REF</sub> biasing |
|---------------------------------|------------------|------------------|------------------|------------------|------------------|------------------|
| Average $\sigma_{\rm OS}$<br>based on post-layout<br>[mV] | 19.68 <sup>2)</sup><br>(23.9 <sup>2)</sup>) | 13.78 <sup>3)</sup><br>(15.4 <sup>3)</sup>) | 21.16 <sup>4)</sup><br>(5.27 <sup>4)</sup>) | 8.12 <sup>5)</sup> | 6.79 <sup>6)</sup> | **9.90** <sup>6)</sup> |
| Minimum $\sigma_{\rm OS}$<br>based on post-layout<br>[mV] | 7.74 <sup>2)</sup><br>(7.79 <sup>2)</sup>) | 7.7 <sup>3)</sup><br>(8.08 <sup>3)</sup>) | 20.6 <sup>4)</sup><br>(5.17 <sup>4)</sup>) | 7.04 <sup>5)</sup> | 6.03 <sup>6)</sup> | 8.75 <sup>6)</sup> |
| Average $\sigma_{\rm OS}$<br>based on test chip<br>[mV] | (40.07 <sup>2)</sup>) | (21.58 <sup>3)</sup>) | N/A | N/A | N/A | 16.83 <sup>6)</sup> |
| Minimum $\sigma_{\rm OS}$<br>based on test chip<br>[mV] | (12.47 <sup>2)</sup>) | (14.53 <sup>3)</sup>) | N/A | N/A | N/A | 14.72 |
| Layout area<br>[µm<sup>2</sup>] | 14.0 | 24.15 | 9<br>(24.15) | 24.08 | 30.03 | 24.15 |
| Sensing dead zone<br>range [V] | $V_{\rm BL} < 0.35$<br>$V_{\rm BL} > 0.9$ | $V_{\rm BL} > 0.75$ | $V_{\rm BL} < 0.4$ | $V_{\rm BL} > 0.45$ | No sensing dead zone | No sensing dead zone |
| Average sensing time<br>based on post-layout<br>[ns] | 1.01 <sup>2)</sup> | 4.7 <sup>3)</sup> | 0.93 <sup>4)</sup><br>(1.38 <sup>4)</sup>) | 16 <sup>5)</sup> | 10.6 <sup>6)</sup> | 8.44 <sup>6)</sup> |
| Average power<br>consumption based<br>on post-layout [µW] | 8.29 <sup>2)</sup> | 1.23 <sup>3)</sup> | 2.44 <sup>4)</sup><br>(20.5 <sup>4)</sup>) | 0.42 <sup>5)</sup> | 4.1 <sup>6)</sup> | 0.53 <sup>6)</sup> |
| Average energy<br>consumption based<br>on post-layout [fJ] | 8.37 <sup>2)</sup> | 5.77 <sup>3)</sup> | 2.27 <sup>4)</sup><br>(28.4 <sup>4)</sup>) | 6.72 <sup>5)</sup> | 43.46 <sup>6)</sup> | 4.47 <sup>6)</sup> |

1) The FS-CLSA with increased size to achieve the average  $\sigma_{OS}$  of the OC-CLSA without  $T_{RISE}$  (based on pre-layout analysis).

2) The average or worst-case was calculated from  $V_{\rm BL} = 0.35$  V to  $V_{\rm BL} = 1$  V.

3) The average or worst-case was calculated from  $V_{BL} = 0$  V to  $V_{BL} = 0.65$  V.

4) The average or worst-case was calculated from  $V_{BL} = 0.4$  V to  $V_{BL} = 1$  V.

5) The average or worst-case was calculated from  $V_{BL} = 0$  V to  $V_{BL} = 0.45$  V.

6) The average or worst-case was calculated from  $V_{BL} = 0$  V to  $V_{BL} = 1$  V.

Fig. 18 shows the layouts used for the comparison analysis. DIBBSA-PD applies a body-biasing technique to the VLSA to lower  $\sigma_{OS}$ . To offer a fair comparison between the proposed designs and DIBBSA-PD, two different layouts of DIBBSA-PD were made. The first layout, shown in Fig. 18(a), uses the same transistor sizes as the OC-CLSA, while the second layout, shown in Fig. 18(b), uses larger transistors so that its layout area is similar to that of the OC-CLSA. Fig. 18(c) shows the layout of the VTSA, a hybrid design between the VLSA and CLSA that offers accurate operation at low voltages. For the layout in Fig. 18(c), the transistor sizes were increased so that the VTSA's layout area is similar to that of our proposed design. Fig. 18(d) shows the layout of SOSA, a VLSA-type design that offers low  $\sigma_{OS}$  while enabling wide-voltage operation. For the layout shown in Fig. 18(d), the transistor sizes were kept the same as in our proposed design because SOSA uses two capacitors for offset cancellation, which already makes its layout area similar to that of the proposed design. Additionally, we changed the NMOS switch transistors used in SOSA to transmission gates to make the comparison more accurate.

By applying the  $V_{\text{REF}}$  biasing technique to the OC-CLSA, the average  $\sigma_{\text{OS}}$  (test chip) was lowered by 22% (from 21.58 mV to 16.83 mV) because the proposed technique eliminates the sensing dead zone. Although the  $V_{\text{REF}}$  biasing technique improves the average  $\sigma_{\text{OS}}$  of the OC-CLSA, the lowered voltage degrades the current in S3, which increases the delay. However, despite the increase in average sensing time (from 4.7 ns to 8.44 ns), the average power consumption is lowered by 56.9% (from 1.23  $\mu$W to 0.53  $\mu$W). As a result, the overall energy consumption is lowered by 22.5% (from 5.77 fJ to 4.47 fJ).
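The trade-off described above can be checked with the relation energy = average power × average sensing time. This is a sketch using the post-layout values in Table II; the relation is assumed here because it reproduces the tabulated energies to within rounding, and the function and variable names are illustrative.

```python
def energy_fj(power_uw, time_ns):
    """Energy in fJ from average power [uW] and sensing time [ns].

    1 uW * 1 ns = 1e-6 W * 1e-9 s = 1e-15 J = 1 fJ, so no scaling is needed.
    """
    return power_uw * time_ns


def percent_change(new, old):
    """Signed change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


# Post-layout values from Table II.
oc_clsa = {"power_uw": 1.23, "time_ns": 4.7}        # OC-CLSA with T_RISE
oc_clsa_vref = {"power_uw": 0.53, "time_ns": 8.44}  # OC-CLSA with V_REF biasing

e_conv = energy_fj(**oc_clsa)
e_vref = energy_fj(**oc_clsa_vref)

time_change = percent_change(oc_clsa_vref["time_ns"], oc_clsa["time_ns"])
power_change = percent_change(oc_clsa_vref["power_uw"], oc_clsa["power_uw"])
energy_change = percent_change(e_vref, e_conv)

print(f"sensing time: {time_change:+.1f}%")
print(f"power:        {power_change:+.1f}%")
print(f"energy:       {e_conv:.2f} fJ -> {e_vref:.2f} fJ ({energy_change:+.1f}%)")
```

Because the power falls by more than the sensing time rises, the product still decreases, which is why the energy drops even though the SA is slower.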

When the transistor sizes of DIBBSA-PD are chosen to be the same as in our proposed design, the average energy consumption is 49.2% lower but the average  $\sigma_{OS}$  is 113% larger than that of the OC-CLSA with  $V_{REF}$  biasing. Therefore, we concluded that it is fair to increase the transistor sizes of DIBBSA-PD so that its layout area is similar to that of our proposed design. For DIBBSA-PD in Fig. 18(b), the transistor sizes (width/length) were increased to 1.5  $\mu$m/0.03  $\mu$m. As a result, the average  $\sigma_{OS}$  of DIBBSA-PD decreases to a value 47% lower than that of our proposed design. However, the energy consumption increases dramatically with the transistor size (from 2.27 fJ to 28.4 fJ in Table II). Also, the DIBBSA-PD has a sensing dead zone range of  $V_{BL} < 0.4$  V.

Compared to the proposed design, the average  $\sigma_{OS}$  of VTSA is 17.9% smaller, but its energy consumption is 50.9% larger. Because the VTSA uses a hybrid design in which the output nodes are connected to  $V_{BL}$  and  $V_{BLB}$ , its average sensing time is 89% longer than that of the proposed design. Also, VTSA has a sensing dead zone range of  $V_{BL} > 0.45$  V.

Fig. 18. Layouts of state-of-the-art SA designs. (a) DIBBSA-PD [25] with the same transistor size as OC-CLSA with  $V_{\text{REF}}$  biasing. (b) DIBBSA-PD [25] with the same layout area as OC-CLSA with  $V_{\text{REF}}$  biasing. (c) VTSA [27] with the same layout area as OC-CLSA with  $V_{\text{REF}}$  biasing. (d) SOSA [17], [31] with the same transistor size as OC-CLSA with  $V_{\text{REF}}$  biasing.

SOSA is the most efficient design in terms of performance among the previously proposed designs, and it successfully eliminates the sensing dead zone. Its average  $\sigma_{OS}$  is 31.4% smaller than that of our proposed design. However, as shown in Table II, the average energy consumption of SOSA (43.46 fJ) is far larger than that of the proposed design (4.47 fJ) because it uses an auto-zeroing technique that consumes excessive short-circuit power during the auto-zeroing period. Note that for SOSA we analyzed the dependency of  $\sigma_{OS}$  on the PRE and SMP signals and concluded that 5 ns for  $T_{PRE}$  and  $T_{SMP}$  was reasonable.

### V. CONCLUSION

In the first part of this paper, we proposed a slow  $T_{\text{RISE}}$  control technique for CLSAs, which reduces  $\sigma_{OS\_LATCH}$  without area overhead, and conducted a comparative study between the OC-CLSA and FS-CLSA with the slow  $T_{\rm RISE}$  technique applied to both. Post-layout simulation results showed that the OC-CLSA achieved a 10.5% reduction in  $\sigma_{OS}$  by employing the  $T_{RISE}$  control technique, while the FS-CLSA (size-up) achieved a  $\sigma_{\rm OS}$  reduction of 17.7%. In addition, the simulation results showed that the OC-CLSA is more energy efficient, whereas the FS-CLSA (size-up) is more performance and area efficient. In the second part of this paper, we proposed the  $V_{\text{REF}}$  biasing technique for the OC-CLSA. The experimental results from a fabricated 28 nm test chip, as well as the post-layout simulation results, showed that the OC-CLSA with the  $V_{\rm REF}$  biasing technique outperformed the OC-CLSA with the  $T_{\rm RISE}$  control technique and the FS-CLSA (size-up) with the  $T_{RISE}$  control technique in terms of  $\sigma_{\rm OS}$  and energy consumption. Compared to the state-of-the-art SAs (DIBBSA-PD, VTSA, SOSA), the OC-CLSA with  $V_{\text{REF}}$  biasing offers comparable  $\sigma_{\text{OS}}$  with the lowest energy consumption. Thus, the OC-CLSA with  $V_{\text{REF}}$  biasing can be a promising solution for battery-powered applications, and the FS-CLSA (size-up) with slow  $T_{RISE}$  can be suitable for high-performance applications.

#### ACKNOWLEDGMENT

The chip fabrication and EDA tool were supported by the IC Design Education Center (IDEC), South Korea.

#### REFERENCES

- [1] Y. Tsiatouhas et al., "New memory sense amplifier designs in CMOS technology," in *Proc. 7th IEEE Int. Conf. Electron., Circuits Syst.*, vol. 1, Aug. 2000, pp. 19–22.
- [2] T. Kobayashi et al., "A current-mode latch sense amplifier and a static power saving input buffer for low-power architecture," *IEEE J. Solid-State Circuits*, vol. 28, no. 4, pp. 523–527, Apr. 1993.
- [3] B. Wicht, T. Nirschl, and D. Schmitt-Landsiedel, "Yield and speed optimization of a latch-type voltage sense amplifier," *IEEE J. Solid-State Circuits*, vol. 39, no. 7, pp. 1148–1158, Jul. 2004.
- [4] R. Sarpeshkar, J. L. Wyatt, N. C. Lu, and P. D. Gerber, "Mismatch sensitivity of a simultaneously latched CMOS sense amplifier," *IEEE J. Solid-State Circuits*, vol. 26, no. 10, pp. 1413–1422, Oct. 1991.
- [5] T. Na, S.-H. Woo, J. Kim, H. Jeong, and S.-O. Jung, "Comparative study of various latch-type sense amplifiers," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 22, no. 2, pp. 425–429, Feb. 2014.
- [6] B. Song et al., "Reference-circuit analysis for high-bandwidth spin transfer torque random access memory," in *Proc. IEEE/ACM Int. Symp. Low Power Electron. Design (ISLPED)*, Jul. 2015, pp. 365–370.
- [7] T. Na et al., "STT-MRAM sensing: A review," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 68, no. 1, pp. 12–18, Jan. 2021.
- [8] A. Do, Z. Kong, and K. Yeo, "Criterion to evaluate input-offset voltage of a latch-type sense amplifier," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 57, no. 1, pp. 83–92, Jan. 2010.
- [9] A. Hajimiri and R. Heald, "Design issues in cross-coupled inverter sense amplifier," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, vol. 2, May/Jun. 1998, pp. 149–152.
- [10] J. Yeung and H. Mahmoodi, "Robust sense amplifier design under random dopant fluctuations in nano-scale CMOS technologies," in *Proc. IEEE Int. SOC Conf.*, Sep. 2006, pp. 261–264.
- [11] P. A. Stolk, F. P. Widdershoven, and D. B. M. Klaassen, "Modeling statistical dopant fluctuations in MOS transistors," *IEEE Trans. Electron Devices*, vol. 45, no. 9, pp. 1960–1971, Sep. 1998.
- [12] A. J. Bhavnagarwala, X. Tang, and J. D. Meindl, "The impact of intrinsic device fluctuations on CMOS SRAM cell stability," *IEEE J. Solid-State Circuits*, vol. 36, no. 4, pp. 658–665, Apr. 2001.
- [13] H. Mahmoodi et al., "Estimation of delay variations due to random-dopant fluctuations in nanoscale CMOS circuits," *IEEE J. Solid-State Circuits*, vol. 40, no. 9, pp. 1787–1796, Sep. 2005.
- [14] H. Nho, S.-S. Yoon, S. S. Wong, and S.-O. Jung, "Numerical estimation of yield in sub-100-nm SRAM design using Monte Carlo simulation," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 55, no. 9, pp. 907–911, Sep. 2008.

- [15] M. E. Sinangil, "A 28 nm 2 Mbit 6 T SRAM with highly configurable low-voltage write-ability assist implementation and capacitor-based sense-amplifier input offset compensation," *IEEE J. Solid-State Circuits*, vol. 51, no. 2, pp. 557–567, Feb. 2016.
- [16] D. Patel and M. Sachdev, "0.23-V sample-boost-latch-based offset tolerant sense amplifier," *IEEE Solid-State Circuits Lett.*, vol. 1, no. 1, pp. 6–9, Jan. 2018.
- [17] S. Shen, H. Xu, Y. Zhou, and W. Yu, "A single-ended offset-canceling sense amplifier enabling wide-voltage operations," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 70, no. 3, pp. 1139–1143, Mar. 2023.
- [18] Y. Zhang, Z. Wang, C. Zhu, and L. Zhang, "28-nm latch-type sense amplifier modification for coupling suppression," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 25, no. 5, pp. 1767–1773, May 2017.
- [19] R. Singh and N. Bhat, "An offset compensation technique for latch type sense amplifiers in high-speed low-power SRAMs," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 12, no. 6, pp. 652–657, Jun. 2004.
- [20] T. Na, "Robust offset-cancellation sense amplifier for an offset-canceling dual-stage sensing circuit in resistive nonvolatile memories," *Electronics*, vol. 9, no. 9, p. 1403, Sep. 2020.
- [21] T. Na, "Offset voltage analysis and enable signal rise time control based offset reduction technique of current-latched sense amplifier," in *Proc. Int. Conf. Electron., Inf., Commun. (ICEIC)*, Jan./Feb. 2021, pp. 1–2.
- [22] J. S. Shah, D. Nairn, and M. Sachdev, "An energy-efficient offset-cancelling sense amplifier," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 60, no. 8, pp. 477–481, Aug. 2013.
- [23] Y. Sinangil and A. P. Chandrakasan, "A 128 kbit SRAM with an embedded energy monitoring circuit and sense-amplifier offset compensation using body biasing," *IEEE J. Solid-State Circuits*, vol. 49, no. 11, pp. 2730–2739, Nov. 2014.
- [24] B. Liu, J. Cai, J. Yuan, and Y. Hei, "A low-voltage SRAM sense amplifier with offset cancelling using digitized multiple body biasing," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 64, no. 4, pp. 442–446, Apr. 2017.
- [25] D. Patel, A. Neale, D. Wright, and M. Sachdev, "Body biased sense amplifier with auto-offset mitigation for low-voltage SRAMs," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 68, no. 8, pp. 3265–3278, Aug. 2021.
- [26] S. Babayan-Mashhadi and R. Lotfi, "Analysis and design of a low-voltage low-power double-tail comparator," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 22, no. 2, pp. 343–352, Feb. 2014.
- [27] K. Sarfraz, J. He, and M. Chan, "A 140-mV variation-tolerant deep sub-threshold SRAM in 65-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 52, no. 8, pp. 2215–2220, Aug. 2017.
- [28] D. Patel, A. Neale, D. Wright, and M. Sachdev, "Hybrid latch-type offset tolerant sense amplifier for low-voltage SRAMs," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 66, no. 7, pp. 2519–2532, Jul. 2019.
- [29] M. Sharifkhani, E. Rahiminejad, S. M. Jahinuzzaman, and M. Sachdev, "A compact hybrid current/voltage sense amplifier with offset cancellation for high-speed SRAMs," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 19, no. 5, pp. 883–894, May 2011.
- [30] M. J. M. Pelgrom, A. C. J. Duinmaijer, and A. P. G. Welbers, "Matching properties of MOS transistors," *IEEE J. Solid-State Circuits*, vol. 24, no. 5, pp. 1433–1439, Oct. 1989.
- [31] T. Na, B. Song, S. Choi, J. P. Kim, S. H. Kang, and S.-O. Jung, "Offset-canceling single-ended sensing scheme with one-bit-line precharge architecture for resistive nonvolatile memory in 65-nm CMOS," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 27, no. 11, pp. 2548–2555, Nov. 2019.



**Bayartulga Ishdorj** received the B.S. and M.S. degrees in electronics engineering from Incheon National University, Incheon, Republic of Korea, in 2020 and 2023, respectively, where he is currently pursuing the Ph.D. degree in electronics engineering. His current research interests include PVT variation tolerant and low-power circuit designs for resistive nonvolatile memory (NVM), especially in STT-MRAM.



**Doyeon Kim** received the B.S. degree in electronics engineering from Incheon National University, Incheon, Republic of Korea, in 2022. Since 2022, he has been with Samsung Electronics Company Ltd., Hwaseong, Republic of Korea. His current research interests include PVT variation tolerant and low-power circuit designs for DRAM.



**Seongmin Ahn** received the B.S. degree in electronics engineering from Incheon National University, Incheon, Republic of Korea, in 2023, where he is currently pursuing the M.S. degree in electronics engineering. His current research interests include PVT variation tolerant and low-power circuit designs for memory, microcontroller unit, and neuromorphic SoC.



**Taehui Na** (Member, IEEE) received the B.S. and Ph.D. degrees in electrical and electronic engineering from Yonsei University, Seoul, Republic of Korea, in 2012 and 2017, respectively. From 2017 to 2019, he was with Samsung Electronics Company Ltd., Hwaseong, Republic of Korea, where he worked on phase-change random access memory (PRAM) and high-performance NAND (ZNAND) core circuit designs. Since 2019, he has been a Professor with Incheon National University, Incheon, Republic of Korea. His current research interests are focused on PVT variation tolerant and low-power circuit designs for memory, microcontroller unit, and neuromorphic SoC.