

Received March 16, 2021, accepted March 31, 2021, date of publication April 5, 2021, date of current version April 20, 2021. Digital Object Identifier 10.1109/ACCESS.2021.3070851

# Bitline Charge Sharing Suppressed Bitline and Cell Supply Collapse Assists for Energy-Efficient 6T SRAM

# KIRYONG KIM, TAE WOO OH, (Student Member, IEEE), AND SEONG-OOK JUNG, (Senior Member, IEEE)

School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, South Korea

Corresponding author: Seong-Ook Jung (sjung@yonsei.ac.kr)

This work was supported by the National Research Foundation of Korea (NRF) Grant by the Korean Government through MSIT under Grant 2017R1A2B2006679 and Grant 2021R1A2C2008297.

**ABSTRACT** This paper proposes a bitline charge sharing suppressed bitline read assist (BCS RA) and a cell supply collapse write assist (BCS WA). The proposed BCS RA suppresses the bitline (BL) voltage to half of the supply voltage ( $V_{DD}$ ) using the charge sharing BL precharger for improving read stability and energy efficiency. In the proposed BCS WA, the charge on cell  $V_{DD}$  ( $CV_{DD}$ ) is shared with that on the BL precharged to half- $V_{DD}$  by the charge sharing write driver, which causes the collapse in  $CV_{DD}$ . In cells with poor writability,  $CV_{DD}$  can be collapsed more by the self-collapse paths when the write operation is performed. Thus, the BCS WA improves writability and reduces write energy consumption. The simulation results using 22-nm FinFET technology show that static random access memory (SRAM) using BCS RA and WA consumes much less read and write energy than SRAMs using state-of-the-art assists while achieving a comparable minimum operating  $V_{DD}$  to SRAMs using state-of-the-art assists. Even compared to the SRAM without assists, the read and write energy consumption is reduced by 31% and 26%, respectively.

**INDEX TERMS** FinFET SRAM, read assist, write assist, V<sub>MIN</sub> improvement, energy-efficient read and write operations.

#### I. INTRODUCTION

Recently, as the demand for artificial intelligence and big data analysis in battery-powered mobile and wearable devices has exploded, low power consumption and cost have become important in system-on-chip (SoC). On the other hand, as device size and supply voltage ( $V_{DD}$ ) are continuously scaled down to improve the area and power consumption, a delicate and subtle SoC design is required to mitigate the effects of process variation. In static random access memory (SRAM), one of the main building blocks of SoC, the effects of process variation are more critical since SRAM is composed of small-sized devices.

Although an undoped or low-doped FinFET can be used to relieve the process variation [1], it is still difficult to achieve optimum read stability and writability yields due to the quantized channel width determined by the number of fins [2], [3]. In particular, the single-fin 6T SRAM design shown

The associate editor coordinating the review of this manuscript and approving it for publication was Woorham Bae.



FIGURE 1. (a) 6T SRAM cell and (b) bit-interleaved SRAM array.

in Fig. 1(a), where the fin number of each device is 1 for high density, makes it harder to achieve high read stability and writability yield simultaneously. In addition, as $V_{DD}$ decreases, SRAM operation becomes more vulnerable to the threshold voltage ( $V_{TH}$ ) variation. Thus, the minimum operating $V_{DD}$ ( $V_{MIN}$ ) in a single- $V_{DD}$ SoC can be limited by SRAM. To improve $V_{MIN}$ in the smallest-sized SRAM, various read assist (RA) and write assist (WA) techniques have been proposed.

| Parameter                      | NMOS                           | PMOS                             |
|--------------------------------|--------------------------------|----------------------------------|
| Gate length                    | 34 nm                          | 34 nm                            |
| Equivalent oxide thickness     | 0.9 nm                         | 0.9 nm                           |
| Fin thickness / height         | 8 nm / 34 nm                   | 8 nm / 34 nm                     |
| On / off current per µm        | 880 µA / 1 nA                  | 780 µA / 1 nA                    |
| Sub-threshold swing            | 69 mV/dec                      | 72 mV/dec                        |
| DIBL                           | 45.3 mV/V                      | 50.7 mV/V                        |
| Threshold voltage $(V_{th})^a$ | 281 mV (Sat.)<br>318 mV (Lin.) | -296 mV (Sat.)<br>-334 mV (Lin.) |

**TABLE 1.** Technology Parameters for  $V_{DD} = 0.8 V$ .

$^aV_{th}$ in the saturation and linear models is measured as $V_{GS}$ when the drain current per effective width is $10^{-5}$ A/µm with $|V_{DS}| = V_{DD}$ and $|V_{DS}| = 0.05$ V, respectively.

The wordline underdrive (WLUD) [4], [5], [7]–[13] or suppressed bitline (SBL) [5], [6] RA is widely used to improve the stability of the selected cells and row half-selected cells shown in Fig. 1(b). To improve writability, the negative bitline (NBL) [5], [7], [10], [12], [14]–[16], transient cell supply collapse (TVC) [4], [8], [13]–[17], transient cell ground bump (TGB) [18]–[20], or wordline overdrive (WLOD) [9], [12], [21], [22] WA is used. Although these RA and WA techniques reduce $V_{MIN}$, reducing the energy consumed in the assist circuit is still challenging. The WA techniques used in [18] and [20] share the charge on the floated cell ground ( $CV_{SS}$ ) with that on the floated cell supply ( $CV_{DD}$ ) in the selected column and that on the bitline (BL) in the unselected column, respectively. They can reduce the energy required to raise the voltage of $CV_{SS}$ during the write operation. However, they degrade read stability and speed because of the foot switch between $CV_{SS}$ and $V_{SS}$. In [17], the gate-modulated self-collapse (GSC) WA is adopted to reduce the energy consumption. However, its $V_{MIN}$ improvement is much smaller than that of the conventional TVC WA.

In this paper, a novel BL charge sharing RA (BCS RA) and WA (BCS WA) are proposed to improve read stability, writability, and energy efficiency for the single-fin 6T SRAM. By the BCS RA, the BL pair is precharged to half- $V_{DD}$  with sharing charges between the BL pair. By the BCS WA, the floated  $CV_{DD}$  is shared with that of the BL precharged to half- $V_{DD}$  to collapse  $CV_{DD}$ . Through the charge sharing operation used in the BCS RA and WA, the energy consumption of the read and write operations can be reduced.

The rest of this paper is organized as follows. In Section II, the simulation setup and various assist techniques are presented. In Sections III and IV, the proposed BCS RA and WA are described in detail. In Section V, the simulation results, including  $V_{MIN}$  and energy consumption, are compared with those of conventional assists. Finally, the conclusions to this paper are given in Section VI.

#### **II. BACKGROUND**

#### A. SIMULATION SETUP

All simulation results in this work were obtained from HSPICE Monte Carlo simulations using a BSIM-CMG model fitted to the experimental results of 22-nm FinFET



FIGURE 2. Layout of single-fin 6T SRAM cell.

[1], whose parameters are detailed in Table 1. To obtain yields considering the dynamic operating behavior of SRAM with a smaller number of simulation samples, mean-shift importance sampling [23], in which means of  $V_{th}$  of transistors in SRAM are shifted to produce more failure samples, is utilized. The  $V_{th}$  variation of a transistor is assumed to follow a Gaussian distribution whose standard deviation of  $V_{th}$  ( $\sigma_{Vth}$ ) with length = L and width = W is defined by [24]

$$\sigma_{\rm Vth} = \frac{A_{\rm Vt}}{\sqrt{L \times W}} \tag{1}$$

where $A_{Vt}$ is the slope of the Pelgrom plot and is set to $1.8 \text{ mV} \cdot \mu \text{m}$ [25]. The $V_{th}$ shifts ( $\Delta V_{th}$ ) at the various process corners are assumed to be between 35 and 50 mV, depending on the corner [16]. The gate-to-source (or drain) capacitance and junction capacitance are set to $0.252 \text{ fF}/\mu \text{m}$ [26] and $0.72 \text{ fF}/\mu \text{m}^2$ [27], respectively. In addition, the interconnect capacitance and resistance are set to $0.16 \text{ fF}/\mu \text{m}$ and 21 ohm/ $\mu \text{m}$ [28], respectively. The number of cells per BL and wordline (WL) is assumed to be 256 and 128, respectively. The layout of the single-fin 6T SRAM in Fig. 2, drawn with the 22-nm FinFET rule [29], [30], is used to calculate the interconnect capacitance and resistance of the SRAM array.
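As a numerical illustration of Equation (1), the following sketch evaluates $\sigma_{Vth}$ for the Table 1 dimensions. The effective-width formula (twice the fin height plus the fin thickness per fin) is a common FinFET convention assumed here and is not stated in the text; the function names are illustrative.

```python
import math

A_VT_MV_UM = 1.8          # Pelgrom slope from the text, in mV*um
GATE_LENGTH_UM = 0.034    # Table 1: 34 nm
FIN_HEIGHT_UM = 0.034     # Table 1: 34 nm
FIN_THICKNESS_UM = 0.008  # Table 1: 8 nm


def effective_width_um(n_fins: int = 1) -> float:
    """Assumed effective width: n_fins * (2 * fin height + fin thickness)."""
    return n_fins * (2.0 * FIN_HEIGHT_UM + FIN_THICKNESS_UM)


def sigma_vth_mv(length_um: float, width_um: float,
                 a_vt_mv_um: float = A_VT_MV_UM) -> float:
    """Equation (1): sigma_Vth = A_Vt / sqrt(L * W), returned in mV."""
    return a_vt_mv_um / math.sqrt(length_um * width_um)


if __name__ == "__main__":
    for n in (1, 2, 3):
        w = effective_width_um(n)
        s = sigma_vth_mv(GATE_LENGTH_UM, w)
        print(f"{n} fin(s): W_eff = {w * 1e3:.0f} nm, sigma_Vth = {s:.1f} mV")
```

Under this width assumption, adding fins lowers $\sigma_{Vth}$ only as the inverse square root of the fin count, which is one way to see why the single-fin cell is the most variation-sensitive case.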

To evaluate the DC characteristics of the single-fin 6T SRAM, the read static noise margin (RSNM) and WL write trip voltage (WWTV) are calculated using a 5000-point Monte Carlo simulation at various corners, as shown in Fig. 3(a). The RSNM and WWTV normalized by their respective standard deviations are the smallest at the hot temperature (85°C) and fast NMOS slow PMOS (FS) corner and at the cold temperature  $(-25^{\circ}C)$  and slow NMOS fast PMOS (SF) corner, respectively. Thus, the dynamic analysis of the read stability and writability yields in the subsequent sections is performed at the hot temperature (85°C) and FS corner and at the cold temperature  $(-25^{\circ}C)$  and SF corner, respectively. Fig. 3(b) shows the retention yield according to $V_{DD}$; the retention voltage is determined to be 400 mV, the minimum voltage that satisfies the target yield of  $6\sigma (10^{-9})$ [31]. Thus, a $V_{DD}$ above 400 mV is used in the simulations in the rest of the paper.
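The need for mean-shift importance sampling at a $6\sigma$ target can be illustrated with a one-dimensional toy problem. The sketch below estimates the tail probability of a standard normal variable beyond $6\sigma$ by sampling from a shifted Gaussian and reweighting with the likelihood ratio. It is not the SRAM yield flow of [23]; the function names, shift value, and sample count are assumptions chosen for illustration.

```python
import math
import random


def tail_probability_exact(k_sigma: float) -> float:
    """P(Z > k) for a standard normal Z."""
    return 0.5 * math.erfc(k_sigma / math.sqrt(2.0))


def tail_probability_mean_shift(k_sigma: float, shift: float,
                                n_samples: int, seed: int = 0) -> float:
    """Estimate P(Z > k) by drawing from N(shift, 1) and reweighting."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(shift, 1.0)  # sample from the mean-shifted density
        if x > k_sigma:            # counts as a "failure" sample
            # Likelihood ratio phi(x) / phi(x - shift) of original to shifted density.
            total += math.exp(-shift * x + 0.5 * shift * shift)
    return total / n_samples


if __name__ == "__main__":
    exact = tail_probability_exact(6.0)
    estimate = tail_probability_mean_shift(6.0, shift=6.0, n_samples=20000)
    print(f"exact    P(Z > 6) = {exact:.3e}")
    print(f"estimate P(Z > 6) = {estimate:.3e}")
```

A plain Monte Carlo run would need on the order of $10^{9}$ samples or more to observe even a few such events, whereas the shifted sampler places roughly half of its samples in the failure region.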

# B. EFFECTS OF WLUD AND SBL RA

The read stability is improved by reducing the voltage bump at the storage node (SN) that stores "0" (e.g., SNL in Fig. 1(a)) during the read operation in the selected cell. The voltage bump can be represented by a voltage division between the left pass-gate (PGL) and left pull-down device



FIGURE 3. (a) RSNM, WWTV with 5000-point Monte Carlo simulation at  $V_{DD}$  of 800 mV at various corners and (b) retention yield according to  $V_{DD}$ s.



FIGURE 4. Failure rate of (a) read stability and (b) writability depending on  $V_{WL}$  and precharged  $V_{BL}$  at  $V_{DD}$  of 800 mV.

(PDL). Thus, the voltage bump can be reduced by the WLUD or SBL RA, where the strength ratio of PGL to PDL or the input voltage (precharged BL voltage,  $V_{BL}$ ) of the voltage divider is reduced, respectively.
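The voltage-division view above can be sketched with a first-order resistive model, in which PGL and PDL are replaced by fixed on-resistances. This is a deliberately crude assumption: real devices are nonlinear, the resistance values below are hypothetical, and the model cannot capture second-order effects such as the reverse stability fail.

```python
def read_bump_v(v_bl: float, r_pg_ohm: float, r_pd_ohm: float) -> float:
    """Voltage at the '0' storage node for a resistive PG/PD divider fed by V_BL."""
    return v_bl * r_pd_ohm / (r_pg_ohm + r_pd_ohm)


V_DD = 0.8       # supply used in Fig. 4
R_PG = 10e3      # hypothetical pass-gate on-resistance (ohm)
R_PD = 5e3       # hypothetical pull-down on-resistance (ohm)

if __name__ == "__main__":
    baseline = read_bump_v(V_DD, R_PG, R_PD)
    # WLUD: a lower WL voltage weakens PG, modeled here as a doubled resistance.
    wlud = read_bump_v(V_DD, 2.0 * R_PG, R_PD)
    # SBL: the divider input (precharged BL voltage) is lowered to half-V_DD.
    sbl = read_bump_v(0.5 * V_DD, R_PG, R_PD)
    print(f"baseline bump: {baseline:.3f} V")
    print(f"WLUD bump:     {wlud:.3f} V")
    print(f"SBL bump:      {sbl:.3f} V")
```

Both assists lower the bump in this toy model, but the model is too simple to rank them; the relative effectiveness has to come from the transistor-level simulations.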

Fig. 4(a) shows the read stability failure rate depending on the WL voltage ( $V_{WL}$ ) and precharged $V_{BL}$ at the hot temperature and FS corner, in which $V_{WL}$ and $V_{BL}$ are expressed as percentages of $V_{DD}$. The WLUD RA improves the read stability more effectively than the SBL RA since the WLUD RA reduces the $V_{GS}$ of PG while the SBL RA reduces the $V_{DS}$ of PG. In addition, when $V_{WL}$ is $V_{DD}$, the read stability improves as $V_{BL}$ is lowered from 100% of $V_{DD}$ and then starts to degrade as $V_{BL}$ reaches 75% of $V_{DD}$ because of the reverse stability fail [6]. An excessively small $V_{BL}$ causes current to flow from the right SN (SNR) storing "1" to the right BL (BLR), leading to the discharge in



**IEEE**Access

FIGURE 5. (a) SWL technique and (b) various WAs (NBL, WLOD, TVC, and TGB) for writability improvement.

SNR. The discharged SNR can weakly turn on the left pull-up device (PUL) and thus increase the voltage bump at the left SN (SNL), resulting in the reverse stability fail. Thus, the SBL RA has an optimum $V_{BL}$ that achieves the maximum read stability. However, the WLUD RA impedes the reverse stability fail in the SBL RA by decreasing the current flowing from SNR to BLR. Thus, when the SBL and WLUD RAs are used simultaneously, the optimum $V_{BL}$ for achieving the maximum read stability becomes smaller at a smaller $V_{WL}$. Then, the read stability is improved more than when only the WLUD RA is used.

When the WLUD and SBL RAs are used during write operations to improve the stability of the row half-selected cell (RHSC), they can affect the writability. Fig. 4(b) shows the writability failure rate depending on $V_{BL}$ and $V_{WL}$ at the cold temperature and SF corner. The WLUD RA significantly reduces the current flowing through PG due to the decreased $V_{GS}$ of PG. Thus, the WLUD RA degrades the writability much more than the SBL RA. When the SBL RA is used, the inverter-type write driver (WD) can resolve the writability degradation by charging the suppressed BL to $V_{DD}$.

#### C. SRAM WRITABILITY IMPROVEMENT

The WLUD RA is essential to satisfy the target read stability failure rate of  $10^{-9}$ . However, the WLUD RA degrades the writability significantly. To mitigate the writability degradation caused by the WLUD RA, the stepped WL (SWL) technique is widely used [5], [7], [21], [22]. The SWL technique also mitigates the read speed degradation caused by the WLUD RA. In the SWL technique, as shown in Fig. 5(a), the WLUD is disabled after a certain time (T<sub>UD</sub>) at which the BL is discharged enough to preserve read or RHSC stability. To further improve writability, various WAs are required. The NBL, TVC, TGB, and WLOD WAs increase the strength ratio



FIGURE 6. (a) Schematic of conventional BP, BP with discharger [5], and (b) schematic and operation sequence of proposed CSBP.

of PG to PU to flip the data easily, as shown in Fig. 5(b). The NBL WA decreases  $V_{BL}$  below ground voltage (GND) through capacitive coupling to increase the strength of PG. The WLOD WA is used to increase the strength of PG by boosting WL above  $V_{DD}$  through capacitive coupling. The TVC and TGB WAs reduce the strength of PU by decreasing the source and gate voltages of PU, respectively.

However, the NBL WA is vulnerable to leaky cells at a hot temperature or low- $V_{th}$  corner [16]. Moreover, the additional capacitor is required in every WD for capacitive coupling. In the WLOD WA, an RHSC stability issue must be resolved by using a multi-step WL technique [21], [22]. In addition, the NBL and WLOD WAs have a gate oxide reliability concern. The TVC and TGB WAs require control circuits to avoid the stability issue in the column half-selected cells, which causes an area overhead due to the capacitor [8] or static power consumption resulting from the bias circuit [4]. Moreover, the TGB WA using the foot switch degrades read and RHSC stability by incurring a higher voltage bump at the SN due to the increased resistance of the pull-down path in the cell.



FIGURE 7. Read operations of (a) conventional BP, (b) BP with discharger [5], and (c) proposed CSBP.

#### **III. PROPOSED BCS RA**

The conventional SBL RA [6] is implemented by reducing the precharged V<sub>BL</sub> with the conventional bitline precharger (BP) using an additional  $V_{DD}$  separated from the  $V_{DD}$  of the SRAM array. The separated V<sub>DD</sub> is generated by a voltage regulator, which induces a complicated design by the bias circuit, amplifier, and multiple-V<sub>DD</sub> layout. In another conventional SBL RA [5], a BL discharger is added to the BP to suppress the BL pair, as shown in Fig. 6(a). The BL pair is firstly precharged to V<sub>DD</sub> by the BP and then discharged to a certain voltage by the discharger. When the BL discharger is used, the BL is significantly discharged because the BL is discharged by not only the BL discharger but also the SN. Then, the discharged BL should be precharged to full- $V_{DD}$ , leading to high energy consumption. In this paper, to implement the energy-efficient SBL RA with a single-V<sub>DD</sub>, the BCS RA using a charge sharing BL precharger (CSBP) is proposed, as shown in Fig. 6(b). The CSBP consists of a PMOS head switch (PH), an NMOS foot switch (NF), a cross-coupled inverter (P0, P1, N0, and N1), and a PMOS equalizing switch (PE). The PH, NF, and PE are turned on or off by PC, /PC, and EQ signals, respectively. The detailed operation sequence is described in the following subsections.

#### A. READ OPERATION

Fig. 7 shows the successive read operations in the SRAMs using the conventional BL precharger (BP), BP with discharger [5], and proposed CSBP. During the read operation, one BL of the BL pair is discharged by the data in the cell after WL assertion so that a voltage difference between the BL pair is developed. When the conventional BP is used, both left BL (BLL) and BLR are precharged to  $V_{DD}$  for the next operation. In the SRAM using the BP with discharger, BL is discharged before WL assertion. After WL falls, both BLL and BLR are precharged to  $V_{DD}$ .

On the other hand, when the proposed CSBP is used, one BL of the BL pair near the SN storing "1" is precharged to  $V_{DD}$  and the other is predischarged to GND by turning on PH and NF in the CSBP after the read operation. A BL



**FIGURE 8.** (a) $V_{WL,R6\sigma}$ according to $V_{DD}$s, (b) transient operations using 1000-point Monte Carlo simulation, and (c) required $T_{UD}$ maintaining target stability using WLUD according to $V_{DD}$s for precharged $V_{BL}$ of $V_{DD}$ or half-$V_{DD}$.

with a higher voltage than the other is charged to  $V_{DD}$  and the other BL is discharged to GND by the cross-coupled inverter. The read operation is performed in the order of the equalize, operation, and precharge phases. In the equalize phase, PH and NF are turned off by raising the PC signal while PE is turned on by lowering the EQ signal. Then, the BL pair is floated, and their charges are shared with each other so that the voltage of the shared BL pair becomes half-V<sub>DD</sub>. During the equalize phase, the energy consumption in charging BL from GND to half-V<sub>DD</sub> can be reduced due to charge sharing between the floated BL pair. In the operation phase, PE is turned off and WL rises to perform the read operation. Finally, in the precharge phase, the crosscoupled inverter in the CSBP precharges one BL and predischarges the other BL to prepare the next read, write, or hold operation.
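The equalize phase can be illustrated with ideal charge conservation between two floating capacitors. The sketch below assumes no leakage and an ideal equalizing switch; the function name `share_charge` and the 20 fF bitline capacitance are illustrative assumptions, not values from this work.

```python
def share_charge(c1_f: float, v1: float, c2_f: float, v2: float) -> float:
    """Common voltage after two floating capacitors are shorted together."""
    return (c1_f * v1 + c2_f * v2) / (c1_f + c2_f)


C_BL_F = 20e-15  # hypothetical bitline capacitance in farads

if __name__ == "__main__":
    # One BL sits at V_DD and the other at GND before the equalize phase.
    for v_dd in (0.5, 0.65, 0.8):
        v_eq = share_charge(C_BL_F, v_dd, C_BL_F, 0.0)
        print(f"V_DD = {v_dd:.2f} V -> equalized BL pair at {v_eq:.3f} V")
    # A 10% capacitance mismatch between the two BLs shifts the result slightly.
    v_mis = share_charge(1.1 * C_BL_F, 0.8, C_BL_F, 0.0)
    print(f"mismatched pair at V_DD = 0.80 V -> {v_mis:.3f} V")
```

With matched capacitances the pair settles at exactly half-$V_{DD}$ without drawing charge from the supply, which is the source of the energy saving described above.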

Fig. 8(a) shows the $V_{WL}$ required to achieve the target read stability ( $V_{WL,R6\sigma}$ ), which is below 85% of $V_{DD}$ for all $V_{DD}$s, when BL is precharged to $V_{DD}$ or half-$V_{DD}$. Even when BL is precharged to half-$V_{DD}$, the reverse stability fail is mitigated because $V_{WL}$ is reduced below 85% of $V_{DD}$. The read stability failure rate with the BL precharged to half-$V_{DD}$ is smaller than that with the BL precharged to $V_{DD}$, as shown in Fig. 4(a). Thus, $V_{WL,R6\sigma}$ with the BL precharged to half-$V_{DD}$ is slightly larger than that with the BL precharged to $V_{DD}$. In addition, the $V_{BL}$ precharged to half-$V_{DD}$ reaches GND rapidly during the read operation. The $V_{BL}$ distribution with the BL precharged to half-$V_{DD}$ narrows more rapidly than that with the BL precharged to $V_{DD}$, as shown in Fig. 8(b). When using the SWL technique, a short $T_{UD}$ can be achieved with BL precharged to half-$V_{DD}$, because the rapid BL discharge to GND reduces the voltage bump at the SN storing "0" and the narrow $V_{BL}$ distribution reduces the probability of a cell data flip caused by the voltage bump. Thus, when the precharged $V_{BL}$ is half-$V_{DD}$, the $T_{UD}$ required to maintain the target read stability achieved by $V_{WL,R6\sigma}$ ( $T_{UD,R6\sigma}$ ) is shorter at all $V_{DD}$s than that when the precharged $V_{BL}$ is $V_{DD}$, as shown in Fig. 8(c).

The short  $T_{UD}$  is helpful to reduce  $T_{WL}$ , which ensures  $6\sigma$  sensing yield ( $T_{WL,S6\sigma}$ ) during the read operation. The



FIGURE 9. T<sub>WL,S6 $\sigma$ </sub> when BL pair is precharged to V<sub>DD</sub> or half-V<sub>DD</sub>.

 $T_{WL,S6\sigma}$  is obtained by

$$\text{Sensing yield } (\sigma) = \frac{\mu_{\Delta V_{BL}(T_{WL})} - \mu_{V_{OS}}}{\sqrt{\sigma_{\Delta V_{BL}(T_{WL})}^2 + \sigma_{V_{OS}}^2}} = 6 \tag{2}$$

where $\mu_{\Delta V_{BL}(T_{WL})}$ and $\sigma_{\Delta V_{BL}(T_{WL})}$ are the mean and standard deviation of the voltage difference between the BL pair at $T_{WL}$ after WL is asserted, and $\mu_{V_{OS}}$ and $\sigma_{V_{OS}}$ are the mean and standard deviation of the offset voltage in a sense amplifier. The offset voltage in the sense amplifier ( $V_{OS}$ ) is assumed to follow a Gaussian distribution [32] whose $\mu_{V_{OS}}$ is 0 mV [33] and $\sigma_{V_{OS}}$ is 30 mV [34]. Fig. 9 shows the $T_{WL,S6\sigma}$ when the BL is precharged to $V_{DD}$ or half-$V_{DD}$. When $V_{DD}$ is smaller, a much longer $T_{WL,S6\sigma}$ is required because BL development is slow and the $V_{BL}$ distribution is wide due to the large effect of $V_{th}$ variation at a low $V_{DD}$. When the BL is precharged to half-$V_{DD}$, a narrower $V_{BL}$ distribution and shorter $T_{UD}$ can be achieved, leading to a shorter $T_{WL,S6\sigma}$ at all $V_{DD}$s than that when BL is precharged to $V_{DD}$.
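Equation (2) can be evaluated directly once the BL-difference statistics are known. The sketch below uses the offset parameters given in the text (0 mV mean, 30 mV standard deviation); the $\sigma_{\Delta V_{BL}}$ values in the demo loop are hypothetical inputs, and the function names are illustrative.

```python
import math


def sensing_yield_sigma(mu_dvbl_mv: float, sigma_dvbl_mv: float,
                        mu_vos_mv: float = 0.0,
                        sigma_vos_mv: float = 30.0) -> float:
    """Equation (2): sensing yield expressed in units of sigma."""
    return (mu_dvbl_mv - mu_vos_mv) / math.sqrt(sigma_dvbl_mv ** 2 + sigma_vos_mv ** 2)


def required_mean_dvbl_mv(sigma_dvbl_mv: float, target_sigma: float = 6.0,
                          mu_vos_mv: float = 0.0,
                          sigma_vos_mv: float = 30.0) -> float:
    """Mean BL difference needed to reach the target yield, from Equation (2)."""
    return mu_vos_mv + target_sigma * math.sqrt(sigma_dvbl_mv ** 2 + sigma_vos_mv ** 2)


if __name__ == "__main__":
    # Hypothetical spreads of the BL voltage difference, in mV.
    for sigma_dvbl in (0.0, 20.0, 40.0):
        need = required_mean_dvbl_mv(sigma_dvbl)
        print(f"sigma_dVBL = {sigma_dvbl:4.0f} mV -> required mean dVBL = {need:.0f} mV")
```

The required mean difference grows with the spread of the BL voltage, which is consistent with the longer $T_{WL,S6\sigma}$ observed when the $V_{BL}$ distribution widens at low $V_{DD}$.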

Fig. 10 shows the read energy consumption in a column and  $T_{WL,S6\sigma}$  when the conventional BP, proposed CSBP, or BP with discharger [5] is used with the  $V_{WL,R6\sigma}$  and  $T_{UD,R6\sigma}$  required at each  $V_{DD}$  to ensure  $6\sigma$  read stability yield. The read energy consumption consists of the energy consumption in  $CV_{DD}$ , BP, and controlling signals. Most of the read energy is consumed to precharge BL after WL falls. Thus, the read



**FIGURE 10.** Read energy consumption in a column and $T_{WL,S6\sigma}$ using conventional BP, proposed CSBP, or BP with discharger [5] at various $V_{DD}$s while achieving $6\sigma$ read stability yield.

energy consumption can be represented by

$$E_{read} = V_{DD} \int_{0}^{T_{period}} I_{VDD}(t)\, dt \tag{3}$$

$$\approx V_{DD} \int_{V_{BL}(T_{WL,S6\sigma})}^{V_{DD}} C_{BL}\, dV$$

$$= C_{BL} V_{DD} \times \left(V_{DD} - V_{BL0}(T_{WL,S6\sigma})\right) \approx C_{BL} V_{DD} \times \Delta V_{BL}(T_{WL,S6\sigma}) \tag{4}$$

in the conventional SRAM

$$= C_{BL} V_{DD} \times \left(V_{DD} - V_{BL1}(T_{WL,S6\sigma})\right) \approx C_{BL} V_{DD} \times \tfrac{1}{2} V_{DD} \tag{5}$$

in the proposed BCS RA

where $E_{read}$ is the read energy consumption, $T_{period}$ is the period for a single read operation, and $I_{VDD}$ is the current flowing from $V_{DD}$. $V_{BL}(T_{WL,S6\sigma})$ is the remaining $V_{BL}$ at $T_{WL,S6\sigma}$ after WL is asserted, from which the BL must be precharged to $V_{DD}$ during the precharge phase. $C_{BL}$ is the capacitance of BL. $V_{BL0}$ and $V_{BL1}$ are the voltages of the BLs that are near the SN storing "0" and "1," respectively. $\Delta V_{BL}$ is the voltage difference of the BL pair. In the conventional SRAM using the conventional BP, the read energy consumption is approximately determined by the product of $C_{BL}$, $V_{DD}$, and $V_{DD} - V_{BL0}(T_{WL,S6\sigma})$, as shown in Equation (4), because most of the energy is consumed in precharging $V_{BL}$ to $V_{DD}$ during the precharge phase. $V_{DD} - V_{BL0}(T_{WL,S6\sigma})$ can be approximated by $\Delta V_{BL}(T_{WL,S6\sigma})$. As $V_{DD}$ is scaled down, the $V_{BL}$ distribution becomes wider, and a larger $\Delta V_{BL}(T_{WL,S6\sigma})$ can be required to ensure a $6\sigma$ sensing yield, as shown in Equation (2). Thus, the larger $\Delta V_{BL}(T_{WL,S6\sigma})$ limits the energy saving obtained from the reduced $V_{DD}$.

On the other hand, when using the proposed CSBP, $V_{BL1}$ is precharged to $V_{DD}$ during the precharge phase. Since $V_{BL1}(T_{WL,S6\sigma})$ is almost half-$V_{DD}$, $V_{DD} - V_{BL1}(T_{WL,S6\sigma})$ becomes half-$V_{DD}$. Thus, the read energy consumption can



FIGURE 11. Successive write operation using (a) conventional BP and (b) CSBP.

be half of the product of  $C_{BL}$  and square of  $V_{DD}$ , as shown in Equation (5). At a  $V_{DD}$  of 900 mV, the energy consumption in the proposed CSBP is comparable to that in the conventional BP. As  $V_{DD}$  is scaled down, the energy consumption in the CSBP is reduced proportionally to the square of  $V_{DD}$ , while that in the conventional BP is reduced proportionally to the product of  $V_{DD}$  and  $\Delta V_{BL}(T_{WL,S6\sigma})$  at each  $V_{DD}$ . Thus, the energy consumption in the proposed CSBP is much lower than that in the conventional BP at low  $V_{DD}$ . When using the BP with discharger [5], the BL discharged by the discharger as well as the SN should be precharged to  $V_{DD}$  after the read operation. Thus, the voltage amount to be precharged is very large although  $T_{WL,S6\sigma}$  is improved by the BL precharged to half- $V_{DD}$ . As a result, the read energy consumption is much larger at all  $V_{DD}$ s than that when the others are used.
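The scaling argument of Equations (4) and (5) can be checked numerically. In the sketch below, the bitline capacitance and the $\Delta V_{BL}(T_{WL,S6\sigma})$ values are hypothetical placeholders rather than simulation results from this work; only the two closed-form expressions come from the text.

```python
def read_energy_conventional_j(c_bl_f: float, v_dd: float, delta_v_bl: float) -> float:
    """Equation (4): E_read ~ C_BL * V_DD * dV_BL(T_WL,S6sigma)."""
    return c_bl_f * v_dd * delta_v_bl


def read_energy_bcs_j(c_bl_f: float, v_dd: float) -> float:
    """Equation (5): E_read ~ C_BL * V_DD * (V_DD / 2)."""
    return 0.5 * c_bl_f * v_dd ** 2


def break_even_delta_v_bl(v_dd: float) -> float:
    """dV_BL at which Equations (4) and (5) give the same energy."""
    return 0.5 * v_dd


C_BL_F = 20e-15  # hypothetical bitline capacitance in farads

if __name__ == "__main__":
    # Hypothetical (V_DD, required dV_BL) pairs, for illustration only.
    for v_dd, dv in ((0.9, 0.45), (0.7, 0.45), (0.5, 0.40)):
        e_conv = read_energy_conventional_j(C_BL_F, v_dd, dv)
        e_bcs = read_energy_bcs_j(C_BL_F, v_dd)
        print(f"V_DD = {v_dd:.2f} V: conventional {e_conv * 1e15:.2f} fJ, "
              f"BCS RA {e_bcs * 1e15:.2f} fJ")
```

Under these two approximations, the CSBP draws less precharge energy whenever the required $\Delta V_{BL}(T_{WL,S6\sigma})$ exceeds half of $V_{DD}$, which becomes more likely as $V_{DD}$ is scaled down.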

#### **B. WRITE OPERATION**

The BCS RA can reduce the energy consumption in precharging BL during the write operation compared to the conventional BP. Fig. 11 shows the successive write operations when the conventional BP or CSBP is used. The write operation using the BP with discharger [5] is the same as the conventional BP because the discharger is disabled for the selected column during the write operation.

During the write operation, most of the energy is consumed in charging BL to $V_{DD}$. Thus, the write energy consumption can be represented by

$$E_{write} = V_{DD} \int_{0}^{T_{period}} I_{VDD}(t)\, dt \tag{6}$$

$$\approx V_{DD} \int_0^{V_{DD}} C_{BL}\, dV \tag{7}$$

in the conventional SRAM

$$\approx V_{DD} \int_{V_{DD}/2}^{V_{DD}} C_{BL}\, dV \tag{8}$$

in the proposed BCS RA




FIGURE 12. Schematic and transient waveforms for (a) PP-TVC [13] and (b) GSC [17] WAs.

where $E_{write}$ is the write energy consumption. When using the conventional BP, one BL of the BL pair precharged to $V_{DD}$ is driven to GND by the WD. Then, the SN in the selected cell is written to "0" after WL assertion. After WL falls, the BL driven to GND is precharged to $V_{DD}$ by current supplied from $V_{DD}$ to the BL (blue-colored arrow in Fig. 11(a)). Thus, the write energy consumption can be expressed by Equation (7). On the other hand, when using the CSBP, one BL is charged from half-$V_{DD}$ to $V_{DD}$ and the other BL is discharged from half-$V_{DD}$ to GND by the WD after WL assertion. During the precharge phase after WL falls, the voltages of BLR(BLL) and BLL(BLR) are kept at $V_{DD}$ and GND, respectively, by the cross-coupled inverter in the CSBP. Then, during the equalize phase, the voltage of the BL pair is equalized to half-$V_{DD}$ through the charge sharing operation, which requires no current from $V_{DD}$ to charge the BL driven to GND up to half-$V_{DD}$ (red-colored arrow in Fig. 11(b)). Thus, charge is provided from $V_{DD}$ only when a BL is driven from half-$V_{DD}$ to $V_{DD}$ by the WD (blue-colored arrow in Fig. 11(b)). As a result, the CSBP achieves a lower write energy consumption in a selected column than the conventional BP, as shown in Equation (8).
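As a worked check of Equations (7) and (8), assuming that $C_{BL}$ is constant over the voltage swing (an idealization), the two integrals evaluate to

$$E_{write,conv} \approx V_{DD}\, C_{BL} \left(V_{DD} - 0\right) = C_{BL} V_{DD}^2, \qquad E_{write,BCS} \approx V_{DD}\, C_{BL} \left(V_{DD} - \tfrac{1}{2} V_{DD}\right) = \tfrac{1}{2} C_{BL} V_{DD}^2$$

where the subscripts conv and BCS are labels introduced here for the conventional BP and the CSBP, respectively. Under this idealization, the BL-charging term drawn from $V_{DD}$ is halved. The expressions cover only that term and do not include the energy consumed in the WD, $CV_{DD}$, or control signals.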

#### **IV. PROPOSED BCS WA**

Although the conventional TVC WA (pulsed PMOS TVC, PP-TVC WA [13]) improves writability, it significantly increases the energy consumption for charging the collapsed $CV_{DD}$ with large capacitance, as shown in Fig. 12(a). In addition, $CV_{DD}$ is collapsed without considering the condition of each cell. On the other hand, the GSC WA [17] mitigates the high energy consumption by collapsing $CV_{DD}$ depending on the cell condition. $CV_{DD}$ is self-collapsed by the current flowing through the PUL and PGL in the selected cell. $CV_{DD}$ is collapsed more in cells with poor writability (high-$\sigma$ cells) than in 0-$\sigma$ cells, as shown in Fig. 12(b). However, the collapsed level of $CV_{DD}$ in the GSC WA is limited by $T_{WL}$. Thus, the GSC WA improves $V_{MIN}$ less than the PP-TVC WA does. To improve writability and reduce energy consumption simultaneously, the BCS WA using a charge sharing write driver (CSWD) is proposed.

FIGURE 13. (a) Schematic and (b) transient waveforms for proposed BCS WA using CSWD and CSBP.

#### A. STRUCTURE AND OPERATION

The schematic of the proposed BCS WA using the CSWD is shown in Fig. 13(a). The CSWD consists of two PMOSs for connecting  $CV_{DD}$  to  $V_{DD}$  (PS) and charge sharing (PCS). The PS provides  $V_{DD}$  to column-multiplexed  $CV_{DD}$  and PCS is used to share the charge between  $CV_{DD}$  and the  $V_{DD}$  of



FIGURE 14. Energy consumption during write operation with no assist, PP-TVC WA, GSC WA, and BCS WA.

inverters ( $WV_{DD}$ ). The PS and PCS are controlled by charge sharing (CS) and /CS signals, respectively.

Fig. 13(b) shows the transient waveform of the BCS WA. Before WL rises, the BL pair is precharged to half-$V_{DD}$ by the CSBP. With WL assertion, CS rises and /CS falls to turn off PS and turn on PCS, respectively. Then, $CV_{DD}$ is floated and the charge on $CV_{DD}$ is shared with that on one BL of the BL pair, which is supposed to be driven from half-$V_{DD}$ to a high level (BLR in Fig. 13). The other BL (BLL) is discharged to GND. Since BLR is precharged to half-$V_{DD}$ before CS rises, $CV_{DD}$ decreases and $V_{BLR}$ increases until $CV_{DD}$ and $V_{BLR}$ become the same. Thus, $CV_{DD}$ is first collapsed by the charge sharing path (① in Fig. 13(a)). $CV_{DD}$ can be collapsed more by two self-collapse paths when the write operation is performed in the high-$\sigma$ cell. The first path (②-a in Fig. 13(a)) is made up of PUL and PGL, which is the same path used in the GSC WA. The second self-collapse path (②-b in Fig. 13(a)), consisting of PGR and PDR, also collapses $CV_{DD}$ since $CV_{DD}$ and BLR are connected by the charge sharing path. When the write operation is performed successfully, PUL and PDR are turned off by SNR and SNL, respectively, and then the self-collapse paths are cut off. Finally, WL and CS fall, the collapsed $CV_{DD}$ is charged to $V_{DD}$ by PS, and BLR is precharged to $V_{DD}$ by the CSBP.

In the BCS WA, the collapsed  $CV_{DD} (\Delta V_{CV_{DD}})$  is represented by

$$\Delta V_{CV_{DD}} = \Delta V_{CS} + \Delta V_{SC} \tag{9}$$

$$\Delta V_{CS} = V_{DD} - \frac{V_{DD} \left( C_{CV_{DD}} + \tfrac{1}{2} C_{BL} \right)}{C_{CV_{DD}} + C_{BL}} \tag{10}$$

$$\Delta V_{SC} = \frac{\int_0^{T_{WL}} \left( I_{PGL}(t) + I_{PGR}(t) \right) dt}{C_{CV_{DD}} + C_{BL}} \tag{11}$$

where  $\Delta V_{CS}$  and  $\Delta V_{SC}$  are the  $CV_{DD}$ s collapsed by the charge sharing and self-collapse paths, respectively.  $C_{CV_{DD}}$  and  $C_{BL}$  are the capacitances of  $CV_{DD}$  and BL. I<sub>PGL</sub> and I<sub>PGR</sub> are the currents flowing through the PGL and PGR during the write operation, respectively. The  $\Delta V_{CS}$  is determined by the


charge sharing between the floated $CV_{DD}$ and the BL precharged to half-$V_{DD}$ by the CSBP, as shown in Equation (10). When the CSWD is used with the conventional BP, $\Delta V_{CS}$ is 0 since the voltages of the precharged BL and $CV_{DD}$ are both $V_{DD}$. $\Delta V_{SC}$ is determined by the total current flowing through PGR and PGL during the write operation and the sum of $C_{CV_{DD}}$ and $C_{BL}$, as shown in Equation (11). The total current flowing through PGL and PGR during the write operation is larger in a higher-$\sigma$ cell. While the GSC WA has only a self-collapse path that consists of PGL or PGR, the BCS WA has a charge sharing path as well as two self-collapse paths, which causes $CV_{DD}$ to be collapsed more than with the GSC WA. Thus, the BCS WA can achieve a higher writability than the GSC WA.
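Equations (9) to (11) can be evaluated with a few lines of code. In the sketch below, the capacitances and pass-gate currents are hypothetical placeholders, and the optional `v_bl_pre` argument is an extension introduced here so that the conventional-BP case (BL precharged to $V_{DD}$) can be reproduced; with the default it reduces to Equation (10).

```python
from typing import Optional, Sequence


def delta_v_cs(v_dd: float, c_cvdd_f: float, c_bl_f: float,
               v_bl_pre: Optional[float] = None) -> float:
    """Equation (10): CV_DD drop from charge sharing with the precharged BL."""
    if v_bl_pre is None:
        v_bl_pre = 0.5 * v_dd  # BL precharged to half-V_DD by the CSBP
    v_shared = (c_cvdd_f * v_dd + c_bl_f * v_bl_pre) / (c_cvdd_f + c_bl_f)
    return v_dd - v_shared


def delta_v_sc(i_pgl_a: Sequence[float], i_pgr_a: Sequence[float], dt_s: float,
               c_cvdd_f: float, c_bl_f: float) -> float:
    """Equation (11): CV_DD drop from the self-collapse currents (rectangle rule)."""
    charge_c = sum((a + b) * dt_s for a, b in zip(i_pgl_a, i_pgr_a))
    return charge_c / (c_cvdd_f + c_bl_f)


def delta_v_cvdd(d_cs: float, d_sc: float) -> float:
    """Equation (9): total CV_DD collapse."""
    return d_cs + d_sc


if __name__ == "__main__":
    v_dd = 0.65                   # V_DD used for Fig. 14
    c_cvdd = c_bl = 20e-15        # hypothetical capacitances in farads
    currents = [1e-6] * 100       # hypothetical 1 uA per pass gate, 100 samples
    d_cs = delta_v_cs(v_dd, c_cvdd, c_bl)
    d_sc = delta_v_sc(currents, currents, 1e-11, c_cvdd, c_bl)  # 1 ns window
    print(f"dV_CS = {d_cs * 1e3:.1f} mV, dV_SC = {d_sc * 1e3:.1f} mV, "
          f"total = {delta_v_cvdd(d_cs, d_sc) * 1e3:.1f} mV")
```

Passing `v_bl_pre=v_dd` returns a charge sharing drop of zero, matching the statement that $\Delta V_{CS}$ vanishes when the CSWD is paired with the conventional BP.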

#### B. ENERGY EFFICIENCY OF PROPOSED BCS WA

The PP-TVC and GSC WAs consume energy in charging the collapsed $CV_{DD}$ to $V_{DD}$ and in precharging the BL from GND to $V_{DD}$ after the write operation. In the BCS WA, on the other hand, the BL driven to GND is not precharged but kept at GND after the write operation. The charge needed to raise this BL from GND to half-$V_{DD}$ is then provided by the other, floated BL, which was precharged to $V_{DD}$, during the equalize phase. In addition, the charge needed to raise the BL from half-$V_{DD}$ to a high level to write "1" in SN is provided by the floated $CV_{DD}$, as indicated by the red-colored arrows in Fig. 13(b). Thus, no current from $V_{DD}$ is required to charge the BL from GND to a high level to write "1" in SN. Compared to the PP-TVC and GSC WAs, the energy consumed in charging the BL from GND to a high level is saved, leading to low energy consumption in the BCS WA.
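As a rough cross-check of this argument, the supply energy per write can be estimated with an idealized capacitor model. This is a sketch rather than the analysis used in the paper: it assumes ideal switches, equal capacitance $C_{BL}$ on both BLs, no self-collapse current ($\Delta V_{SC} = 0$), and it ignores the control signals, the unselected columns, and any $CV_{DD}$ recharge in the conventional case. The symbols $E_{conv}$ and $E_{BCS}$ are introduced here only for illustration.

$$E_{conv} \approx C_{BL} V_{DD}^{2}$$

$$E_{BCS} \approx \left( C_{CV_{DD}} + C_{BL} \right) V_{DD} \, \Delta V_{CS} = \frac{1}{2} C_{BL} V_{DD}^{2}$$

Here $E_{conv}$ is the energy drawn from $V_{DD}$ to precharge the discharged BL from GND to $V_{DD}$, and $E_{BCS}$ is the energy drawn to restore $CV_{DD}$ and the high-side BL from the shared level back to $V_{DD}$, with $\Delta V_{CS}$ taken from Equation (10). In this model, equalizing one BL at $V_{DD}$ with the other at GND brings both to half-$V_{DD}$ without drawing supply charge. The simulated totals reported in the paper also include the components neglected here, so this ratio should not be read as the overall saving.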

Fig. 14 shows the write energy consumption in the selected column with no assist, PP-TVC, GSC, and BCS WAs at the same $V_{DD}$, $T_{WL}$, and $T_{UD}$ of 650 mV, 1 ns, and 0.55 ns, respectively. The energy consumption consists of the energy consumed in the BP, WD, CV<sub>DD</sub>, and control signals. In the proposed BCS WA, the energy consumed in charging CV<sub>DD</sub> is included in the energy consumption of the WD since $CV_{DD}$ is charged by the CSWD. The pulse width and level of the collapsed $CV_{DD}$ in the PP-TVC WA are set to achieve the target writability yield of $6\sigma$ and to preserve the stability yield of the column half-selected cell above $6\sigma$. The energy consumption of the PP-TVC WA is much higher than that of the GSC WA because the PP-TVC WA collapses the CV<sub>DD</sub> of all selected cells, while the GSC WA collapses CV<sub>DD</sub> depending on the cell condition. When the CSWD is used with the conventional BP (BCS WA with conventional BP in Fig. 14), CV<sub>DD</sub> can be collapsed more by the two self-collapse paths. Thus, the BCS WA with the conventional BP consumes slightly more energy to charge the more collapsed $CV_{DD}$ to $V_{DD}$ than the GSC WA, which has a single self-collapse path. On the other hand, the BCS WA with the CSBP has a higher writability due to the charge sharing path that collapses CV<sub>DD</sub> further, but it consumes more energy in the WD than the BCS WA with the conventional BP because the WD must recharge a more deeply collapsed $CV_{DD}$ to $V_{DD}$. However, the energy consumption in the BP of the BCS WA with the CSBP is much lower than that of the other WAs because the energy consumed in charging the BL from GND to a high level is saved, as mentioned previously. As a result, the total energy consumption of the BCS WA with the CSBP is lower than that of the other WAs and even that of the SRAM without WAs.

**FIGURE 15.** The simulated shmoo plots of (a) conventional and (b) proposed WAs, obtained by analyzing the writability yield with various $T_{WL}$s and $V_{DD}$s at a cold temperature ($-25^{\circ}C$) and SF corner.

# **V. COMPARISONS**

In this section, V<sub>MIN</sub>, delay, and energy consumption in SRAMs using the BCS RA and WA are analyzed and compared with those of the other assists. The target $6\sigma$ read stability yield is satisfied with $V_{WL,R6\sigma}$ and $T_{UD,R6\sigma}$ at various V<sub>DD</sub>s obtained in Fig. 8. Thus, V<sub>MIN</sub> is determined by the minimum voltage to ensure the target $6\sigma$ writability yield.
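The way V<sub>MIN</sub> is read off a yield sweep can be sketched as follows. The helper name `find_v_min` and all yield numbers are hypothetical and are not taken from Fig. 15. The sketch treats V<sub>MIN</sub> as the lowest simulated $V_{DD}$ for which that voltage and every higher simulated voltage meet the target yield.

```python
from typing import Dict, Optional


def find_v_min(sigma_by_vdd: Dict[float, float],
               target_sigma: float = 6.0) -> Optional[float]:
    """Lowest V_DD such that it and all higher simulated V_DDs meet the target."""
    v_min = None
    # Walk from the highest voltage downward and stop at the first failure.
    for vdd in sorted(sigma_by_vdd, reverse=True):
        if sigma_by_vdd[vdd] < target_sigma:
            break
        v_min = vdd
    return v_min


# Hypothetical writability yields (in sigma) at a single T_WL.
example_column = {0.50: 4.8, 0.55: 5.4, 0.60: 5.9, 0.65: 6.2, 0.70: 6.7}
v_min_example = find_v_min(example_column)
print(f"V_MIN = {v_min_example * 1e3:.0f} mV")
```

Repeating this for each $T_{WL}$ column of a shmoo plot gives one V<sub>MIN</sub> per word-line pulse width.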

To compare V<sub>MIN</sub>, the writability yields of the PP-TVC, GSC, and BCS WAs are simulated with various T<sub>WL</sub>s and V<sub>DD</sub>s at a cold temperature ($-25^{\circ}C$) and SF corner. Fig. 15 shows the simulated shmoo plots of the SRAM with no assists and with the PP-TVC, GSC, and proposed BCS WAs. Whereas the collapsed amount of CV<sub>DD</sub> in the GSC WA depends on T<sub>WL</sub>, the PP-TVC WA can adjust the pulse width and level of CV<sub>DD</sub>. Thus, the PP-TVC WA improves V<sub>MIN</sub> (e.g., 175-mV improvement at a T<sub>WL</sub> of 500 ps) more than the GSC WA (e.g., 100-mV improvement at a $T_{WL}$ of 500 ps). At a $T_{WL}$ of 500 ps, the $V_{MIN}$ of the BCS WA with the conventional BP is 25 mV smaller than that of the GSC WA due to the two self-collapse paths. Finally, the SRAM using the BCS WA with the CSBP improves $V_{MIN}$ by 175 mV due to the charge sharing path that collapses $CV_{DD}$ more. At a small $T_{WL}$, the proposed BCS WA has a higher $V_{MIN}$ than the PP-TVC WA. When $T_{WL}$ is longer, the proposed BCS WA can achieve an improvement in $V_{MIN}$ comparable to that of the PP-TVC WA because the total current flowing through the self-collapse paths becomes larger as $T_{WL}$ becomes longer.

**FIGURE 16.** The comparison of energy consumption during the write operation for the four columns with a 4:1 column multiplexed ratio.

Fig. 16 shows the energy consumption with no assist and with the PP-TVC, GSC, and proposed BCS WAs during the write operation in the four columns with a 4:1 multiplexed ratio at $V_{DD}$s of 700 mV and 600 mV and $T_{WL}$s of 1 ns and 500 ps, respectively. At a $V_{DD}$ of 700 mV, the PP-TVC WA, the GSC WA, and the BCS WA with the conventional BP consume more energy than the SRAM without WAs by 12%, 2%, and 3%, respectively. At a V<sub>DD</sub> of 600 mV, CV<sub>DD</sub> should be collapsed more to achieve the target writability yield. Thus, the PP-TVC WA, the GSC WA, and the BCS WA with the conventional BP consume 19%, 3%, and 4% more energy than the SRAM without WAs, respectively. Due to the self-collapse paths, the GSC WA and the BCS WA with the conventional BP consume much less energy than the PP-TVC WA and slightly more energy than the SRAM without WAs. While the GSC WA does not satisfy the target writability yield at a V<sub>DD</sub> of 600 mV, the BCS WA with the conventional BP satisfies it at the cost of a slightly higher energy consumption due to the two self-collapse paths. When the BCS WA is used with the CSBP, the energy consumption in both the selected and unselected columns is reduced due to the low energy consumed in precharging the BL after the write and dummy read operations, respectively. Even compared to the SRAM without WAs, the proposed BCS WA with the CSBP reduces the energy consumption by 28% and 26% at V<sub>DD</sub>s of 700 mV and 600 mV, respectively.
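The 700 mV percentages quoted above follow directly from the write energies listed in Table 2. The short check below is a sketch: the dictionary keys and the helper name `relative_change_percent` are chosen here for illustration, and only the energy values come from the paper.

```python
# Write energies at V_DD = 700 mV, in fJ, as listed in Table 2.
WRITE_ENERGY_FJ = {
    "w/o WA": 47.4,
    "PP-TVC": 53.3,
    "GSC": 48.3,
    "BCS w/ conv. BP": 48.7,
    "BCS w/ CSBP": 33.9,
}


def relative_change_percent(energy: float, baseline: float) -> float:
    """Signed change relative to the baseline, in percent."""
    return 100.0 * (energy - baseline) / baseline


baseline = WRITE_ENERGY_FJ["w/o WA"]
changes = {
    name: round(relative_change_percent(energy, baseline))
    for name, energy in WRITE_ENERGY_FJ.items()
    if name != "w/o WA"
}

for name, change in changes.items():
    print(f"{name:16s} {change:+d}%")
```

A positive value means the assist costs energy relative to the SRAM without WAs, and a negative value means it saves energy.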

Table 2 summarizes the comparison between the conventional SRAM and the SRAM with the proposed BCS RA and WA. The area is estimated using the layout rules in [29], [30]. The area of the BCS RA is 115.65 $\mu m^2$, whereas that of the conventional BP is 38.55 $\mu m^2$. Considering the 256 rows × 128 columns SRAM (4595.48 $\mu m^2$), the area overhead of the BCS RA is 1.7%. The area of the BCS WA is 80.96 $\mu m^2$, which induces a 1.8% overhead, leading to a total area overhead of 3.5% for the BCS RA and WA. Although the area overhead is slightly larger than those of the other assists, $T_{WL,S6\sigma}$ and the energy consumption for both read and write operations are significantly improved. Because the BL is precharged to half-$V_{DD}$, the proposed BCS RA reduces $T_{WL,S6\sigma}$ by 36% and 37% at V<sub>DD</sub>s of 700 mV and 600 mV, respectively, compared to the SRAM using the conventional BP. At the same time, the read energy consumption is reduced by 31% and 37% at V<sub>DD</sub>s of 700 mV and 600 mV, respectively, by reducing the energy consumed in precharging the BLs. Even though the conventional SBL RA using the BP with a discharger in [5] can achieve the same $T_{WL,S6\sigma}$ as the proposed BCS RA, its read energy consumption is increased by 69% and 55% at V<sub>DD</sub>s of 700 mV and 600 mV, respectively. In the BCS WA, the V<sub>MIN</sub> improvement is comparable to that in the PP-TVC WA. The PP-TVC and GSC WAs increase the write energy consumption by more than 12% and 2%, respectively, compared to the SRAM without WAs. On the other hand, the proposed BCS WA reduces the write energy consumption by more than 26% compared to the SRAM without WAs by reducing the energy consumed in charging the collapsed $CV_{DD}$ and the BL to $V_{DD}$.

TABLE 2. Summary of read and write assists comparison.

| RAs | w/o RA | Conv. SBL [5] | BCS |
|---|---|---|---|
| Area overhead | - | 1.5% | 1.7% |
| $T_{WL,S6\sigma}$ @ $V_{DD}$ = 700 mV | 700 ps (1×) | 450 ps (0.64×) | 450 ps (0.64×) |
| Read energy @ $V_{DD}$ = 700 mV | 41.5 fJ (1×) | 70.2 fJ (1.69×) | 28.8 fJ (0.69×) |
| $T_{WL,S6\sigma}$ @ $V_{DD}$ = 600 mV | 1350 ps (1×) | 850 ps (0.63×) | 850 ps (0.63×) |
| Read energy @ $V_{DD}$ = 600 mV | 33.6 fJ (1×) | 51.9 fJ (1.55×) | 21.2 fJ (0.63×) |

| WAs | w/o WA | PP-TVC [13] | GSC [17] | BCS w/ conv. BP | BCS w/ CSBP |
|---|---|---|---|---|---|
| Area overhead | - | 1.5% | 1.3% | 1.8% | 3.5% |
| V<sub>MIN</sub> improvement @ 500 ps | - | 175 mV | 100 mV | 125 mV | 175 mV |
| Write energy @ 700 mV | 47.4 fJ (1×) | 53.3 fJ (1.12×) | 48.3 fJ (1.02×) | 48.7 fJ (1.03×) | 33.9 fJ (0.72×) |
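The area overhead figures quoted in the text can be reproduced from the stated areas. The sketch below assumes, as an interpretation that the text does not spell out, that the RA overhead is the extra area of the BCS RA over the conventional BP divided by the array area, and that the WA overhead is the full BCS WA area divided by the array area. The names are illustrative.

```python
# Areas in square micrometers, as stated in the text.
SRAM_ARRAY_AREA_UM2 = 4595.48   # 256 rows x 128 columns
BCS_RA_AREA_UM2 = 115.65
CONV_BP_AREA_UM2 = 38.55
BCS_WA_AREA_UM2 = 80.96


def overhead_percent(extra_area: float, array_area: float) -> float:
    """Extra area as a percentage of the array area."""
    return 100.0 * extra_area / array_area


# Assumption: the BCS RA replaces the conventional BP, so only the
# difference between the two counts as overhead.
ra_overhead = overhead_percent(BCS_RA_AREA_UM2 - CONV_BP_AREA_UM2,
                               SRAM_ARRAY_AREA_UM2)
# Assumption: the whole BCS WA area counts as overhead.
wa_overhead = overhead_percent(BCS_WA_AREA_UM2, SRAM_ARRAY_AREA_UM2)

print(f"BCS RA overhead: {ra_overhead:.1f}%")
print(f"BCS WA overhead: {wa_overhead:.1f}%")
```

Under these assumptions the two values round to 1.7% and 1.8%. Their unrounded sum is roughly 3.4%, so the 3.5% total in the text corresponds to adding the two rounded figures.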

## **VI. CONCLUSION**

The BCS RA and WA were proposed to perform energy-efficient read and write operations while improving $V_{MIN}$. The BCS RA improved read stability and $T_{WL,S6\sigma}$ by precharging the BL to half-$V_{DD}$, and it reduced the read energy consumption by lowering the energy consumed in precharging the BL through the charge sharing operation. The proposed BCS WA improved the writability by collapsing $CV_{DD}$ using the charge sharing path and two self-collapse paths. In the BCS WA, the energy consumed in charging the BL to a high level during the write operation was saved due to the charge sharing operation by the CSBP and CSWD. In addition, the average energy consumed in charging $CV_{DD}$ in the BCS WA was reduced because $CV_{DD}$ was collapsed less at a smaller-$\sigma$ cell by the two self-collapse paths. As a result, the SRAM using the BCS RA and WA consumed less energy than the SRAMs using the state-of-the-art assists while achieving a $V_{MIN}$ comparable to those of SRAMs using the state-of-the-art assists. Even compared to SRAMs without assists, the SRAM with the BCS RA and WA reduced the energy consumption by more than 31% and 26% during read and write operations, respectively.

#### ACKNOWLEDGMENT

The EDA Tool was supported by the IC Design Education Center.

#### REFERENCES

- [1] C. Auth, C. Allen, A. Blattner, D. Bergstrom, M. Brazier, M. Bost, M. Buehler, V. Chikarmane, T. Ghani, T. Glassman, and R. Grover, "A 22 nm high performance and low-power CMOS technology featuring fully-depleted tri-gate transistors, self-aligned contacts and high density MIM capacitors," in *Proc. IEEE Symp. VLSI Technol.*, Jun. 2012, pp. 131–132.
- [2] C.-H. Lin, W. Haensch, P. Oldiges, H. Wang, R. Williams, J. Chang, M. Guillorn, A. Bryant, T. Yamashita, T. Standaert, and H. Bu, "Modeling of width quantization induced variations in logic FinFETs for 22 nm and beyond," in *Proc. IEEE Symp. VLSI Technol.*, Jun. 2011, pp. 16–17.
- [3] W.-K. Yeh, W. Zhang, Y.-L. Yang, A.-N. Dai, K. Wu, T.-H. Chou, C.-L. Lin, K.-J. Gan, C.-H. Shih, and P.-Y. Chen, "The observation of width quantization impact on device performance and reliability for high-κ/metal tri-gate FinFET," *IEEE Trans. Device Mater. Rel.*, vol. 16, no. 4, pp. 610–616, Dec. 2016.
- [4] E. Karl, Y. Wang, Y.-G. Ng, Z. Guo, F. Hamzaoglu, U. Bhattacharya, K. Zhang, K. Mistry, and M. Bohr, "A 4.6GHz 162Mb SRAM design in 22nm tri-gate CMOS technology with integrated active V<sub>MIN</sub>-enhancing assist circuitry," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2012, pp. 230–232.
- [5] T. Song, W. Rim, J. Jung, G. Yang, J. Park, S. Park, Y. Kim, K. H. Baek, S. Baek, S. K. Oh, and J. Jung, "A 14 nm FinFET 128 Mb SRAM with V<sub>MIN</sub> enhancement techniques for low-power applications," *IEEE J. Solid-State Circuits*, vol. 50, no. 1, pp. 158–169, Jan. 2015.
- [6] H. Pilo, I. Arsovski, K. Batson, G. Braceras, J. Gabric, R. Houle, S. Lamphier, C. Radens, and A. Seferagic, "A 64 Mb SRAM in 32 nm high-κ metal-gate SOI technology with 0.7 V operation enabled by stability, write-ability and read-ability enhancements," *IEEE J. Solid-State Circuits*, vol. 47, no. 1, pp. 97–106, Jan. 2012.
- [7] J. Chang, Y.-H. Chen, H. Cheng, W.-M. Chan, H.-J. Liao, Q. Li, S. Chang, S. Natarajan, R. Lee, P.-W. Wang, S.-S. Lin, C.-C. Wu, K.-L. Cheng, M. Cao, and G. H. Chang, "A 20nm 112Mb SRAM in high-κ metal-gate with assist circuitry for low-leakage and low-V<sub>MIN</sub> applications," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2013, pp. 316–317.
- [8] E. Karl, Z. Guo, J. W. Conary, J. L. Miller, Y.-G. Ng, S. Nalam, D. Kim, J. Keane, U. Bhattacharya, and K. Zhang, "17.1 A 0.6 V 1.5GHz 84Mb SRAM design in 14nm FinFET CMOS technology," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 310–312.
- [9] T. Song, W. Rim, S. Park, Y. Kim, J. Jung, G. Yang, S. Baek, J. Choi, B. Kwon, Y. Lee, S. Kim, G. Kim, H.-S. Won, J.-H. Ku, S. S. Paak, E. Jung, S. S. Park, and K. Kim, "17.1 A 10nm FinFET 128Mb SRAM with assist adjustment system for power, performance, and area optimization," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Jan. 2016, pp. 306–307.
- [10] T. Song et al., "A 7nm FinFET SRAM macro using EUV lithography for peripheral repair analysis," *IEEE J. Solid-State Circuits*, vol. 52, no. 1, pp. 240–249, Jan. 2017.

- [11] I. Lee, H. Jeong, S. Baeck, S. Gupta, C. Park, D. Seo, J. Choi, J. Kim, H. Kim, J. Kang, S. Jang, D. Moon, S. Han, T. Kim, J. Lim, Y. Park, H. Hwang, J. Kang, J. Choi, and T. Song, "24.3 A voltage and temperature tracking SRAM assist supporting 740 mV dual-rail offset for low-power and high-performance applications in 7nm EUV FinFET technology," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 392–394.
- [12] T. Song, J. Jung, W. Rim, H. Kim, Y. Kim, C. Park, J. Do, S. Park, S. Cho, H. Jung, B. Kwon, H.-S. Choi, J. Choi, and J. S. Yoon, "A 7nm FinFET SRAM using EUV lithography with dual write-driver-assist circuitry for low-voltage applications," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2018, pp. 198–200.
- [13] Z. Guo, D. Kim, S. Nalam, J. Wiedemer, X. Wang, and E. Karl, "A 23.6Mb/mm<sup>2</sup> SRAM in 10nm FinFET technology with pulsed PMOS TVC and stepped-WL for low-voltage applications," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2018, pp. 196–198.
- [14] J. Chang, Y.-H. Chen, G. Chan, H. Cheng, P.-S. Wang, Y. Lin, H. Fujiwara, R. Lee, H.-J. Liao, P.-W. Wang, G. Yeap, and Q. Li, "15.1 A 5nm 135Mb SRAM in EUV and high-mobility-channel FinFET technology with metal coupling and charge-sharing write-assist circuitry schemes for high-density and low-V<sub>MIN</sub> applications," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2020, pp. 238–240.
- [15] Y. Chen, W. M. Chan, W. C. Wu, H. J. Liao, K. H. Pan, J. J. Liaw, T. H. Chung, Q. Li, C. Y. Lin, M. C. Chiang, and S. Y. Wu, "A 16 nm 128 Mb SRAM in high-κ metal-gate FinFET technology with write-assist circuitry for low-V<sub>MIN</sub> applications," *IEEE J. Solid-State Circuits*, vol. 50, no. 1, pp. 170–177, Jan. 2015.
- [16] E. Karl, Z. Guo, Y.-G. Ng, J. Keane, U. Bhattacharya, and K. Zhang, "The impact of assist-circuit design for 22nm SRAM and beyond," in *IEDM Tech. Dig.*, Dec. 2012, pp. 25.1.1–25.1.4.
- [17] Z. Guo, J. Wiedemer, Y. Kim, P. S. Ramamoorthy, P. B. Sathyaprasad, S. Shridharan, D. Kim, and E. Karl, "A 10nm SRAM design using gate-modulated self-collapse write assist enabling 175 mV V<sub>MIN</sub> reduction with negligible power overhead," in *Proc. IEEE Symp. VLSI Circuits*, Jun. 2020, pp. 1–2.
- [18] K. Kim, H. Jeong, J. Park, and S.-O. Jung, "Transient cell supply voltage collapse write assist using charge redistribution," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 63, no. 10, pp. 964–968, Oct. 2016.
- [19] M.-F. Chang, C.-F. Chen, T.-H. Chang, C.-C. Shuai, Y.-Y. Wang, and H. Yamauchi, "17.3 A 28nm 256kb 6T-SRAM with 280 mV improvement in V<sub>MIN</sub> using a dual-split-control assist scheme," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 314–315.
- [20] H. Jeong, S. H. Oh, T. W. Oh, H. Kim, C. N. Park, W. Rim, T. Song, and S. O. Jung, "Bitline charge-recycling SRAM write assist circuitry for V<sub>MIN</sub> improvement and energy saving," *IEEE J. Solid-State Circuits*, vol. 54, no. 3, pp. 896–906, Mar. 2019.
- [21] K. Takeda, T. Saito, S. Asayama, Y. Aimoto, H. Kobatake, S. Ito, T. Takahashi, M. Nomura, K. Takeuchi, and Y. Hayashi, "Multi-step word-line control technology in hierarchical cell architecture for scaled-down high-density SRAMs," *IEEE J. Solid-State Circuits*, vol. 46, no. 4, pp. 806–816, Apr. 2011.
- [22] M. Bhargava, Y. Chong, V. Schuppe, B. Maiti, M. Kinkade, H.-Y. Chen, A. W. Chen, S. Mangal, J. Wiatrowski, G. Gouya, A. Baradia, S. Thyagarajan, and G. Yeung, "Low V<sub>MIN</sub> 20nm embedded SRAM with multi-voltage wordline control based read and write assist techniques," in *Proc. Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2014, pp. 1–2.
- [23] K. Katayama, S. Hagiwara, H. Tsutsui, H. Ochi, and T. Sato, "Sequential importance sampling for low-probability and high-dimensional SRAM yield analysis," in *Proc. IEEE/ACM Int. Conf. Comput.-Aided Design* (*ICCAD*), Nov. 2010, pp. 703–708.
- [24] M. J. M. Pelgrom, A. C. J. Duinmaijer, and A. P. G. Welbers, "Matching properties of MOS transistors," *IEEE J. Solid-State Circuits*, vol. 24, no. 5, pp. 1433–1439, Oct. 1989.
- [25] H. Kawasaki *et al.*, "Challenges and solutions of FinFET integration in an SRAM cell and a logic circuit for 22 nm node and beyond," in *IEDM Tech. Dig.*, Dec. 2009, pp. 1–4.
- [26] M. Guillorn et al., "FinFET performance advantage at 22nm: An AC perspective," in Proc. Symp. VLSI Technol., Jun. 2008, pp. 12–13.
- [27] T. Chiarella, L. Witters, A. Mercha, C. Kerner, M. Rakowski, C. Ortolland, L.-Å. Ragnarsson, B. Parvais, A. De Keersgieter, S. Kubicek, A. Redolfi, C. Vrancken, S. Brus, A. Lauwers, P. Absil, S. Biesemans, and T. Hoffmann, "Benchmarking SOI and bulk FinFET alternatives for PLANAR CMOS scaling succession," *Solid-State Electron.*, vol. 54, no. 9, pp. 855–860, Sep. 2010.

- [28] D. Ingerly et al., "Low-K interconnect stack with metal-insulator-metal capacitors for 22nm high volume manufacturing," in Proc. IEEE Int. Interconnect Technol. Conf., Jun. 2012, pp. 1–3.
- [29] H. Kawasaki *et al.*, "Demonstration of highly scaled FinFET SRAM cells with high-κ/metal gate and investigation of characteristic variability for the 32 nm node and beyond," in *IEDM Tech. Dig.*, Dec. 2008, pp. 237–240.
- [30] R. W. Mann and B. H. Calhoun, "New category of ultra-thin notchless 6T SRAM cell layout topologies for sub-22nm," in *Proc. 12th Int. Symp. Qual. Electron. Design*, Mar. 2011, pp. 425–430.
- [31] H. Jeong, T. Kim, Y. Yang, T. Song, G. Kim, H.-S. Won, and S.-O. Jung, "Offset-compensated cross-coupled PFET bit-line conditioning and selective negative bit-line write assist for high-density low-power SRAM," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 62, no. 4, pp. 1062–1070, Apr. 2015.
- [32] H. Nho, S. S. Yoon, S. Wong, and S. O. Jung, "Statistical simulation methodology for sub-100-nm memory design," *Electron. Lett.*, vol. 43, no. 16, pp. 869–870, Aug. 2007.
- [33] B. Wicht, T. Nirschl, and D. Schmitt-Landsiedel, "Yield and speed optimization of a latch-type voltage sense amplifier," *IEEE J. Solid-State Circuits*, vol. 39, no. 7, pp. 1148–1158, Jul. 2004.
- [34] S.-H. Woo, H. Kang, K. Park, and S.-O. Jung, "Offset voltage estimation model for latch-type sense amplifiers," *IET Circuits, Devices Syst.*, vol. 4, no. 6, pp. 503–513, Nov. 2010.



**KIRYONG KIM** was born in Yeongdong, Republic of Korea, in 1989. He received the B.S. degree in electrical and electronic engineering from Yonsei University, Seoul, Republic of Korea, in 2014, where he is currently pursuing the Ph.D. degree in electrical and electronic engineering. His research interests include low-power and high-speed SRAM and compute-in-memory.



**TAE WOO OH** (Student Member, IEEE) was born in Seoul, Republic of Korea, in 1992. He received the B.S. degree in electrical and electronic engineering from Yonsei University, Seoul, in 2015, where he is currently pursuing the Ph.D. degree in electrical and electronic engineering. His research interests include low-power and high-speed SRAM and next-generation semiconductor devices.



**SEONG-OOK JUNG** (Senior Member, IEEE) received the B.S. and M.S. degrees in electrical and electronic engineering from Yonsei University, Seoul, South Korea, in 1987 and 1989, respectively, and the Ph.D. degree in electrical engineering from the University of Illinois at Urbana–Champaign, Urbana, IL, USA, in 2002.

From 1989 to 1998, he was affiliated with Samsung Electronics Company Ltd., Hwasung, South Korea, where he was involved in specialty memories, such as video, graphic, window RAM, and merged memory logic. From 2001 to 2003, he was affiliated with T-RAM Inc., Mountain View, CA, USA, where he was the Leader of the Thyristor-Based Memory Circuit Design Team. From 2003 to 2006, he was affiliated with Qualcomm Inc., San Diego, CA, USA, where he was involved in high-performance low-power embedded memories, process variation-tolerant circuit design, and low-power circuit techniques. Since 2006, he has been a Professor with Yonsei University. His current research interests include process variation-tolerant, low-power, mixed-mode circuit design, and next-generation memory and technology.

...