

Received April 6, 2021, accepted April 21, 2021, date of publication April 26, 2021, date of current version May 4, 2021. *Digital Object Identifier* 10.1109/ACCESS.2021.3075460

# Differential Read/Write 7T SRAM With Bit-Interleaved Structure for Near-Threshold Operation

# JI SANG OH<sup>1</sup>, (Member, IEEE), JUHYUN PARK<sup>1,2</sup>, (Member, IEEE), KEONHEE CHO<sup>1</sup>, (Member, IEEE), TAE WOO OH<sup>1</sup>, (Member, IEEE), AND SEONG-OOK JUNG<sup>1</sup>, (Senior Member, IEEE)

<sup>1</sup>School of Electrical and Electronics Engineering, Yonsei University, Seoul 03722, South Korea
<sup>2</sup>DRAM Development Division, SK Hynix Inc., Icheon-si 467866, South Korea

Corresponding author: Seong-Ook Jung (sjung@yonsei.ac.kr)

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) under Grant 2021R1A2C2008297.

**ABSTRACT** Near-threshold voltage ( $V_{th}$ ) operation is an effective method for lowering energy consumption. However, it significantly increases the impact of $V_{th}$ variation, which makes it difficult for previously proposed static random access memory (SRAM) bitcells to achieve high read stability and write ability yields. To achieve both in the near-$V_{th}$ region, a differential 7T SRAM bitcell is proposed in which an additional row-based control signal and an nMOS transistor between the pull-up and pull-down transistors are added on one side of the cross-coupled inverters. In addition, the proposed SRAM bitcell can be used in a bit-interleaved structure without the half-select issue. Compared to the differential 10T and 12T SRAMs, the proposed differential 7T SRAM achieves 5% and 6% higher operating frequency and 70% and 23% lower operation energy consumption, respectively, with 33% and 49% smaller bitcell area.

**INDEX TERMS** 7T bitcell, half-select issue, low energy consumption, near-threshold voltage, static random access memory (SRAM).

#### I. INTRODUCTION

Recently, the demand for low-energy systems on chip (SoCs) for Internet of Things (IoT) applications, biomedical implants, and energy-harvesting devices has increased significantly. The most effective way to reduce energy consumption is to lower the supply voltage ( $V_{DD}$ ), because $V_{DD}$ has a quadratic impact on dynamic energy and an exponential impact on standby energy. However, lowering $V_{DD}$ into the sub-threshold voltage (sub-$V_{th}$) region degrades circuit performance by at least three orders of magnitude relative to the super-$V_{th}$ region, whereas in the near-$V_{th}$ region the degradation is only one to two orders of magnitude. Thus, energy consumption and circuit performance can be appropriately balanced in the near-$V_{th}$ region [1], [2].
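As a rough illustration of the quadratic dependence, the sketch below compares the dynamic switching energy $C V_{DD}^2$ of the same node at 0.8 V (the voltage used to characterize the device model in Table 1) and at the 0.5 V near-$V_{th}$ operating point used in the simulations. The 10 fF capacitance is a hypothetical placeholder, and leakage and short-circuit energy are ignored, so this is only a first-order sketch.

```python
def dynamic_energy(c_farad: float, vdd: float) -> float:
    """Energy (J) drawn from the supply to charge c_farad from 0 V to vdd once.

    Half of it is stored on the capacitor and half is dissipated in the pull-up;
    the stored half is dissipated on the following discharge.
    """
    return c_farad * vdd ** 2


C_NODE = 10e-15  # hypothetical 10 fF switched capacitance

e_nominal = dynamic_energy(C_NODE, 0.8)   # V_DD of the Table 1 characterization
e_near_vth = dynamic_energy(C_NODE, 0.5)  # near-V_th operating point
ratio = e_near_vth / e_nominal            # equals (0.5 / 0.8) ** 2, independent of C_NODE

print(f"E(0.8 V) = {e_nominal * 1e15:.2f} fJ")
print(f"E(0.5 V) = {e_near_vth * 1e15:.2f} fJ")
print(f"ratio    = {ratio:.4f}")
```

Because the capacitance cancels in the ratio, each switching event at 0.5 V costs about 39% of the energy it costs at 0.8 V, before any change in leakage is considered.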

However, near-$V_{th}$ operation raises several problems. First, the critical charge required to cause a soft error decreases, which makes the SRAM bitcell more vulnerable to soft errors [3]. In the non-bit-interleaved structure shown in Fig. 1(a), a single particle strike that upsets several adjacent bitcells causes a multi-bit error in one word; correcting it requires many parity bits and a complex error correction code (ECC) circuit, which incurs a large area penalty and high energy consumption. Therefore, to prevent multi-bit errors in a word, the bit-interleaved structure shown in Fig. 1(b) [4] needs to be used.
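The benefit of the bit-interleaved structure in Fig. 1(b) can be made concrete with a small sketch. It assumes one common column-mapping convention, in which bit $b$ of word $w$ in a row is placed in column $b \cdot k + w$ for $k$:1 interleaving; this mapping and the helper names are illustrative assumptions, not the layout used in this work. Under it, a burst upset covering up to $k$ physically adjacent cells flips at most one bit per word, which a single-error-correcting code can repair.

```python
from collections import Counter


def physical_column(word: int, bit: int, interleave: int) -> int:
    """Column of `bit` of logical `word` in one row of a k:1 bit-interleaved array (assumed mapping)."""
    return bit * interleave + word


def bits_flipped_per_word(upset_columns, interleave: int) -> Counter:
    """Count how many bits of each logical word are hit by an upset covering these columns."""
    # Inverse of physical_column: word = column % k, bit = column // k.
    return Counter(col % interleave for col in upset_columns)


burst = [4, 5]  # a particle strike upsetting two physically adjacent cells
print(bits_flipped_per_word(burst, interleave=1))  # non-interleaved: both bits land in one word
print(bits_flipped_per_word(burst, interleave=2))  # 2:1 interleaved: one bit in each of two words
```

With `interleave=1` the two upset cells belong to the same word (a 2-bit error), whereas with 2:1 interleaving each word sees only a single-bit error.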

The other problem is that the circuit operation yield degrades because of the large influence of  $V_{th}$  variation. Among the various modules in an SoC, the yield of the static random access memory (SRAM) is significantly degraded by  $V_{th}$  variation because the SRAM bitcell consists of small transistors for achieving high-density integration. In addition, because the conventional six-transistor (6T) SRAM bitcell has a trade-off between the read stability and write ability yields, the target yield cannot be achieved in the read and write operations simultaneously in the near- $V_{th}$  region [5].

Various SRAM bitcell structures have been proposed to eliminate the trade-off between the read stability and write ability yields. However, these bitcells have drawbacks such as the half-select issue, large area overhead, high energy consumption, and performance degradation.

The associate editor coordinating the review of this manuscript and approving it for publication was Khursheed Aurangzeb.



FIGURE 1. The (a) non-bit-interleaved and (b) 2:1 bit-interleaved SRAM structures.

In this paper, a differential 7T SRAM bitcell is proposed to address these issues by inserting a properly controlled nMOS transistor between the pull-up and pull-down transistors on one side of a conventional 6T SRAM bitcell. The rest of this paper is organized as follows. Section II reviews previously proposed SRAM bitcells, and Section III describes the structure and operation of the proposed differential 7T SRAM bitcell. Section IV presents simulation results, and Section V concludes the paper.

#### II. PREVIOUSLY PROPOSED SRAM BITCELLS

Various SRAM bitcells have been proposed for operation in the near- $V_{th}$  region. They can be classified as single-ended and differential SRAM bitcells depending on how the stored data is sensed (Fig. 2: single-ended 7T-1 [6], 7T-2 [7], and 8T [8] SRAM bitcells and Fig. 3: differential 10T [10], 12T [11], and P-P-N 10T [12] SRAM bitcells).

#### A. SINGLE-ENDED SRAM BITCELLS

The 7T-1 SRAM bitcell has improved read stability achieved by cutting off the positive feedback in the cross-coupled inverters, while the single-ended 7T-2 and 8T SRAM bitcells have improved read stability due to decoupling of the data node from the additional read bitline (RBL).

However, during the write operation, the row half-selected bitcells (RHSCs) of the 7T-1, 7T-2, and 8T SRAM bitcells suffer from read disturbance. This can be resolved using the write-back scheme [9], but that scheme requires a sensing circuit and a write driver in each column, which incurs a substantially larger area penalty and write energy overhead.

Single-ended SRAM bitcells have the advantage of lower read energy because the RBL is discharged with a probability of only 50%, depending on the stored data. However, they have a long read delay because the inverter sense amplifier requires a large voltage swing for data sensing. Therefore, the BL discharge energy they save can be canceled out by the increase in static energy over the long read delay.

#### B. DIFFERENTIAL SRAM BITCELLS

Differential 10T, 12T, and P-P-N 10T SRAM bitcells without the write-back scheme have been proposed to overcome the half-select issue. A pass gate controlled by the column-based write word-line (WWL) is added to the differential 10T and 12T SRAM bitcells, while pseudo-data nodes pQ and pQb are used in the P-P-N 10T SRAM bitcell to prevent data loss in the RHSCs. These bitcells have a shorter read delay than single-ended SRAM bitcells because the differential voltage latch sense amplifier (VLSA) needs only a small voltage swing for data sensing. However, the increased number of transistors incurs a large bitcell area. Moreover, these bitcells have high read energy consumption due to the high BL capacitance caused by the increased bitcell height. Therefore, a new SRAM bitcell is required that resolves the half-select issue with a short delay, a small area, and low energy consumption.

FIGURE 2. Single-ended (a) 8T, (b) 7T-1, and (c) 7T-2 SRAM bitcells.

FIGURE 3. (a) Differential 10T, (b) 12T, and (c) P-P-N 10T SRAM bitcells.

**FIGURE 4.** (a) Structure of the proposed differential 7T SRAM bitcell and waveforms during the (b) read "0" and (c) read "1" operations.

#### III. PROPOSED DIFFERENTIAL 7T SRAM BITCELL

Fig. 4(a) shows the structure of the proposed differential 7T SRAM bitcell. Compared to the conventional 6T SRAM bitcell, it has an additional nMOS transistor (PMR) driven by a word-line right bar (WLRB) signal; both the word-line (WL) and WLRB are row-based signals. Prior to the read and write operations, WL, WLRB, and BL/BLB are set to $V_{SS}$, $V_{DD}$, and $V_{DD}$, respectively.

#### A. READ OPERATION

Fig. 4(b) and 4(c) show the read "0" and read "1" operations and their waveforms, respectively. Before the read operation starts in the proposed differential 7T SRAM bitcell, BL and BLB are precharged to $V_{DD}$, as in the conventional 6T SRAM bitcell. The read operation then begins by raising WL and lowering WLRB. Accordingly, BL or BLB is discharged when the Q node stores "0" (read "0") or "1" (read "1"), respectively. In the conventional 6T SRAM bitcell, the charge injected from BL or BLB can flip the stored data during the read operation, which degrades the read stability. In the proposed differential 7T SRAM bitcell, this risk is mitigated by turning off the PMR (WLRB = $V_{SS}$) during the read operation, which improves the read stability for both read "0" and read "1" operations. For a read "0" operation, the read current flows through PGL and PDL, which raises the Q node voltage (disturbance from BL). However, this cannot flip the Qb node because its pull-down path is disconnected by the turned-off PMR. Therefore, when the read operation finishes, the Q node voltage fully recovers to "0." In contrast, for a read "1" operation, the read current flows through PGR and PDR, which raises the M node voltage (disturbance from BLB). This disturbance cannot affect the Qb node because the turned-off PMR isolates it from the M node. Although the Qb node floats, its voltage does not rise high enough to flip the Q node, because it stays below $V_{SS}$ owing to the coupling caused by the WLRB transition from $V_{DD}$ to $V_{SS}$. Therefore, the proposed differential 7T SRAM bitcell can tolerate read disturbance.

FIGURE 5. The selected bitcell and RHSC during the write "1" operation.

**FIGURE 6.** Waveforms in the RHSCs of the proposed differential 7T SRAM bitcell with Q storing "0" during a write operation according to $T_{W\_P1}$: the data in the RHSCs (a) flips with a short $T_{W\_P1}$ but (b) is stably maintained with a sufficiently long $T_{W\_P1}$.

**FIGURE 7.** The write "1" operation in the proposed differential 7T SRAM bitcell according to $T_{W\_P2}$: the write operation (a) fails when $T_{W\_P2}$ is too short but (b) succeeds when $T_{W\_P2}$ is sufficiently long.

FIGURE 8. (a) WL/WLRB generator and (b) its waveforms in the proposed differential 7T SRAM, including a replica BL.

#### B. WRITE OPERATION

The write operation is classified into write "0" ($V_{BL} = V_{SS}$ and $V_{BLB} = V_{DD}$) and write "1" ($V_{BL} = V_{DD}$ and $V_{BLB} = V_{SS}$) operations. A write operation starts when WL rises and WLRB falls. Although the proposed 7T SRAM bitcell has a differential BL pair, it initially performs a single-ended write operation because the turned-off PMR transistor disconnects the Qb node from BLB. A single-ended write operation exhibits a lower write ability yield than a differential write operation, particularly when the Q node must change from "0" to "1" during a write "1" operation, because the nMOS pass gate PGL cannot deliver a full "1" to the Q node. To avoid relying on a single-ended write, WLRB is reset to $V_{DD}$ after a delay of $T_{W\_P1}$ from enabling WL. $T_{W\_P1}$ must be chosen carefully because the WLRB signal also affects the RHSCs, as shown in Figs. 5 and 6. If $T_{W\_P1}$ is too short, the data in the RHSCs can flip because the high BL or BLB voltage in the unselected columns causes a large disturbance, as shown in Fig. 6(a). Thus, $T_{W\_P1}$ should be long enough for the BL or BLB of the unselected columns to discharge before WLRB is reset, which results in a smaller disturbance, as shown in Fig. 6(b).

Another timing constraint is $T_{W\_P2}$, the interval during which WL and WLRB are both high after WLRB is reset. If $T_{W\_P2}$ is too short, the write operation fails because there is insufficient time to flip the data, as shown in Fig. 7(a). Thus, WL should be lowered only after a sufficiently long $T_{W\_P2}$ following the WLRB reset, as shown in Fig. 7(b).

As $T_{W\_P1}$ ($T_{W\_P2}$) increases, the read stability yield of the RHSCs (the write ability yield of the selected bitcells) improves, but the write delay also increases. This trade-off must therefore be considered when the SRAM operating frequency is determined.


#### C. WL/WLRB GENERATOR

Fig. 8(a) and 8(b) show the WL/WLRB generator and its waveforms, respectively. A replica BL and delay lines are commonly used to generate the sense amplifier enable (SAE) signal and to disable the WL enable signal ($WL_{EN}$). To generate $T_{W\_P1}$ and $T_{W\_P2}$ for the proposed bitcell, an even number ($N_{TWP1}$) and an odd number ($N_{TWP2}$) of delay-line stages are used, respectively. Several of the $N_{TWP1}$ delay-line stages are shared with the SAE generation path.

FIGURE 9. Layout for (a) a single bitcell and (b) the 2 $\times$ 2 array in the proposed differential 7T SRAM based on 22-nm FinFET technology.

The replica BL node ($BL_{Rep}$) is precharged to $V_{DD}$ before WL is enabled. After WL is enabled, $BL_{Rep}$ is discharged. During a write operation (WT = 1), $OUT_{NTWP1}$ subsequently falls, which resets WLRB to $V_{DD}$. In contrast, during a read operation (WT = 0), $OUT_{NTWP1}$ always stays at $V_{DD}$ regardless of $BL_{Rep}$, so WLRB is not reset before WL is disabled. In this manner, unnecessary toggling of the internal nodes is avoided during read operations, which reduces power consumption.
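The WL/WLRB sequence described above can be summarized with a behavioral sketch. It replaces the replica-BL and delay-line timing with fixed, hypothetical delays (the class `WordlineTiming` and function `wl_wlrb` are illustrative names, not part of the design) and returns logic levels (1 = $V_{DD}$, 0 = $V_{SS}$) for a write (WT = 1) and a read (WT = 0).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class WordlineTiming:
    """Hypothetical fixed delays in ns; the real generator derives them from BL_Rep and delay lines."""
    t_wl_on: float  # WL rising edge after decoding
    t_w_p1: float   # write: WLRB held low after WL rises (T_W_P1)
    t_w_p2: float   # write: WL and WLRB both high before WL falls (T_W_P2)
    t_read: float   # read: WL pulse width


def wl_wlrb(t: float, timing: WordlineTiming, wt: int) -> Tuple[int, int]:
    """Return (WL, WLRB) at time t; idle state is WL = 0, WLRB = 1."""
    start = timing.t_wl_on
    if wt:
        # Write: OUT_NTWP1 falls and resets WLRB after T_W_P1; WL falls T_W_P2 later.
        t_wlrb_reset = start + timing.t_w_p1
        t_wl_off = t_wlrb_reset + timing.t_w_p2
        wl = int(start <= t < t_wl_off)
        wlrb = int(not (start <= t < t_wlrb_reset))
    else:
        # Read: OUT_NTWP1 stays high, so WLRB stays low for the whole WL pulse.
        t_wl_off = start + timing.t_read
        wl = int(start <= t < t_wl_off)
        wlrb = 1 - wl
    return wl, wlrb


timing = WordlineTiming(t_wl_on=0.5, t_w_p1=1.0, t_w_p2=0.5, t_read=1.5)
for t in (0.2, 1.0, 1.7, 2.1):
    print(t, "write:", wl_wlrb(t, timing, wt=1), "read:", wl_wlrb(t, timing, wt=0))
```

At t = 1.7 ns the write sequence is in its second phase (WL = WLRB = 1, differential write), while the read sequence still has WLRB low, matching the behavior that WLRB is not reset during reads.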

#### D. BITCELL LAYOUT

Fig. 9(a) and 9(b) show the layouts of the proposed differential 7T SRAM bitcell and a 2 $\times$ 2 array, respectively, based on 22-nm FinFET technology. Middle-of-line local interconnects (L1 and L2) are used to reduce the number of metal layers [13]. A dummy poly gate is added to maintain a regular poly gate pitch. BL/BLB, $V_{DD}$, and $V_{SS}$ are routed in metal 2, WL and WLRB are routed in metal 3, and the internal routing of the bitcell uses metal 1.

#### IV. SIMULATION RESULTS AND COMPARISON

The proposed differential 7T SRAM bitcell was verified via an HSPICE Monte Carlo simulation using a 22-nm BSIM-CMG FinFET model [14]. The characteristics of this model were fitted to those of a commercial low-power device model based on the measured data for 22-nm FinFET silicon [15]. Moreover, the parasitic capacitances were fitted to the TCAD simulation results for the 22-nm FinFET [16]. The model characteristics are listed in Table 1. It was assumed that the  $V_{th}$  variation in each transistor follows a Gaussian distribution [17] with a standard deviation ( $\sigma_{Vth}$ ) expressed as

$$\sigma_{Vth} = \frac{A_{Vt}}{\sqrt{Length \times Effective \ Width}} \tag{1}$$

where an $A_{Vt}$ of 1.8 mV$\cdot\mu$m was used for the FinFET with a lightly doped channel [18] to account for random dopant fluctuation, work function variation, fin width variation, and gate length variation. The effective width is the sum of the fin thickness and twice the fin height.

#### **TABLE 1.** Model characteristics for $V_{DD} = 0.8$ V.

| Parameter                   | nMOS       | pMOS       |
|-----------------------------|------------|------------|
| Gate length                 | 34 nm      | 34 nm      |
| Equivalent oxide thickness  | 0.9 nm     | 0.9 nm     |
| Fin thickness               | 8 nm       | 8 nm       |
| Fin height                  | 34 nm      | 34 nm      |
| On-current                  | 880 µA/µm  | 780 µA/µm  |
| Off-current                 | 1 nA/µm    | 1 nA/µm    |
| Sub-threshold swing         | 69 mV/dec  | 72 mV/dec  |
| DIBL                        | 46 mV/V    | 50 mV/V    |
| $V_{th}$ (saturation)       | 230 mV     | 245 mV     |
| $V_{th}$ (linear)           | 264 mV     | 283 mV     |
| $C_{gg}$                    | 1.47 fF/µm |            |
| $C_d$                       | 0.33 fF/µm |            |

$V_{th}$ in the saturation and linear modes was measured as the $V_{GS}$ at which $I_{DS}$ per effective width equals $10^{-5}$ A/µm, with $|V_{DS}| = V_{DD}$ and $|V_{DS}| = 0.05$ V, respectively [17]. The sub-threshold swing was measured as $V_{GS\_10times} - 0.05$ V, where $V_{GS\_10times}$ is the $V_{GS}$ at which $I_{DS}$ is 10 times that at $V_{GS} = 0.05$ V.
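Plugging the Table 1 values into (1) gives a feel for the variation magnitude. The sketch below assumes that "Length" in (1) is the 34-nm gate length and that each transistor has a single fin (as stated for the bitcell layouts in Section IV-A); the helper names are illustrative.

```python
import math

FIN_THICKNESS_NM = 8.0   # Table 1
FIN_HEIGHT_NM = 34.0     # Table 1
GATE_LENGTH_NM = 34.0    # Table 1, assumed to be "Length" in (1)
A_VT_MV_UM = 1.8         # A_Vt in mV*um


def w_eff_nm(n_fins: int = 1) -> float:
    """Effective width: (fin thickness + 2 x fin height) per fin."""
    return n_fins * (FIN_THICKNESS_NM + 2.0 * FIN_HEIGHT_NM)


def sigma_vth_mv(a_vt_mv_um: float, length_nm: float, width_nm: float) -> float:
    """Eq. (1): sigma_Vth = A_Vt / sqrt(Length x Effective Width), with dimensions converted to um."""
    area_um2 = (length_nm * 1e-3) * (width_nm * 1e-3)
    return a_vt_mv_um / math.sqrt(area_um2)


sigma = sigma_vth_mv(A_VT_MV_UM, GATE_LENGTH_NM, w_eff_nm(1))
print(f"W_eff = {w_eff_nm(1):.0f} nm, sigma_Vth = {sigma:.1f} mV, 5*sigma = {5 * sigma:.0f} mV")
```

Under these assumptions a single-fin transistor has $\sigma_{Vth} \approx 35.4$ mV, so a $5\sigma$ shift (about 177 mV) is a large fraction of the 0.5 V near-$V_{th}$ supply, which is why yield targets in that region are hard to meet.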

To improve the accuracy of the simulation results, the wire parasitic resistance (*R*) and capacitance (*C*) were modeled using the $\pi$-RC wire model with *R* per length of 21 $\Omega$/$\mu$m and *C* per length of 0.16 fF/$\mu$m, as reported in [19]. When comparing the performance of the proposed differential 7T SRAM bitcell with those of the previously proposed SRAM bitcells, the bitcell located farthest from the peripheral circuits was evaluated as the worst case.
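A single $\pi$ section splits the total wire capacitance into two halves at the ends of the series resistance. The sketch below applies the per-length values above to a wire of hypothetical length (the actual BL length depends on the bitcell height, which is not given here) and computes the Elmore delay from an ideal driver to the unloaded far end; longer wires are usually modeled by cascading several such sections.

```python
from typing import Tuple

R_PER_UM = 21.0       # ohm/um, from the text
C_PER_UM = 0.16e-15   # F/um, from the text


def pi_rc(length_um: float) -> Tuple[float, float, float]:
    """Single pi section: (near-end C, series R, far-end C), total C split in half."""
    r = R_PER_UM * length_um
    c = C_PER_UM * length_um
    return c / 2.0, r, c / 2.0


def elmore_delay(length_um: float) -> float:
    """Elmore delay (s) from an ideal driver to the far end of an unloaded pi section."""
    _, r, c_far = pi_rc(length_um)
    return r * c_far  # equals R*C/2, the Elmore delay of a distributed RC line


length = 100.0  # hypothetical wire length in um
c_near, r, c_far = pi_rc(length)
print(f"R = {r:.0f} ohm, C = {(c_near + c_far) * 1e15:.1f} fF, "
      f"tau = {elmore_delay(length) * 1e12:.1f} ps")
```

In a real BL, each bitcell adds its own junction and pass-gate capacitance on top of the wire, so this wire-only delay is a lower bound.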

FIGURE 11. Comparison of the read delay of the bitcells.

The read static noise margin [20] and WL write trip voltage [21] are commonly used metrics for the read stability and write ability yields, respectively. However, these static metrics are unsuitable for the proposed differential 7T SRAM bitcell because they cannot account for the floating node and the WLRB pulse width [22], [23]. In this study, read and write failures were therefore counted directly in transient Monte Carlo simulations, and importance sampling based on [24] was used to reduce the number of required samples. For the comparison, an SRAM bitcell array with 256 rows and 128 columns in a 4:1 bit-interleaved structure was assumed, and the operating voltage was set to $V_{DD} = 0.5$ V in the near-$V_{th}$ region.
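To illustrate why importance sampling is needed, the sketch below estimates a $5\sigma$ failure probability with generic mean-shift importance sampling on a single standard-normal variable. This is a textbook variant chosen for illustration, not necessarily the method of [24]; in the actual flow, the failure indicator comes from a transient SPICE simulation over many correlated $V_{th}$ variables rather than the toy threshold used here.

```python
import math
import random
from statistics import NormalDist


def failure_prob_is(fails, shift: float, n: int = 20000, seed: int = 1) -> float:
    """Mean-shift importance sampling of P(fail) for X ~ N(0, 1).

    Samples are drawn from N(shift, 1) and each failing sample is weighted by the
    likelihood ratio phi(x) / phi(x - shift) = exp(-x * shift + shift**2 / 2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(shift, 1.0)
        if fails(x):
            total += math.exp(-x * shift + 0.5 * shift * shift)
    return total / n


def sigma_yield(p_fail: float) -> float:
    """One-sided Gaussian-equivalent sigma level of a failure probability."""
    return -NormalDist().inv_cdf(p_fail)


# Toy stand-in for a transient simulation: fail when a normalized parameter exceeds 5.
p_is = failure_prob_is(lambda x: x > 5.0, shift=5.0)
p_exact = 0.5 * math.erfc(5.0 / math.sqrt(2.0))
print(f"IS estimate {p_is:.3e}, exact {p_exact:.3e}, sigma yield {sigma_yield(p_is):.2f}")
```

Plain Monte Carlo would need on the order of $1/p \approx 3.5 \times 10^6$ samples just to observe one failure at $5\sigma$, whereas shifting the sampling distribution toward the failure region yields an accurate estimate from a few tens of thousands of samples.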

#### A. BITCELL AREA

Fig. 10 compares the bitcell area of the proposed differential 7T SRAM bitcell with those of the previously proposed SRAM bitcells. All bitcell areas are estimated using the 22-nm FinFET technology. To minimize the layout area, each transistor was implemented with a single fin.

The differential 10T, 12T, and P-P-N 10T SRAM bitcells have significantly larger area overheads because of their large number of transistors. In contrast, the proposed differential 7T SRAM bitcell has a reasonably small area overhead compared to the conventional 6T SRAM bitcell. Although the previously proposed single-ended 8T, 7T-1, and 7T-2 SRAM bitcells have smaller bitcell areas than the proposed differential 7T SRAM bitcell, they suffer from problems such as long read delay or sensing failure.

#### B. READ DELAY AND READ STABILITY

The read delay consists of the WL decoding, BL development ( $T_{BL}$ ), and sensing/data-out delays. In the differential operation, $T_{BL}$ is defined as the time between when the WL is enabled and when the SAE is enabled. For correct sensing, the SAE should be enabled only after the voltage difference between the BL and BLB ( $\Delta V_{BL}$ ) exceeds the offset voltage ( $V_{OS}$ ) of the sense amplifier. The mean of the $V_{OS}$ distribution ( $\mu_{VOS}$ ) was assumed to be zero because the sense amplifier has a fully symmetrical structure. The standard deviation of the $V_{OS}$ distribution ( $\sigma_{VOS}$ ) was set to 20 mV because the industry target for $3\sigma_{VOS}$ is typically 50–70 mV [25]. $T_{BL}$ is then chosen as the time required to ensure a $5\sigma$ sensing yield ( $T_{BL\_5\sigma}$ ). The sensing yield is calculated through importance sampling, in which a sensing failure is counted when $\Delta V_{BL}$ is smaller than $5\sigma_{VOS}$ [26].
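The sensing criterion above reduces to a fixed $\Delta V_{BL}$ threshold of $5\sigma_{VOS} = 100$ mV. The sketch below encodes that threshold, counts failing $\Delta V_{BL}$ samples, and gives a first-order $T_{BL}$ estimate that assumes a constant worst-case cell read current discharging a lumped BL capacitance; the 50 fF and 5 µA values are hypothetical, not values from this work.

```python
SIGMA_VOS = 0.020  # V, standard deviation of the sense amplifier offset (from the text)
K_SIGMA = 5        # sensing-yield target in sigma


def required_delta_vbl(sigma_vos: float = SIGMA_VOS, k: float = K_SIGMA) -> float:
    """Minimum BL differential at SAE for a pass: k * sigma_VOS."""
    return k * sigma_vos


def sensing_failures(delta_vbl_samples, sigma_vos: float = SIGMA_VOS, k: float = K_SIGMA) -> int:
    """Count samples whose Delta V_BL at SAE is smaller than k * sigma_VOS."""
    threshold = required_delta_vbl(sigma_vos, k)
    return sum(1 for dv in delta_vbl_samples if dv < threshold)


def t_bl_constant_current(c_bl: float, i_read: float, dv: float) -> float:
    """First-order BL development time C_BL * dV / I_read (constant-current discharge)."""
    return c_bl * dv / i_read


dv = required_delta_vbl()
t_bl = t_bl_constant_current(50e-15, 5e-6, dv)  # hypothetical 50 fF BL, 5 uA worst-case read current
print(f"3*sigma_VOS = {3 * SIGMA_VOS * 1e3:.0f} mV, required dV_BL = {dv * 1e3:.0f} mV, "
      f"T_BL ~ {t_bl * 1e9:.2f} ns")
```

The estimate makes the scaling explicit: $T_{BL}$ grows linearly with BL capacitance, which is why the bitcell with the shortest layout height (and therefore the shortest BL) develops the required $\Delta V_{BL}$ fastest.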

In the proposed differential 7T SRAM bitcell, $T_{BL\_5\sigma}$ during the read "0" operation differs from that during the read "1" operation owing to the asymmetrical structure. During a read "0" operation, the M node is not driven to $V_{DD}$ because the PMR transistor is turned off, so BLB is not held at $V_{DD}$ through the selected bitcell. Consequently, $V_{BLB}$ during a read "0" operation is slightly lowered by the leakage current from the unselected bitcells in the selected column, compared with $V_{BL}$ during a read "1" operation, which results in a smaller $\Delta V_{BL}$. Therefore, $T_{BL\_5\sigma}$ of a read "0" operation is slightly longer than that of a read "1" operation. To consider the worst case, the $T_{BL\_5\sigma}$ of the read "0" operation was used for comparison; its value for the proposed differential 7T SRAM bitcell is 1.55 ns.

FIGURE 12. Worst-case scenario and RBL voltage during a read "0" operation in the 7T-2 SRAM bitcell.

The proposed differential 7T SRAM bitcell has a shorter $T_{BL\_5\sigma}$ than the differential 10T, 12T, and P-P-N 10T SRAM bitcells because it has the lowest BL capacitance owing to the smallest bitcell layout height. Moreover, in the differential 10T, 12T, and P-P-N 10T SRAM bitcells, the virtual $V_{SS}$ (VVSS) node voltage rises slightly during the read operation because charge flows into the VVSS node from the BL/BLBs in the unselected columns, which degrades the read current. The $T_{BL\_5\sigma}$ values of the differential 10T, 12T, and P-P-N 10T SRAM bitcells were 1.7, 1.74, and 1.71 ns, respectively. Including the WL decoding delay and the sensing/data-out delay, the read delays of the proposed differential 7T and the differential 10T, 12T, and P-P-N 10T SRAMs were 3.15, 3.3, 3.34, and 3.31 ns, respectively (Fig. 11).
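The reported figures can be cross-checked with simple arithmetic. Subtracting $T_{BL\_5\sigma}$ from each total read delay gives the combined WL decoding and sensing/data-out delay; the constant 1.60 ns that results is an inference from the numbers above, not a separately reported value, and it shows that the read-delay ranking of the differential bitcells is set entirely by $T_{BL\_5\sigma}$.

```python
# (T_BL_5sigma, total read delay) in ns, as reported in Section IV-B.
REPORTED = {
    "Proposed 7T": (1.55, 3.15),
    "Differential 10T": (1.70, 3.30),
    "12T": (1.74, 3.34),
    "P-P-N 10T": (1.71, 3.31),
}

for name, (t_bl, read_delay) in REPORTED.items():
    overhead = read_delay - t_bl  # WL decoding + sensing/data-out (inferred)
    print(f"{name:16s} T_BL = {t_bl:.2f} ns, decode + sense = {overhead:.2f} ns")
```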

Meanwhile, in the single-ended operation, a sensing failure is counted when $V_{BL}$ does not reach the $5\sigma$ trip voltage of the high-skewed inverter SA. As shown in Fig. 11, the 7T-1 and 8T SRAM bitcells have longer read delays than the differential SRAM bitcells owing to a larger $T_{BL}$ caused by the larger required bitline swing. In the 7T-2 SRAM bitcell, which has a single-transistor read buffer, the RBL cannot be fully discharged during a read "0" operation because the current from the unselected bitcells in the selected column interrupts the RBL discharge (Fig. 12), which causes a sensing failure.

During the read operation, the storage nodes of the bitcells can be disturbed by the BL charge. Despite this BL disturbance, the previously stored data must be maintained until the WL pulse is terminated. Thus, in this study, a stable read operation means that the data previously stored in the selected bitcells is maintained until the WL pulse is terminated. When the read stability yield is calculated through importance sampling, a sample is counted as a read failure if the data in the selected bitcell at the end of the WL pulse differs from the previously stored data.

TABLE 2. Read stability yield in the selected bitcells during a read operation and in the RHSCs during a write operation.

| Bitcell          | In the selected bitcell (read operation) | In RHSCs (write operation) |
|------------------|------------------------------------------|----------------------------|
| Proposed 7T      | 10.71 σ @ Read "0"                       | 5.08 σ @ Read "0"          |
|                  | 10.12 σ @ Read "1"                       | 5.01 σ @ Read "1"          |
| Differential 10T | 11.78 σ                                  | 11.78 σ                    |
| 12T              | 11.78 σ                                  | 11.78 σ                    |
| P-P-N 10T        | 10.49 σ                                  | 10.49 σ                    |
| 7T-1             | 10.71 σ @ Read "0"                       | 9.98 σ @ Read "0"          |
|                  | 10.12 σ @ Read "1"                       | -4.3 σ @ Read "1"          |
| 8T               | 11.78 σ                                  | 3.6 σ                      |

FIGURE 13. Waveform of the floated Qb node at various corners.

Table 2 reports the read stability yields of the proposed differential 7T and previously proposed SRAM bitcells, both in the selected bitcells during the read operation and in the RHSCs during the write operation. All of the bitcells achieve a $5\sigma$ read stability yield in the selected bitcells during the read operation, thanks to the read current path being decoupled from the data node. The read "0" and read "1" stability yields of the proposed differential 7T SRAM bitcell differ because of its asymmetrical structure. The read "1" stability yield is slightly lower than the read "0" stability yield because the floating Qb node can be charged by the leakage current from $V_{DD}$ and the M node during the read "1" operation; nevertheless, a $5\sigma$ read stability yield is achieved. Moreover, although the Qb node floats during the read "1" operation once WLRB is lowered, the stored data cannot flip because the Qb node voltage is pulled down by the coupling caused by the WLRB transition. Fig. 13 shows the Q and Qb nodes during a read "1" operation at various corners. At the worst corner (high temperature and degraded $V_{DD}$), the floating Qb node voltage rises; however, because $T_{BL}$ is sufficiently short, the stored data does not flip.
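To relate the sigma values in Table 2 to failure probabilities, the sketch below converts them assuming they are one-sided Gaussian-equivalent levels (a common convention, but an assumption here) and scales the per-cell probability to the 256 × 128 array used in this section, as if each cell failed independently. It uses `math.erfc` directly because computing $1 - \mathrm{erf}$ loses all precision at the ~10σ levels in Table 2.

```python
import math

CELLS = 256 * 128  # array size assumed in Section IV


def tail_prob(sigma: float) -> float:
    """One-sided Gaussian tail probability P(X > sigma) for X ~ N(0, 1)."""
    return 0.5 * math.erfc(sigma / math.sqrt(2.0))


for sigma in (5.0, 5.01, 5.08, 10.12):
    p = tail_prob(sigma)
    print(f"{sigma:5.2f} sigma -> P(fail) = {p:.2e}, expected failing cells per array = {p * CELLS:.2e}")
```

Under these assumptions, a $5\sigma$ yield corresponds to roughly $2.9 \times 10^{-7}$ per cell, or about 0.009 expected failing cells per 32-kb array, while the ~10σ values of the selected bitcells correspond to negligible failure probabilities.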



FIGURE 14. Read stability yield in RHSCs according to NTWP1.



FIGURE 15. Read stability margin at various corners.

#### C. STABILITY OF RHSCS DURING WRITE OPERATION

In the previously proposed differential SRAM bitcells with the bit-interleaved structure, the RHSCs during read and write operations are under the same conditions as the selected bitcell during a read operation; thus, their read stability yields are identical (Table 2).

In the proposed differential 7T SRAM bitcell, the disturbance in the RHSCs during a write operation can be reduced by increasing $T_{W\_P1}$. Fig. 14 shows the read stability yield in the RHSCs as a function of $N_{TWP1}$. $N_{TWP1}$ was set to 6 to achieve a $5\sigma$ read stability yield, and the corresponding $T_{W\_P1}$ is denoted $T_{W\_P1\_5\sigma}$. Fig. 15 shows the read stability margin of the RHSCs at various corners for $N_{TWP1} = 6$.

In the RHSCs of the differential 10T, 12T, and P-P-N 10T SRAM bitcells, the data node is decoupled from the BL during write operations, so the stored data is stably maintained. However, the RHSCs of the previously proposed single-ended 7T-1, 7T-2, and 8T SRAM bitcells suffer from BL disturbance and thus cannot achieve a $5\sigma$ yield, as reflected in the results in Table 2.

#### D. WRITE DELAY AND WRITE ABILITY

The write delay, which consists of the WL decoding delay, $T_{W\_P1}$, and $T_{W\_P2}$, is longer for a write "1" operation than for a write "0" operation because the Qb node must be discharged through two series nMOS transistors (PMR and PGR) during a write "1" operation. $T_{W\_P1}$ and $T_{W\_P2}$ are set by $N_{TWP1}$ and $N_{TWP2}$, which are chosen based on the read stability yield in the RHSCs (see Section IV-C) and the write ability yield, respectively.



FIGURE 16. Write ability yield according to NTWP2.



FIGURE 17. Write ability margin at various corners.

When the data stored in the selected bitcell differs from the written data, a stable write operation requires that, when the WL pulse is terminated, the internal node that stored "1" ("0") be lower (higher) than the metastable point (the trip point of the cross-coupled inverters). The write ability yield is calculated by counting the write failures that do not satisfy this requirement.
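The write-failure criterion can be written as a simple per-sample check. The sketch below applies it to a few hypothetical end-of-pulse node voltages (`WriteSample` and the helper functions are illustrative names); in the importance-sampling flow, each failing sample would additionally carry its likelihood-ratio weight rather than counting as one.

```python
from dataclasses import dataclass


@dataclass
class WriteSample:
    """Node voltages (V) at the end of the WL pulse for one Monte Carlo sample."""
    v_node_was_1: float  # internal node that stored "1" before the write
    v_node_was_0: float  # internal node that stored "0" before the write
    v_trip: float        # metastable point of the cross-coupled inverters


def write_succeeds(s: WriteSample) -> bool:
    """Old "1" node must be below, and old "0" node above, the trip point."""
    return s.v_node_was_1 < s.v_trip and s.v_node_was_0 > s.v_trip


def count_write_failures(samples) -> int:
    return sum(1 for s in samples if not write_succeeds(s))


# Hypothetical samples at V_DD = 0.5 V
samples = [
    WriteSample(0.05, 0.45, 0.24),  # flipped cleanly
    WriteSample(0.30, 0.20, 0.25),  # not yet flipped when WL falls: failure
    WriteSample(0.20, 0.28, 0.24),  # near the trip point but on the correct side
]
print(count_write_failures(samples))
```

The third sample shows why the check is made against the trip point rather than the rails: once both nodes have crossed it, the cross-coupled inverters complete the flip after WL falls.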

Fig. 16 shows the write ability yield as a function of $N_{TWP2}$: the yield improves as $N_{TWP2}$ increases. However, it saturates for large $N_{TWP2}$ because, once the write time is sufficient, the write ability yield no longer depends on the write time and is instead determined by the strength ratio of the pull-up to the pass-gate transistors. $N_{TWP2}$ was set to 3, which minimizes the write delay without significantly degrading the write ability yield; the corresponding $T_{W\_P2}$ is denoted $T_{W\_P2\_optimal}$. Fig. 17 shows the write ability margin at various corners for $N_{TWP2} = 3$. The sum of $T_{W\_P2\_optimal}$ and $T_{W\_P1\_5\sigma}$ is 2.26 ns; including the WL decoding delay, the write delay of the proposed differential 7T SRAM is 2.73 ns.

Fig. 18 shows the write ability yields and the write assist voltages required for a $5\sigma$ write ability yield for each bitcell; the single-ended 7T-2 SRAM bitcell is excluded because it suffers from read sensing failure. The differential 12T SRAM bitcell achieves the highest write ability yield because its pull-up network is completely turned off. The single-ended 7T-1 and 8T SRAM bitcells have higher write ability yields than the proposed differential 7T, differential 10T, and P-P-N 10T SRAM bitcells because there is only one nMOS pass-gate transistor in the write path. However, the RHSCs of the single-ended 7T-1 and 8T SRAM bitcells cannot achieve $5\sigma$ read stability during a write operation because they suffer from BL disturbance. The proposed differential 7T SRAM bitcell exhibits a higher write ability yield than the differential 10T and P-P-N 10T SRAM bitcells because the latter two bitcells are disturbed by the VVSS node. The P-P-N 10T SRAM bitcell shows the lowest write ability yield because the nMOS-pMOS stack in the write current path cannot transfer a full "0" and "1".

**FIGURE 18.** Write ability yields and write assist voltage for the $5\sigma$ write ability yields of the bitcells.

A write assist circuit is necessary for the differential 10T, P-P-N 10T, and proposed 7T SRAM bitcells to ensure a $5\sigma$ write ability yield. Among write assist circuits, negative $V_{BL}$ write assist is the most effective because it increases both $V_{DS}$ and $V_{GS}$ of the pass-gate transistor [27]. However, it requires toggling numerous column-based BLs and large capacitors, which significantly increases the area overhead and energy consumption. In contrast, boosted $V_{WL}$ write assist strengthens the pass-gate transistor by increasing its $V_{GS}$ and needs to toggle only one row-based WL, which reduces the energy consumed for write assist. Thus, boosted $V_{WL}$ write assist was used in this paper to achieve a $5\sigma$ write ability yield.

The P-P-N 10T SRAM bitcell cannot achieve a $5\sigma$ write ability yield even with boosted $V_{WL}$ write assist owing to the nMOS-pMOS stack in the write current path. The $V_{WL}$ assist levels required to achieve a $5\sigma$ write ability yield for the proposed differential 7T and differential 10T SRAM bitcells were 65 and 90 mV, respectively; the differential 10T SRAM bitcell requires the higher assist level because its unassisted write ability yield is lower.

#### E. SRAM OPERATING FREQUENCY

The operating frequency of the proposed differential 7T SRAM is determined by the longer of the read and write delays. According to Sections IV-B and IV-D, the read delay (3.15 ns) is longer than the write delay (2.73 ns). Thus, the operating frequency of the proposed differential 7T SRAM is set by the read delay to 317 MHz.

FIGURE 19. Comparison of read, write and total operation energies.

Similarly, the operating frequencies of the differential 10T and 12T SRAMs are determined by their read delays, because no timing overhead is required to secure a $5\sigma$ read stability yield in their RHSCs during a write operation. Thus, their operating frequencies are 303 MHz and 299 MHz, respectively, and the operating frequency of the proposed differential 7T SRAM is higher than those of the differential 10T and 12T SRAMs by 5% and 6%, respectively.

#### F. ENERGY CONSUMPTION AND STANDBY POWER

Fig. 19 compares the energy consumption of the whole macro (256 rows × 128 columns), including peripheral circuits, for the 7T-1, 8T, P-P-N 10T, proposed differential 7T, differential 10T, and 12T SRAMs. The energy consumption was measured over an operation period and includes both dynamic and static energy. The 7T-2 SRAM is excluded from this comparison because it cannot achieve a stable read operation.

During a read operation, the 7T-1 and 8T SRAMs, which use single-ended reads, consume 41% and 53% less read energy, respectively, than the proposed differential 7T SRAM, whose differential read doubles the BL switching activity. Among the SRAMs with differential reads, the read energy consumption of the proposed differential 7T SRAM is lower than those of the P-P-N 10T, differential 10T, and 12T SRAMs by 72%, 73%, and 7%, respectively. The read energy is dominated by the BL/BLB capacitance, which is determined by the bitcell layout height, and the P-P-N 10T, differential 10T, and 12T SRAM bitcells have taller layouts than the proposed differential 7T SRAM bitcell because they contain more transistors. In addition, the P-P-N 10T and differential 10T SRAMs toggle the VVSS line, which has a large capacitance because it is shared by all of the bitcells in four columns.
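The read-energy argument above can be summarized with a first-order model, offered as an illustrative sketch rather than the model behind the reported simulations. Here $\alpha$ is the average fraction of accessed bitlines that discharge per read ($\alpha \approx 1$ for a differential read, where one of BL/BLB always discharges, and $\alpha \approx 0.5$ for a single-ended read with equiprobable data), $N_{\text{col}}$ is the number of columns whose bitlines are exposed to the selected row, $\Delta V_{\text{BL}}$ is the bitline swing restored by precharge, $C_{\text{cell}}$ is the per-cell loading on the bitline, $c_{w}$ is the wire capacitance per unit length, and $h_{\text{cell}}$ is the bitcell layout height. All symbols other than $V_{DD}$ are introduced here for illustration.

$$
E_{\text{read}} \approx \alpha \, N_{\text{col}} \, C_{\text{BL}} \, V_{DD} \, \Delta V_{\text{BL}},
\qquad
C_{\text{BL}} \approx N_{\text{row}} \left( C_{\text{cell}} + c_{w} \, h_{\text{cell}} \right)
$$

Under this model, a taller bitcell raises $C_{\text{BL}}$ through the wire term, the differential read doubles $\alpha$ relative to a single-ended read, and a shared VVSS line adds a further capacitive term that is switched on each access.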

During a write operation, the column-based signals in the differential 10T and 12T SRAMs must be toggled in all of the selected columns, which incurs high energy consumption. In contrast, the row-based WL and WLRB signals in the proposed differential 7T SRAM are toggled only in the selected row, which requires less energy. Therefore, the energy consumed in toggling the control signals during a write operation is the lowest in the proposed 7T SRAM. Nevertheless, the differential 10T SRAM has the lowest write energy, for the following reason. In the differential 12T and proposed 7T SRAMs, the BL or BLB in each unselected column is discharged during a write operation, whereas in the differential 10T SRAM neither the BL nor the BLB in the unselected columns is discharged because the VVSS remains at VDD. Thus, the write energy of the proposed differential 7T SRAM is 62% lower than that of the differential 12T SRAM but 36% higher than that of the differential 10T SRAM. However, the total operation energy is dominated by the read energy because read operations are performed most of the time, while a write operation occurs when a cache hit occurs. Considering the average read/write operation ratio of 7:1 reported in [28], the proposed differential 7T SRAM consumes 70% and 23% less total operation energy than the differential 10T and 12T SRAMs, respectively. The 7T-1, 8T, and P-P-N 10T SRAMs are excluded from this comparison because the 7T-1 and 8T SRAM bitcells cannot achieve a $5\sigma$ read stability yield in the RHSCs during a write operation with the bit-interleaved structure, and the P-P-N 10T SRAM bitcell cannot achieve a $5\sigma$ write ability yield even with boosted $V_{WL}$ write assist.

FIGURE 20. Comparison of energy-delay product and energy-delay-area product.
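The 7:1 read/write weighting behind the total-operation-energy comparison can be written as a per-access average. The sketch below uses hypothetical helper names and is not the authors' evaluation flow; per-operation read and write energies would come from macro-level simulations such as those in Fig. 19.

```python
def weighted_energy(e_read: float, e_write: float, reads_per_write: float = 7.0) -> float:
    """Average energy per access for a read/write mix of reads_per_write : 1."""
    return (reads_per_write * e_read + e_write) / (reads_per_write + 1.0)

def reduction(e_new: float, e_ref: float) -> float:
    """Fractional energy saving of e_new relative to e_ref (0.7 means 70% lower)."""
    return 1.0 - e_new / e_ref

# With a 7:1 mix, read energy carries 7/8 of the weight and write energy 1/8.
read_weight = weighted_energy(1.0, 0.0)
write_weight = weighted_energy(0.0, 1.0)
print(f"read weight = {read_weight:.3f}, write weight = {write_weight:.3f}")
```

Because a write carries only 1/8 of the weight, a design can accept a moderate write-energy penalty, as the proposed 7T SRAM does relative to the differential 10T SRAM, and still reduce the total energy if its read energy is substantially lower.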

The standby power is measured at the minimum data retention voltage ($V_{DR}$), which ensures a $5\sigma$ hold stability yield. Because the proposed differential 7T, differential 10T, and differential 12T SRAM bitcells all have a cross-coupled inverter structure, their $V_{DR}$ is 0.275 V [26]. The simulation results reveal that the proposed differential 7T SRAM has a higher standby power (154.9 pW) than the differential 10T and 12T SRAMs because, in the latter two, the stacked nMOS transistors reduce the subthreshold leakage from the BL to the storage node through the stack effect. The differential 12T SRAM has a slightly lower standby power (138.9 pW) than the differential 10T SRAM (141 pW) because its additional pMOS transistor cannot pass a full "0", which decreases the $V_{DS}$ of the pull-up transistor.
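The standby-power figures above can be put on a common footing by converting them to an equivalent leakage current at $V_{DR}$ and to a relative increase. This back-of-the-envelope sketch assumes that all standby power is drawn from a supply held at $V_{DR}$; the dictionary keys are illustrative labels.

```python
V_DR = 0.275  # minimum data retention voltage (V), common to all three bitcells

# Reported standby power (pW) at V_DR
standby_pw = {"proposed 7T": 154.9, "diff. 10T": 141.0, "diff. 12T": 138.9}

# Equivalent leakage current, assuming all standby power comes from the V_DR supply:
# I = P / V, and pW / V = pA.
leak_pa = {name: p / V_DR for name, p in standby_pw.items()}

# Relative standby-power increase of the proposed 7T over each reference
increase = {
    name: standby_pw["proposed 7T"] / p - 1.0
    for name, p in standby_pw.items()
    if name != "proposed 7T"
}

for name, i in leak_pa.items():
    print(f"{name}: {i:.0f} pA")
for name, r in increase.items():
    print(f"proposed 7T vs {name}: +{100 * r:.1f}% standby power")
```

The proposed cell's standby power is thus about 10% and 12% above those of the differential 10T and 12T SRAMs, respectively.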

Fig. 20 shows the energy-delay product (EDP) and leakage-energy-delay product (LEDP) of the proposed differential 7T, differential 10T, and differential 12T SRAMs. Because the proposed differential 7T SRAM has the shortest delay and the lowest operation energy consumption, its EDP is 72% and 28% lower than those of the differential 10T and 12T SRAMs, respectively. In addition, although the standby power of the proposed differential 7T SRAM is slightly higher than those of the others because it uses a single nMOS pass gate, its LEDP is 69% and 19% lower than those of the differential 10T and 12T SRAMs, respectively, owing to its lowest EDP.
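The reported frequency gains and EDP reductions can be cross-checked from the rounded figures in this section, taking the delay as the reciprocal of the operating frequency. This sketch uses only the rounded 70% and 23% total-energy reductions, so it is a consistency check rather than a reproduction; LEDP is not computed because its exact definition is not restated here.

```python
f_mhz = {"proposed 7T": 317.0, "diff. 10T": 303.0, "diff. 12T": 299.0}

# Total operation energy of the proposed 7T relative to each reference,
# from the reported reductions (70% vs diff. 10T, 23% vs diff. 12T).
energy_ratio = {"diff. 10T": 1.0 - 0.70, "diff. 12T": 1.0 - 0.23}

results = {}
for ref, e_ratio in energy_ratio.items():
    freq_gain = f_mhz["proposed 7T"] / f_mhz[ref] - 1.0
    delay_ratio = f_mhz[ref] / f_mhz["proposed 7T"]  # delay = 1 / f
    edp_reduction = 1.0 - e_ratio * delay_ratio       # EDP = E * delay
    results[ref] = {"freq_gain": freq_gain, "edp_reduction": edp_reduction}
    print(f"vs {ref}: frequency +{100 * freq_gain:.1f}%, EDP -{100 * edp_reduction:.1f}%")
```

With these rounded inputs, the frequency gains come out at about 4.6% and 6.0%, and the EDP reductions at about 71% and 27%, within roughly one percentage point of the reported 72% and 28%; rounding of the inputs could account for the gap.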

# **V. CONCLUSION**

In the near-$V_{th}$ region with a bit-interleaved structure, the RHSCs of the conventional 6T, 8T, and 7T SRAM bitcells suffer from BL/BLB disturbance. This problem has been resolved by using a column-based write WL in the differential 10T and 12T SRAM bitcells and pseudo-data nodes in the P-P-N 10T SRAM bitcell. However, these bitcells suffer from a large bitcell area, a long delay, and high energy consumption. Hence, a differential 7T SRAM bitcell with an additional nMOS transistor (PMR) between the PUR and PDR is proposed to mitigate these problems. During a read operation, the PMR is turned off by driving WLRB to VSS, which improves the read stability. Moreover, a differential write operation can be performed by using a pulsed WLRB signal during a write operation, thereby resolving the half-select issue. The bitcell area of the differential 7T SRAM bitcell is 33%, 49%, and 37% smaller than those of the differential 10T, 12T, and P-P-N 10T SRAM bitcells, respectively, and its operating frequency is 5% and 6% higher than those of the differential 10T and 12T SRAMs, respectively. The proposed differential 7T SRAM has a higher write energy consumption than the differential 10T SRAM owing to bitline discharging in the unselected columns, but a lower write energy consumption than the differential 12T SRAM, in which column-based control signals are toggled. However, the differential 10T SRAM has the highest read energy owing to the toggling of the VVSS. Considering the read/write operation ratio, the proposed differential 7T SRAM consumes 70% and 23% less total energy than the differential 10T and 12T SRAMs, respectively. Moreover, the LEDP of the proposed differential 7T SRAM is 69% and 19% lower than those of the differential 10T and 12T SRAMs, respectively. In conclusion, the proposed differential 7T SRAM bitcell achieves higher performance and operational yield with a smaller area and lower energy consumption.

# REFERENCES

- [1] D. Markovic, C. C. Wang, L. P. Alarcon, T.-T. Liu, and J. M. Rabaey, "Ultralow-power design in near-threshold region," *Proc. IEEE*, vol. 98, no. 2, pp. 237–252, Feb. 2010.
- [2] W.-K. Chen, *Linear Networks and Systems*. Belmont, CA, USA: Wadsworth, 1993, pp. 123–135.
- [3] P. Hazucha, T. Karnik, J. Maiz, S. Walstra, B. Bloechel, J. Tschanz, G. Dermer, S. Hareland, P. Armstrong, and S. Borkar, "Neutron soft error rate measurements in a 90-nm CMOS process and scaling trends in SRAM from 0.25-µm to 90-nm generation," in *IEDM Tech. Dig.*, Washington, DC, USA, Dec. 2003, pp. 21.5.1–21.5.4.
- [4] J. Maiz, S. Hareland, K. Zhang, and P. Armstrong, "Characterization of multi-bit soft error events in advanced SRAMs," in *IEDM Tech. Dig.*, Washington, DC, USA, Dec. 2003, pp. 21.4.1–21.4.4.

- [5] T.-H. Kim, J. Liu, J. Keane, and C. H. Kim, "A 0.2 V, 480 kb subthreshold SRAM with 1 k cells per bitline for Ultra-Low-Voltage computing," *IEEE J. Solid-State Circuits*, vol. 43, no. 2, pp. 518–529, Feb. 2008.
- [6] K. Takeda, Y. Hagihara, Y. Aimoto, M. Nomura, Y. Nakazawa, T. Ishii, and H. Kobatake, "A read-static-noise-margin-free SRAM cell for low-VDD and high-speed applications," *IEEE J. Solid-State Circuits*, vol. 41, no. 1, pp. 113–121, Jan. 2006.
- [7] M.-F. Chang, M.-P. Chen, L.-F. Chen, S.-M. Yang, Y.-J. Kuo, J.-J. Wu, H.-Y. Su, Y.-H. Chu, W.-C. Wu, T.-Y. Yang, and H. Yamauchi, "A sub-0.3 V area-efficient L-shaped 7T SRAM with read bitline swing expansion schemes based on boosted read-bitline, asymmetric-V<sub>TH</sub> read-port, and offset cell VDD biasing techniques," *IEEE J. Solid-State Circuits*, vol. 48, no. 10, pp. 2558–2569, Oct. 2013.
- [8] N. Verma and A. Chandrakasan, "A 256 kb sub-threshold SRAM in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 42, no. 3, pp. 680–688, Mar. 2007.
- [9] Y. Morita, H. Fujiwara, H. Noguchi, Y. Iguchi, K. Nii, H. Kawaguchi, and M. Yoshimoto, "An area-conscious low-voltage-oriented 8T-SRAM design under DVS environment," in *Proc. IEEE Symp. VLSI Circuits*, Kyoto, Japan, Jun. 2007, pp. 256–257.
- [10] I. J. Chang, J.-J. Kim, S. P. Park, and K. Roy, "A 32 kb 10T subthreshold SRAM array with bit-interleaving and differential read scheme in 90 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 2, pp. 650–658, Feb. 2009.
- [11] Y.-W. Chiu, Y.-H. Hu, M.-H. Tu, J.-K. Zhao, Y.-H. Chu, S.-J. Jou, and C.-T. Chuang, "40 nm bit-interleaving 12T subthreshold SRAM with data-aware write-assist," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 61, no. 9, pp. 2578–2585, Sep. 2014.
- [12] C.-H. Lo and S.-Y. Huang, "P-P-N based 10T SRAM cell for low-leakage and resilient subthreshold operation," *IEEE J. Solid-State Circuits*, vol. 46, no. 3, pp. 695–704, Mar. 2011.
- [13] K. Ronse, P. De Bisschop, G. Vandenberghe, E. Hendrickx, R. Gronheid, A. V. Pret, A. Mallik, D. Verkest, and A. Steegen, "Opportunities and challenges in device scaling by the introduction of EUV lithography," in *IEDM Tech. Dig.*, San Francisco, CA, USA, Dec. 2012, pp. 18.5.1–18.5.4.
- [14] S. Khandelwal *et al.*, *BSIM-CMG 107.0.0 Multi-Gate MOSFET Compact Model*. Berkeley Education. [Online]. Available: https://www-device.eecs.berkeley.edu/bsim/?page=BSIMCMG_LR
- [15] C. Auth *et al.*, "A 22 nm high performance and low-power CMOS technology featuring fully-depleted tri-gate transistors, self-aligned contacts and high density MIM capacitors," in *Proc. Symp. VLSI Technol.*, Honolulu, HI, USA, Jun. 2012, pp. 131–132.
- [16] M. Shrivastava, B. Verma, M. S. Baghini, C. Russ, D. K. Sharma, H. Gossner, and V. R. Rao, "Benchmarking the device performance at sub 22 nm node technologies using an SoC framework," in *IEDM Tech. Dig.*, Baltimore, MD, USA, Dec. 2009, pp. 1–4.
- [17] C. Millar, D. Reid, G. Roy, S. Roy, and A. Asenov, "Accurate statistical description of random dopant-induced threshold voltage variability," *IEEE Electron Device Lett.*, vol. 29, no. 8, pp. 946–948, Aug. 2008.
- [18] C. H. Lin, R. Kambhampati, R. J. Miller, T. B. Hook, A. Bryant, W. Haensch, P. Oldiges, I. Lauer, T. Yamashita, V. Basker, T. Standaert, K. Rim, E. Leobandung, H. Bu, and M. Khare, "Channel doping impact on FinFETs for 22 nm and beyond," in *Proc. Symp. VLSI Technol.*, Honolulu, HI, USA, Jun. 2012, pp. 15–16.
- [19] D. Ingerly *et al.*, "Low-K interconnect stack with metal-insulator-metal capacitors for 22 nm high volume manufacturing," in *Proc. IEEE Int. Interconnect Technol. Conf.*, San Jose, CA, USA, Jun. 2012, pp. 249–251.
- [20] E. Seevinck, F. J. List, and J. Lohstroh, "Static-noise margin analysis of MOS SRAM cells," *IEEE J. Solid-State Circuits*, vol. 22, no. 5, pp. 748–754, Oct. 1987.
- [21] Z. Guo, A. Carlson, L.-T. Pang, K. T. Duong, T.-J.-K. Liu, and B. Nikolic, "Large-scale SRAM variability characterization in 45 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 11, pp. 3174–3192, Nov. 2009.
- [22] R. V. Joshi, S. Mukhopadhyay, D. W. Plass, Y. H. Chan, C.-T. Chuang, and Y. Tan, "Design of sub-90 nm low-power and variation tolerant PD/SOI SRAM cell based on dynamic stability metrics," *IEEE J. Solid-State Circuits*, vol. 44, no. 3, pp. 965–976, Mar. 2009.
- [23] J. Wang, S. Nalam, and B. H. Calhoun, "Analyzing static and dynamic write margin for nanometer SRAMs," in *Proc. 13th Int. Symp. Low Power Electron. Design (ISLPED)*, Bangalore, India, Aug. 2008, pp. 129–134.
- [24] T. S. Doorn, E. J. W. ter Maten, J. A. Croon, A. Di Bucchianico, and O. Wittich, "Importance sampling Monte Carlo simulations for accurate estimation of SRAM yield," in *Proc. 34th Eur. Solid-State Circuits Conf.* (*ESSCIRC*), Edinburgh, U.K., Sep. 2008, pp. 230–233.

# IEEE Access

- [25] T. Na, S.-H. Woo, J. Kim, H. Jeong, and S.-O. Jung, "Comparative study of various latch-type sense amplifiers," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 22, no. 2, pp. 425–429, Feb. 2014.
- [26] K. Cho, J. Park, T. W. Oh, and S. Jung, "One-sided Schmitt-trigger-based 9T SRAM cell for near-threshold operation," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 67, no. 5, pp. 1551–1561, May 2020.
- [27] M.-H. Tu, J.-Y. Lin, M.-C. Tsai, C.-Y. Lu, Y.-J. Lin, M.-H. Wang, H.-S. Huang, K.-D. Lee, W.-C. Shih, S.-J. Jou, and C.-T. Chuang, "A single-ended disturb-free 9T subthreshold SRAM with cross-point data-aware write word-line structure, negative bit-line, and adaptive read operation timing tracing," *IEEE J. Solid-State Circuits*, vol. 47, no. 6, pp. 1469–1482, Jun. 2012.
- [28] Y. Xie, *Emerging Memory Technologies: Design, Architecture, and Applications*. New York, NY, USA: Springer, 2013, p. 187.



**KEONHEE CHO** (Member, IEEE) was born in Seoul, South Korea, in 1994. He received the B.S. degree in electrical and electronic engineering from Yonsei University, Seoul, in 2018, where he is currently pursuing the Ph.D. degree in electrical and electronic engineering. His research interests are focused on near-threshold SRAM cell design and low-voltage SRAM peripheral circuit design.



**TAE WOO OH** (Member, IEEE) was born in Seoul, South Korea, in 1992. He received the B.S. degree in electrical and electronic engineering from Yonsei University, Seoul, in 2015, where he is currently pursuing the Ph.D. degree in electrical and electronic engineering. His research interests are focused on low-power and high-speed SRAM and next-generation semiconductor devices.



**JI SANG OH** (Member, IEEE) was born in Seoul, South Korea, in 1996. He received the B.S. degree in electrical and electronic engineering from Yonsei University, Seoul, Republic of Korea, in 2021, where he is currently pursuing the Ph.D. degree in electrical and electronic engineering. His research interests are focused on FinFET-based low-power and high-performance SRAM cells.



**SEONG-OOK JUNG** (Senior Member, IEEE) received the B.S. and M.S. degrees in electronic engineering from Yonsei University, Seoul, South Korea, in 1987 and 1989, respectively, and the Ph.D. degree in electrical engineering from the University of Illinois at Urbana–Champaign, Urbana, IL, USA, in 2002. From 1989 to 1998, he worked with Samsung Electronics Company Ltd., Hwasung, South Korea, where he was involved with specialty memories, such as video RAM, graphic RAM, and window RAM. He was with T-RAM Inc., Mountain View, CA, USA, where he was the Leader of the Thyristor-Based Memory Circuit Design Team. From 2003 to 2006, he worked with Qualcomm Inc., San Diego, CA, USA, where he was involved in high-performance low-power embedded memories, process variation tolerant circuit design, and low-power circuit techniques. Since 2006, he has been a Professor with Yonsei University. His current research interests include process variation-tolerant circuit design, low-power circuit design, mixed-mode circuit design, and next-generation memory and technology.




**JUHYUN PARK** (Member, IEEE) was born in Incheon, South Korea, in 1988. He received the B.S. degree in electronic and electrical engineering from Hongik University, Seoul, South Korea, in 2012, and the Ph.D. degree in electrical and electronic engineering from Yonsei University, Seoul, in 2020. He joined SK Hynix Inc., Icheon, in 2020, where he is involved in mobile DRAM design.