# <span id="page-0-6"></span>A Charge Recycling Logic Data Links for Single- and Multiple-Channel I/Os

Han Wu, *Member, IEEE*, Jeong Hoan Park, *Member, IEEE*, Rucheng Jiang, *Graduate Student Member, IEEE*, Jung-Hwan Choi, *Member, IEEE*, and Jerald Yoo, *Senior Member, IEEE*

*Abstract*— Wide input-output (IO) chip-to-chip interfaces, such as 3-D chip stacking [through-silicon via (TSV)], silicon interposers in high-bandwidth memory (HBM), and other 2.5-D chip-to-chip interfaces, handle a large amount of data in server and artificial intelligence (AI) applications. With a large number of IOs, power consumption becomes a huge burden. This article presents a novel charge recycling (CR) logic with >20% power reduction under random data streaming. The presented generic CR technique is applicable to both TSV and transmission line (T-Line) link IOs. The CR logic is implemented on two silicon dies, where the single-channel CR (CR1) uses a storage capacitor to recycle charge at each data transition and the multi-channel CR (CR2/4/8) replenishes the charge between multiple channels during opposite transitions. Fabricated in a 40-nm 1P8M standard CMOS, the TSV link (2.56 Gb/s) and the T-Line link (5.12 Gb/s) save up to 32.2% and 47% energy, respectively, under periodic data transmission and more than 20% under a pseudorandom binary sequence (PRBS). The eye diagrams and the bit error rate (BER) show that signal integrity is maintained when compared with conventional data links.

*Index Terms*— Charge recycling (CR), data links, energy efficiency, energy recycling, energy reduction ratio (ERR), high-bandwidth memory (HBM), I/O, through-silicon via (TSV), transmission line (T-Line).

#### <span id="page-0-3"></span>I. INTRODUCTION

<span id="page-0-2"></span>BIG data applications, such as artificial intelligence (AI), virtual reality, and media streaming, have significantly increased the amount of data to be computed and transmitted. According to the Semiconductor Industry Association (SIA), if the current trend continues, computing system energy consumption will exceed the world's total energy production by 2040 [\[1\]](#page-9-0). In a computing system, the energy cost of a DRAM access (1–2 nJ/b) is orders of magnitude greater than that of a cache access (10 pJ/b) [\[2\]](#page-9-1). Currently, the mainstream data links between CPU and DRAM are the through-silicon via (TSV) link for high-bandwidth memory (HBM) and the transmission line (T-Line) link for DDR and GDDR, as shown

Manuscript received 26 January 2023; revised 9 April 2023 and 18 June 2023; accepted 30 June 2023. Date of publication 14 August 2023; date of current version 26 September 2023. This article was approved by Associate Editor Minoru Fujishima. This work was supported by Samsung Electronics. *(Corresponding author: Jerald Yoo.)*

Han Wu and Rucheng Jiang are with the Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117583.

Jeong Hoan Park and Jung-Hwan Choi are with Samsung Electronics, Hwaseong 18200, South Korea.

Jerald Yoo is with the Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117583, and also with the N.1 Institute for Health, Singapore 117456 (e-mail: jyoo@nus.edu.sg).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/JSSC.2023.3294475.

Digital Object Identifier 10.1109/JSSC.2023.3294475

<span id="page-0-0"></span>

Fig. 1. Data links between CPU and DRAM.

<span id="page-0-1"></span>

Fig. 2. Energy per transition.

in Fig. [1](#page-0-0). In this big data era, memory designers have pushed the data rate every year; for example, GDDR6 and HBM3 have reached up to 24 and 6.4 Gb/s/ch., respectively. Both the TSV and T-Line links have heavy parasitic capacitance: based on JEDEC, the parasitic capacitance on the HBM2 TSV link and the DDR5 link is 2.4 pF [\[3\]](#page-9-2) and 0.9 pF [\[4\]](#page-9-3), respectively. High-speed data transmission under such a heavy load naturally suffers from thermal issues, resulting in high power consumption and degraded overall computing performance.

<span id="page-0-5"></span><span id="page-0-4"></span>In digital circuits and communication links, dynamic power consumption arises from the frequent switching of capacitive loads. When the output transitions from low to high (L2H), the output load is charged up to VDD by the supply, as shown in Fig. [2](#page-0-1). During this phase, half of the energy drawn from the supply ($\frac{1}{2} C_{\text{IO}} \text{VDD}^2$) is converted to heat by the PMOS on-resistance, and the other half is stored in the load ($C_{\text{IO}}$). When the output transitions from high to low (H2L), the charge stored in the load $C_{\text{IO}}$ is dumped to the ground, and during this phase, the remaining half of the energy is converted to heat through the NMOS on-resistance. Therefore, when the complete L2H and H2L transition cycle

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

<span id="page-1-0"></span>

Fig. 3. Supply voltage reduction trends of DRAM.

<span id="page-1-5"></span><span id="page-1-4"></span><span id="page-1-3"></span>is finished, the energy per cycle will be $C_{\text{IO}} \text{VDD}^2$. There are three ways to reduce the energy per cycle: 1) lowering VDD; 2) reducing $C_{\text{IO}}$; and 3) energy recycling. First, reducing VDD saves energy quadratically. This method has been applied to each memory generation, as summarized in Fig. [3](#page-1-0) [\[4\]](#page-9-3), [\[5\]](#page-9-4), [\[6\]](#page-9-5), [\[7\]](#page-9-6). However, lowering VDD is approaching its limit due to the finite threshold voltage. Second, researchers are reducing $C_{\text{IO}}$ to reduce the energy per transition. For example, in HBM2 [\[8\]](#page-9-7), researchers changed the driving scheme of the TSV from nine segments to three segments, achieving a 30% reduction of the peak current consumption. This method depends heavily on the packaging technique. Finally, a more aggressive method hides in the equation itself: the charge in $C_{\text{IO}}$ can be reused (recycled) to save power, similar to water recycling in human society.
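The quadratic dependence on VDD and the linear dependence on $C_{\text{IO}}$ can be illustrated with a short calculation. This is a minimal sketch: the 2.4-pF load is the HBM2 TSV value quoted in the Introduction, while the 1.2-V supply and the 10% reductions are illustrative assumptions, and `energy_per_cycle` is a hypothetical helper name.

```python
def energy_per_cycle(c_io_farads: float, vdd_volts: float) -> float:
    """Energy drawn from the supply for one full L2H + H2L cycle: C_IO * VDD^2."""
    return c_io_farads * vdd_volts ** 2


C_HBM2_TSV = 2.4e-12  # F, HBM2 TSV link load quoted in the Introduction
VDD_NOMINAL = 1.2     # V, assumed illustrative supply (not taken from the article)

e_nominal = energy_per_cycle(C_HBM2_TSV, VDD_NOMINAL)
e_low_vdd = energy_per_cycle(C_HBM2_TSV, 0.9 * VDD_NOMINAL)  # VDD lowered by 10%
e_low_cap = energy_per_cycle(0.9 * C_HBM2_TSV, VDD_NOMINAL)  # C_IO lowered by 10%

print(f"nominal energy per cycle: {e_nominal * 1e12:.3f} pJ")
print(f"10% lower VDD saves  {100 * (1 - e_low_vdd / e_nominal):.1f}%")
print(f"10% lower C_IO saves {100 * (1 - e_low_cap / e_nominal):.1f}%")
```

Under these assumptions, the same relative reduction is roughly twice as effective when applied to VDD as when applied to $C_{\text{IO}}$, which is why supply scaling was exploited first.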

<span id="page-1-9"></span><span id="page-1-8"></span><span id="page-1-7"></span>The charge-reuse methods have been applied to CPU and DRAM applications; for example, the adiabatic charging method [\[9\]](#page-9-8), [\[10\]](#page-9-9) was applied in the CPU clock tree. IBM first introduced the resonant clock tree, which reported a 35% clock power reduction [\[11\]](#page-9-10). However, it is only applicable to a narrow resonant frequency band (4.6 GHz). When the clock operates at a lower frequency, this method consumes more power than the conventional clock tree due to the overhead of the decap loads. Therefore, it is not suitable for modern CPUs with wide dynamic voltage and frequency scaling (DVFS). To cope with a wider frequency range (megahertz-to-gigahertz), the stepwise charging method was adopted for the clock driver [12]. This method shows a 55.6% reduction of the clock driver's energy; however, the additional charge-reuse capacitance is 6.7× larger than the load capacitance, meaning that it is not suitable for CPUs that are sensitive to silicon area cost.

<span id="page-1-10"></span>On the CPU side, the SRAM (cache) power consumption is significant, as illustrated in [\[2\]](#page-9-1). The forward body biasing (FBB) technique was developed by Intel to reduce the minimum supply ($\text{VDD}_{\text{MIN}}$) of SRAM [\[13\]](#page-9-12). Recently, the charge recycling (CR) method has been applied to SRAM to reduce its write operation power [\[14\]](#page-9-13). The mechanism uses the unselected bitline charge to connect with

<span id="page-1-2"></span>

Fig. 4. (a) Conventional IO driver. (b) Proposed IO driver.

the ground of the selected SRAM cell to raise the ground in the write operation. In this way, $\text{VDD}_{\text{MIN}}$ is further lowered to reduce power consumption quadratically. It achieves 30%–66% energy reduction for the SRAM write operation by lowering $\text{VDD}_{\text{MIN}}$ by 125 mV. The bitline charge-recycling-based write assist (BCR-WA) circuit is feasible because of the periodic array and rhythmic operation of the SRAM matrix. CR can also be applied in the DRAM to save energy for the computing system. For example, in the DRAM read operation, researchers proposed using the idle half-page row accessing energy to power up the sensing amplifier block to achieve charge reuse [\[15\]](#page-9-14). It reaches 30% energy reduction at 85 °C. However, it only reduces the refresh operation power, which accounts for only 6% of the overall DRAM power, and it was not silicon-proven.

<span id="page-1-13"></span><span id="page-1-6"></span>In summary, the charge-reuse methods can be implemented inside the CPU, on the clock tree and SRAM, to save energy significantly. They also have the potential to be applied inside the DRAM to save further energy for the computing system. However, all these methods are limited to periodic clocks, periodic arrays, or rhythmic operations. None of them achieves CR for random data streaming in the modern high-speed communication links between CPU and DRAM.

To address all aforementioned issues and to achieve energy reduction in the random data communication links, this article proposes the generic CR technique applicable in one/two/four and eight channels for both the TSV and the T-Line links [\[16\].](#page-9-15) The rest of this article is organized as follows. Section [II](#page-1-1) introduces the principles, energy reduction ratio (ERR) computing, and topologies of CR input-output (IO). Section [III](#page-5-0) describes the building blocks for the ASIC. Section [IV](#page-6-0) shows the measurement results, and Section [V](#page-8-0) concludes the work.

#### <span id="page-1-14"></span><span id="page-1-1"></span>II. PROPOSED GENERIC CR FOR HIGH-SPEED DATA LINKS

#### *A. CR Concept*

<span id="page-1-12"></span><span id="page-1-11"></span>Fig. [4\(a\)](#page-1-2) shows the working principle of the conventional IO driver. During the L2H transition, the energy is drawn directly from the supply; half of it is consumed by the pull-up network, and the rest is stored in the load ($C_{\text{IO}}$). In the H2L transition, the charge stored in $C_{\text{IO}}$ is dumped to the ground, during which the pull-down network consumes the remaining half of the energy. To achieve energy recycling, the circuit needs a third period between these two transitions. Fig. [4\(b\)](#page-1-2) shows the proposed CR concept. In the first step (L2H), energy is drawn from the supply to $C_{\text{IO}}$.

<span id="page-2-0"></span>

Fig. 5. (a) Proposed driver circuit adopting CR concept. (b) Working principle of the CR1 driver circuit.

In the second step (H2L), the storage capacitor ($C_S$) stores the energy. In the next L2H transition, the stored energy charges up $C_{\text{IO}}$ first ("recycled") instead of drawing energy from the supply; only the remaining charge is then drawn from the supply to "top up" $C_{\text{IO}}$.

#### *B. Single-Channel CR IO (CR1)*

It is necessary to compute the ERR of the proposed CR IO so that it can be traded off against the potential area overhead. For the single-channel case, the schematic of the proposed CR (CR1) circuit is shown in Fig. [5\(a\)](#page-2-0). Here, the storage capacitor ($C_S$), placed near the driver (DRV), is realized by an on-chip capacitor whose capacitance is $n$ times $C_{\text{IO}}$. As the driver circuit is usually near or under the pad, the pad capacitance itself can serve as the storage capacitor; alternatively, a metal–insulator–metal (MIM) capacitor or a MOS capacitor can be placed under the pad. For example, in a 40-nm standard CMOS process, the typical MIM capacitor density is ∼2 fF/$\mu$m$^2$. A 60 $\mu$m × 60 $\mu$m silicon area, close to the physical pad geometry, provides ∼7 pF, which is over five times the typical DDR4 IO capacitance [\[17\]](#page-9-16). In addition, in modern dense IO chips, the IO pads are typically surrounded by power or ground pads in a checkerboard pattern [\[18\]](#page-9-17), which is suitable for storage capacitor implementation.
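The storage capacitor sizing above is simple arithmetic, sketched below. The density and pad dimensions are the values quoted in the text; `mim_capacitance` and `max_c_io_for_ratio` are hypothetical helper names, and a uniform density over the whole area is assumed.

```python
MIM_DENSITY_F_PER_UM2 = 2e-15  # ~2 fF/um^2, typical 40-nm MIM density quoted above


def mim_capacitance(width_um: float, height_um: float,
                    density: float = MIM_DENSITY_F_PER_UM2) -> float:
    """Capacitance (F) of a MIM capacitor, assuming uniform density over the area."""
    return width_um * height_um * density


def max_c_io_for_ratio(c_s: float, n: float) -> float:
    """Largest load C_IO (F) for which C_S is still at least n times C_IO."""
    return c_s / n


c_s = mim_capacitance(60, 60)  # under-pad area of 60 um x 60 um
print(f"C_S under the pad: {c_s * 1e12:.2f} pF")
print(f"largest C_IO with n >= 5: {max_c_io_for_ratio(c_s, 5) * 1e12:.2f} pF")
```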

<span id="page-2-5"></span>The working principle of the CR driver circuit is shown in Fig. [5\(b\)](#page-2-0). When input $D_{\text{IN}}$ changes from high to low, the switch control logic (SC) senses the transition, turns off the DRV, and turns on the switch (SW) for a short period (from $t_1$ to $t_2$). Similarly, when input $D_{\text{IN}}$ changes from low to high, the SC senses the transition, turns off the DRV, and turns on the SW for a short period (from $t_3$ to $t_4$). Initially, there is no charge in $C_S$. After multiple cycles, the charge will

<span id="page-2-1"></span>

Fig. 6. Simplified capacitor lumped model for two CR periods.

accumulate in $C_S$ to a saturation value and swing between $V_{\text{SL}}$ and $V_{\text{SH}}$. To simplify the ERR calculation, the SW is assumed to have zero resistance when turned on. Based on this assumption, the voltage on $C_{\text{IO}}$ ($D_{\text{OUT}}$) and the voltage on $C_S$ ($V_{\text{CS}}$) reach the same voltage $V_{\text{SH}}$ at $t_2$ (the end of the first (H2L) CR period) and the same voltage $V_{\text{SL}}$ at $t_4$ (the end of the second (L2H) CR period). During the first CR period, $D_{\text{OUT}}$ decreases from VDD to $V_{\text{SH}}$, and $V_{\text{CS}}$ increases from $V_{\text{SL}}$ to $V_{\text{SH}}$. Subsequently, the DRV is turned on and the SW is turned off; then, $D_{\text{OUT}}$ continues falling to 0. When $D_{\text{IN}}$ changes from low to high in the next bit, the previously stored charge helps charge (i.e., "jump start") $C_{\text{IO}}$ from 0 to $V_{\text{SL}}$ in the second CR period. As the data streams continuously, these two periods alternate. More realistically, when the turn-on resistance of the SW ($R_{\text{ON}}$) is taken into consideration, the final voltages on $D_{\text{OUT}}$ and $V_{\text{CS}}$ are similar to those obtained without $R_{\text{ON}}$.
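The saturation behavior described above can be checked numerically. The following minimal sketch assumes the ideal zero-resistance switch and a periodic pattern that toggles on every bit (not PRBS data); capacitances are normalized to $C_{\text{IO}} = 1$, and `simulate_cr1` is an illustrative name rather than part of the implemented chip.

```python
def simulate_cr1(n: float, vdd: float = 1.0, cycles: int = 200):
    """Alternate H2L and L2H charge-sharing periods, starting from an empty C_S.

    Capacitances are normalized so that C_IO = 1 and C_S = n.
    Returns a list of (V_SH, V_SL) pairs, one pair per full data cycle.
    """
    v_cs = 0.0  # C_S holds no charge initially
    history = []
    for _ in range(cycles):
        # H2L period: C_IO (at VDD) shares charge with C_S (at v_cs).
        v_sh = (n * v_cs + vdd) / (n + 1)
        # L2H period: C_IO (discharged to 0 V) is pre-charged from C_S (at v_sh).
        v_sl = (n * v_sh) / (n + 1)
        v_cs = v_sl
        history.append((v_sh, v_sl))
    return history


n = 3.3  # capacitor ratio reported for the TSV link
final_v_sh, final_v_sl = simulate_cr1(n)[-1]
print(f"simulated   V_SH = {final_v_sh:.4f} VDD, V_SL = {final_v_sl:.4f} VDD")
print(f"closed form V_SH = {(n + 1) / (2 * n + 1):.4f} VDD, "
      f"V_SL = {n / (2 * n + 1):.4f} VDD")
```

Under these assumptions, the two levels settle on either side of VDD/2, and the simulated values agree with the closed-form expression for $V_{\text{SL}}$ derived in Section II-C.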

#### *C. ERR Calculation*

Fig. [6](#page-2-1) shows the simplified lumped-capacitor models for the two CR periods. Assuming that the switch turn-on resistance is zero, charge conservation yields [\(1\)](#page-2-2) and [\(2\)](#page-2-3) for the two CR periods. If the switch turn-on resistance is not zero, energy conservation leads to similar but more complex first-order differential equations. To keep the derivation simple, the charge-conservation form is used here, as it gives a similar interpretation of the energy reduction property of the CR driver

$$
C_S \cdot V_{SL} + C_{IO} \cdot VDD = (C_S + C_{IO}) \cdot V_{SH} \tag{1}
$$

$$
C_S \cdot V_{SH} + C_{IO} \cdot 0 = (C_S + C_{IO}) \cdot V_{SL}. \tag{2}
$$

<span id="page-2-6"></span>Substituting $C_S = nC_{\text{IO}}$, we can derive $V_{\text{SL}}$ as follows:

<span id="page-2-4"></span><span id="page-2-3"></span><span id="page-2-2"></span>
$$
V_{\text{SL}} = \frac{n}{2n+1} \cdot \text{VDD}.\tag{3}
$$

<span id="page-3-0"></span>

Fig. 7. Red for the conventional and blue for the CR: (a) ideal equivalent *RC* charging circuit, (b) energy consumption, and (c) voltage ramp-up of the conventional and the CR drivers.

Here, $n$ is the ratio between $C_S$ and $C_{\text{IO}}$. Note that the CR circuit needs a charge-redistribution time: the voltage on the storage capacitor takes around 100 consecutive PRBS31 bits to settle near 1/2 VDD. During this ramp-up period, the fluctuations of $V_{\text{SL}}$ and $V_{\text{SH}}$ shift the zero-crossing point. To keep this from adding jitter, a practical implementation needs to either wait out the saturation period or reset the storage capacitor to 1/2 VDD before transmitting data with the CR technique.
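For reference, the step from [\(1\)](#page-2-2) and [\(2\)](#page-2-3) to [\(3\)](#page-2-4) can be written out explicitly. The following is a sketch under the same ideal-switch assumption, with $C_S = nC_{\text{IO}}$ substituted and $C_{\text{IO}}$ divided out:

$$
\begin{aligned}
n V_{\text{SL}} + \text{VDD} &= (n+1)\, V_{\text{SH}}, \\
n V_{\text{SH}} &= (n+1)\, V_{\text{SL}} \;\Rightarrow\; V_{\text{SH}} = \frac{n+1}{n}\, V_{\text{SL}}, \\
n V_{\text{SL}} + \text{VDD} &= \frac{(n+1)^2}{n}\, V_{\text{SL}} \;\Rightarrow\; V_{\text{SL}} = \frac{n}{2n+1}\, \text{VDD}, \quad V_{\text{SH}} = \frac{n+1}{2n+1}\, \text{VDD}.
\end{aligned}
$$

Both levels approach 1/2 VDD as $n$ grows, which is consistent with the storage capacitor settling near 1/2 VDD.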

From Fig. [6](#page-2-1), we can intuitively conclude that the proposed CR driver consumes power only to charge $C_{\text{IO}}$ from $V_{\text{SL}}$ to VDD in every cycle, in contrast to the conventional driver, which charges $C_{\text{IO}}$ from 0 to VDD. Before calculating the ERR of the proposed CR driver relative to the conventional driver, the dynamic power consumption is reexamined analytically. The equivalent *RC* charging circuit in Fig. [7\(a\)](#page-3-0) models the conventional driver, where the ideal switch and the constant resistor ($R$) together represent the pull-up PMOS transistor and its turn-on resistance. The energy drawn from VDD consists of the energy dissipated in the resistor $R$ and the energy stored on the capacitor.

At time 0 s, the ideal switch is turned on, and VDD starts charging up the *RC* circuit. KVL shows the closed-loop equation in the following:

$$
i(t)R + v_{\text{OUT}}(t) = \text{VDD}.\tag{4}
$$

 $i(t)$  and  $v_{\text{OUT}}(t)$  have the following relationship:

$$
i(t) = \frac{C_{\text{IO}}dv_{\text{OUT}}(t)}{dt}.\tag{5}
$$

Combining [\(4\)](#page-3-1) and [\(5\)](#page-3-2) and solving the resulting first-order differential equation gives the following solutions:

<span id="page-3-4"></span><span id="page-3-3"></span>
$$
i(t) = e^{-\frac{t}{RC_{\text{IO}}}} \frac{\text{VDD}}{R} \tag{6}
$$

$$
v_{\text{OUT}}(t) = \left(1 - e^{-\frac{t}{RC_{\text{IO}}}}\right) \text{VDD}.\tag{7}
$$

The overall energy drawn from VDD consists of two parts: the heat $[E_R(t)]$ generated on $R$ and the work $[E_{C_{\text{IO}}}(t)]$ needed to inject charge into the capacitor $C_{\text{IO}}$. They are expressed as follows:

<span id="page-3-5"></span>
$$
E_R(t) = \int_0^t i(t)^2 \cdot R \, dt \tag{8}
$$

$$
E_{C_{\text{IO}}}(t) = \int_0^t i(t) \cdot v_{\text{OUT}}(t) \, dt. \tag{9}
$$

The energy consumption  $[E_{\text{CONV}}(t)]$  of the conventional driver is derived as follows:

<span id="page-3-6"></span>
$$
E_{\text{CONV}}(t) = E_R(t) + E_{C_{\text{IO}}}(t). \tag{10}
$$

Substituting [\(6\)](#page-3-3) and [\(7\)](#page-3-4) into [\(8\)](#page-3-5)–[\(10\)](#page-3-6), $E_{\text{CONV}}(t)$, $E_R(t)$, and $E_{C_{\text{IO}}}(t)$ are derived as follows:

$$
E_{\text{CONV}}(t) = \left(1 - e^{-\frac{t}{RC_{\text{IO}}}}\right) \cdot C_{\text{IO}} \text{VDD}^2 \tag{11}
$$

$$
E_R(t) = \left(1 - e^{-\frac{2t}{RC_{\text{IO}}}}\right) \cdot \frac{1}{2} C_{\text{IO}} \text{VDD}^2 \tag{12}
$$

$$
E_{C_{\text{IO}}}(t) = \left( \left( 1 - e^{-\frac{t}{RC_{\text{IO}}}} \right) - \frac{1}{2} \left( 1 - e^{-\frac{2t}{RC_{\text{IO}}}} \right) \right) \cdot C_{\text{IO}} \text{VDD}^2. \tag{13}
$$

The energy consumption and the voltage ramp-up curve of $C_{\text{IO}}$ for the conventional driver are plotted in Fig. [7\(b\)](#page-3-0) and [\(c\)](#page-3-0). When $t$ is much greater than $RC_{\text{IO}}$, the conventional energy per transition ($E_{\text{CONV}}$) becomes:

<span id="page-3-7"></span>
$$
E_{\rm CONV} = C_{\rm IO} \text{VDD}^2. \tag{14}
$$
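The closed forms (11)–(13) can be cross-checked by integrating (8) and (9) numerically. The sketch below uses the trapezoidal rule in plain Python; the 50-Ω resistance and 1.1-V supply are illustrative assumptions (only the 6-pF load appears in this article), and the function names are hypothetical.

```python
import math


def integrate_rc_energies(r, c_io, vdd, t_end, steps=100_000):
    """Numerically integrate (8) and (9) for an RC charge starting from 0 V."""
    tau = r * c_io
    dt = t_end / steps

    def powers(t):
        i = math.exp(-t / tau) * vdd / r        # current, as in (6)
        v = (1.0 - math.exp(-t / tau)) * vdd    # output voltage, as in (7)
        return i * i * r, i * v                 # power in R, power into C_IO

    e_r = e_c = 0.0
    prev_r, prev_c = powers(0.0)
    for k in range(1, steps + 1):
        cur_r, cur_c = powers(k * dt)
        e_r += 0.5 * (prev_r + cur_r) * dt      # trapezoidal rule
        e_c += 0.5 * (prev_c + cur_c) * dt
        prev_r, prev_c = cur_r, cur_c
    return e_r, e_c


def closed_form_energies(r, c_io, vdd, t):
    """Closed forms (11), (12), and (13) evaluated at time t."""
    x = math.exp(-t / (r * c_io))
    e_conv = (1 - x) * c_io * vdd ** 2
    e_r = (1 - x * x) * 0.5 * c_io * vdd ** 2
    e_c = ((1 - x) - 0.5 * (1 - x * x)) * c_io * vdd ** 2
    return e_conv, e_r, e_c


R, C_IO, VDD = 50.0, 6e-12, 1.1   # R and VDD are assumed values for illustration
T_END = 10 * R * C_IO             # ten time constants, so t >> R * C_IO
num_r, num_c = integrate_rc_energies(R, C_IO, VDD, T_END)
ref_conv, ref_r, ref_c = closed_form_energies(R, C_IO, VDD, T_END)
print(f"E_R:    numeric {num_r:.4e} J, closed form {ref_r:.4e} J")
print(f"E_CIO:  numeric {num_c:.4e} J, closed form {ref_c:.4e} J")
print(f"E_CONV: {ref_conv:.4e} J versus C_IO*VDD^2 = {C_IO * VDD ** 2:.4e} J")
```

After ten time constants, the resistor loss and the stored energy each approach half of $C_{\text{IO}} \text{VDD}^2$, independent of the assumed value of $R$.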

Unlike the conventional driver, in the CR1 circuit, the load capacitor ($C_{\text{IO}}$) is first charged to $V_{\text{SL}}$, as derived in [\(3\)](#page-2-4), which corresponds to time $t_X$ on the conventional charging curve. Then, the CR driver continues charging $C_{\text{IO}}$ from the level of $V_{\text{SL}}$, as shown in Fig. [7\(a\)](#page-3-0).

Intuitively, the CR1 driver's energy consumption is lower than that of the conventional driver, as shown in Fig. [7\(b\)](#page-3-0) and [\(c\)](#page-3-0). To derive the ERR, applying [\(3\)](#page-2-4) at time $t_X$, the energy stored in $C_{\text{IO}}$ can be represented as follows:

<span id="page-3-8"></span>
$$
E_{C_{\text{IO}}}(t_X) = \frac{1}{2} \left( \frac{n}{2n+1} \right)^2 \cdot C_{\text{IO}} \text{VDD}^2 \tag{15}
$$

where $C_S = nC_{\text{IO}}$. Applying [\(13\)](#page-3-7) at time $t_X$, the energy stored in $C_{\text{IO}}$ can also be represented as follows:

$$
E_{C_{\text{IO}}}(t_X) = \left( \left( 1 - e^{-\frac{t_X}{RC_{\text{IO}}}} \right) - \frac{1}{2} \left( 1 - e^{-\frac{2t_X}{RC_{\text{IO}}}} \right) \right) \cdot C_{\text{IO}} \text{VDD}^2. \tag{16}
$$

<span id="page-3-2"></span><span id="page-3-1"></span>From  $(15)$  and  $(16)$ , the specific value of  $t_X$  can be derived as follows:

<span id="page-3-9"></span>
$$
t_X = RC_{\text{IO}} \cdot \ln\left(\frac{2n+1}{n+1}\right). \tag{17}
$$

Since $t_X$ is known and the relationship between the energy consumption of the CR1 driver ($E_{CR}$) and $E_{\text{CONV}}(t_X)$ is established in Fig. [7\(b\)](#page-3-0), $E_{CR}$ is derived as follows:

$$
E_{CR} = \frac{n+1}{2n+1} \cdot C_{\text{IO}} \text{VDD}^2. \tag{18}
$$

Therefore, the ERR of CR1 driver compared with the conventional driver is derived as follows:

$$
\text{ERR} = \frac{E_{\text{CONV}} - E_{CR}}{E_{\text{CONV}}} = \frac{n}{2n + 1}. \tag{19}
$$

When $n$ approaches infinity, the ERR approaches 50%, the same as predicted by stepwise charging [\[9\]](#page-9-8). The ERR equation [\(19\)](#page-4-0) gives the designer a general guideline for sizing the storage capacitor ($C_S$). In practice, increasing $n$ costs area and adds parasitic capacitance to the load because of the wider wires connecting the storage capacitor and the load. To be practical, $C_S$ can be hidden under the pad, in which case the optimized values are $n = 3.3$ for the TSV link and $n = 5$ for the T-Line link. To reduce the area overhead, we extend this concept to multi-channel cases (two, four, and eight channels), starting in Section [II-D](#page-4-1).
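The sizing tradeoff in (17)–(19) can be tabulated directly. The sketch below evaluates the ideal-switch expressions only; the function names are hypothetical, and the results are theoretical upper bounds for CR1 rather than measured values.

```python
import math


def t_x_over_tau(n: float) -> float:
    """Equivalent time t_X in units of R*C_IO, from (17)."""
    return math.log((2 * n + 1) / (n + 1))


def e_cr_ratio(n: float) -> float:
    """CR1 energy per cycle normalized to C_IO * VDD^2, from (18)."""
    return (n + 1) / (2 * n + 1)


def err(n: float) -> float:
    """Ideal energy reduction ratio of CR1, from (19)."""
    return n / (2 * n + 1)


for n in (1, 2, 3.3, 5, 10, 100):
    print(f"n = {n:>5}: t_X = {t_x_over_tau(n):.3f} tau, "
          f"E_CR = {e_cr_ratio(n):.3f} C_IO*VDD^2, ERR = {100 * err(n):.1f}%")
```

The ideal ERR rises quickly for small $n$ and flattens as it approaches the 50% limit, so increasing $n$ well beyond the reported values of 3.3 and 5 yields diminishing returns for the added area.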

#### <span id="page-4-1"></span>*D. CR Among Two Channels (CR2)*

In CR2, the storage capacitor ($C_S$) is removed, as shown in Fig. [8\(a\)](#page-4-2); instead, the switch controller detects the data transitions and polarities on two channels. When the transition edges arrive at the same time and the polarities of the two channels are opposite, the switch controller turns on the switch to share the charge between the $C_{\text{IO}}$ of the two channels. By doing so, the falling-edge channel "aids" the rising-edge channel to "jump start" its charging. Here, the switch is composed of complementary switches to reduce the turn-on resistance. Note that the charge sharing is faster than in CR1 because the voltage difference between the drain and the source of the SW is VDD at the beginning of sharing instead of nearly half VDD in CR1. CR2 therefore has a larger average charge-sharing current than CR1.
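The CR2 sharing condition can be expressed as a small truth function. This is a behavioral sketch, not the implemented switch controller: `cr2_share_enable` is a hypothetical name, and the enumeration assumes independent, equiprobable bits on both channels.

```python
from itertools import product


def cr2_share_enable(prev0: int, cur0: int, prev1: int, cur1: int) -> bool:
    """True when both channels toggle in the same bit period with opposite polarity."""
    rising0, falling0 = (prev0, cur0) == (0, 1), (prev0, cur0) == (1, 0)
    rising1, falling1 = (prev1, cur1) == (0, 1), (prev1, cur1) == (1, 0)
    return (rising0 and falling1) or (falling0 and rising1)


# Enumerate every (previous, current) bit combination on the two channels.
cases = list(product((0, 1), repeat=4))
hits = [case for case in cases if cr2_share_enable(*case)]
share_probability = len(hits) / len(cases)

print(f"sharing is enabled in {len(hits)} of {len(cases)} cases "
      f"({100 * share_probability:.1f}% of bit periods)")
```

Under this random-data assumption, only a fraction of bit periods offers a sharing opportunity between two channels, which motivates connecting more channels so that opposite transitions are found more often.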

<span id="page-4-4"></span><span id="page-4-3"></span>In the circuit, $\delta_1$ determines the switching pulsewidth. This width is tunable so that the CR efficiency and the eye diagram can be adjusted. Process, voltage, and temperature (PVT) variation affects the performance of CR: based on the post-simulation, the CR period varies by more than 2× across PVT conditions. To avoid eye-width degradation, the switch pulsewidth needs to be trimmed on silicon. In this chip, the pulsewidth is tuned manually; an automatic trimming scheme is needed for practical use. At 2.56 Gb/s, the unit interval (UI) is 390.625 ps. As the rise and fall times should be less than 1/3 of the UI, estimated from the eye diagrams of modern DRAM data links [\[19\]](#page-9-18), [\[20\]](#page-9-19), [\[21\]](#page-9-20), the switch pulsewidth should be less than 130 ps. For 5.12-Gb/s data links, the switch pulsewidth should be less than 65 ps. If the data-induced jitter is less than one-tenth of the switch pulsewidth, the ERR is still evident; if the jitter is close to the switch turn-on time, the ERR is reduced. Therefore, an automatic tuning scheme is necessary to mitigate the influence of data-induced jitter in future implementations. The working principle of CR2 is shown

<span id="page-4-2"></span><span id="page-4-0"></span>

Fig. 8. (a) Schematic and (b) working mechanism of CR2.

in Fig. [8\(b\)](#page-4-2). PU0 and PD0 define a window. To avoid leakage current from the supply through the switch, the switch pulsewidth should be slightly smaller than this window. The non-overlapping pre-driver waveforms also help prevent a large short-circuit current, which contributes further to energy reduction.
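The pulsewidth budget quoted for CR2 follows from the unit interval. The sketch below reproduces that arithmetic; the 1/3-UI edge budget and the one-tenth jitter guideline are taken from the text, while the function names are hypothetical.

```python
def unit_interval_ps(data_rate_gbps: float) -> float:
    """Unit interval in picoseconds for a given per-pin data rate in Gb/s."""
    return 1e3 / data_rate_gbps


def max_switch_pulse_ps(data_rate_gbps: float, edge_fraction: float = 1 / 3) -> float:
    """Upper bound on the switch pulsewidth: a fraction of the UI (1/3 in the text)."""
    return unit_interval_ps(data_rate_gbps) * edge_fraction


for rate in (2.56, 5.12):
    ui = unit_interval_ps(rate)
    pulse = max_switch_pulse_ps(rate)
    jitter = pulse / 10  # jitter guideline: one-tenth of the switch pulsewidth
    print(f"{rate} Gb/s: UI = {ui:.3f} ps, pulse < {pulse:.1f} ps, "
          f"jitter < {jitter:.1f} ps")
```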

#### *E. CR Among Four (CR4) and More Channels*

<span id="page-4-5"></span>CR4 has six exclusive switches, one between each pair of channels, as shown in Fig. [9](#page-5-1); with these switches, opposite-phase transitions on any two channels can be utilized for the CR cycle. At time $t_2$, the switch controller detects the opposite data transition patterns on these four channels. It turns on S02 and S03 to share the charge between channel 0 and channels 2 and 3. Meanwhile, it also turns on S12 and S13 to share the charge between channel 1 and channels 2 and 3. Now, the charge is recycled among multiple channels through the double switches, and the energy-sharing efficiency is boosted even further. This comes at the cost of higher switch area overhead; now, each channel is connected with three switches,

<span id="page-5-1"></span>

Fig. 9. (a) Working example timing diagram. (b) Equivalent topology connection of CR4.

<span id="page-5-2"></span>

Fig. 10. Topology summary of CR1, CR2, CR4, and CR8.

and the parasitic capacitance from the switches is also introduced to be an overhead. The CR efficiency of the current topology is still better than just using a depot storage capacitor among the four channels with four switches. Therefore, the exclusive switch topology is adopted. For eight channels, energy recycling can be achieved using 28 exclusive switches. The topology of the proposed generic CR IOs is summarized in Fig. [10.](#page-5-2) In summary, energy is saved at the tolerable penalty of the area overhead.
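The exclusive-switch topology can be summarized by two small functions: one counting the switches for N channels and one selecting which switches to close for a given transition pattern. This is a behavioral sketch with hypothetical names; the `Sij` labels follow the naming in the CR4 example, and the transition directions in the usage line are an assumption, since the text only states that channels 0 and 1 are opposite to channels 2 and 3.

```python
def exclusive_switch_count(n_channels: int) -> int:
    """One switch per unordered channel pair: N * (N - 1) / 2."""
    return n_channels * (n_channels - 1) // 2


def switches_to_close(prev_bits, cur_bits):
    """Return the labels of all switches joining a rising and a falling channel."""
    transitions = list(enumerate(zip(prev_bits, cur_bits)))
    rising = [ch for ch, (p, c) in transitions if (p, c) == (0, 1)]
    falling = [ch for ch, (p, c) in transitions if (p, c) == (1, 0)]
    return sorted(f"S{min(a, b)}{max(a, b)}" for a in rising for b in falling)


for n in (2, 4, 8):
    print(f"CR{n}: {exclusive_switch_count(n)} exclusive switches")

# Assumed example: channels 0 and 1 fall while channels 2 and 3 rise.
print(switches_to_close(prev_bits=(1, 1, 0, 0), cur_bits=(0, 0, 1, 1)))
```

The quadratic growth in switch count is the area cost noted above, which is why the pairing is limited to eight channels in this work.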

#### III. CHIP IMPLEMENTATION

#### <span id="page-5-0"></span>*A. Block Diagram of Implemented Data Links*

Fig. [11](#page-5-3) shows the block diagram used to verify that the CR technique saves power while maintaining the eye diagram and bit error rate (BER) performance. The clock

<span id="page-5-3"></span>

Fig. 11. Block diagram of implemented data links.

forwarding scheme transmits data and the clock together. On the transmitter (TX) side, we implemented a phase-locked loop (PLL) to generate a stable clock and a data generator to generate the pseudorandom data patterns PRBS7, PRBS15, PRBS23, and PRBS31. A phase interpolator (PI) shifts the clock phase to measure the BER bathtub curve. A clock divider is also implemented to measure the ERR at different data transmission speeds. On the receiver (RX) side, a BER counter is implemented on the chip to record the bit errors. For the TSV link chip, the data rate is 2.56 Gb/s. For the T-Line link chip, the data rate is 5.12 Gb/s, obtained by applying the DDR transmission scheme. To emulate the load capacitance of the TSV link, a 6-pF on-chip capacitance is implemented on each IO. The T-Line link adopts a flip-chip bonding technique to minimize the parasitic load capacitance and achieves 50-$\Omega$ impedance matching.
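For readers unfamiliar with PRBS patterns, the shortest one listed above can be generated with a 7-bit linear feedback shift register. The sketch below assumes the commonly used PRBS7 polynomial $x^7 + x^6 + 1$; the article does not state which polynomials or seeds the on-chip data generator uses, so `prbs7` and its default seed are illustrative only.

```python
def prbs7(n_bits: int, seed: int = 0x02):
    """Generate n_bits of PRBS7 with the polynomial x^7 + x^6 + 1 (assumed).

    The 7-bit state must be non-zero; an all-zero state would lock up.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero in its lower 7 bits")
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1   # XOR of taps 7 and 6
        state = ((state << 1) | new_bit) & 0x7F       # shift the new bit in
        bits.append(new_bit)
    return bits


sequence = prbs7(2 * 127)
period = sequence[:127]
transitions = sum(a != b for a, b in zip(period, period[1:] + period[:1]))
print("sequence repeats after 127 bits:", sequence[:127] == sequence[127:])
print(f"ones in one period: {sum(period)}, transitions in one period: {transitions}")
```

The transition count matters for CR because charge is recycled only at data transitions, so a random pattern offers fewer recycling opportunities than a pattern that toggles on every bit.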

Conventional (CONV), CR1, CR2, CR4, and CR8 topologies are implemented to show that the CR technique is effective on both the TSV and T-Line links. The TSV link is for dense IOs, heavy load, and relatively lower speed (2.56 Gb/s) applications, while the T-Line link is for relatively small load and high-speed (5.12 Gb/s) applications. Specifically, the conventional IO, CR1, and CR2 are implemented on both links. CR4 is implemented on the T-Line link. CR8 is implemented on TSV Link. CR8 here has 28 exclusive switches, which can be well-designed between bumping pad gaps to minimize the area overhead.

#### *B. Driver Circuit Optimization*

<span id="page-5-4"></span>The driver circuit schematics of the TSV and T-Line links are shown in Fig. [12](#page-6-1). The TSV driver circuit is sized to drive a heavy load of up to 6 pF at a speed of 2.56 Gb/s. The slew rate is kept similar to that of current HBM2 [\[8\]](#page-9-7), [\[22\]](#page-9-21). The T-Line driver circuit is optimized to drive a channel with 50-$\Omega$ characteristic impedance at a speed of 5.12 Gb/s.

#### *C. Chip Micrograph and Data Link Implementation*

<span id="page-5-5"></span>Fig. [13](#page-6-2) shows the TSV-link chip and its data links. The simulation in [23] shows that a 40-$\mu$m-long TSV has 0.112-pF parasitic capacitance. If an eight-stack TSV link is 1000 $\mu$m in length, its TSV parasitic capacitance is around 2.8 pF. An on-chip capacitance of 6 pF is implemented to

<span id="page-6-1"></span>

Fig. 12. Schematic of (a) TSV driver and (b) T-Line data links.

<span id="page-6-2"></span>

Fig. 13. Micrograph of (a) TSV chip and (b) its data links.

emulate the worst scenario of the heavy parasitic TSV links. The bonding wire is around 1 mm length to connect the TX and RX to represent the TSV link.
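The 2.8-pF estimate above follows from scaling the simulated TSV capacitance with length. The sketch below assumes that the capacitance scales linearly with TSV length, as the estimate implies; `tsv_capacitance_pf` is a hypothetical helper name.

```python
def tsv_capacitance_pf(length_um: float,
                       ref_cap_pf: float = 0.112,
                       ref_length_um: float = 40.0) -> float:
    """Scale the simulated 0.112 pF per 40 um linearly with TSV length (assumed)."""
    return ref_cap_pf * length_um / ref_length_um


stack_cap = tsv_capacitance_pf(1000.0)   # eight-stack link, 1000 um in total
emulated_load = 6.0                       # pF, on-chip emulation capacitance
print(f"estimated TSV capacitance: {stack_cap:.2f} pF")
print(f"emulated load margin: {emulated_load / stack_cap:.2f}x")
```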

Fig. [14](#page-6-3) shows the micrograph of the T-Line link chip; flip-chip bonding is used to reduce the parasitic capacitance, similar to current DDR chips. The T-Line link designed on the printed circuit board (PCB) is 10 cm long with 50-$\Omega$ characteristic impedance, which is close to modern CPU-to-DRAM communication traces.

#### IV. MEASUREMENT RESULTS

#### <span id="page-6-0"></span>*A. Waveform Measurement*

The waveform (Fig. [15](#page-7-0)) is measured at a speed of 5.12 Gb/s. The overshoot seen in the conventional channel is alleviated in CR1 and, more evidently, in CR2. In this way, CR also helps improve the power integrity, which contributes further to the

<span id="page-6-3"></span>

Fig. 14. Micrograph of (a) T-Line chip and (b) its data links.

signal integrity. It has a penalty in the area and speed, but it is tolerable up to 5.12 Gb/s.

#### *B. Eye Diagram Measurement*

Fig. [16\(a\)](#page-7-1) shows the eye diagram measurement setup for CONV, CR1, and CR2 under PRBS31 at 2.56 Gb/s for the TSV link and 5.12 Gb/s for the T-Line link. The eye opening and slew rate of CR1 are slightly inferior to those of CONV [Fig. 16(b)] due to the relatively low charge-sharing efficiency between two unbalanced channels. However, CR2's performance is comparable with that of CONV, which is attributed to the reduced ground and VDD bouncing. The eye diagram of the T-Line link [Fig. 16(c)] at 5.12 Gb/s shows that CR1's eye opening is similar to that of CONV, and the eye height and eye opening of CR2 are improved by the suppression of ground and VDD bouncing. The slew rate of the T-Line link is slightly lower than that of the TSV link due to the requirement of a 50-$\Omega$ impedance-matched driver.

#### *C. BER Measurement*

The bathtub curves of the TSV and T-Line links are measured using the on-chip BER counter, as shown in Fig. [17](#page-7-2). The input data are PRBS31 on both links. The CR circuits have a curve shape similar to that of the conventional circuit, so their impact on data communication quality in the TSV and T-Line links is minimal. Therefore, the proposed CR has a negligible effect on signal integrity.

<span id="page-7-0"></span>

Fig. 15. Waveform measurement of T-Line chip at 5.12 Gb/s.

#### *D. Power Consumption Measurement*

To quantify the maximum power consumption, we apply periodic data (i.e., 100% CR possibility) as input. The power consumption of the IO circuit consists of the TX driver circuit, the CR logic, and the RX circuit. The data link has a separate power rail for the power consumption measurement. Taking the TSV link CR1 as an example, the TX driver accounts for 89.3% of the power, the CR logic for 2.98%, and the RX circuit for 7.72%. The TX driver dominates the data link power consumption. Fig. [18\(a\)](#page-7-3) and [\(b\)](#page-7-3) show the ERR of CR1, CR2, CR4, and CR8 at different speeds for the TSV link and the T-Line link, respectively. We can observe that CR2/8 and CR1/2/4 reach over 30% and 40% ERR on the TSV link (2.56 Gb/s) and the T-Line link (5.12 Gb/s), respectively; the ERR of CR4 reaches 47%, which is greater than that of CR2, as the CR efficiency is boosted by turning on more switches [Fig. 9(b)]. The driver circuit is designed to target a slew rate >4 V/ns at the maximum speed of 2.56 Gb/s for the TSV link and 5.12 Gb/s for the T-Line link. The driver is therefore relatively over-designed for the lower speeds. In general, the TSV link's CR8 and the T-Line link's CR4 show similar ERR trends: when the speed slows down, their ERR degrades. The reason behind this phenomenon is that the crowbar current (short-circuit current) occupies a larger

<span id="page-7-1"></span>

Fig. 16. (a) Eye diagram measurement setup. (b) Eye diagram of CONV, CR1, and CR2 under TSV link at 2.56 Gb/s. (c) Eye diagram of CONV, CR1, and CR2 under T-Line link at 5.12 Gb/s (the eye diagram is degraded due to the limited bandwidth of 2.5 GHz of the active probe).

<span id="page-7-2"></span>

Fig. 17. BER of (a) 2.56-Gb/s TSV chip and (b) 5.12-Gb/s T-Line chip.

<span id="page-7-3"></span>

Fig. 18. ERR of CR1, CR2, CR4, and CR8 under periodic data and different speeds: (a) through TSV link and (b) through T-Line link.

portion of the data link power at a higher speed, which is mitigated by the CR; therefore, the ERR improves at higher speeds. The different ERR trends of CR1 and CR2 on the TSV link versus the T-Line link are due to the overhead of the T-Line's buffer chain (added for timing matching at 5.12 Gb/s), which is significant at lower speeds. In contrast, the TSV link's CR logic consumes less than 10% of the entire link's power at all speeds. This T-Line link buffer overhead can

<span id="page-8-2"></span>

| Link Type                       | Through-Silicon Via (this paper) |        |        |        | Transmission Line (this paper) |          |        |        | TSV       |            |        | DDR    |        |
|---------------------------------|----------------------------------|--------|--------|--------|--------------------------------|----------|--------|--------|-----------|------------|--------|--------|--------|
| Metrics                         | CONV                             | CR1    | CR2    | CR8    | CONV                           | CR1      | CR2    | CR4    | [24]      | [25]       | [26]   | [27]   | [28]   |
| Data Rate (Gb/s) #              | 2.56                             |        |        |        | 5.12                           |          |        |        | 3.2       | 4.8        | 9      | 7.3    | 18     |
| Power Consumption/Ch. (mW)      | 17.25                            | 13.49  | 11.98  | 11.7   | 9.13                           | 5.44     | 5.14   | 4.84   |           |            | N.A.   |        |        |
| Energy Reduction Ratio (ERR)    | N.A.                             | 21.7%  | 30.4%  | 32.2%  | N.A.                           | 40.4%    | 43.7%  | 47%    |           |            | N.A.   |        |        |
| Energy/bit (pJ/b)               | 6.74                             | 5.27   | 4.68   | 4.57   | 1.78                           | 1.06     |        | 0.95   | 1.07      | 0.37\*\*\* | 0.29   | 1.17   | N.A.   |
| Supply VDDQ (V)                 | 1.1                              |        |        |        |                                |          |        |        | 1.2       | 1.1        | 0.3    | 0.5    | 1.35   |
| Slew Rate (V/ns)                | 4.4                              | 4.0    | 4.7    | N.A.   | 4.1<br>N.A.                    |          |        |        | N.A.      |            |        |        |        |
| Eye Width (UI)                  | 0.7                              | 0.65   | 0.74   | N.A.   | 0.54                           | 0.55     | 0.69   | N.A.   | 0.7\*\*\* | 0.77\*\*\* | 0.39   | 0.25   | 0.50   |
| Eye Height (V)                  | 0.88                             | 0.76   | 0.8    | N.A.   | 0.41                           | 0.44     | 0.49   | N.A.   | 0.34\*\*\* | 1.1\*\*\* | 0.134  | N.A.   | 0.13   |
| Area/Ch. (mm<sup>2</sup>)       | 0.0016                           | 0.0039 | 0.0018 | 0.0031 | 0.013                          | 0.0178\* | 0.0136 | 0.0139 | 0.0056    | 0.0056     | 0.0046 | 0.0246 | 0.1038 |
| Process (nm)                    | 40                               |        |        |        |                                |          |        |        |           | 65         | 4      | 8      | 8      |

TABLE I PERFORMANCE COMPARISON OF THE PROPOSED GENERIC CR IOS WITH CONVENTIONAL IOS AND LATEST IOS

\# The maximum ERR is measured by applying periodic data to a single channel (CONV and CR1) and two opposite periodic data to multi-channels (CR2, CR4, and CR8).

\* The storage capacitor of CR1 for the T-Line link is arranged under the bump pad for area saving.

\*\* Estimated value from the eye diagrams of the references.

\*\*\* The energy/bit only includes the receiver circuit.

<span id="page-8-1"></span>

Fig. 19. (a) CR probability setting of CR2. ERR of CR1, CR2, CR4, and CR8 under various CR probabilities and random data (b) through TSV link at 2.56 Gb/s and (c) through T-Line link at 5.12 Gb/s.

be alleviated by applying adaptive delay cell logic to disable the buffer chains when the link operates at low speed.

Data running on the links are normally random. To examine the ERR under different CR probabilities, periodic and pseudorandom data are applied to the links. Take the CR2 link as an example, as shown in Fig. 19(a). When DATA0's pattern of "010101..." and DATA1's pattern of "101010..." run on channels 0 and 1, respectively, the CR probability is defined as 100%. By masking out various portions of DATA1's pattern, other CR probabilities can be derived, such as 12.5%, 25%, 50%, 75%, and 87.5%. To verify fully random data streaming, PRBS23 and PRBS31 are applied on both channels, which corresponds to a 24.4% CR probability. The ERR under various CR probabilities is also measured for the TSV link and the T-Line link, as shown in Fig. [19\(b\)](#page-8-1) and [\(c\),](#page-8-1) respectively. Once the CR probability reaches 50%, the ERR of CR1, CR2, and CR8 is consistently over 20%; at higher CR probabilities, the T-Line link's ERR exceeds 40%.

CR1 consists of one data pattern detection logic and one switch connecting to the load capacitor. CR2 has three data pattern detection logics and one switch connecting to the load capacitor, and CR8 has 84 data pattern detection logics and 28 switches in total. The ERR of CR8 drops sharply as the CR probability goes down because of the circuit and parasitic capacitance overhead. In contrast, the ERR of CR1 stays nearly constant across CR probabilities owing to its low extra circuit and parasitic capacitance overhead. For both links with the CR1 topology, the ERR is >20% under the random data pattern PRBS31. Notably, periodic signals fit differential signaling even better, which can maximize the gain from the CR logic. For random data, applying PRBS23 and PRBS31 on two channels gives a 24.4% CR probability, which corresponds to around a 15% ERR for both the TSV and T-Line links. This is beneficial for high-density IOs, as the thermal issue will be alleviated. Meanwhile, the power integrity and signal integrity are also improved.
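
The detection-logic and switch counts quoted above fit a simple pairwise pattern, which the following Python sketch makes explicit. It assumes (this is not stated in the text) that a multi-channel CR topology places one switch between every pair of channels and uses three detection logics per switch; the function name `cr_overhead` is hypothetical. Under that assumption the sketch reproduces the quoted CR2 and CR8 figures, while any value it returns for other channel counts, such as CR4, is an extrapolation rather than a reported number.

```python
from math import comb
from typing import Tuple


def cr_overhead(channels: int) -> Tuple[int, int]:
    """Return (detection_logics, switches) for a CR topology.

    ASSUMPTION (not from the paper): for two or more channels there is
    one switch per channel pair and three detection logics per switch.
    The single-channel case uses the counts quoted in the text for CR1.
    """
    if channels < 1:
        raise ValueError("channels must be a positive integer")
    if channels == 1:
        # CR1: one detection logic and one switch, as quoted in the text.
        return (1, 1)
    pairs = comb(channels, 2)  # number of distinct channel pairs
    return (3 * pairs, pairs)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        logics, switches = cr_overhead(n)
        print(f"CR{n}: {logics} detection logics, {switches} switches")
```

The quadratic growth of both counts with the channel count is one way to read why the overhead weighs more heavily on CR8 than on CR1 when few transitions can be recycled.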

Table [I](#page-8-2) summarizes the performance of the two chips fabricated for CR high-speed IOs. With periodic data input on the 2.56-Gb/s TSV link, CR1, CR2, and CR8 save 21.7%, 30.4%, and 32.2% of the energy, respectively. With periodic data input on the 5.12-Gb/s T-Line link, CR1, CR2, and CR4 save 40.4%, 43.7%, and 47.0% of the energy, respectively. In the T-Line link chip, the switch area overhead is negligible compared to the large driver size. *C*<sub>S</sub> of CR1 in the T-Line link is placed under the bump pad to save area.
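
As a cross-check on Table I, the energy per bit and the ERR can be recomputed from the per-channel power and the data rate. The Python sketch below assumes that energy/bit is power divided by data rate and that ERR is the fractional power saving relative to CONV at the same data rate; both formulas and all function names are illustrative assumptions, and only the power and data-rate values are taken from the table.

```python
# Per-channel power (mW) and data rate (Gb/s) copied from Table I.
LINKS = {
    "TSV": {
        "rate_gbps": 2.56,
        "power_mw": {"CONV": 17.25, "CR1": 13.49, "CR2": 11.98, "CR8": 11.7},
    },
    "T-Line": {
        "rate_gbps": 5.12,
        "power_mw": {"CONV": 9.13, "CR1": 5.44, "CR2": 5.14, "CR4": 4.84},
    },
}


def energy_per_bit_pj(power_mw: float, rate_gbps: float) -> float:
    """ASSUMED definition: mW / (Gb/s) = 1e-3 W / 1e9 b/s = 1 pJ/b."""
    return power_mw / rate_gbps


def err_percent(power_conv_mw: float, power_cr_mw: float) -> float:
    """ASSUMED definition: power saving relative to CONV, in percent."""
    return 100.0 * (1.0 - power_cr_mw / power_conv_mw)


def summarize(links=LINKS):
    """Return (link, topology, energy/bit in pJ/b, ERR in %) rows."""
    rows = []
    for link, spec in links.items():
        conv = spec["power_mw"]["CONV"]
        for name, power in spec["power_mw"].items():
            rows.append(
                (
                    link,
                    name,
                    energy_per_bit_pj(power, spec["rate_gbps"]),
                    err_percent(conv, power),
                )
            )
    return rows


if __name__ == "__main__":
    for link, name, epb, err in summarize():
        print(f"{link:7s} {name:5s} {epb:5.2f} pJ/b  ERR {err:5.1f}%")
```

Under these assumptions the recomputed energy/bit values match the tabulated ones, and the recomputed ERR agrees with the table to within a few tenths of a percentage point, with the small differences presumably due to rounding of the reported power values.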

#### V. CONCLUSION

<span id="page-8-0"></span>The proposed CR technique is applicable to I/O interfaces. It supports the TSV link of 6-pF load capacitance and the T-Line link of 50-$\Omega$ load impedance with single- and multiple-channel topologies. The proposed technique covers various speeds and has tolerable area overhead. The CR circuits achieve a maximum of 47% ERR under periodic data and >20% ERR under random data while maintaining signal integrity.

#### ACKNOWLEDGMENT

The authors would like to thank Dr. Miaolin Zhang for his help in digital controller implementation and Prof. Longyang Lin for his help in measurements.

#### **REFERENCES**

- <span id="page-9-0"></span>[\[1\]](#page-0-2) *Rebooting the IT Revolution: A Call to Reaction*, Semiconductor Industry Association, Washington, DC, USA, Mar. 2015.
- <span id="page-9-1"></span>[\[2\]](#page-0-3) M. Horowitz, "1.1 computing's energy problem (and what we can do about it)," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2014, pp. 10–14.
- <span id="page-9-2"></span>[\[3\]](#page-0-4) *High Bandwidth Memory (HBM) DRAM*, Standard JESD235A, JEDEC, Arlington County, VA, USA, 2013, pp. 95–96.
- <span id="page-9-3"></span>[\[4\]](#page-0-5) *DDR5 SDRAM*, Standard JESD79-5, JEDEC, Arlington County, VA, USA, 2020, pp. 444–445.
- <span id="page-9-4"></span>[\[5\]](#page-1-3) Y. Asakura, *DDR Memory and Interface Design Trends*. Tokyo, Japan: Micron Japan, Sep. 2011.
- <span id="page-9-5"></span>[\[6\]](#page-1-4) K. Song et al., "A 1.1 V 2y-nm 4.35 Gb/s/pin 8 Gb LPDDR4 mobile device with bandwidth improvement techniques," *IEEE J. Solid-State Circuits*, vol. 50, no. 8, pp. 1945–1959, Aug. 2015.
- <span id="page-9-6"></span>[\[7\]](#page-1-5) K.-S. Ha et al., "A 7.5 Gb/s/pin 8-Gb LPDDR5 SDRAM with various high-speed and low-power techniques," *IEEE J. Solid-State Circuits*, vol. 55, no. 1, pp. 157–166, Jan. 2020.
- <span id="page-9-7"></span>[\[8\]](#page-1-6) J. H. Cho et al., "A 1.2 V 64Gb 341GB/S HBM2 stacked DRAM with spiral point-to-point TSV structure and improved bank group data control," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2018, pp. 208–209.
- <span id="page-9-8"></span>[\[9\]](#page-1-7) L. J. Svensson and J. G. Koller, "Driving a capacitive load without dissipating fCV<sup>2</sup>," in *IEEE Symp. Low Power Electron. Dig. Tech. Papers*, Oct. 1994, pp. 100–101.
- <span id="page-9-9"></span>[\[10\]](#page-1-8) J. G. Koller and W. C. Athas, "Adiabatic switching, low energy computing, and the physics of storing and erasing information," in *Proc. Workshop Phys. Comput.*, Oct. 1992, pp. 267–270.
- <span id="page-9-10"></span>[\[11\]](#page-1-9) S. C. Chan, P. J. Restle, K. L. Shepard, N. K. James, and R. L. Franch, "A 4.6 GHz resonant global clock distribution network," in *Proc. IEEE Int. Solid-State Circuits Conf.*, Feb. 2004, pp. 342–343.
- <span id="page-9-11"></span>[\[12\]](#page-1-10) L. G. Salem and P. P. Mercier, "A 0.4-to-1 V 1MHz-to-2 GHz switched-capacitor adiabatic clock driver achieving 55.6% clock power reduction," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2017, pp. 442–443.
- <span id="page-9-12"></span>[\[13\]](#page-1-11) F. Hamzaoglu et al., "A 3.8 GHz 153 Mb SRAM design with dynamic stability enhancement and leakage reduction in 45 nm high-k metal gate CMOS technology," *IEEE J. Solid-State Circuits*, vol. 44, no. 1, pp. 148–154, Jan. 2009.
- <span id="page-9-13"></span>[\[14\]](#page-1-12) H. Jeong et al., "Bitline charge-recycling SRAM write assist circuitry for V<sub>MIN</sub> improvement and energy saving," *IEEE J. Solid-State Circuits*, vol. 54, no. 3, pp. 896–906, Mar. 2019.
- <span id="page-9-14"></span>[\[15\]](#page-1-13) H. Ha, A. Pedram, S. Richardson, S. Kvatinsky, and M. Horowitz, "Improving energy efficiency of DRAM by exploiting half page row access," in *Proc. 49th Annu. IEEE/ACM Int. Symp. Microarchitecture (MICRO)*, Oct. 2016, pp. 1–12.
- <span id="page-9-15"></span>[\[16\]](#page-1-14) H. Wu et al., "A 0.95pJ/b 5.12Gb/s/pin charge-recycling IOs with 47% energy reduction for big data applications," in *Proc. IEEE Asian Solid-State Circuits Conf. (A-SSCC)*, Nov. 2022, pp. 1–3, doi: [10.1109/A-SSCC56115.2022.9980680.](http://dx.doi.org/10.1109/A-SSCC56115.2022.9980680)
- <span id="page-9-16"></span>[\[17\]](#page-2-5) *DDR4 SDRAM Standard*, Standard JESD79-4A, JEDEC, Arlington County, VA, USA, 2020.
- <span id="page-9-17"></span>[\[18\]](#page-2-6) J. W. Poulton et al., "A 1.17-pJ/b, 25-Gb/s/pin ground-referenced single-ended serial link for off- and on-package communication using a process- and temperature-adaptive voltage regulator," *IEEE J. Solid-State Circuits*, vol. 54, no. 1, pp. 43–54, Jan. 2019.
- <span id="page-9-18"></span>[\[19\]](#page-4-3) M. Park et al., "A 192-Gb 12-high 896-Gbps HBM3 DRAM with a TSV auto-calibration scheme and machine-learning-based layout optimization," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2022, pp. 444–445.
- <span id="page-9-19"></span>[\[20\]](#page-4-4) C. Moon, J. Seo, M. Lee, I. Jang, and B. Kim, "A 20 Gb/s/pin 1.18pJ/b  $1149 \mu m^2$  single-ended inverter-based 4-tap addition-only feed-forward equalization transmitter with improved robustness to coefficient errors in 28 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2022, pp. 450–451.
- <span id="page-9-20"></span>[\[21\]](#page-4-5) H. Park et al., "A 0.385-pJ/bit 10-Gb/s TIA-terminated di-code transceiver with edge-delayed equalization, ECC, and mismatch calibration for HBM interfaces," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2022, pp. 452–453.
- <span id="page-9-21"></span>[\[22\]](#page-5-4) C.-S. Oh et al., "A 1.1 V 16GB 640GB/s HBM2E DRAM with a data-bus window-extension technique and a synergetic on-die ECC scheme," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2020, pp. 330–331.
- <span id="page-9-22"></span>[\[23\]](#page-5-5) Q. Deng, M.-X. Zhang, Z.-Y. Zhao, and P. Li, "A precise model of TSV parasitic capacitance considering temperature for 3D IC," in *Proc. Int. Conf. Autom., Mech. Control Comput. Eng.*, 2015, pp. 1721–1725.
- [\[24\]](#page-0-6) S. Hwang et al., "A 3.2 Gbps/pin HBM2E PHY with low power I/O and enhanced training scheme for 2.5D system-in-package solution," in *Proc. IEEE Hot Chips Symp.*, Aug. 2020, pp. 1–12.
- [\[25\]](#page-0-6) H.-G. Ko et al., "A 370-fJ/b, 0.0056 mm<sup>2</sup>/DQ, 4.8-Gb/s DQ receiver for HBM3 with a baud-rate self-tracking loop," in *Proc. Symp. VLSI Circuits*, Jun. 2019, pp. C94–C95.
- [\[26\]](#page-0-6) K. Chae et al., "A 4 nm 1.15TB/s HBM3 interface with resistor-tuned offset-calibration and in-situ margin-detection," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2023, pp. 1–3.
- [\[27\]](#page-0-6) K. Chae et al., "An 8 nm all-digital 7.3Gb/s/pin LPDDR5 PHY with an approximate delay compensation scheme," in *Proc. Symp. VLSI Circuits*, Jun. 2019, pp. C96–C97.
- [\[28\]](#page-0-6) S.-M. Lee et al., "An 8 nm 18Gb/s/pin GDDR6 PHY with TX bandwidth extension and RX training technique," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2020, pp. 338–339.



Han Wu (Member, IEEE) received the B.Eng. degree in electronic science and technology and the M.E. degree in microelectronics and solid-state electronics from the College of Optoelectronic Engineering, Chongqing University, Chongqing, China, in 2013 and 2016, respectively, and the Ph.D. degree from the Department of Electrical and Computer Engineering, National University of Singapore, Singapore, in 2021.

He is currently a Research Fellow with the Department of Electrical and Computer Engineering, National University of Singapore. His research focuses on energy-efficient high-speed links, ultra-low-power system-on-chip design, and MEMS sensor interface circuit design.



Jeong Hoan Park (Member, IEEE) received the B.S. and Ph.D. degrees from the School of Electrical Engineering and Computer Science, Seoul National University, Seoul, South Korea, in 2011 and 2017, respectively.

From 2017 to 2020, he was a Research Fellow with the Electrical and Computer Engineering Department, National University of Singapore, Singapore. He is currently a Staff Engineer with the DRAM Design Team, Samsung Electronics, Hwaseong, South Korea. His main research topic is developing low-power and low-noise multi-channel bio-signal acquisition and neural prosthesis with neuromorphic processing system-on-chip (SoC). Also, he is interested in design for test (DFT) in high-bandwidth memory (HBM).



Rucheng Jiang (Graduate Student Member, IEEE) received the B.S. degree from Northwestern Polytechnical University, Xi'an, China, in 2014, and the M.S. degree in electronic science and technology from Zhejiang University, Hangzhou, China, in 2017. He is currently pursuing the Ph.D. degree with the Department of Electrical and Computer Engineering, National University of Singapore, Singapore.

From 2017 to 2020, he worked in industry on the design of high-performance operational amplifiers (OPAMPs) and voltage regulators. His research interests include energy-efficient high-speed and high-performance analog-to-digital converters (ADCs).



Jerald Yoo (Senior Member, IEEE) received the B.S., M.S., and Ph.D. degrees from the Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2002, 2007, and 2010, respectively.

From 2010 to 2016, he was with the Department of Electrical Engineering and Computer Science, Masdar Institute, Abu Dhabi, United Arab Emirates, where he was an Associate Professor. From 2010 to 2011, he was with the Microsystems Technology Laboratories (MTL), Massachusetts Institute of Technology (MIT), Cambridge, MA, USA, as a Visiting Scholar. Since 2017, he has been with the Department of Electrical and Computer Engineering, National University of Singapore, Singapore, where he is currently an Associate Professor. He has pioneered research on low-energy body area networks (BANs) for communication/powering and wearable body sensor networks using the planar-fashionable circuit board for a continuous health monitoring system. He has authored book chapters in *Biomedical CMOS ICs* (Springer, 2010), *Enabling the Internet of Things—From Circuits to Networks* (Springer, 2017), *The IoT Physical Layer* (Springer, 2019), and *Handbook of Biochips (Biphasic Current Stimulator for Retinal Prosthesis)* (Springer, 2021). His current research interests include low-energy circuit technology for wearable biosignal sensors, flexible circuit board platform, BAN communication and powering, application-specific integrated circuits (ASICs) for piezoelectric micromachined ultrasonic transducers (PMUTs), and system-on-chip (SoC) design to system realization for wearable healthcare applications.

Dr. Yoo is serving/served as a Technical Program Committee Member for the IEEE International Solid-State Circuits Conference (ISSCC), the Co-Chair for ISSCC Student Research Preview, and the Emerging Technologies and Applications Subcommittee Chair of IEEE Asian Solid-State Circuits Conference (A-SSCC) and IEEE Custom Integrated Circuits Conference (CICC). He is also an Analog Signal Processing Technical Committee Member of the IEEE Circuits and Systems Society. He was a recipient or a co-recipient of several awards, including the IEEE International Solid-State Circuits Conference (ISSCC) 2020 Demonstration Session Award (Certificate of Recognition), the IEEE International Symposium on Circuits and Systems (ISCAS) 2015 Best Paper Award (BioCAS Track), the ISCAS 2015 Runner-Up Best Student Paper Award, the Masdar Institute Best Research Award in 2015, and the IEEE Asian Solid-State Circuits Conference (A-SSCC) Outstanding Design Award in 2005. He was the Founding Vice-Chair of the IEEE SSCS United Arab Emirates (UAE) Chapter and is the Chair of the IEEE SSCS Singapore Chapter. He served as a Distinguished Lecturer for the IEEE Circuits and Systems Society (CASS) from 2019 to 2021 and the IEEE Solid-State Circuits Society (SSCS) from 2017 to 2018.



Jung-Hwan Choi (Member, IEEE) was born in Daegu, South Korea, in 1968. He received the B.S. degree from Kyungpook National University, Daegu, in 1990, and the M.S. and Ph.D. degrees from the Korea Advanced Institute of Science and Technology, Daejeon, South Korea, in 1992 and 1997, respectively, all in electrical engineering.

In 1997, he joined Samsung Electronics, Hwaseong, South Korea, where he has been involved in the design of Rambus, XDR DRAM, and high-speed I/O interfaces for memory applications. He is currently at Samsung Electronics, where he is responsible for the design of DRAM interfaces and the development of high-speed DRAM interfaces for the next generation, including LPDDRx and DDRx. His current research interests include the design of monolithic microwave integrated circuits (ICs), high-speed memory, and high-frequency measurement.