

Received August 4, 2021, accepted August 20, 2021, date of publication August 24, 2021, date of current version September 7, 2021. Digital Object Identifier 10.1109/ACCESS.2021.3107536

# A Differentiating Receiver With a Transition-Detecting DFE for Dual-Rank Mobile Memory Interface

### SUNGPHIL CHOI<sup>1</sup>, (Member, IEEE), YONG-UN JEONG<sup>1</sup>, (Member, IEEE), JOO-HYUNG CHAE<sup>2</sup>, (Member, IEEE), SHIN-HYUN JEONG<sup>1</sup>, AND SUHWAN KIM<sup>1</sup>, (Senior Member, IEEE)

<sup>1</sup>Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, South Korea

<sup>2</sup>Department of Electronics and Communications Engineering, Kwangwoon University, Seoul 01897, South Korea

Corresponding author: Suhwan Kim (suhwan@snu.ac.kr)

This work was supported in part by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Ministry of Science and ICT (MSIT) under Grant 2020-0-01300, in part by the Ministry of Trade, Industry and Energy (MOTIE) under Grant 10080570, and in part by Korea Semiconductor Research Consortium (KSRC) Support Program for the Development of the Future Semiconductor Device.

**ABSTRACT** The signal integrity of dual-rank LPDDR5 interface is challenged by reflections from the stub of the inactive rank within the redistribution layer (RDL). These reflections arrive at the active rank within the unit-interval of the data-rate and distort not only the post-cursor but also the pre-cursor of the signal that the decision feedback equalizer (DFE) cannot remove. We introduce a differentiating receiver that restores data through edge information to eliminate the reflective distortions, including the pre-cursor, and a transition-detecting DFE to address remaining inter-symbol interference. A prototype receiver, fabricated in a 65 nm CMOS process, eliminates pre-cursor distortion with a reflection flight time of 50 ps and 70 ps, at 8 Gb/s and 6.4 Gb/s, resulting in a wide horizontal margin at a bit-error-rate of  $10^{-12}$ . The energy efficiency is 0.80 pJ/bit at 8 Gb/s.

**INDEX TERMS** Center-pad, decision feedback equalizer (DFE), multi-rank, redistribution layer (RDL), pre-cursor.

#### I. INTRODUCTION

The emerging interest in applications such as the internet of things, autonomous vehicles, and artificial intelligence is driving the demand for memory that has higher data bandwidth and capacity. To meet these demands, the latest memory systems have higher per-pin data-rates and multi-rank configurations [1]-[3]. In particular, mobile memory is now commonly dual-rank, with two memory dies stacked in a single package [2], as shown in Fig. 1(a). The memory controller sends data to the DQ (data) pins of the package, which are connected to each memory die by wire bonding. Thus, dual-rank mobile memory has a point-to-two-point (P22P) interface instead of a point-to-point (P2P) interface.

The associate editor coordinating the review of this manuscript and approving it for publication was Cihun-Siyong Gong.

Unlike a P2P interface, a P22P interface suffers from signal integrity issues due to reflections caused by the inactive rank. Because low-power double data-rate 4 (LPDDR4) has an edge-pad structure in which the pads are on the side of the die, the stub consists of the short wire bonding alone [4], [5]. The wire bonding can be represented by an RLC lumped model [6], which has a delay of approximately 1.6 ps in simulation at 3.2 Gb/s, the maximum data-rate of LPDDR4. This delay is small compared to 1 unit-interval (UI) at the maximum data-rate (= 312.5 ps), and is therefore negligible. However, the LPDDR4 edge-pad structure needs repeaters to transmit signals to the other side of the die, at the cost of power and area. LPDDR5 has a center-pad structure [7], in which the transmitter and receiver are both located at the center of the die. This reduces the number of repeaters required and hence lowers power consumption. A redistribution layer (RDL) is needed to connect the center-pads to the edge-bonding metal [8]. The length of the RDL should



**FIGURE 1.** (a) Dual-rank LPDDR5 interface (b) simulated single-bit response and (c) eye diagram of the receiver input to DRAM1, showing the effect of the delay  $2(t_{wb}+t_{RDL})$  encountered by the reflection in the redistribution layer of DRAM2.

be half of the die size, and the RDL has a delay of about 30 ps at 0.5 mm length [6]. The problem with dual-rank LPDDR5 is that when one of the dies is operating, the wire bonding and the RDL of the other memory die appear as a stub whose time delay is not negligible, as shown in Fig. 1(a). If the times taken by the signal to traverse the wire bonding and the RDL are defined as $t_{wb}$ and $t_{RDL}$, respectively, then the stub generates a series of reflections that arrive at increments of $2(t_{wb} + t_{RDL})$ after the main signal, deteriorating signal integrity. The reflections from an RDL stub follow the main signal at much shorter intervals than those in other multi-drop interfaces [9]–[14]; they behave more like the reflections from a through-hole via stub of a PCB trace [15], [16]. The first reflection can actually return during a bit transition and cause reflective pre-cursor distortion, which leads to a narrower eye width. Thus, unlike in other memory interfaces, reflections in a dual-rank LPDDR5 interface cause not only post-cursor but also pre-cursor distortion. Fig. 1(b) and (c) show the simulated single-bit response and an eye diagram of the input to the receiver of the active rank in a dual-rank LPDDR5 interface.

A continuous-time linear equalizer (CTLE) and a feedforward equalizer (FFE) can remove pre-cursor inter-symbol interference (ISI). However, the CTLE is ineffective in a dual-rank LPDDR5 interface, since it can eliminate pre-cursor ISI caused by insertion loss but not reflective pre-cursor distortion [17]. The FFE, whether in the transmitter or the receiver, is also not suitable for LPDDR5. In the transmitter, a fractionally spaced FFE [18] whose tap delay is adjusted within 1 UI according to the channel length can be used to reduce the pre-cursor distortion in Fig. 1(b). However, this increases the power consumed in the clock path to generate the fractionally spaced delays. In the receiver, the FFE requires an analog delay element composed of large passive components or power-hungry active devices to delay the receiver input. As with the TX-FFE, it is difficult to match the analog delay to the reflection delay of $2t_{RDL}$, which varies with the channel length [19].

To address these issues, we propose a differentiating receiver with a transition-detecting decision feedback equalizer (TD-DFE) that can eliminate both reflective pre-cursor and post-cursor distortions. A differentiator and a hysteresis latch allow the value of a received data bit to be determined by the direction of the corresponding transition rather than by its voltage level, so that the reflections that cause pre-cursor and post-cursor distortions are effectively eliminated. In addition, the proposed TD-DFE achieves a wider eye by compensating for the remaining ISI that occurs when there is a previous bit transition.

The rest of this article is organized as follows. In Section II, we analyze the effects of a short stub on the channel. In Section III, we describe the concepts behind the proposed differentiating receiver and the TD-DFE. In Section IV, we show how the receiver was implemented. Experimental results are presented in Section V, and we draw conclusions in Section VI.

## II. ANALYSIS OF CHANNELS WITH REFLECTIVE PRE-CURSOR DISTORTION

Fig. 2(a) shows the frequency responses of four channels that mimic an LPDDR5 interface with stubs, simulated using the high frequency structure simulator (HFSS). The first reflection on channels A, B, C, and D arrives after a delay $2t_{stub}$ of 35 ps, 70 ps, 105 ps, and 140 ps, respectively. Each returning reflection creates notches in the frequency response, as shown in Fig. 2(a). The first notch frequency $f_{notch}$ is given by [17]

$$f_{\text{notch}} = \frac{c_0}{4l \cdot \sqrt{\varepsilon_r}} = \frac{1}{4 \cdot t_{stub}},\tag{1}$$

where  $c_0$  is the speed of light,  $\varepsilon_r$  is the relative permittivity, l is the length of the stub, and  $t_{stub}$  is the delay of the stub.

Using this equation, the first notch frequency of each channel is found to be 14 GHz, 7.24 GHz, 4.74 GHz, and 3.54 GHz, respectively. Existing multi-drop interfaces [10], [11] have long stubs and operate at a much higher frequency than $f_{\text{notch}}$, so the $2t_{\text{stub}}$ delay required for the first reflection to return is greater than 1 UI of the data-rate. In this case, post-cursor reflection becomes the main issue and can be removed by a DFE. Conversely, in a dual-rank LPDDR5 interface, where the first reflection arrives within 1 UI, the resulting notch frequency is greater than the operating frequency and reflective pre-cursor distortion occurs.
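As a quick check of (1), the following minimal sketch evaluates the ideal first notch frequency directly from the four round-trip delays $2t_{stub}$ listed above. The function and variable names are illustrative and not part of the original work. The ideal values differ slightly from the figures quoted in the text, presumably because the quoted figures reflect the simulated channels rather than the idealized delay.

```python
def notch_frequency_hz(round_trip_delay_s: float) -> float:
    """First notch frequency from Eq. (1): f_notch = 1 / (4 * t_stub).

    The argument is the round-trip delay 2 * t_stub, so f_notch = 1 / (2 * delay).
    """
    return 1.0 / (2.0 * round_trip_delay_s)


# Round-trip delays 2*t_stub of channels A to D, in picoseconds (from the text).
round_trip_delays_ps = {"A": 35.0, "B": 70.0, "C": 105.0, "D": 140.0}

notch_ghz = {
    channel: notch_frequency_hz(delay_ps * 1e-12) / 1e9
    for channel, delay_ps in round_trip_delays_ps.items()
}

for channel, f_ghz in notch_ghz.items():
    delay_ps = round_trip_delays_ps[channel]
    print(f"channel {channel}: 2*t_stub = {delay_ps:.0f} ps -> f_notch = {f_ghz:.2f} GHz")
```

Under this idealization the notch frequencies come out at roughly 14.3, 7.1, 4.8, and 3.6 GHz for channels A to D.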

When the channel length is physically shorter than approximately one-tenth of the wavelength of the operating frequency, an electrically short channel can be represented by a lumped model. Otherwise, the channel needs to be interpreted as a transmission line [20]. Fig. 2(b) is a graph of transmission delay versus operating frequency, in which this one-tenth condition is represented by the black curve. The colored points correspond to the four simulated channels at a range of likely LPDDR5 operating frequencies. This graph suggests that channel A can be represented by a lumped



FIGURE 2. (a) Simulated frequency responses of four LPDDR5 channels with stubs of different length. (b) Graph of transmission delay against operating frequency based on (a). The black curve shows delays which are one-tenth of the wavelength of that frequency. The colored points correspond to the four simulated channels at a range of plausible LPDDR5 operating frequencies. (c) Simulated eye diagrams for each stub and operating frequency.

model, whereas channels B–D require transmission-line models. The simulated eye diagrams produced by these channels are shown in Fig. 2(c). We see that channel A is not affected by reflections, even at 8 Gb/s, but the cases B3–B4, C2–C4, and D1–D4 exhibit signal integrity deterioration. The notch frequency of channel D is approximately 4 GHz, and when the operating frequency reaches this value, the eye closes completely.
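The one-tenth-wavelength rule can be restated in the delay domain: a length below $\lambda/10$ is equivalent to a propagation delay below one-tenth of the signal period. The sketch below applies this test to the four channels. It rests on two assumptions that the text does not confirm: that the relevant delay is the one-way stub delay $t_{stub}$ (half of the quoted round-trip values), and that the operating frequency is 4 GHz, the Nyquist frequency at 8 Gb/s.

```python
def is_electrically_short(delay_s: float, frequency_hz: float) -> bool:
    """Delay-domain form of 'length < wavelength / 10'.

    length < v / (10 * f)  is equivalent to  length / v < 1 / (10 * f).
    """
    return delay_s < 1.0 / (10.0 * frequency_hz)


# Assumed one-way stub delays t_stub in ps (half of the quoted 2*t_stub values).
one_way_delay_ps = {"A": 17.5, "B": 35.0, "C": 52.5, "D": 70.0}
operating_frequency_hz = 4e9  # assumption: Nyquist frequency at 8 Gb/s

classification = {
    channel: is_electrically_short(delay_ps * 1e-12, operating_frequency_hz)
    for channel, delay_ps in one_way_delay_ps.items()
}

for channel, short in classification.items():
    model = "lumped" if short else "transmission line"
    print(f"channel {channel}: {model}")
```

With these assumptions the threshold delay is 25 ps, so only channel A falls on the lumped side, which agrees with the reading of Fig. 2(b) given in the text.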

#### III. CONCEPTS OF DIFFERENTIATING RECEIVER WITH TD-DFE

#### A. DIFFERENTIATING RECEIVER

Differentiating receivers are used in AC-coupled links [21], [22], such as wireless interconnects, as shown in Fig. 3(a). Since the channel is AC-coupled, the non-return-to-zero (NRZ) signal from the transmitter is converted into positive and negative pulses at the input to the receiver, as shown

in Fig. 3(c). Each pulse corresponds to a rising or falling edge of the transmitter output, and a hysteresis latch can be used to recover the signal output by the transmitter. The threshold of the hysteresis latch alternates between $V_{th\_up}$ and $V_{th\_dn}$ depending on the previous data value. This arrangement means that the data value is retained until the next bit transition arrives, which renders any reflection negligible. The proposed receiver uses a similar scheme, in which the distorted receiver input is differentiated to produce pulses corresponding to each edge, from which the original signal can be recovered using a hysteresis latch, as shown in Fig. 3(b). The latch ignores spurious pulses generated by reflection, as shown in Fig. 3(d). Therefore, the output signal is free of pre-cursor and post-cursor distortion.
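The behavior described above can be illustrated with a small discrete-time sketch. This is a behavioral toy model, not the circuit used in the receiver: the first difference stands in for the differentiator, and the waveform, the reflection gain of 0.2, the delay of five samples, and the thresholds are all illustrative assumptions.

```python
def differentiate(samples):
    """First difference, a discrete stand-in for the differentiator."""
    return [b - a for a, b in zip(samples, samples[1:])]


def hysteresis_latch(pulses, v_th_up, v_th_dn, initial=0):
    """Hold the output until a pulse crosses the threshold selected by the state."""
    state, out = initial, []
    for v in pulses:
        if state == 0 and v > v_th_up:
            state = 1  # a strong positive pulse marks a rising edge
        elif state == 1 and v < v_th_dn:
            state = 0  # a strong negative pulse marks a falling edge
        out.append(state)
    return out


def add_reflection(main, gain, delay):
    """Add a delayed, attenuated copy of the main signal (hypothetical stub model)."""
    return [
        v + (gain * main[i - delay] if i >= delay else 0.0)
        for i, v in enumerate(main)
    ]


# A single '1' bit with two-sample edges, followed by its reflection.
main = [0.0, 0.0, 0.5, 1.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
received = add_reflection(main, gain=0.2, delay=5)

clean_bits = hysteresis_latch(differentiate(main), 0.3, -0.3)
recovered_bits = hysteresis_latch(differentiate(received), 0.3, -0.3)
weak_latch_bits = hysteresis_latch(differentiate(received), 0.05, -0.05)

print("clean     :", clean_bits)
print("recovered :", recovered_bits)
print("weak latch:", weak_latch_bits)
```

With thresholds of ±0.3 the output for the reflected waveform matches the reflection-free case, whereas thresholds of ±0.05 let the weaker reflection pulses toggle the latch. This illustrates why the hysteresis strength has to be chosen with the channel in mind.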

A simple transmission line model of a dual-rank LPDDR5 interface is shown in Fig. 4. In the write operation, as shown in Fig. 4(a), the input signal with a voltage of  $V_{in}$ 



FIGURE 3. (a) AC-coupled link [21], [22]. (b) An LPDDR5 interface channel and proposed receiver. Operation concepts of (c) the AC-coupled receiver and (d) the proposed differentiating receiver. The effects of the reflection from the stub connected to the inactive rank of memory are shown in red.

encounters an impedance discontinuity at junction B and continues to terminations C and D with a transmission coefficient of $T_1$. The signal transmitted to D is the main signal, which arrives with a voltage level of $T_1V_{in}$. The open stub at C generates a reflection with coefficient $\Gamma_3 (= 1)$. This reflection is transmitted onward to D with a transmission coefficient of $T_2$. After a delay of $2t_{RDL}$, the reflection is added to the main signal at D. The resulting single-bit response of a dual-rank LPDDR5 interface is shown on the right of Fig. 4(a). At $2t_{RDL}$ after the main signal has arrived, the voltage at D is $(T_1 + T_2T_1)V_{in}$, and the input signal has pre-cursor distortion, which can be seen as intra-symbol interference. Likewise, a reflection occurs in the read operation, as shown in Fig. 4(b). The input signal with a voltage of $V_{in}$ encounters an impedance discontinuity at junction B and continues to terminations A and C with a transmission coefficient of $T_3$. Since the RDL of the unused rank appears as a stub, a reflection with a delay of $2t_{RDL}$ causes pre-cursor distortion at the controller receiver input A.
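To give a feel for the magnitudes involved, the sketch below evaluates $T_1$, $T_2$, and the level $(T_1 + T_2T_1)V_{in}$ for an idealized junction. It assumes that all three branches meeting at B share the same characteristic impedance (50 Ω here) and that no further reflections return from D. Neither assumption is stated in the text, and real wire-bond and RDL impedances will differ, so the numbers are illustrative only.

```python
from fractions import Fraction


def junction_transmission(z0, branch_impedances):
    """Voltage transmission coefficient T = 1 + Gamma for a wave on a line of
    impedance z0 arriving at a junction loaded by the given branches in parallel."""
    z_load = 1 / sum(Fraction(1) / z for z in branch_impedances)
    gamma = (z_load - z0) / (z_load + z0)
    return 1 + gamma


Z0 = Fraction(50)  # assumed characteristic impedance of every branch

t1 = junction_transmission(Z0, [Z0, Z0])  # wave arriving at B from the channel
t2 = junction_transmission(Z0, [Z0, Z0])  # reflection arriving at B from the stub
gamma3 = 1                                # open stub end at C (from the text)

main_level = t1                              # T1 * V_in with V_in = 1
level_after_reflection = t1 + t1 * gamma3 * t2  # (T1 + T2*T1) * V_in

print(f"T1 = T2 = {t1}")
print(f"main signal level at D          = {main_level} * V_in")
print(f"level after the first reflection = {level_after_reflection} * V_in")
```

Under these assumptions $T_1 = T_2 = 2/3$, so the first reflection raises the level at D from $\tfrac{2}{3}V_{in}$ to $\tfrac{10}{9}V_{in}$.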

Since the reflection takes a longer route than the main signal due to the RDL stub, its edges will be significantly less pronounced than those of the main signal when it arrives at the receiver input. The combination of the differentiator and the hysteresis latch with threshold voltages $V_{th\_up}$ and $V_{th\_dn}$ exploits this difference, allowing the reflection to be eliminated. To filter reflections correctly, the strength of the hysteresis latch should be determined by considering the channel environment and operating conditions.

#### B. TRANSITION-DETECTING DECISION FEEDBACK EQUALIZER

DFEs are commonly used in memory interfaces to eliminate post-cursor ISI caused by channel loss and reflections in conventional multi-drop interfaces, such as DIMMs [11]–[14]. Such DFEs can recover the data correctly using several UI-spaced taps based on previously decided bits. However, in a dual-rank LPDDR5 interface, reflections have a short delay, so the first reflection affects the pre-cursor of the



**FIGURE 4.** Simple transmission-line model of the single-bit response of a dual-rank LPDDR5 interface. (a) Write operation; connections from the PCB through the redistribution layer, showing the main signal (blue), together with the reflection (in red) that occurs when the upper rank is active and the lower rank is inactive, and the waveform arriving at D, showing the effect of the reflection which arrives after a delay of 2t<sub>RDL</sub>, together with its derivative D'. (b) Read operation.

TABLE 1. Comparison of dual-rank LPDDR5 and multi-drop interfaces.

|                                 | Channel                  | Position of the first reflection | Equalization (CTLE, DFE, and FFE) |
|---------------------------------|--------------------------|----------------------------------|-----------------------------------|
| Dual-rank LPDDR5 interface      | Short reflective channel | Pre-cursor                       | Not effective                     |
| Multi-drop interfaces [11]–[14] | Long reflective channel  | Post-cursor                      | Effective                         |

signal. A DFE is therefore suitable for reflections with longer delays, but not for dual-rank LPDDR5 interfaces. The salient characteristics of dual-rank LPDDR5 and multi-drop interfaces are compared in Table 1.

Since the hysteresis latch recovers data from rising or falling edges, not from the data level, the reflective pre-cursor and post-cursor distortion is filtered out. However, residual ISI can occur when there is a previous bit transition. If $2t_{RDL}$ is much smaller than 1 UI, as shown in Fig. 5(a), then the falling edge of the signal is essentially unaffected by the reflection of the rising edge, and the eye at the output of the hysteresis latch has a width of 1 UI. However, as $2t_{RDL}$ approaches 1 UI, the differentiated rising edge of the reflection affects the differentiated falling edge of the main signal. When the high-to-low transition is recovered through $V_{th\_dn}$, the response time of the hysteresis latch increases by $t_{HYS}$. The output of the latch now has a response time that depends on the previous bit transition, and the eye width is reduced to $1\,\text{UI} - t_{HYS}$, as shown in Fig. 5(b). We introduce a transition detector (TD),



FIGURE 5. Effects of remaining ISI in the differentiating receiver. (a) When $2t_{RDL} \ll 1$ UI, the output of the hysteresis latch has an open eye. (b) As $2t_{RDL}$ approaches 1 UI, the eye opening decreases.

shown in Fig. 6, to achieve a wider eye in this situation. Since the output of the hysteresis latch changes depending on the previous bit transition, the TD categorizes this transition as rising, falling, or no transition. In response to rising and falling transitions, the TD produces the voltage offsets UP and DN applied to subsequent output from the hysteresis latch. This compensates for the expected delay in its operation and widens the eye.

#### **IV. ARCHITECTURE AND IMPLEMENTATION**

The receiver, shown in Fig. 7, consists of a CTLE, a differentiator, a pre-amplifier, a hysteresis latch, and a TD-DFE. The receiver has a quarter-rate architecture to reduce simultaneous switching noise (SSN) and relax the timing margin [23]. Differential clock signals are amplified in a CLK buffer before passing through an I/Q divider to produce 4-phase clocks [1]. The I/Q divider is followed by the 4-phase corrector, which is manually controlled to compensate for phase error. The data recovered by the receiver is demultiplexed into the four quarter-rate data signals D0, D90, D180, and D270. During our tests, D0 was monitored to measure the bit-error rate (BER).

#### A. CONTINUOUS-TIME LINEAR EQUALIZER AND DIFFERENTIATOR

To extend the receiver bandwidth, a CTLE is used with PMOS input devices for the ground termination, as shown in Fig. 8(a). Since memory interfaces use single-ended signaling, the CTLE receives a single-ended input, together with a reference voltage  $V_{\text{REF}}$  generated from REF Gen1. The CTLE has tunable boosting of 0.46 to 3.12 dB at 4 GHz, as shown in Fig. 8(b). Thanks to the CTLE's high-frequency boosting, edges of the data become steeper, which improves hysteresis latch operation. However, excessive boosting can also amplify reflections and noise. As mentioned in Section I, a CTLE can reduce pre-cursor ISI caused by insertion loss, but reflective pre-cursor distortion remains, as shown in Fig. 8(b) and (c). The red line in Fig. 8(b) shows a combined frequency response of the channel and the CTLE.



FIGURE 6. The transition detector (TD) produces the voltage offsets UP and DN in response to the slower operation of the hysteresis latch caused by a bit transition.

The frequency component that has disappeared due to the frequency notches cannot be fully restored by the CTLE.

The CTLE is followed by a passive RC high-pass filter, which differentiates the signal. The values of R and C are 1.3 k$\Omega$ and 30 fF, respectively. The corner frequency of the RC high-pass filter has a maximum of approximately 4 GHz. Fig. 8(d) shows the frequency response of the CTLE and the differentiator. Since the output signal $V_{\text{DIFF}}$ is proportional to the time constant RC, process variations may alter the time constant and hence $V_{\text{DIFF}}$. However, the hysteresis latch recovers data based on edge information rather than signal level, and the following pre-amplifier, which is a current-mode logic buffer, amplifies the output signal sufficiently to ensure that the hysteresis latch operates correctly. The common-mode voltage of the pre-amplifier is generated by REF Gen2.
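The corner frequency quoted above follows from the standard first-order relation $f_c = 1/(2\pi RC)$. The short sketch below evaluates it for the stated component values, together with the ideal high-pass magnitude response. It assumes an ideal first-order RC network with no loading or parasitics.

```python
import math

R_OHM = 1.3e3      # resistor value from the text
C_FARAD = 30e-15   # capacitor value from the text

tau_s = R_OHM * C_FARAD                      # time constant RC
corner_hz = 1.0 / (2.0 * math.pi * tau_s)    # -3 dB corner of the high-pass filter


def highpass_gain(frequency_hz: float) -> float:
    """Magnitude response of an ideal first-order RC high-pass filter."""
    ratio = frequency_hz / corner_hz
    return ratio / math.sqrt(1.0 + ratio * ratio)


print(f"RC time constant : {tau_s * 1e12:.1f} ps")
print(f"corner frequency : {corner_hz / 1e9:.2f} GHz")
print(f"gain at 400 MHz  : {highpass_gain(400e6):.3f}")
```

The time constant is 39 ps and the corner lies at about 4.08 GHz. Well below the corner the gain is proportional to frequency, which is the regime in which the filter acts as a differentiator with an output proportional to RC.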

#### **B. HYSTERESIS LATCH**

Fig. 9(a) shows the circuit diagram of the hysteresis latch, in which positive feedback within a cross-coupled NMOS pair produces two stable operating points. The amount of hysteresis can be varied by adjusting the tail current $I_{\rm HYS}$ of the cross-coupled NMOS pair, which is set by a 4-bit control signal. Fig. 9(b) shows the range of hysteresis $(= V_{HYS})$, which is half of the difference between $V_{th\_up}$ and $V_{th\_dn}$. The hysteresis latch has a maximum $V_{\text{HYS}}$ of 138 mV, and the input swing must be larger than $V_{\rm HYS}$ for the latch to operate. The output of the hysteresis latch, with the reflective pre-cursor distortion eliminated, is input to the sampler of the TD-DFE.

Since the proposed scheme restores data through edge information, which is a high-frequency component, it can be susceptible to jitter and voltage noise such as random jitter, deterministic jitter, crosstalk, and SSN. Fig. 10(a) shows the circuit-level simulation results of the hysteresis latch to show the effect of voltage noise and timing jitter on the proposed scheme. The simulation is performed at 6.4 Gb/s using a 300 mV input signal with various jitter and noise sources.



FIGURE 7. Block diagram of the proposed receiver with stub channel.



**FIGURE 8.** CTLE: (a) circuit diagram; (b) simulated frequency response, showing the tunable high-frequency boosting which peaks at 4 GHz (double arrow); (c) 8 Gb/s CTLE input and output eye diagrams at different levels of boosting, showing that the effect of reflective pre-cursor distortion remains; and (d) simulated frequency response of the CTLE and the differentiator.

Timing jitter is categorized into deterministic jitter (DJ) and random jitter (RJ). DJ is correlated to signal reflections, power supply noise, or limited



**FIGURE 9.** Hysteresis latch: (a) circuit diagram; (b) simulation results of hysteresis strength ( $=V_{HYS}$ ).

bandwidth, and RJ is the timing fluctuation caused by random voltage noise [24]. Reflections caused by the RDL stub in an LPDDR5 interface affect bit transitions and constitute most of the DJ. SSN is also included in the DJ, and is modeled as the sum of two sine waves with a 50 mV peak-to-peak voltage swing [25]. While the DJ of the receiver input $DJ_{RX,p-p}$ is 101 ps, the DJ of the hysteresis latch output $DJ_{HYS,p-p}$ is reduced to 46.2 ps, although the SSN contribution remains. RJ is added as a Gaussian-distributed zero-crossing deviation of the receiver input. While the RJ of the receiver input $RJ_{RX,RMS}$ is 10.8 ps, the RJ of the hysteresis latch output $RJ_{HYS,RMS}$ is 11.1 ps. Crosstalk noise is classified into near-end crosstalk (NEXT) and far-end crosstalk (FEXT). Unlike NEXT, FEXT results in crosstalk-induced jitter (CIJ)



**FIGURE 10.** (a) Jitter histograms of receiver input (top) and hysteresis latch output (bottom); (b) jitter transfer graph of the proposed receiver.

and degrades receiver performance. Crosstalk is modeled using two adjacent channels. Despite 121 mV peak-to-peak FEXT voltage noise, our receiver operates normally owing to the hysteresis latch, and only a CIJ of 41.7 ps remains. As can be seen from the jitter histograms, the hysteresis latch does not increase voltage noise and timing jitter, but rather significantly reduces DJ, the dominant jitter due to reflection. Fig. 10(b) shows the simulated jitter transfer graph of the proposed receiver buffer, including a CTLE, a differentiator, a pre-amplifier, and a hysteresis latch. The simulation is performed at 8 Gb/s using a 300 mV input signal with jitter added at various frequencies. The dotted line shows the jitter transfer graph of the CTLE output, and the solid line shows that of the hysteresis latch output. The high-frequency jitter is amplified by the CTLE and pre-amplifier, but the low-frequency jitter is attenuated by the high-pass characteristic of the differentiator. To minimize high-frequency jitter and noise, we add sufficient decoupling capacitors in close proximity to the circuit. Efforts are also required to minimize crosstalk noise at the board or package level.

#### C. TRANSITION-DETECTING DECISION FEEDBACK EQUALIZER

The quarter-rate TD-DFE consists of four units,  $DFE_0$ ,  $DFE_{90}$ ,  $DFE_{180}$ , and  $DFE_{270}$ , each of which contains a summer merged sampler, TD, and SR latch. The output of the hysteresis latch goes to all four units, which sample the data using the quarter-rate clock. The following SR latch



**FIGURE 11.** Circuit diagram of the summer merged sampler in the TD-DFE. The coefficients of the UP and DN operation of the TD-DFE are denoted as  $C_P$  and  $C_N$ , respectively.



FIGURE 12. (a) Histogram of the input offset voltage for $V_{CM} = 0.6$ V; (b) simulated delay versus $\Delta V_{IN}$-$V_{OS}$.

converts the signal from return-to-zero (RZ) to non-return-to-zero (NRZ) format. Lastly, the TD determines from the two preceding output values whether a bit transition has taken place and, if so, its direction.

Fig. 11 shows the circuit diagram of the summer merged sampler, which is built around a StrongArm latch. The StrongArm latch is widely used owing to its fast decision and low static power consumption. In the reset phase, CLK goes low, and the output signals OUT<sub>P</sub> and OUT<sub>N</sub> are pre-charged to the supply voltage. When the clock signal CLK goes high, the sampler amplifies the input and regenerates it. The performance of the summer merged sampler in terms of noise and delay is shown in Fig. 12. Fig. 12(a) shows the histogram of the input offset voltage, obtained from Monte Carlo simulations with 1000 samples. For the designed $V_{\rm CM}$ of 0.6 V, the 1-$\sigma$ value is 5.12 mV. The simulated delay of the sampler is shown in Fig. 12(b), and the total input-referred noise is 1.18 mV<sub>rms</sub>. The TD outputs $H_P$ and $H_N$ identify the previous bit transition. The tap weights C<sub>P</sub> and C<sub>N</sub> are analog values, which are set by an external 5-bit digital code through a DAC. In memory interfaces, the adaptation is performed by the memory controller owing to the limited area [26]. During initialization and periodic re-training between the memory and its controller, the controller can recalculate and update the tap weights of the TD-DFE.

| D <sub>P</sub> [n-1]                  | D <sub>P</sub> [n-2] | D <sub>N</sub> [n-2] | D <sub>P</sub> [n-1] | D <sub>N</sub> [n-1] | H <sub>e</sub> [n] | H <sub>N</sub> [n] |
|---------------------------------------|----------------------|----------------------|----------------------|----------------------|--------------------|--------------------|
| $D_{N[n-2]} \longrightarrow H_{P[n]}$ | 0                    | 1                    | 1                    | 0                    | 0                  | 1                  |
| 2                                     | 1                    | 0                    | 0                    | 1                    | 1                  | 0                  |
| $D_P[n-2] \longrightarrow H_N[n]$     | 0                    | 1                    | 0                    | 1                    | 0                  | 0                  |
| $D_{N[n-1]}$                          | 1                    | 0                    | 1                    | 0                    | 0                  | 0                  |
| (a)                                   |                      |                      | (b                   | )                    |                    |                    |

FIGURE 13. (a) Block diagram of a TD, and (b) its truth table, which shows how the positive and negative output values are determined based on each of its preceding two bits.



FIGURE 14. Timing diagram of the proposed TD-DFE.

Fig. 13(a) is a block diagram of a TD, which consists of two NOR gates. The TD receives the two previous data values D[n-2] and D[n-1], and produces the outputs $H_P[n]$ and $H_N[n]$ with the values shown in the truth table of Fig. 13(b). The values of $(H_P[n], H_N[n])$ for a rising transition, a falling transition, and no transition are (0, 1), (1, 0), and (0, 0), respectively. If there is no transition, the DFE is turned off because there will be no post-cursor ISI. Because the hysteresis latch converts voltage-domain ISI into timing jitter, which leaves an insufficient setup margin, the TD output compensates for $t_{\rm HYS}$ by applying a voltage offset to the summer merged sampler input. For example, in the DN operation, the values of $H_P$ and $H_N$ are (0, 1). The summer merged sampler therefore has a negative voltage offset with a coefficient of C<sub>N</sub>, which improves its setup margin by reducing the jitter of the hysteresis latch output.
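The TD logic can be written out as a short behavioral sketch. The NOR wiring used here is inferred from the truth table of Fig. 13(b) and should be read as one consistent interpretation rather than a netlist. The function names are hypothetical, and the positive sign given to the UP offset is an assumption: only the negative sign of the DN offset is stated in the text.

```python
def nor(a: int, b: int) -> int:
    return int(not (a or b))


def transition_detector(d_prev2: int, d_prev1: int):
    """Return (H_P, H_N) from the two previous decisions D[n-2] and D[n-1]."""
    dp2, dn2 = d_prev2, 1 - d_prev2  # complementary pair D_P[n-2], D_N[n-2]
    dp1, dn1 = d_prev1, 1 - d_prev1  # complementary pair D_P[n-1], D_N[n-1]
    h_p = nor(dp1, dn2)              # wiring inferred from the truth table
    h_n = nor(dp2, dn1)
    return h_p, h_n


def sampler_offset(h_p: int, h_n: int, c_p: float, c_n: float) -> float:
    """Offset applied at the sampler input: DN is negative (per the text),
    UP is assumed positive, and no transition turns the DFE off."""
    if h_n:
        return -c_n
    if h_p:
        return +c_p
    return 0.0


truth_table = {
    (d2, d1): transition_detector(d2, d1) for d2 in (0, 1) for d1 in (0, 1)
}
for (d2, d1), (h_p, h_n) in truth_table.items():
    print(f"D[n-2]={d2} D[n-1]={d1} -> H_P={h_p} H_N={h_n}")
```

A rising previous transition (0 then 1) yields (0, 1) and hence a DN offset, a falling one yields (1, 0), and repeated bits yield (0, 0), matching the four rows of the truth table.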

Fig. 14 shows an example timing diagram of the TD-DFE. When the TDs in the DFE units detect that a bit transition has occurred, the summer merged sampler applies a voltage offset that compensates for the delay caused by the ISI. The red lines of the sampler input highlight the delay added due to a previous bit transition. In this example, a preceding low-to-high transition causes a DN offset to be applied at the input of the summer merged sampler, correcting the delayed falling edge of the bit. The outputs of TD270 become H<sub>P</sub>270 = 0 and H<sub>N</sub>270 = 1, producing a DN offset at the summer merged sampler as highlighted by the green line. Fig. 14 also shows how a delayed rising edge resulting from a previous



FIGURE 15. Block diagram of the clock path.

bit transition causes an UP offset to be applied at the input of the summer merged sampler, correcting the rising edge of the following bit. When there is no preceding bit transition, there is no ISI, and the DFE is turned off (the gray boxes in Fig. 14). In the post-layout simulation, the feedback loop delay of our TD-DFE, which limits the maximum data-rate, is 72.44 ps and 107.33 ps in the typical and slow process corners, respectively, with a supply voltage of 1.0 V.
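The loop delays above can be compared with the unit interval to see how much timing slack remains. The sketch below assumes that the feedback must settle within 1 UI, which is the usual constraint for a first-tap DFE but is not stated explicitly in the text, and it ignores any additional setup-time requirement. The resulting data-rate bounds are therefore rough estimates, not figures from the paper.

```python
def unit_interval_ps(data_rate_gbps: float) -> float:
    """Unit interval in picoseconds for a data-rate in Gb/s."""
    return 1000.0 / data_rate_gbps


# Post-layout feedback loop delays from the text, in picoseconds.
loop_delay_ps = {"typical": 72.44, "slow": 107.33}

TARGET_RATE_GBPS = 8.0
ui_ps = unit_interval_ps(TARGET_RATE_GBPS)

# Assumption: the loop must close within one UI.
max_rate_gbps = {corner: 1000.0 / delay for corner, delay in loop_delay_ps.items()}
slack_ps = {corner: ui_ps - delay for corner, delay in loop_delay_ps.items()}

for corner in loop_delay_ps:
    print(
        f"{corner:>7}: loop delay {loop_delay_ps[corner]:.2f} ps, "
        f"slack at {TARGET_RATE_GBPS:.0f} Gb/s {slack_ps[corner]:.2f} ps, "
        f"estimated limit {max_rate_gbps[corner]:.2f} Gb/s"
    )
```

Under this assumption both corners leave positive slack at 8 Gb/s (UI = 125 ps), with the slow corner being the tighter of the two.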

The corrections made to the waveform in response to a preceding bit transition are designed on the basis that the corrected bit is followed by a further bit transition. If there is no subsequent transition, the input to the sampler remains constant, and there is no ISI. This makes the offset applied at the input to the sampler redundant; that redundant offset has no effect on the output of the hysteresis latch because of the threshold voltages between which it operates. In our receiver, transition detection is performed by a hysteresis latch, and bit decision is performed by a sampler of the TD-DFE. Bit error cannot be reduced through the TD-DFE, but the horizontal margin can be widened.

#### D. CLOCK PATH

Fig. 15 shows a block diagram of the clock path [1]. In this prototype chip, the differentiator and hysteresis latch are implemented only in the data path, and not in the clock path. If a complete source-synchronous system were implemented, the data and clock paths should have a similar structure so that they have an identical jitter profile [27]. The implemented clock path contains an input buffer, a current-mode-logic (CML)-to-CMOS converter, and a quadrature clock generator (I/Q divider). The differential clock signals CLK<sub>P</sub> and CLK<sub>N</sub> are amplified using a 2-stage CML amplifier with a negative capacitance circuit to increase its bandwidth. The clock signals are converted to CMOS voltage levels through AC-coupled inverters with resistive feedback, which improves duty cycle performance [28]. The residual duty cycle distortion appears as 4-phase skew after the I/Q divider, which can be compensated through a 4-phase corrector.

To prevent performance degradation of the TD-DFE, which has a quarter-rate structure, 4-phase skew is removed as much as possible through layout matching and then corrected using a 4-phase corrector, which has manually controlled 3-bit binary-weighted MOSCAP arrays. The resolution of this array is 1 ps. We detect 4-phase skew through post-layout simulation, and the residual quadrature mismatch after correction for CLK<sub>0</sub>-CLK<sub>90</sub>, CLK<sub>90</sub>-CLK<sub>180</sub>, CLK<sub>180</sub>-CLK<sub>270</sub>, and CLK<sub>270</sub>-CLK<sub>0</sub> is 0.33°, 0.35°, -0.31°, and -0.33°, respectively.



**FIGURE 16.** (a) Measurement setup, and (b) die micrograph, showing the location of the proposed and conventional receivers.

#### **V. MEASUREMENTS**

Fig. 16 shows our measurement setup and a micrograph of the prototype chip, which was fabricated in 65 nm CMOS technology with a supply voltage of 1.0 V. For testing, the signal quality analyzer (Anritsu MP1800A) generates the data input signal $D_{in}$ and the differential clock signals $CLK_P$ and $CLK_N$. The data enters the receiver through the FR4 PCB channel together with reflective pre-cursor distortion caused by the stub. The demultiplexed data D0 from the receiver is acquired by the signal quality analyzer, and its error detector determines the BER.

To imitate the dual-rank memory interface, we used FR4 PCB channel boards with short stubs. Fig. 17(a) shows the geometry of the microstrip line, which has a channel length of 15 mm and stub lengths of 9 and 12 mm. The measured channel frequency response is shown in Fig. 17(b). Operation of the receiver was verified at 6.4 Gb/s with a 12 mm stub and at 8 Gb/s with a 9 mm stub. With the 9 mm stub, the first notch frequency is 4.45 GHz, from which it can be inferred using (1) that the delay $2t_{stub}$ experienced by the reflection is 112.4 ps. With the 12 mm stub, the first notch frequency is 3.35 GHz, and the inferred delay is 149.3 ps. The delay of the reflections produced by both of these stubs is sufficiently short to cause pre-cursor distortion. When measuring the prototype receiver, we set the hysteresis latch strength considering the channel environment to prevent incorrect bit decision.

**FIGURE 17.** (a) Geometry of the microstrip line with a 9 or 12 mm stub on an FR4 PCB. (b) Measured frequency response of the microstrip line.

**FIGURE 18.** Eye diagrams of an input signal with pre-cursor distortion, and bathtub curves for a conventional receiver and for our differentiating receiver, without and with the TD-DFE: (a) at 6.4 Gb/s with a 12 mm stub, and (b) at 8 Gb/s with a 9 mm stub.

To verify the effectiveness of the proposed receiver, we compared its performance with that of a conventional receiver consisting of a CTLE, a pre-amplifier, and quarter-rate samplers.
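The reflection delays inferred from the measured notch frequencies can be checked numerically. Equation (1) is not reproduced in this section, so the sketch below assumes the usual open-stub relation in which the first notch occurs where the reflection lags the main signal by half a period, that is, $2t_{stub} = 1/(2f_{notch})$; this reproduces the delays quoted above. The comparison with the unit interval, and the helper names, are additions for illustration.

```python
def round_trip_delay_ps(f_notch_ghz):
    """Round-trip stub delay 2*t_stub in ps.

    Assumes the first notch sits where the reflection lags the main signal
    by half a period: f_notch = 1 / (2 * 2*t_stub).
    """
    return 1e3 / (2.0 * f_notch_ghz)


def unit_interval_ps(rate_gbps):
    """Unit interval in ps for a data rate in Gb/s."""
    return 1e3 / rate_gbps


# (stub length in mm, first notch in GHz, data rate in Gb/s) from the text
cases = [(9, 4.45, 8.0), (12, 3.35, 6.4)]
for stub_mm, f_notch, rate in cases:
    delay = round_trip_delay_ps(f_notch)
    ui = unit_interval_ps(rate)
    print(f"{stub_mm} mm stub: 2*t_stub = {delay:.1f} ps, "
          f"UI = {ui:.2f} ps, delay/UI = {delay / ui:.2f}")
```

In both measured configurations the computed delay is shorter than one unit interval, which is consistent with the reflection overlapping the pre-cursor of the following bit.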

Fig. 18 shows measured eye diagrams for a PRBS7 pattern and the BER bathtub curves at 6.4 Gb/s and 8 Gb/s with 3 dB of CTLE gain boosting. Both PCB stubs led to pre-cursor distortion due to reflection. The measured eye width of the input signal with a 12 mm stub at 6.4 Gb/s was 36.7 ps (=0.235 UI), and with a 9 mm stub at 8 Gb/s, it was 39.2 ps (=0.313 UI). To measure the BER bathtub curves, we manually adjusted the clock-data delay with a signal quality analyzer.

**TABLE 2.** Performance summary and comparison with other multi-drop memory interfaces.

|                                            | ISSCC 2011<br>[9]                                  | JSSC 2015<br>[10]                                | JSSC 2017<br>[11]                                  | JSSC 2011<br>[12]                                  | TCAS-I 2009<br>[14]                                | This work                                          |
|--------------------------------------------|----------------------------------------------------|--------------------------------------------------|----------------------------------------------------|----------------------------------------------------|----------------------------------------------------|----------------------------------------------------|
| Technology [nm]                            | 130                                                | 40                                               | 45                                                 | 180                                                | 250                                                | 65                                                 |
| Supply [V]                                 | 1.2                                                | 0.9                                              | 1.1                                                | 1.8                                                | 2.5                                                | 1.0                                                |
| Data-rate [Gb/s/pin]                       | 4.8                                                | 3.75                                             | 2.9                                                | 3.8                                                | 2                                                  | 8                                                  |
| Channel                                    | 8-drop<br>0.43 inch<br>FR4 board<br>(Single-ended) | 4-drop<br>12 inch<br>FR4 board<br>(Differential) | 4-drop<br>4.72 inch<br>FR4 board<br>(Differential) | 2-drop<br>2.59 inch<br>FR4 board<br>(Single-ended) | 4-drop<br>7.54 inch<br>FR4 board<br>(Single-ended) | 2-drop<br>0.59 inch<br>FR4 board<br>(Single-ended) |
| The first reflection position              | Post-cursor                                        | Post-cursor                                      | Post-cursor                                        | Post-cursor                                        | Post-cursor                                        | Pre-cursor                                         |
| Equalization technique                     | IMBM <sup>c</sup> + CTLE                           | Multi-tone                                       | DFE                                                | DFE                                                | DFE                                                | Differentiator<br>+ TD-DFE                         |
| Timing margin [UI]<br>@ BER                | 0.73<br>@10 <sup>-10</sup> (PRBS7)                 | 0.43<br>@10 <sup>-12</sup> (PRBS15)              | 0.36<br>@10 <sup>-10</sup> (PRBS7)                 | 0.31<br>@10 <sup>-12</sup> (PRBS31)                | 0.35<br>@10 <sup>-12</sup> (PRBS7)                 | 0.464<br>@10 <sup>-12</sup> (PRBS7)                |
| <sup>a</sup> Energy efficiency<br>[pJ/bit] | 13.69                                              | 0.48                                             | 2.45                                               | 21.25                                              | 5                                                  | 0.80                                               |
| <sup>b</sup> Area [mm<sup>2</sup>]         | N/A                                                | 0.01                                             | 0.087                                              | 0.264                                              | 0.026                                              | 0.01                                               |



**FIGURE 19.** Measured shmoo plots: (a) 6.4 Gb/s conventional receiver, (b) 6.4 Gb/s differentiating receiver with the TD-DFE, (c) 8 Gb/s conventional receiver, and (d) 8 Gb/s differentiating receiver with the TD-DFE.

In a real system, a memory interface has training sequences to align the clock to the center of data [26]. With the proposed receiver, the horizontal opening of the eye is improved to 0.360 UI at 6.4 Gb/s with a 12 mm stub. When the TD-DFE is switched off, our receiver functions only as a differentiating receiver, and the eye width drops to 0.320 UI. The conventional receiver, which shares the chip with our receiver, produces a narrower eye with a width of 0.236 UI. At 8 Gb/s with a 9 mm stub, the corresponding eye widths are 0.464 UI, 0.440 UI, and 0.320 UI. Fig. 19 shows the measured shmoo plots of the receivers with a PRBS7 pattern. At both 6.4 Gb/s and 8 Gb/s, the proposed receiver obtains larger vertical margins than a conventional receiver.
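The eye widths above can be restated in picoseconds and as gains over the conventional receiver. The following sketch reads 0.360 UI as the eye width of the full receiver at 6.4 Gb/s, in line with the triple of widths given for 8 Gb/s; the dictionary keys and function names are hypothetical labels, not terms from the measurement setup.

```python
# Eye widths in UI taken from the text, keyed by data rate in Gb/s.
EYE_WIDTH_UI = {
    6.4: {"conventional": 0.236, "differentiating": 0.320, "with_td_dfe": 0.360},
    8.0: {"conventional": 0.320, "differentiating": 0.440, "with_td_dfe": 0.464},
}


def ui_to_ps(width_ui, rate_gbps):
    """Convert an eye width from unit intervals to picoseconds."""
    return width_ui * 1e3 / rate_gbps


def summarize(rate_gbps):
    """Return {receiver: (width in ps, gain in UI over the conventional receiver)}."""
    widths = EYE_WIDTH_UI[rate_gbps]
    base = widths["conventional"]
    return {name: (ui_to_ps(w, rate_gbps), w - base) for name, w in widths.items()}


for rate in EYE_WIDTH_UI:
    for name, (width_ps, gain_ui) in summarize(rate).items():
        print(f"{rate} Gb/s {name}: {width_ps:.1f} ps, gain {gain_ui:+.3f} UI")
```

With these numbers, the gain of the full receiver over the conventional one is 0.124 UI at 6.4 Gb/s and 0.144 UI at 8 Gb/s.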

<sup>a, b</sup> Receiver only. <sup>c</sup> Impedance-matched bidirectional multi-drop.

**FIGURE 20.** Power breakdown for the proposed receiver.

At 8 Gb/s, our receiver draws 6.42 mW from a 1.0 V supply. Fig. 20 gives the power breakdown. The pre-amplifier, which amplifies the differentiated signal, consumes 35.1% of the total power. The hysteresis latch, the CTLE, and the TD-DFE respectively consume 33.6%, 16.0%, and 15.3%.
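The power figures can be cross-checked against the reported energy efficiency. The short sketch below uses only the numbers quoted in the text; the per-block values in milliwatts are derived here from the percentages and are not separately reported measurements.

```python
TOTAL_POWER_MW = 6.42   # receiver power at 8 Gb/s, from the text
DATA_RATE_GBPS = 8.0

# Percentage of total power per block, from the text.
SHARE_PERCENT = {
    "pre-amplifier": 35.1,
    "hysteresis latch": 33.6,
    "CTLE": 16.0,
    "TD-DFE": 15.3,
}


def energy_per_bit_pj(power_mw, rate_gbps):
    """Energy per bit in pJ; mW divided by Gb/s gives pJ/bit directly."""
    return power_mw / rate_gbps


def block_power_mw(total_mw, shares):
    """Split the total power according to the percentage shares."""
    return {name: total_mw * pct / 100.0 for name, pct in shares.items()}


print(f"energy efficiency: {energy_per_bit_pj(TOTAL_POWER_MW, DATA_RATE_GBPS):.2f} pJ/bit")
for name, mw in block_power_mw(TOTAL_POWER_MW, SHARE_PERCENT).items():
    print(f"{name}: {mw:.2f} mW")
```

The four shares sum to 100%, and 6.42 mW at 8 Gb/s corresponds to the 0.80 pJ/bit listed in Table 2.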

Table 2 summarizes the characteristics and performance of our receiver and compares them with other multi-drop memory interfaces.

#### VI. CONCLUSION

We have presented a novel differentiating receiver with a transition-detecting decision feedback equalizer for dual-rank mobile memory interfaces. In a dual-rank memory, the temporarily inactive transmission path from the central pad to the inactive rank causes reflections that arrive at the active rank a sufficiently short time after the main signal to cause pre-cursor distortion. The differentiating receiver eliminates reflective distortion in the pre-cursor as well as the post-cursor, and the subsequent equalizer achieves a wider horizontal opening of the eye. A prototype chip fabricated in a 65 nm CMOS process achieves a lower BER than the conventional receiver under two operating conditions in which pre-cursor distortion occurs. The energy efficiency of our receiver is 0.80 pJ/bit at 8 Gb/s.

#### REFERENCES

- [1] J.-H. Chae, M. Kim, S. Choi, and S. Kim, "A 10.4-Gb/s 1-tap decision feedback equalizer with different pull-up and pull-down tap weights for asymmetric memory interfaces," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 67, no. 2, pp. 220–224, Feb. 2020.

- [2] H. Lee, J. Hwang, and H. Lee, "Optimizing ODT condition and Driver's turn-on resistance to achieve SI of LPDDR dual rank configuration," in *Proc. IEEE 21st Electron. Packag. Technol. Conf. (EPTC)*, Singapore, Dec. 2019, pp. 608–612.
- [3] B. Koo, J. Choi, K. Chae, and J. Kim, "Enabling 6.4 Gbps/pin LPDDR5 interface using bandwidth improvement techniques," presented at DesignCon, Santa Clara, CA, USA, Jan. 2019.
- [4] (2014). JEDEC Solid State Technology Association: LPDDR4. [Online]. Available: http://www.jedec.org/standards-documents/docs/jesd209-4
- [5] T.-Y. Oh, H. Chung, J.-Y. Park, K.-W. Lee, S. Oh, S.-Y. Doo, H.-J. Kim, C. Lee, H.-R. Kim, J.-H. Lee, and J.-I. Lee, "A 3.2 Gbps/pin 8 Gbit 1.0 V LPDDR4 SDRAM with integrated ECC engine for sub-1 V DRAM core operation," *IEEE J. Solid-State Circuits*, vol. 50, no. 1, pp. 178–190, Jan. 2015.
- [6] K.-B. Wu, T.-Y. Kuo, C.-C. Hung, B. Lin, I.-H. Peng, M.-T. Yang, and R.-B. Wu, "Novel RDL design of wafer-level packaging for signal/power integrity in LPDDR4 application," *IEEE Trans. Compon., Packag., Manuf. Technol.*, vol. 8, no. 8, pp. 1431–1439, Aug. 2018.
- [7] K.-S. Ha, S. Lee, Y.-S. Park, H.-J. Kwon, T.-Y. Oh, Y.-S. Sohn, S.-J. Bae, K.-I. Park, J.-B. Lee, C.-K. Lee, D. Lee, D. Moon, H.-R. Hwang, D. Park, Y.-H. Kim, Y. H. Son, and B. Na, "A 7.5 Gb/s/pin 8-Gb LPDDR5 SDRAM with various high-speed and low-power techniques," *IEEE J. Solid-State Circuits*, vol. 55, no. 1, pp. 157–166, Jan. 2020.
- [8] J.-W. Fang and Y.-W. Chang, "Area-I/O flip-chip routing for chip-package co-design considering signal skews," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 29, no. 5, pp. 711–721, May 2010.
- [9] W.-Y. Shin, G.-M. Hong, H. Lee, J.-D. Han, S. Kim, K.-S. Park, D.-H. Lim, J.-H. Chun, D.-K. Jeong, and S. Kim, "A 4.8 Gb/s impedance-matched bidirectional multi-drop transceiver for high-capacity memory interface," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2011, pp. 494–496.
- [10] K. Gharibdoust, A. Tajalli, and Y. Leblebici, "Hybrid NRZ/multi-tone serial data transceiver for multi-drop memory interfaces," *IEEE J. Solid-State Circuits*, vol. 50, no. 12, pp. 3133–3144, Oct. 2015.
- [11] H.-W. Lim, S.-W. Choi, J.-K. Ahn, W.-K. Min, S.-K. Lee, C.-H. Baek, J.-Y. Lee, G.-C. Hwang, Y.-H. Jun, and B.-S. Kong, "A 5.8-Gb/s adaptive integrating duobinary DFE receiver for multi-drop memory interface," *IEEE J. Solid-State Circuits*, vol. 52, no. 6, pp. 1563–1575, Jun. 2017.
- [12] H.-J. Chi, J.-S. Lee, S.-H. Jeon, S.-J. Bae, Y.-S. Sohn, J.-Y. Sim, and H.-J. Park, "A single-loop SS-LMS algorithm with single-ended integrating DFE receiver for multi-drop DRAM interface," *IEEE J. Solid-State Circuits*, vol. 46, no. 9, pp. 2053–2063, Sep. 2011.
- [13] H. Fredriksson and C. Svensson, "2.6 Gb/s over a four-drop bus using an adaptive 12-tap DFE," in *Proc. 34th Eur. Solid-State Circuits Conf.*, Sep. 2008, pp. 470–473.
- [14] S.-J. Bae, H.-J. Chi, Y.-S. Sohn, J.-S. Lee, J.-Y. Sim, and H.-J. Park, "A 2-Gb/s CMOS integrating two-tap DFE receiver for four-drop single-ended signaling," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 56, no. 8, pp. 1645–1656, Aug. 2009.
- [15] C. Yeh, Y.-C. Tsai, C.-M. Hsu, L.-S. Liu, S.-H. Tsai, Y. H. Kao, and G.-H. Shiue, "Influence of via stubs with different terminations on time-domain transmission waveform and eye diagram in multilayer PCBs," in *Proc. IEEE Electr. Design Adv. Packag. Syst. Symp. (EDAPS)*, Taipei, Taiwan, Dec. 2012, pp. 149–152.
- [16] G.-H. Shiue, C.-L. Yeh, L.-S. Liu, H. Wei, and W.-C. Ku, "Influence and mitigation of longest differential via stubs on transmission waveform and eye diagram in a thick multilayered PCB," *IEEE Trans. Compon., Packag., Manuf. Technol.*, vol. 4, no. 10, pp. 1657–1670, Oct. 2014.
- [17] W. T. Beyene and A. Amirkhany, "Controlled intersymbol interference design techniques of conventional interconnect systems for data rates beyond 20 Gbps," *IEEE Trans. Adv. Packag.*, vol. 31, no. 4, pp. 731–740, Nov. 2008.
- [18] X. Zheng, H. Ding, F. Zhao, D. Wu, L. Zhou, J. Wu, F. Lv, J. Wang, and X. Liu, "A 50–112-Gb/s PAM-4 transmitter with a fractional-spaced FFE in 65-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 55, no. 7, pp. 1864–1876, Jul. 2020.
- [19] A. Agrawal, J. F. Bulzacchelli, T. O. Dickson, Y. Liu, J. A. Tierno, and D. J. Friedman, "A 19-Gb/s serial link receiver with both 4-tap FFE and 5-tap DFE functions in 45-nm SOI CMOS," *IEEE J. Solid-State Circuits*, vol. 47, no. 12, pp. 3220–3231, Dec. 2012.
- [20] R. Achar and M. S. Nakhla, "Simulation of high-speed interconnects," *Proc. IEEE*, vol. 89, no. 5, pp. 693–728, May 2001.

- [21] L. Luo, J. Wilson, S. Mick, J. Xu, L. Zhang, E. Erickson, and P. Franzon, "A 36Gb/s ACCI multi-channel bus using a fully differential pulse receiver," in *Proc. IEEE Custom Integr. Circuits Conf.*, Sep. 2006, pp. 773–776.
- [22] M. Hossain and A. C. Carusone, "5–10 Gb/s 70 mW burst mode AC coupled receiver in 90-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 45, no. 3, pp. 524–537, Mar. 2010.
- [23] J.-H. Chae, Y.-U. Jeong, and S. Kim, "Data-dependent selection of amplitude and phase equalization in a quarter-rate transmitter for memory interfaces," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 67, no. 9, pp. 2972–2983, Sep. 2020.
- [24] J. F. Buckwalter and A. Hajimiri, "Analysis and equalization of data-dependent jitter," *IEEE J. Solid-State Circuits*, vol. 41, no. 3, pp. 607–620, Mar. 2006.
- [25] M. Souilem, J. N. Tripathi, W. Dghais, and H. Belgacem, "An IBIS-like modelling for power/ground noise induced jitter under simultaneous switching outputs (SSO)," in *Proc. IEEE 23rd Workshop Signal Power Integrity (SPI)*, Chambéry, France, Jun. 2019, pp. 1–4.
- [26] D. Kim, M. Park, S. Jang, J.-Y. Song, H. Chi, G. Choi, and S. Choi, "A 1.1-V 10-nm class 6.4-Gb/s/Pin 16-Gb DDR5 SDRAM with a phase rotator-ILO DLL, high-speed SerDes, and DFE/FFE equalization scheme for Rx/Tx," *IEEE J. Solid-State Circuits*, vol. 55, no. 1, pp. 167–177, Jan. 2020.
- [27] M. Hossain and A. C. Carusone, "7.4 Gb/s 6.8 mW source synchronous receiver in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 46, no. 6, pp. 1337–1348, Jun. 2011.
- [28] Y.-H. Song, R. Bai, K. Hu, H.-W. Yang, P. Y. Chiang, and S. Palermo, "A 0.47–0.66 pJ/bit, 4.8–8 Gb/s I/O transceiver in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 48, no. 5, pp. 1276–1289, May 2013.



**SUNGPHIL CHOI** (Member, IEEE) received the B.S. degree in electrical engineering from Pohang University of Science and Technology, Pohang, South Korea, in 2013. He is currently pursuing the Ph.D. degree with Seoul National University, Seoul, South Korea.

His research interests include the design of high-speed I/O circuits, clock generation circuits, and memory interfaces.





**YONG-UN JEONG** (Member, IEEE) received the B.S. degree in electrical and electronic engineering from Yonsei University, Seoul, South Korea, in 2013, and the Ph.D. degree in electrical engineering from Seoul National University, Seoul, in 2020.

He is currently a Postdoctoral Researcher with Seoul National University. His research interests include the design of high-speed I/O circuits, clock generation circuits, display interfaces, and memory interfaces.

**JOO-HYUNG CHAE** (Member, IEEE) received the B.S. and Ph.D. degrees in electrical engineering from Seoul National University, Seoul, South Korea, in 2012 and 2019, respectively.

In 2013, he joined SK hynix, Icheon-si, South Korea, as an Intern, working with the Department of LPDDR Memory Design. From 2019 to 2021, at SK hynix, his work focused on GDDR memory design. In 2021, he joined Kwangwoon University, Seoul, where he is currently an Assistant Professor in electronics and communications engineering. His research interests include the design of high-speed and low-power I/O circuits, clocking circuits, memory interfaces, and mixed-signal in-memory computing.

Dr. Chae received the Doyeon Academic Paper Award from the Inter-University Semiconductor Center, Seoul National University, in 2020.



**SHIN-HYUN JEONG** received the B.S. degree in electrical engineering from Georgia Institute of Technology, Atlanta, GA, USA, in 2018. He is currently pursuing the Ph.D. degree in electrical engineering from Seoul National University, Seoul, South Korea.

His current research interests include the design of high-speed I/O circuits, analog/mixed-signal circuits, clock generation circuits, and memory interfaces.



**SUHWAN KIM** (Senior Member, IEEE) received the B.S. and M.S. degrees in electrical engineering and computer science from Korea University, Seoul, South Korea, in 1990 and 1992, respectively, and the Ph.D. degree in electrical engineering and computer science from the University of Michigan, Ann Arbor, MI, USA, in 2001.

From 1993 to 1999, he was with LG Electronics, Seoul. From 2001 to 2004, he was a Research Staff Member at IBM T. J. Watson Research Center, Yorktown Heights, NY, USA. In 2004, he joined Seoul National University, Seoul, where he is currently a Professor in electrical and computer engineering. His research interests include analog and mixed-signal integrated circuits, high-speed I/O circuits, low-power sensor readout circuits, and silicon-photonic integrated circuits.

Dr. Kim has received the 1991 Best Student Paper Award of the IEEE Korea Section, the First Prize (Operational Category) in the VLSI Design Contest of the 2001 ACM/IEEE Design Automation Conference, the Best Paper Award of the 2009 Korean Conference on Semiconductors, and the 2011 Best Paper Award of the International Symposium on Low-Power Electronics and Design. He has also served as the Organizing Committee Chair for IEEE Asian Solid State Conference and the General Co-Chair and the Technical Program Chair for the IEEE International System-on-Chip (SoC) Conference. He has participated multiple times on the Technical Program Committee of the IEEE International SoC Conference, the International Symposium on Low-Power Electronics and Design, the IEEE Asian Solid-State Circuits Conference, and the IEEE International Solid-State Circuits Conference. He served as a Guest Editor for the IEEE JOURNAL OF SOLID-STATE CIRCUITS Special Issue on the IEEE Asian Solid-State Circuits Conference.