Received 28 July 2022, accepted 8 August 2022, date of publication 17 August 2022, date of current version 7 September 2022. Digital Object Identifier 10.1109/ACCESS.2022.3199429

## **RESEARCH ARTICLE**

# A 24-Gb/s/pin Single-Ended PAM-4 Receiver With 1-Tap Decision Feedback Equalizer Using Inverter-Based Summer for Memory Interfaces

### HYUNKYU PARK<sup>10</sup>, YONG-UN JEONG<sup>10</sup>, (Member, IEEE), AND SUHWAN KIM<sup>10</sup>, (Senior Member, IEEE)

<sup>1</sup>Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, South Korea

<sup>2</sup>Samsung Electronics, Hwaseong 18448, South Korea

Corresponding author: Suhwan Kim (suhwan@snu.ac.kr)

This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No.2020-0-01300, Development of AI-specific parallel high-speed memory interface, 50%, and No.2022-0-01013, Development of DRAM PIM semiconductor technology for enhanced computing function for edge, 50%).

**ABSTRACT** Separate inverter-based summers for each eye are introduced into the decision feedback equalizer (DFE) of a single-ended four-level pulse amplitude modulation (PAM-4) receiver for memory interfaces. The summers increase the input swing of the slicers while retaining the advantages of inverter-based amplifiers, namely higher gain and lower power consumption than current-mode logic (CML) amplifiers. The high-gain summer improves the clock-to-Q delay of the slicers in the PAM-4 DFE without increasing the power consumption of the slicers. This alleviates the timing constraint that the DFE must meet to respond correctly to the previous data. The non-linear gain of the inverter-based structure can be tolerated by using a separate path for each eye. A prototype chip was fabricated in a 65 nm CMOS process. At 24 Gb/s, the DFE achieves a bit error rate (BER) of $10^{-12}$ with an eye width of 100 mUI over a channel with an insertion loss of -7.3 dB at the Nyquist frequency, and the power efficiency is 0.73 pJ/b.

**INDEX TERMS** Four-level pulse amplitude modulation (PAM-4) receiver, inverter-based summer, memory interface.

#### I. INTRODUCTION

The rapid increase in IP traffic in data centers is driving the demand for high-speed and low-power memory interfaces [1], [2]. There is limited scope to increase the bandwidth of present memory interfaces, which use nonreturn-to-zero (NRZ) signaling, because channel losses grow as the data-rate increases [3], [4]. Four-level pulse amplitude modulation (PAM-4) signaling can double the data-rate without increasing the clock and Nyquist frequencies: four different signal levels allow two bits to be transmitted in each unit interval (UI). Therefore, the application of PAM-4 signaling to memory interfaces has already received considerable attention [4], [5], [6].

The associate editor coordinating the review of this manuscript and approving it for publication was Yiming Huo.

PAM-4 signaling has a vertical margin that is one third that of NRZ signaling because of the four signal levels. The margin is further reduced if the levels are not evenly spaced. This makes PAM-4 signaling vulnerable to noise [7], [8], [9]. Therefore, it is attractive to use decision feedback equalizers (DFEs) in PAM-4 receivers because the equalizers can compensate for inter-symbol interference (ISI) without amplifying the noise [8]. However, the reduced vertical margin of a PAM-4 signal also lengthens the clock-to-Q delay of the slicer. In single-ended signaling, which is used in memory interfaces, the clock-to-Q delay can be longer than that in differential signaling. The lengthened clock-to-Q delay makes it difficult to satisfy the timing constraint that requires the PAM-4 DFE to decide data within one UI [8].

One method to improve the clock-to-Q delay in PAM-4 DFEs is to use high-performance slicers. References [9] and [10] use current-mode logic (CML) slicers, but CML slicers consume more power than conventional StrongArm slicers [11]. Moreover, the output swing of a CML slicer is small, so CML-to-CMOS amplifiers are required [10]. Alternatively, two-stage StrongArm slicers are used in [11] to reduce the clock-to-Q delay. However, an additional phase adaptation is needed to generate the different clocks for each stage, which increases power consumption. In [8], track-and-regenerate slicers are used to increase gain. However, this type of slicer requires a stage to track the input signal, which also increases power consumption.

Another method to improve the clock-to-Q delay in receivers is to increase the peak-to-peak swing at the input of the slicers. A CML amplifier or summer with resistive source degeneration can amplify the peak-to-peak swing by adjusting the source degeneration resistors [6], [8]. However, the gain of this approach cannot be increased without also increasing power consumption.

In this paper, we propose a single-ended PAM-4 receiver that uses a DFE with inverter-based summers for single-ended memory interfaces, where low power consumption is important. The inverter-based summer can achieve high gain with lower power consumption than the CML summer used in previous PAM-4 DFEs. The high gain increases the input swing of the slicers in the DFE, which improves the clock-to-Q delay of the slicers without increasing their power consumption. Additionally, we address the non-linear gain of the inverter-based summer by using separate paths associated with each eye. By using the inverter-based summers and separating the paths, the clock-to-Q delay of the slicers, which is lengthened by the smaller eye height of the PAM-4 signal, can be improved with lower power consumption.

The rest of this paper is organized as follows: in Section II, we describe the issue of the reduced eye height of the signal in PAM-4 receivers, the method to address this issue by introducing an inverter-based DFE, and the design considerations of the inverter-based summer; in Section III, we explain how we implement the DFE and the receiver; in Section IV, we present the measured results; and we draw conclusions in Section V.

#### **II. INVERTER-BASED SUMMER IN PAM-4 DFE**

#### A. DESIGN CONSTRAINT ON PAM-4 DFE

The sensitivity of a slicer is defined as the minimum swing of the input signal that the slicer requires to make a decision within one UI at a specific baud-rate [8]. As the baud-rate increases, the sensitivity required for the slicer increases. A timing constraint must be met in order for a PAM-4 DFE to respond correctly to the previous data. Fig. 1 shows the timing constraint on an NRZ DFE and a PAM-4 DFE, where $V_{refH}$, $V_{refM}$, and $V_{refL}$ are the reference voltages for the decision. The constraint can be expressed as follows:

$$T_{setup} + T_{CKQ} + T_{DLY} < 1 \text{ UI}, \tag{1}$$



FIGURE 1. Timing constraint and output of summer on (a) NRZ DFE and (b) PAM-4 DFE.



**FIGURE 2.** Simplified half circuit diagram of (a) CML summer and (b) proposed inverter-based summer.

where $T_{setup}$ is the setup and settling time of the summer, $T_{CKQ}$ is the clock-to-Q delay of the slicers, and $T_{DLY}$ is the propagation delay from the slicers to the summer. Meeting the timing constraint requires the input swing of the slicers to be greater than the sensitivity.

Fig. 1(a) and (b) show the timing constraints on an NRZ DFE and a PAM-4 DFE. Comparing Fig. 1(a) and Fig. 1(b), the input swing ($V_{sum}$) seen by the slicers of the PAM-4 DFE is one third or less of that of the NRZ DFE, assuming that the peak-to-peak swings of $V_{sum}$ are equal. This makes it more difficult for the PAM-4 DFE to meet the timing constraint than for the NRZ DFE at the same baud-rate.
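The one-third relation can be illustrated with a short calculation. The sketch below assumes ideal, evenly spaced levels with no ISI or noise; the function name `per_eye_swing` and the 300 mV swing are hypothetical values chosen only for illustration and are not taken from the measurements.

```python
def per_eye_swing(peak_to_peak_mv: float, levels: int) -> float:
    """Ideal vertical opening of one eye for evenly spaced signal levels.

    With N levels there are N - 1 stacked eyes sharing the peak-to-peak
    swing, so each eye gets swing / (N - 1). ISI and noise are ignored.
    """
    if levels < 2:
        raise ValueError("at least two signal levels are required")
    return peak_to_peak_mv / (levels - 1)


swing_mv = 300.0  # hypothetical peak-to-peak swing of V_sum
nrz_eye_mv = per_eye_swing(swing_mv, levels=2)
pam4_eye_mv = per_eye_swing(swing_mv, levels=4)

print(f"NRZ eye:   {nrz_eye_mv:.1f} mV")
print(f"PAM-4 eye: {pam4_eye_mv:.1f} mV")
print(f"ratio:     {pam4_eye_mv / nrz_eye_mv:.3f}")
```

Unevenly spaced levels or residual ISI would shrink the smallest eye further, which is why the text says "one third or less".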

#### B. INVERTER-BASED SUMMER AND ITS DESIGN CONSIDERATION

Fig. 2(a) shows a simplified circuit diagram of a CML summer, which is used in previous PAM-4 DFEs [7], [8], [9]. The CML summer consists of a resistive-loaded amplifier for the main tap, pull-down current sources ($I_{PD}$[2:0]), and switches for the post tap. Fig. 2(b) shows a simplified circuit diagram of the inverter-based summer proposed in this paper. The inverter-based summer is composed of an inverter-based amplifier for the main tap and an inverter-based amplifier for the post tap, which outputs the sum of tap coefficients based on the previous data.

| Condition | Parameter | CML summer | Inverter-based summer |
|---|---|---|---|
| Both cases | Current | $I_{main}$ | $I_{main}$ |
| Equal transconductance of NMOS | Width (NMOS/PMOS) | $W_n$ / - | $W_n$ / $W_p$ ($= 2W_n$) |
| | Gain | $g_{mn} \cdot R_{O}$ | $2g_{mn} \cdot R_{O}$ |
| Equal gain | Width (NMOS/PMOS) | $4 \cdot W_n$ / - | $W_n$ / $2W_n$ |
| | Output capacitance | $4C_d$ | $3C_d$ |
| | Input capacitance | $4C_g$ | $3C_g$ |

**TABLE 1.** Comparison of main tap of CML summer and proposed inverter-based summer with equal power consumption.

\* $C_d$: Drain capacitance of NMOS in main tap. $C_g$: Gate capacitance of NMOS in main tap. $R_O$: Output impedance of summers.

Table 1 compares the main tap of the CML summer and the inverter-based summer. In this paper, the width of the PMOS of the inverter-based amplifier for the main tap ($W_p$) is about twice that of the NMOS ($W_n$) so that the characteristics of the inverter-based amplifier are symmetrical about the input common level [13], [14]. Since the NMOS and PMOS in the inverter-based amplifier share the drain current, the inverter-based amplifier can achieve twice the transconductance of the CML amplifier [13]. This doubles the gain of the inverter-based summer compared to the CML summer, assuming the output impedances of the two summers are equal to $R_O$. The transconductance and current of the NMOS in the CML summer and inverter-based summer can be expressed as follows:

$$g_{mn} = k \cdot \frac{W_n}{L} \cdot V_{OV}, \qquad (2)$$

$$I_{main} = \frac{k}{2} \cdot \frac{W_n}{L} \cdot V_{OV}^2 \tag{3}$$

where k is a constant based on the process parameter, $V_{OV}$ is the overdrive voltage, and L is the length of the NMOS. In (2), $g_{mn}$ is proportional to $W_n$. However, at constant $I_{main}$, widening $W_n$ reduces $V_{OV}$, as shown in (3). Therefore, to double the transconductance with the output current held constant, the width of the NMOS must be increased by a factor of four. Due to the increased width of the NMOS, the CML summer has 1.33 times larger output capacitance than the inverter-based summer with equal gain. Therefore, the inverter-based summer has wider bandwidth than the CML summer. In addition, the input capacitance of the inverter-based summer is smaller than that of the CML summer.
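The factor of four follows from eliminating $V_{OV}$ between (2) and (3), which gives $g_{mn} = \sqrt{2k(W_n/L)I_{main}}$. The sketch below checks this numerically under the same square-law model. The device values (`K`, `W_OVER_L`, `I_MAIN`) are hypothetical, and the capacitance comparison assumes, as in Table 1, that capacitance scales with total device width.

```python
import math


def overdrive(k: float, w_over_l: float, i_d: float) -> float:
    """V_OV from (3): I = (k / 2) * (W / L) * V_OV**2."""
    return math.sqrt(2.0 * i_d / (k * w_over_l))


def transconductance(k: float, w_over_l: float, i_d: float) -> float:
    """g_m from (2): g_m = k * (W / L) * V_OV, with V_OV taken from (3)."""
    return k * w_over_l * overdrive(k, w_over_l, i_d)


# Hypothetical square-law device values, for illustration only.
K = 200e-6        # A/V^2
W_OVER_L = 20.0   # baseline aspect ratio of the NMOS
I_MAIN = 1e-3     # A, held constant in both cases

gm_base = transconductance(K, W_OVER_L, I_MAIN)
gm_wide = transconductance(K, 4.0 * W_OVER_L, I_MAIN)
print(f"gm ratio for 4x width at constant current: {gm_wide / gm_base:.3f}")

# Equal-gain case of Table 1, in units of the baseline NMOS width W_n:
cml_width = 4.0             # CML NMOS widened to 4 * W_n
inverter_width = 1.0 + 2.0  # inverter NMOS (W_n) plus PMOS (2 * W_n)
cap_ratio = cml_width / inverter_width
print(f"CML / inverter output capacitance ratio: {cap_ratio:.2f}")
```

Because $g_m$ grows only with the square root of the width at fixed current, the CML stage pays a fourfold width increase for a twofold gain increase, whereas the inverter reuses the same current in the PMOS.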

The first post tap of the CML summer consists of pull-down current sources connected to the output. Therefore, the output common level of the CML summer can be changed by the pull-down current sources. To maintain the common level, [7] and [8] use a common-level restoration circuit connected to the output node, which increases the complexity and output capacitance of the summer. In the inverter-based summer, on the other hand, the main tap and the post tap are separated to make the operating point of the summer independent of the tap coefficients.

**FIGURE 3.** (a) Three input pulses to inverter-based summer, the corresponding output pulses of inverter-based summers with threshold voltage of (b) $V_{th,H}$, (c) $V_{th,M}$ and (d) $V_{th,L}$, and (e) simplified diagram of previous PAM-4 DFE with single summer and (f) PAM-4 DFE using inverter-based summers with separate paths.

$V_{IN,1st}$ in Fig. 2(b) is determined by the summing node based on the previous data. The output of the inverter-based summer can be expressed as follows:

$$V_{OUT} = (2g_{mn} \cdot V_{IN} + g_{m,1st} \cdot V_{IN,1st}) \cdot R_O, \qquad (4)$$

where $g_{m,1st}$ is the transconductance of the inverter for the first post tap, and $R_O$ is the output impedance of the inverter-based summer. As shown in (4), the tap coefficients can be controlled by the amplitude and polarity of $V_{IN,1st}$ relative to the threshold voltage.
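A direct evaluation of (4) shows how the post-tap input shifts the summer output. This is only a small-signal sketch: it assumes both inputs are expressed as deviations from the threshold voltage and that the gain is linear, which the text notes holds only near the threshold. All numerical values are hypothetical.

```python
def summer_output(gmn: float, gm_1st: float, r_o: float,
                  v_in: float, v_in_1st: float) -> float:
    """Evaluate (4): V_OUT = (2 * gmn * V_IN + gm_1st * V_IN_1st) * R_O.

    v_in and v_in_1st are taken here as small-signal deviations from the
    threshold voltage (an assumption of this sketch).
    """
    return (2.0 * gmn * v_in + gm_1st * v_in_1st) * r_o


# Hypothetical values, for illustration only.
GMN = 2e-3      # S, NMOS transconductance of the main tap
GM_1ST = 1e-3   # S, transconductance of the post-tap inverter
R_O = 500.0     # ohm, output impedance of the summer

v_no_tap = summer_output(GMN, GM_1ST, R_O, v_in=0.05, v_in_1st=0.0)
v_with_tap = summer_output(GMN, GM_1ST, R_O, v_in=0.05, v_in_1st=-0.04)

print(f"without post tap: {v_no_tap * 1e3:.1f} mV")
print(f"with post tap:    {v_with_tap * 1e3:.1f} mV")
```

A negative `v_in_1st` subtracts from the main-tap term and a positive one adds to it, which corresponds to controlling the tap coefficient through the amplitude and polarity of $V_{IN,1st}$.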

The gain of inverter-based amplifiers is reduced when the input signal is far from the threshold voltage [12]. Based on this characteristic, Fig. 3 shows the input and output pulses of inverter-based summers with different threshold voltages. $V_{11}$, $V_{10}$, $V_{01}$, and $V_{00}$ are the voltage levels when the signals are 2'b11, 2'b10, 2'b01, and 2'b00, respectively, without channel loss. As shown in Fig. 3(a), three input pulses are applied to the inverter-based summers (i.e., the transitions $00 \rightarrow 11 \rightarrow 00$, $00 \rightarrow 10 \rightarrow 00$, and $00 \rightarrow 01 \rightarrow 00$). Here, $V_{th,H}$, $V_{th,M}$, and $V_{th,L}$ are the three cases of the threshold voltage of the inverter-based summer; $h_{0,3,I}$, $h_{0,2,I}$, and $h_{0,1,I}$ are the main cursors of the input pulses (transitions $00 \rightarrow 11 \rightarrow 00$, $00 \rightarrow 10 \rightarrow 00$, and $00 \rightarrow 01 \rightarrow 00$, respectively); and $h_{1,3,I}$, $h_{1,2,I}$, and $h_{1,1,I}$ are the first post cursors of the input pulses.

Fig. 3(b) shows the output pulses of the inverter-based summer with the threshold voltage of $V_{th,H}$, where $h_{0,3,S}$, $h_{0,2,S}$, and $h_{0,1,S}$ are the main cursors of each output pulse, and $h_{1,3,S}$, $h_{1,2,S}$, and $h_{1,1,S}$ are the first post cursors of each output pulse. It can be seen in Fig. 3(b) that the gain of the summer is the largest when the input signal is between $h_{0,3,I}$ and $h_{0,2,I}$. Therefore, the difference between $h_{0,3,S}$ and $h_{0,2,S}$ is larger than the difference between $h_{0,2,S}$ and $h_{0,1,S}$, and the difference between $h_{0,1,S}$ and $V_{00}$. In addition, since the magnitude of the first post-cursor ISI is different for each output pulse ($h_{1,3,S}$, $h_{1,2,S}$, and $h_{1,1,S}$), different tap coefficients must be applied depending on the previous data.

Fig. 3(c) and (d) show the output pulses of the inverter-based summers with the threshold voltages of $V_{th,M}$ and $V_{th,L}$, respectively. Fig. 3(c) shows that the difference between $h_{0,2,S}$ and $h_{0,1,S}$ is larger than the differences between the main cursors of the other output pulses. Fig. 3(d) shows that the difference between $h_{0,1,S}$ and $V_{00}$ is larger than the differences between the other main cursors. In addition, the first post-cursor ISIs in Fig. 3(b), (c), and (d) differ depending on the threshold voltages. Therefore, the inverter-based summers must apply different tap coefficients depending not only on the previous data but also on the threshold voltage of each inverter-based summer, to compensate for the ISI changed by the gain of the inverter-based summers.

FIGURE 4. Block diagram of proposed PAM-4 receiver.

Because of the non-linear characteristic of the inverter-based summer, the input and output swings and the gain of the inverter-based amplifiers in PAM-4 receivers [12], [15] are restricted by the need for linearity in order to maintain the equal spacing of the signal levels. To apply the inverter-based summer to PAM-4 DFEs using a single summer [8], [9], [11] (i.e., Fig. 3(e)), the gain of the inverter-based summer would have to be limited. However, this limits the height of each eye, which degrades the clock-to-Q delay of the slicers when the inverter-based amplifier is applied to PAM-4 DFEs. To overcome the non-linearity caused by the inverter-based summer, we use separate paths dedicated to each eye in the PAM-4 DFE, as shown in Fig. 3(f). By using the separate paths, the amplification of a specific eye does not affect the other eyes, and the clock-to-Q delay can be improved.

In the PAM-4 DFE with the separate paths, compared to the PAM-4 DFE with the single summer shown in Fig. 3(e), the slicer capacitance that each summer drives is reduced from about $3 \cdot C_{slicer}$ to $1 \cdot C_{slicer}$. Each summer can therefore maintain the bandwidth with about one third of the power consumption and a smaller size, which also makes the input capacitance of the inverter-based summer with the separate paths ($C_{INV}$) about three times smaller than that of the summer ($C_{single}$) in the DFE with a single summer. Therefore, the change in power consumption due to the increased number of summers is not significant. In addition, the path separation allows an inverter-based summer with high gain and low power consumption to be used without considering its output linearity.

In [16], a PAM-4 receiver was implemented with separate continuous-time linear equalizers (CTLEs) dedicated to each eye. However, each CTLE cannot amplify the corresponding eye. In addition, a PAM-4 receiver implemented with only CTLEs cannot improve signal integrity that is degraded by factors such as reflection.

#### **III. IMPLEMENTATION**

#### A. OVERALL ARCHITECTURE

Fig. 4 shows the block diagram of our PAM-4 receiver. The analog front end (AFE) consists of three pairs of a CTLE and a single-to-differential (S2D) amplifier. Each pair adjusts the output common level for one eye and compensates for the pre-cursor ISI. The 1-tap DFE consists of three parts (DFE<sub>H</sub>, DFE<sub>M</sub>, and DFE<sub>L</sub>) corresponding to each eye of the received signal, and each part includes the inverter-based summers. In addition, the DFE has a quarter-rate structure to retain an adequate timing margin for the feedback path in the DFE. The reference voltage generator (Vref gen.) is composed of resistor ladders which generate the reference voltages V<sub>refH</sub>, V<sub>refM</sub>, and V<sub>refL</sub>. The coefficient controller (Coeff. controller) generates the tap coefficients R<sub>PUH</sub>[2:0]/R<sub>PDH</sub>[2:0], R<sub>PUM</sub>[2:0]/R<sub>PDM</sub>[2:0], and R<sub>PUL</sub>[2:0]/R<sub>PDL</sub>[2:0] to compensate for the different 1st post-cursor ISI associated with each eye. The IQ divider (IQ DIV) generates quadrature clocks CLK<sub>0</sub>, CLK<sub>90</sub>, CLK<sub>180</sub>, and CLK<sub>270</sub> by dividing external high-speed differential clocks CK<sub>RX</sub> and CKB<sub>RX</sub> by two. A quadrature signal corrector (QSC), consisting of four digitally controlled delay lines (DCDLs) and a QSC loop filter, is used to reduce the skew between the quadrature clocks [17]. To measure bit-error rates (BERs) for each eye, a MUX selects and outputs data based on an external selection signal (SEL).
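The clocking arithmetic of the quarter-rate structure can be checked with a few lines. The sketch below assumes the 6 GHz external clock reported for the measurement setup in Section IV and assumes that each of the four quadrature phases samples one symbol per clock period; the function name `quarter_rate_plan` is hypothetical.

```python
def quarter_rate_plan(ck_rx_hz: float, divide_by: int = 2,
                      phases: int = 4, bits_per_symbol: int = 2) -> dict:
    """Relate the external clock to baud rate and data rate.

    Assumes each quadrature phase samples one symbol per divided-clock period.
    """
    phase_clock_hz = ck_rx_hz / divide_by   # output of the IQ divider
    baud = phase_clock_hz * phases          # symbols per second
    return {
        "phase_clock_hz": phase_clock_hz,
        "baud": baud,
        "data_rate_bps": baud * bits_per_symbol,
        "ui_ps": 1e12 / baud,
    }


plan = quarter_rate_plan(6e9)  # 6 GHz external clock (CK_RX / CKB_RX)
print(f"phase clock: {plan['phase_clock_hz'] / 1e9:.1f} GHz")
print(f"baud rate:   {plan['baud'] / 1e9:.1f} Gbaud")
print(f"data rate:   {plan['data_rate_bps'] / 1e9:.1f} Gb/s")
print(f"UI:          {plan['ui_ps']:.1f} ps")
```

Under these assumptions the four 3 GHz phases give 12 Gbaud, which is 24 Gb/s for PAM-4, consistent with the data rate stated in the abstract.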

#### **B. ANALOG FRONT END**

Fig. 5(a) shows the block diagram of the CTLE and single-to-differential amplifier in Fig. 4. The CTLE has RC source degeneration ($R_D$ and $C_D$), which is controlled externally to adjust the frequency boost. Fig. 5(b) shows the simulated frequency response of the CTLE and S2D amplifier. The frequency boost at 6 GHz can be varied from 3 dB to 3.7 dB by changing $C_D$.

FIGURE 5. (a) Block diagram and (b) simulated frequency response of AFE depending on change of $C_D$.

#### C. INVERTER-BASED SUMMER AND PAM-4 DECISION FEEDBACK EQUALIZER

Fig. 6(a) shows a block diagram of the inverter-based summer in DFE unit<sub>0</sub>[1] of DFE<sub>M</sub> in Fig. 4 as an example to explain the operation of the inverter-based summer. V<sub>th</sub> is the threshold voltage of the inverter-based summer, and $\Delta V$ is the swing of the input signal. This summer is composed of two inverter-based amplifiers: one for the main tap (IN) and one for the post taps (D<sub>270</sub>[2:0]), with resistors to produce the post taps. The input common level of the summer is set to the threshold voltage to amplify a certain eye. Fig. 6(b) shows the equivalent half circuit diagram of the inverter-based summer. In the inverter-based amplifier for the post tap, different pull-up resistors (R<sub>PUM</sub>[2:0]) and pull-down resistors (R<sub>PDM</sub>[2:0]) are used to apply the different tap coefficients appropriate to the previous data. For the first post tap, the voltage $V_{IN,1st}$ that sets the sum of the coefficients of the inverter-based summer is as follows:

$$V_{IN,1st} = \frac{R_{PD} \parallel \frac{1}{g_{mn}}}{\left(R_{PD} \parallel \frac{1}{g_{mn}}\right) + \left(R_{PU} \parallel \frac{1}{g_{mp}}\right)} \cdot V_{DD}, \tag{5}$$

where $R_{PD}$ and $R_{PU}$ are the total pull-down and pull-up impedances, respectively, determined by the previous data, $V_{DD}$ is the supply voltage, and $g_{mn}$ and $g_{mp}$ are the transconductances of the diode-connected NMOS and PMOS transistors.
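Equation (5) is a resistive divider in which each branch is the selected resistance in parallel with the small-signal resistance $1/g_m$ of a diode-connected device. The following sketch evaluates it for two settings. The 1.2 V supply is the value reported for the prototype; the resistances and transconductances are hypothetical.

```python
def parallel(a: float, b: float) -> float:
    """Parallel combination of two resistances."""
    return a * b / (a + b)


def v_in_1st(r_pd: float, r_pu: float, gmn: float, gmp: float,
             vdd: float) -> float:
    """Evaluate (5) for given pull-down / pull-up impedances."""
    z_down = parallel(r_pd, 1.0 / gmn)  # R_PD || 1/g_mn
    z_up = parallel(r_pu, 1.0 / gmp)    # R_PU || 1/g_mp
    return z_down / (z_down + z_up) * vdd


VDD = 1.2          # V, supply of the prototype chip
GMN = GMP = 1e-3   # S, hypothetical diode-connected device transconductances

# Symmetric setting: the post-tap input sits at mid-supply.
v_sym = v_in_1st(r_pd=2e3, r_pu=2e3, gmn=GMN, gmp=GMP, vdd=VDD)

# Stronger pull-down: the post-tap input moves below mid-supply.
v_low = v_in_1st(r_pd=1e3, r_pu=4e3, gmn=GMN, gmp=GMP, vdd=VDD)

print(f"symmetric:        {v_sym:.3f} V")
print(f"strong pull-down: {v_low:.3f} V")
```

Because the diode-connected devices keep both branches at a finite impedance, the divider output stays between the rails even when one resistor branch is much weaker than the other.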



**FIGURE 6.** (a) Block diagram and (b) equivalent half circuit of inverter-based summer in DFE unit<sub>0</sub>[1] of DFE<sub>M</sub>.



**FIGURE 7.** Simulated outputs of summers (DFE Unit $_0$ [2:0]) (a) without tap coefficients and (b) with tap coefficients.

The coefficient of the summer covers a range from 0 mV to 250 mV ($0.21 \cdot V_{DD}$). The diode-connected MOSFETs are used in the inverter-based amplifier for the first post tap to match the threshold voltage. Additionally, in an inverter-based summer without the diode-connected NMOS and PMOS transistors, the range of tap coefficients produced by the summer is reduced. In that case, when the previous data is 3'b000 or 3'b111, $V_{IN,1st}$ is tied to $V_{DD}$ or ground regardless of the size of the pull-down and pull-up resistors. For equalization with characteristics like those in Fig. 3(a)-(d), each resistance of $R_{PUM}[2:0]$ and $R_{PDM}[2:0]$ must be adjusted separately. The DFE units in DFE<sub>L</sub> and DFE<sub>H</sub> in Fig. 4 also include summers with the same structure.

Fig. 7 shows the differential outputs of the inverter-based summers associated with $CLK_0$ without and with the tap coefficients. The simulation is performed at a data rate of 24 Gb/s with an insertion loss of 9 dB at 6 GHz. Comparing Fig. 7(a) and (b), the tap coefficients in Fig. 7(b) open each eye, with eye heights of 346 mV, 462 mV, and 401 mV, a minimum height of 150 mV from 0 V (the lower bound of the output of DFE Unit<sub>0</sub>[2]), and a minimum eye width of 23 ps.

FIGURE 8. Results of Monte-Carlo simulations (2700 cases) of (a) threshold voltage variation of inverter-based summer and (b) input offset of signal path including AFE and inverter-based summer.

Fig. 8(a) and (b) show the results of Monte-Carlo simulations of the threshold voltage variation of the inverter-based summer and the input offset of the signal path including the AFE and the summer, with 2700 mismatch cases under process, voltage ($0.9 \cdot V_{DD} \sim 1.1 \cdot V_{DD}$), and temperature ($-40 \ ^{\circ}C \sim 125 \ ^{\circ}C$) variations. In the simulations, the values of $\sigma$ are $0.021 \cdot V_{DD}$ and $0.012 \cdot V_{DD}$, respectively.

In memory interfaces, the memory controller optimizes the receiver using training patterns during the training sequence [18]. We also performed such procedures to decide the reference voltages and bias levels. Additionally, we placed the blocks in the receiver close together and symmetrically to reduce mismatches [1], [19].

#### D. SLICER AND DATA BUFFER

Fig. 9(a) shows the circuit diagram of the StrongArm slicer and the buffer used in the proposed receiver. The StrongArm slicer has been widely used for memory interfaces because of its rail-to-rail output swing and negligible static power consumption. However, the clock-to-Q delay of the slicer is affected by the swing of the input signal. Therefore, the sensitivity of the slicer must be considered when it is applied to PAM-4 receivers. Fig. 9(b) shows the operation and timing constraint of the PAM-4 DFE at a baud rate of 12 Gbaud. Considering the timing constraint shown in (1), the clock-to-Q delay must be less than 31.7 ps. Fig. 9(c) shows the simulated clock-to-Q delay of the slicer depending on the input swing (i.e., $V_{IN}$) at a baud rate of 12 Gbaud. To keep the clock-to-Q delay under 31.7 ps, the input swing must be larger than 90 mV. As shown in Fig. 7, the inverter-based summers can satisfy this condition.
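The numbers in this paragraph can be tied back to (1) with a short budget calculation. In the sketch below, the 12 Gbaud rate, the 31.7 ps clock-to-Q limit, the 90 mV minimum swing, and the eye heights of Fig. 7(b) come from the text; the remaining budget for $T_{setup} + T_{DLY}$ is only inferred by subtraction, and the delays passed to `meets_timing` are hypothetical.

```python
BAUD = 12e9            # symbols per second
T_CKQ_MAX_PS = 31.7    # clock-to-Q limit stated in the text
MIN_SWING_MV = 90.0    # minimum input swing stated in the text

ui_ps = 1e12 / BAUD
# Inferred, not stated in the text: what (1) leaves for T_setup + T_DLY
# when T_CKQ sits exactly at its limit.
other_budget_ps = ui_ps - T_CKQ_MAX_PS


def meets_timing(t_setup_ps: float, t_ckq_ps: float, t_dly_ps: float,
                 unit_interval_ps: float = ui_ps) -> bool:
    """Check constraint (1): T_setup + T_CKQ + T_DLY < 1 UI."""
    return t_setup_ps + t_ckq_ps + t_dly_ps < unit_interval_ps


# Simulated eye heights at the summer outputs from Fig. 7(b), in mV.
eye_heights_mv = [346.0, 462.0, 401.0]
swing_ok = all(h > MIN_SWING_MV for h in eye_heights_mv)

print(f"UI: {ui_ps:.1f} ps")
print(f"budget left for T_setup + T_DLY: {other_budget_ps:.1f} ps")
print(f"all eyes exceed {MIN_SWING_MV:.0f} mV: {swing_ok}")
print(meets_timing(30.0, 25.0, 20.0))  # hypothetical delays, sum 75 ps
print(meets_timing(30.0, 35.0, 20.0))  # hypothetical delays, sum 85 ps
```

The second hypothetical case fails only because the clock-to-Q delay grows by 10 ps, which illustrates why a larger slicer input swing directly relaxes the constraint.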



FIGURE 9. (a) Circuit diagram of StrongArm slicer and buffer, (b) Timing parameter of PAM-4 DFE associated with timing constraint and (c) simulated change of clock-to-Q delay of StrongArm slicer.



FIGURE 10. Measurement setup, die photograph and block description of prototype chip.

#### **IV. MEASUREMENT RESULT**

Fig. 10 shows the measurement setup and a die photograph of the prototype chip. A single-ended PAM-4 transmitter [4] is used to provide an environment similar to a memory interface. There is no transmitter-side equalization so that the effect of receiver-side equalization can be measured accurately. A BER tester (Anritsu MP1800A) supplies 6 GHz clock signals to the transmitter and the prototype chip and measures the BER of the receiver. The test options of the receiver can be adjusted through an inter-integrated circuit (I2C) interface connected to an external PC. The channel used for data transmission comprises SMA cables, connectors, and an 8.5-inch FR4 PCB trace. The prototype chip was fabricated in a 65 nm CMOS process and is supplied with 1.2 V. The total active area of the chip is $0.071 \text{ mm}^2$, of which the PAM-4 receiver occupies $0.037 \text{ mm}^2$.

TABLE 2. Performance summary and comparison with other receivers.

| | [6] | [8] | [9] | [11] | [12] | [16] | This work |
|---|---|---|---|---|---|---|---|
| Process [nm] | 28 | 28 | 65 | 65 | 16 | 1Y | 65 |
| Input | PAM-3,<br>single-ended | PAM-4,<br>differential | PAM-4,<br>differential | PAM-4,<br>differential | PAM-4,<br>differential | PAM-4,<br>single-ended | PAM-4,<br>single-ended |
| Data-rate<br>[Gb/s/pin] | 30 | 30 | 28 | 16 | 28 | 22 | 24 |
| Pin efficiency | 150% | 100% | 100% | 100% | 100% | 200% | 200% |
| Equalizer (TX) | - | - | 2-tap FFE* | 2-tap FFE | 3 dB post-cursor boost | - | - |
| Equalizer (RX) | CTLE, 1-tap<br>DFE | 2-stage<br>CTLE, 2-tap<br>DFE | CTLE, 1-tap<br>FIR & 1-tap<br>IIR DFE | CTLE, 1-tap<br>DFE | Inverter-based<br>CTLE, ADC<br>based FFE &<br>DFE | CTLE | CTLE, 1-tap<br>DFE with<br>inverter-based<br>summer |
| Slicer | StrongArm<br>slicer | Track-and-regenerate<br>slicer | CML slicer | TSSA** | N/A | Dual-tail latch | StrongArm<br>slicer |
| Channel loss<br>[dB] | 6.6<br>@ 10 GHz | 8.2<br>@ 15 GHz | 20.8<br>@ 14 GHz | 23<br>@ 8 GHz | 32<br>@ 14 GHz | 2<br>@ 5.5 GHz | 7.3<br>@ 6 GHz |
| Energy<br>efficiency<br>(receiver) [pJ/b] | 0.85 | 1.10 | 3.24*** | 1.71*** | 3.32*** | N/A | 0.73 |
| Energy<br>efficiency<br>(DFE) [pJ/b] | 0.24 | 0.43 | 2.96 | 0.2 | N/A | N/A | 0.29 |
| Energy<br>efficiency<br>(slicers) [pJ/b] | N/A | 0.21 | 2.87 | N/A | N/A | N/A | 0.09 |

\* Feedforward equalizer.

\*\* Two-stage sense amplifier.

\*\*\* Excluding local clock buffer.

FIGURE 11. (a) Insertion loss of the channel and (b) measured input eye diagram of receiver before channel and (c) after channel at 24 Gb/s.

**FIGURE 12.** (a) Measured BER curves for different equalizations of the receiver and (b) eye diagram of 1:4 de-multiplexed data (D<sub>270</sub>[2:0]).

Fig. 11(a) shows the insertion loss of the channel. At the Nyquist frequency of 6 GHz, the insertion loss is -7.3 dB. However, the pad capacitances of the prototype chip, the test board, and the bonding wire can be expected to introduce additional insertion loss [20]. Fig. 11(b) and (c) show the eye diagrams of the input of the PAM-4 receiver before and after the channel at a data-rate of 24 Gb/s. After the channel, the eyes are closed.

FIGURE 13. Power breakdown of proposed PAM-4 receiver.

Fig. 12(a) shows the measured BER curves for different equalizations of the receiver. In this measurement, the frequency boost of the CTLE is 3 dB at 6 GHz. The receiver can achieve a BER of  $10^{-8}$  without the DFE. With the DFE, the receiver achieves a BER of  $10^{-12}$  with a minimum eye width of 100 mUI under the same condition. We measured BER using PRBS-7 patterns at a data-rate of 24 Gb/s. Fig. 12(b) shows the eye diagram of 1:4 de-multiplexed output data (D<sub>270</sub>[2:0]).

Fig. 13 shows the power breakdown of the PAM-4 receiver. At 24 Gb/s, the CTLEs and S2Ds consume 6.93 mW, and the DFE consumes 7.02 mW, of which the inverter-based summers consume 4.93 mW and the slicers consume 2.09 mW. The receiver consumes 17.5 mW in total, corresponding to an energy efficiency of 0.73 pJ/b at 24 Gb/s.
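The energy-efficiency figures follow directly from dividing power by data rate. The sketch below reproduces them from the numbers in this paragraph; the remainder of the total power is computed by subtraction and is not itemized in the text, so its label here is an assumption.

```python
DATA_RATE_BPS = 24e9  # 24 Gb/s


def pj_per_bit(power_mw: float, data_rate_bps: float = DATA_RATE_BPS) -> float:
    """Energy per bit in pJ/b for a block consuming power_mw at the data rate."""
    return power_mw * 1e-3 / data_rate_bps * 1e12


# Power figures quoted in the text, in mW.
total_mw = 17.5
afe_mw = 6.93       # CTLEs and S2Ds
summers_mw = 4.93   # inverter-based summers
slicers_mw = 2.09   # slicers
dfe_mw = summers_mw + slicers_mw

# Not itemized in the text: whatever is left after AFE and DFE.
remainder_mw = total_mw - afe_mw - dfe_mw

print(f"receiver: {pj_per_bit(total_mw):.2f} pJ/b")
print(f"DFE:      {pj_per_bit(dfe_mw):.2f} pJ/b")
print(f"slicers:  {pj_per_bit(slicers_mw):.2f} pJ/b")
print(f"unitemized remainder: {remainder_mw:.2f} mW")
```

These values match the "This work" column of Table 2 for the receiver, DFE, and slicer energy efficiencies.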

Table 2 compares the performance of our PAM-4 receiver with that of previous multi-level receivers. Our PAM-4 receiver achieves low power consumption by using the DFE with inverter-based summers, which reduces the power consumption of the slicers.

#### V. CONCLUSION

We have presented a single-ended PAM-4 receiver for memory interfaces with a DFE that uses inverter-based summers. By retaining the advantages of the inverter-based amplifier, namely low power consumption and high gain, the timing constraint, which is tightened by the clock-to-Q delay of the slicers, can be satisfied without increasing the power consumption of the slicers. A prototype chip with the PAM-4 receiver achieved a BER of $10^{-12}$ with a minimum eye width of 100 mUI at 24 Gb/s and an insertion loss of -7.3 dB. The power efficiency of the PAM-4 receiver is 0.73 pJ/b.

#### REFERENCES

- [1] J.-H. Chae, H. Ko, J. Park, and S. Kim, "A quadrature clock corrector for DRAM interfaces, with a duty-cycle and quadrature phase detector based on a relaxation oscillator," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 27, no. 4, pp. 978–982, Apr. 2019.
- [2] K.-S. Ha et al., "A 7.5 Gb/s/pin LPDDR5 SDRAM with WCK clocking and non-target ODT for high speed and with DVFS, internal data copy, and deep-sleep mode for low power," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 378–380.

- [4] Y.-U. Jeong, H. Park, C. Hyun, J.-H. Chae, S.-H. Jeong, and S. Kim, "A 0.64-pJ/bit 28-Gb/s/pin high-linearity single-ended PAM-4 transmitter with an impedance-matched driver and three-point ZQ calibration for memory interface," *IEEE J. Solid-State Circuits*, vol. 56, no. 4, pp. 1278–1287, Apr. 2021.
- [5] P.-W. Chiu and C. Kim, "A 32 Gb/s digital-intensive single-ended PAM-4 transceiver for high-speed memory interfaces featuring a 2-tap time-based decision feedback equalizer and an *in-situ* channel-loss monitor," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2020, pp. 336–338.
- [6] H. Park, J. Song, J. Sim, Y. Choi, J. Choi, J. Yoo, and C. Kim, "30-Gb/s 1.11-pJ/bit single-ended PAM-3 transceiver for high-speed memory links," *IEEE J. Solid-State Circuits*, vol. 56, no. 2, pp. 581–590, Feb. 2021.
- [7] J. Im, D. Freitas, A. B. Roldan, R. Casey, S. Chen, C.-H.-A. Chou, T. Cronin, K. Geary, S. McLeod, L. Zhou, I. Zhuang, J. Han, S. Lin, P. Upadhyaya, G. Zhang, Y. Frans, and K. Chang, "A 40-to-56 Gb/s PAM-4 receiver with ten-tap direct decision-feedback equalization in 16-nm FinFET," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, pp. 3486–3502, Dec. 2017.
- [8] K.-C. Chen, W. W.-T. Kuo, and A. Emami, "A 60-Gb/s PAM4 wireline receiver with 2-tap direct decision feedback equalization employing track-and-regenerate slicers in 28-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 56, no. 3, pp. 750–762, Mar. 2021.
- [9] A. Roshan-Zamir, T. Iwai, Y.-H. Fan, A. Kumar, H.-W. Yang, L. Sledjeski, J. Hamilton, S. Chandramouli, A. Aude, and S. Palermo, "A 56-Gb/s PAM4 receiver with low-overhead techniques for threshold and edge-based DFE FIR- and IIR-tap adaptation in 65-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 54, no. 3, pp. 672–684, Mar. 2019.
- [10] T. Toifl, C. Menolfi, M. Ruegg, R. Reutemann, P. Buchmann, M. Kossel, T. Morf, and M. Schmatz, "A 22-Gb/s PAM-4 receiver in 90-nm CMOS SOI technology," *IEEE J. Solid-State Circuits*, vol. 41, no. 4, pp. 954–965, Apr. 2006.
- [11] L. Tang, W. Gai, L. Shi, X. Xiang, K. Sheng, and A. He, "A 32 Gb/s 133 mW PAM-4 transceiver with DFE based on adaptive clock phase and threshold voltage in 65 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2018, pp. 114–116.
- [12] K. Zheng, Y. Frans, S. L. Ambatipudi, S. Asuncion, H. T. Reddy, K. Chang, and B. Murmann, "An inverter-based analog front-end for a 56-Gb/s PAM-4 wireline transceiver in 16-nm CMOS," *IEEE Solid-State Circuits Lett.*, vol. 1, no. 12, pp. 249–252, Dec. 2018.
- [13] K. Park and W. Oh, "A 40-Gb/s 310-fJ/b inverter-based CMOS optical receiver front-end," *IEEE Photon. Technol. Lett.*, vol. 27, no. 18, pp. 1913–1933, Sep. 1, 2015.
- [14] B. Nauta, "A CMOS transconductance-C filter technique for very high frequencies," *IEEE J. Solid-State Circuits*, vol. 27, no. 2, pp. 142–153, Feb. 1992.
- [15] K. Zheng, Y. Frans, K. Chang, and B. Murmann, "A 56 Gb/s 6 mW 300 μm<sup>2</sup> inverter-based CTLE for short-reach PAM2 applications in 16 nm CMOS," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Apr. 2018, pp. 1–4.
- [16] T. M. Hollis et al., "An 8-Gb GDDR6X DRAM achieving 22 Gb/s/pin with single-ended PAM-4 signaling," *IEEE J. Solid-State Circuits*, vol. 57, no. 1, pp. 224–235, Jan. 2022.
- [17] H. Park, J. Park, J. W. Lee, Y. Jeong, S.-H. Jeong, S. Kim, and J.-H. Chae, "A high-accuracy and fast-correction quadrature signal corrector using an adaptive delay gain controller for memory interfaces," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, May 2021, pp. 1–5.
- [18] K. Song, "A 1.1 V 2y-nm 4.35 Gb/s/pin 8 Gb LPDDR4 mobile device with bandwidth improvement techniques," *IEEE J. Solid-State Circuits*, vol. 50, no. 8, pp. 1945–1959, Aug. 2015.
- [19] H. Tuinhout, N. Wils, and P. Andricciola, "Parametric mismatch characterization for mixed-signal technologies," *IEEE J. Solid-State Circuits*, vol. 45, no. 9, pp. 1687–1696, Sep. 2010.
- [20] J.-H. Chae, M. Kim, S. Choi, and S. Kim, "A 10.4-Gb/s 1-tap decision feedback equalizer with different pull-up and pull-down tap weights for asymmetric memory interfaces," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 67, no. 2, pp. 220–224, Feb. 2020.



**HYUNKYU PARK** received the B.S. degree in electrical engineering from Sungkyunkwan University, Suwon, South Korea. He is currently pursuing the Ph.D. degree with Seoul National University, Seoul, South Korea.

His research interests include the design of high-speed I/O circuits, clock generation circuits, and memory interfaces.



**SUHWAN KIM** (Senior Member, IEEE) received the B.S. and M.S. degrees in electrical engineering and computer science from Korea University, Seoul, South Korea, in 1990 and 1992, respectively, and the Ph.D. degree in electrical engineering and computer science from the University of Michigan, Ann Arbor, MI, USA, in 2001.

From 1993 to 1999, he was at LG Electronics, Seoul. From 2001 to 2004, he was a Research Staff Member at IBM T. J. Watson Research Center, Yorktown Heights, NY, USA. In 2004, he joined Seoul National University, Seoul, where he is currently a Professor of electrical and computer engineering. His research interests include analog and mixed-signal integrated circuits, high-speed I/O circuits, low-power sensor readout circuits, and silicon-photonic integrated circuits.

Dr. Kim has received the 1991 Best Student Paper Award of the IEEE Korea Section, the First Prize (Operational Category) in the VLSI Design Contest of the 2001 ACM/IEEE Design Automation Conference, the Best Paper Award of the 2009 Korean Conference on Semiconductors, and the 2011 Best Paper Award of the International Symposium on Low-Power Electronics and Design. He has also served as the Organizing Committee Chair for the IEEE Asian Solid-State Circuits Conference and as the General Co-Chair and the Technical Program Chair for the IEEE International System-on-Chip (SoC) Conference. He has served multiple times on the Technical Program Committees of the IEEE International SoC Conference, the International Symposium on Low-Power Electronics and Design, the IEEE Asian Solid-State Circuits Conference, and the IEEE International Solid-State Circuits Conference. He was the Guest Editor for the Special Issue on the IEEE Asian Solid-State Circuits Conference of IEEE JOURNAL OF SOLID-STATE CIRCUITS.



**YONG-UN JEONG** (Member, IEEE) received the B.S. degree in electrical and electronic engineering from Yonsei University, Seoul, South Korea, in 2013, and the Ph.D. degree in electrical engineering from Seoul National University, Seoul, in 2020.

He was a Postdoctoral Researcher at the Intra-University Semiconductor Research Center, Seoul National University, until 2022. In 2022, he joined Samsung Electronics, Hwaseong, South Korea, where he has been involved in the design of DRAM. His research interests include the design of high-speed I/O circuits, clock generation circuits, display interfaces, and memory interfaces.

...