

Received July 22, 2021, accepted August 16, 2021, date of publication August 24, 2021, date of current version September 2, 2021. Digital Object Identifier 10.1109/ACCESS.2021.3107540

# A 1.69-pJ/b 14-Gb/s Digital Sub-Sampling CDR With Combined Adaptive Equalizer and Self-Error Corrector

YOONJAE CHOI<sup>1</sup>, (Graduate Student Member, IEEE), SEWOOK HWANG<sup>3</sup>, (Member, IEEE), YEONHO LEE<sup>4</sup>, (Member, IEEE), HYUNSU PARK<sup>2</sup>, (Graduate Student Member, IEEE), JONGHYUCK CHOI<sup>1</sup>, (Graduate Student Member, IEEE), JINCHEOL SIM<sup>1</sup>, (Graduate Student Member, IEEE), AND CHULWOO KIM<sup>1</sup>, (Senior Member, IEEE)

<sup>1</sup>Department of Electrical Engineering, Korea University, Seoul 02841, South Korea

<sup>2</sup>Department of Semi-Conductor System Engineering, Korea University, Seoul 02841, South Korea

<sup>4</sup>SK Hynix Inc., Icheon 17336, South Korea

Corresponding author: Chulwoo Kim (ckim@korea.ac.kr)

This work was supported in part by Samsung Electronics Company, Ltd., under Grant IO201210-08000-01, and in part by the Brain Korea 21 FOUR Project in 2021.

**ABSTRACT** This paper presents an area- and energy-efficient digital sub-sampling clock and data recovery (CDR) with a combined adaptive equalizer and self-error corrector (SEC). Using the digitized phase difference between the incoming data and the full-rate output clock of a digitally controlled oscillator (DCO) for both the equalizer adaptation and the clock recovery loop, the proposed adaptive equalizer is combined with the CDR by sharing its adaptation loop, including a sub-sampling phase detector (SSPD) and a digital logic circuit. Consequently, the active area and power dissipation of the adaptive equalizer are reduced. Furthermore, the SEC is proposed to improve the high-frequency jitter tolerance of the CDR. The SEC detects bit errors by observing the comparator decision and corrects the errors without any data encoding or complicated circuits. With the proposed SEC, the out-of-band jitter tolerance is improved by 22.6% at 100 MHz for a 17.2-dB loss channel with  $< 10^{-12}$  bit-error rate (BER). The SEC is applicable to various receivers with a compact design at a low cost. The prototype receiver consumes 23.9 mW at 14 Gb/s and occupies  $0.007 \text{ mm}^2$  in a 28-nm CMOS technology.

INDEX TERMS Adaptive equalizer, clock and data recovery (CDR), continuous-time linear equalizer (CTLE), decision feedback equalizer (DFE), jitter tolerance, receiver, self-error corrector (SEC), sub-sampling.

## I. INTRODUCTION

The demand for a higher data bandwidth in wireline interfaces continues to increase with the development of CMOS technology performance. To meet this requirement while assuring a low bit-error rate (BER), the role of the equalizer is becoming increasingly important because of the severe channel conditions at higher data rates. The equalizer needs to properly compensate for the channel loss and eliminate intersymbol interferences (ISIs), which degrade the BER at the

The associate editor coordinating the review of this manuscript and approving it for publication was Cihun-Siyong Gong.

receiver side. However, the different channel conditions limit the performance of the equalizer. Hence, an adaptive equalization is adopted as an essential technique for high-speed wireline transceivers to cope with the unpredictable channel loss and fully utilize the equalizer performance.

For the equalizers at the receiver side, a continuous-time linear equalizer (CTLE) and a decision feedback equalizer (DFE) are the most popular topologies. They are commonly employed together to deal with the large channel loss at high data rates. Various equalizer adaptation techniques have been introduced to adapt those equalizers [1]-[14]. However, CTLE and DFE are usually adapted in different

<sup>3</sup>Butterfly Network Inc., Guilford, CT 06437, USA



**FIGURE 1.** (a) Conventional separate implementation of adaptive equalizer and CDR, (b) combined adaptive equalizer and CDR.

ways, thereby increasing the power and hardware costs. The spectrum balancing technique has been widely used for CTLE adaptation [1]–[4]. However, it suffers from process, voltage, and temperature (PVT) variations. The CTLE adaptation based on the slope-detection method requires large power dissipation [5]. On the other hand, the DFE is mostly adapted in digital methods. Recently, adaptation based on the sign-sign least-mean-square (SS-LMS) algorithm has become the most popular scheme, because of its simple implementation [6]–[8].

To save power and hardware costs, methods of merging the adaptations for both equalizers have been reported. The spectrum balancing technique is employed for the CTLE and DFE in [9]. However, such an analog method is vulnerable to PVT variations and does not scale with CMOS process shrink. In [11], the adaptation based on the SS-LMS algorithm is applied to both equalizers. However, the equalizer adaptation based on the SS-LMS algorithm requires an additional power-hungry error sampler.

A clock and data recovery (CDR) circuit is also an important building block for achieving a low BER in high-speed receivers. In most cases, the equalizer adaptation and clock recovery loop are implemented separately [10], [11], as shown in Fig. 1(a). This approach uses edge and error samplers, resulting in increased power and hardware costs from the front-end circuits and high-speed clock buffers. Moreover, separate digital logic circuits are required. In [12] and [13], the adaptive equalizer is combined with the CDR. However, these designs still incur large power costs from the additional high-speed comparators and clock buffers. This paper presents an equalizer adaptation scheme to further reduce the power dissipation and hardware costs. Unlike the conventional separate implementations, in the proposed receiver, the digital sub-sampling CDR and the adaptive equalizer, including the CTLE and DFE, are combined in a way that reduces the number of high-speed comparators and clock buffers, as shown in Fig. 1(b). They share a sub-sampling phase detector (SSPD), a CML-to-CMOS converter, and digital logic to enhance the power efficiency and save active area simultaneously.

Furthermore, an error correction scheme is proposed in this paper to enhance the high-frequency jitter tolerance of



FIGURE 2. Top block diagram of the proposed receiver.

the CDR, referred to as self-error corrector (SEC). As the data rate increases, the received data are more affected by high-frequency jitter. Several techniques have been proposed to enhance jitter tolerance [14]-[16]. However, jitter tolerance enhancement using a gate-digital-controlled-oscillator (GDCO) requires an additional circuit for the GDCO frequency preset [14]. In addition, increasing the jitter tracking bandwidth increases the unwanted clock jitter [15]. A real-time phase alignment scheme that does not increase the CDR bandwidth is proposed in [16]. However, it necessitates complicated circuits that require large power and area. The proposed scheme aims to enhance the high-frequency jitter tolerance at a low cost. By sensing the output of a comparator, the SEC detects and corrects bit errors caused by the lack of sampling margin. Consequently, the SEC alleviates the need for additional complicated circuits or CDR bandwidth extension to enhance the out-of-band jitter tolerance.

The paper is organized as follows. Section II presents the overall architecture of the proposed receiver, the ADC-based sub-sampling CDR, and the combined adaptive equalizer along with its algorithm. Section III presents the SEC and describes its operation. In Section IV, the circuit implementation details are presented. Section V presents the measurement results, and Section VI concludes this paper.

## II. ADC-BASED SUB-SAMPLING CDR AND COMBINED ADAPTIVE EQUALIZER

### A. ARCHITECTURE OF THE PROPOSED RECEIVER

The overall architecture of the proposed receiver is shown in Fig. 2. The proposed architecture is composed of the adaptive equalizer, digital sub-sampling CDR, and SEC. In the proposed receiver, the phase error between the incoming data and digitally controlled oscillator (DCO) output is acquired by the sub-sampling phase detection technique [17]. Then, it is fed to the digital logic block and utilized for both clock recovery and equalizer adaptation. Consequently, the adaptive equalizer can be combined with the digital CDR by sharing the equalizer adaptation loop. The operation details of the sub-sampling phase detection, clock recovery, and equalizer adaptation are described in this section.

The equalizer is composed of a CTLE and a 1-tap DFE, and its adaptation loop consists of a CML-to-CMOS converter, a frequency divider, a phase interpolator (PI), an SSPD, and digital logic circuits, which are shared with the CDR to save area and power. The CDR receives 14-Gb/s data and recovers the full-rate 14-GHz clock. After the data recovery, data decision errors in the sampled data are corrected by the following SEC. The frequency divider divides the frequency of the equalized data by 8 to meet the maximum operating speed of the following SSPD and digital logic circuit in the given process. The impact of the data division on the CDR will be discussed in Section II-C. Even though data division is used in this prototype for design simplicity, pattern filtering can be used for better performance, as described in [13].

A phase offset exists between the data at the comparator, which is sampled for data recovery, and the divided data at the SSPD, which is the reference for the clock recovery, because of the additional delay of the frequency division path; it should be cancelled to maximize the sampling margin of the comparator. Hence, the digitally controlled PI is employed to control the data phase after the frequency division and to compensate for the phase offset. Moreover, the PI makes the slight clock timing adjustment for the SEC, and it is manually controlled with the external test options. Then, the SSPD compares the phase of the DCO clock with the phase of the delay-compensated data. The digitized phase error is fed to the synthesized clock recovery and equalizer adaptation logic.

### B. ADC-BASED SUB-SAMPLING PHASE DETECTION

For this implementation, the phase error is acquired in the voltage domain by the SSPD based on a flash analog-to-digital converter (ADC) [18]. A simplified diagram of the SSPD is shown in Fig. 3(a). It is composed of sample-and-hold switches and an ADC. The frequency of the equalized data is divided by 8 to satisfy the timing constraint of the ADC in the given process. By sampling the full-rate output clock of the DCO with the rising edge of the divided data (D<sub>DIV</sub>) using the sample-and-hold switches, the analog voltage corresponding to the phase error can be obtained. Then, the ADC quantizes the sampled voltage into the digital domain, and thus the output digital code (PD<sub>OUT</sub>) is obtained. To achieve a linear phase-to-voltage conversion, the slope of the clock edge should not be steep in the SSPD. The clock waveform is described as an RC waveform because an inverter-based buffer is used. From the Monte Carlo simulation results, the mean and standard deviation of the clock rise time in the SSPD of the prototype receiver are 22.9 ps and 934 fs, respectively. For the CDR operation, the linear conversion range is not a major concern [17]. However, a calibration circuit can be used to obtain a higher linearity against PVT variations and to deal with a large input jitter [22].



**FIGURE 3.** (a) Simplified block diagram of ADC-based SSPD, and (b) its output curves.

A higher ADC resolution improves the performance of the SSPD and thus reduces the recovered clock jitter. Moreover, the accuracy of the equalizer adaptation can be improved. However, as the resolution of the flash ADC increases, the number of comparators and their power dissipation increase exponentially. Consequently, the ADC resolution should be determined carefully considering the trade-off between performance and power dissipation. Fig. 3(b) illustrates the relationship between the detected phase error and the recovered clock phase depending on the ADC resolution. The black solid line shows the SSPD output curve when the number of ADC bits is 1, and the curve converges to the solid line as the number of ADC bits increases toward infinity. The phase detection resolution of the SSPD improves with the number of ADC bits. However, the improvement per additional bit is not linear, and its effect is negligible when the number of ADC bits exceeds 5. In this work, a 3.17-bit flash ADC (nine output levels, $\log_2 9 \approx 3.17$) is adopted to achieve a cost-effective performance. The 3.17-bit resolution rather than 3-bit is chosen to minimize the switching operation of the digital loop filter (DLF) by adopting a middle level. Furthermore, the dynamic range of the ADC is set to a narrow range in the vicinity of the CDR locking point to improve the effective phase detection resolution to nearly 5-bit. This will be discussed in detail in Section IV-C.

### C. SUB-SAMPLING CDR

Fig. 4 shows the overall concept of the sub-sampling CDR. The SSPD samples the clock with rising edge of  $D_{DIV}$ , as illustrated in Fig. 4(a). The negative edge of clock (the positive edge of CLKB) is sampled with the SSPD, because the positive edge is used for data sampling. Then, one of the nine digital codes (0, 1, 2, 3, 4, 5, 6, 7, 8) is acquired depending on the phase error, as shown in Fig. 4(b). If the data phase is leading the clock phase, the SSPD outputs the PD<sub>OUT</sub> that is smaller than 4. Conversely, the PD<sub>OUT</sub> is larger than 4 if the data phase is lagging the clock phase. The PD<sub>OUT</sub> is fed to the digital logic and converted to the signed digital



**FIGURE 4.** Concept of sub-sampling CDR: (a) operation of SSPD, and (b) its output depending on the phase error.

codes (-4, -3, -2, -1, 0, 1, 2, 3, 4), which are referred to as ERR. The clock recovery loop updates the DLF at every rising edge of D<sub>DIV</sub> to control the output phase of the DCO, such that ERR converges to 0 and the clock is aligned to D<sub>DIV</sub>.

For characterization of the proposed sub-sampling CDR, the s-domain transfer function of the proposed CDR can be derived as

$$H_{open}(s) = G_{SSPD}\left(K_P + \frac{K_I f_{REF}}{s}\right) \frac{K_{DCO}}{s} \tag{1}$$

$$H_{closed}(s) = \frac{H_{open}(s)}{1 + H_{open}(s)} = \frac{G_{SSPD}K_{DCO}\left(K_P s + K_I f_{REF}\right)}{s^2 + G_{SSPD}K_{DCO}\left(K_P s + K_I f_{REF}\right)} \tag{2}$$

where  $G_{SSPD}$  is the SSPD gain,  $K_P$  and  $K_I$  are the proportional and integral gain of the DLF, respectively; and  $K_{DCO}$  is the resolution of DCO. Moreover,  $G_{SSPD}$  can be expressed as

$$G_{SSPD} = \frac{V_{DD} 2^B}{V_{range}} \tag{3}$$

where  $V_{range}$  and *B* are the dynamic range and resolution of the SSPD. Even if the data are divided by *N*, the division ratio is not added to the transfer function because the sub-sampling virtually multiplies the input phase by *N* [18]. Consequently, the sub-sampling with the data divided-by-*N* does not reduce the CDR gain. However, it inevitably reduces the amount of phase information and operating frequency of the CDR loop. Therefore, the CDR bandwidth and jitter tolerance are degraded as shown in Fig. 5(a). To improve the degraded jitter tolerance, the reduced CDR bandwidth is compensated using the multi-bit phase detection because the CDR bandwidth increases with an increase in the resolution of the SSPD. MATLAB simulations are performed using the parameters obtained from the post-layout simulations. Fig. 5(b) shows



FIGURE 5. Effect of sub-sampling: (a) CDR gain, and (b) CDR bandwidth.

the calculated CDR bandwidth depending on the resolution of the SSPD and *N*, assuming the other parameters are identical. According to the simulation results, the bandwidth of the designed CDR is 10 MHz. Because of the variations  $(3\sigma)$  of the clock slope in the SSPD of the prototype receiver, the CDR bandwidth ranges between 8.58 and 11.0 MHz.
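The loop response in (1) and (2) can be evaluated numerically. The following is a minimal sketch, not the MATLAB model used in this work: the function names and every value in `PARAMS` are hypothetical placeholders chosen only to exercise the formulas.

```python
import math

def h_open(s, g_sspd, k_p, k_i, f_ref, k_dco):
    """Open-loop gain of Eq. (1)."""
    return g_sspd * (k_p + k_i * f_ref / s) * k_dco / s

def h_closed(s, g_sspd, k_p, k_i, f_ref, k_dco):
    """Closed-loop response of Eq. (2), written in its expanded form."""
    num = g_sspd * k_dco * (k_p * s + k_i * f_ref)
    return num / (s * s + num)

# Placeholder loop parameters (assumptions, NOT values from the paper).
PARAMS = dict(g_sspd=25.0, k_p=0.5, k_i=0.01, f_ref=1.75e9, k_dco=2.0e6)

def closed_loop_gain_db(freq_hz, **params):
    """Magnitude of the jitter transfer function in dB at freq_hz."""
    s = 1j * 2 * math.pi * freq_hz
    return 20 * math.log10(abs(h_closed(s, **params)))
```

With the integral path disabled ($K_I = 0$), the expression reduces to a first-order low-pass response whose -3-dB angular frequency is $G_{SSPD} K_P K_{DCO}$, which is one way to see why a larger SSPD gain widens the CDR bandwidth.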

### D. COMBINED ADAPTIVE EQUALIZATION

The equalizer adaptation is performed in a similar manner by reusing the existing hardware, which is already used in the clock recovery loop. The distribution of the received data edge is closely related to the degree of equalization, and it can be utilized for adaptive equalization [19]. The bandwidth limitation of the channel causes ISIs in the data during transmission through the channel. The impact of ISIs appears as a widened data edge distribution and thus a larger standard deviation. The equalizer compensates for the channel loss to eliminate ISIs, and the standard deviation of the data edge distribution decreases after the equalization. Therefore, the equalized data edge has the narrowest distribution when the data are optimally compensated. However, if the data are under- or over-compensated, the standard deviation of the data edge becomes larger than that of the optimally equalized data. Therefore, the standard deviation of the data edge distribution indicates the equalization status, and it can be exploited for the equalizer adaptation. However, depending on the channel, the adaptive equalizer based on the data edge distribution may have a smaller voltage margin than an adaptation based on the voltage margin or BER. According to the post-layout simulation results for the target channel model, the eye width of the adaptive equalizer output is improved by 2.11%, whereas the eye height is decreased by 2.77%, compared to those of the equalized data eye with the maximum voltage margin. However, an improved eye width at the cost of voltage margin is more beneficial for the CDR because it reduces the jitter of the recovered clock and the timing uncertainty [20]. From the post-layout simulation results, the eye height is greater than 300 mV after the equalization.

The standard deviation of the data edge distribution can be extracted by the long-term time domain analysis of the digitized phase difference between the transmitted data and recovered clock. To obtain the standard deviation, the square of ERR (ERR<sup>2</sup>) is calculated as shown in Fig. 4(b). If the ERR<sup>2</sup> is accumulated over time, a digital code (ERR<sub>ACC</sub>) corresponding to the standard deviation can be obtained because



FIGURE 6. Obtained ERR<sub>ACC</sub>: (a) under- or over-equalized case, (b) optimally equalized case, (c) relations with equalizer coefficients, and (d) simulation results.

the ERR is already normalized from the PD<sub>OUT</sub> and its mean is 0, similar to the center of the edge. For the under- or over-equalized case, the ERR<sub>ACC</sub> rises sharply because the sampling point fluctuates significantly from the center of the edge, as illustrated in Fig. 6(a). However, if the data is optimally compensated, the ERR<sub>ACC</sub> rises relatively slowly as shown in Fig. 6(b). Thus, the standard deviation of the data edge for finding the optimum equalizer coefficients can be acquired from the digitized phase error, which has been already used in the clock recovery loop. The optimum equalizer coefficients can be obtained in an effective way, and it is applicable to both the CTLE and DFE. Fig. 6(c) illustrates the relations between the ERR<sub>ACC</sub> and the degree of equalization. Similar to that of the recovered clock phase, the ERR<sub>ACC</sub> shows the lowest value when the data are optimally



FIGURE 7. Flowchart of proposed algorithm for clock recovery and equalizer adaptation.

compensated because the data edge has the narrowest distribution. The ERR<sub>ACC</sub> increases as the data are over- or under-compensated. A post-layout simulation is performed to verify the proposed equalizer adaptation method. The equalizer code is swept from the minimum (0) to the maximum (30) while the recovered clock edge is aligned with the divided data edge. Fig. 6(d) shows the obtained ERR<sub>ACC</sub> curve. It has the smallest value at the optimum point, and the equalizer is adapted to this code. The equalized data eye has the largest horizontal margin at this point. The simulation results show that the obtained ERR<sub>ACC</sub> can be used to find the optimum equalizer code.
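The ERR<sub>ACC</sub> metric described above can be sketched in a few lines. This is only an illustration of the accumulation, assuming the nine-level PD<sub>OUT</sub> coding (0 to 8, mid-code 4) from Section II-C; the helper names and the sample lists are hypothetical and are not simulation data from this work.

```python
def err_acc(pd_out_samples, mid_code=4):
    """Accumulate ERR^2, where ERR = PD_OUT - mid_code (signed phase error)."""
    return sum((pd_out - mid_code) ** 2 for pd_out in pd_out_samples)

def pick_optimum(samples_by_code):
    """Return the equalizer code whose PD_OUT samples give the smallest ERR_ACC.

    samples_by_code maps an equalizer code to the PD_OUT samples observed
    while that code was applied (hypothetical container for illustration).
    """
    return min(samples_by_code, key=lambda code: err_acc(samples_by_code[code]))

# Made-up samples: a narrow edge distribution stays near the mid-code,
# a wide one spreads toward the ends of the ADC range.
narrow = [4, 4, 3, 5, 4, 4]
wide = [1, 7, 2, 6, 0, 8]
```

Because ERR has zero mean around the lock point, the accumulated square grows in proportion to the variance of the edge distribution, so comparing ERR<sub>ACC</sub> over equal-length windows is equivalent to comparing standard deviations.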

### E. CLOCK RECOVERY AND EQUALIZER ADAPTATION ALGORITHM

A flowchart of the proposed algorithm for the clock recovery and the equalizer adaptation is shown in Fig. 7. The clock recovery and the equalizer adaptation are performed simultaneously until the equalizers are adapted. Thereafter, the equalizer codes are locked to the optimum settings and only the clock recovery loop works to continuously track the jitter of the incoming data.

For the clock recovery loop, the ERR is used to update the DLF. At each rising edge of  $D_{DIV}$ , the digitized phase error is fed from the SSPD, and the digital logic updates the DLF through the proportional ( $K_P$ ) and integral ( $K_I$ ) paths. The DLF controls the DCO output frequency to increase, decrease, or hold depending on the ERR value. Hence, the recovered clock is aligned with the incoming data and tracks the data jitter.
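One update of the clock recovery loop can be sketched as follows. This is a behavioral illustration, not the synthesized logic: the class name, the integer gains, and the sign convention of the control word are assumptions.

```python
def pd_out_to_err(pd_out):
    """Map the SSPD code (0..8) to the signed phase error ERR (-4..4)."""
    if not 0 <= pd_out <= 8:
        raise ValueError("PD_OUT must be one of the nine codes 0..8")
    return pd_out - 4

class DigitalLoopFilter:
    """Proportional-integral filter updated once per rising edge of D_DIV."""

    def __init__(self, k_p, k_i):
        self.k_p = k_p
        self.k_i = k_i
        self.integral = 0

    def update(self, err):
        """Return the DCO control word; ERR = 0 holds the integral path."""
        self.integral += self.k_i * err
        return self.k_p * err + self.integral
```

For example, with hypothetical gains `k_p=4` and `k_i=1`, an ERR of 2 yields a control word of 10, and a following ERR of 0 leaves only the integral term of 2.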

At the equalizer adaptation loop, the CTLE code  $(CTLE_{CO})$  and DFE code  $(DFE_{CO})$  are adapted to the



**FIGURE 8.** Concept of proposed self-error correction technique: (a) bit error in receiver, and (b) bit error correction with SEC.

optimum equalizer settings. The CTLE and 1-tap DFE are adapted in this work. However, when they are adapted simultaneously, calibration collision may occur and the complexity of the hardware increases. Therefore, the DFE is adapted first in this work. To guarantee a stable adaptation of the DFE, the CTLE should open the data eye to minimize the decision error at the slicer and the DFE error propagation. Thus, the first attempt at the DFE adaptation is performed while the CTLE codes are set to their maximum values. At the reset state, the equalizer codes are set to the highest setting and the CTLE adaptation mode is turned off (MODE = 0). The ERR<sup>2</sup> is accumulated 72 times for each equalizer code to obtain the setting with the smallest ERR<sub>ACC</sub>, beginning from the maximum compensation setting. The accumulations for all DFE codes are performed until the code becomes 0. Then, the adaptation loop returns the equalizer setting to the code with the smallest ERR<sub>ACC</sub>, which is the optimum equalizer code. For a lossy channel that requires the maximum compensation of the CTLE, the maximum  $CTLE_{CO}$  becomes the optimum setting, and the DFE is adapted alone. For a low-loss channel that requires the CTLE adaptation, the optimum DFE<sub>CO</sub> is found to be 0 with the maximum CTLE<sub>CO</sub> in the first attempt. Then, MODE is toggled to 1, and the CTLE adaptation is performed in the same manner while the DFE is turned off. After the CTLE is adapted, MODE is set to 2 and the DFE adaptation is repeated with the optimum CTLE setting.
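The adaptation sequence above can be summarized with a short sketch. It is an interpretation of the described flow, not the flowchart of Fig. 7 itself: `err_acc(ctle, dfe)` is a hypothetical callback assumed to return ERR<sup>2</sup> accumulated 72 times at the given codes, the maximum codes of 15 are assumed defaults, and entering MODE 1 when the first DFE sweep returns 0 is inferred from the text.

```python
def sweep(max_code, measure):
    """Step from max_code down to 0 and return the code with the smallest ERR_ACC."""
    best_code, best_acc = max_code, measure(max_code)
    for code in range(max_code - 1, -1, -1):
        acc = measure(code)
        if acc < best_acc:
            best_code, best_acc = code, acc
    return best_code

def adapt(err_acc, ctle_max=15, dfe_max=15):
    """Return (CTLE_CO, DFE_CO) following the MODE 0 -> 1 -> 2 sequence."""
    # MODE 0: CTLE at maximum compensation, sweep the DFE code.
    ctle = ctle_max
    dfe = sweep(dfe_max, lambda d: err_acc(ctle, d))
    if dfe != 0:
        return ctle, dfe  # lossy channel: the DFE is adapted alone
    # MODE 1: DFE turned off, sweep the CTLE code.
    ctle = sweep(ctle_max, lambda c: err_acc(c, 0))
    # MODE 2: repeat the DFE sweep with the adapted CTLE code.
    dfe = sweep(dfe_max, lambda d: err_acc(ctle, d))
    return ctle, dfe

def toy_channel(required_boost):
    """Made-up ERR_ACC surface with a minimum where ctle + 2*dfe equals required_boost."""
    return lambda ctle, dfe: (ctle + 2 * dfe - required_boost) ** 2
```

With the toy cost surface, `adapt(toy_channel(25))` stops after MODE 0, while `adapt(toy_channel(10))` proceeds through all three modes. Sweeping one equalizer at a time keeps the search linear in the number of codes and avoids the calibration collision mentioned above.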

## III. SELF-ERROR CORRECTION TECHNIQUE

In the receiver, bit errors occur during data transitions because of the lack of sampling margin, even if equalization and impedance matching are performed and the effects of other noise sources are relatively small. As shown in Fig. 8(a), bit errors occur at the leading or trailing edges of consecutive identical digits (CIDs) in most cases. This is because these sampling points are vulnerable to the sampling margin reduction caused by the remaining ISI and the clock jitter. The CDR is employed to maximize the sampling margin. However, bit errors may still occur because of the remaining out-of-band jitter, which cannot be tracked by the CDR and further reduces the timing margin by moving the data edge closer to the clock edge. If a bit error occurs, a data transition that exists in the transmitted data is missing in the recovered data. Consequently, the bit error can be corrected by inserting a data transition with the proposed SEC, as shown in Fig. 8(b).



FIGURE 9. (a) Block diagram of SEC and (b) its timing diagram.



FIGURE 10. Monte Carlo simulation results: (a) SEC decision ratio and (b) error rate of the sampled data.

The simplified block diagram of the receiver with the proposed SEC is illustrated in Fig. 9(a). The SEC can be simply implemented with an error predictor and an error corrector. The error predictor senses the output of the comparator and observes whether the comparator has made a firm decision. Simply, it checks whether the amplitude of the comparator output is larger than the logic threshold of the SR latch. If the error predictor detects a data decision error, a 1 unit interval (UI) pulse is generated at its output (EP<sub>OUT</sub>). Then, the pulse is fed to the error corrector to correct the detected error by inserting the missing transition. The timing diagram of the SEC is shown in Fig. 9(b). When the comparator cannot make a firm decision for the equalized data (EQ<sub>N,P</sub>), it means that the output amplitude of the comparator (COMP<sub>N,P</sub>) is not sufficiently large to pass the logic threshold of the SR latch. Thus, a bit error occurs in the sampled data (DSAMP<sub>N,P</sub>). The error predictor predicts the bit error by sensing the output of the comparator. Simply, it finds out when both of the comparator outputs (COMP<sub>N,P</sub>) are logic high, and then it generates a 1 UI pulse (EP<sub>OUT</sub>).



**FIGURE 11.** (a) Schematic of CTLE, (b) schematic of digitally controlled resistor, and (c) AC response of CTLE.



FIGURE 12. Schematic of DFE.

The following error corrector functions by making a transition at the missing point of its output ( $DSEC_{N,P}$ ).

Due to the logic threshold mismatch between the SR latch and the error predictor, this error correction process can be triggered by a weak decision of the comparator that is not an actual error. This is harmless in most cases because the SEC makes a data transition that exists in the original data. Monte Carlo simulations are performed to verify the efficacy of the SEC for the leading or trailing edges of the CIDs. Fig. 10(a) plots the simulation results illustrating the relationship between the pulse generation ratio and the input amplitude of the comparator. Fig. 10(b) shows the relationship between the error rate of the sampled data and the input amplitude of the comparator, with and without the SEC. Although the SEC decreases the input sensitivity of the comparator by increasing the parasitic capacitances, the output error rate of the SEC is much lower than that of the comparator without the SEC. However, a weak decision caused by a hold time violation can cause a bit error in the SEC. In the conventional architecture, a weak decision caused by a hold time violation does not create an error because there is no transition in the original data. However, the hold time violation induced



FIGURE 13. Schematic of DCO.



FIGURE 14. Schematic of flash ADC.

weak decision can be recognized as an error in the SEC; a transition that does not exist in the original data is then made, and it becomes a bit error. Consequently, the sampling clock phase is set slightly earlier than the middle of the data eye in this work to minimize the probability of a hold time violation and thus maximize the performance of the SEC. Moreover, this is advantageous for the comparator considering the aperture delay [21].

## IV. CIRCUIT IMPLEMENTATION

### A. CTLE AND DFE

The conventional CTLE architecture is implemented as shown in Fig. 11(a). To realize the tunable high-frequency boosting gain, digitally controlled resistor pairs are used, as shown in Fig. 11(b). They are controlled with a 15-bit thermometer code. The maximum high-frequency boosting gain of the CTLE is 13 dB. A current-mode-logic (CML) differential pre-amplifier is also adopted after the CTLE to guarantee a sufficient voltage margin before the DFE. For the DFE design, a direct-feedback 1-tap DFE is employed in the prototype receiver to achieve low power consumption and to reduce design complexity, as shown in Fig. 12. To control the gain of the DFE, digitally controlled binary-weighted current summers are employed, and they are controlled with a 4-bit binary code.



**FIGURE 15.** Implementation details of proposed SEC: (a) error predictor and (b) error corrector.

### B. DIGITALLY CONTROLLED OSCILLATOR (DCO)

A pseudo-differential 3-stage inverter-based ring oscillator with a digitally controlled capacitor array is employed for the DCO, as shown in Fig. 13. The single-ended architecture of the implemented DCO is shown for simplicity. The DCO generates the 14-GHz full-rate clock. The capacitor array has 4-bit coarse and 5-bit fine tuning. The DLF, composed of proportional and integral paths, controls the capacitor array to align the phase of the DCO output clock with the data. Because a frequency-locked loop is not adopted in the prototype receiver, the initial frequency of the DCO is manually set to 14 GHz with the external test option.

### C. 3.17-BIT FLASH ADC

A 3.17-bit flash ADC with eight comparators is adopted to quantize the phase error. Fig. 14 shows the single-ended architecture of the implemented ADC for simplicity. The dynamic range of flash ADC is determined by the range of reference voltage generated from a resistive digital-to-analog converter (RDAC). The voltage resolution of the ADC can be expressed as

$$\Delta v = \frac{V_{range}}{2^B} = \frac{\alpha \cdot V_{DD}}{2^B} \tag{4}$$

where  $\alpha$  is the ratio of the ADC dynamic range to the supply voltage (V<sub>DD</sub>) which is 1 V in the given process. To further improve the equivalent resolution of the SSPD with the fixed ADC resolution, which is 3.17-bit, using a given resource efficiently, the reference voltages are selected in the vicinity of half of the supply voltage, which is the lock point of CDR. Therefore, in this work,  $\alpha$  is set to 0.36 and the equivalent ADC resolution is 4.64-bit near the CDR lock point. The eight voltage levels (0.36, 0.4, 0.44, 0.48, 0.52, 0.56, 0.6, 0.64 V) are chosen as reference voltages for the ADC. The sampled voltages outside the RDAC range are not distinguished by the SSPD, and the identical largest PD<sub>OUT</sub> is fed to the



FIGURE 16. Chip micrograph.



FIGURE 17. Measurement setup.

digital logic circuit. From (4), the equivalent time resolution is derived as

$$\Delta t = \frac{\Delta v}{s_{CLK}} \tag{5}$$

where  $s_{CLK}$  is the clock rising slope. Because the clock is RC wave in the SSPD, its rising slope can be expressed as

$$s_{CLK} = \frac{d}{dt} \left\{ V_{DD} \left( 1 - e^{-\frac{t}{RC}} \right) \right\} = \frac{V_{DD}}{RC} e^{-\frac{t}{RC}} \tag{6}$$

From (4), (5), and (6), the equivalent resolution of the SSPD in time can be expressed as

$$\Delta t = \frac{\alpha \cdot RC}{2^B \cdot e^{\frac{-t}{RC}}} \tag{7}$$

The RC time constant can be calculated using the rise time of the clock in the SSPD. From (7), the calculated time resolution of the SSPD is 0.83 ps at the CDR lock point.
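The figures quoted in this subsection can be reproduced from (4) and (7) with a short calculation. The sketch below uses the values stated in the text; the only added assumptions are that the 22.9-ps rise time is a 10%-90% rise time (so that $RC = t_r / \ln 9$) and that the lock point sits at $V_{DD}/2$, where $e^{-t/RC} = 0.5$. The function `flash_adc_code` is a hypothetical behavioral model of the eight-comparator flash ADC.

```python
import math

V_DD = 1.0                 # supply voltage in volts (from the text)
ALPHA = 0.36               # ADC dynamic range / V_DD (from the text)
LEVELS = 8 + 1             # eight comparators give nine output codes
REFS = [0.36, 0.40, 0.44, 0.48, 0.52, 0.56, 0.60, 0.64]  # reference voltages in volts

adc_bits = math.log2(LEVELS)              # nominal resolution B
delta_v = ALPHA * V_DD / LEVELS           # Eq. (4), volts per code
equiv_bits = math.log2(LEVELS / ALPHA)    # resolution referred to the full supply range

T_RISE_PS = 22.9                          # mean clock rise time in the SSPD
rc_ps = T_RISE_PS / math.log(9)           # ASSUMPTION: 10%-90% rise time of an RC edge
decay_at_lock = 0.5                       # e^{-t/RC} where the edge crosses V_DD / 2
delta_t_ps = ALPHA * rc_ps / (LEVELS * decay_at_lock)  # Eq. (7)

def flash_adc_code(v_sample):
    """Thermometer-style count of references below the sampled voltage (0..8)."""
    return sum(v_sample > ref for ref in REFS)
```

Under these assumptions the script gives $B \approx 3.17$ bits, $\Delta v = 40$ mV, an equivalent resolution of about 4.64 bits, and $\Delta t \approx 0.83$ ps, in agreement with the values reported above. A sample at 0.5 V maps to the middle code 4, and voltages beyond the reference range saturate at 0 or 8.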

### D. SELF-ERROR CORRECTOR (SEC)

The SEC is simply composed of the error predictor and error corrector that have been described in Section III. The implementation details of the proposed error predictor and its truth table are shown in Fig. 15(a). The error predictor generates 1 UI pulse only when both inputs (A, B) are logic high (1, 1) during the sampling state, which means the existence of data



FIGURE 18. Measured channel losses.



FIGURE 19. Measured BER for each CTLE code.

decision error or weak decision at the comparator. The error predictor holds its output while the clock is low to make a 1 UI pulse. Both inputs are always high (1, 1) when the clock is low because the comparator is in the pre-charge state. The implementation details of the error corrector are shown in Fig. 15(b). The error corrector makes a transition when the pulse is fed from the error predictor. It can be simply implemented by toggling the data from the previous data using MUXs. To realize this operation, two flip-flops and two MUXs are employed in this work. When EP<sub>OUT</sub> is low, the sampled data are directly fed to the output. Conversely, the differential outputs are interchanged when EP<sub>OUT</sub> is high.
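The behavior of the error predictor and error corrector can be illustrated with a bit-level model. This is a simplified sketch, not the circuit of Fig. 15: it evaluates one comparator decision per UI, ignores the clock phases, and assumes that a firm decision is encoded as (1, 0) for a logic 1 and (0, 1) for a logic 0. All function names are hypothetical.

```python
def comparator(bits, weak_indices):
    """Emit (A, B) per bit; a weak decision leaves both outputs high."""
    return [
        (1, 1) if i in weak_indices else ((1, 0) if bit else (0, 1))
        for i, bit in enumerate(bits)
    ]

def sr_latch(a, b, previous):
    """Firm decisions set or reset the latch; (1, 1) holds the previous bit."""
    if (a, b) == (1, 0):
        return 1
    if (a, b) == (0, 1):
        return 0
    return previous

def receive(comp_outputs, use_sec=True, initial=0):
    """Return the recovered bits, optionally corrected by the SEC."""
    latch = initial
    recovered = []
    for a, b in comp_outputs:
        ep_out = 1 if (a, b) == (1, 1) else 0   # error predictor pulse
        latch = sr_latch(a, b, latch)           # sampled data (DSAMP)
        # Error corrector: pass DSAMP when EP_OUT is low, invert it when high.
        recovered.append(latch ^ ep_out if use_sec else latch)
    return recovered

tx = [0, 1, 1, 0, 0, 1]
comp = comparator(tx, weak_indices={1, 3})  # weak decisions at the CID edges
```

In this model, `receive(comp, use_sec=False)` misses the transitions at indices 1 and 3, whereas `receive(comp)` restores the transmitted pattern. The same model also reproduces the failure case discussed in Section III: a weak decision where no transition exists, for example `comparator([1, 1], {1})`, is flipped by the SEC into an error.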

## **V. MEASUREMENT RESULTS**

The prototype receiver is fabricated in a 28-nm CMOS technology, and the chip micrograph is shown in Fig. 16. The prototype receiver occupies 0.007 mm<sup>2</sup>. The measurement setup is shown in Fig. 17. For the input data, 14-Gb/s differential PRBS7 and PRBS31 patterns modulated with sinusoidal jitter are generated by a programmable pattern generator (Tektronix PPG 3202). The generated data patterns are applied to the receiver through differential channels. The eye diagrams of the recovered data (14 Gb/s) and of the half-rate recovered clock (7 GHz), which is divided from the full-rate recovered clock, are measured with a real-time oscilloscope (Tektronix DPO77002SX). The BERs of the recovered data are measured with a programmable error detector (Tektronix PED 3202). The test options are controlled through an inter-integrated circuit (I2C) interface. To measure the performance of the receiver, two types of channels are used; their losses at the Nyquist frequency (7 GHz) are 14.8 dB (31-inch) and 18.8 dB (40-inch), respectively, excluding 2.8 dB of setup loss. The measured channel losses are shown in Fig. 18.



FIGURE 20. Measured (a) BER, and (b) high-frequency jitter tolerances for each DFE code.

To verify the proposed adaptive equalization and the equalizer performance, the BER is measured for each equalizer code with a PRBS7 input. Fig. 19 shows the BER curve for each CTLE code and the adapted points for the two channels. The CTLE is adapted to the maximum code (15) in both cases. Fig. 20(a) shows the BER curve for each DFE code. However, the measured BERs for the 31-inch channel are under  $10^{-12}$  for all DFE codes when the CTLE code is 14 or 15, and the BERs for the 40-inch channel are under  $10^{-12}$  when the DFE code is larger than 4. Consequently, the BER alone cannot be used to determine whether the DFE is adapted to the optimum code. To verify the DFE adaptation, the high-frequency jitter tolerances at 100 MHz are measured with the CTLE code set to 15. Fig. 20(b) shows the measured high-frequency jitter tolerances for each DFE code and the adapted points. For each channel, the DFE is adapted to the optimal code, which gives the largest jitter tolerance.

The jitter tolerance of the prototype receiver is measured to verify its functionality and the high-frequency jitter tolerance improvement provided by the proposed SEC. The jitter tolerance with a BER criterion of  $10^{-12}$  is measured after the equalizer adaptation is completed. The measured jitter tolerances at 14 Gb/s are shown in Fig. 21.

 TABLE 1. Performance summary and comparison with state-of-the-art receivers.

|                               | JSSC 2016<br>[6]                | TCAS-I 2017<br>[10] | TCAS-II 2018<br>[7]                      | ISSCC 2019<br>[13]     | JSSC 2020<br>[11]      | This Work              |
|-------------------------------|---------------------------------|---------------------|------------------------------------------|------------------------|------------------------|------------------------|
| Data Rate [Gb/s]              | 16                              | 1.62-5.4            | 1.62-10                                  | 36                     | 1.62-10.8              | 14                     |
| Block                         | EQ + CDR*                       | EQ + CDR            | EQ + CDR                                 | EQ + CDR               | EQ + CDR               | EQ + CDR               |
| Equalizer                     | 1-tap IIR DFE<br>1-tap data DFE | CTLE                | CTLE<br>2-tap data DFE<br>1-tap edge DFE | CTLE<br>1-tap data DFE | CTLE<br>2-tap data DFE | CTLE<br>1-tap data DFE |
| Channel Loss [dB]             | 28                              | 20                  | 23                                       | 18.25                  | 34                     | 21.2                   |
| Supply [V]                    | 1                               | 1.2                 | 1                                        | 0.9                    | -                      | 1                      |
| Power [mW]                    | 141.1                           | 36.8                | 24.4                                     | 106.3                  | 37.2                   | 23.7***                |
| Energy efficiency<br>[pJ/bit] | 8.8                             | 6.81                | 2.44                                     | 3.04                   | 3.4                    | <u>1.69</u>            |
| Technology                    | 28nm FDSOI                      | 65nm                | 65nm                                     | 28nm                   | 65nm                   | 28nm                   |
| Area [mm<sup>2</sup>]         | 0.008**                         | 0.27                | 0.25                                     | -                      | 0.17                   | <u>0.007</u>           |

\*Clock recovery using an external clock

\*\*Area of DFE core only





FIGURE 21. Measured jitter tolerance curves.

The measured BERs exceed  $10^{-12}$  when the adaptive equalizer is turned off and the equalizer is set to the minimum compensation. For the PRBS7 input, the measured high-frequency jitter tolerances at 100 MHz for the 31-inch and 40-inch channels are 0.31 UI<sub>PP</sub> and 0.17 UI<sub>PP</sub>, respectively, without the proposed SEC. With the SEC turned on, the jitter tolerances improve to 0.38 UI<sub>PP</sub> and 0.20 UI<sub>PP</sub>, that is, by 22.6% and 17.6%, respectively. For the PRBS31 input, the high-frequency jitter tolerance at 100 MHz for the 31-inch channel is 0.14 UI<sub>PP</sub> without the SEC and improves to 0.17 UI<sub>PP</sub> with the SEC.
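The quoted improvement percentages follow directly from the measured tolerances. The short check below recomputes them; the function and dictionary names are illustrative, and the PRBS31 percentage is derived here from the reported 0.14 and 0.17 UI<sub>PP</sub> values rather than quoted from the text.

```python
def improvement_percent(without_sec: float, with_sec: float) -> float:
    """Relative jitter-tolerance improvement in percent."""
    return (with_sec - without_sec) / without_sec * 100.0


# (without SEC, with SEC) in UI_PP at 100 MHz, as reported in the text
measured = {
    "PRBS7, 31-inch": (0.31, 0.38),
    "PRBS7, 40-inch": (0.17, 0.20),
    "PRBS31, 31-inch": (0.14, 0.17),
}

for name, (off, on) in measured.items():
    print(f"{name}: {improvement_percent(off, on):.1f}%")
# PRBS7, 31-inch: 22.6%
# PRBS7, 40-inch: 17.6%
# PRBS31, 31-inch: 21.4%
```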

**FIGURE 22.** (a) Measured eye diagram of half-rate recovered clock and (b) power breakdown at 14 Gb/s.

The eye diagram of the half-rate recovered clock for the 40-inch channel is shown in Fig. 22(a). The measured RMS and peak-to-peak jitter of the half-rate recovered clock are 1.82 ps and 12.2 ps, respectively. Fig. 22(b) shows the power breakdown of the prototype receiver at 14 Gb/s. The total power consumption is 23.7 mW, including all clock paths. The CTLE with pre-amplifier, DFE with clock path, DCO, SSPD and digital logic, and SEC consume 7.58, 4.50, 9.48, 0.95, and 1.19 mW, respectively. The proposed SEC improves the high-frequency jitter tolerance of the receiver at a low power cost. Table 1 summarizes the performance of the prototype receiver and compares it with other state-of-the-art receivers with an adaptive equalizer and CDR. This work achieves the smallest area and the best energy efficiency, 1.69 pJ/b, among the compared receivers owing to the combination of the adaptive equalizer and the CDR.
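The power breakdown and the headline energy efficiency can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures reported above; the variable names are illustrative.

```python
# Per-block power in mW at 14 Gb/s, as reported in the power breakdown
power_mw = {
    "CTLE with pre-amplifier": 7.58,
    "DFE with clock path": 4.50,
    "DCO": 9.48,
    "SSPD and digital logic": 0.95,
    "SEC": 1.19,
}

data_rate_gbps = 14.0

total_mw = sum(power_mw.values())
# mW / (Gb/s) = 1e-3 J/s / 1e9 b/s = 1e-12 J/b, i.e. pJ/b
energy_pj_per_bit = total_mw / data_rate_gbps
sec_share_percent = power_mw["SEC"] / total_mw * 100.0

print(f"total power: {total_mw:.2f} mW")                 # total power: 23.70 mW
print(f"energy efficiency: {energy_pj_per_bit:.2f} pJ/b")  # energy efficiency: 1.69 pJ/b
print(f"SEC share: {sec_share_percent:.1f}%")            # SEC share: 5.0%
```

The block powers sum to the reported 23.7 mW, and the SEC accounts for about 5% of that total.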

## **VI. CONCLUSION**

In this article, a digital sub-sampling CDR with a combined adaptive equalizer and SEC is presented. To improve the energy efficiency and reduce the chip area, the digitized phase error between the incoming frequency-divided data and the full-rate output clock of the DCO is used for both the equalizer adaptation and the clock recovery loop. Accordingly, the adaptive equalizer is combined with the sub-sampling CDR by sharing its adaptation loop, including the SSPD and the digital logic block, which yields a cost-effective receiver design. The proposed architecture has the smallest area and the lowest power consumption among the compared works.

Furthermore, the SEC is proposed to improve the high-frequency jitter tolerance of the CDR. The SEC enhances the out-of-band jitter tolerance by monitoring the output of the data decision comparator and correcting the detected bit errors. The proposed scheme does not require any data encoding or complicated circuits. With the SEC, the high-frequency jitter tolerance of the prototype receiver at 100 MHz with a BER criterion of  $10^{-12}$  is improved by 22.6% for the 17.2-dB loss channel compared to that of the CDR without the SEC. The proposed SEC is applicable to various receivers with simple implementation, compact design, and low cost.

#### REFERENCES

- [1] J. Lee, "A 20-Gb/s adaptive equalizer in 0.13-µm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 41, no. 9, pp. 2058–2066, Sep. 2006.
- [2] H.-Y. Joo and L.-S. Kim, "A data-pattern-tolerant adaptive equalizer using the spectrum balancing method," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 57, no. 3, pp. 228–232, Mar. 2010.
- [3] K.-H. Cheng, Y.-C. Tsai, Y.-H. Wu, and Y.-F. Lin, "A 5-Gb/s inductorless CMOS adaptive equalizer for PCI express generation II applications," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 57, no. 5, pp. 324–328, May 2010.
- [4] B. Nakhkoob and M. M. Hella, "A 4.7-Gb/s reconfigurable CMOS imaging optical receiver utilizing adaptive spectrum balancing equalizer," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 64, no. 1, pp. 182–194, Jan. 2017.
- [5] D. Lee, J. Han, G. Han, and S. M. Park, "An 8.5-Gb/s fully integrated CMOS optoelectronic receiver using slope-detection adaptive equalizer," *IEEE J. Solid-State Circuits*, vol. 45, no. 12, pp. 2861–2873, Dec. 2010.
- [6] S. Shahramian, B. Dehlaghi, and A. C. Carusone, "Edge-based adaptation for a 1 IIR + 1 discrete-time tap DFE converging in 5 μs," *IEEE J. Solid-State Circuits*, vol. 51, no. 12, pp. 3192–3203, Dec. 2016.
- [7] J. Lee, K. Park, K. Lee, and D.-K. Jeong, "A 2.44-pJ/b 1.62–10-Gb/s receiver for next generation video interface equalizing 23-dB loss with adaptive 2-tap data DFE and 1-tap edge DFE," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 65, no. 10, pp. 1295–1299, Oct. 2018.
- [8] K. Lee, H. Kim, W. Jung, J. Lee, H. Ju, K. Park, O. Kim, and D.-K. Jeong, "An adaptive offset cancellation scheme and shared-summer adaptive DFE for 0.068 pJ/b/dB 1.62-to-10 Gb/s low-power receiver in 40 nm CMOS," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 68, no. 2, pp. 622–626, Feb. 2021, doi: 10.1109/TCSII.2020.3014925.
- [9] Y.-H. Kim, Y.-J. Kim, T. Lee, and L.-S. Kim, "A 21-Gbit/s 1.63-pJ/bit adaptive CTLE and one-tap DFE with single loop spectrum balancing method," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 24, no. 2, pp. 789–793, Feb. 2016.
- [10] S. Hwang, J. Song, Y. Lee, and C. Kim, "A 1.62–5.4-Gb/s receiver for displayport version 1.2a with adaptive equalization and referenceless frequency acquisition techniques," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 64, no. 10, pp. 2691–2702, Oct. 2017.
- [11] J. Lee, K. Lee, H. Kim, B. Kim, K. Park, and D.-K. Jeong, "A 0.1-pJ/b/dB 1.62-to-10.8-Gb/s video interface receiver with jointly adaptive CTLE and DFE using biased data-level reference," *IEEE J. Solid-State Circuits*, vol. 55, no. 8, pp. 2186–2195, Aug. 2020.
- [12] S. Son, S. Ryu, H. Yeo, and J. Kim, "A 2× blind oversampling FSE receiver with combined adaptive equalization and infinite-range timing recovery," *IEEE J. Solid-State Circuits*, vol. 54, no. 10, pp. 2823–2832, Oct. 2019.
- [13] D. Yoo, M. Bagherbeik, W. Rahman, A. Sheikholeslami, H. Tamura, and T. Shibasaki, "A 36 Gb/s adaptive baud-rate CDR with CTLE and 1-tap DFE in 28 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 126–128.
- [14] C.-F. Liang, S.-C. Hwu, and S.-I. Liu, "A jitter-tolerance-enhanced CDR using a GDCO-based phase detector," *IEEE J. Solid-State Circuits*, vol. 43, no. 5, pp. 1217–1226, May 2008.
- [15] M. Hossain and A. C. Carusone, "7.4 Gb/s 6.8 mW source synchronous receiver in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 46, no. 6, pp. 1337–1348, Jun. 2011.
- [16] S. Hwang, J. Song, S.-G. Bae, Y. Lee, and C. Kim, "An add-on type realtime jitter tolerance enhancer for digital communication receivers," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 24, no. 3, pp. 1092–1103, Mar. 2016.

- [17] T. Siriburanon, S. Kondo, K. Kimura, T. Ueno, S. Kawashima, T. Kaneko, W. Deng, M. Miyahara, K. Okada, and A. Matsuzawa, "A 2.2 GHz –242 dB-FOM 4.2 mW ADC-PLL using digital sub-sampling architecture," *IEEE J. Solid-State Circuits*, vol. 51, no. 6, pp. 1385–1397, Jun. 2016.
- [18] X. Gao, E. A. M. Klumperink, M. Bohsali, and B. Nauta, "A low noise sub-sampling PLL in which divider noise is eliminated and PD/CP noise is not multiplied by N<sup>2</sup>," *IEEE J. Solid-State Circuits*, vol. 46, no. 11, pp. 2635–2649, Nov. 2009.
- [19] F. Gerfers, G. W. D. Besten, P. V. Petkov, J. E. Conder, and A. J. Koellmann, "A 0.2–2 Gb/s 6× OSR receiver using a digitally self-adaptive equalizer," *IEEE J. Solid-State Circuits*, vol. 43, no. 6, pp. 1436–1448, Jun. 2008.
- [20] K. L. J. Wong, E. H. Chen, and C. K. K. Yang, "Edge and data adaptive equalization of serial-link transceivers," *IEEE J. Solid-State Circuits*, vol. 43, no. 9, pp. 2157–2169, Sep. 2008.
- [21] H. O. Johansson and C. Svensson, "Time resolution of NMOS sampling switches used on low-swing signals," *IEEE J. Solid-State Circuits*, vol. 33, no. 2, pp. 237–245, Feb. 1998.
- [22] Z.-Z. Chen, Y.-H. Wang, J. Shin, Y. Zhao, S. A. Mirhaj, Y.-C. Kuan, H.-N. Chen, C.-P. Jou, M.-H. Tsai, F.-L. Hsueh, and M.-C. F. Chang, "A sub-sampling all-digital fractional-N frequency synthesizer with -111 dBc/Hz in-band phase noise and an FOM of -242 dB," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 268–269.



**YOONJAE CHOI** (Graduate Student Member, IEEE) was born in Seoul, South Korea, in 1992. He received the B.S. degree in electrical engineering from Korea University, Seoul, in 2016, where he is currently pursuing the integrated M.S. and Ph.D. degrees.

His current research interests include high-speed wireline transceivers and memory interfaces.

Mr. Choi was a recipient of the Ministry of Trade, Industry and Energy Award at the Korea Semiconductor Design Contest, in 2017, and was selected as a Scholarship Student by the Korea Semiconductor Industry Association, in 2016.



**SEWOOK HWANG** (Member, IEEE) received the B.S. and Ph.D. degrees in electrical engineering from Korea University, Seoul, South Korea, in 2009 and 2015, respectively.

He is currently with the Analog Circuit Design Team, Butterfly Network, Inc., Guilford, CT, USA. His research interests include wireline transceivers and analog sensor front-end designs.

Dr. Hwang was a recipient of the IEEE Solid-State Circuits Society Pre-Doctoral Achievement Award, in 2014 and 2015, the IEEE Circuits and Systems Society Pre-Doctoral Scholarship, in 2013, the Ministry of Trade, Industry and Energy Award at Korea Semiconductor Design Contest, in 2013, and the Commissioner of the Korean Intellectual Property Office Award at the Korea Semiconductor Design Contest, in 2012.



**YEONHO LEE** (Member, IEEE) was born in Seoul, South Korea, in 1988. He received the B.S. degree in electrical engineering and the Ph.D. degree in semiconductor system engineering from Korea University, Seoul, in 2012 and 2019, respectively.

He is currently with SK Hynix Inc., Icheon, South Korea. His research interests include energy-efficient wireline systems, high-speed I/O circuit design, and clock and data recovery circuits.

Dr. Lee was a recipient of the Ministry of Trade, Industry and Energy Award at the 2017 Korea Semiconductor Design Contest.



**HYUNSU PARK** (Graduate Student Member, IEEE) received the B.S. degree in electronics engineering and the M.S. degree in semi-conductor system engineering from Korea University, Seoul, South Korea, in 2016 and 2018, respectively, where he is currently pursuing the integrated Ph.D. degree in integrated circuits and systems.

His research interests include memory interfaces, high-speed transceivers, and clock generation circuit design.



**JONGHYUCK CHOI** (Graduate Student Member, IEEE) received the B.S. degree in electrical engineering from Korea University, Seoul, South Korea, in 2017, where he is currently pursuing the integrated M.S. and Ph.D. degrees.

His research interests include memory interfaces, high-speed transceivers, and energy-efficient wireline systems.

Mr. Choi received the IEEE Seoul Section Student Paper Contest Bronze Award, in 2019. He was a recipient of the Prime Minister Award at Korea Semiconductor Design Contest, in 2019, and the Ministry of Trade, Industry and Energy Award at Korea Semiconductor Design Contest, in 2017. Furthermore, he was granted a student scholarship by Korea Semiconductor Industry Association, in 2018.



**JINCHEOL SIM** (Graduate Student Member, IEEE) was born in Seoul, South Korea, in 1992. He received the B.S. degree in electrical engineering from Korea University, Seoul, in 2017, where he is currently pursuing the integrated M.S. and Ph.D. degrees.

His current research interest includes high-speed wireline transceivers.



**CHULWOO KIM** (Senior Member, IEEE) received the B.S. and M.S. degrees in electronics engineering from Korea University, in 1994 and 1996, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana–Champaign, in 2001.

In 1999, he worked as a summer intern at the Design Technology, Intel Corporation, Santa Clara, CA, USA. In May 2001, he joined IBM Microelectronics Division, Austin, TX, USA, where he was involved in cell processor design. Since September 2002, he has been with the School of Electrical Engineering, Korea University, where he is currently a Professor. He was a Visiting Professor with the University of California at Los Angeles, Los Angeles, in 2008, and the University of California at Santa Cruz, Santa Cruz, in 2012. He is the coauthor of two books, namely, *CMOS Digital Integrated Circuits: Analysis and Design* (McGraw Hill, 4th edition 2014) and *High-Bandwidth Memory Interface* (Springer, 2013). His current research interests include the areas of wireline transceiver, memory, power management, and data converters.

Dr. Kim received the Samsung Humantech Thesis Contest Bronze Award, in 1996, the ISLPED Low-Power Design Contest Award, in 2001 and 2014, the DAC Student Design Contest Award, in 2002, the SRC Inventor Recognition Awards, in 2002, the Young Scientist Award from the Ministry of Science and Technology of Korea, in 2003, the Seoktop Award for Excellence in Teaching, in 2006 and 2011, the ASP-DAC Best Design Award, in 2008, the Special Feature Award, in 2014, and Korea Semiconductor Design Contest: Prime Minister Award, in 2016. He served on the Technical Program Committee for the IEEE International Solid-State Circuits Conference and as a Guest Editor for IEEE JOURNAL OF SOLID-STATE CIRCUITS. He is currently on the Editorial Board of IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS and the Chair of the SSCS Seoul Chapter. He has been elected as a Distinguished Lecturer of the IEEE Solid-State Circuits Society for 2015–2016.