

Received 7 August 2023; revised 27 September 2023 and 4 November 2023; accepted 18 November 2023. Date of publication 27 November 2023; date of current version 8 December 2023.

Digital Object Identifier 10.1109/OJCAS.2023.3335400

# Hybrid Timing Error Detector for Baud Rate Multilevel Clock and Data Recovery

AHMED ABDELAZIZ (Graduate Student Member, IEEE), MOHAMED AHMED (Graduate Student Member, IEEE), AND TAWFIQ MUSAH (Senior Member, IEEE)

Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210, USA

This article was recommended by Associate Editor P.-H. Hsieh.

CORRESPONDING AUTHOR: T. MUSAH (e-mail: musah.3@osu.edu)

This work was supported by the Intel Circuits SRS.

**ABSTRACT** This paper proposes a hybrid phase detector for use in multilevel timing recovery systems. The proposed approach suppresses the errant zero-crossings associated with multilevel baud rate phase detectors and ensures maximum signal swing in lock, with minimal hardware and power overhead. Analysis and simulation results in a 28 nm CMOS process are used to explore the functionality of the proposed phase detector and to demonstrate that it outperforms the conventional approach. Clock and data recovery (CDR) loop simulations show that the proposed phase detector enables a $1.36\times$ increase in vertical eye margin while maintaining steady-state RMS jitter similar to that of the conventional approach. The simulations also show effective suppression of unwanted phase detector zero-crossings, while achieving acquisition bandwidth comparable to the conventional approach.

**INDEX TERMS** Hybrid phase detection, slope-sensitive latch, time-domain interpolation, multilevel signaling, StrongArm latch, clock and data recovery, decision feedback equalizer.

### I. INTRODUCTION

**B**AUD rate clock and data recovery (CDR) circuits are currently preferred over oversampling ones in ultra-high-speed wireline receivers because of the simpler clocking they enable, especially when a high degree of time interleaving is employed. However, high-speed realizations of the two most popular timing error detectors, also referred to as phase detectors (PDs), suffer from complexity and locking issues. For the minimum mean square error (MMSE) based phase detector, analog filters are required to extract the error signal and slope using the same clock signal [1]. An alternative to the filter approach is to use a pair of closely spaced clock phases around the baud rate clock to extract the slope [2]. This results in local clock generation complexity similar to that of oversampled PDs.

The other baud rate PD is the Mueller-Muller PD (MMPD) [3], [4], [5], which relies on just baud rate data and error samples to extract the timing error. Practical realizations of the two-level MMPD type-A [4], [5], especially when decision feedback equalization (DFE) is used, result in suboptimal lock conditions due to nonzero precursor intersymbol interference (ISI). This effect leads to a loss of signal margin at the receiver, which is especially detrimental in multilevel pulse amplitude modulation (PAM) receivers, where the signal margin is already reduced. Moreover, the multilevel MMPD suffers from two more drawbacks. First, as shown in Fig. 1, it requires extra comparators for error extraction, and their number increases with the number of data levels. Second, it has the potential to lock at undesirable locations far from the middle of the data eye [6], [7]. In [6], coarse implementations of the MMPD were shown to suppress these undesirable lock points at the cost of PD gain or CDR steady-state jitter. Several design techniques have been developed to help the CDR lock to an optimal sampling point, some while slightly reducing the number of comparators [8], [9], [10], [11]. However, these techniques increase digital hardware and computational complexity.

A mixed-signal approach to phase detection that addresses the suboptimal lock and errant PD zero-crossings with minimal overhead is critically needed. The time-domain error extraction technique proposed in [12] provides a foundation upon which such a technique could be built. In that work, time-domain interpolation of the data comparator delay was used to generate the error signal using only two extra comparators, regardless of the number of signaling levels. This approach reduces the number of comparators loading the frontend to just the data comparators plus two additional ones for error extraction. It was also observed that when dynamic input data is fed to the comparators without sampling, the PD response is level shifted in the data transition region. This level shifting alleviates the errant zero-crossing problem of multilevel CDRs reported in [6], [7]. However, a receiver realized without dedicated input sampling could suffer performance degradation and signal integrity issues due to jitter and comparator aperture time variations.

FIGURE 1. Wireline transceiver with baud rate clock and data recovery using multilevel MMPD.

In this paper, the time-based error extraction approach of [12] is modified to suppress undesirable locking points without the data sampling uncertainties. This is achieved by applying the continuous-time input signal only to the error-interpolating comparators. The data comparators are retained as usual with dedicated track-and-hold (T/H) circuits that ensure high receiver fidelity. Exposing the error extraction scheme to the input variations suppresses errant zero-crossings, similar to [6]. It also enables multimode operation, in which the MMPD responds to the input signal slope, similar to the MMSE PD. The paper is organized as follows. The proposed error signal interpolation approach and its dependence on the slope of the comparator input signal are discussed in Section II. The proposed approach is used to realize a four-level pulse amplitude modulation (PAM4) PD in Section III. In Section IV, the performance of the proposed PD in a CDR loop is simulated and discussed. Conclusions are drawn in Section V.

### **II. TIME DOMAIN INTERPOLATION USING CONTINUOUS INPUT SIGNAL**

VOLUME 4, 2023

FIGURE 2. Schematic of the StrongArm latch for use with ground referenced signaling. The additional input pair is used for multilevel PAM quantization thresholds.

The StrongArm latch [13] is one of the mainstay building blocks of high-speed comparators. A PMOS-input version of the StrongArm latch is shown in Fig. 2, where an auxiliary reference input is added to apply the PAM4 level offset. The circuit also includes an additional path for offset correction. In [12], the clock-to-Q delay of the StrongArm latch was shown to be inversely proportional to the absolute value of its input voltage for a sampled input signal. The total latch delay can be separated into contributions from the amplification and regeneration phases of the latch. As discussed in [14], assuming sampled inputs, the voltage at the integrating node during the amplification phase and the output voltage during the regeneration phase can be written as

$$V_{INTa}(t) = \frac{G_{m,i} V_{IN} t}{C_{OUT}} \tag{1}$$

$$V_{OUTr}(t) = V_{OUTa}(t_a) \exp\left(\frac{t - t_a}{\tau_r}\right) \tag{2}$$

where  $C_{OUT}$ ,  $G_{m,i}$  and  $\tau_r$  are the load capacitor, the input transconductance and the time constant of the regeneration, respectively. The time constant of the regeneration was derived in [14] as

$$\tau_r = \frac{C_{OUT}}{G_{m,r}} \tag{3}$$

where  $G_{m,r}$  is the transconductance of the regenerating devices.

The delay at the end of the regeneration phase can then be found by solving (1) and (2), respectively, for the time at which each node voltage crosses its target threshold:

$$t_{a} = \frac{C_{OUT} V_{tp}}{G_{m,i} V_{IN}} \tag{4}$$

$$\begin{aligned}
t_{d} &= t_{a} + \tau_{r} \ln\left(\frac{V_{th}}{V_{OUTa}(t_{a})}\right) \\
&= \frac{C_{OUT} V_{tp}}{G_{m,i} V_{IN}} + \tau_{r} \ln\left(\frac{V_{th}}{V_{OUTa}(t_{a})}\right)
\end{aligned} \tag{5}$$

where $V_{tp}$ is the threshold voltage of the PMOS devices in the back-to-back inverters and $V_{th}$ is the switching threshold of the logic stage loading the StrongArm latch.

When the T/H circuit that samples the input of the latch is removed, the voltage at the integrating node becomes sensitive to the changing input signal. Therefore, the expression for the integrating-node voltage during amplification has to be generalized to

$$V_{INTa}(t) = \frac{G_{m,i}}{C_{OUT}} \int_{0}^{t} V_{IN}(t')\, dt' \tag{6}$$

Assuming a linearly changing input signal (ramp), with an initial voltage  $V_{IN0}$  and slope *s*, the voltage at the integrating node can be expressed as

$$\begin{aligned}
V_{INTa}(t) &= \frac{G_{m,i}}{C_{OUT}} \int_{0}^{t} \left(V_{IN0} + s t'\right) dt' \\
&= \frac{G_{m,i}}{C_{OUT}} \left[ V_{IN0} t + \frac{s t^{2}}{2} \right] \\
&= \frac{G_{m,i} t}{C_{OUT}} \left[ V_{IN0} + \frac{s t}{2} \right]
\end{aligned} \tag{7}$$

Equation (7) has the same form as (1), with $V_{IN}$ replaced by $(V_{IN0} + \frac{st}{2})$. Consequently, compared with the case where the input is sampled and held at $V_{IN0}$, a positive slope is expected to decrease (increase) the delay if $V_{IN0}$ is positive (negative), because the ramp increases (decreases) the magnitude of the effective input. The opposite effect is expected when the input slope is negative. The delay of the regeneration phase is also somewhat affected by the continuous input, because the dynamic current at the onset of regeneration changes the initial voltage $V_{OUTa}(t_a)$. This effect is implicitly captured in (2) and (5) and does not require separate treatment. Intuitively, the delay in the regeneration phase follows the same trend as that of the amplification phase.
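As a rough numerical check of this reasoning, the sketch below solves (7) for the amplification-phase delay $t_a$, i.e., the time at which $|V_{INTa}|$ reaches $V_{tp}$, for a sampled input ($s = 0$, which recovers (4)) and for rising and falling ramps. It models only the amplification phase, and the values of $G_{m,i}$, $C_{OUT}$, $V_{tp}$, $V_{IN0}$, and $s$ are illustrative assumptions, not values from the 28 nm design.

```python
import math


def amp_phase_delay(v_in0, slope, gm_i, c_out, v_tp):
    """Amplification-phase delay t_a from (7) for the input V_IN0 + s*t.

    Returns the time at which |V_INTa| first reaches v_tp, or math.inf if the
    ramp drives the integrating node back before it gets there. Assumes a
    nonzero v_in0, whose sign sets the direction the node integrates in.
    """
    if v_in0 == 0:
        raise ValueError("v_in0 must be nonzero")
    k = c_out * v_tp / gm_i                     # required |integral of V_IN dt| (V*s)
    a = math.copysign(1.0, v_in0) * slope / 2   # t^2 coefficient of sign(V_IN0)*(7)
    b = abs(v_in0)                              # t coefficient
    disc = b * b + 4 * a * k
    if disc < 0:
        return math.inf
    # Smallest positive root of a*t^2 + b*t - k = 0, written so that a -> 0
    # smoothly recovers the sampled-input result t_a = k / |V_IN0| of (4).
    return 2 * k / (b + math.sqrt(disc))


# Illustrative (assumed) device values: G_m,i = 2 mS, C_OUT = 5 fF, V_tp = 0.3 V.
GM_I, C_OUT, V_TP = 2e-3, 5e-15, 0.3
for v0 in (0.05, -0.05):                 # +/-50 mV at the start of amplification
    for s in (1e9, 0.0, -1e9):           # +/-0.1 V/100 ps ramp, or sampled (s = 0)
        t_a = amp_phase_delay(v0, s, GM_I, C_OUT, V_TP)
        print(f"V_IN0 = {v0:+.2f} V, s = {s:+.0e} V/s -> t_a = {t_a * 1e12:.2f} ps")
```

With these assumed numbers, a rising ramp shortens $t_a$ for $V_{IN0} > 0$ and lengthens it for $V_{IN0} < 0$, while a falling ramp does the reverse, matching the trend argued above. The closed form $2k/(b + \sqrt{b^2 + 4ak})$ is used instead of the textbook quadratic formula so that the $s \to 0$ limit is numerically well behaved.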

A simulation of the StrongArm latch in a 28 nm CMOS process with a 10 GHz clock confirms this expectation for both positive and negative differential input voltages. Fig. 3 indicates that a ramp input modulates the latch delay in two ways. First, the changing input induces an input-referred offset in the latch in the same direction as the rate of change. For a rising ramp, the latch switching point moves to a lower input voltage, where the input at the start of latch amplification ($V_{IN0}$) is still negative. This is because the latch has a finite delay in both the amplification and regeneration phases, and the ramp continues to rise during this time period. The reverse happens with a falling ramp input. Second, because the delay modulation depends on both the direction of the input slope and the polarity of $V_{IN0}$, the delay becomes asymmetric around the new switching point. Both of these effects are instrumental in achieving the proposed errant locking point suppression and the optimal positioning of the desired locking point.

FIGURE 3. StrongArm latch delay dependence on input slope for non-sampled signals.

These observations are used to implement the time-based interpolation shown in Fig. 4. It has been demonstrated that a time signal proportional to the input voltage can be extracted with two latches [15]. However, unlike [15], only the sign of the interpolated signal is extracted here, which provides a quantization level between the two latch reference voltages. To ensure sensitivity to the dynamic input while providing timing certainty for the data, one latch input is sampled while the other is left dynamic. A pre-amplifier is placed before the latch with the dynamic input to isolate the input signal from the latch kickback noise. Note that this pre-amplifier is not intended to provide gain, because any gain difference between the two latch paths would distort the time-based interpolation. The realizations of the interpolating quantizer with the dynamic input at the higher and lower reference levels are shown in Fig. 4(a) and Fig. 4(b), respectively.

If the two latches were identical with different reference voltages $V_H$ and $V_L$, as shown in Fig. 4(a), their delays would be functions of $(V_{IN} - V_H)$ and $(V_{IN} - V_L)$, respectively. Consequently, when $V_{IN}$ is midway between $V_H$ and $V_L$, the inputs to both latches have the same magnitude and thus produce the same latch delay. A 1b time-to-digital converter (TDC) is used as an arbiter of the decision delays of the two latches. The latch that decides first has the larger differential input; accordingly, the latch with the high reference level is connected to the N input of the arbiter. An SR latch holds the decisions of all three comparators for post-processing. The schematics of the pre-amplifier and the arbiter [16] are included in Fig. 5. Gain tuning is added in the pre-amplifier to allow better matching of the sampled and continuous paths. The pre-amplifier and arbiter consume significantly less power and area than a comparator, which yields power savings in the interpolation.

FIGURE 4. Time domain interpolation topology using a combination of latches with sampled and continuous input showing (a) continuous input latch at higher reference voltage (b) continuous input latch at lower reference voltage.

A simulation of Fig. 4(a) using a dynamic input is shown in Fig. 6. The slowly changing ramp inputs and the reference levels used as stimuli for the two latches are shown in Fig. 6(a). Transient noise is enabled, and the interpolating quantizer is run for 400 clock cycles with a 10 GHz clock. The cumulative distribution function (CDF) and probability density function (PDF) of the time-interpolating comparator output are plotted in Fig. 6(b) and Fig. 6(c), respectively, for a negative-slope input ramp (A) and a positive-slope input ramp (B). For comparison, the CDF of a conventional voltage comparator with a 0 V threshold is also included. The input-slope-dependent modulation of the interpolation clearly creates a bimodal distribution at the transition voltage of the interpolating comparator. This effect reduces the average gain of the interpolating comparator around the zero-crossing. The separation between the two modulated PDFs depends on the magnitude of the input slope. As shown in Fig. 7, sweeping the input slope from -0.6 V/100 ps to +0.6 V/100 ps produces an input-referred offset of approximately $\pm 20$ mV that is fairly linear with slope. Thus, the effective input voltage of the interpolating comparator with a finite delay $\Delta t$ can be approximated as a linear function of the input slope:

$$V_{IN-eff} = V_{IN} + s\Delta t \tag{8}$$
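To connect (8) with the arbiter operation described above, the sketch below models the Fig. 4(a) pair with amplification-phase delays only: the low-reference latch sees a sampled input, the high-reference latch sees a ramp of slope $s$, and bisection finds the input at which both resolve together. With $s = 0$ the switching point is the midpoint of $V_H$ and $V_L$; a nonzero slope shifts it, and the shift divided by $s$ gives the $\Delta t$ of (8). The references, the constant $k = C_{OUT}V_{tp}/G_{m,i}$, and the slopes are assumed values, and regeneration, noise, and pre-amplifier effects are ignored, so the output illustrates the trend only and is not calibrated to Fig. 7.

```python
import math


def ramp_delay(v0, slope, k):
    """Amplification-phase delay for a latch input v0 + slope*t, from (7).

    k = C_OUT * V_tp / G_m,i. Returns math.inf if the ramp reverses the
    integrating node before it reaches the threshold.
    """
    a = math.copysign(1.0, v0) * slope / 2
    b = abs(v0)
    disc = b * b + 4 * a * k
    return math.inf if disc < 0 else 2 * k / (b + math.sqrt(disc))


def switching_point(v_h, v_l, slope, k, iters=100):
    """Input at which the sampled low-reference latch and the continuous
    high-reference latch (input ramping at `slope`) resolve together."""
    def race(v_in):
        # > 0 when the continuous (high-reference) latch is the slower one,
        # i.e., the arbiter reports the input as above the interpolated level
        return ramp_delay(v_in - v_h, slope, k) - ramp_delay(v_in - v_l, 0.0, k)

    lo, hi = v_l + 1e-6, v_h - 1e-6
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if race(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


K = 7.5e-13             # assumed C_OUT * V_tp / G_m,i (V*s)
V_H, V_L = 0.1, -0.1    # assumed references; interpolated level is 0 V
for s in (-2e9, -2e8, 2e8, 2e9):        # slopes in V/s (2e9 V/s = 0.2 V/100 ps)
    v_sw = switching_point(V_H, V_L, s, K)
    dt = ((V_H + V_L) / 2 - v_sw) / s   # Delta t implied by (8)
    print(f"s = {s:+.0e} V/s: switching point {v_sw * 1e3:+.3f} mV, dt = {dt * 1e12:.2f} ps")
```

In this simplified model, a rising input lowers the switching point and a falling input raises it, so both slope polarities map to a positive $\Delta t$, consistent with (8). For small slopes, a first-order expansion of the model gives $\Delta t$ close to $k/[2(V_H - V_L)]$; larger slopes make the shift visibly nonlinear, which is one reason (8) is only an approximation.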

Note that even though the simulations of Fig. 6 and Fig. 7 were carried out using ramp signals, practical implementations with pseudorandom bit sequences (PRBS) have dynamic slopes that change from one clock sample to the next. In that situation, the slope $s$ must be treated as a stochastic signal that is a function of the data pattern, its associated ISI, and the position of the sampling clock within the receiver eye.

FIGURE 5. Schematics of the (a) pre-amplifier and (b) time arbiter used in the 3-level interpolating quantizer.

The phenomena described in this section provide the foundation upon which the proposed hybrid phase detector is built. Pairing dynamic-input and sampled-input latches exposes the error extraction to the varying input slope and thus improves locking behavior. However, this exposure does not degrade receiver signal integrity, because the data path remains sampled, which reduces its sensitivity to jitter.

### **III. PROPOSED HYBRID PHASE DETECTOR**

The block diagram of the analog frontend of the proposed phase detector, which uses the dynamic interpolation concept discussed in Section II, is shown in Fig. 8. A PAM4 realization of the proposed scheme is chosen for simplicity, but the demonstrated operation applies to higher-order PAM realizations too. Three comparators with thresholds at $[\pm \frac{2h_0}{3}, 0]$ decode the received data, using T/H circuits to sample their inputs. This maintains high fidelity on the received data. Two extra comparators are added, similar to [12], for time-based error extraction. However, unlike [12], where adjacent error and data comparators are used to interpolate the error level in between, two changes are made here. First, the two error comparators are designed to accept a continuous input. Second, all four error levels are interpolated using a combination of the two dynamic-input comparators and the data comparators. Moreover, the thresholds of the two dynamic-input comparators are far enough away from the received signal that they provide consistent outputs (always "0" for the comparator at $\frac{4h_0}{3}$ and "1" for the one at $-\frac{4h_0}{3}$). With these thresholds, each error level ($\pm h_0$, $\pm \frac{h_0}{3}$) lies midway between one dynamic-input threshold and one of the data thresholds at $\pm \frac{2h_0}{3}$. Thus, their usefulness comes from the delay-sensitive decisions in the interpolations rather than from their decisions themselves.

FIGURE 6. Characterization of Fig. 4(a) showing (a) stimuli, (b) CDF of the interpolated quantization level, and (c) PDF of the interpolated quantization level in the presence of noise.

FIGURE 7. Input-referred offset voltage of the dynamic interpolating comparator as a function of the input slope.

As shown in Fig. 8, each of the four error levels is quantized using one of the comparators with dynamic inputs ($C_1/C_5$) paired with $C_2$ or $C_4$. This means the slope-dependent effects highlighted in Section II are present in all the error levels. A timing diagram showing the input, delayed, and sampled signals, as well as the StrongArm outputs, arbiter output, and clocks, is shown in Fig. 9. The sampling clocks are timed to fall earlier than the latch clock for two purposes. The first is to allow the charge injection and clock feedthrough of the T/H circuits to settle before the StrongArm latch enters evaluation mode. The second is to ensure alignment with the pre-amplifier path. To better understand how the proposed PD improves CDR performance, it is helpful to start with the MMPD type-A timing function, given by

$$f(\tau) = \frac{1}{2}\left(h_{-1} - h_1\right) = \frac{1}{2}\left(h(\tau - T) - h(\tau + T)\right) \tag{9}$$

This can be realized with discrete baud rate samples of the receiver input, $x$, and the decoded data, $d$, using

$$f(\tau) = \frac{1}{2} E\left\{x[n-1]d[n] - x[n]d[n-1]\right\} \tag{10}$$

Using (8), the effective receiver input, $x'$, can be written as

$$x'[n] = x[n] + s[n]\Delta t \tag{11}$$

The proposed hybrid PD timing function can be written as

$$\begin{aligned}
f(\tau) &= \frac{1}{2} E\left\{x'[n-1]d[n] - x'[n]d[n-1]\right\} \\
&= \frac{1}{2} E\left\{x[n-1]d[n] - x[n]d[n-1]\right\} \\
&\quad + \frac{1}{2} E\left\{s[n-1]\Delta t\, d[n] - s[n]\Delta t\, d[n-1]\right\}
\end{aligned} \tag{12}$$



FIGURE 8. Proposed PD showing Analog frontend schematic for PAM4 receiver.

The resulting timing function of (12) captures the impact of the input slope sensitivity of the error-interpolating comparators. The first term describes the conventional MMPD timing function, which can be realized with sampled inputs with or without time-domain interpolation [12]. The second term produces a lock when the slope approaches zero. This implies that the PD cannot lock away from the middle of the receiver eye, effectively eliminating the unwanted zero-crossings reported in [6], [7]. Moreover, in the presence of asymmetric first precursor and postcursor ISI, the proposed PD moves its lock point closer to the maximum signal swing, where the input slope is minimum.
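A minimal sketch of how (10) and (12) can be estimated from baud-rate sample streams is given below. It assumes ideal decisions ($d$ equals the transmitted PAM4 symbol), a hypothetical sampled pulse response $h(mT + \tau)$ and its time derivative, and a constant $\Delta t$; in the actual receiver, the slope term is not computed explicitly but arises inside the interpolating comparators. Under these assumptions, the first term converges to $\frac{1}{2}E\{d^2\}(h_{-1} - h_1)$, i.e., (9) scaled by the symbol power, and the two terms of (12) add up to the MMPD estimate evaluated on $x'$ from (11).

```python
import random


def mmpd_type_a(x, d):
    """Sample-average estimate of (10): 0.5 * E{x[n-1]d[n] - x[n]d[n-1]}."""
    terms = [x[n - 1] * d[n] - x[n] * d[n - 1] for n in range(1, len(x))]
    return 0.5 * sum(terms) / len(terms)


def hybrid_pd(x, s, d, dt):
    """Both terms of (12): (conventional MMPD term, slope term)."""
    return mmpd_type_a(x, d), mmpd_type_a([dt * v for v in s], d)


def sample_stream(d, taps):
    """y[n] = sum_m taps[m] * d[n - m], taps mapping cursor index m -> weight."""
    return [sum(w * d[n - m] for m, w in taps.items() if 0 <= n - m < len(d))
            for n in range(len(d))]


rng = random.Random(7)
levels = (-1.0, -1 / 3, 1 / 3, 1.0)                 # normalized PAM4 symbols
d = [rng.choice(levels) for _ in range(100_000)]    # ideal decisions assumed
h = {-1: 0.1, 0: 1.0, 1: 0.3}       # hypothetical cursors h(mT + tau)
dh = {-1: 4e9, 0: -1e9, 1: -3e9}    # hypothetical slopes h'(mT + tau), V/s
dt = 3e-12                          # hypothetical effective delay, s

x = sample_stream(d, h)             # baud-rate samples x[n]
s = sample_stream(d, dh)            # input slope at each sample, s[n]
conv, slope_term = hybrid_pd(x, s, d, dt)
x_eff = [xn + sn * dt for xn, sn in zip(x, s)]      # effective input, (11)
print(conv, slope_term, mmpd_type_a(x_eff, d))
```

Because the estimator is linear in its first argument, evaluating it on $x'$ is identical to summing the two terms of (12), which is all the decomposition asserts; the lock behavior argued in the text depends on how $s[n]$ varies with the sampling phase in a real channel, which this toy model does not attempt to capture.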

To validate the behavior described by (12), a receiver using a proportional realization (Fig. 1, [6]) of the proposed phase detector was designed and simulated in a 28 nm CMOS process. This testbench targets the ability of the proposed PD to suppress errant zero-crossings. A well-behaved channel and a 3-tap transmitter feedforward equalizer (TX FFE) are assumed. The channel frequency response and the receiver single bit response (SBR) in the presence of the TX FFE are shown in Fig. 10. This ensures an open eye at the receiver input at 10 GBaud without the need for a receiver equalizer. For comparison, a conventional MMPD was built and simulated in the same process using dedicated voltage-mode error comparators. The simulated PD transfer functions using PRBS31 data patterns and PAM4 signaling are shown in Fig. 11. Unlike the conventional MMPD, the proposed hybrid PD exhibits no unwanted zero-crossings in the transition regions ([0 ps, 40 ps] and [100 ps, 140 ps]). Moreover, the gain at the target PD zero-crossing is lower for the proposed PD than for the conventional MMPD. This gain reduction results from the slope-dependent hysteresis effect described in Fig. 6. Given the comparable gain of the proposed PD and the conventional MMPD away from the target zero-crossing, the reduced gain at the zero-crossing is expected to improve jitter tolerance without appreciable impact on phase acquisition times [17].
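The gain reduction can be illustrated with a simple noise model, sketched below under stated assumptions: Gaussian input-referred noise, rising and falling input slopes equally likely at the zero-crossing, and each slope shifting the comparator threshold by $\mp s\Delta t$ according to (8). The effective delay $\Delta t \approx 20\,\text{mV}/(0.6\,\text{V}/100\,\text{ps}) \approx 3.3$ ps is inferred from the Fig. 7 sweep described in Section II; the noise level and slope magnitude are assumptions. Averaging the two shifted CDFs reproduces the bimodal behavior of Fig. 6 and lowers the slope of the averaged CDF, i.e., the comparator gain, at the zero-crossing.

```python
import math


def p_one(v_in, sigma, shift=0.0):
    """P(output = 1) of a comparator with Gaussian input-referred noise sigma,
    whose effective input is v_in + shift (shift = s*dt per (8))."""
    return 0.5 * (1.0 + math.erf((v_in + shift) / (sigma * math.sqrt(2.0))))


def p_one_mixed(v_in, sigma, slope, dt):
    """Equal mix of rising (+slope) and falling (-slope) inputs: bimodal CDF."""
    return 0.5 * (p_one(v_in, sigma, slope * dt) + p_one(v_in, sigma, -slope * dt))


def gain_at_zero(cdf, h=1e-6):
    """Central-difference slope of a CDF at 0 V, in 1/V."""
    return (cdf(h) - cdf(-h)) / (2 * h)


SIGMA = 5e-3           # assumed 5 mV rms input-referred noise
DT = 20e-3 / 6e9       # ~3.3 ps: 20 mV offset at 0.6 V/100 ps (Fig. 7 sweep)
SLOPE = 3e9            # assumed |s| = 0.3 V/100 ps at the zero-crossing

g_conv = gain_at_zero(lambda v: p_one(v, SIGMA))
g_mixed = gain_at_zero(lambda v: p_one_mixed(v, SIGMA, SLOPE, DT))
print(f"conventional: {g_conv:.1f} /V, slope-modulated: {g_mixed:.1f} /V")
```

In this model the gain ratio is $\exp\left(-\frac{(s\Delta t)^2}{2\sigma^2}\right)$, so the reduction grows with the input slope, in line with the observation that the PDF separation in Fig. 6 depends on the slope magnitude.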

The hysteresis effect is also expected to reduce the sensitivity of the proposed PD to process mismatch. Gain and offset calibration is now a mainstay of wireline links in advanced processes and can be used to ensure satisfactory matching between the sampled and continuous paths. Nevertheless, it is still instructive to discuss the impact of residual gain and offset errors. The T/H in the sampled path can be designed to achieve unity gain, after charge injection and clock feedthrough cancellation, by making its nominal bandwidth high enough that variations do not attenuate the signal. The continuous path, however, contains a pre-amplifier with residual gain and offset errors. Equating the magnitudes of the integrating-node voltages prior to regeneration for the sampled (1) and continuous (7) input comparators, assuming only gain and offset errors between the two paths, yields

$$\frac{t}{\tau}\left[V_{IN} - V_L + V_{OSL}\right] = -\frac{t}{\tau}\left[A_H V_{IN} - V_H + V_{OSH} + \frac{st}{2}\right] \tag{13}$$

where $\tau = \frac{C_{OUT}}{G_{m,i}}$, $A_H = 1 + \delta$, $\delta$ is the residual gain error of the pre-amplifier, and $V_{OSH}$ and $V_{OSL}$ are the residual offset voltages in the continuous and sampled paths, respectively. Solving (13) for $V_{IN}$ gives

$$V_{IN} = \frac{V_H + V_L}{2 + \delta} - \frac{V_{OSH} + V_{OSL}}{2 + \delta} - \frac{st}{2(2 + \delta)} \tag{14}$$
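The lock condition in (14) is easy to evaluate numerically. The sketch below computes $V_{IN}$ from (14) and checks it against the balance condition (13) with the common factor $t/\tau$ removed; the reference levels, gain error, offsets, slope, and $t$ are arbitrary illustrative assumptions, not design values.

```python
def lock_voltage(v_h, v_l, delta, v_osh, v_osl, s, t):
    """Input voltage at which the two integrating nodes balance, per (14)."""
    return (v_h + v_l - v_osh - v_osl - s * t / 2) / (2 + delta)


def balance_error(v_in, v_h, v_l, delta, v_osh, v_osl, s, t):
    """LHS minus RHS of (13) after dividing both sides by t/tau."""
    a_h = 1 + delta
    return (v_in - v_l + v_osl) + (a_h * v_in - v_h + v_osh + s * t / 2)


# Assumed example: references at 0.4 V and 0.1 V, 5% pre-amp gain error,
# 2 mV and 1 mV residual offsets, slope 0.1 V/100 ps, t = 10 ps.
args = dict(v_h=0.4, v_l=0.1, delta=0.05, v_osh=2e-3, v_osl=1e-3, s=1e9, t=10e-12)
v_lock = lock_voltage(**args)
ideal = lock_voltage(0.4, 0.1, 0.0, 0.0, 0.0, 0.0, 10e-12)   # midpoint of the references
print(f"ideal {ideal:.4f} V, with errors {v_lock:.4f} V, "
      f"(13) residual {balance_error(v_lock, **args):.1e}")
```

With positive $\delta$, offsets, and slope, the computed level falls below the ideal midpoint, which is the downward shift that the slope-dependent compensation described next acts against.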

Ideally, $\delta$, $V_{OSH}$, and $V_{OSL}$ are zero, while $s$ is zero only at the maximum signal. Positive $\delta$, $V_{OSH}$, and $V_{OSL}$ tend to move the lock point to a lower input voltage. At such a lower voltage, the slope $s$ is nonzero and compensates for the input voltage reduction until the lock point approaches the maximum signal (according to Fig. 7). The compensation follows the same logic for negative errors. The degree of compensation is data dependent; as such, the effective compensation is an average achieved by the CDR loop.

A 200-run Monte Carlo simulation of the proposed PD and the conventional MMPD is shown in Fig. 12. The clock phase is set to the zero-crossing of the conventional MMPD (64 ps). The proposed PD exhibits a smaller variation in average PD output ($\sigma$ = 19.8 mV) than the conventional MMPD ($\sigma$ = 22.5 mV). Finally, as is more evident from the negative mean in Fig. 12, the proposed PD has a zero-crossing to the right of that of the conventional MMPD. This is because the sampled pulse response of Fig. 10 has the primary cursor to the left of the peak. This shift is obscured by the size of the marker in Fig. 10, but the effect is simulated and quantified in the next section.

FIGURE 9. Timing diagram for PAM4 receiver.

TABLE 1. CDR design parameters.

| Symbol    | Parameter Definition              | Value Used   |
|-----------|-----------------------------------|--------------|
| $K_P$     | CDR loop filter proportional gain | $2^{-3}$     |
| $K_I$     | CDR loop filter integral gain     | $K_P/2^{10}$ |
| $K_{VCO}$ | VCO sensitivity                   | 100 MHz/V    |
| $F_{VCO}$ | VCO quiescent frequency           | 10 GHz       |
| $T_D$     | CDR loop latency                  | 3 UI         |

### **IV. PERFORMANCE OF THE PROPOSED PD IN A CDR LOOP**

A full-rate realization of the CDR in Fig. 1 is used to measure the performance of the proposed hybrid PD and compare it with the conventional MMPD. PAM4 PRBS31 data generation and the 3-tap TX FFE are implemented using behavioral blocks. The digital logic for the CDR, the digital-to-analog converter (DAC), and the voltage-controlled oscillator (VCO) were all realized in Cadence Virtuoso using Verilog-A blocks. The CDR parameters used for the simulations in this section are listed in Table 1.
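For readers building a similar behavioral model, a minimal sketch of a digital proportional-integral (PI) loop filter using the Table 1 gains and a 3-UI pipeline delay is shown below, together with a linear VCO law. The structure (one PD output per UI, latency modeled as a pure pipeline delay, and an ideal DAC so that the filter output is taken directly as the VCO control voltage in volts) is an assumption for illustration; the Verilog-A blocks used in the paper may be organized differently.

```python
from collections import deque

KP = 2 ** -3          # proportional gain (Table 1)
KI = KP / 2 ** 10     # integral gain (Table 1)
K_VCO = 100e6         # VCO sensitivity, Hz/V (Table 1)
F_VCO = 10e9          # VCO quiescent frequency, Hz (Table 1)
LATENCY_UI = 3        # CDR loop latency, UI (Table 1)


class PiLoopFilter:
    """Digital PI loop filter followed by a pure delay of `latency` UI."""

    def __init__(self, kp=KP, ki=KI, latency=LATENCY_UI):
        self.kp, self.ki = kp, ki
        self.integrator = 0.0
        # Pre-filled with zeros so the first `latency` outputs are zero.
        self.pipeline = deque([0.0] * latency)

    def step(self, pd_out):
        """Consume one PD output per UI and return the delayed control value."""
        self.integrator += self.ki * pd_out
        self.pipeline.append(self.kp * pd_out + self.integrator)
        return self.pipeline.popleft()


def vco_frequency(v_ctrl):
    """Linear VCO law F_VCO + K_VCO * v_ctrl, with v_ctrl in volts (ideal DAC assumed)."""
    return F_VCO + K_VCO * v_ctrl


lf = PiLoopFilter()
impulse = [lf.step(1.0 if n == 0 else 0.0) for n in range(6)]
print(impulse)   # a single PD output emerges 3 UI later as KP + KI, then KI persists
```

Feeding a single unit PD output shows the two paths separately: the proportional kick appears once after the loop latency, while the integral path leaves a persistent frequency correction.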

Transistor-level implementations of the entire analog frontend (AFE), including the time-domain interpolation, retiming flip-flops, and the decoder logic for both the data and error signals, were used. The CDR performance is characterized for two different equalization strategies. The phase of the VCO output clock was extracted by comparing the zero-crossings of the output clock with those of an ideal clock source. Eye margins are extracted by monitoring the receiver signals after sampling by the VCO clock.

#### A. CDR REALIZATION WITH TX FEEDFORWARD EQUALIZER

The TX FFE is designed to ensure an open eye at the receiver. This obviates any equalization at the receiver, except for a 1.2 dB gain in the AFE. This testbench avoids receiver equalization non-idealities and allows validation of the CDR loop performance alone. The CDR is simulated for 2500 clock periods, or unit intervals (UIs). The VCO output phase for 11 initial phase conditions spread evenly across the UI is shown in Fig. 13. The proposed PD locks to the target phase regardless of the initial phase. As expected from [6], the conventional MMPD shows false locks at phase positions of 8 ps and 28 ps instead of the expected 64 ps. These locations are predicted by the PD transfer characteristics of Fig. 11. The receiver eye is completely closed at these two errant locking points, since they occur within the transition regions of the data eye. Away from these errant lock points, the CDR using the proposed PD shows acquisition times comparable to the one using the conventional MMPD. Moreover, for these initial phases, the lock phase is always higher for the proposed PD than for the conventional MMPD. This indicates a rightward shift of the lock point for the proposed PD and thus a larger received signal swing.

An initial phase with no lock issues for the conventional MMPD is chosen for the receiver eye height measurement. The sampled PAM4 data over the 2500 UI duration, shown in Fig. 14, confirms the improved receiver voltage margin. The minimum receiver voltage margin after 1000 UI is measured to be 38.5 mV for the proposed PD and 28.4 mV for the conventional MMPD. Moreover, the steady-state jitter distribution measured within the same region yields 0.47 psrms for the proposed PD and 0.54 psrms for the conventional MMPD. While the proposed PD shows better steady-state jitter, the impact of this improvement is not well captured in this testbench given the open eye at the receiver input.

FIGURE 10. Channel characteristic of the PD validation testbench showing (a) channel frequency response with 16.8 dB loss at Nyquist and (b) pulse response showing locked sampling instances.

#### B. CDR REALIZATION WITH DECISION FEEDBACK EQUALIZER

The DFE-based CDR is shown in Fig. 15. To stress the CDR loop, a higher-loss channel (30 dB loss at the Nyquist frequency, Fig. 16(a)) was used with a CTLE (12 dB peaking, not shown in Fig. 15(a)) and the TX FFE. The 3-tap TX FFE was relaxed to allow significant ISI at the receiver. More importantly, precursor ISI was allowed at the receiver, as shown in Fig. 16(b). The first precursor ISI, in conjunction with the DFE ISI cancellation, pushes the MMPD lock points to the left (red diamond markers), farther from the maximum eye height (green circular markers). For the MMPD, the loss in $h_0$ is somewhat compensated by the reduction in $h_{-1}$. This becomes apparent in the eventual eye margin with PAM4 signaling. A 5-tap DFE, shown in Fig. 15, is used to provide the equalization needed to open the eye. The realization of the DFE summer is shown in Fig. 15(b), including the direct feedback paths and an additional offset correction path. A direct feedback implementation was used for simplicity, given the ample time available in a 100 ps UI. All other CDR loop components and parameters are kept as in Section IV-A.

FIGURE 12. Proposed PD and conventional MMPD outputs at 64 ps phase offset after 200 Monte Carlo runs.

FIGURE 13. CDR phase acquisition using the proposed PD and conventional MMPD with no receiver equalization.

FIGURE 14. Sampled receiver voltage margin for the proposed PD and conventional MMPD.

FIGURE 15. (a) CDR loop of a receiver employing a direct feedback DFE for equalization. (b) Realization of the summer for the direct feedback DFE taps.

The simulated receiver input eye diagram over 2 UI using PAM2 signaling is shown in Fig. 17, along with the equalized eye diagrams for the CDR using the proposed PD and the conventional MMPD (minimizing $h_{-1}$, since the DFE ensures $h_1 = 0$). An extra data point highlighting the effect of applying the equalization condition of the proposed PD (maximizing $h_0$) to the conventional MMPD is also added. For each PD, the eye is selected for the initial phase that starts the CDR as close to the lock phase as possible, which avoids convergence issues. Fig. 17 shows that the CDR using the conventional MMPD locks to a reduced signal level, leading to a 41 mV drop in voltage margin. The signal level of the proposed PD is almost twice that of the MMPD, leading to an improved eye margin even in the presence of higher precursor ISI. The advantage is less clear for PAM4 signaling. The increased signal level is apparent in Fig. 18 (average of 150 mV and 110 mV for the proposed PD and the MMPD, respectively). However, the precursor ISI has a more detrimental effect due to the reduced level spacing of PAM4, almost erasing the margin advantage. Moreover, the selected eyes may suggest that the conventional MMPD has the same performance as the proposed PD under the same equalization condition. However, the conventional MMPD has a weak lock point in the middle of the eye when maximum-$h_0$ equalization is applied (it struggles to drive (9) to zero). This exacerbates the already difficult convergence problem and manifests itself as degraded phase acquisition.

FIGURE 16. Pulse response of the received signal before and after equalization.

**FIGURE 17.** PAM2 receiver eye diagrams showing (a) unequalized eye, (b) equalized eye for the proposed PD CDR, (c) equalized eye for the conventional MMPD CDR equalized for max $h_0$, and (d) equalized eye for the conventional MMPD CDR equalized for min $h_{-1}$.

**FIGURE 18.** PAM4 receiver eye diagrams showing (a) unequalized eye, (b) equalized eye for the proposed PD CDR, (c) equalized eye for the conventional MMPD CDR equalized for max $h_0$, and (d) equalized eye for the conventional MMPD CDR equalized for min $h_{-1}$.



FIGURE 19. CDR phase acquisition using the proposed PD and conventional MMPD with DFE using (a) PAM2 and (b) PAM4 signaling

The CDR phase acquisition was extracted over 5000 UI and is shown in Fig. 19 for both the PAM2 and PAM4 signaling cases. The higher lock phase of the proposed PD indicates a higher sampled receiver voltage. The results also show that the elevated receiver noise was able to knock the conventional MMPD (min $h_{-1}$) out of the false lock. However, the MMPD configured for maximum eye remains stable at the false lock points. It is reasonable to conclude that the propensity of the MMPD to false lock increases significantly as the probability of driving (9) to zero decreases in PAM4 and higher-order CDRs. The steady-state jitter of the CDR with the proposed PD was simulated to be comparable to that of the conventional MMPD (0.56 psrms compared with 0.58 psrms), while providing false-lock-free maximum eye tracking.

The power consumption of the AFE of Fig. 8 was measured and is listed in Table 2, together with a summary of other performance parameters. The AFE of the proposed PD consumes 7.8 mW, less than the 9 mW of the conventional MMPD. The four arbiters together consume only 0.45 mW, compared with 1.17 mW for the comparator. As a result, the proposed PD AFE consumes 1.2 mW less in total, even after the addition of the two pre-amplifiers. Dynamic biasing [18] (switches not shown in Fig. 5(a)) limits the power of the pre-amplifiers and ensures that the proposed AFE consumes no static power.

#### TABLE 2. Performance comparison.

|                                        | Conventional MMPD                    | Proposed PD                          |
|----------------------------------------|--------------------------------------|--------------------------------------|
| Technology                             | 28 nm CMOS                           | 28 nm CMOS                           |
| Data Rate [Gb/s]                       | 20                                   | 20                                   |
| Architecture                           | Full rate                            | Full rate                            |
| Channel Loss at Nyquist [dB]           | 16.8                                 | 16.8                                 |
| Supply Voltage [V]                     | 0.9                                  | 0.9                                  |
| Minimum Eye Margin [mV]                | 28.4<sup>1</sup> / 161<sup>2</sup>   | 38.5<sup>1</sup> / 202<sup>2</sup>   |
| Steady-State Jitter [ps<sub>rms</sub>] | 0.54<sup>1</sup> / 0.58<sup>2</sup>  | 0.47<sup>1</sup> / 0.56<sup>2</sup>  |
| AFE Power [mW]                         | 9.0                                  | 7.8                                  |

<sup>1</sup> CDR using TX FFE only.

<sup>2</sup> CDR using TX FFE and DFE.
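The relative figures quoted in this section can be cross-checked against Table 2 with simple arithmetic. The Python sketch below transcribes the Table 2 values and computes three quantities:

- the eye-margin ratios;
- the absolute eye-margin gains;
- the AFE power saving.

The dictionary layout and function names are illustrative choices, not part of the paper.

```python
# Values transcribed from Table 2; index 0 = TX FFE only, index 1 = TX FFE and DFE.
EYE_MARGIN_MV = {"mmpd": (28.4, 161.0), "proposed": (38.5, 202.0)}
AFE_POWER_MW = {"mmpd": 9.0, "proposed": 7.8}


def eye_margin_ratio(case):
    """Proposed-PD minimum eye margin divided by the MMPD's, for one equalization case."""
    return EYE_MARGIN_MV["proposed"][case] / EYE_MARGIN_MV["mmpd"][case]


def eye_margin_gain_mv(case):
    """Absolute minimum eye-margin improvement in mV for one equalization case."""
    return EYE_MARGIN_MV["proposed"][case] - EYE_MARGIN_MV["mmpd"][case]


def afe_power_saving():
    """Return (saving in mW, saving as a percentage of the MMPD AFE power)."""
    saving = AFE_POWER_MW["mmpd"] - AFE_POWER_MW["proposed"]
    return saving, 100.0 * saving / AFE_POWER_MW["mmpd"]


if __name__ == "__main__":
    for case, label in ((0, "TX FFE only"), (1, "TX FFE + DFE")):
        print(f"{label}: {eye_margin_ratio(case):.2f}x, +{eye_margin_gain_mv(case):.1f} mV")
    mw, pct = afe_power_saving()
    print(f"AFE power saving: {mw:.1f} mW ({pct:.1f} %)")
```

The FFE-only ratio of about 1.36× agrees with the vertical eye margin improvement quoted in the abstract. The 41 mV difference in the FFE-plus-DFE case agrees with the voltage-margin drop cited for Fig. 17.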

### **V. CONCLUSION**

A hybrid PD that realizes a slope-dependent timing function in addition to the conventional MMPD timing function is presented. Analysis and simulation explain the operating principle of the proposed PD. CDR loop simulations in a 28 nm CMOS process show that the proposed PD achieves better steady-state jitter and a higher receiver voltage margin, while its acquisition time is comparable to that of the conventional design. Moreover, the proposed PD realizes a CDR loop free of the false and suboptimal locking issues characteristic of traditional multilevel CDRs. The PAM2 and PAM4 realizations of the hybrid PD presented here track the maximum eye height with little overhead. Significantly greater power savings can be expected from higher-order PAM realizations, because the hardware penalty of the two additional interpolating comparators diminishes as the number of PAM levels increases.
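The scaling argument in the conclusion can be made concrete with a simple count. A PAM-$M$ receiver needs $M-1$ data slicers, so two extra comparators amount to a relative overhead of roughly $2/(M-1)$. This count model is an assumption made for illustration, not taken from the paper. It treats every comparator as equally costly and ignores the arbiters, error samplers, and DFE. The function name is hypothetical.

```python
def relative_comparator_overhead(pam_levels: int, extra_comparators: int = 2) -> float:
    """Extra comparators as a fraction of the M-1 data slicers of a PAM-M receiver.

    Illustrative assumption: every comparator has similar cost, and arbiters,
    error samplers and DFE taps are ignored.
    """
    if pam_levels < 2:
        raise ValueError("PAM order must be at least 2")
    return extra_comparators / (pam_levels - 1)


if __name__ == "__main__":
    for m in (2, 4, 8, 16):
        print(f"PAM{m}: {100 * relative_comparator_overhead(m):.0f} % extra comparators")
```

Under this crude model the overhead falls from 200% for PAM2 to about 67% for PAM4 and about 29% for PAM8. This is consistent with the qualitative trend stated above.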

#### REFERENCES

- [1] F. Musa and A. Carusone, "Clock recovery in high-speed multilevel serial links," in *Proc. Int. Symp. Circuits Syst.*, vol. 5, 2003, p. 5.
- [2] L. Shi, W. Gai, L. Tang, X. Xiang, and A. He, "Hardware-efficient slope-error algorithm based PAM4 baud rate CDR scheme for 40 Gb/s receiver," *Electron. Lett.*, vol. 54, no. 17, pp. 1020–1022, 2018.
- [3] K. Mueller and M. Muller, "Timing recovery in digital synchronous data receivers," *IEEE Trans. Commun.*, vol. 24, no. 5, pp. 516–531, May 1976.
- [4] F. Spagna et al., "A 78mW 11.8Gb/s serial link transceiver with adaptive RX equalization and baud-rate CDR in 32nm CMOS," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, 2010, pp. 366–367.
- [5] P. Francese et al., "A 16Gb/s 3.7mW/Gb/s 8-tap DFE receiver and baud rate CDR with 30kppm tracking bandwidth," in *Proc. A-SSCC*, 2013, pp. 33–36.
- [6] T. Musah and A. Namachivayam, "Robust timing error detection for multilevel baud-rate CDR," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 69, no. 10, pp. 3927–3939, Oct. 2022.
- [7] F. Tachibana et al., "A 56-Gb/s PAM4 transceiver with false-lock-aware locking scheme for Mueller-Müller CDR," in *Proc. IEEE 48th Eur. Solid State Circuits Conf. (ESSCIRC)*, 2022, pp. 505–508.
- [8] M.-C. Choi, H.-G. Ko, J. Oh, H.-Y. Joo, K. Lee, and D.-K. Jeong, "A 0.1-pJ/b/dB 28-Gb/s maximum-eye tracking, weight-adjusting MM CDR and adaptive DFE with single shared error sampler," in *Proc. IEEE Symp. VLSI Circuits*, 2020, pp. 1–2.
- [9] H. Ju, K. Lee, K. Park, W. Jung, and D.-K. Jeong, "Design techniques for 48-Gb/s 2.4-pJ/b PAM-4 baud-rate CDR with stochastic phase detector," *IEEE J. Solid-State Circuits*, vol. 57, no. 10, pp. 3014–3024, Oct. 2022.
- [10] S. Lee, B. Kang, W. Rhee, and D.-K. Jeong, "A 0.061-pJ/b/dB 28-Gb/s gradient-based maximum eye tracking CDR with 2-tap DFE adaptation in 28-nm CMOS," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 70, no. 11, pp. 3998–4002, Nov. 2023.
- [11] X. Xiang, W. Gai, A. He, H. Zhou, and Y. Dong, "Equal-slope baud-rate CDR algorithm with optimized eye opening," *Microelectron. J.*, vol. 114, May 2021, Art. no. 105138.
- [12] T. Musah, "Time-based error extraction for multilevel receivers," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, 2020, pp. 1–5.
- [13] B. Razavi, "The StrongARM latch [a circuit for all seasons]," *IEEE Solid State Circuits Mag.*, vol. 7, no. 2, pp. 12–17, Jun. 2015.
- [14] J. Kim, B. S. Leibowitz, J. Ren, and C. J. Madden, "Simulation and analysis of random decision errors in clocked comparators," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 56, no. 8, pp. 1844–1857, Aug. 2009.
- [15] I.-M. Yi et al., "A time-based receiver with 2-tap DFE for a 12Gb/s/pin single-ended transceiver of mobile DRAM interface in 0.8V 65nm CMOS," in *Proc. ISSCC*, 2017, pp. 400–401.
- [16] M. Zhang, Y. Zhu, C.-H. Chan, and R. P. Martins, "16.2 A 4× interleaved 10GS/s 8b time-domain ADC with 16× interpolation-based inter-stage gain achieving >37.5dB SNDR at 18GHz input," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, 2020, pp. 252–254.
- [17] M. Abouzeid and T. Musah, "Hysteretic error extraction in multi-level wireline receivers," in *Proc. ISCAS*, 2021, pp. 1–5.
- [18] Z. Tan, C.-H. Chen, Y. Chae, and G. C. Temes, "Incremental delta-sigma ADCs: A tutorial review," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 67, no. 12, pp. 4161–4173, Dec. 2020.



**AHMED ABDELAZIZ** (Graduate Student Member, IEEE) received the B.Sc. and M.Sc. degrees in electronic and communication engineering from Cairo University, Egypt. He is currently pursuing the Ph.D. degree with the Electrical and Computer Engineering Department, The Ohio State University, Columbus, OH, USA. From 2016 to 2019, he worked as a part-time Analog Design Engineer with Si-ware, Egypt. His research interests include high-speed serial links and RF transceivers. In 2016, he won first place in the IC Design Track of the Ibtiecar 2016 graduation projects competition, a nationwide contest.



**MOHAMED AHMED** (Graduate Student Member, IEEE) received the B.Sc. and M.Sc. degrees in electronic and communication engineering from Cairo University, Egypt. He is currently pursuing the Ph.D. degree with the Electrical and Computer Engineering Department, The Ohio State University, Columbus, OH, USA. His research interests include high-speed serial links, power management, and general analog/mixed-signal IC design. From 2016 to 2020, he worked as an IC Design Engineer with Vidatronic Inc., Egypt. In 2016, his capstone project won first place in the Egyptian Engineering Day (EED 2016) competition and in the IC Design Track of the Ibtiecar 2016 graduation projects competition, a nationwide contest.



**TAWFIQ MUSAH** (Senior Member, IEEE) received the B.S. degree in electrical engineering from Columbia University, New York, NY, USA, in 2005, and the Ph.D. degree in electrical and computer engineering from Oregon State University, Corvallis, OR, USA, in 2010.

From 2010 to 2018, he worked with the Signaling Research Lab, Intel Corporation, Hillsboro, OR, USA, on circuits and systems to enable Intel's next-generation chip-to-chip electrical and optical links. Before joining Intel in 2010, he interned with Texas Instruments designing a hardware sensor in 2010 and with Intel Labs working on a micro-power ADC in 2006 and 2007. He is currently an Assistant Professor with the Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH, USA. His research interests include low-power equalization techniques for next-generation electrical and optical I/O links, multi-GS/s ADCs, and high-level circuit modeling and verification.

Dr. Musah is a recipient of the Intel Labs Divisional Recognition Award in 2014 and 2017 and the Intel Labs Academy Award for Excellence in Bringing New Experiences or Technical Innovation to Market in 2015. He has served on the Technical Program Committee of the IEEE International Symposium on Circuits and Systems since 2021.