# Optimal Optical Receivers in Nanoscale CMOS: A Tutorial

Bahaa Radi, *Member, IEEE*, Diaaeldin Abdelrahman, *Member, IEEE*, Odile Liboiron-Ladouceur, *Senior Member, IEEE*, Glenn Cowan, *Member, IEEE*, and Tony Chan Carusone, *Fellow, IEEE*

Abstract—The integration of optical receivers in nanoscale CMOS technologies is challenging due to lower intrinsic gain and higher noise compared to SiGe BiCMOS technologies. Recent research revealed that low-noise, high-gain, and low-power CMOS optical receivers can be designed by limiting the bandwidth of the front-end and following it with equalization techniques that benefit from the good switching characteristics offered by CMOS technologies. In this tutorial brief, the operation of decision-feedback equalization, feed-forward equalization, and continuous-time linear equalization is reviewed in the context of high baud-rate 2-PAM and 4-PAM modulation. Recent advances and techniques in 4-PAM optical receivers are reviewed and compared in terms of speed, sensitivity, bandwidth, and efficiency.

*Index Terms*—Circuit noise, decision feedback equalizers, equalizers, optical receivers, pulse modulation, sensitivity.

## I. INTRODUCTION

WITH a growing demand for data stemming from video streaming, social media, and cloud computing, traffic in data centers has grown threefold over the past five years [1]. Consequently, optical interconnects are replacing electrical interconnects, which suffer from crosstalk and frequency-dependent losses, especially at high speeds and long distances (100s of meters). Modern Ethernet standards [2] require optical modules capable of supporting aggregate data rates of 400 Gb/s. Some define four 4-PAM optical channels, each supporting a data rate of 100 Gb/s (e.g., 400GBASE-DR4). In conventional 400G, pluggable optics consist of a discrete photodetector, a SiGe BiCMOS TIA, and a CMOS SerDes interconnected over a PCB [3]. The electrical interconnects between components are lossy and, thus, unscalable. CMOS optical receivers integrated with the SerDes obviate this problem and reduce size and cost.

Manuscript received January 31, 2022; revised April 4, 2022; accepted April 5, 2022. Date of publication April 11, 2022; date of current version May 27, 2022. This brief was recommended by Associate Editor E. Bonizzoni. (*Corresponding author: Bahaa Radi.*)

Bahaa Radi and Tony Chan Carusone are with the Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada (e-mail: bahaa.radi@isl.utoronto.ca; tony.chan.carusone@isl.utoronto.ca).

Diaaeldin Abdelrahman is with the Department of Electrical Engineering, Assiut University, Assiut 71515, Egypt (e-mail: diaaeldin@aun.edu.eg).

Odile Liboiron-Ladouceur is with the Department of Electrical and Computer Engineering, McGill University, Montreal, QC H3A 0E9, Canada (e-mail: odile.liboiron-ladouceur@mcgill.ca).

Glenn Cowan is with the Department of Electrical and Computer Engineering, Concordia University, Montreal, QC H3G 1M8, Canada (e-mail: gcowan@ece.concordia.ca).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCSII.2022.3166468.

Digital Object Identifier 10.1109/TCSII.2022.3166468

Whereas SiGe BiCMOS offers high intrinsic gain, bandwidth, and low noise, nanoscale CMOS offers good switching circuits, including some recently reported equalizer circuits, but suffers from less gain and more noise. Therefore, there is a need to rearchitect receivers' analog front-ends to leverage nanoscale CMOS technologies' strengths.

The conventional way of supporting higher data rates in optical receivers is to extend the front-end bandwidth. However, this generally implies lower transimpedance [4], [5]. To break this trade-off, the bandwidth of the TIA can be intentionally limited below the conventional target of  $0.5 \times$ baud rate, allowing for higher gain at the cost of intersymbol interference (ISI). ISI can be corrected using equalization techniques suited to nanoscale CMOS implementation. This optimization is well studied for 2-PAM modulation [6] and many prototypes leveraging different equalization techniques were developed [7]–[12]. However, 4-PAM modulation is more susceptible to bandwidth limitations because ISI is three times larger (relative to eye height) than in 2-PAM. As a result, it is important to study this optimization in the context of 4-PAM.

Section II of this tutorial covers continuous-time linear equalization (CTLE), feed-forward equalization (FFE), and decision feedback equalizer (DFE)-based optical receivers. Section III compares these equalization techniques. Section IV reviews recent advances in optical receiver design with emphasis on 4-PAM optical receivers, where we also look at design trends. Finally, Section V concludes the tutorial.

## II. FRONT-END OPTIMIZATION

Shunt-feedback transimpedance amplifiers (SFTIAs), particularly inverter-based ones, have been the most popular nanoscale CMOS TIAs in recent years [3], [12]–[16]. Inverters offer high linearity, high transconductance per unit bias current, self-biasing in a feedback configuration, and high swing. We consider the TIA in Fig. 1 (a) with the small-signal model in Fig. 1 (b).

In this model, the input capacitance, $C_{IN}$, is the sum of the photodetector capacitance, $C_{PD}$, the pad capacitance, $C_{PAD}$, and the inverter's gate-to-source capacitance, $C_{gs}$. $C_a$ is the capacitance of the following stage, and $R_a$ is the output resistance of the TIA. The combined transconductance of the NMOS and the PMOS devices is $g_m$, and $R_f$ is the feedback resistor. Finally, the model includes the gate-to-drain capacitance, $C_{gd}$, which is important due to the Miller effect. Specifically, the pole at $1/(R_f C_{gd})$ could become dominant for large transistor sizes and feedback resistances.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/



Fig. 1. (a) TIA circuit diagram and (b) its small-signal model. Noise sources are annotated in red.

TABLE I VALUES USED FOR SIMULATIONS

| Parameter             | Description                     | Value                       |
|-----------------------|---------------------------------|-----------------------------|
| $f_{baud}$            | Baud rate                       | 64 Gbaud                    |
| $f_t$ <sup>1</sup>    | Technology transition frequency | $5 \times f_{baud}$         |
| $g_m$                 | TIA combined transconductance   | 10-50 mS                    |
| $R_f$                 | Feedback resistor               | 100-4000 Ω                  |
| $C_g$                 | Gate capacitance                | $g_m/(2\pi f_t)$            |
| $C_{gs}$ <sup>2</sup> | Gate-to-source capacitance      | $2/3\, C_g$                 |
| $C_{gd}$ <sup>2</sup> | Gate-to-drain capacitance       | $1/3\, C_g$                 |
| $C_{PD}$              | PD capacitance                  | 60 fF                       |
| $C_{PAD}$             | Pad capacitance                 | 40 fF                       |
| $C_{IN}$              | Total input capacitance         | $C_{PAD} + C_{PD} + C_{gs}$ |
| $C_a$                 | Output capacitance              | 25 fF                       |
| $A$                   | Inverter gain                   | 6 V/V                       |
| $R_a$                 | TIA output resistance           | $A/g_m$                     |
| $I_{pp}$              | Peak-to-peak input current      | 100 µA                      |

<sup>1</sup> This is comparable to the $f_t$ of the 14 nm FinFET CMOS node [17].

<sup>2</sup> Assumed based on simulations reported in [12].

The transfer function of the TIA is

$$Z_{TIA}(s) = \frac{-R_a(g_m R_f - 1 - sC_{gd}R_f)}{1 + g_m R_a + sK_1 + s^2 K_2}, \tag{1}$$

where

$$K_1 = C_{IN}(R_a + R_f) + C_a R_a + C_{gd} R_f (1 + g_m R_a), \tag{2}$$

and

$$K_2 = R_f R_a (C_{IN} C_a + C_{IN} C_{gd} + C_{gd} C_a). \tag{3}$$

Some parameters in this model are coupled. Namely, $g_m$, $C_{gs}$, and $C_{gd}$ are coupled by the technology transition frequency, $f_t$, while $R_a$ and $g_m$ are coupled because the gain of the inverter, $A = g_m R_a$, is the transistors' intrinsic gain. Table I summarizes the numerical values that will be used in this brief alongside the relationships between coupled parameters. We will use $g_m$ and $R_f$ as our design parameters. Model values are based on [3] and [12], which report a 100 Gb/s 4-PAM and a 64 Gb/s 2-PAM optical receiver, respectively. For our simulations, we target a bit rate of 64 Gb/s in the 2-PAM case and 64 Gbaud (128 Gb/s) in the 4-PAM case.
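As an illustration of how the coupled parameters feed the model, the following is a minimal sketch that evaluates (1)–(3) from the Table I relationships for one design point. The function names (`tia_params`, `z_tia`, `bandwidth_3db`) and the simple frequency scan used to locate the 3-dB point are assumptions made for this example, not part of the original analysis.

```python
import math

def tia_params(gm, rf, f_baud=64e9, gain_a=6.0,
               c_pd=60e-15, c_pad=40e-15, c_a=25e-15):
    """Derive the coupled small-signal values from g_m and R_f (Table I)."""
    f_t = 5.0 * f_baud                     # transition frequency
    c_g = gm / (2.0 * math.pi * f_t)       # gate capacitance
    c_gs = 2.0 * c_g / 3.0
    c_gd = c_g / 3.0
    c_in = c_pad + c_pd + c_gs             # total input capacitance
    r_a = gain_a / gm                      # R_a = A / g_m
    k1 = c_in * (r_a + rf) + c_a * r_a + c_gd * rf * (1.0 + gm * r_a)  # (2)
    k2 = rf * r_a * (c_in * c_a + c_in * c_gd + c_gd * c_a)            # (3)
    return dict(gm=gm, rf=rf, r_a=r_a, c_gd=c_gd, c_in=c_in, k1=k1, k2=k2)

def z_tia(p, f):
    """Evaluate the transimpedance in (1) at frequency f (Hz)."""
    s = 2j * math.pi * f
    num = -p["r_a"] * (p["gm"] * p["rf"] - 1.0 - s * p["c_gd"] * p["rf"])
    den = 1.0 + p["gm"] * p["r_a"] + s * p["k1"] + s * s * p["k2"]
    return num / den

def bandwidth_3db(p, f_step=10e6, f_max=500e9):
    """First scanned frequency where |Z_TIA| is 3 dB below its DC value."""
    target = abs(z_tia(p, 0.0)) / math.sqrt(2.0)
    f = f_step
    while f <= f_max:
        if abs(z_tia(p, f)) < target:
            return f
        f += f_step
    return None  # no crossing found below f_max

# Design point used in the figure captions: g_m = 30 mS, R_f = 600 ohm.
p = tia_params(gm=30e-3, rf=600.0)
dc_gain_ohm = abs(z_tia(p, 0.0))
f_3db = bandwidth_3db(p)
print(f"DC transimpedance: {dc_gain_ohm:.1f} ohm")
print(f"3-dB bandwidth:    {f_3db / 1e9:.2f} GHz")
```

Sweeping `gm` and `rf` over the Table I ranges with this kind of helper is one way to reproduce the design-space exploration described in the following subsections.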

In the following subsections, we take the following approach: 1) describe and illustrate the equalization technique used, 2) calculate the worst-case eye opening assuming 2-PAM signaling using peak distortion analysis, 3) calculate the output-referred noise from the model, 4) calculate the worst-case signal-to-noise ratio, $SNR_{WC}$, at the output of the receiver, and finally 5) extend the conclusions to 4-PAM.

#### A. CTLE-Based Optical Receivers

We first consider an SFTIA followed by a CTLE stage that recovers a part of the bandwidth and reduces ISI. As such, the TIA in (1) can be redesigned to have a bandwidth that is $\chi$ times lower than that of an unequalized (UE) implementation. An ideal unity-gain CTLE stage that recovers the full bandwidth has the transfer function:

$$H_{CE}(s) = \frac{1 + g_m R_a + sK_1 + s^2 K_2}{(1 + g_m R_a)\left(1 + \frac{s}{\chi Q\, 2\pi f_{TIA}} + \frac{s^2}{(\chi\, 2\pi f_{TIA})^2}\right)}, \tag{4}$$

where,  $f_{TIA}$  is the 3-dB bandwidth of the TIA preceding the CTLE, and Q is the quality factor of the CTLE, taken here as  $1/\sqrt{2}$ . The zeros of the CTLE perfectly cancel the poles of the TIA in (1), and the pole frequencies of the CTLE are  $\chi$  times higher relative to those of the preceding TIA. It should be noted that a practical CTLE stage has more poles than zeros.

The transfer function from the input to the output is the product of (1) and (4). Thus,  $R_f$  can be increased, reducing the bandwidth of the TIA while the CTLE stage recovers that bandwidth. Practically, the value of  $\chi$  cannot be too large because it leads to: 1) excessive peaking in the CTLE stages leading to gain and group delay variations; 2) decreased tunability and increased susceptibility to PVT variations [18].
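To make the pole-zero cancellation explicit, the product can be written out as a worked step. This assumes the CTLE denominator is a standard second-order section with natural frequency $\chi\, 2\pi f_{TIA}$ and quality factor $Q$, as described in the text:

$$Z_{TIA}(s)\,H_{CE}(s) = \frac{-R_a\,(g_m R_f - 1 - sC_{gd}R_f)}{(1 + g_m R_a)\left(1 + \frac{s}{\chi Q\, 2\pi f_{TIA}} + \frac{s^2}{(\chi\, 2\pi f_{TIA})^2}\right)}.$$

The DC transimpedance, $R_a(g_m R_f - 1)/(1 + g_m R_a)$, is that of the TIA alone and therefore still grows with $R_f$, while the poles are set only by $\chi f_{TIA}$ and $Q$. With $Q = 1/\sqrt{2}$, and neglecting the high-frequency zero contributed by $C_{gd}$, the overall response is approximately a second-order Butterworth response with a 3-dB bandwidth of $\chi f_{TIA}$.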

As the total bandwidth of the TIA/CTLE ( $\chi f_{TIA}$ ) becomes smaller, i.e., below 0.5× baud rate, the signal will not have sufficient time to settle, leading to a degradation of gain. Moreover, this leads to ISI, further reducing eye opening. This is illustrated in the pulse responses shown in Fig. 2 (a) for various  $\chi f_{TIA}$ , where precursors and postcursors are introduced when the bandwidth is far below the baud rate. From this, for 2-PAM, the worst-case eye opening,  $V_{ISI}$  is calculated from the main cursor  $V_{A,0}$  and the *i*<sup>th</sup> pre/postcursors,  $V_{A,i}$ :

$$V_{ISI} = |V_{A,0}| - \sum_{i \neq 0} |V_{A,i}| \tag{5}$$

This method for finding eye-opening is peak distortion analysis (PDA), and is extensible to 4-PAM [19].
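A minimal sketch of (5) is given below. The cursor values are made-up illustrative numbers, and the `levels` argument is an assumption added for this example: it models equally spaced PAM levels with the same peak-to-peak swing, so that the half-eye height shrinks to $|V_{A,0}|/(L-1)$ while the worst-case ISI is unchanged.

```python
def worst_case_eye(cursors, main_index, levels=2):
    """Peak distortion analysis on baud-spaced pulse-response samples.

    cursors    -- samples of the pulse response, one per unit interval
    main_index -- position of the main cursor V_A,0 in the list
    levels     -- number of PAM levels (2 reproduces (5) exactly)
    """
    main = abs(cursors[main_index]) / (levels - 1)
    isi = sum(abs(v) for i, v in enumerate(cursors) if i != main_index)
    return main - isi

# Hypothetical pulse response: one precursor, main cursor, two postcursors.
pulse = [0.02, 0.60, 0.15, -0.05]

eye_2pam = worst_case_eye(pulse, main_index=1)
eye_4pam = worst_case_eye(pulse, main_index=1, levels=4)
print(f"2-PAM worst-case eye: {eye_2pam:.3f}")
print(f"4-PAM worst-case eye: {eye_4pam:.3f}")
```

A negative result means the worst-case eye is closed. In this example the same pulse response leaves an open 2-PAM eye but a closed 4-PAM eye, which illustrates why 4-PAM tolerates less bandwidth limitation.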

Fig. 2. (a) Pulse response at the output of the CTLE stage. Here, $g_m = 30$ mS and we sweep $R_f$. (b) $SNR_{WC}$ as a function of $\chi$. (c) $SNR_{WC}$ for UE receiver and a CTLE-based receiver with $\chi = 2$ as a function of $f_{3dB}/f_{baud}$. Here, $f_{3dB}$ refers to the total bandwidth of the chain (i.e., TIA+CTLE).

Fig. 3. (a) FFE block diagram. (b) Illustration of the FFE operation with 3 taps ($g_m = 30$ mS and $R_f = 600\ \Omega$). (c) $SNR_{WC}$ as a function of $f_{3dB}/f_{baud}$.

To understand the benefit of a CTLE, we next define the worst-case signal-to-noise ratio $(SNR_{WC})$ as a function of $f_{3dB}/f_{baud}$ and $\chi$. We will use $f_{3dB}$ to refer to the overall 3-dB bandwidth of the TIA/CTLE in CTLE-based receivers, and to the 3-dB bandwidth of the TIA in the FFE and DFE-based receivers. We begin by considering the noise sources in the SFTIA: the channel thermal noise, $I_{n,g_m}^2 = 4kT\gamma g_m$, and the thermal noise of the feedback resistor, $I_{n,R_f}^2 = 4kT/R_f$. The calculation of the noise at the output of the TIA can be simplified by splitting $I_{n,R_f}^2$ as in Fig. 1 [20]. The resulting TIA output power spectral density, $S_{out}$, is

$$S_{out}(s) = I_{n,R_f}^2 |Z_{TIA}(s) - Z_o(s)|^2 + I_{n,g_m}^2 |Z_o(s)|^2 \tag{6}$$

where  $Z_o$  is the output impedance of the TIA,

$$Z_o(s) = \frac{R_a(1 + sR_f(C_{gd} + C_{IN}))}{1 + g_m R_a + sK_1 + s^2 K_2} \tag{7}$$
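The noise expressions in (6) and (7) can be evaluated numerically. The sketch below integrates $S_{out}$ with a plain trapezoidal rule to estimate the RMS output noise of the TIA alone. The temperature (300 K), the excess noise factor $\gamma = 1$, the 1 THz integration limit, and all function names are assumptions chosen for illustration; none of them is specified in the text.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def tia_params(gm, rf, f_baud=64e9, gain_a=6.0, c_ext=100e-15, c_a=25e-15):
    """Table I relationships; c_ext stands for C_PD + C_PAD."""
    c_g = gm / (2.0 * math.pi * 5.0 * f_baud)
    c_gs, c_gd = 2.0 * c_g / 3.0, c_g / 3.0
    c_in, r_a = c_ext + c_gs, gain_a / gm
    k1 = c_in * (r_a + rf) + c_a * r_a + c_gd * rf * (1.0 + gm * r_a)
    k2 = rf * r_a * (c_in * c_a + c_in * c_gd + c_gd * c_a)
    return dict(gm=gm, rf=rf, r_a=r_a, c_gd=c_gd, c_in=c_in, k1=k1, k2=k2)

def _den(p, s):
    return 1.0 + p["gm"] * p["r_a"] + s * p["k1"] + s * s * p["k2"]

def z_tia(p, s):
    """Transimpedance, as in (1)."""
    return -p["r_a"] * (p["gm"] * p["rf"] - 1.0 - s * p["c_gd"] * p["rf"]) / _den(p, s)

def z_out(p, s):
    """Output impedance, as in (7)."""
    return p["r_a"] * (1.0 + s * p["rf"] * (p["c_gd"] + p["c_in"])) / _den(p, s)

def s_out(p, f, temp_k=300.0, gamma=1.0):
    """Output noise power spectral density, as in (6), in V^2/Hz."""
    s = 2j * math.pi * f
    i2_rf = 4.0 * K_B * temp_k / p["rf"]           # feedback resistor noise
    i2_gm = 4.0 * K_B * temp_k * gamma * p["gm"]   # channel thermal noise
    return i2_rf * abs(z_tia(p, s) - z_out(p, s)) ** 2 + i2_gm * abs(z_out(p, s)) ** 2

def rms_noise(p, f_max=1e12, steps=100000):
    """Trapezoidal integration of S_out from 0 to f_max, then square root."""
    df = f_max / steps
    total = 0.5 * (s_out(p, 0.0) + s_out(p, f_max))
    for i in range(1, steps):
        total += s_out(p, i * df)
    return math.sqrt(total * df)

p = tia_params(gm=30e-3, rf=600.0)
v_n = rms_noise(p)
print(f"RMS output noise: {v_n * 1e3:.3f} mV")
```

The finite upper limit truncates the integral in (9) and (14), so the result is a slight underestimate; the integrand falls off as $1/f^2$ at high frequency, which keeps the truncation error small.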

We later use (6) in FFE and DFE noise calculations. Here, we are interested in the power spectral density at the output of the CTLE stage,  $S_{CE}(s)$ 

$$S_{CE}(s) = I_{n,R_f}^2 |(Z_{TIA} - Z_o)H_{CE}|^2 + I_{n,g_m}^2 |Z_o \times H_{CE}|^2 \tag{8}$$

The output RMS noise voltage is

$$V_{n,out} = \sqrt{\int_0^\infty S_{CE}(s) df}. \tag{9}$$

Finally, the worst-case signal-to-noise ratio $(SNR_{WC})$ is defined as the ratio of the eye opening found from PDA to the RMS noise:

$$SNR_{WC} = 20 \log_{10} \left( \frac{V_{ISI}}{V_{n,out}} \right). \tag{10}$$
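As a small worked example of (10), the helper below converts an eye opening and an RMS noise voltage into $SNR_{WC}$ in dB. The function name and the numbers in the usage lines are hypothetical; returning negative infinity for a closed eye is a convention chosen for this sketch.

```python
import math

def snr_wc_db(v_isi, v_n_rms):
    """Worst-case SNR in dB from the PDA eye opening and the RMS noise."""
    if v_isi <= 0.0:
        return float("-inf")  # worst-case eye is closed
    return 20.0 * math.log10(v_isi / v_n_rms)

# Hypothetical values: 20 mV worst-case eye opening, 0.8 mV RMS noise.
example_db = snr_wc_db(20e-3, 0.8e-3)
print(f"SNR_WC = {example_db:.2f} dB")
```

Because $V_{ISI}$ scales linearly with the input current while the noise does not, doubling the input current in this sketch raises the result by about 6 dB.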

We plot  $SNR_{WC}$  as a function of  $\chi$  and  $f_{3dB}/f_{baud}$  as shown in Fig. 2 (b) and (c), respectively. In constructing these plots, we sweep the values of  $g_m$  and  $R_f$  and pick the best achievable  $SNR_{WC}$  for a given  $f_{3dB}/f_{baud}$  or  $\chi$ .

From Fig. 2 (b), we observe  $SNR_{WC}$  improves as  $\chi$  increases. However, this improvement is more pronounced when going from  $\chi = 1$  to  $\chi = 1.5$  compared to going from  $\chi = 1.5$  to  $\chi = 2$ . This is because, while employing a CTLE with a reduced-bandwidth TIA helps in suppressing white noise, the colored noise is unaffected [6], [18], and using large values of  $\chi$  provides only marginal improvement because the colored noise component dominates.

The worst-case SNR, $SNR_{WC}$, is plotted as a function of $f_{3dB}/f_{baud}$ in Fig. 2 (c) for a UE TIA and for a TIA followed by a CTLE with $\chi = 2$. For 2-PAM signaling, the optimal $f_{3dB}$ in the UE case is $0.3 \times f_{baud}$, and it increases to $0.39 \times f_{baud}$ in the CTLE-based receiver. A lower $f_{3dB}$ results in ISI that degrades $SNR_{WC}$, while an $f_{3dB}$ larger than $0.3 \times f_{baud}$ increases the output-referred integrated noise voltage, also degrading $SNR_{WC}$. The CTLE implementation has a 3 dB better $SNR_{WC}$ compared to the UE implementation. For 4-PAM modulation, in the CTLE-based receiver, the optimal $f_{3dB}$ is around $0.53 \times f_{baud}$ compared to $0.45 \times f_{baud}$ in the UE implementation, and the CTLE provides around 4.7 dB of $SNR_{WC}$ improvement. We note that the optimal $f_{3dB}$ for 4-PAM is $1.38 \times$ higher (relative to baud rate) than for 2-PAM, significantly less than the $2\times$ increase in data rate afforded by 4-PAM. We also note that the bandwidth of the TIA in the CTLE-based implementation is less than that of the UE TIA.
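The comparison against the bit rate can be made concrete with a short worked step that only restates the CTLE-based optima quoted above. Writing the bit rate as $f_{bit}$, with $f_{bit} = f_{baud}$ for 2-PAM and $f_{bit} = 2 f_{baud}$ for 4-PAM:

$$\left.\frac{f_{3dB}}{f_{bit}}\right|_{2\text{-PAM}} \approx 0.39, \qquad \left.\frac{f_{3dB}}{f_{bit}}\right|_{4\text{-PAM}} \approx \frac{0.53}{2} \approx 0.27.$$

At a fixed bit rate, the optimal 4-PAM front-end bandwidth is therefore roughly two-thirds of the optimal 2-PAM bandwidth, even though it is higher when expressed relative to the baud rate.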

#### B. Feed-Forward Equalization

A feed-forward equalizer (FFE)-based optical receiver can be modeled as shown in Fig. 3(a). Each FFE tap produces a delayed, scaled version of the input pulse. By adding time-shifted and scaled versions of the signal, pre- and post-cursors can be reduced. This operation is demonstrated in Fig. 3 (b) for a three-tap FFE. Once tap weights are set, the worst-case vertical eye opening is calculated from the equalized pulse response using (5).
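The tap-and-sum operation amounts to a discrete convolution of the baud-spaced pulse response with the tap weights. The sketch below shows this for a three-tap FFE and then applies peak distortion analysis to the result. The pulse samples and tap weights are illustrative values chosen by hand, not optimized coefficients or data from the figures.

```python
def convolve(pulse, taps):
    """Baud-spaced convolution: each tap adds a delayed, scaled pulse copy."""
    out = [0.0] * (len(pulse) + len(taps) - 1)
    for i, sample in enumerate(pulse):
        for k, weight in enumerate(taps):
            out[i + k] += sample * weight
    return out

def worst_case_eye(cursors):
    """2-PAM peak distortion analysis, taking the largest sample as main cursor."""
    main_index = max(range(len(cursors)), key=lambda i: abs(cursors[i]))
    isi = sum(abs(v) for i, v in enumerate(cursors) if i != main_index)
    return abs(cursors[main_index]) - isi

# Hypothetical band-limited pulse response: precursor, main, two postcursors.
pulse = [0.10, 0.60, 0.25, 0.05]
# Hypothetical 3-tap FFE: one pre-tap, the main tap, one post-tap.
taps = [-0.15, 1.00, -0.40]

equalized = convolve(pulse, taps)
eye_before = worst_case_eye(pulse)
eye_after = worst_case_eye(equalized)
print(f"eye before FFE: {eye_before:.4f}")
print(f"eye after FFE:  {eye_after:.4f}")
```

The main cursor shrinks slightly after equalization, but the residual ISI shrinks much more, so the worst-case eye opening improves. The effect of the same taps on noise is a separate consideration and is not captured by this sketch.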

When selecting tap weights in FFE-based receivers, noise enhancement of FFE should be considered. In FFE-based receivers, the FFE filter sums scaled and delayed versions of the same signal, and considering that noise at the output of the TIA is colored, noise samples present in these signals are correlated. This should be considered when calculating the output noise power. To calculate output noise power, we begin by calculating the autocorrelation of the noise at the output of the TIA,

$$R(\tau) = \int_0^\infty S_{out}(j2\pi f)e^{j2\pi f\tau}df. \tag{11}$$



Fig. 4. (a) FIR-DFE block diagram. (b) FIR-DFE operation illustration (with $g_m = 30$ mS and $R_f = 600\ \Omega$). (c) $SNR_{WC}$ as a function of $f_{3dB}/f_{baud}$.

The output-referred RMS noise voltage at the output of an N-tap FFE can then be calculated as follows:

$$V_{n,FFE} = \sqrt{\sum_{i=1}^{N} \sum_{k=1}^{N} \alpha_i \alpha_k R\left(\frac{|i-k|}{f_{baud}}\right)}. \tag{12}$$

Here, $\alpha_i$ and $\alpha_k$ are the *i*<sup>th</sup> and *k*<sup>th</sup> tap coefficients, respectively.
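The double sum over tap pairs in (12) can be sketched directly. In the example below, `autocorr[m]` stands for $R(m/f_{baud})$ and is supplied as a plain list; the autocorrelation values and tap weights are hypothetical numbers used only to show how noise correlation changes the result.

```python
import math

def ffe_noise_rms(taps, autocorr):
    """RMS noise at the FFE output.

    taps     -- tap coefficients alpha_1 .. alpha_N
    autocorr -- autocorr[m] = R(m / f_baud) for m = 0 .. N-1
    """
    total = 0.0
    for i, a_i in enumerate(taps):
        for k, a_k in enumerate(taps):
            total += a_i * a_k * autocorr[abs(i - k)]
    return math.sqrt(total)

taps = [-0.15, 1.00, -0.40]          # hypothetical 3-tap FFE

white = [1.0, 0.0, 0.0]              # uncorrelated noise, unit variance
colored = [1.0, 0.5, 0.1]            # hypothetical correlated noise

v_white = ffe_noise_rms(taps, white)
v_colored = ffe_noise_rms(taps, colored)
print(f"white noise:   {v_white:.4f}")
print(f"colored noise: {v_colored:.4f}")
```

For uncorrelated noise the expression collapses to the root-sum-square of the taps, which exceeds the input noise whenever side taps are nonzero. With correlated noise the cross terms change the total, which is why the autocorrelation has to be included when the TIA output noise is colored.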

As can be seen, tap coefficients appear both in  $V_{ISI}$  and  $V_{n,FFE}$  calculations. Therefore, the optimal coefficients maximize  $SNR_{WC}$  as opposed to minimizing ISI or noise. Tap weights can be calculated using adaptive algorithms that minimize the error between the output of the FFE and a training sequence. Alternatively, they can be calculated from the pulse response and noise autocorrelation function [21].

Finally,  $SNR_{WC}$  can be calculated using (10). Fig. 3 (c) plots  $SNR_{WC}$  versus  $f_{3dB}/f_{baud}$  for both 2-PAM and 4-PAM receivers. The optimal bandwidth of a 3-tap FFE-based receiver for the 2-PAM case is around  $0.13 \times f_{baud}$ , and it offers 3.4 dB of  $SNR_{WC}$  improvement. In the case of 4-PAM, the optimal bandwidth is  $0.25 \times f_{baud}$  with 4.5 dB of  $SNR_{WC}$  improvement.

#### C. Decision Feedback Equalization (DFE)

Typical DFE-based optical receivers have a finite impulse response (FIR) feedback loop as shown in the block diagram in Fig. 4 (a). For an M-tap FIR DFE-based receiver, each tap is designed to eliminate the corresponding postcursor. This operation is illustrated in Fig. 4 (b) using two taps. The 2-PAM worst-case vertical eye opening is calculated as

$$V_{ISI} = |V_{A,0}| - \sum_{i<0} |V_{A,i}| - \sum_{i>M} |V_{A,i}| \tag{13}$$
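A minimal sketch of (13) follows, assuming ideal taps that cancel the first $M$ postcursors exactly. The cursor values are hypothetical, and `num_taps=None` is a convention introduced here to represent an ideal infinite-length DFE.

```python
def dfe_eye(cursors, main_index, num_taps=None):
    """2-PAM worst-case eye opening with an ideal M-tap FIR DFE.

    cursors    -- baud-spaced pulse-response samples
    main_index -- position of the main cursor in the list
    num_taps   -- number of cancelled postcursors; None means all of them
    """
    main = abs(cursors[main_index])
    precursor_isi = sum(abs(v) for v in cursors[:main_index])
    postcursors = cursors[main_index + 1:]
    residual = [] if num_taps is None else postcursors[num_taps:]
    return main - precursor_isi - sum(abs(v) for v in residual)

# Hypothetical low-bandwidth pulse response with a long postcursor tail.
pulse = [0.05, 0.50, 0.30, 0.15, 0.05]

eye_no_dfe = dfe_eye(pulse, main_index=1, num_taps=0)
eye_two_tap = dfe_eye(pulse, main_index=1, num_taps=2)
eye_infinite = dfe_eye(pulse, main_index=1)
print(eye_no_dfe, eye_two_tap, eye_infinite)
```

With no taps the function reduces to (5). In this example the eye is closed without equalization, two taps recover most of the opening, and the infinite-length case is limited only by the precursor.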

For an infinite-length DFE, all postcursors are removed, and the precursors limit the vertical eye opening. An infinite-length DFE can either be approximated with an analog feedback filter or a long digital FIR feedback filter. One challenge in DFE design is the feedback loop's timing requirement: the slicer output must propagate through the feedback filter to the slicer input within one baud interval. Digital DFE implementations address this with parallelism, implying a complexity and power consumption that increases exponentially with the number of taps [12]. Recently, however, novel DFE architectures break this difficult tradeoff allowing for the pipelining of DFE logic [22]–[24].

Fig. 5. Comparing all $SNR_{WC}$ curves for different equalization techniques for (a) 2-PAM modulation and (b) 4-PAM modulation.

Feedback signals are produced from the noiseless signal at the output of the slicer, so the feedback path of a DFE can be noiseless. The noise voltage at the input of the decision circuit, assuming a noiseless feedback loop, is

$$V_{n,out} = \sqrt{\int_0^\infty S_{out}(s)df} \tag{14}$$

Unlike FFE-based optical receivers, which enhance noise (see (12)), the DFE loop has no impact on the output-referred noise of the TIA. The $SNR_{WC}$ is calculated using (10) and plotted in Fig. 4 (c) versus $f_{3dB}/f_{baud}$ for both a 2-tap and an infinite-length DFE. The optimal bandwidth for 2-PAM signals is around 0.18 × the data rate (which equals the baud rate for 2-PAM), while it is 0.22 × the baud rate for 4-PAM signals. An ideal infinite-length DFE allows for a bandwidth reduction down to 0.04 × the baud rate before the impact of the precursors starts limiting $SNR_{WC}$. A two-tap DFE improves $SNR_{WC}$ by around 4 dB in the case of 2-PAM and by around 5.5 dB in the case of 4-PAM. As the number of taps increases, DFE curves approach the infinite-length curve.

## III. COMPARISON

An overlay of the $SNR_{WC}$ curves for all types of receivers is shown in Fig. 5. As seen, a 2-tap DFE-based receiver exhibits the highest $SNR_{WC}$. CTLE and 3-tap FFE-based optical receivers exhibit similar $SNR_{WC}$ improvement. FFE-based receivers exhibit less $SNR_{WC}$ improvement compared to DFE-based receivers because of the noise enhancement. Meanwhile, CTLE-based receivers provide less $SNR_{WC}$ improvement because, while they provide significant improvement in white noise suppression, they do not have any impact on colored noise. Finally, we note that $SNR_{WC}$ scales in proportion to the input current, $I_{pp}$, without affecting the optimal bandwidth in each case.

## IV. STATE-OF-THE-ART

Table II summarizes some of the most recently published high-speed 2-PAM and 4-PAM receivers. The receiver in [3] uses a 2-tap FFE and a 2-tap DFE and lowers bandwidth to optimize the sensitivity. It was found that the optimal bandwidth for 4-PAM receivers is higher (relative to baud rate but not bit rate) than for 2-PAM receivers for a given DFE size, especially when the effect of input jitter is included. This combination of 4-PAM modulation and input jitter amplification by the lower-bandwidth front-end [28] led to the choice of a 20 GHz front-end, which is $0.4 \times f_{baud}$. The number of taps is limited to two as more taps lead to increased power consumption while only providing marginal improvement in SNR. A high data rate of 100 Gb/s was achieved despite the low bandwidth.

TABLE II STATE-OF-THE-ART SUMMARY

| Specification | [3] | [13] | [25] | [26] | $[15]^1$ | [27] | $[14]^1$ | [12] |
|---|---|---|---|---|---|---|---|---|
| Technology | 28 nm bulk CMOS | 16 nm FinFET | 28 nm bulk CMOS | 40 nm CMOS | 40 nm CMOS | 16 nm FinFET | 22 nm FinFET | 14 nm FinFET |
| Data rate (Gb/s) | 100 | 106.25 | 112 | 32 | 64 | 50 | 128 | 64 |
| 2-PAM sensitivity at $BER = 10^{-12}$ (dBm OMA) | -11.1 | - | - | - | - | - | - | -9 |
| 4-PAM sensitivity at $BER = 2.4 \times 10^{-4}$ (dBm OMA) | -8.9 | -11 | -5.1 | -4.8 | $-14.5^{2}$ | -10.9 | $-13.8^{2}$ | - |
| PD/input capacitance (fF) | 70/115 | 10/- | 70/- | 100/150 | -/- | 35/90 | -/- | 69/100 |
| PD responsivity (A/W) | 1 | 0.96 | 0.5 | 0.8 | - | 1 | - | 0.52 |
| Energy efficiency (pJ/bit) | 3.9 | 0.57 | 0.96 | 4.03 | 6.56 | 1.38 | 0.1 | 1.4 |
| TIA bandwidth (GHz) | 20 | 27 | 60 | 4.8 | 12 | 30 | 64 | 15 |
| Equalization/bandwidth extension technique | Inductive peaking + 2-tap FFE + 2-tap FIR DFE | Series and shunt peaking | Series and shunt peaking | 2-tap FIR DFE | 3-tap DFE | T-coil and CTLE | Shunt inductive peaking | Series inductive peaking + 1-tap DFE<sup>3</sup> |
| Bandwidth/baud rate ratio | 0.4 | 0.51 | 1.07 | 0.3 | 0.375 | 0.6 | 1 | 0.23 |

<sup>1</sup> Electrical measurements only.

<sup>2</sup> Calculated from the reported input-referred noise assuming a responsivity of 1 A/W and an extinction ratio of 20 dB.

<sup>3</sup> A look-ahead speculative implementation of the DFE tap.

Reference [13] describes a full-bandwidth 4-PAM receiver where dc-coupled CMOS inverters are used in the entire signal path. A bandwidth of 27 GHz $(0.51 \times f_{baud})$ is achieved by using series inductive peaking at the input TIA stage and shunt inductive peaking between stages. It achieves a data rate of 106.25 Gb/s. Reference [14] describes a low-power SFTIA with shunt inductive peaking. A record-high speed of 128 Gb/s is achieved. However, only electrical measurements are reported, and the DC gain is 59.3 dBΩ, which is low compared to other receivers. Reference [25] describes a full-bandwidth SFTIA that uses both shunt and series peaking to achieve a high bandwidth of 60 GHz to support 112 Gb/s 4-PAM modulation. Reference [27] describes a 50 Gb/s receiver that uses T-coils in the TIA stage along with a CTLE stage to achieve a bandwidth of 30 GHz. All four receivers have $f_{3dB}/f_{baud} > 0.5$.

The receiver described in [26] optimizes SNR performance at the slicer input by limiting the bandwidth of the SFTIA to  $0.3 \times$  baud rate and uses a 2-tap DFE to eliminate the resulting ISI. This receiver achieves a data rate of 32 Gb/s while using a front-end bandwidth of only 4.8 GHz. Similarly, [15] is an optimized 64 Gb/s receiver that limits the bandwidth of the front-end to 12 GHz (0.375 × baud rate) and eliminates ISI by using a 3-tap DFE.

Reference [12] is a 64 Gb/s low-bandwidth 2-PAM receiver in which the TIA bandwidth is only 15 GHz, followed by a 1-tap DFE to remove the 1<sup>st</sup> postcursor. According to [12], the number of taps is limited to one as more taps only resulted in minor SNR improvement. The ratio of bandwidth to baud rate of the 4-PAM receiver in [3] (0.4) is almost twice that of this 2-PAM receiver (0.23), which is in line with our findings.
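The bandwidth-to-baud-rate ratios used in this comparison follow from the data rate, the modulation order, and the TIA bandwidth. The sketch below recomputes them for the limited-bandwidth receivers discussed above; the helper name and the data structure are illustrative, while the numbers are taken from Table II and the accompanying text.

```python
import math

def bandwidth_to_baud_ratio(data_rate_gbps, pam_levels, bandwidth_ghz):
    """Ratio of front-end bandwidth to baud rate for a PAM receiver."""
    bits_per_symbol = math.log2(pam_levels)
    baud_rate_gbaud = data_rate_gbps / bits_per_symbol
    return bandwidth_ghz / baud_rate_gbaud

# (data rate in Gb/s, PAM levels, TIA bandwidth in GHz)
receivers = {
    "[3]": (100.0, 4, 20.0),
    "[26]": (32.0, 4, 4.8),
    "[15]": (64.0, 4, 12.0),
    "[12]": (64.0, 2, 15.0),
}

ratios = {ref: round(bandwidth_to_baud_ratio(*spec), 3)
          for ref, spec in receivers.items()}
for ref, ratio in ratios.items():
    print(f"{ref}: bandwidth / baud rate = {ratio}")
```

The 2-PAM receiver in [12] comes out at about 0.23, while the 4-PAM receivers fall between 0.3 and 0.4, consistent with the higher optimal bandwidth (relative to baud rate) found for 4-PAM.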

It follows from Table II and this discussion that both full-bandwidth receivers that employ inductive peaking and limited-bandwidth equalized receivers are still in use. With the development of high-speed analog-to-digital converters and ADC-based front-ends that allow for sophisticated equalization, especially in sub-10 nm CMOS, we anticipate that low-bandwidth front-ends may see even more use in the future.

## V. CONCLUSION

This tutorial brief covered the optimization of the front-end of optical receivers. We looked into the different equalization techniques used and quantified the optimal bandwidths for 2-PAM and 4-PAM signaling. We found that the optimal bandwidth relative to the baud rate is higher in the case of 4-PAM modulation, but it is, in fact, lower relative to the bit rate. This is because of the $2 \times$ bit rate increase offered by 4-PAM. A review of the state-of-the-art optical receivers was presented. The ongoing trends are the implementation of bandwidth extension techniques and equalization techniques to enable the design of 4-PAM receivers capable of achieving the data rates required by the 400G Ethernet standard and emerging 800G and 1.6T standards. We anticipate that low-bandwidth techniques will see use in ADC-based nanoscale CMOS optical front-ends.

## REFERENCES

- [1] "Cisco global cloud index: Forecast and methodology, 2016–2021," San Jose, CA, USA, Cisco, White Paper. Accessed: Jan. 9, 2022. [Online]. Available: https://virtualization.network/Resources/Whitepapers/0b75cf2e-0c53-4891-918e-b542a5d364c5\_white-paperc11-738085.pdf
- [2] "IEEE P802.3bs 400GbE." [Online]. Available: http://www.ieee802.org/3/bs/ (Accessed: Dec. 27, 2021).
- [3] H. Li, C.-M. Hsu, J. Sharma, J. Jaussi, and G. Balamurugan, "A 100-Gb/s PAM-4 optical receiver with 2-Tap FFE and 2-tap direct-feedback DFE in 28-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 57, no. 1, pp. 44–53, Jan. 2022.
- [4] S. S. Mohan, M. D. M. Hershenson, S. P. Boyd, and T. H. Lee, "Bandwidth extension in CMOS with optimized on-chip inductors," *IEEE J. Solid-State Circuits*, vol. 35, no. 3, pp. 346–355, Mar. 2000.
- [5] E. Säckinger, "The transimpedance limit," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 57, no. 8, pp. 1848–1856, Aug. 2010.
- [6] D. Abdelrahman and G. E. R. Cowan, "Noise analysis and design considerations for equalizer-based optical receivers," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 66, no. 8, pp. 3201–3212, Aug. 2019.
- [7] S. Saeedi, S. Menezo, G. Pares, and A. Emami, "A 25 Gb/s 3D-integrated CMOS/silicon-photonic receiver for low-power high-sensitivity optical communication," *J. Lightw. Technol.*, vol. 34, no. 12, pp. 2924–2933, Jun. 15, 2016.
- [8] A. Sharif-Bakhtiar and A. C. Carusone, "A 20 Gb/s CMOS optical receiver with limited-bandwidth front end and local feedback IIR-DFE," *IEEE J. Solid-State Circuits*, vol. 51, no. 11, pp. 2679–2689, Nov. 2016.
- [9] S.-H. Huang and W.-Z. Chen, "A 25 Gb/s 1.13 pJ/b -10.8 dBm input sensitivity optical receiver in 40 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 52, no. 3, pp. 747–756, Mar. 2017.
- [10] B. Radi, M. S. Nezami, M. Taherzadeh-Sani, F. Nabki, M. Ménard, and O. Liboiron-Ladouceur, "A 22-Gb/s time-interleaved low-power optical receiver with a two-bit integrating front end," *IEEE J. Solid-State Circuits*, vol. 56, no. 1, pp. 310–323, Jan. 2021.
- [11] A. Sharif-Bakhtiar, M. G. Lee, and A. C. Carusone, "Low-power CMOS receivers for short reach optical communication," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, 2017, pp. 1–8.
- [12] I. Ozkaya *et al.*, "A 64-Gb/s 1.4-pJ/b NRZ optical receiver data-path in 14-nm CMOS FinFET," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, pp. 3458–3473, Dec. 2017.
- [13] K. R. Lakshmikumar *et al.*, "A process and temperature insensitive CMOS linear TIA for 100 Gb/s/λ PAM-4 optical links," *IEEE J. Solid-State Circuits*, vol. 54, no. 11, pp. 3180–3190, Nov. 2019.
- [14] S. Daneshgar, H. Li, T. Kim, and G. Balamurugan, "A 128 Gb/s PAM4 linear TIA with 12.6 pA/$\sqrt{\text{Hz}}$ noise density in 22nm FinFET CMOS," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, 2021, pp. 135–138.
- [15] K.-L. Fu and S.-I. Liu, "A 64-Gb/s PAM-4 optical receiver with amplitude/phase correction and threshold voltage/data level calibration," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 28, no. 7, pp. 1726–1735, Jul. 2020.
- [16] L. Szilagyi, J. Pliva, R. Henker, D. Schoeniger, J. P. Turkiewicz, and F. Ellinger, "A 53-Gbit/s optical receiver frontend with 0.65 pJ/bit in 28-nm bulk-CMOS," *IEEE J. Solid-State Circuits*, vol. 54, no. 3, pp. 845–855, Mar. 2019.
- [17] J. Singh *et al.*, "14-nm FinFet technology for analog and RF applications," *IEEE Trans. Electron Devices*, vol. 65, no. 1, pp. 31–37, Jan. 2018.
- [18] D. Li *et al.*, "A low-noise design technique for high-speed CMOS optical receivers," *IEEE J. Solid-State Circuits*, vol. 49, no. 6, pp. 1437–1447, Jun. 2014.
- [19] J. G. Proakis, *Digital Communications*. New York, NY, USA: McGraw-Hill, 2001, pp. 602–603.
- [20] F. Y. Liu *et al.*, "10-Gbps, 5.3-mW optical transmitter and receiver circuits in 40-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 47, no. 9, pp. 2049–2067, Sep. 2012.
- [21] J. Cioffi. "EE379A Notes, Chapter 3." [Online]. Available: https://cioffigroup.stanford.edu/ (Accessed: Jan. 26, 2022).
- [22] A. L. Pola, D. E. Crivelli, J. E. Cousseau, O. E. Agazzi, and M. R. Hueda, "A new low complexity iterative equalization architecture for high-speed receivers on highly dispersive channels: Decision feedforward equalizer (DFFE)," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, 2011, pp. 133–136.
- [23] A. L. Pola, J. E. Cousseau, O. E. Agazzi, and M. R. Hueda, "Efficient decision feedforward equalizer with parallelizable architecture," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, 2013, pp. 2771–2774.
- [24] J. Bailey *et al.*, "A 112-Gb/s PAM-4 low-power nine-tap sliding-block DFE in a 7-nm FinFET wireline receiver," *IEEE J. Solid-State Circuits*, vol. 57, no. 1, pp. 32–43, Jan. 2022.
- [25] H. Li, G. Balamurugan, J. Jaussi, and B. Casper, "A 112 Gb/s PAM4 Linear TIA with 0.96 pJ/bit energy efficiency in 28 nm CMOS," in *Proc. IEEE 44th Eur. Solid-State Circuits Conf. (ESSCIRC)*, 2018, pp. 238–241.
- [26] W.-H. Ho, Y.-H. Hsieh, B. Murmann, and W.-Z. Chen, "A 32 Gb/s PAM-4 optical transceiver with active back termination in 40 nm CMOS technology," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, 2020, pp. 1–4.
- [27] M. Raj *et al.*, "Design of a 50-Gb/s hybrid integrated si-photonic optical link in 16-nm FinFET," *IEEE J. Solid-State Circuits*, vol. 55, no. 4, pp. 1086–1095, Apr. 2020.
- [28] B. Casper and F. O'Mahony, "Clocking analysis, implementation and measurement techniques for high-speed data links-A tutorial," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 56, no. 1, pp. 17–39, Jan. 2009.