

Received 22 July 2020; revised 18 October 2020; accepted 3 November 2020. Date of current version 8 January 2021. Digital Object Identifier 10.1109/OJCAS.2020.3036531

# A 32 Gb/s PAM-4 Optical Transceiver With Active Back Termination in 40 nm CMOS Technology

WEI-HSIANG HO<sup>1</sup> (Member, IEEE), YI-HSUN HSIEH<sup>1</sup>, BORIS MURMANN<sup>2</sup> (Fellow, IEEE), AND WEI-ZEN CHEN<sup>1</sup> (Senior Member, IEEE)

<sup>1</sup>Department of Electronics Engineering and the Institute of Electronics, College of Electrical and Computer Engineering, National Chiao Tung University, Hsinchu 30010, Taiwan

<sup>2</sup>Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA

The editor coordinating the review of this article was Y. Dong.

CORRESPONDING AUTHOR: W-Z. CHEN (e-mail: wzchen@mail.nctu.edu.tw)

This work was supported in part by the Ministry of Science and Technology, Taiwan, under Contract 109-2221-E-009-133, in part by the Silicon Photonic Integrated Circuit Project, and in part by MediaTek.

**ABSTRACT** This article describes the design of a 32 Gb/s four-level pulse amplitude modulation (PAM-4) optical transceiver in a 40 nm CMOS technology. At the transmitter side, the laser driver is composed of an asymmetric waveform equalizer, a 3-tap feed-forward equalizer (FFE), and a novel active back termination (ABT) circuit. The ABT circuit provides a self-tracking, tunable source impedance to match the characteristic impedance of different laser diodes. At the receiver side, the fully integrated optical receiver consists of a transimpedance amplifier (TIA), a variable gain amplifier (VGA), an automatic threshold tracking circuit (ATC), and a quarter-rate decision feedback equalizer (DFE). The adaptive ATC reduces the BER induced by harmonic distortion along the signal path by more than $27 \times$ under a THD of -20 dB. Both the ATC and the DFE are automatically adapted by an on-chip sign-sign LMS (SS-LMS) engine. Fabricated in a TSMC 40 nm CMOS process, the transmitter and receiver occupy chip areas of about 0.029 mm<sup>2</sup> and 0.23 mm<sup>2</sup>, respectively. The PAM-4 transmitter and receiver consume about 146.8 mW and 128.8 mW, respectively.

**INDEX TERMS** Active back termination, automatic threshold tracking, DFE, FFE, PAM4 transceiver.

#### **I. INTRODUCTION**

Driven by the traffic demands of data centers, Ethernet development has now surged to 400 Gb/s and is expected to evolve to 800 Gb/s in the near future. To accommodate the ever-increasing data rate, multi-lane optical links play a main role in high-speed interconnects. When the data rate exceeds 50 Gb/s per lane, PAM-4 signaling is another key feature, providing a higher spectral efficiency than its NRZ counterpart. Because the eye height in PAM-4 signaling is reduced by approximately 9.5 dB under a constant power constraint, it imposes more challenges on the optical transceiver design in terms of linearity, input-referred noise, and tolerance to residual ISI.
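The 9.5 dB figure can be checked with a one-line calculation. It assumes equally spaced PAM-4 levels and the same peak-to-peak swing as the NRZ signal, so that each of the three stacked eyes is one third of the NRZ eye height:

$$20\log_{10}\left(\frac{1}{3}\right) \approx -9.54\ \text{dB}$$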

This article describes the design of a 32 Gb/s optical PAM-4 transceiver. The transmitter chip is composed of an electrical signal conditioning circuit followed by a current DAC to drive the laser diode. Typically, the output impedance of the driver circuit is matched to the characteristic impedance of the laser diode through a passive source resistance. In that case, half of the modulation current is consumed by the source resistance, which significantly degrades the energy efficiency of the laser driver. To overcome this shortcoming, a novel active back termination circuit is proposed. It can automatically track the impedance of the laser diode and reduce the extra power consumed by the source impedance while delivering the PAM-4 signals.

The receiver chip is designed from the perspective of minimizing input-referred noise. A high-gain and relatively low-bandwidth TIA ( $\sim 0.3 \times$ baud rate) is adopted as the input stage. The residual ISI due to the insufficient front-end bandwidth is further compensated by a DFE [1]–[4]. Given the nonlinearities of the E/O and O/E conversions and the receiver front-end, the threshold voltages of the PAM-4 slicers are critical to the overall bit error rate performance. To address this problem, an automatic threshold tracking circuit is proposed for the PAM-4 demodulator to tackle the nonlinearity of the transmitter and receiver. It can improve the BER of the PAM-4 demodulator by $27 \times$ compared to the prior art [5], which uses an evenly spaced quantizer.

FIGURE 1. The optical TX architecture.

This article is organized as follows. Section II describes the transmitter design. Section III introduces the receiver design. Section IV summarizes the experimental results. Section V concludes this research work.

## **II. TRANSMITTER DESIGN**

#### A. VCSEL CHARACTERISTICS AND EQUALIZATION

850 nm vertical-cavity surface-emitting lasers (VCSELs) coupled with multi-mode optical fibers are cost-effective for high-speed, short-reach data links. Because a VCSEL's effective bandwidth and relaxation oscillation behavior depend on the current level [6], their effects become more pronounced in a PAM-4 transmitter with a higher extinction ratio (ER). To overcome these design issues, nonlinear transmitters using high-speed directly modulated lasers (DMLs) and a Volterra equalizer have been demonstrated for operation beyond 100 Gb/s [7]. However, this approach demands a sophisticated DSP with high computing power. An alternative approach is to perform nonlinear equalization in the analog domain [8], [9]. In this design, the rising and falling PAM-4 edges are equalized separately, based on the data-dependent transient response at the four current levels, to improve the eye quality in the optical domain.

#### **B. PAM-4 TRANSMITTER ARCHITECTURE**

Fig. 1 depicts the transmitter architecture. It is composed of a 3-tap FFE pre-driver followed by a 2-bit current DAC that performs PAM-4 modulation when driving the laser diode. Additionally, an on-chip PRBS-7 generator is built in to facilitate chip measurement. Because the output node is loaded by heavy parasitic capacitance, a feed-forward equalizer (FFE) is utilized to extend the signal bandwidth along the data path. The FFE is a linear equalizer, however, and is less effective in equalizing the data-dependent frequency response of the VCSEL. An asymmetric equalizer (AEQ) is therefore adopted, which injects compensation current into the laser diode through a current combiner. Fig. 2 illustrates the output current waveform of the asymmetric equalizer. To compensate for the unequal rise and fall time responses of the laser diode, it provides de-emphasis at the rising data edge and pre-emphasis at the falling data edge [9]. In addition, instead of using a fixed equalization delay, the delay times for edge equalization $(t_1, t_2)$ can be adjusted according to the current level of the data output. Fig. 2(b) illustrates the circuit schematic of the pulse generator and the delay line for the pulse width control of the asymmetric equalizer, together with the corresponding control logic. The FFE and the asymmetric equalizer together switch the current DACs shown in Fig. 2(c), whose outputs are summed to modulate the VCSEL. A cascode output stage is utilized to protect the gate oxide from breakdown.

FIGURE 2. (a) Asymmetric equalizer output waveform (b) Edge detector and its delay line and control logic (c) Current combiner.

A conventional laser diode driver uses a resistive load as the source termination to absorb the reflected signal power. Because half of the modulation current at the driver output is consumed by the source resistance, the energy efficiency of the transmitter is degraded. To overcome this drawback, an active back termination (ABT) circuit is proposed in this design instead of a passive termination. As the characteristic impedance of off-the-shelf laser diodes varies from vendor to vendor, the input impedance of the ABT is designed to automatically track the impedance of the laser diode.

FIGURE 3. Active back termination circuit concept.

FIGURE 4. (a) Buffer stage of PAM-4 transmitter and (b) impedance tracking circuit.

## C. DRIVER WITH ACTIVE BACK TERMINATION MATCHING NETWORK

Fig. 3 illustrates the design concept of the active back termination (ABT) circuit [10]. It is composed of a dummy driver operating at a fraction of the modulation current $(I_{\rm mod}/k)$, followed by a unity-gain voltage buffer that is attached to the main current-mode output driver. When the modulation current is switched to the output load, the dummy driver is activated simultaneously so that the input and output voltages of the unity-gain buffer are the same. No signal current then flows through the buffer output impedance, and $I_{\rm mod}$ is fully delivered to the laser diode ($R_{\rm LD}$). On the other hand, the output impedance of the unity-gain buffer $(R_{\rm amp})$ is designed to equal $R_{\rm LD}$, so that it absorbs the reflected energy when the modulation current is switched off. In contrast to the prior art, an automatic impedance tracking circuit (ITC) is proposed and incorporated into the ABT. It can accommodate different laser diodes while maintaining impedance matching across a wide frequency range.
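The current-delivery argument can be illustrated with a minimal numerical sketch. It treats the laser as a pure resistance, ignores the bias current of the unity-gain buffer, and uses hypothetical values for the laser impedance and the scaling ratio `k`; the function names are illustrative and are not taken from the design.

```python
def passive_termination(i_mod, r_ld, r_term):
    """Current divider: the driver current splits between the laser (r_ld)
    and a passive back-termination resistor (r_term)."""
    i_ld = i_mod * r_term / (r_term + r_ld)
    return i_ld, i_mod - i_ld


def active_back_termination(i_mod, k):
    """Idealized ABT: the full i_mod reaches the laser, and the dummy
    driver draws an extra i_mod / k (buffer bias current is ignored)."""
    return i_mod, i_mod / k


I_MOD = 5e-3   # 5 mA modulation current, as reported in the measurements
R_LD = 70.0    # hypothetical laser impedance inside the 60-80 ohm range
K = 8          # hypothetical dummy-driver scaling ratio

i_ld_passive, i_lost = passive_termination(I_MOD, R_LD, r_term=R_LD)
i_ld_abt, i_dummy = active_back_termination(I_MOD, K)

# Fraction of the switched current that actually modulates the laser.
passive_fraction = i_ld_passive / I_MOD
abt_fraction = i_ld_abt / (i_ld_abt + i_dummy)

print(f"passive termination: {passive_fraction:.3f} of the current reaches the laser")
print(f"ABT with k = {K}:      {abt_fraction:.3f} of the current reaches the laser")
```

Under these assumptions the matched passive termination delivers half of the switched current, while the idealized ABT delivers k/(k+1) of it.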

FIGURE 5. Simulated (a) Input impedance (b) S<sub>11</sub> of the ABT.

Fig. 4(a) depicts the circuit schematic of the buffer stage of the transmitter. A T-coil is adopted for broadband impedance matching, with $C_{\rm DC} = 3 \text{ pF}$ and $L_1$ and $L_2$ of about 128 pH. The unity-gain buffer in the ABT is implemented as a source follower with a tunable gate resistance composed of $M_p$ and $R_p$, which acts as the dummy resistance ($R_{\rm dum}$) at the buffer input. $R_{\rm dum}$ is adjustable through a source impedance tracking circuit. Fig. 4(b) shows the circuit of the ITC. Here $M_{\rm p6}$ and $R_{\rm px}$ are replicas of $M_p$ and $R_p$ in the ABT. $R_{\rm ref}$ is customized according to $R_{\rm LD}$, so $R_{\rm dum}$ automatically tracks $R_{\rm ref}$ (and $R_{\rm LD}$) through the feedback configuration.

$$R_{\rm ref} = \alpha R_{\rm LD} = R_{\rm px} + r_{M_{\rm p6}} = R_{\rm p} + r_{M_{\rm p}} \tag{1}$$

Meanwhile, $M_{n1}$–$M_{n3}$, $M_{p1}$–$M_{p4}$, and $R_{ref}$ are configured as a constant-$g_m$ bias circuit. It generates the bias voltages $V_b$ and $V_c$ that are applied to the ABT. We have

$$\frac{1}{R_{\rm ref}} = \frac{1}{\alpha R_{\rm LD}} = \frac{g_{M_{\rm n}}}{\alpha} \tag{2}$$

where  $\alpha$  is a scaling factor determined by the device ratio of ABT and ITC in order to save the power consumption in the ITC. Since

$$R_{\rm amp} = R_{\rm LD} = \frac{1}{g_{M_{\rm n}}},\tag{3}$$

the source follower output impedance will also track $R_{\rm LD}$ concurrently. Fig. 5 shows the input impedance and $S_{11}$ of the ABT. The increase in the input impedance at low frequency is due to the AC-coupling capacitors. According to the simulation results, the ABT can track $R_{\rm LD}$ from 60 to 80 $\Omega$ with $S_{11}$ less than -10 dB over a 30 GHz bandwidth.



FIGURE 6. A simplified schematic of SF-TIA and its noise contributors.



FIGURE 7. SF-TIA receiver SNR at different TIA bandwidth-baud rate ratios.

## **III. RECEIVER DESIGN**

A shunt-feedback TIA (SF-TIA) is commonly used as the input stage of an optical receiver to achieve a lower input-referred noise. Fig. 6 shows a simplified schematic of the SF-TIA. It consists of a core amplifier $A_{\rm C}(s)$ with a feedback resistor $R_{\rm F}$. $C_{\rm in,tot}$ and $C_{\rm D}$ respectively denote the input and output capacitances of the core amplifier. Using a single-pole approximation of the core amplifier, $A_{\rm C}(s) = A_0 / (1 + s / \omega_A)$, the TIA gain response can be derived as [11], [12]

$$T_{\rm Z}(s) = T_{\rm Z0} \cdot \frac{\omega_{\rm n}^2}{s^2 + 2\zeta \omega_{\rm n} s + \omega_{\rm n}^2} \tag{4}$$

where  $T_{Z0}$ ,  $\omega_n$  and  $\zeta$  respectively represent the midband transimpedance gain, natural frequency and damping factor, and can be derived as

$$T_{\rm Z0} = \frac{A_0 R_{\rm F}}{1 + A_0} \tag{5}$$

$$\omega_{\rm n} = \sqrt{\frac{(1+A_0)\omega_{\rm A}}{R_{\rm F}C_{\rm in,tot}}} \tag{6}$$

$$\zeta = \frac{1}{2} \cdot \frac{1 + \omega_{\rm A} R_{\rm F} C_{\rm in,tot}}{\sqrt{(1 + A_0) \omega_{\rm A} R_{\rm F} C_{\rm in,tot}}} \tag{7}$$
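As a numerical illustration, the sketch below evaluates (5)-(7) directly. The component values are hypothetical placeholders chosen only to exercise the formulas; they are not the values used in the fabricated TIA.

```python
import math


def sf_tia_parameters(a0, omega_a, r_f, c_in_tot):
    """Evaluate Eqs. (5)-(7) for a shunt-feedback TIA whose core amplifier
    has DC gain a0 and a single pole at omega_a (rad/s)."""
    tz0 = a0 * r_f / (1.0 + a0)                                  # Eq. (5)
    omega_n = math.sqrt((1.0 + a0) * omega_a / (r_f * c_in_tot))  # Eq. (6)
    x = omega_a * r_f * c_in_tot
    zeta = 0.5 * (1.0 + x) / math.sqrt((1.0 + a0) * x)            # Eq. (7)
    return tz0, omega_n, zeta


# Hypothetical example values (assumptions, not design data).
A0 = 20.0
OMEGA_A = 2.0 * math.pi * 5e9   # core-amplifier pole at 5 GHz
R_F = 2e3                       # feedback resistor in ohms
C_IN_TOT = 150e-15              # total input capacitance in farads

tz0, omega_n, zeta = sf_tia_parameters(A0, OMEGA_A, R_F, C_IN_TOT)
print(f"T_Z0 = {tz0:.1f} ohm")
print(f"f_n  = {omega_n / (2.0 * math.pi) / 1e9:.2f} GHz")
print(f"zeta = {zeta:.3f}")
```

Sweeping `R_F` in such a script shows how the damping factor and natural frequency move together, which is the gain and bandwidth trade-off discussed next.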

For a maximally-flat gain response ( $\zeta = 0.707$ ), the TIA gain has a direct trade-off with its -3-dB bandwidth given the  $A_0$  and  $C_{PD}$ . That is,

$$R_{\rm F} = \frac{\sqrt{2}A_0}{\omega_{\rm TIA}C_{\rm PD}} \tag{8}$$

Considering the constraint of a limited gain bandwidth product  $(GBW_A)$  of the core amplifier, according to (6), the TIA gain can be derived as

$$R_{\rm F} = \frac{(A_0 + 1)\omega_{\rm A}}{C_{\rm in,tot}\omega_{\rm n}^2} \sim \frac{\rm GBW_{\rm A}}{C_{\rm in,tot}\omega_{\rm TIA}^2} \tag{9}$$

VOLUME 2, 2021

On the other hand, the input-referred noise spectral density of the TIA can be expressed as [11]

$$\overline{I_{n,\rm in}^2(f)} = \frac{4kT}{R_{\rm F}} + \frac{4kT\gamma}{G_m R_{\rm F}^2} + \frac{4kTR_{\rm D}}{A_0^2 R_{\rm F}^2} + 4kT\gamma \frac{(2\pi C_{\rm in,tot})^2}{G_m} f^2 \tag{10}$$

The input referred noise current can be derived as

$$I_{n,\rm rms}^2 = \int_0^\infty \overline{I_{n,\rm in}^2(f)}\, df \propto f(\omega_{\rm TIA}, C_{\rm PD}) \tag{11}$$

According to (11), it is preferable to choose a TIA with a narrower $\omega_{\text{TIA}}$ to minimize $I_{n,\text{rms}}^2$, but at the expense of increased ISI. Assuming that $R_{\text{F}}$ is boosted by $N^2$ times while $\omega_{\text{TIA}}$ is reduced by a factor of N, (10) can be rewritten as

$$\overline{I_{n,\rm in}^{2}(f)} = \frac{4kT}{N^{2}R_{\rm F}} + \frac{4kT\gamma}{N^{4}G_{m}R_{\rm F}^{2}} + \frac{4kT}{N^{4}G_{m}^{2}R_{\rm D}R_{\rm F}^{2}} + 4kT\gamma\frac{(2\pi C_{\rm in,tot})^{2}}{G_{m}}f^{2} \tag{12}$$
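The scaling from (10) to (12) can be checked numerically. The sketch below evaluates the four noise terms for a scaling factor `n`, using $A_0 = G_m R_{\rm D}$ to link the third terms of (10) and (12). All device values are hypothetical and serve only to exercise the expressions.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def noise_terms(f, r_f, g_m, r_d, c_in_tot, gamma=1.0, temp=300.0, n=1.0):
    """Return the four terms of Eq. (10) (n = 1) or Eq. (12) (general n),
    where the feedback resistor is scaled to n**2 * r_f."""
    kt4 = 4.0 * K_B * temp
    a0 = g_m * r_d                       # core-amplifier DC gain
    r = n ** 2 * r_f                     # scaled feedback resistor
    term_rf = kt4 / r
    term_gm = kt4 * gamma / (g_m * r ** 2)
    term_rd = kt4 * r_d / (a0 ** 2 * r ** 2)
    term_f2 = kt4 * gamma * (2.0 * math.pi * c_in_tot) ** 2 / g_m * f ** 2
    return term_rf, term_gm, term_rd, term_f2


# Hypothetical device values (assumptions for illustration only).
PARAMS = dict(r_f=500.0, g_m=50e-3, r_d=200.0, c_in_tot=150e-15)

base = noise_terms(1e9, n=1.0, **PARAMS)
scaled = noise_terms(1e9, n=2.0, **PARAMS)

for name, b, s in zip(("R_F", "G_m", "R_D", "f^2"), base, scaled):
    print(f"{name:>4} term ratio (n=2 / n=1): {s / b:.4f}")
```

The feedback-resistor term drops by $N^2$ and the two amplifier terms by $N^4$, while the $f^2$ term is unchanged, which is why the narrower integration bandwidth is what reduces its contribution.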

The ISI effect will degrade the vertical eye opening (VEO) at the input of the decision circuit. The VEO can be expressed as

$$\mathrm{VEO} = V_{h,0} - \sum_{n \neq 0} |V_{h,n}| \tag{13}$$

To find the best TIA bandwidth in the presence of both circuit noise and ISI, the signal integrity is quantified by the ratio of the VEO to the root-mean-square (rms) noise at the output of the TIA. It can be derived as

$$\mathrm{SNR} = \frac{\mathrm{VEO}}{2\sqrt{V_{n,\rm out}^2}} \tag{14}$$

Fig. 7 shows the corresponding SNR under different TIA bandwidths for PAM-4 signaling. With a $C_{\rm PD}$ of 100 fF, a 10 $\mu$A input current, and a device $f_{T}$ of 200 GHz, the unequalized TIA has an optimal SNR at $\omega_{\text{TIA}}$ of around 0.5× baud rate. The SNR is limited by ISI in the narrow-band region and deteriorates in the wide-band region due to circuit noise. Since the ISI can be eliminated by incorporating an equalizer, the SNR can be further improved by exploring the design space in the narrow-band region. Narrow-band TIAs with equalizers, such as a decision feedback equalizer (DFE) [1]–[4] or a continuous-time linear equalizer (CTLE) [13], have been proposed to demonstrate improved receiver sensitivity. Because the CTLE restores the TIA bandwidth by providing high-frequency gain peaking, it also boosts the high-frequency noise. Meanwhile, if the CTLE frequency response is not perfectly matched with that of the narrow-band TIA, in-band gain peaking becomes unavoidable. The corresponding group delay variations will also degrade the data jitter performance. Thus, bandwidth extension with a CTLE should be carefully managed to maintain the fidelity of the input signal.



FIGURE 8. N-tap DFE equalized TIA SNR at different TIA bandwidth-baud rate ratios.



FIGURE 9. The optical RX architecture.



FIGURE 10. Schematic of the proposed AFE.

Another commonly used technique to remove post cursor ISI without amplifying the high frequency noise is by means of DFE. For an m-tap DFE, the VEO<sub>DFE</sub> can be approximated by

$$\mathrm{VEO}_{\rm DFE} = V_{h,0} - \sum_{n < 0} |V_{h,n}| - \sum_{n > m} |V_{h,n}| \tag{15}$$

where $|V_{h,n}|$ for n < 0 and n > m represents the residual ISI. Fig. 8 illustrates the SNR versus TIA bandwidth when incorporating DFEs with different numbers of taps. By incorporating a single-tap DFE, the TIA bandwidth can be reduced to $0.3 \times$ baud rate while improving the SNR by approximately 1.5 dB compared to that of a receiver with $\omega_{TIA} \sim 0.5 \times$ baud rate. The additional SNR gain becomes minor (~1 dB) when the TIA bandwidth is further reduced to $0.15 \times$ baud rate, and a 3-tap DFE is then required, which is less energy efficient considering the extra power consumed by the DFE. In this design, a 2-tap DFE is embedded to tolerate the TIA bandwidth variation associated with parasitic capacitance.
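Expressions (13)-(15) are straightforward to evaluate once the sampled pulse response is known. The sketch below is a minimal illustration with a hypothetical pulse response and noise level; the numbers are not taken from Fig. 7 or Fig. 8, and expressing the ratio of (14) in dB with a factor of 20 is an assumption about how the SNR is plotted.

```python
import math


def veo(main, pre, post, dfe_taps=0):
    """Vertical eye opening per Eq. (13) (dfe_taps = 0) or Eq. (15).
    pre  : pre-cursor samples V_h,n for n < 0
    post : post-cursor samples V_h,1, V_h,2, ... in order
    An ideal m-tap DFE cancels the first m post-cursors exactly."""
    residual_post = post[dfe_taps:]
    isi = sum(abs(v) for v in pre) + sum(abs(v) for v in residual_post)
    return main - isi


def snr_db(veo_value, vn_rms):
    """Eq. (14) expressed in dB (amplitude ratio, hence the factor 20)."""
    return 20.0 * math.log10(veo_value / (2.0 * vn_rms))


# Hypothetical normalized pulse response of a narrow-band front-end.
MAIN = 1.0
PRE = [0.05]
POST = [0.30, 0.10, 0.03]
VN_RMS = 0.05   # hypothetical rms output noise, same units as MAIN

for taps in range(4):
    opening = veo(MAIN, PRE, POST, dfe_taps=taps)
    print(f"{taps}-tap DFE: VEO = {opening:.2f}, SNR = {snr_db(opening, VN_RMS):.2f} dB")
```

With this particular response most of the benefit comes from the first tap, which mirrors the diminishing return from additional taps described above.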



FIGURE 11. Analog front-end simulation results (a) Frequency response. (b) Group delay.

#### A. PAM-4 RECEIVER ARCHITECTURE

Fig. 9 illustrates the receiver architecture, which is composed of a TIA and a VGA followed by a quarter-rate time-interleaved DFE. To minimize the input-referred noise of the receiver, the receiver bandwidth is designed to be $0.35 \times$ baud rate, while the induced ISI is compensated by a 2-tap DFE. The average photocurrent and the input-referred offset voltage of the analog front-end (AFE) are subtracted by the offset cancellation network (OCN).

As the nonlinearities due to the E/O, O/E conversions and the front-end circuitries will lead to an uneven eye height at the receiver side, the threshold voltages of the data slicers for PAM-4 demodulation are critical to the overall BER performance. To tackle this problem, an automatic threshold tracking circuit (ATC) is adopted. It can extract the threshold levels of the PAM-4 signals based on the measured eye height, and thus greatly improve the BER performance given the nonlinearities of the signal chain. Both the ATC and DFE share the same adaptation engine based on SS-LMS algorithm, which is also integrated on the same chip.

The quarter-rate sampling clocks are generated from an external half-rate input clock. The input clock is first divided to the quarter rate by a current-mode logic (CML) divider, and the divider output is converted to CMOS levels by a CML-to-CMOS level converter.

#### **B. RECEIVER FRONT-END**

Fig. 10 shows the circuit schematic of the TIA. A pseudo-differential architecture is adopted to improve common-mode noise suppression. The core amplifier is composed of a three-stage cascaded amplifier with active feedback to provide sufficient gain and bandwidth. With the conversion gain determined by the feedback resistor, the dual-feedback TIA provides a 2 k$\Omega$ conversion gain with a 3 dB bandwidth of 4.8 GHz. The phase margin of the nested-feedback TIA is about 77°.

The TIA output is amplified by a 3-stage cascaded variable gain amplifier (VGA), whose stages are source-degenerated to improve circuit linearity. To drive the 4-path quarter-rate DFE, both inductive shunt peaking and capacitive zero peaking are adopted to compensate for the heavy capacitive load of the following stage.

Fig. 11 shows the simulation results of the AFE. The input-referred noise current is about 1.29  $\mu$ A<sub>rms</sub>. The overall conversion gain is 73.7 dB $\Omega$ , and the bandwidth is 5.6 GHz.



FIGURE 12. (a) PAM-4 2-tap DFE and (b) its timing diagram and waveform.

The output swing of the PAM-4 signal is 700 mV<sub>pp</sub>. Since the VGA extends the signal bandwidth by 1.17 times, the output SNR of the AFE will be decreased by about 0.7 dB. It is manageable by the decision circuit.
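The bandwidth ratios and the 0.7 dB figure can be cross-checked from the numbers quoted in the text. The sketch below assumes that the output noise is approximately white, so that the integrated noise power scales in proportion to bandwidth; that assumption is ours and is used only to show that the quoted numbers are mutually consistent.

```python
import math

BIT_RATE = 32e9          # 32 Gb/s
BITS_PER_SYMBOL = 2      # PAM-4 carries 2 bits per symbol
TIA_BW = 4.8e9           # TIA 3 dB bandwidth quoted in the text
AFE_BW = 5.6e9           # overall AFE bandwidth quoted in the text

baud_rate = BIT_RATE / BITS_PER_SYMBOL        # 16 GBd
tia_ratio = TIA_BW / baud_rate                # TIA bandwidth / baud rate
afe_ratio = AFE_BW / baud_rate                # AFE bandwidth / baud rate
bw_extension = AFE_BW / TIA_BW                # bandwidth extension by the VGA

# White-noise assumption: noise power grows linearly with bandwidth.
snr_penalty_db = 10.0 * math.log10(bw_extension)

print(f"TIA bandwidth / baud rate = {tia_ratio:.2f}")
print(f"AFE bandwidth / baud rate = {afe_ratio:.2f}")
print(f"bandwidth extension       = {bw_extension:.2f}x")
print(f"estimated SNR penalty     = {snr_penalty_db:.2f} dB")
```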

#### C. QUARTER RATE DFE

Fig. 12(a) shows the quarter-rate DFE architecture. Each sampled path is buffered by a transconductance amplifier (GM) to isolate kickback noise from the quarter-rate DFEs. A detailed timing diagram of the DFE receiver is shown in Fig. 12(b). The PAM-4 data is demodulated through time-interleaved 2-bit flash ADCs. The flash ADC outputs modulate 3-bit current DACs, whose outputs are fed into the summer to perform the DFE function. The weighting factors of the DFE ( $W_1$ and $W_2$ ) and the threshold voltages ( $V_{TH}$ ) of the PAM-4 data slicers are automatically adjusted by an on-chip SS-LMS engine. To circumvent the speed bottleneck of the critical path, speculative error quantizers are also implemented.

Fig. 13 shows the circuit schematic of the summer in the quarter rate DFE. In order to maintain the signal integrity of the PAM-4 input signal, a differential amplifier with source degeneration and zero peaking is utilized as the input stage. The feedback signals are connected to the summer through a Gilbert-cell multiplier ( $M_{3\sim6}$ ), whose gain coefficient is controlled by W<sub>1</sub>. Compared to the prior art in [4], [14], it maintains a constant output common mode voltage level of the summer irrespective of its weighting coefficients, which is crucial to the proper operation of the succeeding data slicer.



FIGURE 13. Schematic of the summer.



FIGURE 14. Schematic of the comparator and its delay.

The data slicer for the PAM-4 demodulator is based on a StrongArm latch, as is shown in Fig. 14. It provides rail-to-rail output swing without consuming DC current. The differential-difference preamplifier of the 2-bit flash ADC can reduce the kick-back noise generated from the StrongArm latches. The time delay of comparators is less than 0.5 UI, given that the input signal swing is larger than 40 mV. The input referred offset voltage and noise of the comparator are about 14.1 mV<sub>rms</sub> and 1.73 mV<sub>rms</sub>, respectively.

#### D. ADAPTATION SCHEME

As shown in Fig. 15(a), the PAM-4 signal levels are represented as $(V_{LV3}, V_{LV1}, -V_{LV1}, -V_{LV3})$, while the thresholds in between are defined as $V_Z$ and $\pm V_{TH}$. In the DFE receiver, the error signals generated from the data slicers determine the adaptation of the equalizer coefficients as well as the phase adjustments of the clock and data recovery circuit. As only 2-bit ADCs are adopted in the PAM-4 demodulator, the linearity of the receiver front-end and a proper setup of the threshold levels ($V_Z$ and $\pm V_{TH}$) are critical to the overall system performance. Reference [5] detected the peak input signal swing, $V_{peak}$, and then generated the threshold levels based on an equal eye-height assumption. This assumption can hardly hold given the circuit nonlinearity and ISI effects on the input signal, as shown in Fig. 15(b).

FIGURE 15. (a) PAM-4 signal level (b) Level tracking of [5] (c) Proposed ATC [15].

FIGURE 16. Adaptation scheme.

FIGURE 17. (a) Improper initialization of ATC (b) Proper initialization of ATC.

FIGURE 18. Numerical analysis of THD vs BER.

To overcome this design challenge, an automatic tracking circuit for non-evenly spaced threshold levels is proposed and adopted in this design [15]. As shown in Fig. 15(c), to properly set the slicer thresholds, the input signals at the different levels ($V_{LV3}$, $V_{LV1}$, $-V_{LV1}$, $-V_{LV3}$) are tracked independently through a pattern filter, and the threshold voltages ($V_{TH}$, $-V_{TH}$) of the data slicers are set right in the middle by averaging ($V_{LV3}$, $V_{LV1}$) and ($-V_{LV1}$, $-V_{LV3}$), respectively.



FIGURE 19. (a) Transmitter and (b) receiver micrograph of chip and optical component assembly.

As shown in Fig. 16, the level tracking of $(V_{LV3}, V_{LV1}, -V_{LV1}, -V_{LV3})$ shares the same SS-LMS-based engine as the DFE. The automatic threshold tracking and the weighting coefficient update for the DFE can be represented as

$$V_{\rm LV3}[n+1] = V_{\rm LV3}[n] + e_3[n] \tag{16}$$

$$V_{\rm LV1}[n+1] = V_{\rm LV1}[n] + e_1[n] \tag{17}$$

$$V_{\rm TH} = (V_{\rm LV3} + V_{\rm LV1})/2 \tag{18}$$

$$W_k[n+1] = W_k[n] + e[n]D_Z[n-k] \tag{19}$$

where n is the time instant, k is the tap index, $D_Z[n]$ is the received data, and e[n] is the difference between the received signal level and the desired data level. To circumvent the speed bottlenecks in the critical paths, speculative error quantizers are also implemented in this design.
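The update rules can be sketched in software as follows. This is a behavioral illustration under several assumptions that are not stated in the text: the error terms are quantized to one bit and scaled by an explicit step size `mu`, the pattern filter is given ideal symbol decisions, the samples are noise-free, and only the positive levels are tracked (the negative ones would be handled the same way). The function names and numeric values are hypothetical.

```python
def sign(x):
    """1-bit quantizer: +1 for non-negative input, -1 otherwise."""
    return 1 if x >= 0 else -1


def track_levels(samples, symbols, v_lv3, v_lv1, mu):
    """Track the two positive PAM-4 levels in the spirit of Eqs. (16)-(18).
    The pattern filter is modeled by updating a level only when the decided
    symbol belongs to it (+3 for V_LV3, +1 for V_LV1)."""
    for y, d in zip(samples, symbols):
        if d == 3:
            v_lv3 += mu * sign(y - v_lv3)
        elif d == 1:
            v_lv1 += mu * sign(y - v_lv1)
    v_th = 0.5 * (v_lv3 + v_lv1)   # Eq. (18): threshold at the midpoint
    return v_lv3, v_lv1, v_th


def update_taps(weights, error, history, mu):
    """Sign-sign form of Eq. (19). weights[0] is W_1 and history[0] is
    D_Z[n-1], so index k here corresponds to tap k + 1."""
    return [w + mu * sign(error) * sign(history[k]) for k, w in enumerate(weights)]


# Hypothetical, deliberately uneven received levels (compressed top eye).
TRUE_LEVELS = {3: 0.27, 1: 0.10, -1: -0.10, -3: -0.27}
symbols = [3, 1, -1, -3] * 1000
samples = [TRUE_LEVELS[d] for d in symbols]
MU = 0.001

# The initial V_LV3 is set above twice the initial V_LV1.
v_lv3, v_lv1, v_th = track_levels(samples, symbols, v_lv3=0.6, v_lv1=0.2, mu=MU)
print(f"V_LV3 = {v_lv3:.3f}, V_LV1 = {v_lv1:.3f}, V_TH = {v_th:.3f}")
```

Because only the sign of the error is used, each tracked level settles into a small dither of about one step size around its target, and the threshold follows the midpoint of the two tracked levels rather than a fixed fraction of the peak.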

Tracking $(V_{LV3}, V_{LV1})$ and $(-V_{LV1}, -V_{LV3})$ separately relies on the operation of a pattern filter. An incorrect initialization of $V_{LV3}[0]$ and $V_{LV1}[0]$ may cause the threshold level to become stuck at an improper level [15], as shown in Fig. 17(a). This issue can be circumvented by detecting the probability distribution of the output data and setting $V_{LV3}[0]$ to be higher than 2 times $V_{LV1}$, as shown in Fig. 17(b).

Fig. 18 illustrates the BER performance and the improvement over the prior art under different THD conditions, assuming that the transfer function of the receiver front-end is modeled as $V_{out} = V_{in} - \alpha V_{in}^3$, the input signal amplitude is 300 mV, and the rms noise of the data slicer and ISI is 14 mV. It reveals that the BER can be improved by $27 \times$ when the THD of the receiver front-end is -20 dB.
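The effect of the cubic nonlinearity on threshold placement can be illustrated with a short sketch. It does not reproduce Fig. 18. It assumes that the THD is dominated by the third harmonic of a full-amplitude sine wave, for which the cubic model gives a harmonic ratio of $(\alpha A^2/4)/(1 - 3\alpha A^2/4)$, and it compares only the worst-case level-to-threshold margin of evenly spaced thresholds against midpoint thresholds. Function names are hypothetical.

```python
def front_end(v_in, alpha):
    """Memoryless cubic model of the receiver front-end."""
    return v_in - alpha * v_in ** 3


def alpha_for_hd3(hd3, amplitude):
    """Cubic coefficient giving a third-harmonic ratio hd3 for a sine wave
    of the given amplitude (assumes THD is dominated by HD3)."""
    return 4.0 * hd3 / (amplitude ** 2 * (1.0 + 3.0 * hd3))


def worst_margin(levels, thresholds):
    """Smallest distance between a threshold and its two adjacent levels.
    levels: 4 ascending values, thresholds: 3 ascending values."""
    margins = []
    for i, thr in enumerate(thresholds):
        margins.append(thr - levels[i])
        margins.append(levels[i + 1] - thr)
    return min(margins)


AMPLITUDE = 0.3                         # 300 mV input amplitude from the text
alpha = alpha_for_hd3(0.1, AMPLITUDE)   # -20 dB corresponds to a ratio of 0.1

levels = [front_end(AMPLITUDE * s / 3.0, alpha) for s in (-3, -1, 1, 3)]
v_peak = levels[-1]

# Evenly spaced thresholds derived from the detected peak (equal eye heights).
uniform = [-2.0 * v_peak / 3.0, 0.0, 2.0 * v_peak / 3.0]
# Midpoint thresholds between adjacent tracked levels, as in Eq. (18).
midpoint = [0.5 * (levels[i] + levels[i + 1]) for i in range(3)]

print(f"alpha = {alpha:.3f} 1/V^2")
print(f"worst margin, evenly spaced thresholds: {worst_margin(levels, uniform) * 1e3:.1f} mV")
print(f"worst margin, midpoint thresholds:      {worst_margin(levels, midpoint) * 1e3:.1f} mV")
```

Under these assumptions the compressed outer eyes leave the evenly spaced thresholds off-center, so the midpoint thresholds recover a larger worst-case margin against the same noise.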

## **IV. MEASUREMENT RESULTS**

A 32 Gb/s PAM-4 optical transceiver is implemented in TSMC 40 nm CMOS technology. To characterize its performance, the transmitter and receiver chips are wire-bonded to a VCSEL and a PD, respectively, on a PCB. Fig. 19(a) and Fig. 19(b) show photographs of the transmitter + VCSEL and receiver + PD chip-on-board assemblies, respectively. The TX operates from dual 1.2 V and 3.3 V supplies to power the laser diode. The RX operates from a 1.2 V supply.

Fig. 20 shows the measured electrical eye diagrams of the PAM-4 transmitter at 20 Gb/s and 32 Gb/s operation,





FIGURE 20. Electrical output eye diagrams (a) 20 Gb/s and (b) 32 Gb/s.



FIGURE 21. Measurement setup for the transmitter.



FIGURE 22. Optical output eye diagrams (a) 20 Gb/s and (b) 32 Gb/s.

TABLE 1. Power consumption of optical transmitter.

| Block       | FFE (EQ) | Asy. EQ (EQ) | Current Combiner (LDD) | ABT (LDD) | ITC (LDD) | Driver (LDD) |
|-------------|----------|--------------|------------------------|-----------|-----------|--------------|
| Power (mW)  | 57.6     | 46.8         | 7.5                    | 4.2       | 1.08      | 29.7         |
| Power* (mW) | 57.6     | 0            | 3                      | 2.4       | 1.08      | 29.7         |

Energy Efficiency = 4.59 pJ/bit; Energy Efficiency\* = 2.93 pJ/bit

\* : w/o Asy. EQ

TABLE 2. Performance summary of optical transmitter.

|                               | This work                                 | [16]                | [17]                |
|-------------------------------|-------------------------------------------|---------------------|---------------------|
| Technology (nm)               | 40 CMOS                                   | 65 CMOS             | 130 SiGe            |
| Data Rate (Gb/s)              | 32                                        | 50                  | 40                  |
| Supply (V)                    | 1.2/3.3                                   | 1.2/3.3             | 2.5/3.85            |
| Data Type                     | PRBS7                                     | PRBS15              | PRBS7               |
| Data Format                   | PAM-4                                     | PAM-4               | PAM-4               |
| Equalization                  | 3-tap linear FFE,<br>Asy. EQ,<br>ABT, ITC | 2.5-tap Asy.<br>FFE | 2-tap linear<br>FFE |
| Outer OMA (dBm)               | 1.68                                      | 3                   | -3.8                |
| Outer ER (dB)                 | 3.34                                      | N/A                 | N/A                 |
| Power (mW)                    | 146.8                                     | 256                 | 376                 |
| Energy Efficiency<br>(pJ/bit) | 4.59                                      | 5.12                | 9.4                 |
| Area (mm <sup>2</sup> )       | 0.029                                     | 0.31                | N/A                 |

respectively. The electrical eye is clearly open when the FFE is activated. Only a VCSEL intended for NRZ operation was available for our experimental setup; it provides an efficiency of 0.5 W/A with 350 fF parasitic capacitance. The optical



FIGURE 23. Measurement setup for the receiver.



FIGURE 24. Measured input and output eye diagrams of receiver.

test setup is shown in Fig. 21. The input clock is generated by a Keysight M8195A, and the optical output signal is coupled through an optical probe and OM4 MMF to a Keysight N1092C sampling oscilloscope. Fig. 22 shows the measured optical eye diagrams at 20 Gb/s and 32 Gb/s operation, respectively. The modulation current and bias current are about 5 mA and 4 mA, respectively, at 32 Gb/s operation. The optical eye is degraded by the limited linear region of the available NRZ VCSEL.

Table 1 summarizes the power breakdown of the optical transmitter. The corresponding energy efficiency of the transmitter is about 4.59 pJ/bit (2.93 pJ/bit without the asymmetric equalizer). Table 2 summarizes the performance benchmark of the optical transmitter. Compared to the prior art, the transmitter provides multiple signal conditioning functions (asymmetric waveform shaping, FFE, ABT) with a competitive energy efficiency at 32 Gb/s PAM-4 operation. The power saving can be more pronounced when the ABT technique is applied to a DFB laser driver, where the modulation current is even higher.
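The energy-efficiency figures follow directly from the power breakdown in Table 1. The short check below sums the per-block values and divides by the 32 Gb/s data rate; the dictionary keys and variable names are ours.

```python
DATA_RATE_GBPS = 32.0

# Per-block power in mW, copied from Table 1.
power_mw = {
    "FFE": 57.6, "Asy. EQ": 46.8, "Current combiner": 7.5,
    "ABT": 4.2, "ITC": 1.08, "Driver": 29.7,
}
# Same breakdown with the asymmetric equalizer disabled (Power* row).
power_no_aeq_mw = {
    "FFE": 57.6, "Asy. EQ": 0.0, "Current combiner": 3.0,
    "ABT": 2.4, "ITC": 1.08, "Driver": 29.7,
}


def energy_per_bit_pj(total_mw, rate_gbps):
    """mW divided by Gb/s gives pJ/bit directly."""
    return total_mw / rate_gbps


total_mw = sum(power_mw.values())
total_no_aeq_mw = sum(power_no_aeq_mw.values())
eff_pj = energy_per_bit_pj(total_mw, DATA_RATE_GBPS)
eff_no_aeq_pj = energy_per_bit_pj(total_no_aeq_mw, DATA_RATE_GBPS)

print(f"total power: {total_mw:.2f} mW, {eff_pj:.2f} pJ/bit")
print(f"w/o Asy. EQ: {total_no_aeq_mw:.2f} mW, {eff_no_aeq_pj:.2f} pJ/bit")
```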

The optical test setup of the receiver is shown in Fig. 23. The PD provides a responsivity of 0.8 A/W and an input capacitance of 100 fF. The light source is coupled to the PD through a fiber for the BER test. The light source, with a 9.8 dB extinction ratio, is generated by a Keysight 81492A optical transmitter driven by an Anritsu MP1900A pattern generator. The optical power level is adjusted by setting an internal optical attenuator for the input sensitivity test. The output eye diagram is characterized using an Agilent 86100C sampling oscilloscope. The bit error rate (BER) is measured using an Anritsu MP1900A error detector. With 32 Gb/s PAM-4 input signals applied, the measured eye diagrams at the quarter-rate DFE output for both the MSB and LSB are shown in Fig. 24. The input sensitivity vs. BER performance is summarized in Fig. 25. A BER of less than $10^{-12}$ is achieved with an input sensitivity of -4.8 dBm at 32 Gb/s. At a lower input power level (-5.5 dBm), where the SNR is limited, the DFE improves the BER by orders of magnitude. The total receiver power consumption at 32 Gb/s is 128.8 mW, which translates to an energy efficiency of 4.03 pJ/bit.
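To relate the reported sensitivity to the photocurrent seen by the TIA, the sketch below converts the optical power to current using the quoted responsivity and extinction ratio. It assumes that the -4.8 dBm sensitivity is an average optical power and that the PAM-4 levels are equally spaced in the optical domain; both are our assumptions, and the function names are hypothetical.

```python
def dbm_to_mw(p_dbm):
    """Convert optical power from dBm to mW."""
    return 10.0 ** (p_dbm / 10.0)


def oma_mw(p_avg_mw, er_db):
    """Outer optical modulation amplitude P1 - P0 from average power and
    extinction ratio ER = P1 / P0."""
    er = 10.0 ** (er_db / 10.0)
    return 2.0 * p_avg_mw * (er - 1.0) / (er + 1.0)


RESPONSIVITY_A_PER_W = 0.8   # PD responsivity from the text
SENSITIVITY_DBM = -4.8       # reported sensitivity (assumed average power)
ER_DB = 9.8                  # extinction ratio of the light source

p_avg_mw = dbm_to_mw(SENSITIVITY_DBM)
i_avg_ma = RESPONSIVITY_A_PER_W * p_avg_mw                # A/W times mW gives mA
i_pp_ma = RESPONSIVITY_A_PER_W * oma_mw(p_avg_mw, ER_DB)  # outer current swing
i_eye_ma = i_pp_ma / 3.0                                  # one of three PAM-4 eyes

print(f"average optical power : {p_avg_mw:.3f} mW")
print(f"average photocurrent  : {i_avg_ma:.3f} mA")
print(f"outer current swing   : {i_pp_ma:.3f} mA")
print(f"per-eye current swing : {i_eye_ma:.3f} mA")
```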



FIGURE 25. Measured input sensitivity.

TABLE 3. Performance summary of optical receiver.

|                            | This work | [2]        | [3]          |
|----------------------------|-----------|------------|--------------|
| Technology (nm)            | 40 CMOS   | 14 FinFET CMOS | 14 FinFET CMOS |
| Data Rate (Gb/s)           | 32        | 32         | 32           |
| Supply (V)                 | 1.2       | 1/0.9      | 1.05/0.9/0.8 |
| Data Type                  | PRBS7     | PRBS7      | PRBS31       |
| Data Format                | PAM-4     | NRZ        | NRZ          |
| Equalization               | 2-tap DFE | 1-tap DFE  | 1-tap DFE    |
| Optical Sensitivity (dBm)  | -4.8      | -13        | -12.4        |
| PD Capacitance (fF)        | 100       | 69         | 69           |
| Power (mW)                 | 128.8     | 58.24*     | 45           |
| Energy Efficiency (pJ/bit) | 4.03      | 1.82       | 1.41         |
| Area (mm²)                 | 0.23      | 0.028      | 0.046        |

$^{\star}$: Estimated from the reported power breakdown, assuming that digital power scales linearly with frequency.

Table 3 summarizes the performance benchmark of the receiver against prior work at similar data rates. References [2] and [3] are NRZ optical receivers implemented in more advanced technologies with better PDs. Considering the 9.5 dB eye shrinkage in PAM-4 modulation, the proposed receiver demonstrates a comparable equivalent optical sensitivity. Compared to PAM-4 electrical receivers, whose energy efficiency typically ranges from about 10 pJ/bit [18] to 5.19 pJ/bit [19], the proposed optical receiver achieves a better energy efficiency together with a superior reach.
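The equivalent-sensitivity comparison can be reproduced numerically. The sketch below assumes that the 9.5 dB figure corresponds to $20\log_{10}(3)$, i.e., the 3× reduction of each PAM-4 eye relative to an NRZ eye of the same swing expressed in electrical dB, and that it is simply subtracted from the measured sensitivity. The same 3× reduction is only $10\log_{10}(3) \approx 4.8$ dB in optical power terms, so the result depends on which convention is adopted. The helper names are illustrative.

```python
import math

def pam_eye_penalty_db(levels: int = 4, electrical: bool = True) -> float:
    """Eye-height penalty of an M-level PAM signal relative to NRZ with the
    same outer swing. Each eye is 1/(M-1) of the full swing."""
    ratio = levels - 1
    scale = 20.0 if electrical else 10.0   # electrical dB vs. optical-power dB
    return scale * math.log10(ratio)

def nrz_equivalent_sensitivity_dbm(pam_sens_dbm: float, penalty_db: float) -> float:
    """Sensitivity an NRZ link would need for the same per-eye amplitude."""
    return pam_sens_dbm - penalty_db

penalty_elec = pam_eye_penalty_db(4, electrical=True)    # about 9.5 dB
penalty_opt = pam_eye_penalty_db(4, electrical=False)    # about 4.8 dB
equiv = nrz_equivalent_sensitivity_dbm(-4.8, penalty_elec)

print(f"PAM-4 penalty: {penalty_elec:.2f} dB (electrical), "
      f"{penalty_opt:.2f} dB (optical)")
print(f"NRZ-equivalent sensitivity: {equiv:.1f} dBm "
      f"(Table 3 lists -13 dBm for [2] and -12.4 dBm for [3])")
```

With the 9.5 dB convention, the NRZ-equivalent sensitivity is roughly -14.3 dBm, which is the basis for calling it comparable to the NRZ receivers in Table 3.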

## **V. CONCLUSION**

This article describes the design of a 32 Gb/s optical PAM-4 transceiver. The transmitter consists of signal conditioning circuits and an active back termination (ABT) for the current-mode laser driver. The ABT greatly reduces the power consumed by the modulation current, which is critical for data-intensive optical links. The receiver combines a narrow-band front-end with a 2-tap DFE in order to minimize the input-referred noise. As only 2-bit data slicers are adopted for PAM-4 demodulation, an automatic threshold tracking circuit (ATC) is utilized to tackle the nonlinearities introduced by the E/O and O/E conversions along the signal path. The ATC reduces the BER by more than $27\times$ without resorting to multi-bit ADCs.

## ACKNOWLEDGMENT

The authors would like to thank the TSMC University Shuttle Program for the chip fabrication.

## REFERENCES

- [1] S.-H. Huang and W.-Z. Chen, "A 25 Gb/s 1.13 pJ/b -10.8 dBm input sensitivity optical receiver in 40 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 52, no. 3, pp. 747–756, Mar. 2017.
- [2] I. Ozkaya et al., "A 64 Gb/s 1.4 pJ/b NRZ optical receiver data-path in 14 nm CMOS FinFET," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, pp. 3458–3473, Dec. 2017.
- [3] J. E. Proesel et al., "A 32 Gb/s, 4.7 pJ/bit optical link with -11.7 dBm sensitivity in 14-nm FinFET CMOS," *IEEE J. Solid-State Circuits*, vol. 53, no. 4, pp. 1214–1226, Apr. 2018.
- [4] M. G. Ahmed *et al.*, "A 12-Gb/s –16.8-dBm OMA sensitivity 23mW optical receiver in 65-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 53, no. 2, pp. 445–457, Feb. 2018.
- [5] P.-J. Peng, J.-F. Li, L.-Y. Chen, and J. Lee, "6.1 A 56Gb/s PAM-4/NRZ transceiver in 40nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2017, pp. 110–111.
- [6] L. A. Coldren and S. W. Corzine, Diode Lasers and Photonic Integrated Circuits. New York, NY, USA: Wiley, 1995.
- [7] Y. Gao, J. C. Cartledge, S. S.-H. Yam, A. Rezania, and Y. Matsui, "112 Gb/s PAM-4 using a directly modulated laser with linear precompensation and nonlinear post-compensation," in *Proc. 42nd Eur. Conf. Opt. Commun.*, Sep. 2016, pp. 121–123.
- [8] U. Hecht et al., "Non-linear PAM-4 VCSEL equalization and 22 nm SOI CMOS DAC for 112 Gbit/s data transmission," in *Proc. German Microw. Conf. (GeMiC)*, Mar. 2019, pp. 115–118.
- [9] M. Raj, M. Monge, and A. Emami, "A modelling and nonlinear equalization technique for a 20 Gb/s 0.77 pJ/b VCSEL transmitter in 32 nm SOI CMOS," *IEEE J. Solid-State Circuits*, vol. 51, no. 8, pp. 1734–1743, Aug. 2016.
- [10] H. Ransijn, G. Salvador, D. D. Daugherty, and K. D. Gaynor, "A 10-Gb/s laser/modulator driver IC with a dual-mode actively matched output buffer," *IEEE J. Solid-State Circuits*, vol. 36, no. 9, pp. 1314–1320, Sep. 2001.
- [11] E. Säckinger, Broadband Circuits for Optical Fiber Communication. New York, NY, USA: Wiley, 2005.
- [12] S.-H. Huang, W.-Z. Chen, Y.-W. Chang, and Y.-T. Huang, "A 10-Gb/s OEIC with meshed spatially-modulated photo detector in 0.18 μm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 46, no. 5, pp. 1158–1169, May 2011.
- [13] D. Li et al., "A low-noise design technique for high-speed CMOS optical receivers," *IEEE J. Solid-State Circuits*, vol. 49, no. 6, pp. 1437–1447, Jun. 2014.
- [14] A. Roshan-Zamir, O. Elhadidy, H.-W. Yang, and S. Palermo, "A reconfigurable 16/32 Gb/s dual-mode NRZ/PAM4 SerDes in 65-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 52, no. 9, pp. 2430–2447, Sep. 2017.
- [15] C.-T. Hung, Y.-P. Huang, and W.-Z. Chen, "A 40 Gb/s PAM-4 receiver with 2-Tap DFE based on automatically non-even level tracking," in *Proc. IEEE Asian Solid-State Circuits Conf. (A-SSCC)*, Nov. 2018, pp. 213–214.
- [16] A. Tyagi et al., "A 50 Gb/s PAM-4 VCSEL transmitter with 2.5-tap nonlinear equalization in 65-nm CMOS," *IEEE Photon. Technol. Lett.*, vol. 30, no. 13, pp. 1246–1249, Jul. 2018.
- [17] W. Soenen *et al.*, "40 Gb/s PAM-4 transmitter IC for long-wavelength VCSEL links," *IEEE Photon. Technol. Lett.*, vol. 27, no. 4, pp. 344–347, Feb. 2015.
- [18] D. Cui et al., "A 320mW 32Gb/s 8b ADC-based PAM-4 analog front-end with programmable gain control and analog peaking in 28nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2016, pp. 58–59.
- [19] S. Kiran, S. Cai, Y. Luo, S. Hoyos, and S. Palermo, "A 32 Gb/s ADC-based PAM-4 receiver with 2-bit/stage SAR ADC and partially-unrolled DFE," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Apr. 2018, pp. 1–4.