# A 22-Gb/s Time-Interleaved Low-Power Optical Receiver With a Two-Bit Integrating Front End

Bahaa Radi, *Graduate Student Member, IEEE*, Mohammadreza Sanadgol Nezami, *Graduate Student Member, IEEE*, Mohammad Taherzadeh-Sani, Frederic Nabki, *Member, IEEE*, Michaël Ménard, *Member, IEEE*, and Odile Liboiron-Ladouceur, *Senior Member, IEEE*

Abstract-This article presents the implementation of a novel 22-Gb/s energy-efficient optoelectronic receiver architecture in 65-nm CMOS for short-reach optical communication. The receiver incorporates four sub receivers with a two-bit integrating resettable front-end in each sub receiver. The inputs to two of the four sub receivers are optically delayed by one bit and two complementary quarter-rate clock phases are used to completely recover the data. The two-bit integrating low-bandwidth front end replaces the full-bandwidth transimpedance amplifier used in conventional optoelectronic receivers, resulting in improved energy efficiency. The low-bandwidth operation is enabled by using a capacitor at the input and by amplifying the two-bit integrated voltage with low-bandwidth voltage gain stages that require a bandwidth of only 35% of the operating data rate. The receiver performs a 1:4 demultiplexing operation by only using two quarter-rate clock phases instead of the four phases that are conventionally used in a quarter-rate clocking system. This clocking scheme reduces complexity while maintaining the same timing margin of the quarter-rate systems. This two-clock phase system is enabled by optical delay lines and splitters. The receiver is experimentally validated with a 1550-nm photodetector array wire bonded to the four inputs. The electronic part of the receiver achieves error-free transmission (BER <  $10^{-12}$ ) at 22 Gb/s with an energy efficiency of 1.43 pJ/bit and an average sensitivity of -7.8 dBm (or -6.2 dBm optically modulated amplitude) with a 1.09-V supply.

*Index Terms*—Demultiplexing, integrating-type receiver, low-power electronics, optical interconnects, optical receiver.

## I. INTRODUCTION

WITH the explosive growth in data demand caused by streaming and cloud services and the multiplication of large data centers, reducing the power consumption and the cost of short-reach optical transceivers while increasing their speed and scalability has become critical. As CMOS technology scaling advances, a larger number of transistors can be placed in a given area. One challenge in CMOS scaling is the analog front end on the receiver side, where, conventionally, a transimpedance amplifier (TIA) is used to convert the photocurrent into a voltage while providing low input impedance to the photodetector (PD). Conventional TIAs are bulky, power hungry, and do not scale well with technology. This is because, at higher speeds, a high-gain core amplifier (or a multistage amplifier) is needed, which increases both power consumption and area. Consequently, there has been recent interest in developing optical receivers that do not require conventional TIAs but instead use low-bandwidth techniques [1]–[13]. These low-bandwidth receivers can be divided into three categories: integrating front-end receivers [1]–[5], resettable receivers [6]–[10], and decision feedback equalizer (DFE)-based receivers [11]–[13].

Manuscript received December 1, 2019; revised May 13, 2020 and July 16, 2020; accepted September 11, 2020. Date of publication September 29, 2020; date of current version December 24, 2020. This work was supported by the Natural Sciences and Engineering Research Council of Canada through the Idea to Innovation Grants Program. This article was approved by Associate Editor Azita Emami. (*Corresponding author: Bahaa Radi.*)

Bahaa Radi, Mohammadreza Sanadgol Nezami, and Odile Liboiron-Ladouceur are with the Department of Electrical and Computer Engineering, McGill University, Montreal, QC H3A 0E9, Canada (e-mail: bahaa.radi@mail.mcgill.ca).

Mohammad Taherzadeh-Sani is with the Department of Electrical Engineering, Ferdowsi University of Mashhad, Mashhad 9177948974, Iran.

Frederic Nabki is with the Department of Electrical and Computer Engineering, École de technologie supérieure, Montreal, QC H3C 1K3, Canada.

Michaël Ménard is with the Department of Computer Science, Université du Québec à Montréal, Montreal, QC H3C 3P8, Canada.

Color versions of one or more of the figures in this article are available online at https://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2020.3025051

In integrating front-end receivers, a capacitive front end is used to integrate the photocurrent, and a decision is made based on the value of the integrated voltage. The receiver by Palermo et al. [1] employs a double sampling technique in which the integrated voltage difference is used to resolve the value of the bit. This approach suffers from issues induced by consecutive identical digits (CID), which cause the voltage difference to decrease when identical bits are received. The receiver by Nazari and Emami-Neyestanak [2] mitigates the CID issue by introducing a dynamic offset modulation (DOM) circuit. However, charge sharing between the sampling capacitors and the input capacitance degrades the sensitivity of the receiver. Saeedi and Emami [3] resolved the issue of charge sharing by introducing a low-bandwidth TIA at the input of the chip, decoupling the sampling capacitor from the input capacitance and thus improving sensitivity. The same group [4] employed advanced packaging techniques to reduce parasitic capacitance at the input, leading to further improvements in sensitivity. The second receiver category is resettable receivers, which employ a reset to discharge the capacitor before integrating the next bit [6]–[10]. This technique resolves the issues associated with CID at the cost of stricter timing requirements and an incomplete bit integration in [6]–[9], leading to degraded sensitivity. The receiver in [10] addressed the incomplete integration period by interleaving four data paths but requires a wideband input stage, a common-mode feedback (CMFB) circuit, and four clock phases for proper operation. The third approach in [11]–[13] uses DFE or speculative DFE to compensate for bandwidth reduction at the input. These approaches have either a critical timing requirement for the feedback or increased complexity with the number of taps in the speculative DFE implementation.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

Fig. 1. Integrating front-end receiver. (a) Simplified receiver architecture with the four clock phases $\Phi_1$ to $\Phi_4$. (b) Voltage at the input of the sampling circuit when the sequence 1110 is received. $\Delta v_x$ (x = 1, 2, 3, 4) is the voltage difference between two consecutive samples. (c) Basic operation of the DOM in the receiver to compensate for CID. The red arrows indicate the offset generated by the DOM circuit to compensate the $\Delta v$ shown in (b) and clamp the voltage difference to $\pm \Delta v_{max}/2$.

In this work, a resettable two-bit integrating front-end receiver is demonstrated in order to resolve issues associated with CID and charge sharing present in integrating front-end receivers. The proposed architecture also relaxes the timing requirements of the reset signal in resettable receivers and requires, as a result of the use of optically interleaved inputs, only two quarter-rate clock phases (provided externally in this work). Thus, there is no need for complex circuits to correct duty cycle and phase, which are critical for quarter-rate operation at high speeds relying on quadrature clock generation [14]. Therefore, the proposed quarter-rate clocking scheme is more energy efficient and has a wider timing margin compared with full-rate and half-rate clocking schemes [15].

This article is organized as follows. Section II discusses the architectures of integrating front-end receivers and resettable receivers, along with their limitations. Section III details the proposed time-interleaved optical receiver with a two-bit integrating front end. More specifically, the receiver architecture, operation, analysis of the front end, noise analysis, and transistor implementation are presented. Section IV discusses the experimental validation of the receiver. Finally, Section V summarizes the work and compares it to other published receivers.

## II. LOW-BANDWIDTH RECEIVER ARCHITECTURE

## A. Integrating Receiver Front-End

The front end of the integrating receiver is shown in Fig. 1(a). The junction capacitance of the PD and input capacitance,  $C_{IN}$ , and a resistor, R, form a low-frequency pole at the input that integrates the photocurrent into a voltage signal. The voltage signal is then sampled every unit interval (UI) using four clock phases and sampling capacitors,  $C_S$ , as shown in Fig. 1(b). The voltage difference,  $\Delta v_x$  (x =1, 2, 3, 4), between samples is used to resolve the bit. If  $\Delta v_x >$ 0, the bit is determined to be a binary "1" and is considered as a binary "0" for  $\Delta v_x \leq 0$ . Assuming that the capacitor is fully discharged at the beginning of the process (i.e., t = 0),  $\Delta v_1$ ,  $\Delta v_2$ , and  $\Delta v_3$  can be written as the following for a sequence of three consecutive binary ones (i.e., 111).

$$\Delta v_1 = \mathrm{RI}_{\mathrm{pd}} \left( 1 - e^{-\frac{T_b}{\mathrm{RC}_{\mathrm{IN}}}} \right) \tag{1}$$

$$\Delta v_2 = \mathrm{RI}_{\mathrm{pd}} \left( 1 - e^{-\frac{T_b}{\mathrm{RC}_{\mathrm{IN}}}} \right) \cdot e^{-\frac{T_b}{\mathrm{RC}_{\mathrm{IN}}}} = \Delta v_1 \cdot e^{-\frac{T_b}{\mathrm{RC}_{\mathrm{IN}}}} \tag{2}$$

$$\Delta v_3 = \Delta v_2 \cdot e^{-\frac{T_b}{\mathrm{RC}_{\mathrm{IN}}}} \tag{3}$$

where  $I_{pd}$  is the peak PD current and  $T_b$  is the bit period. Note that  $\pm \Delta v_1$  is the largest possible difference between the two samples (i.e.,  $\Delta v_{max}$ ) when a binary 1 is received when the capacitor is discharged. The voltage difference becomes smaller as more identical bits are received, challenging the receiver as the comparator will need to make a decision based on this smaller voltage difference. It is possible to mitigate this issue by introducing a DOM [2] circuit. The DOM modifies the sense amplifier offset such that the inputs of the comparator are maintained to a constant voltage difference, as shown in Fig. 1(c). The offset is indicated by the red arrows in Fig. 1(c). It can be shown that the voltage difference when DOM is employed is [2]

$$\Delta v_{\text{DOM}} = \frac{1}{2} \mathrm{RI}_{\mathrm{pd}} \left( 1 - e^{-\frac{T_b}{\mathrm{RC}_{\mathrm{IN}}}} \right) = \frac{1}{2} \Delta v_{\text{max}}. \tag{4}$$

The achievable  $\Delta v_{\text{DOM}}$  is half of the maximum possible voltage difference,  $\Delta v_{\text{max}}$ . Thus, the comparators need to be able to resolve this reduced voltage difference at all times.
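As a numerical illustration of (1)–(4), the following Python sketch evaluates the sample-to-sample voltage difference for a run of identical ones and the DOM-clamped difference. The component values (`R`, `C_IN`, `I_PD`, and the 20-Gb/s bit period) and the function name are placeholders chosen for illustration; they are not taken from the cited designs.

```python
import math

def cid_voltage_steps(r_ohm, c_in_farad, i_pd_amp, t_bit_s, n_ones):
    """Return [dv_1, ..., dv_n] for n consecutive ones, following (1)-(3)."""
    decay = math.exp(-t_bit_s / (r_ohm * c_in_farad))
    dv_1 = r_ohm * i_pd_amp * (1.0 - decay)            # eq. (1)
    # Each further identical bit shrinks the difference by the same factor.
    return [dv_1 * decay ** k for k in range(n_ones)]  # eqs. (2) and (3)

# Illustrative values only (assumptions, not from the article).
R = 2e3          # input resistor, ohms
C_IN = 200e-15   # input capacitance, farads
I_PD = 100e-6    # peak photocurrent, amperes
T_B = 1 / 20e9   # bit period at 20 Gb/s, seconds

steps = cid_voltage_steps(R, C_IN, I_PD, T_B, 4)
dv_dom = 0.5 * steps[0]                                # eq. (4)

for index, dv in enumerate(steps, start=1):
    print(f"dv_{index} = {dv * 1e3:.2f} mV")
print(f"dv_DOM = {dv_dom * 1e3:.2f} mV")
```

The printed values shrink geometrically with each additional identical bit, which is the CID penalty described above, while the DOM result stays fixed at half of the first step.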

Charge sharing at the input is an issue in integrating front-end receivers. The total charge is shared between $C_{IN}$ and four of the eight sampling capacitors, $C_S$. This degrades the receiver sensitivity. A photodiode with a junction (input) capacitance larger than the sampling capacitance can be used to mitigate this. This way, most of the charge is stored in the junction capacitance for subsequent sampling, as expressed by

$$Q_{\rm IN} = \frac{C_{\rm IN}}{C_{\rm IN} + 4C_s} Q_{\rm total} \tag{5}$$

where $Q_{IN}$ is the charge stored in the input capacitance, $C_{IN}$, and $Q_s$ is the charge stored in the sampling capacitors, $C_s$. $Q_{total}$ is the total charge at the input. Based on (5), there is a minimum required size for the photodiode for proper operation. However, the signal-to-noise ratio (SNR) is inversely proportional to the size of the junction capacitance, and a large junction capacitance degrades the sensitivity of the receiver. The SNR is approximated by

$$\sqrt{\text{SNR}} \approx \left( I_{\text{pd}} T_b / C_{\text{IN}} \right) / \left( \sqrt{\text{kT} / C_{\text{IN}}} \right) = \frac{I_{\text{pd}} T_b}{\sqrt{C_{\text{IN}} \text{kT}}} \tag{6}$$

where k is the Boltzmann constant, T is the temperature, and $I_{pd}$ is the peak photocurrent. A solution to the charge sharing issue is to use a low-bandwidth TIA that decouples the junction capacitance from the sampling capacitance [3]. However, this requires an additional circuit at the input of the receiver.

Fig. 2. (a) Resettable receiver architecture operation and timing diagram showing the integration for 0.5 UI and reset for 0.5 UI. (b) Current-amplifier-based receiver architecture showing two interleaved paths and sampling using two phases ($\Phi$ and $\bar{\Phi}$) and a delayed version of the two phases ($\Phi_d$ and $\bar{\Phi}_d$). (c) Timing and operation of the current-amplifier-based receiver showing the reset (0.25 UI), sample (0.75 UI), and hold phases (1 UI). (d) Integrate-and-dump receiver showing four interleaved paths. It utilizes four clock phases ($\Phi_1$, $\Phi_2$, $\bar{\Phi}_1$, and $\bar{\Phi}_2$). (e) Timing and operation of the integrate-and-dump receiver showing the reset, integrate, and hold.

The third challenge for this approach is the possible overloading of the integrator with long-running CIDs. A possible solution is adding a shunt resistor at the input to prevent this overloading at the cost of thermal noise injection at the input.
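The opposing trends in (5) and (6) can be made concrete with a short Python sketch: a larger $C_{IN}$ keeps more of the charge on the input node but lowers the SNR. The sampling capacitance, photocurrent, bit period, and temperature used below are assumed values for illustration only, and the function names are hypothetical.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def charge_fraction_on_input(c_in, c_s):
    """Fraction of Q_total that stays on C_IN when four C_S share it, eq. (5)."""
    return c_in / (c_in + 4.0 * c_s)

def sqrt_snr(i_pd, t_bit, c_in, temp_k=300.0):
    """Eq. (6): integrated signal I_pd*T_b/C_IN over the kT/C noise voltage."""
    return i_pd * t_bit / math.sqrt(c_in * K_BOLTZMANN * temp_k)

# Assumed example values (not from the article).
C_S = 10e-15     # one sampling capacitor, farads
I_PD = 100e-6    # peak photocurrent, amperes
T_B = 50e-12     # bit period, seconds

for c_in in (50e-15, 100e-15, 200e-15):
    fraction = charge_fraction_on_input(c_in, C_S)
    ratio = sqrt_snr(I_PD, T_B, c_in)
    print(f"C_IN = {c_in * 1e15:5.0f} fF: "
          f"Q_IN/Q_total = {fraction:.2f}, sqrt(SNR) = {ratio:.0f}")
```

Under these assumptions, quadrupling $C_{IN}$ halves $\sqrt{\text{SNR}}$ while raising the retained charge fraction, which is the tradeoff that motivates decoupling the two capacitances.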

## B. Resettable Receiver, Current-Amplifier-Based Receivers, and Integrate-and-Dump Receiver

Resettable front-end receivers [6] and current-amplifier-based optical receivers resolve the processing issues related to CID and charge sharing. These design approaches also mitigate the potential overloading of the integrator present in integrating front-end receivers by periodically resetting the input capacitance. The operation of a resettable receiver is shown in Fig. 2(a). This implementation uses a full-rate clock, letting the input capacitor charge for 0.5 UI and then discharge for 0.5 UI. Only half of the maximum charge is stored across the capacitor, which degrades sensitivity. This implementation also requires fast sample-and-hold and slicer circuits to sample and resolve the half-integrated bit before the capacitor is reset.

Current-amplifier-based optical receivers, shown in Fig. 2(b), alleviate these issues by introducing a dual-path current amplifier [7]–[9]. The cycle of operation lasts two UIs, as shown in Fig. 2(c). This allows for more time for the latch to regenerate the output. Moreover, this type of receiver improves the integration time by allocating 0.75 UI for bit integration time instead of 0.5 UI. Only 25% of the bit charge is lost due to the 0.25 UI reset pulse. The duration of the reset pulse is 10 ps at 25 Gb/s requiring careful design and proper phase alignment. Longer reset pulses degrade sensitivity, while shorter ones are more difficult to achieve and may result in an excess residual charge. Moreover, process variations can adversely impact such short pulses.
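The timing budget quoted above follows directly from the unit interval. The sketch below, with hypothetical helper names, splits one UI into reset and integration windows for a given data rate; the 0.25 UI reset fraction is the one stated in the text.

```python
def unit_interval_ps(data_rate_gbps):
    """One unit interval in picoseconds for a data rate in Gb/s."""
    return 1e3 / data_rate_gbps

def phase_durations_ps(data_rate_gbps, reset_fraction=0.25):
    """Split one UI into the reset window and the remaining integration window."""
    ui = unit_interval_ps(data_rate_gbps)
    return {"reset": reset_fraction * ui, "integrate": (1.0 - reset_fraction) * ui}

durations = phase_durations_ps(25.0)
print(f"reset = {durations['reset']:.1f} ps, "
      f"integrate = {durations['integrate']:.1f} ps")
```

At 25 Gb/s this gives a 10-ps reset pulse and 30 ps of integration, which illustrates why pulse generation and phase alignment become the limiting factors.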

To address the incomplete integration period, the integrate-and-dump receiver, shown in Fig. 2(d), was proposed in [10]. The receiver has a wideband current amplifier at the input followed by four low-bandwidth transimpedance amplifiers, one in each of the four data paths. The four data paths are time interleaved and have four phases of operation, shown in Fig. 2(e), described next for one of the data paths. The first phase is the internal reset, which begins when $\Phi_1 = 1$ and $\Phi_2 = 0$. In this phase, the input and the output of the amplifier are connected through a switch, resetting those nodes. The next phase is the external reset phase, when both switches are high. The integration phase is next and starts when $\Phi_1 = 0$ and $\Phi_2 = 1$. In this phase, the current from the current amplifier is integrated. Finally, in the hold phase, when $\Phi_1 = 0$ and $\Phi_2 = 0$, the integrated voltage is held for sampling by the latch.

Fig. 3. Proposed receiver architecture and operation. (a) Block diagram of the four sub receivers and connection to the optical blocks with the delayed scheme used. Inset: Single-pole PD model [7]. (b) Timing diagram showing the operation of the receiver and the two phases of operation. (c) Voltage integration ($\Delta v$) at the front end for all possible input values when the bandwidth of the PD is higher than 0.7 of the data rate. The rightmost part shows an overlay of all $\Delta v$ possibilities. (d) $\Delta v$ when the bandwidth of the PD is lower than 0.7 of the data rate.


This approach addresses the short reset pulse issue of the current-amplifier-based receiver but requires wideband input stages and four clock phases to achieve a 1:4 demultiplexing operation. The receiver also requires a CMFB circuit to ensure that the input and the outputs of the low-bandwidth TIAs are properly reset. Moreover, the hold period is 1 UI, which may limit the speed at high data rates.

## III. PROPOSED TIME-INTERLEAVED RECEIVER

## A. Architecture and Operation

The time-interleaved two-bit integrating receiver proposed in this work is shown in Fig. 3(a) along with the timing diagram of operation in Fig. 3(b). On the transmitter side, the bit pattern  $B = [B_1 B_2 B_3 \dots B_N]$  is precoded into the data pattern  $D = [D_1 D_2 D_3 \dots D_N]$  using the following relationship:

$$D_k = B_k \oplus D_{k-1}. \tag{7}$$

This precoding is derived from the five-level polybinary signaling used for spectrally efficient data links and is adapted here for two-level signaling [16]. On the receiver side, bit pattern B can be recovered from received data pattern D using the following equation:

$$B_k = (D_k + D_{k-1}) \bmod 2 = D_k \oplus D_{k-1}. \tag{8}$$

Thus, the decoder on the receiver side is simply an XOR logic gate. The benefit of employing this algorithm is that the bit pattern B is recovered from the received signal D without considering bits from previous operation cycles. This coding prevents error propagation between cycles.
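The precoding in (7) and the XOR decoding in (8) can be sketched in a few lines of Python. This is a bit-level illustration only, not a model of the receiver hardware; the function names and the initial state $D_0 = 0$ are assumptions made for the example.

```python
from typing import List

def precode(bits: List[int], d_init: int = 0) -> List[int]:
    """Transmitter precoding, eq. (7): D_k = B_k XOR D_{k-1}."""
    data = []
    previous = d_init            # assumed initial state D_0
    for bit in bits:
        previous = bit ^ previous
        data.append(previous)
    return data

def decode(data: List[int], d_init: int = 0) -> List[int]:
    """Receiver decoding, eq. (8): B_k = D_k XOR D_{k-1}."""
    bits = []
    previous = d_init
    for symbol in data:
        bits.append(symbol ^ previous)
        previous = symbol
    return bits

bits = [1, 0, 1, 1, 0, 0, 1, 0]
data = precode(bits)
recovered = decode(data)

# Flip one received bit to see how far a single channel error spreads.
corrupted = list(data)
corrupted[3] ^= 1
errors = sum(a != b for a, b in zip(decode(corrupted), bits))

print("B  =", bits)
print("D  =", data)
print("B' =", recovered)
print("decoded errors after one flipped D bit:", errors)
```

Because each decoded bit depends only on two adjacent received bits, a single flipped bit in D disturbs at most two decoded bits and the error does not propagate further.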

The optical input signal is divided using a passive optical splitter and interleaved in time using optical delay lines. The inputs of sub receivers 2 and 3 are delayed by a one-bit period,  $T_b$ , relative to sub receivers 1 and 4. This passive operation of splitting and delaying the light can be performed using silicon photonic (SiP) technologies, as described in the Appendix. The light is then coupled to a PD array of four PDs, which are wire bonded to the four sub receivers. At the front end, to maximize the sensitivity of the receiver, the input capacitance,  $C_{IN}$ , is used as the integration capacitor without adding a capacitor on-chip. Only the top metal layer is used for the pads to reduce the input capacitance. The input capacitance is used simultaneously as both the integrating and sampling capacitor in order to resolve the charge sharing issue.

The operation of the receiver starts by integrating the photocurrent for two UIs over an initially fully discharged input capacitance. At the end of the integration phase, a switch is used to discharge this capacitor. The duration of the reset phase is two UIs. There are four possible waveforms over the integration period corresponding to the four possible combinations of the integrated bits: 00, 01, 10, and 11. These four waveforms are shown in Fig. 3(c) when the bandwidth of the PD is high and in Fig. 3(d) when the bandwidth of the PD is limited and well below the data rate. The resulting triangular overlay of all possibilities (i.e., eye diagram) at the input is shown on the rightmost side of both Fig. 3(c) and (d). This triangular waveform represents the symbols at the input of the front end. Since the symbol rate is half the data rate, all of the following stages can halve their bandwidth requirements compared to full-bandwidth systems. Conventionally, an analog front end requires a bandwidth of at least 70% of the data rate. In the proposed low-bandwidth receiver, the bandwidth requirement can be relaxed down to 35% of the data rate. The two-bit symbol is amplified using two voltage gain stages. While there are four front ends in the proposed receiver as opposed to one in a conventional receiver, the power consumption of the front ends remains similar to that of a single front end operating at full rate. This is because the bandwidth required is halved. The first-order voltage gain, $A_V$, of a single stage is given by

$$A_V = \frac{g_m \times R_d}{1 + s R_d C_L} \tag{9}$$

where  $g_m$  is the small-signal transconductance,  $R_d$  is the drain resistance, and  $C_L$  is the load capacitance. Meanwhile, the bandwidth,  $\omega_s$ , is given by

$$\omega_s = \frac{1}{2\pi R_d C_L}. \tag{10}$$

For a given gain, if the bandwidth is halved from $0.7 \times$ data rate to $0.35 \times$ data rate, then $R_d$ can be doubled and $g_m$ can be halved. Since $g_m \propto \sqrt{I_d}$, the power of each front end is reduced to 1/4 of that of a full-bandwidth front end. It should be noted, however, that this scaling reduces the overdrive voltage, $V_{od} \approx 2I_d/g_m$, of the amplifiers. This creates a power consumption-linearity tradeoff. While the receiver in this demonstration still operates in the linear region, as described in Section III-D, care should be taken if this approach is scaled further to ensure that the amplifiers remain in the linear region of operation. It should also be noted that since the clock generation circuitry power consumption is expected to be lower in the proposed architecture due to the reduced number of clock phases required, this scaling can be beneficial in terms of power consumption compared with conventional front ends.
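The scaling argument above can be checked with a normalized Python sketch. It assumes a fixed load capacitance, a fixed gain $g_m R_d$, and a square-law device of fixed geometry so that $g_m \propto \sqrt{I_d}$; all quantities are expressed relative to the full-bandwidth stage, and the function name is hypothetical.

```python
def scale_stage(bandwidth_ratio):
    """Scale a resistively loaded gain stage at constant gain and constant C_L.

    All returned values are normalized to the full-bandwidth stage.
    Assumes g_m is proportional to sqrt(I_d) (square-law device, fixed W/L).
    """
    r_d = 1.0 / bandwidth_ratio   # bandwidth is proportional to 1/(R_d*C_L), eq. (10)
    g_m = bandwidth_ratio         # keeps the gain g_m*R_d unchanged, eq. (9)
    i_d = g_m ** 2                # from g_m proportional to sqrt(I_d)
    v_od = i_d / g_m              # V_od ~ 2*I_d/g_m, normalized
    return {"R_d": r_d, "g_m": g_m, "I_d": i_d, "V_od": v_od}

half_bandwidth = scale_stage(0.35 / 0.7)
four_front_ends = 4 * half_bandwidth["I_d"]

print(half_bandwidth)
print(f"total bias current of four front ends: {four_front_ends:.2f}x")
```

Under these assumptions each half-bandwidth stage draws one quarter of the bias current, so four of them together draw about as much as one full-bandwidth stage, while the overdrive voltage is halved.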

The integrated symbol is fed to two current-mode logic (CML) flip-flops, each consisting of two CML latches. Each latch is clocked with two complementary quarter-rate clocks ($\Phi$ and $\overline{\Phi}$), providing a 1:4 demultiplexing operation. Only two quarter-rate clock phases are needed, as opposed to four in conventional receivers, to carry out the 1:4 demultiplexing operation. This eliminates the need for the duty cycle and quadrature detection/correction circuits found in quarter-rate clock generation circuits while still benefiting from the wide timing margin offered by quarter-rate operation. The two outputs of the two CML flip-flops are then fed to two differential pairs used as CML-to-CMOS converters, followed by two D flip-flops. Finally, the two outputs are fed to an XOR logic gate for decoding, as described by (8). The output of the XOR gate is then buffered to drive the measurement equipment, which represents a load $R_L$ of 50 $\Omega$.

## B. Analysis of the Integration

The front end of the receiver integrates two bits leading to four possible waveforms [Fig. 3(d)]. We derive the expression for these waveforms while considering the PD as a first-order single-pole low-pass filter with a bandwidth given by  $\omega$ . The model of the PD is shown in the inset of Fig. 3(a) [7]. The integrated voltage is given by

$$\Delta v = \frac{1}{C_{\rm IN}} \int_0^{2T_b} i_{\rm pd}(t) dt. \tag{11}$$

The photocurrent for the case of  $D_1D_2 = [00]$  is zero. For  $D_1D_2 = [01]$  the photocurrent is given by

$$i_{01}(t) = I_{\rm pd} (1 - e^{-\omega(t - T_b)}), \quad t \in [T_b, 2T_b] \tag{12}$$

where $I_{pd}$ is the PD peak current. The expression for the photocurrent for the case of $D_1D_2 = [10]$ is

$$i_{10}(t) = \begin{cases} I_{\rm pd} (1 - e^{-\omega t}), & t \in [0, T_b] \\ I_{\rm pd} (1 - e^{-\omega T_b}) e^{-\omega (t - T_b)}, & t \in [T_b, 2T_b]. \end{cases} \tag{13}$$

Finally, for the case of  $D_1D_2 = [11]$ , the photocurrent is given by

$$i_{11}(t) = I_{\rm pd} (1 - e^{-\omega t}), \quad t \in [0, 2T_b]. \tag{14}$$

Assuming an infinite extinction ratio, the average optical power,  $P_{\text{avg}}$ , is related to the peak current through the responsivity,  $R_{\text{pd}}$ , as

$$I_{\rm pd} = 2R_{\rm pd}P_{\rm avg}. \tag{15}$$

At $t = 2T_b$, $\Delta v_{00} = 0$ for $D_1D_2 = [00]$. For $D_1D_2 = [01]$, $\Delta v_{01}$ is given by

$$\Delta v_{01} = \frac{2R_{\rm pd}P_{\rm avg}}{C_{\rm IN}} \bigg( T_b - \frac{1}{\omega} \big( 1 - e^{-\omega T_b} \big) \bigg). \tag{16}$$

For  $D_1D_2 = [10]$ ,  $\Delta v_{10}$  is given by

$$\Delta v_{10} = \frac{2R_{\rm pd}P_{\rm avg}}{C_{\rm IN}} \bigg( T_b - \frac{1}{\omega} \big( 1 - e^{-\omega T_b} \big) \times e^{-\omega T_b} \bigg). \tag{17}$$

This equation takes into account the exponential decay of the current when transitioning from one to zero during the second bit period.

Finally, for $D_1D_2 = [11]$, $\Delta v_{11}$ is

$$\Delta v_{11} = \frac{2R_{\rm pd}P_{\rm avg}}{C_{\rm IN}} \bigg( 2T_b - \frac{1}{\omega} \big( 1 - e^{-2\omega T_b} \big) \bigg). \tag{18}$$

From (15)–(18), a PD with high responsivity and small junction capacitance is desirable for optimal sensitivity. Moreover, the current is integrated over a full UI ($T_b$) in (16) or two full UIs ($2T_b$) in (17) and (18), as opposed to the 0.5 $T_b$ and 0.75 $T_b$ used in resettable receiver front ends and current-amplifier-based receivers, respectively. It can also be shown from (16)–(18) that $\Delta v_{01}$ is equal to $\Delta v_{11} - \Delta v_{10}$.
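As a cross-check of the closed forms, the Python sketch below integrates the photocurrents (12)–(14) numerically according to (11) and compares the result with (16)–(18). Units are normalized ($T_b = 1$, $I_{pd} = 1$, $C_{IN} = 1$), and the PD bandwidth of $0.35/T_b$ is chosen only as an example; the function names are hypothetical.

```python
import math

def photocurrent(pattern, t, omega, t_b, i_pd=1.0):
    """Photocurrent of the single-pole PD model for D1D2 = pattern, per (12)-(14)."""
    if pattern == "00":
        return 0.0
    if pattern == "01":
        return 0.0 if t < t_b else i_pd * (1.0 - math.exp(-omega * (t - t_b)))
    if pattern == "10":
        if t < t_b:
            return i_pd * (1.0 - math.exp(-omega * t))
        return i_pd * (1.0 - math.exp(-omega * t_b)) * math.exp(-omega * (t - t_b))
    if pattern == "11":
        return i_pd * (1.0 - math.exp(-omega * t))
    raise ValueError(f"unknown pattern {pattern!r}")

def dv_numeric(pattern, omega, t_b, c_in=1.0, steps=20000):
    """Midpoint-rule evaluation of (11) over the two-UI integration window."""
    h = 2.0 * t_b / steps
    total = sum(photocurrent(pattern, (k + 0.5) * h, omega, t_b) for k in range(steps))
    return total * h / c_in

def dv_closed_form(pattern, omega, t_b, i_pd=1.0, c_in=1.0):
    """Closed forms (16)-(18), written with I_pd = 2*R_pd*P_avg from (15)."""
    e1 = math.exp(-omega * t_b)
    if pattern == "00":
        return 0.0
    if pattern == "01":
        body = t_b - (1.0 - e1) / omega
    elif pattern == "10":
        body = t_b - (1.0 - e1) * e1 / omega
    elif pattern == "11":
        body = 2.0 * t_b - (1.0 - math.exp(-2.0 * omega * t_b)) / omega
    else:
        raise ValueError(f"unknown pattern {pattern!r}")
    return i_pd / c_in * body

T_B = 1.0                            # normalized bit period
OMEGA = 2.0 * math.pi * 0.35 / T_B   # example PD bandwidth of 0.35/T_b

for pattern in ("00", "01", "10", "11"):
    print(pattern,
          f"numeric = {dv_numeric(pattern, OMEGA, T_B):.6f}",
          f"closed form = {dv_closed_form(pattern, OMEGA, T_B):.6f}")
```

The same functions confirm that $\Delta v_{01} = \Delta v_{11} - \Delta v_{10}$ for this bandwidth; the identity also follows algebraically from (16)–(18).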

For the cases where the integration period is 0.75  $T_b$ ,  $\Delta v$  is

$$\Delta v_{0.75T_b} = \frac{2R_{\rm pd}P_{\rm avg}}{C_{\rm IN}} \left( 0.75T_b - \frac{1}{\omega} \left( 1 - e^{-0.75\omega T_b} \right) \right). \tag{19}$$

For a quantitative assessment of the improvement in $\Delta v$, the ratio of the $\Delta v$ of the proposed receiver ($\Delta v_{01}$ or, equivalently, $\Delta v_{11} - \Delta v_{10}$) to $\Delta v_{0.75T_b}$ is plotted versus PD bandwidth (in terms of data rate) in Fig. 4. This ratio is given by

$$\frac{\Delta v_{01}}{\Delta v_{0.75T_b}} = \frac{\omega T_b - \left(1 - e^{-\omega T_b}\right)}{0.75 \cdot \omega T_b - \left(1 - e^{-0.75 \omega T_b}\right)}. \tag{20}$$

This analysis is verified through simulations using the single-pole PD model shown in the inset of Fig. 3(a). This model is used to compute the simulation points presented in Fig. 4. Fig. 4 indicates that there is an improvement factor of 1.55 at a PD bandwidth of $f = \omega/2\pi = 0.35/T_b$ (half the conventional bandwidth), corresponding to an improvement of 3.8 dB in receiver sensitivity. As expected, with higher PD bandwidth, the improvement factor decreases until it reaches the final value of 1.33, corresponding to the ratio of the integration periods (1 UI/0.75 UI). As lower bandwidth PDs tend to be more cost effective, the proposed receiver shows an improvement in sensitivity with those PDs, as indicated by Fig. 4. Moreover, there is a factor-of-two improvement over the DOM integrating front-end receiver, corresponding to a 3-dB optical sensitivity improvement, as indicated by (4), excluding splitting and delay line losses. This is because the front end always resets before integrating.
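The ratio in (20) depends only on the product $\omega T_b$, so it is easy to evaluate directly. The sketch below is an illustrative calculation, not the simulation used for Fig. 4; the function name and the choice of sample points are assumptions.

```python
import math

def improvement_ratio(omega_tb, integration_fraction=0.75):
    """Eq. (20): dv_01 / dv_0.75Tb as a function of x = omega * T_b."""
    numerator = omega_tb - (1.0 - math.exp(-omega_tb))
    scaled = integration_fraction * omega_tb
    denominator = scaled - (1.0 - math.exp(-scaled))
    return numerator / denominator

# PD bandwidth f = 0.35 / T_b, i.e. omega * T_b = 2 * pi * 0.35.
ratio_at_half_bw = improvement_ratio(2.0 * math.pi * 0.35)
ratio_in_db = 20.0 * math.log10(ratio_at_half_bw)   # voltage ratio in dB

print(f"ratio at f = 0.35/T_b: {ratio_at_half_bw:.3f} ({ratio_in_db:.2f} dB)")
for bw in (0.2, 0.35, 0.7, 2.0, 10.0):
    print(f"f*T_b = {bw:5.2f}: ratio = {improvement_ratio(2.0 * math.pi * bw):.3f}")
```

Evaluated this way, the ratio at $f = 0.35/T_b$ comes out close to the factor of 1.55 quoted above, and it approaches 1/0.75 as the PD bandwidth grows.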

## C. Noise in the Two-Bit Integrating Front-End Receiver and Input Capacitance Impact on SNR

Fig. 4. Ratio of $\Delta v_{01}/\Delta v_{0.75T_b}$ versus PD bandwidth (in terms of bit duration) in the case of the proposed receiver over that of the current-amplifier-based receiver.

Fig. 5. Simulated SNR versus $C_{\rm IN}$ showing improvement with a smaller capacitance.

The SNR at the input, taking into account the noise variances of the two voltage gain stages ($\sigma_{A1}^2$ and $\sigma_{A2}^2$) and the comparator ($\sigma_C^2$), which were ignored in (6), is given by

$$\text{SNR} = \left(\frac{\Delta v_{01}}{\sqrt{\frac{kT}{C_{\rm IN}} + \sigma_{A1}^2 + \sigma_{A2}^2 + \sigma_C^2}}\right)^2. \tag{21}$$

In the proposed approach, the gain of the first two gain stages is increased at the expense of bandwidth. The main noise contributions come from the two voltage gain stages and the input capacitance. The noise of the comparator is attenuated by the gain of the two voltage gain stages, as indicated by the Friis formula for noise, and, thus, can be ignored. Consequently, (21) can be approximated by

$$\operatorname{SNR} \approx \left(\frac{\Delta v_{01}}{\sqrt{\frac{\mathrm{kT}}{C_{\mathrm{IN}}} + \sigma_{A1}^2 + \sigma_{A2}^2}}\right)^2. \tag{22}$$

The value of $kT/C_{IN} + \sigma_{A1}^2 + \sigma_{A2}^2$ was simulated within the bandwidth of the receiver and for different $C_{IN}$. Moreover, $\Delta v_{01}$ was simulated with a peak photocurrent of 100 $\mu$A at 20 Gb/s. The simulated SNR is plotted in Fig. 5. As expected, smaller junction and parasitic capacitances result in a better SNR, which enhances sensitivity. To reduce the capacitance, the parasitic capacitance at the input is reduced by removing the intermediate metal layers in the bond pads and reducing their size. Indeed, packaging can have a significant impact on sensitivity. Wire bonding is the most common optoelectronic packaging technique and is used here. Flip-chip bonding of the electronic receiver onto the photonic chip with thin copper pillars can be used to significantly improve the sensitivity [3], [4].

Fig. 6. Detailed circuit implementation of one of the four sub receivers in the proposed receiver. The input is wire bonded to a PD.

## D. Detailed Circuit Implementation

A detailed circuit implementation of one of the sub receivers is shown in Fig. 6. An nMOS switch is used to discharge the input capacitance at the end of the integration cycle. A shorted pMOS transistor is used for carrier injection cancellation of the nMOS transistor. This is done to avoid having a residual charge in the integration capacitor. The size of the pMOS is half the size of the nMOS. The size of the nMOS switch is minimized to reduce its contribution to the input capacitance while being kept large enough to reliably discharge the input parasitic capacitance. The nMOS and the pMOS switches are clocked by the two complementary clock phases. The two-bit integrated voltage is then amplified by two inductively peaked cascode voltage gain stages. Inductive peaking increases the gain-bandwidth product of the receiver. This enables the receiver to provide more gain for a given bandwidth than when inductive peaking is not used and thus improves the sensitivity of the receiver. The two stages are AC coupled to allow for optimal biasing. The low cutoff frequency of the AC coupling is designed carefully to allow for PRBS 7 and PRBS 15 measurements. The value of the coupling capacitor is kept small (80 fF) to reduce capacitive parasitic loading at the output of the gain stages. To compensate for the small value of the coupling capacitor $C_C$, the value of R is increased, and a low cutoff frequency of 750 kHz is maintained. It can be shown that a large value of R and a small value of $C_C$ result in a negligible contribution to the input-referred noise and do not affect the sensitivity of the receiver [17]. The low cutoff frequency ensures that the bit patterns [01] and [10] completely overlap, as shown in Fig. 3(c), even at data rates as low as 5 Gb/s, assuming that the PD bandwidth is high enough. In addition, AC coupling prevents low-frequency supply noise injected at the output node of the first amplifier stage from being injected into the second stage. Moreover, any in-band noise injected at the output of the second stage is divided by the gain of the two stages when referred to the input. Fig. 7(a) shows the simulation results for the small-signal voltage gain of the two gain stages.
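To give a sense of scale for the AC-coupling network, the sketch below estimates the bias resistance implied by the stated 80-fF capacitor and 750-kHz cutoff, and the droop over the longest run of identical bits. It assumes an ideal first-order RC high-pass, $f_c = 1/(2\pi R C_C)$, which is a simplification, and it uses the fact that the longest run of ones in a PRBS of order n is n bits. The resistance value is derived here and is not stated in the article.

```python
import math

def coupling_resistance(f_cut_hz, c_c_farad):
    """Resistance giving a first-order high-pass corner f_c = 1/(2*pi*R*C_C)."""
    return 1.0 / (2.0 * math.pi * f_cut_hz * c_c_farad)

def droop_fraction(run_length_bits, data_rate_bps, f_cut_hz):
    """Fractional droop of a first-order high-pass over a run of identical bits."""
    run_time = run_length_bits / data_rate_bps
    return 1.0 - math.exp(-2.0 * math.pi * f_cut_hz * run_time)

F_CUT = 750e3    # low cutoff frequency from the text, hertz
C_C = 80e-15     # coupling capacitor from the text, farads

r_bias = coupling_resistance(F_CUT, C_C)
droop_prbs15 = droop_fraction(15, 22e9, F_CUT)   # longest run in PRBS 15 at 22 Gb/s

print(f"implied R = {r_bias / 1e6:.2f} Mohm")
print(f"droop over 15 identical bits at 22 Gb/s: {droop_prbs15 * 100:.2f} %")
```

Under this simplified model the resistance is in the megaohm range and the droop over the longest PRBS 15 run is well below one percent, consistent with the cutoff being low enough for those test patterns.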

The gain stages have a bandwidth of 12 GHz with a peak gain of 11 dB. The bandwidth is overdesigned to be $0.4 \times 30 \text{ Gb/s}$. In practice, the receiver is limited to 22 Gb/s because of the limited switching speed of the CML latches due to the technology node. The bandwidth can be relaxed to 7.7 GHz $(0.35 \times 22 \text{ Gb/s})$ without impacting the functionality of the receiver. An important design consideration is the linearity of the gain stages as the receiver needs to process multilevel signals. Fig. 7(b) shows the simulated signal power at the output of the two amplifier stages versus the power at the input. The input-referred 1-dB compression point is at -13.4 dBm. It should be noted that the x-axis shows the RF power at the input of the amplifier stages, not the optical power. The optical power corresponding to the 1-dB compression point is estimated to be -7.66 dBm using (18). Thus, the receiver operates in the linear region at its sensitivity level, since the corresponding optical input power of -7.8 dBm at 22 Gb/s is below the optical power required to drive the receiver into the nonlinear region of operation. The input-referred third-order intercept point (IIP3) is simulated using the two-tone test and is -4.15 dBm, as shown in Fig. 7(c). The IIP3 corresponds to a peak voltage of 200 mV (138 mV<sub>rms</sub>) at the input. The calculated optical power required to generate this voltage is -3.4 dBm assuming a PD responsivity of 0.7 A/W, a total input capacitance of 160 fF, and that a pair of "1"s is received at 20 Gb/s. Considering that this optical power is relatively high, the receiver is considered to have good linearity, especially since it needs to process only two integrated bits. With a simulated input-referred voltage noise of 0.9 mV<sub>rms</sub> and an IIP3 of 138 mV<sub>rms</sub>, the spurious-free dynamic range (SFDR) is calculated to be 29 dB using (23).

$$\mathrm{SFDR} = \frac{2}{3}\left(\mathrm{IIP3\,(dBm)} - \mathrm{Noise\ power\,(dBm)}\right). \tag{23}$$
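The linearity figures above can be cross-checked numerically. The sketch below is illustrative rather than the authors' calculation and rests on two assumptions not stated in the text: a 50-Ω reference impedance for converting rms voltages to dBm, and an ideal-integrator model $V = R_{PD} P \cdot 2T_b / C_{in}$ for the optical power that charges the input capacitance over two bit periods (this may differ in detail from (18)). All function names are hypothetical.

```python
import math

R_REF = 50.0  # assumed reference impedance for dBm conversion (ohms)

def vrms_to_dbm(v_rms: float, r_ohm: float = R_REF) -> float:
    """Power in dBm dissipated by an rms voltage across r_ohm."""
    return 10.0 * math.log10(v_rms ** 2 / r_ohm / 1e-3)

def sfdr_db(iip3_dbm: float, noise_dbm: float) -> float:
    """SFDR per (23): two thirds of the gap between IIP3 and noise power."""
    return (2.0 / 3.0) * (iip3_dbm - noise_dbm)

def optical_power_dbm(v_peak, c_in, responsivity, bit_rate, n_bits=2):
    """Optical power charging c_in to v_peak over n_bits (ideal integrator)."""
    current = v_peak * c_in * bit_rate / n_bits   # I = C*V / (n_bits * T_b)
    return 10.0 * math.log10(current / responsivity / 1e-3)

noise_dbm = vrms_to_dbm(0.9e-3)   # 0.9 mVrms input-referred noise
iip3_dbm = vrms_to_dbm(138e-3)    # 138 mVrms quoted for the IIP3
sfdr = sfdr_db(-4.15, noise_dbm)  # simulated IIP3 of -4.15 dBm
p_opt = optical_power_dbm(0.2, 160e-15, 0.7, 20e9)

print(f"noise = {noise_dbm:.1f} dBm, IIP3 from 138 mVrms = {iip3_dbm:.2f} dBm")
print(f"SFDR = {sfdr:.1f} dB, optical power at IIP3 = {p_opt:.1f} dBm")
```

With these assumptions, the conversion of 138 mV<sub>rms</sub> lands close to the simulated IIP3, the SFDR evaluates to roughly 29 dB, and the optical power evaluates to roughly -3.4 dBm, in line with the values quoted in the text.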

The amplified voltage is fed to two CML flip-flops. Each flip-flop consists of a master CML latch followed by a slave CML latch. A CML topology minimizes kickback noise in comparison to CMOS latches. The two voltage gain stages further reduce residual kickback noise from the latches. The CML latches used here are clocked with quarter-rate clocks allowing more time for the latches to fully regenerate. In this prototype, each of the two CML flip-flops is fed with two externally applied reference voltages, $V_{ref1}$ and $V_{ref2}$, for comparison with the signal. This allows the tuning of the comparators in each of the sub receivers to account for process variations. Two differential pairs are used at the outputs of the slave CML latch to further boost the output voltages and interface with two digital D flip-flops.

Fig. 7. Simulations of the two amplifier gain stages. (a) AC simulation gain of the amplifier stages. (b) Output power of the two amplifier stages versus the input power. The input-referred 1-dB compression point is -13.4 dBm. (c) Two-tone test showing the fundamental and the third-order harmonic powers. The IIP3 is at -4.15 dBm.

Fig. 8. Micrograph of the fabricated chip occupying $1.5 \text{ mm} \times 1.5 \text{ mm}$ and wire bonded to a $1 \times 4$ PD array with a $250\text{-}\mu\text{m}$ pitch.

The differential pairs operate at a quarter data rate and are designed to have a high gain at this low speed, consuming less power. One output of each of the two differential pairs is connected to a D flip-flop. The two outputs of the D flip-flops are connected to an XOR gate for decoding, according to (8). Finally, the output of the XOR gate is connected to a buffer to drive the measurement equipment.

## IV. EXPERIMENTAL RESULTS

The receiver is implemented in 65-nm CMOS technology. Fig. 8 shows a micrograph of the receiver along with the wire-bonded $1 \times 4$ PD array. The receiver is measured in two steps: 1) a single sub receiver measurement shown in Fig. 9(a) and 2) a full system measurement shown in Fig. 9(b). The continuous-wave (CW) light from a 1550-nm laser is passed through a polarization controller and then modulated using a Mach-Zehnder modulator (MZM) at 10, 16, and 22 Gb/s with a PRBS 7 or PRBS 15 sequence from a pulse pattern generator. The output power of the modulator is controlled using a variable optical attenuator (VOA). A mechanically tunable optical delay line (ODL-330 by Santec Corporation, Komaki, Japan) is used to align the system clock and the data. This delay line has a delay tuning range of 400 ps with a resolution of 0.2 ps. Thus, it was possible to have exactly one unit-interval delay in these measurements. The delay line is followed by a 90:10 power splitter, where 10% of the output is connected to a power meter for monitoring, while 90% of the signal is connected to one of the PDs in the $1 \times 4$ PD array (DO309\_20um\_C3\_1×4 by Global Communication Semiconductors, LLC, Torrance, CA, USA). A bit error rate (BER) measurement is done by changing the optical power applied to the chip through the VOA and recording the BER for each input power. The eye diagram is recorded with a digital communication analyzer (DCA). The measured BER curves are shown in Fig. 10(a) and (b) for PRBS 7 and PRBS 15 inputs, respectively. The electronic receiver achieves an average sensitivity of -7.8 dBm at 22 Gb/s with a PRBS 7 sequence and -6.7 dBm with a PRBS 15 sequence for a BER less than $10^{-12}$. The extinction ratio is measured to be 8 dB, and thus, the corresponding optical modulation amplitude (OMA) is calculated to be -6.2 dBm for a PRBS 7 sequence. The measured quarter-rate output eye diagram at 5.5 Gb/s is shown in Fig. 10(c).
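The conversion from average sensitivity to OMA can be reproduced with the standard relation for an NRZ signal with equal probability of ones and zeros, $\mathrm{OMA} = 2P_{avg}(r_e - 1)/(r_e + 1)$, where $r_e$ is the linear extinction ratio. The sketch below applies it to the measured values; the function name is illustrative, and the equal-probability assumption is ours.

```python
import math

def oma_dbm(avg_power_dbm: float, extinction_ratio_db: float) -> float:
    """OMA in dBm from average power and extinction ratio (NRZ, equal 1s/0s)."""
    er = 10.0 ** (extinction_ratio_db / 10.0)           # linear ratio P1 / P0
    p_avg_mw = 10.0 ** (avg_power_dbm / 10.0)           # average power in mW
    oma_mw = 2.0 * p_avg_mw * (er - 1.0) / (er + 1.0)   # P1 - P0
    return 10.0 * math.log10(oma_mw)

# Measured values from the text: -7.8 dBm average sensitivity, 8-dB extinction ratio.
oma = oma_dbm(-7.8, 8.0)
print(f"OMA = {oma:.1f} dBm")
```

For an 8-dB extinction ratio, the OMA sits about 1.6 dB above the average power, which matches the -6.2 dBm OMA reported for the PRBS 7 case.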

To validate the tolerance of the receiver to timing variations in the optical delay lines, the bathtub curve is measured at 22 Gb/s, as shown in Fig. 10(d). The receiver shows a timing error tolerance of approximately 0.1 UI (i.e., 4.5 ps) at 22 Gb/s. Note that it is possible to reliably design integrated optical delay lines with a delay error of less than 3 ps, as outlined in the Appendix.
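The timing figures above follow directly from the bit period. A minimal sketch of the arithmetic (helper name is illustrative):

```python
def unit_interval_ps(bit_rate_gbps: float) -> float:
    """Bit period in picoseconds for a data rate given in Gb/s."""
    return 1e3 / bit_rate_gbps

ui = unit_interval_ps(22.0)      # bit period at 22 Gb/s
tolerance_ps = 0.1 * ui          # measured tolerance of about 0.1 UI
delay_error_ui = 3.0 / ui        # a 3-ps delay-line error expressed in UI

print(f"UI = {ui:.2f} ps, 0.1 UI = {tolerance_ps:.2f} ps")
print(f"3 ps = {delay_error_ui:.3f} UI")
```

A 3-ps delay error corresponds to less than 0.07 UI at 22 Gb/s, which is within the measured tolerance of approximately 0.1 UI.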

The proper operation of the complete system is confirmed by verifying correct deserialization, assessing crosstalk levels, and measuring power consumption. In the setup shown in Fig. 9(b), the modulated light from the MZM is amplified using an erbium-doped fiber amplifier (EDFA). The output of the EDFA is then filtered using a bandpass optical filter centered around 1550 nm, followed by a VOA. The output of the VOA is connected to a 10:90 coupler for monitoring, after which the 90% output is sent to a 1:4 optical splitter with a 6.5-dB insertion loss. Each of the four outputs of the splitter is connected to a mechanically tunable optical delay line (ODL-330 by Santec Corporation) with a reported insertion loss of 1.5 dB. The delays are adjusted to $T_b$ according to the scheme shown in Fig. 3. In a final implementation, these delay lines would be replaced with silicon-photonic delay lines, as described in the Appendix. A fiber array couples the light to the $1 \times 4$ PD array. The BER is measured at 22 Gb/s with a PRBS 7 input and shown in Fig. 10(e) in comparison with single-channel measurements. There is a degradation of 1.3 dB due to crosstalk between the PDs. To mitigate this, the on-chip spacing between the PDs could be increased, or ground bond wires acting as shields could be placed between the PDs. The speed of the receiver is limited to 22 Gb/s by the switching speed of the CML latches as opposed to the front end. With an implementation in a more advanced technology node or a monolithic process, the operating speed is expected to improve.

Fig. 9. Measurement setups. (a) Single-sub-receiver measurement setup. (b) Full-system measurement setup.

Fig. 10. Measurement results. (a) BER curve for PRBS 7 input. (b) BER curve for PRBS 15 input. (c) Quarter-rate eye diagram of 5.5-Gb/s output. (d) Bathtub curve at 22 Gb/s. (e) BER curve comparing single-channel operation with full-system operation and crosstalk penalty at 22 Gb/s and with a PRBS 7 sequence.

The circuit dissipates 87.6 mW from a 1.09-V power supply. The core of the receiver, including both clock phase buffers, consumes 31.6 mW (36% of the total power). The output buffer, which is required to drive the 50-$\Omega$ terminated measurement equipment, dissipates 56 mW (64%). The resulting energy efficiency, excluding the output buffer, is 1.43 pJ/bit.
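The power breakdown and the energy-efficiency figure can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; variable names are illustrative.

```python
CORE_MW = 31.6          # receiver core, including both clock phase buffers
BUFFER_MW = 56.0        # output buffer driving the 50-ohm equipment
DATA_RATE_GBPS = 22.0

total_mw = CORE_MW + BUFFER_MW
core_share = CORE_MW / total_mw
# mW divided by Gb/s gives pJ/bit directly (1e-3 W / 1e9 bit/s = 1e-12 J/bit).
energy_pj_per_bit = CORE_MW / DATA_RATE_GBPS

print(f"total = {total_mw:.1f} mW, core share = {core_share:.0%}")
print(f"energy efficiency = {energy_pj_per_bit:.3f} pJ/bit")
```

The core power divided by the data rate gives approximately 1.436 pJ/bit, which the text reports as 1.43 pJ/bit.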

## V. DISCUSSION

The proposed technique successfully eliminates charge sharing and CID issues associated with integrating-type receivers and the need for short reset pulses present in current amplifier-based receivers. It also allows for an integration period of more than 1 UI as opposed to 0.75 UI in resettable receivers. It also uses only two clock phases to perform a demux-by-four as opposed to four required in other architectures. There are, however, some tradeoffs present in the proposed technique. The first tradeoff is in the system-level additional optical insertion loss. In this initial demonstration, the excess optical losses are 8 dB with 6.5 dB associated with the splitting of the optical signal and 1.5 dB from the optical delay lines. It is possible to reduce these losses by implementing the splitter and the delay lines using SiP technology. As explained in the Appendix, the delay line loss can be as low as 0.07 dB, and the optical couplers can be designed to balance the power at the PDs. Thus, the total optical loss could be reduced to 6 dB.

A full-bandwidth system utilizes twice the bandwidth required by the proposed system, and thus, has twice the integrated noise. The sensitivity of the proposed system is, theoretically, 3 dB below a full-bandwidth system operating at the same data rate due to the excess insertion loss of the delay lines. Moreover, as indicated by (17), the front end boosts the sensitivity of the electronic part of the receiver by 3.8 dB when the bandwidth is  $0.35 \times$  data rate in comparison with a current amplifier-based receiver and by 3 dB in comparison with the integrating front-end receiver. As a result, the sensitivity of the proposed receiver is only 2.2 and 3 dB below these systems, respectively, considering the 6-dB optical losses.
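The loss budget and the net sensitivity penalties discussed above can be tallied as follows. This is a bookkeeping sketch using the figures quoted in the text; it assumes an ideal, lossless 1:4 power split for the SiP case, and the names are illustrative.

```python
import math

def ideal_split_loss_db(n_outputs: int) -> float:
    """Power-division loss of an ideal, lossless 1:N splitter."""
    return 10.0 * math.log10(n_outputs)

# Measured setup: 1:4 splitter (6.5 dB) plus mechanical delay line (1.5 dB).
measured_loss_db = 6.5 + 1.5
# Projected SiP case: ideal 1:4 split plus the 0.07-dB integrated delay line.
sip_loss_db = ideal_split_loss_db(4) + 0.07

# Net penalty = 6-dB optical loss minus the front-end sensitivity gain from (17).
OPTICAL_LOSS_DB = 6.0
net_vs_current_amp_db = OPTICAL_LOSS_DB - 3.8
net_vs_integrating_db = OPTICAL_LOSS_DB - 3.0

print(f"measured excess loss = {measured_loss_db:.1f} dB")
print(f"projected SiP loss   = {sip_loss_db:.2f} dB")
print(f"net penalty: {net_vs_current_amp_db:.1f} dB and {net_vs_integrating_db:.1f} dB")
```

The projected SiP loss is dominated by the unavoidable 6-dB division of power among four detectors, which is why the total is quoted as approximately 6 dB.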

|                                                    | This<br>work                               | [1]               | [2]                         | [3, 4]                                         | [11]               | [7]                    | [8]                             | [12]                                                   | [13]                                                   | [10]                   | [23]                                                         | [17]                                                            |
|----------------------------------------------------|--------------------------------------------|-------------------|-----------------------------|------------------------------------------------|--------------------|------------------------|---------------------------------|--------------------------------------------------------|--------------------------------------------------------|------------------------|--------------------------------------------------------------|-----------------------------------------------------------------|
| CMOS<br>Technology<br>node (nm)                    | 65                                         | 90                | 65                          | 28                                             | 65                 | 40                     | 40                              | 14                                                     | 14                                                     | 28                     | 16                                                           | 65                                                              |
| Data rate<br>(Gb/s)                                | 22                                         | 16                | 24                          | 25                                             | 20                 | 25                     | 25                              | 32                                                     | 64                                                     | 20                     | 50                                                           | 12.5                                                            |
| Sensitivity<br>(dBm)<br>(BER = 10 <sup>-12</sup> ) | -7.8 <sup>1</sup><br>-6.2 <sup>1</sup> OMA | -5.4              | -4.7                        | -14.9                                          | -5.8 OMA           | -10.8                  | -8.7                            | -11.7<br>OMA                                           | -5.5<br>OMA                                            | -8.6<br>OMA            | -10.9<br>OMA                                                 | -4 <sup>1</sup><br>-3.4 <sup>1</sup> OMA                        |
| Data Type                                          | PRBS 7,<br>15                              | 8B/10B            | PRBS<br>7,9,15              | PRBS<br>7,9,15                                 | PRBS 7             | PRBS<br>15             | PRBS 31                         | PRBS 31                                                | PRBS 7                                                 | PRBS 7                 | PRBS 7                                                       | PRBS 7                                                          |
| Input<br>Capacitance<br>(fF) | 160 | 440 | 250 | 33 | 200 | 100<sup>4</sup> | 100<sup>4</sup> | 69 | 69 | 200 | 90 | 160 |
| Power<br>consumption<br>(mW) | 31.6 | 23 | 9.6 | 4.25 | 14.2 | 27.6 | 27.6 | 27.6 | 14 | 14 | 97<sup>5</sup> | 24.4 |
| Energy<br>efficiency<br>(pJ/bit) | 1.43 | 1.44<sup>2</sup> | 0.4<sup>3</sup> | 0.17 | 0.71 | 1.13 | 1.13 | 1.4 | 1.4 | 0.7 | 1.94<sup>5</sup> | 1.945 |
| Area (mm²)                                         | 4×0.0812                                   | 0.105             | 0.0028                      | 0.0018                                         | 0.027              | 0.007                  | 0.09                            | 0.046                                                  | 0.028                                                  | 0.005                  | 0.27                                                         | 4×0.1185                                                        |
| Receiver<br>type | Two-bit<br>integrating<br>front-end | Double sampling | Double<br>sampling +<br>DOM | Double<br>sampling +<br>DOM +<br>Low BW<br>TIA | Low BW<br>TIA + DFE | CA-based<br>receiver | CA-based<br>receiver<br>+ CDR | Low<br>Bandwidth<br>TIA + 1-tap<br>speculative<br>DFE | Low<br>Bandwidth<br>TIA + 1-tap<br>speculative<br>DFE | Integrate-<br>and-dump | Conventional<br>with T-Coils<br>for bandwidth<br>enhancement | Conventional<br>receiver with<br>passive optical<br>delay lines |

TABLE I Comparison With the State-of-the-Art

<sup>1</sup>Optical losses not considered. <sup>2</sup>Power consumption of front-end only. <sup>3</sup>Without clock generation and SR latches. <sup>4</sup>PD capacitance reported only. <sup>5</sup> Receiver + clock generation.

To compensate for this degradation in sensitivity, advanced forward error correction codes can be used [18], [19]. Alternatively, as indicated by (19), reducing the junction capacitance of the PD and the parasitic capacitance at the input can have a significant impact on the sensitivity. It is estimated that the front end here has a total input capacitance (junction + parasitic) of 160 fF. Flip-chip packaging with thin copper pillars [3], [4] provides a total input capacitance of 33 fF and can be used for better sensitivity. Finally, if the receiver is implemented in a monolithic process, the capacitance associated with the bond pad is removed, improving the SNR and signal power, as indicated by Fig. 5. This mitigates the sensitivity tradeoff. The improvement in signal power, in this case, means that the voltage amplifiers could be designed with relaxed gain, leading to improved power consumption and better energy efficiency. To summarize this tradeoff, the receiver offers a reduced complexity by removing the clock generation circuits, which also leads to reduced power consumption, at the expense of degraded sensitivity.

A second tradeoff is the fixed speed of the operation set by the optical delay of the delay lines. This can be mitigated by implementing electronically tunable delay lines in SiP [20].

A third tradeoff is an additional area on the chip required for the bond pads needed to connect to the four PDs. This can be mitigated by implementing the receiver in a monolithic process, such as the one offered by GlobalFoundries, Santa Clara, CA, USA [21], where bond pads are not needed, similar to the work presented in [5]. The reported area of PDs in SiP is 25  $\mu$ m × 8  $\mu$ m in [22], which is negligible in this case. Another aspect related to this is the increased physical distance between sub receivers and the number of data paths. This may lead to a power penalty due to clock buffering and distribution. However, since the number of clock phases is only two compared with four used in conventional demultiplex-by-four receivers, requiring half the number of buffers, the proposed receiver could ultimately prove more energy efficient. Moreover, to mitigate this issue, it is possible to reduce the physical distance between sub receivers at the cost of increased parasitic capacitance at the input due to longer connections and increased crosstalk between the sub receivers. If a monolithic process is used, then it is possible to reduce the physical distance between receivers with minimal impact on the crosstalk or the parasitic capacitance at the input.

The proposed receiver, which is designed to be used as a source synchronous receiver, can also be adapted for use alongside a clock and data recovery (CDR) circuit, such as the one proposed in [8]. The delay lines simplify the design of the oscillator in the CDR because only one clock phase needs to be recovered.

Table I shows a performance comparison with the state-of-the-art. The electronic front end of the receiver achieves better sensitivity than the double-sampling receivers in [1] and [2], which need to maintain a large capacitance at the input to mitigate the issue of charge sharing. Moreover, the receiver in [1] uses 8B/10B encoding to bypass the CID issue as opposed to PRBS sequences. The sensitivity of the proposed receiver is worse than that of [3], [4], which use advanced packaging techniques for better SNR and sensitivity.

The receiver in [7] achieves better energy efficiency and sensitivity but includes a delay circuit that needs to be carefully designed and tuned across different process corners to achieve the required delay of 10 ps. Reference [8] is the same receiver as in [7] but includes a CDR circuit that consumes more power, and this reduces the energy efficiency and sensitivity. From this comparison, it can be seen that source-synchronous receivers offer better receiver energy efficiency, at the cost of the extra clock receiver circuit and clock connection.

The infinite impulse response (IIR) decision feedback equalizer (DFE) receiver in [11] employs a low-bandwidth TIA followed by an IIR DFE to compensate for the bandwidth reduction. IIR DFE receivers, however, are challenged by the critical timing requirement of the feedback loop, which needs to settle within 1 UI; this could limit their use at higher speeds. In addition, the IIR nature of the feedback could result in error propagation after an incorrect decision, especially if the magnitude of feedback is increased, resulting in a burst of errors. This limitation also applies to finite impulse response (FIR) receivers with many taps. To address both the critical timing requirements and the error propagation challenges, the receiver in [12] uses a low-bandwidth TIA with a bandwidth of $0.22 \times \text{data rate}$ and a one-tap speculative DFE to compensate for bandwidth reduction to achieve 32 Gb/s. Speculative DFE allows the critical timing requirement to be relaxed to 4 UI as opposed to 1 UI in conventional DFE. The work reported in [13] is similar to [12] but is designed for 64 Gb/s; it consumes more power and has lower sensitivity due to the increased data rate but maintains the same energy efficiency. The receiver in [13] was tested with PRBS 7 as opposed to PRBS 31 as in [12]. By using a one-tap speculative DFE, the error propagation issue of the IIR DFE receivers is mitigated. However, the complexity of a speculative DFE increases exponentially with the number of taps. Both [12] and [13] are implemented in 14-nm FinFET technology to achieve higher data rates. The critical timing requirement and increased complexity are avoided in the proposed receiver as the integrating nodes are reset to ground after each cycle. The energy efficiency of [12] and [13] is similar to that of the proposed receiver despite the technology node gap.

The integrate-and-dump receiver in [10] removes the feedback used in [11] and replaces it with a reset operation, effectively addressing the critical timing requirement and the potential error propagation. However, it requires a wideband current amplifier in the front end and four clock phases. Since the proposed system uses optical blocks to replace clock phase generation, further power saving in clock generation is possible at the cost of extra optical insertion loss. The receiver in [10] is implemented in 28-nm CMOS and achieves an energy efficiency of 0.7 pJ/bit at 20 Gb/s. The proposed receiver is implemented in 65-nm CMOS, yet operates at a higher speed of 22 Gb/s, which highlights the benefit of the proposed architecture, potentially due to the reset duration of two UI. The gap in energy efficiency could be attributed in part to the technology node difference. The receiver in [23] is a conventional full-bandwidth receiver implemented in a 16-nm CMOS FinFET technology node and exploits T-coils to improve the bandwidth and achieve a superior speed of 50 Gb/s. The inductors occupy a large area on the IC chip, increasing the cost of the design. While the proposed receiver also employs peaking inductors to improve the gain-bandwidth product in the 65-nm technology node used, the low-bandwidth front end lends itself well to an inductor-less implementation, potentially saving area and cost in a more advanced node. In addition, as this is a conventional front-end receiver, the energy efficiency of the receiver, including clock generation, is the lowest.

Finally, the receiver in [17] is a 12.5-Gb/s, 1.93-pJ/bit conventional full-bandwidth receiver with a sensitivity of -3.4 dBm OMA. This receiver uses a conventional common-gate input stage and optical delay lines to replace clock generation. The proposed receiver achieves better overall performance than [17] thanks to the two-bit low-bandwidth integrating front end.

Overall, the proposed receiver is a robust, low-complexity alternative that is capable of sustaining long-running identical digits while maintaining a relatively high voltage difference without introducing an open-loop delay for the reset pulse. Such delay is susceptible to process variations. Moreover, the proposed receiver not only has better sensitivity compared with other receivers in similar technology nodes but can also benefit from scaling and more advanced technology nodes where the smaller input capacitance enhances sensitivity.

## VI. CONCLUSION

This work presents a 22-Gb/s receiver with an average sensitivity of -7.8 dBm and an energy efficiency of 1.43 pJ/bit. The receiver exploits photonic blocks to remove clock phase generation circuits for reduced power consumption. The receiver aims to address some of the issues present in integrating receivers and current-amplifier-based receivers, mainly charge sharing and short reset pulses, without introducing a TIA circuit while avoiding the critical timing and complexity associated with DFE and speculative DFE-based receivers.

The proposed receiver shows great potential at higher speeds of operation, where clocking becomes more demanding and requires duty-cycle and quadrature-error detection circuits. Such circuits are not needed in this system. The receiver, thus, provides a compelling advantage in terms of robustness and reduced complexity. With technology scaling and more expensive technology nodes, low-bandwidth receivers, such as the one proposed, are desirable as they remove the bulky, power-hungry TIAs. Thus, the proposed receiver is suited for applications such as high-density data center interconnects.

#### APPENDIX

This appendix describes a proposed silicon photonic (SiP) structure that integrates the functionality of the $1 \times 4$ photonic splitter, the optical delay lines, and the photodetector (PD) array onto a single compact chip for integration with the receiver presented. This architecture is based on the designs introduced in [24] and [25].



Fig. 11. Layout of a proposed split-delay SiP structure, including a grating coupler, three directional couplers acting as power splitters, two one-bit delay lines, and four PDs.



Fig. 12. Electronically tunable delay lines consisting of ring resonator and MZI delay elements [20].

Fig. 13. Layout of a proposed split-delay SiP with directional couplers to ensure equal power at the output.

The layout of the proposed structure is shown in Fig. 11. The proposed SiP chip consists of a single-input grating coupler used to couple the light to the chip, followed by a 50:50 splitter. The two outputs of the splitter are followed by two directional couplers, each with a coupling ratio of 49:51 to compensate for the propagation loss in the delay lines. Two of the outputs, labeled Out 1 and Out 4, are directly routed to two PDs, while the other two, labeled Out 2 and Out 3, are routed through an optical delay line with a delay corresponding to the period of one bit $(T_b)$. The delay lines are made of low-loss silicon waveguides with a core cross section of 220 nm $\times$ 3 $\mu$m and have a length of 3.63 mm that provides a delay of approximately 45 ps, which corresponds to one bit at 22 Gb/s. The reported loss for the 220 nm $\times$ 3 $\mu$m optical waveguide is 0.2 dB/cm, and therefore, the total insertion loss of the delay line is approximately 0.07 dB. This additional loss for Out 2 and Out 3 is compensated by adjusting the coupling ratio of the two directional couplers to 49:51. Thus, the optical power reaching each of the four PDs is the same. The delay lines have a rectangular layout to minimize the area of the chip. In a final integrated system, each of the four detectors can be wire bonded (or flip-chipped) onto the receiver. A monolithically integrated SiP with CMOS can also be considered [5], [26], [27]. Such SiP delay lines provide accurate and reliable delay with an error below 3 ps, and their size conveniently decreases at higher data rates of operation. This makes this approach less complex to implement [24]. It is also possible to replace the fixed delay lines in the proposed structure with electronically tunable lines to support various data rates [20]. The delay line in [20], shown in Fig. 12, consists of a ring resonator delay element for fine delay tuning with a continuous delay of 23 ps and Mach-Zehnder switches to select the delay path. There are eight Mach-Zehnder interferometer (MZI) switches followed by seven binary delay stages. This delay line has a continuous delay of up to 1 ns.
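Returning to the fixed delay line described above, its insertion loss follows from the length and the per-centimeter loss, and the quoted length and delay together imply a group index for the waveguide. The sketch below performs both calculations; the group index is an inference from the quoted numbers, not a value reported in the text, and the function names are illustrative.

```python
C_M_PER_S = 299_792_458.0   # speed of light in vacuum (m/s)

def delay_line_loss_db(length_mm: float, loss_db_per_cm: float) -> float:
    """Propagation loss of a waveguide of the given length."""
    return (length_mm / 10.0) * loss_db_per_cm

def implied_group_index(length_mm: float, delay_ps: float) -> float:
    """Group index for which a waveguide of this length gives this delay."""
    return C_M_PER_S * (delay_ps * 1e-12) / (length_mm * 1e-3)

loss_db = delay_line_loss_db(3.63, 0.2)   # 3.63 mm at 0.2 dB/cm
n_g = implied_group_index(3.63, 45.0)     # 45-ps delay over 3.63 mm

print(f"delay-line loss = {loss_db:.3f} dB")
print(f"implied group index = {n_g:.2f}")
```

The loss evaluates to about 0.07 dB, as stated, and the implied group index is a little above 3.7, which is a plausible value for a wide silicon strip waveguide.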

The insertion loss of the delay lines ranges from 8.5 dB at 10 ps to 11.3 dB at 1 ns. This is compared with 0.1 dB for fixed delay lines. A more reasonable approach to reduce insertion loss is to use only the ring resonator, which has an insertion loss of 1.1 dB when the delay is 10 ps. This will allow the receiver to operate from 22 Gb/s down to around 18.2 Gb/s with reasonable insertion losses.
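The quoted operating range can be reproduced by treating the one-bit delay as the bit period. The sketch below assumes that the ring resonator adds up to 10 ps on top of a 45-ps base delay (the one-bit delay quoted for the fixed line); this decomposition is our reading of the text, and the names are illustrative.

```python
def data_rate_gbps(bit_delay_ps: float) -> float:
    """Data rate whose bit period equals the given one-bit delay."""
    return 1e3 / bit_delay_ps

BASE_DELAY_PS = 45.0     # one-bit delay at about 22 Gb/s (assumed base delay)
RING_TUNING_PS = 10.0    # extra ring-resonator delay at 1.1-dB insertion loss

fastest = data_rate_gbps(BASE_DELAY_PS)
slowest = data_rate_gbps(BASE_DELAY_PS + RING_TUNING_PS)

print(f"range: {fastest:.1f} Gb/s down to {slowest:.1f} Gb/s")
```

Adding 10 ps to the base delay stretches the bit period to 55 ps, which corresponds to roughly 18.2 Gb/s.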

If electronically tunable delay lines are employed, then the insertion loss may vary for different data rates. It may then be necessary to use electronically tunable directional couplers, such as the two shown in Fig. 13, to adjust the coupling ratio such that the received power is the same at the outputs of both the no-delay and one-bit delay lines.
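To illustrate how a tunable coupler could rebalance the power, the sketch below computes the power fraction that must be sent to the delayed arm so that both detectors receive equal power. It assumes an ideal, lossless coupler and that the only imbalance is the delay-line insertion loss; a real design would also account for coupler excess loss and fabrication tolerances. The function name is hypothetical.

```python
def balanced_coupling_fraction(delay_loss_db: float) -> float:
    """Fraction of power routed to the delayed arm for equal PD power.

    Assumes an ideal lossless coupler: fraction p goes to the lossy delayed
    arm and (1 - p) to the direct arm, with p * t = 1 - p, where t is the
    linear power transmission of the delay line.
    """
    t = 10.0 ** (-delay_loss_db / 10.0)
    return 1.0 / (1.0 + t)

# Example: ring-resonator delay line with 1.1-dB loss at a 10-ps delay.
p = balanced_coupling_fraction(1.1)
print(f"delayed arm: {p:.1%}, direct arm: {1.0 - p:.1%}")
```

Under these idealized assumptions, a lossier delay setting calls for a larger share of the power in the delayed arm, which is the adjustment the tunable couplers would provide.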

#### REFERENCES

- [1] S. Palermo, A. Emami-Neyestanak, and M. Horowitz, "A 90 nm CMOS 16 Gb/s transceiver for optical interconnects," *IEEE J. Solid-State Circuits*, vol. 43, no. 5, pp. 1235–1246, May 2008.
- [2] M. H. Nazari and A. Emami-Neyestanak, "A 24-Gb/s double-sampling receiver for ultra-low-power optical communication," *IEEE J. Solid-State Circuits*, vol. 48, no. 2, pp. 344–357, Feb. 2013.
- [3] S. Saeedi and A. Emami, "A 25 Gb/s 170 μW/Gb/s optical receiver in 28 nm CMOS for chip-to-chip optical communication," in *Proc. IEEE Radio Freq. Integr. Circuits Symp.*, Tampa, FL, USA, Jun. 2014, pp. 283–286.
- [4] S. Saeedi, S. Menezo, G. Pares, and A. Emami, "A 25 Gb/s 3D-integrated CMOS/silicon-photonic receiver for low-power high-sensitivity optical communication," *J. Lightw. Technol.*, vol. 34, no. 12, pp. 2924–2933, Jun. 15, 2016.
- [5] M. Georgas, J. Orcutt, R. J. Ram, and V. Stojanovic, "A monolithically-integrated optical receiver in standard 45-nm SOI," *IEEE J. Solid-State Circuits*, vol. 47, no. 7, pp. 1693–1702, Jul. 2012.
- [6] S. Sidiropoulos and M. Horowitz, "Current integrating receivers for high speed system interconnects," in *Proc. IEEE Custom Integr. Circuits Conf.*, Santa Clara, CA, USA, May 1995, pp. 107–110.
- [7] S. Huang and W.-Z. Chen, "A 25 Gb/s 1.13 pJ/b -10.8 dBm input sensitivity optical receiver in 40 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 52, no. 3, pp. 747–756, Mar. 2017.

- [8] Y.-S. Lee, W.-H. Ho, and W.-Z. Chen, "A 25-Gb/s, 2.1-pJ/bit, fully integrated optical receiver with a baud-rate clock and data recovery," *IEEE J. Solid-State Circuits*, vol. 54, no. 8, pp. 2243–2254, Aug. 2019.
- [9] S. Huang, Z.-H. Hung, and W.-Z. Chen, "A 2×20-Gb/s, 1.2-pJ/bit, time-interleaved optical receiver in 40-nm CMOS," in *Proc. IEEE Asian Solid-State Circuits Conf. (A-SSCC)*, KaoHsiung, Taiwan, Nov. 2014, pp. 97–100.
- [10] A. Sharif-Bakhtiar, M. G. Lee, and A. C. Carusone, "Low-power CMOS receivers for short reach optical communication," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Austin, TX, USA, Apr. 2017, pp. 1–8.
- [11] A. Sharif-Bakhtiar and A. Chan Carusone, "A 20 Gb/s CMOS optical receiver with limited-bandwidth front end and local feedback IIR-DFE," *IEEE J. Solid-State Circuits*, vol. 51, no. 11, pp. 2679–2689, Nov. 2016.
- [12] J. E. Proesel *et al.*, "A 32 Gb/s, 4.7 pJ/bit optical link with -11.7 dBm sensitivity in 14-nm FinFET CMOS," *IEEE J. Solid-State Circuits*, vol. 53, no. 4, pp. 1214–1226, Apr. 2018.
- [13] A. Cevrero *et al.*, "A 64 Gb/s 1.4 pJ/b NRZ optical-receiver data-path in 14 nm CMOS FinFET," in *IEEE ISSCC Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2017, pp. 482–483.
- [14] J. Kim *et al.*, "A 112 Gb/s PAM-4 transmitter with 3-Tap FFE in 10 nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2018, pp. 102–104.
- [15] J. Hwang *et al.*, "56 Gb/s PAM-4 VCSEL transmitter with quarter-rate forwarded clock using 65 nm CMOS circuits," in *Proc. Opt. Fiber Commun. Conf. (OFC)*, 2019, pp. 1–3, Paper W2A.1.
- [16] J. Olmos, L. Suhr, B. Li, and I. Monroy, "Five-level polybinary signaling for 10 Gbps data transmission systems," *Opt. Express*, vol. 21, no. 17, pp. 20417–20422, 2013.
- [17] B. Radi, M. S. Nezami, M. Ménard, F. Nabki, and O. Liboiron-Ladouceur, "A 12.5 Gb/s 1.93 pJ/bit optical receiver exploiting silicon photonic delay lines for clock phases generation replacement," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, early access, Nov. 11, 2019, doi: 10.1109/TCSII.2019.2952591.
- [18] Forward Error Correction for High Bit-Rate DWDM Submarine Systems, document G.975.1, Telecommunication Standardization Sector of ITU, Feb. 2004.
- [19] M. Balakrishnan, T. Marian, K. P. Birman, H. Weatherspoon, and L. Ganesh, "Maelstrom: Transparent error correction for communication between data centers," *IEEE/ACM Trans. Netw.*, vol. 19, no. 3, pp. 617–629, Jun. 2011.
- [20] X. Wang *et al.*, "Continuously tunable ultra-thin silicon waveguide optical delay line," *Optica*, vol. 4, no. 5, pp. 507–517, May 2017.
- [21] K. Giewont *et al.*, "300-mm monolithic silicon photonics foundry technology," *IEEE J. Sel. Topics Quantum Electron.*, vol. 25, no. 5, pp. 1–11, Sep. 2019, doi: 10.1109/JSTQE.2019.2908790.
- [22] M. Fard, G. Cowan, and O. Liboiron-Ladouceur, "Responsivity optimization of a high-speed germanium-on-silicon photodetector," *Opt. Express*, vol. 24, no. 24, pp. 27738–27752, 2016.
- [23] M. Raj *et al.*, "Design of a 50-Gb/s hybrid integrated Si-photonic optical link in 16-nm FinFET," *IEEE J. Solid-State Circuits*, vol. 55, no. 4, pp. 1086–1095, Apr. 2020.
- [24] M. S. Hai, M. Ménard, and O. Liboiron-Ladouceur, "Integrated optical deserialiser time sampling based SiGe photoreceiver," *Opt. Express*, vol. 23, no. 25, pp. 31736–31754, Dec. 2015.
- [25] M. S. Hai, M. Ménard, and O. Liboiron-Ladouceur, "A 20 Gb/s SiGe photoreceiver based on optical time sampling," in *Proc. Eur. Conf. Opt. Commun. (ECOC)*, Valencia, Spain, Sep./Oct. 2015, pp. 1–3.
- [26] C. Sun *et al.*, "A monolithically-integrated chip-to-chip optical link in bulk CMOS," *IEEE J. Solid-State Circuits*, vol. 50, no. 4, pp. 828–844, Apr. 2015.
- [27] S. Assefa *et al.*, "Monolithically integrated silicon nanophotonics receiver in 90 nm CMOS technology node," in *Proc. Opt. Fiber Commun. Conf./Nat. Fiber Optic Eng. Conf.*, Anaheim, CA, USA, Mar. 2013, pp. 1–3.



**Bahaa Radi** (Graduate Student Member, IEEE) received the B.S. degree in electrical engineering from The Hashemite University, Zarqa, Jordan, in 2012, and the M.S. degree in microsystems engineering from Khalifa University, Abu Dhabi, United Arab Emirates, in 2015. He is currently pursuing the Ph.D. degree in electrical engineering with the Photonic Systems Group, McGill University, Montreal, QC, Canada.

His current research interests include power-efficient optical receivers, energy-efficient optical systems for short-reach applications, and electronic and photonic integrated circuits.



**Mohammadreza Sanadgol Nezami** (Graduate Student Member, IEEE) received the M.Sc. degree in electrical engineering from the Iran University of Science and Technology, Tehran, Iran, in 2006, and the Ph.D. degree in electrical engineering from the University of Victoria, Victoria, BC, Canada, in 2016.

He is currently a Post-Doctoral Researcher with the Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada. His current research is mainly focused on silicon photonics, optical interconnects, and optoelectronics.



**Mohammad Taherzadeh-Sani** received the B.Sc. degree from the Ferdowsi University of Mashhad, Mashhad, Iran, in 2001, the M.Sc. degree from the University of Tehran, Tehran, Iran, in 2004, and the Ph.D. degree from McGill University, Montreal, QC, Canada, in 2011.

In 2012, he joined the Ferdowsi University of Mashhad as an Assistant Professor. He has authored several publications in distinguished journals, such as the IEEE JOURNAL OF SOLID-STATE CIRCUITS, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I (TCAS-I), IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II (TCAS-II), and the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS, and many articles at conferences, such as the European Solid-State Circuits Conference (ESSCIRC), Asian Solid-State Circuits Conference (A-SSCC), International Conference on Computer-Aided Design (ICCAD), and International Symposium on Circuits and Systems (ISCAS). He has fabricated several integrated circuits in technologies ranging from 65- to 180-nm CMOS. His research interests focus on biomedical circuits and systems, high-quality and high-speed data converters, and radio frequency integrated circuits. He has fabricated ICs and published work on these subjects.

Dr. Taherzadeh-Sani was a recipient of the J. W. McConnell Memorial Fellowship from McGill University in 2007 and 2008 for his Ph.D. research and the Post-Doctoral Fellowship from Le Fonds Québécois de la Recherche sur la Nature et les Technologies for 2012 and 2013 (declined).



**Frederic Nabki** (Member, IEEE) received the B.Eng. degree (Hons.) in electrical engineering and the Ph.D. degree in electrical engineering from McGill University, Montreal, QC, Canada, in 2003 and 2010, respectively.

In 2008, he joined the Université du Québec à Montréal (UQAM), Montreal, where he was an Associate Professor in microelectronics engineering. In 2016, he joined the Department of Electrical Engineering, École de technologie supérieure, Montreal, University of Quebec, Quebec City, QC, Canada, as an Associate Professor. He has authored or coauthored two book chapters and over 100 publications. He was or is supported by the Microsystems Strategic Alliance of Quebec (ReSMiQ), the Quebec Fund for Research in Nature and Technology, the Ministry of Economy, Science and Innovation of Quebec, the Natural Sciences and Engineering Research Council of Canada, and the Canada Foundation for Innovation. He holds 11 issued patents and 21 pending patent applications related to MEMS and CMOS/MEMS monolithic integration. His research interests include MEMS and RF/analog microelectronics.

Dr. Nabki was a recipient of the Governor General of Canada's Academic Bronze Medal, the J. J. Archambault IEEE Canada Medal, and the UQAM Faculty of Science Early Career Research Award.



**Michaël Ménard** (Member, IEEE) was born in Québec City, QC, Canada. He received the B.Eng. and Ph.D. degrees in electrical engineering from McGill University, Montreal, QC, Canada, in 2002 and 2009, respectively.

He was with McGill University, where he worked on the design and implementation of novel devices for optical telecommunication applications, including spatial formatting in dense wavelength division multiplexer and broadband high-density electrooptical space switches in III-V waveguides. From 2009 to 2011, he was a Post-Doctoral Fellow with the Cornell Nanophotonics Group, Ithaca, NY, USA, under the supervision of Prof. Michal Lipson, where he investigated broadband wavelength conversion with silicon waveguides for fiber and free space telecommunication. In 2011, he joined the Microelectronic Program, Université du Québec à Montréal, where he is an active member of NanoQAM, the research center on nanomaterials and energy. He jointly manages the Microtechnology and Microsystems Laboratory. In 2019, he was a Visiting Researcher with the Department of Applied Physics, University of Campinas, Campinas, Brazil. He has authored or coauthored over 40 publications. He holds three issued patents and three pending patent applications. He was or is supported by the Microsystems Strategic Alliance of Quebec, the Center for Optics, Photonics, and Lasers, the Quebec Fund for Research in Nature and Technology, Prompt Québec, PRIMA Québec, and the Natural Sciences and Engineering Research Council of Canada. His research interests include integrated optics, silicon photonics, nonlinear optics, micro-opto-electro-mechanical systems, optomechanics, and microfabrication.

Dr. Ménard is a member of the Quebec Order of Engineers.



**Odile Liboiron-Ladouceur** (Senior Member, IEEE) received the B.Eng. degree in electrical engineering from McGill University, Montreal, QC, Canada, in 1999, and the M.S. and Ph.D. degrees in electrical engineering from Columbia University, New York, NY, USA, in 2003 and 2007, respectively.

From 1999 to 2000, she worked at Teradyne, Inc., Boston, MA, USA, as an Applications Engineer in the mass storage business unit. In 2000, she joined Texas Instruments, Inc., Dallas, TX, USA, and spent two years working in the fiber optic business unit as a test and design engineer. In 2008, she joined the Department of Electrical and Computer Engineering, McGill University, where she is currently an Associate Professor, and the Canada Research Chair in photonics interconnect. She manages the Photonic DataCom Research Team, McGill University. She holds six granted U.S. patents. She has coauthored over 60 peer-reviewed journal articles and more than 100 articles in conference proceedings. She has authored or coauthored four book chapters. Her research interests include optical systems, photonic-integrated circuits, and photonic interconnects.

Dr. Liboiron-Ladouceur was an Elected Member of the IEEE Photonics Society Board of Governance from 2016 to 2018. She was a recipient of the McGill Principal's Prize for Outstanding Emerging Researcher in 2018. She was the General Co-Chair of Photonics in Switching and Computing in 2017, 2019, and 2020. From 2009 to 2016, she was an Associate Editor for the IEEE PHOTONICS TECHNOLOGY LETTERS. She has given over 15 presentations as an invited speaker at international conferences.