

Received 20 June 2022; revised 3 August 2022; accepted 5 August 2022. Date of publication 9 August 2022; date of current version 26 August 2022. This article was recommended by Associate Editor C.-M. Hung.

Digital Object Identifier 10.1109/OJCAS.2022.3197333

# An Efficient Filter-Bank Multi-Carrier System for High-Speed Wireline Applications

JEREMY COSSON-MARTIN<sup>1</sup> (Graduate Student Member, IEEE), HOSSEIN SHAKIBA<sup>2</sup> (Senior Member, IEEE), AND ALI SHEIKHOLESLAMI<sup>1</sup> (Senior Member, IEEE)

<sup>1</sup>Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada

<sup>2</sup>HiLink SerDes Group, Huawei Technologies Canada, Markham, ON L3R 5A4, Canada

CORRESPONDING AUTHOR: J. COSSON-MARTIN (e-mail: marti701@ece.utoronto.ca)

This work was supported in part by the Natural Sciences and Engineering Research Council (NSERC).

**ABSTRACT** This paper proposes an efficient multi-carrier system that combines filter-bank multi-carrier signalling, decision-directed channel estimation, and frequency-domain timing recovery to eliminate the overhead associated with cyclic prefix, large side-lobes, and pilot carriers. Furthermore, a technique is proposed to halve the required number of FFTs (IFFTs), reducing their complexity by 29% for a 32-point resolution; a method is proposed to correct tilt and stretch distortion; and a gain controller with adaptive loop coefficients is adopted to achieve the same stability but 65% higher tracking bandwidth regardless of the FFT size. The concept is validated at the system level, where impairments are applied, enabling an in-depth comparison to conventional discrete multi-tone signalling. Assuming a 32-point FFT, a 35dB channel, and an overlap factor of 3, results show 101% improvement in capacity, 100% improvement in power efficiency, and 101% improvement in area efficiency, all while maintaining comparable latency. This work enables very low-resolution multi-carrier schemes, which were previously impractical due to the significant overhead.

**INDEX TERMS** Adaptive equalization, channel estimation, clock and data recovery (CDR), decision-directed equalization, discrete multi-tone (DMT), filter-bank multi-carrier (FBMC), frame synchronization, non-linear control system, orthogonal frequency division multiplexing (OFDM), SERDES, single-tap equalization, timing recovery, wireline.

### I. INTRODUCTION

Wireline data rates are marching beyond 200Gb/s [1], but a question remains as to how much room is left in this march and how to close the remaining gap to Shannon's capacity limit [2]. In pursuit of closing this gap, alternative signal modulation techniques are being considered, including multi-carrier schemes [3], [4], [5], [6], [7], [8].

Multi-carrier signalling is a mature technology that dominates communication applications optimized for spectral efficiency. This includes baseband telecommunication systems such as ADSL and VDSL [6], [9], long-haul optical applications such as single-mode and multi-mode fiber [10], [11], and wireless systems such as LTE and 5G [12], [13]. Recently, the first high-speed wireline implementation was proposed in [3]. However, multi-carrier schemes have yet to gain widespread adoption in this field due to several barriers. These include the overhead from the Cyclic Prefix (CP) required to avoid symbol interference in a band-limited system, the penalty from large side-lobes resulting from rectangular windowing, and the overhead from pilot carriers used to perform channel estimation and timing recovery. This paper analyzes these problems and proposes a system to alleviate each. First, the CP is removed and side-lobes are lowered by adopting Filter-Bank Multi-Carrier (FBMC) signalling; second, an adaptive equalizer with timing recovery is proposed to achieve accurate channel estimation and frame synchronization without the use of pilot carriers.

The paper is organized as follows. Section II provides a brief background of multi-carrier signalling. Section III describes the proposed FBMC system with adaptive equalization and timing recovery. Section IV compares our proposed solution to conventional DMT signalling in terms of capacity, power efficiency, area efficiency, and latency, analyzes the reduction in complexity from the proposed simplified FBMC coding, and highlights the improvement in noise tracking bandwidth from the proposed adaptive gain controller. Section V concludes the paper.

FIGURE 1. Generalized top-level block diagram of a multi-carrier system.

### II. BACKGROUND

This section provides a brief background of multi-carrier signalling, highlights the shortcomings of DMT and advantages of FBMC, discusses its implementation, and analyzes the challenges of channel estimation and timing recovery. For a comprehensive background of FBMC, we refer the readers to [7] and [8].

Throughout this manuscript, we reserve i to denote the packet (or frame) index, k to denote the frequency bin (or symbol) index, and n to denote the time (or sample) index.

#### A. MULTI-CARRIER SIGNALLING

Multi-carrier signalling is the concept of splitting a channel into multiple frequency bins and sending a modulated tone in each to convey information.

As shown in Fig. 1, the transmitter maps a sequence of bits $D_{IN}$ to a vector of N - 1 complex-valued symbols $\Omega_i[k]$, each selected from a pre-defined constellation of size $2^{B[k]}$ representing B[k] bits. These symbols are scaled by P[k] to produce symbols $X_i[k]$. Then the vector is encoded, serialized, and sent through the DAC to form the continuous-time transmit signal x(t). After passing through the channel, the received signal y(t) is sampled, deserialized, and decoded to recover N - 1 symbols $Y_i[k]$. Each is scaled by C[k] to produce symbols $\Pi_i[k]$. Finally, using the constellations defined by B[k], $\Pi_i[k]$ is converted back to bits to form the receiver binary output sequence $D_{OUT}$.

The number of encoded bits (bit-loading, B[k]) and the separation between symbols (power-loading, P[k]) is optimized using the Water Pouring Algorithm [14]. B[k] is set to transmit as many bits as possible using the least amount of signal power, whereas P[k] is adjusted to ensure an equal error rate across all bins. This maximizes spectral efficiency [2].
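The loading step can be illustrated with the common SNR-gap approximation. This is a simplification of water-pouring, not necessarily the exact procedure of [14]. Each bin receives the largest integer number of bits its SNR supports at an assumed gap Γ, and its power is then backed off so that it operates exactly at the SNR its constellation needs, which equalizes the error rate across loaded bins. The function name `gap_loading`, the 9.8 dB gap, and the per-bin SNRs are hypothetical.

```python
import math

def gap_loading(snr_db, gap_db=9.8, max_bits=8):
    """Return per-bin (bits, power_scale) using the SNR-gap approximation.

    snr_db   -- per-bin SNR in dB at unit transmit power (illustrative input)
    gap_db   -- SNR gap to capacity for the target error rate (assumed value)
    max_bits -- largest constellation allowed (8 bits = 256-QAM)
    """
    gap = 10 ** (gap_db / 10)
    bits, power = [], []
    for s_db in snr_db:
        snr = 10 ** (s_db / 10)
        # Bit-loading B[k]: the most bits this bin supports at the assumed gap.
        b = min(max_bits, math.floor(math.log2(1 + snr / gap)))
        bits.append(b)
        # Power-loading P[k]: back the bin off to exactly the SNR its
        # 2^b-point constellation needs; unloaded bins get no power.
        power.append(gap * (2 ** b - 1) / snr if b > 0 else 0.0)
    return bits, power

bits, power = gap_loading([30, 24, 18, 12, 6])
print(bits)  # [6, 4, 2, 1, 0]
```

Because B[k] is rounded down, every P[k] is at most 1, so the loaded bins never need more than the nominal transmit power.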

Practical channels are band-limited, producing Inter-Symbol-Interference (ISI). This is a significant burden for 4-PAM. However, assuming multi-carrier frequency bins are sufficiently narrow, the frequency response remains approximately constant over each bin's bandwidth. As such, symbols only experience scaling and rotation errors which is easily corrected using single-tap equalization. This promises a more precise and efficient method to overcome the impairment leading to higher data rates compared to 4-PAM [4], [5], [6], [8].

High-speed wireline applications must adhere to strict latency and complexity requirements. In a multi-carrier system, both parameters scale proportionally with the coding resolution and thus the number of bins. As a result, these systems are bin-limited [15]. Moreover, whereas in wireless, the number of bins equals the coding resolution, in wireline, this number is half [16]. The reason is that wireless systems use bandpass communication, which enables the transmission of complex encoder outputs. However, wireline systems use baseband communication, which restricts the system to real-valued outputs. Given a 2N-point encoder, the first N bins may send unique information, but the remaining N bins must be reserved to send the complex conjugate of the data [10], [16]. Finally, the DC (bin 0) is generally not used as it cannot support complex modulation, and AC coupling would likely block it. As a result, a typical implementation may assume a resolution of 2N = 32, enabling only N-1 = 15 usable bins. This resolution is proposed in [3] and achieves a compromise between system performance, latency and complexity. From simulation, such a system has a latency of around 10ns, significantly less than the 100ns required for Forward-Error-Correction (FEC) [17] and exhibits a complexity similar to 4-PAM [3]. This limitation worsens the overhead associated with conventional multicarrier techniques such as employing CP and reserving bins for channel estimation and timing recovery.

#### B. DISCRETE MULTI-TONE (DMT)

The most well-known coding scheme is Discrete Multi-Tone (DMT), also referred to in wireless applications as Orthogonal Frequency Division Multiplexing (OFDM). Here, the encoder (decoder) implements the Inverse Fast Fourier Transform, IFFT (Fast Fourier Transform, FFT), following (1) and (2).

$$x_i[n] = IFFT_{2N}\{X_i[k]\} = \frac{1}{2N} \sum_{k=0}^{2N-1} X_i[k] e^{j2\pi kn/2N}$$
(1)

$$Y_i[k] = FFT_{2N}\{y_i[n]\} = \sum_{n=0}^{2N-1} y_i[n]e^{-j2\pi kn/2N}$$
(2)
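Equations (1) and (2), together with the Hermitian symmetry required for baseband wireline transmission, can be checked with a direct (unoptimized) evaluation. This is a minimal sketch, not an FFT implementation: it loads N - 1 = 15 symbols into a 2N = 32 frame, leaves DC and Nyquist empty, mirrors conjugates into the upper bins, and confirms that the IDFT output is real and that the DFT recovers the symbols. The helper names and the 4-QAM test symbols are assumptions made for illustration.

```python
import cmath

def idft(X):
    """Direct evaluation of (1): x[n] = (1/2N) * sum_k X[k] * exp(j*2*pi*k*n/2N)."""
    M = len(X)  # M = 2N
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / M) for k in range(M)) / M
            for n in range(M)]

def dft(x):
    """Direct evaluation of (2): Y[k] = sum_n y[n] * exp(-j*2*pi*k*n/2N)."""
    M = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / M) for n in range(M))
            for k in range(M)]

def hermitian_frame(symbols):
    """Load N-1 symbols into bins 1..N-1 and their conjugates into bins 2N-1..N+1.

    DC (bin 0) and Nyquist (bin N) stay empty, so the IDFT output is real-valued.
    """
    N = len(symbols) + 1
    X = [0j] * (2 * N)
    for k, s in enumerate(symbols, start=1):
        X[k] = s
        X[2 * N - k] = s.conjugate()
    return X

# 2N = 32 leaves N - 1 = 15 usable bins; fill them with 4-QAM symbols.
symbols = [complex(1 - 2 * (k % 2), 1 - 2 * ((k // 2) % 2)) for k in range(15)]
x = idft(hermitian_frame(symbols))
Y = dft(x)
print(len(x), max(abs(v.imag) for v in x) < 1e-12)  # 32 True
```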

FIGURE 2. Time and frequency-domain response of DMT's rectangular window. $x_i$ is a vector of 2N samples, where i is the packet (or frame) index.

FIGURE 3. DMT encoder block diagram.

FIGURE 4. Time and frequency-domain response of FBMC's frame-shaping filter for O = 2 and O = 3.

As shown in Fig. 2, DMT implements rectangular windowing [7], [8]. Consequently, ISI causes energy from one frame to leak into the next, producing interference among symbols and degrading performance. To mitigate this effect, a guard interval called a Cyclic Prefix (CP) is inserted between subsequent frames [8], [18]. The CP is created by appending the end portion of each frame to its front, resulting in an overall length of $2N + N_{CP}$ Unit Intervals (UI), where $N_{CP}$ is the length of the CP. As demonstrated in [10], when the CP is long enough, the overlap among frames is removed, eliminating interference among symbols and enabling single-tap equalization. This encoding process is depicted in Fig. 3.
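Why a sufficiently long CP enables single-tap equalization can be shown numerically. In this minimal sketch, a hypothetical 4-tap pulse response (3 UI of memory) is applied to a 32-sample frame with a 3-UI CP. After the receiver discards the CP, the linear convolution is indistinguishable from a circular one over the frame, so dividing each bin by its channel response H[k] recovers the transmitted spectrum exactly. The pulse response, test frame, and helper names are assumptions.

```python
import cmath

def dft(x):
    """Direct evaluation of (2)."""
    M = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / M) for n in range(M))
            for k in range(M)]

def add_cp(frame, n_cp):
    """Prepend a copy of the last n_cp samples (the cyclic prefix)."""
    return frame[-n_cp:] + frame if n_cp > 0 else list(frame)

def channel(signal, h):
    """Linear convolution with pulse response h, truncated to the input length."""
    return [sum(h[m] * signal[n - m] for m in range(len(h)) if n - m >= 0)
            for n in range(len(signal))]

M, n_cp = 32, 3                       # 2N = 32 samples, 3-UI cyclic prefix
h = [1.0, 0.5, 0.25, 0.1]             # hypothetical pulse response spanning 3 UI of memory
frame = [((7 * n) % 11) / 10 - 0.5 for n in range(M)]   # arbitrary real test frame

rx = channel(add_cp(frame, n_cp), h)[n_cp:]   # receiver discards the CP

# Because n_cp >= len(h) - 1, the channel acts as a circular convolution over the
# frame, so bin k is only scaled and rotated by H[k]; one tap C[k] = 1/H[k] undoes it.
H = dft(h + [0.0] * (M - len(h)))
X, Y = dft(frame), dft(rx)
C = [1 / H[k] for k in range(M)]
X_hat = [C[k] * Y[k] for k in range(M)]
print(max(abs(X_hat[k] - X[k]) for k in range(M)) < 1e-9)  # True
```

Repeating the experiment without the CP leaves residual errors, because the first samples of each frame no longer see the wrapped-around tail of the same frame.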

Although DMT is the most straightforward implementation of multi-carrier signalling, it suffers from two shortcomings. First, the CP adds an overhead that is most pronounced with short frame lengths in bin-limited systems [8], [19]. And second, as shown in Fig. 2, rectangular windowing produces large side-lobes in the frequency domain, with the first being attenuated by only 13dB [20], [21]. In an ideal system, frequency bins are orthogonal to one another, and this is not a concern. However, in a practical system, impairments such as time and frequency offset deteriorate orthogonality among bins and produce interference among symbols with a magnitude proportional to the side-lobe height [7], [8], [20]. As such, minimizing side-lobes is critical for achieving capacity.
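The 13dB figure can be checked independently by evaluating the normalized response of a 2N = 32 sample rectangular window between its first two nulls. This sketch is a numerical check only. For large windows the first side-lobe approaches about -13.3dB, so adjacent bins leak substantially once orthogonality is disturbed. The function name `rect_response` and the grid density are assumptions.

```python
import math

def rect_response(f, M):
    """Normalized magnitude response of an M-sample rectangular window at f cycles/sample."""
    if f == 0:
        return 1.0
    return abs(math.sin(math.pi * f * M) / (M * math.sin(math.pi * f)))

M = 32                                   # frame length 2N
# The first side-lobe lies between the first two nulls, f = 1/M and f = 2/M.
grid = [(1 + i / 2000) / M for i in range(1, 2000)]
first_sidelobe_db = 20 * math.log10(max(rect_response(f, M) for f in grid))
print(f"{first_sidelobe_db:.1f} dB")     # roughly -13 dB
```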

#### C. FILTER-BANK MULTI-CARRIER (FBMC)

An alternative coding scheme that overcomes these pitfalls is Filter-Bank Multi-Carrier (FBMC) [22], [23]. Also referred to as offset-QAM or Staggered Multi-Tone (SMT) [8], it is considered one of the critical innovations enabling wireless 5G communication [13]; we propose to adopt this scheme for high-speed wireline communication.

As depicted in Fig. 4, whereas DMT employs a rectangular window, FBMC applies a shaped window p(t). This filter smooths the transition between frames, alleviating the need for CP while also improving side-lobe attenuation [8], [24]. However, unlike a rectangular window with a length of 2N, this filter has a length of 2NO, where O is an integer larger than 1 denoted as the overlap factor. To maintain the same throughput, these lengthened frames overlap each other. As such, whereas DMT employs a CP to avoid frame overlap, FBMC takes advantage of it.

FIGURE 5. a) DMT's symbol interference pattern spaced $(2N + N_{CP})/f_S$ seconds in time and $f_S/2N$ Hertz in frequency. b) FBMC's symbol interference pattern for O = 4 spaced $N/f_S$ seconds in time and $f_S/2N$ Hertz in frequency. c) DMT's symbol transmission pattern. d) FBMC's symbol transmission pattern.

TABLE 1. Filter coefficients.

| Coefficient | DMT (O = 1) | FBMC (O = 2) | FBMC (O = 3) | FBMC (O = 4) | FBMC (O = 5) | FBMC (O = 6) |
|-------------|-------------|--------------|--------------|--------------|--------------|--------------|
| $a_1$       |             | -0.707       | -0.911       | -0.972       | -0.992       | -0.998       |
| $a_2$       |             |              | +0.411       | +0.707       | +0.865       | +0.948       |
| $a_3$       |             |              |              | -0.235       | -0.501       | -0.707       |
| $a_4$       |             |              |              |              | +0.128       | +0.317       |
| $a_5$       |             |              |              |              |              | -0.060       |

A popular FBMC filter is described by (3), where $a_r$ follows Table 1 [24]. Note that with O = 1, we obtain a conventional rectangular filter.

$$p[n] = \begin{cases} 1 + 2\sum_{r=1}^{O-1} a_r \cos\left(\frac{2\pi rn}{2NO}\right) & \text{if } 0 \le n < 2NO \\ 0 & \text{otherwise} \end{cases}$$
(3)
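Equation (3) can be evaluated directly from Table 1. In this minimal sketch, `prototype_filter` and the `A_R` dictionary are hypothetical names. For O = 4 and 2N = 32 the filter spans 128 samples, starts at zero (for O ≥ 3 the tabulated coefficients sum to -0.5), is symmetric, and peaks at its center. O = 1 reduces to the rectangular DMT window.

```python
import math

# Table 1: coefficients a_r for each overlap factor O (O = 1 is the rectangular window).
A_R = {
    1: [],
    2: [-0.707],
    3: [-0.911, +0.411],
    4: [-0.972, +0.707, -0.235],
    5: [-0.992, +0.865, -0.501, +0.128],
    6: [-0.998, +0.948, -0.707, +0.317, -0.060],
}

def prototype_filter(two_n, overlap):
    """Evaluate (3): p[n] = 1 + 2*sum_r a_r*cos(2*pi*r*n / (2N*O)) for 0 <= n < 2N*O."""
    length = two_n * overlap
    return [1 + 2 * sum(a * math.cos(2 * math.pi * r * n / length)
                        for r, a in enumerate(A_R[overlap], start=1))
            for n in range(length)]

p = prototype_filter(32, 4)
print(len(p), round(max(p), 3))  # 128 4.828
```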

In DMT, symbols are independent of one another. As shown in Fig. 5a, by sending a symbol at bin k and frame *i*, the receiver output signal power is contained within a single bin and frame. However, in FBMC, frame shaping produces a deliberate symbol interference pattern [7], [8]. This pattern, depicted in Fig. 5b, is two-dimensional, spanning multiple bins and frames. As a result, whereas in DMT, one complex-valued symbol is transmitted per bin in every frame, in FBMC, the real and imaginary components of symbols are transmitted separately where the latter is delayed by half a frame period. This concept is depicted in Fig. 5c,d for DMT and FBMC, respectively. By doing so, we ensure symbols are independent of one another. At the receiver, both symbol components are combined, effectively achieving one complex QAM symbol per bin in every frame [8]. As such, in an ideal system, both DMT and FBMC achieve the same capacity. However, with a practical system having impairments, FBMC outperforms DMT as it does not suffer from the overhead associated with CP and reduced side-lobe attenuation [7], [8], [20], [24].

The encoding and decoding of FBMC signals can be expressed analytically. Equations (4) and (5) describe the synthesis and analysis functions, respectively. The transmission pattern and realignment process are expressed in (6) and (7), respectively.

$$g[n,k] = \frac{1}{2NO^2} p[n] e^{j2\pi kn/2N} e^{j2\pi k/4}$$
(4)

$$\gamma[n,k] = p[n]e^{-j2\pi kn/2N}e^{-j2\pi k/4}$$
(5)

$$x_i[n] = \sum_{k=0}^{2N-1} \Re\{X_i[k]\}g[n,k] + \sum_{k=0}^{2N-1} j\Im\{X_i[k]\}g[n-N,k]$$
(6)

$$Y_{i}[k] = \sum_{n=0}^{2NO-1} \Re\{y_{i}[n]\gamma[n,k]\} + \sum_{n=0}^{2NO-1} j\Im\{y_{i}[n+N]\gamma[n,k]\}.$$
(7)

#### D. FBMC IMPLEMENTATION

There are three methods to implement FBMC coding. These include adopting filter banks, Frequency Spreaders (FS), or Poly-Phase Networks (PPN) [7], [13], [25]. We will focus on the last as it is the most efficient in terms of hardware. This method is depicted in Fig. 6.

Here, a vector of N-1 input symbols $X_i[k]$ is decomposed into in-phase and quadrature components. Then quadrature phase rotation is applied, ensuring a $\pi/2$ phase difference between neighbouring bins. Next, Hermitian symmetry is added as the symbols enter a pair of IFFTs, each generating 2N real-valued output samples. These are sent through a pair of PPNs, which concatenate frames O times, apply shaping, and overlap them with one another; see Fig. 9 for the PPN implementation. Finally, the quadrature waveform is staggered by half a frame period, combined with the in-phase waveform, serialized, and transmitted through the DAC.
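The PPN step computes the same output as a direct filter bank in which every bin modulates its own p[n]-shaped tone of length 2NO. This holds because the IFFT output is periodic in 2N, so repeating it O times and windowing by p[n] is equivalent to windowing each tone. The sketch below checks this for a single branch, assuming small sizes (2N = 8, O = 3). It omits the quadrature phase rotation, the constant factor in (4), and the half-frame-staggered quadrature branch, which would reuse the same structure delayed by N samples. In hardware, the tile-and-shape loop is typically realized as 2N short polyphase FIR branches rather than as written here. All function names are hypothetical.

```python
import cmath
import math

def idft(X):
    """(1): x[n] = (1/2N) * sum_k X[k] * exp(j*2*pi*k*n/2N)."""
    M = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / M) for k in range(M)) / M
            for n in range(M)]

def ppn_synthesis(frames_fd, p, two_n):
    """PPN view: IFFT each frame, repeat it O times, shape by p[n], overlap-add at a hop of 2N."""
    L = len(p)                                   # 2N*O
    out = [0j] * ((len(frames_fd) - 1) * two_n + L)
    for i, X in enumerate(frames_fd):
        frame = idft(X)
        for n in range(L):
            out[i * two_n + n] += p[n] * frame[n % two_n]
    return out

def filter_bank_synthesis(frames_fd, p, two_n):
    """Filter-bank view: every bin modulates its own p[n]-shaped tone of length 2N*O."""
    L = len(p)
    out = [0j] * ((len(frames_fd) - 1) * two_n + L)
    for i, X in enumerate(frames_fd):
        for n in range(L):
            tones = sum(X[k] * cmath.exp(2j * math.pi * k * n / two_n) for k in range(two_n))
            out[i * two_n + n] += p[n] * tones / two_n
    return out

two_n, overlap = 8, 3                            # small sizes keep the demo fast
L = two_n * overlap
p = [1 + 2 * (-0.911 * math.cos(2 * math.pi * n / L) + 0.411 * math.cos(4 * math.pi * n / L))
     for n in range(L)]                          # (3) with the O = 3 column of Table 1
frames = [[complex((i + k) % 3 - 1, (i * k) % 2) for k in range(two_n)] for i in range(4)]
a = ppn_synthesis(frames, p, two_n)
b = filter_bank_synthesis(frames, p, two_n)
print(max(abs(u - v) for u, v in zip(a, b)) < 1e-9)  # True
```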

#### E. CHANNEL ESTIMATION AND TIMING RECOVERY

Channel estimation is the process of determining the required C[k] to correct linear distortion. Timing recovery deals with finding the ADC sampling frequency and phase that correctly positions the FFT window (coarse synchronization) and minimizes jitter (fine tracking) [19], [26]. Although the channel is assumed short-term stationary, impairments such as jitter and noise cause the overall link response to change continuously. Furthermore, in FBMC, frame shaping reduces the tolerable sampling phase error compared to DMT. For these reasons, the ability to accurately track changes with sub-UI accuracy is critical to improve link performance.

FIGURE 6. FBMC encoder block diagram using Poly-Phase Networks (PPN).

A popular approach is to send channel estimation and timing recovery information by transmitting known symbols called pilot carriers. These can either be sent as a periodic training sequence once every few hundred frames or by reserving a fraction of the bins to send them continuously [19], [27], [28], [29], [30]. However, the former approach is blind most of the time and results in poor tracking performance [31], whereas the latter gathers information continuously but introduces overhead, particularly for bin-limited systems [30].

As an example, if we assume a 32-point DMT system and restrict pilot carriers to once every 256 frames, the overhead is negligible; however, from simulation, the tracking bandwidth is reduced 18-fold and no longer meets the CEI-56G-LR-PAM4 requirement [32]. On the other hand, by reserving four of the 15 usable bins to send pilot carriers continuously, the standard is met, but a 27% overhead is introduced [19]. When including the overhead of a 3-UI CP, the total overhead worsens to 33%. Furthermore, this does not include the penalty resulting from large side-lobes. Evidently, an alternative is needed to overcome this trade-off.
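The quoted overheads follow from simple throughput arithmetic. This sketch assumes the CP in the example is 3 UI long on a 32-UI frame, which reproduces the stated 33%. The function name `dmt_overhead` is hypothetical.

```python
def dmt_overhead(usable_bins, pilot_bins, frame_len, cp_len):
    """Fraction of raw throughput lost to pilot bins and to the cyclic prefix."""
    data_fraction = (usable_bins - pilot_bins) / usable_bins   # bins left for data
    time_fraction = frame_len / (frame_len + cp_len)           # UI left for data
    return 1 - data_fraction * time_fraction

pilots_only = dmt_overhead(15, 4, 32, 0)
pilots_and_cp = dmt_overhead(15, 4, 32, 3)
print(round(100 * pilots_only), round(100 * pilots_and_cp))  # 27 33
```

The two losses compound multiplicatively: 11/15 of the bins carry data for 32/35 of the time.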

### III. PROPOSED FBMC SYSTEM

This section proposes using FBMC signalling in wireline applications to eliminate the CP and lower side-lobes, and proposes an accurate FBMC channel estimation and timing recovery system to eliminate pilot carriers.

The proposed receiver is depicted in Fig. 7a. Initially, the received signal y(t) is sampled by the ADC and fed to the FBMC decoder to produce symbols $Y_i[k]$, where the in-phase and quadrature components remain separate. Next, the N-1 symbol pairs are sent to a set of N-1 adaptive equalizers, one per bin, to correct linear distortion, producing final decisions $\Pi_i[k]$, which are later converted to bits. Finally, the rotation equalization $\angle C_i[k]$ is sent to the timing recovery to estimate and correct sampling phase error. We now provide more details on these three components.

#### A. SIMPLIFIED FBMC CODING

This work adopts PPN coding since it is the most efficient hardware alternative among the three schemes mentioned earlier. However, we propose a modification that differentiates it from previous works.

FIGURE 7. Block diagram of proposed FBMC receiver with adaptive equalization and timing recovery.

As shown in Fig. 6, conventional FBMC PPN coding requires a pair of IFFTs (FFTs) to enable separate computation of the in-phase and quadrature waveforms [7], [13]. Although the symbols are complex-valued, the transmitted waveforms are real-valued. As a result, the imaginary component from each IFFT output (FFT input) is unused. This inefficiency can be exploited to process both signals simultaneously yet independently, using a single IFFT (FFT).

FIGURE 8. Proposed FBMC encoder and decoder using a single IFFT/FFT.

An IFFT generates a real-valued output when the input symbols are mirrored around the Nyquist bin and complex conjugation is applied (Hermitian symmetry). To generate an imaginary-valued output, the input symbols are multiplied by *j*, then mirrored around the Nyquist bin, where negative complex conjugation is applied. As such, given two streams of input symbols $X_I$ and $X_Q$, by sending $X_I + jX_Q$ in the lower N - 1 bins and the mirrored complex conjugate of $X_I - jX_Q$ in the upper N - 1 bins, two independent signals are produced simultaneously using a single IFFT; the first is contained within the real component of the output, and the second within the imaginary component. Evidently, both outputs are orthogonal and can be separated for further processing. At the receiver, the reverse operation is applied. Here, the FFT receives two orthogonal signals. The first set of symbols is recovered by mirroring the upper N - 1 bins around Nyquist, applying complex conjugation, adding them to the lower N - 1 bins, and scaling by 1/2. The second set of symbols is recovered by mirroring the upper N - 1 bins around Nyquist, applying negative complex conjugation, adding them to the lower N - 1 bins, and scaling by -j/2. This is expressed analytically in (8). Section IV analyzes the IFFT (FFT) complexity, showing a 29% reduction for a 32-point resolution.

$$\left[(Y_I + jY_Q) + (Y_I - jY_Q)\right] \cdot \frac{1}{2} = Y_I, \qquad \left[(Y_I + jY_Q) - (Y_I - jY_Q)\right] \cdot \left(-\frac{j}{2}\right) = Y_Q$$
(8)
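The single-IFFT/FFT packing and the decoding in (8) can be verified end to end. This minimal sketch uses direct DFT sums in place of a hardware FFT and arbitrary complex test streams for $X_I$ and $X_Q$. It packs both streams into one 32-point IFFT, splits the output into real and imaginary waveforms, feeds them back as the real and imaginary inputs of one FFT, and recovers both streams. The names `encode_pair` and `decode_pair` are hypothetical.

```python
import cmath
import math

def idft(X):
    M = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / M) for k in range(M)) / M
            for n in range(M)]

def dft(x):
    M = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / M) for n in range(M))
            for k in range(M)]

def encode_pair(x_i, x_q):
    """Proposed IFFT encoding: X_I + jX_Q in bins 1..N-1 and the mirrored conjugate
    of X_I - jX_Q in bins 2N-1..N+1 (DC and Nyquist unused)."""
    N = len(x_i) + 1
    X = [0j] * (2 * N)
    for k in range(1, N):
        X[k] = x_i[k - 1] + 1j * x_q[k - 1]
        X[2 * N - k] = (x_i[k - 1] - 1j * x_q[k - 1]).conjugate()
    return X

def decode_pair(Y):
    """Proposed FFT decoding, following (8)."""
    N = len(Y) // 2
    y_i, y_q = [], []
    for k in range(1, N):
        lower = Y[k]                          # Y_I + jY_Q
        upper = Y[2 * N - k].conjugate()      # mirrored and conjugated: Y_I - jY_Q
        y_i.append((lower + upper) / 2)
        y_q.append((lower - upper) * (-1j / 2))
    return y_i, y_q

x_i = [complex(1 - 2 * (k % 2), 1 - 2 * ((k // 2) % 2)) for k in range(15)]
x_q = [complex(3 - 2 * (k % 4), -1) for k in range(15)]
t = idft(encode_pair(x_i, x_q))               # one 32-point IFFT
in_phase = [v.real for v in t]                # real waveform carrying x_i
quadrature = [v.imag for v in t]              # real waveform carrying x_q
rx = [complex(a, b) for a, b in zip(in_phase, quadrature)]   # one 32-point FFT input
y_i, y_q = decode_pair(dft(rx))
```

The real part of `t` equals what a dedicated Hermitian-symmetric IFFT of `x_i` alone would produce, which is why the second IFFT (and, at the receiver, the second FFT) can be dropped.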

Fig. 8 depicts the proposed FBMC encoder and decoder. Input symbols $X_i[k]$ are decomposed into in-phase and quadrature components. Then quadrature-phase rotation is applied. Next, instead of applying Hermitian symmetry, the proposed IFFT encoding is applied before the symbols enter a single IFFT. The now complex-valued output is separated into real and imaginary streams and sent through a pair of PPNs. The block diagram of the transmit PPN is displayed in Fig. 9, assuming O = 4. Next, the quadrature path is staggered by half a frame period, combined with the in-phase path, serialized, and transmitted through the DAC. At the receiver, samples are deserialized. A half-frame-period delay is applied to the in-phase path as both signals enter a pair of PPNs. The receiver PPN has the coefficients of each FIR filter set in reversed order. The quadrature waveform is then combined with the in-phase waveform. After being sent through a single FFT, the proposed FFT decoding is applied. Finally, quadrature-phase rotation is removed, forming the decoder outputs $Y_{I,i}[k]$ and $Y_{Q,i}[k]$. These complex-valued outputs contain deliberate interference and thus must remain separate until after equalization.



FIGURE 9. Poly-Phase Network (PPN) implementation for O = 4.

The proposed scheme does not introduce additional power, area, or latency. This is because the j, -j, 1/2, -j/2, conj, and *flip* operations can all be implemented trivially, and thus the proposed IFFT encoding and FFT decoding only require 2N-2 complex additions each. However, these adders can be eliminated by incorporating the operations within the quadrature phase rotation, which also requires 2N - 2 additions. As such, we remove an IFFT and an FFT from the link at no added cost.

#### B. ADAPTIVE EQUALIZER

Each equalizer, shown in Fig. 7b, takes concepts discussed in [19] and adapts them for FBMC signalling by applying four modifications, discussed shortly. The basic concept is as follows. Instead of using pilot carriers, the approach performs decision-directed channel estimation: it determines the channel response by directly observing the scaling and rotation error of data-filled constellations. This alleviates the need for pilot carriers and improves accuracy [19].

Each input symbol pair $(Y_{I,i}[k], Y_{Q,i}[k])$ is multiplied by a complex coefficient $C_i[k]$ determined during the previous frame period. Then tilt and stretch correction, discussed shortly, is applied to produce the equalized symbol $\hat{X}_i[k]$. $\hat{X}_i[k]$ is then sliced to produce the final decision $\Pi_i[k]$, i.e., rounded to its nearest constellation point following (9), where *S* is the set of constellation points [19], [26].

$$\Pi_i[k] = \arg\min_{s \in S} |\hat{X}_i[k] - s| \tag{9}$$

The feedback path separates the magnitude and phase of symbols $\hat{X}_i[k]$ and $\Pi_i[k]$. This is accomplished using two Cartesian-to-polar (C2P) converters [33]. The two extracted magnitude values are sent to the gain controller and the two phase values to the rotation controller. Both implement proportional-integral control. Following (10) and (11), the two control loops correct the error by adjusting $C_i[k]$. They eventually drive the gain and phase differences to 0, ensuring symbols land in the center of their decision boundaries. Finally, the resultant $|C_{i+1}[k]|$ and $\angle C_{i+1}[k]$ are converted back to Cartesian form using a polar-to-Cartesian (P2C) converter [34] to produce the next frame's equalization value $C_{i+1}[k]$.

$$M_{err,i}[k] = |\Pi_i[k]| - |\hat{X}_i[k]| \to 0$$
(10)

$$\phi_{err,i}[k] = \angle \Pi_i[k] - \angle \hat{X}_i[k] \to 0 \tag{11}$$

At startup, a training sequence synchronizes initial coefficients. Decisions  $\Pi_i[k]$  are overwritten to match the transmit values, allowing the equalizers to converge.
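The slicer (9), the error terms (10) and (11), the polar PI loops, and the training mode can be sketched for a single bin. This is a minimal, DMT-style sketch: it applies one complex tap and omits the FBMC-specific separate in-phase/quadrature equalization, the sign inversion on odd bins, and tilt and stretch correction. The loop gains `kp` and `ki`, the flat channel `h`, the noise level, and the class and function names are all assumptions. For the first 200 frames the decision is overwritten with the transmitted symbol; the loop then runs decision-directed.

```python
import cmath
import math
import random

def slice_qam(z, m=4):
    """(9) for a square m x m QAM grid with levels ±1, ±3, ..., ±(m-1)."""
    def nearest(v):
        level = 2 * round((v - 1) / 2) + 1          # nearest odd integer
        return max(-(m - 1), min(m - 1, level))      # clip to the outer ring
    return complex(nearest(z.real), nearest(z.imag))

class BinEqualizer:
    """One bin's single-tap equalizer with decision-directed PI gain and phase loops."""

    def __init__(self, kp=0.1, ki=0.02):
        self.kp, self.ki = kp, ki
        self.mag_int, self.phase_int = 1.0, 0.0      # integrator states
        self.mag, self.phase = 1.0, 0.0              # |C_i[k]| and angle C_i[k]

    def step(self, y, training_symbol=None):
        x_hat = cmath.rect(self.mag, self.phase) * y            # apply C_i[k]
        # During training, the decision is overwritten with the known symbol.
        pi_k = training_symbol if training_symbol is not None else slice_qam(x_hat)
        m_err = abs(pi_k) - abs(x_hat)                          # (10)
        p_err = math.remainder(cmath.phase(pi_k) - cmath.phase(x_hat), 2 * math.pi)  # (11)
        self.mag_int += self.ki * m_err
        self.phase_int += self.ki * p_err
        self.mag = self.mag_int + self.kp * m_err               # |C_{i+1}[k]|
        self.phase = self.phase_int + self.kp * p_err           # angle C_{i+1}[k]
        return pi_k

random.seed(1)
h = 0.5 * cmath.exp(0.4j)                # hypothetical flat response of this bin
levels = (-3, -1, 1, 3)                  # 16-QAM
eq = BinEqualizer()
errors = 0
for i in range(1000):
    x = complex(random.choice(levels), random.choice(levels))
    y = h * x + complex(random.gauss(0, 0.01), random.gauss(0, 0.01))
    decision = eq.step(y, training_symbol=x if i < 200 else None)
    errors += (i >= 200 and decision != x)
print(errors, abs(cmath.rect(eq.mag, eq.phase) * h - 1))
```

Working in polar form keeps the two loops decoupled: the gain loop only sees magnitudes, and the phase loop only sees angles, mirroring the C2P/P2C split described above.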

The proposed equalizer implements four features that differentiate it from previous works. First, the in-phase and quadrature symbols are equalized separately. This is a requirement of FBMC; however, it does not double the complexity, since each multiplier is only required to output either its real or imaginary component. Second, in DMT, sampling phase error causes all constellations to rotate in the same direction following (12), where $\theta_{err}$ is the phase error normalized to the sampling period; in FBMC, however, odd bins rotate in the opposite direction. Similarly, rotation equalization causes these bins to revolve in the opposite direction. As such, to ensure convergence, loop coefficients are made negative for every other bin. With this, when subject to sampling phase error, all equalizer coefficients C[k] rotate in the same direction. Third, a tilt and stretch correction function is added; this is discussed next. Fourth, the gain controller implements adaptive loop coefficients to achieve constant, amplitude-independent stability; this is discussed shortly.

$$Y[\theta_{err}, k] = e^{-j2\pi k \theta_{err}/2N} \cdot Y[k] = e^{j\phi_{err}[k]} \cdot Y[k]$$
(12)

FIGURE 10. a,b) Sampling phase error sweep showing magnitude, rotation, tilt, and stretch distortion of a 4-QAM constellation in bin 1. c) A 16-QAM constellation after applying single-tap equalization but before applying tilt and stretch compensation. d) The same constellation after applying tilt and stretch compensation.

FIGURE 11. Cross-coupled equalization for tilt and stretch compensation.

Timing recovery targets the average group delay of bins. However, with practical channels, certain bins will arrive earlier than others. As such, it is common for bins to experience sampling phase error, even with ideal timing. In DMT, this causes constellations to rotate, which is easily corrected using single-tap equalization. On the other hand, FBMC experiences additional forms of linear distortion. Fig. 10a,b depict the magnitude and phase response of a 4-QAM FBMC constellation in bin 1 as the sampling phase error is swept across 2N = 32 samples. Whereas DMT would experience a flat magnitude response and a linearly descending phase response, FBMC does not. In fact, each symbol within the constellation experiences a different response. This behaviour results in constellation rotation, scaling, tilt, and stretch. Tilt distorts constellations into a rhombus shape, whereas stretch distorts them into a rectangular shape. Fig. 10c depicts a 16-QAM constellation following single-tap equalization but without tilt and stretch compensation. Although magnitude and rotation distortion are removed, tilt and stretch are still present. This causes certain symbols to land close to decision boundaries, which degrades performance. The solution is to add cross-coupled equalization, as shown in Fig. 11. $\psi[k]$ corrects tilt distortion, and $\lambda[k]$ corrects stretch distortion. Both coefficients are set using lookup tables controlled by the current rotation equalization. Approximate arithmetic can be employed using programmable three-stage shift-and-add circuits to reduce complexity. Furthermore, it is sometimes possible to fix $\lambda[k] = 1$ if the relative group delay between bins is small. Fig. 10d shows the constellation following tilt and stretch correction. Both are removed, with all symbols landing near the center of their decision boundaries.
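To illustrate why two extra real coefficients suffice, the sketch below models tilt as a shear of the in-phase axis by the quadrature component (square to rhombus) and stretch as a scaling of the quadrature axis (square to rectangle). Under this model, a cross-coupled correction with $\psi[k]$ and $\lambda[k]$ undoes both. The distortion model, the exact placement of $\psi$ and $\lambda$, and the numeric values are assumptions for illustration and may differ from the structure in Fig. 11. In the proposed system, both coefficients come from lookup tables indexed by the rotation equalization rather than being passed in directly.

```python
def distort(z, tilt, stretch):
    """Hypothetical residual distortion: shear I by tilt*Q (rhombus), scale Q by stretch (rectangle)."""
    return complex(z.real + tilt * z.imag, stretch * z.imag)

def tilt_stretch_correct(z, psi, lam):
    """Assumed cross-coupled correction: rescale Q by lam, then cancel Q-to-I leakage with psi."""
    q = lam * z.imag
    return complex(z.real + psi * q, q)

points = [complex(a, b) for a in (-3, -1, 1, 3) for b in (-3, -1, 1, 3)]   # 16-QAM grid
tilt, stretch = 0.15, 0.85            # illustrative values, not taken from Fig. 10
bent = [distort(p, tilt, stretch) for p in points]
fixed = [tilt_stretch_correct(p, psi=-tilt, lam=1 / stretch) for p in bent]
print(max(abs(f - p) for f, p in zip(fixed, points)) < 1e-12)  # True
```

Setting `lam = 1` corresponds to fixing $\lambda[k] = 1$ when stretch is negligible; only the single cross term then remains.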

Gain control is a non-linear system. With a linear controller, outer symbols experience a lower phase margin than inner ones. As a result, the loop must be over-damped to avoid stability concerns, but this reduces the noise tracking bandwidth. Alternatively, [19] implements a linear gain controller in the logarithmic domain, effectively linearizing the system. However, applying LOG and EXP conversions requires additional delays to meet timing requirements, lowering the bandwidth. Instead, this work adaptively updates loop coefficients to maintain stability without introducing additional delay.

Constant stability is achieved by ensuring a constant sensitivity function:  $\partial M_{err}[k]/\partial |C[k]|$  [35]. Such a result is possible by continuously dividing the loop coefficients  $K_P$ and  $K_I$  by the input amplitude  $|Y_i[k]|$  as shown in (13).

$$K'_{P,i} = \frac{K_P}{|Y_i[k]|}, K'_{I,i} = \frac{K_I}{|Y_i[k]|}$$
(13)

However, in FBMC signalling, obtaining $|Y_i[k]|$ is not trivial, since $Y_{I,i}[k]$ and $Y_{Q,i}[k]$ contain deliberate interference and thus remain separate until after equalization. Instead, the input amplitude can be estimated by dividing $|\Pi_i[k]|$ by $|C_i[k]|$. Moreover, assuming the channel is short-term stationary, $|C_i[k]|$ is approximately constant and can be omitted, since a constant factor can be absorbed into $K_P$ and $K_I$. As such, we obtain (14). The operation is implemented using two lookup tables, as depicted in Fig. 12. Assuming the size of constellations does not exceed 256-QAM, each lookup table only requires 32 entries; thus, the complexity is minimal. The noise tracking performance of the proposed system is analyzed in Section IV, showing the same stability with 65% higher bandwidth when compared to [19].

$$|Y_i[k]| \approx \frac{|\Pi_i[k]|}{|C_i[k]|} \propto |\Pi_i[k]|.$$
(14)
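The effect of (13) and (14) can be seen in a deliberately simplified gain loop. This sketch uses integral-only control instead of the paper's PI controller, noise-free inputs, and always-correct decisions. Each update multiplies the error in $|C|$ by $1 - K_{eff}\,h\,|\Pi|$. With a fixed coefficient, that factor depends on which constellation point arrives, so a gain chosen for inner symbols can make the loop unstable for outer ones. With $K' = K/|\Pi|$, the factor becomes $1 - K h$ for every symbol. The gain 0.6, the unit channel `h`, and the function name are hypothetical.

```python
import math

def gain_loop_error(amplitude, k, adaptive, h=1.0, steps=50):
    """Integral-only gain loop for a constellation point of magnitude `amplitude`.

    The received magnitude is h*amplitude, and the loop drives |C|*h*amplitude
    toward amplitude. With adaptive=True the coefficient is divided by |Pi| per
    (13)-(14). Returns the final distance of |C| from its ideal value 1/h.
    """
    c = 0.5 / h                                    # start away from the ideal |C|
    for _ in range(steps):
        m_err = amplitude - c * h * amplitude      # (10) with a correct decision
        k_eff = k / amplitude if adaptive else k   # K' = K / |Pi|
        c += k_eff * m_err
    return abs(c - 1 / h)

inner, outer = math.sqrt(2), 3 * math.sqrt(2)      # 16-QAM inner and corner magnitudes
for adaptive in (False, True):
    print(adaptive, [gain_loop_error(a, 0.6, adaptive) for a in (inner, outer)])
```

With the fixed coefficient, the corner symbol drives the loop unstable (error factor about -1.55 per update) while the inner symbol converges. With the adaptive coefficient, both converge at the same rate.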

#### C. TIMING RECOVERY

The timing recovery also adopts concepts discussed in [19] but applies one modification, discussed shortly. The basic concept is as follows. When the sampling phase deviates from its ideal position, constellations rotate according to (12) [19], [28]. As mentioned, a conventional approach uncovers this rotation by reserving bins to send known pilot carriers. However, this introduces overhead. Instead, our approach obtains timing information by monitoring the required rotation equalization of data-filled bins. This alternative alleviates the need for pilot carriers and improves accuracy [19].

FIGURE 12. Adaptive equalizer's gain control block diagram.

As shown in Fig. 7c, the timing recovery block extracts N-1 rotation equalization values $\angle C_i[k]$. N-1 offsets $\phi_{expect}[k]$ are subtracted from these values; the offsets set the ideal sampling phase $\theta_{ofs}$ following $\phi_{expect}[k] = 2\pi k \theta_{ofs}/2N$. The resulting error $\phi_{err,i}[k]$ is sent through a linear regression block to estimate the sampling phase error $\theta_{err,i}$. The result is input into a proportional-integral controller, which controls a phase interpolator and adjusts the sampling phase of the ADC.
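The linear regression step can be sketched as a least-squares fit of a line through the origin. By (12), the rotation needed to cancel a sampling phase error grows linearly with bin index, so the slope of $\phi_{err}[k]$ versus k gives $\theta_{err}$. The sketch assumes the odd-bin sign inversion described earlier has already made all $\angle C[k]$ rotate in the same direction. The true phase, the offset, the noise level, and the function name are hypothetical.

```python
import math
import random

def estimate_theta_err(rotation_eq, theta_ofs, two_n):
    """Least-squares slope of phi_err[k] = angle C_i[k] - phi_expect[k] over bins k = 1..N-1.

    Assumes the equalizer rotation grows linearly with k as implied by (12), so the
    fitted line passes through the origin. Returns theta_err in sampling periods (UI).
    """
    num = den = 0.0
    for k, angle_c in enumerate(rotation_eq, start=1):
        phi_expect = 2 * math.pi * k * theta_ofs / two_n
        phi_err = angle_c - phi_expect
        num += k * phi_err
        den += k * k
    slope = num / den                              # radians per bin
    return slope * two_n / (2 * math.pi)

random.seed(7)
two_n, theta_true, theta_ofs = 32, 0.12, 0.02      # hypothetical sampling phase and offset (UI)
# Equalizer rotations that cancel a sampling phase of theta_true, plus estimation noise.
rotation_eq = [2 * math.pi * k * theta_true / two_n + random.gauss(0, 0.01)
               for k in range(1, two_n // 2)]
theta_err = estimate_theta_err(rotation_eq, theta_ofs, two_n)
print(f"{theta_err:.3f}")  # approximately theta_true - theta_ofs = 0.10
```

Fitting across all N - 1 data bins averages the per-bin noise, which is why observing every bin gives a more accurate estimate than observing a few pilot bins.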

At startup, a training sequence is used to synchronize equalization. Then timing recovery enters synchronization mode, where it only observes the lowest bin. By rotating the phase interpolator, it drives $\phi_{err}[1] \rightarrow 0$. Doing so achieves coarse frequency and phase synchronization. At run-time, data is sent, and the timing recovery enters normal operation mode, where it observes all bins. If phase drift is encountered, constellations begin to rotate. However, the adaptive equalizers compensate by applying the appropriate equalization, ensuring they stay square. By observing the rotation equalization across all bins, an accurate estimate of the timing offset is uncovered and corrected. This procedure has been verified in simulation.

The proposed modification concerns the PI controller. A first-order controller is unable to remove residual sampling phase error when a frequency offset is present [19]. Although this error can be removed by adjusting $\phi_{expect}[k]$, doing so requires a separate control circuit. Instead, this work implements a second-order transfer function by adding a second-order (double) integration path. By keeping its coefficient small, the stability of the system is unaffected, yet the error is removed.
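The benefit of the extra integration path can be illustrated with a simple discrete-time loop model. This sketch assumes an ideal phase detector and assumes that the controller output sets the phase interpolator position directly. Under these assumptions, a frequency offset appears as a phase ramp: the PI loop settles with a constant residual error of freq_offset / K_I, while a small double-integrator gain drives the residual to zero. The gains (0.3, 0.05, 0.002), the ramp rate, and the function name are hypothetical; with these values the closed-loop poles stay inside the unit circle.

```python
def final_phase_error(kp, ki, k2, freq_offset, frames=4000):
    """Sampling-phase loop in which the controller output sets the phase interpolator.

    The incoming sampling phase drifts by `freq_offset` UI per frame (a frequency
    offset). k2 = 0 gives the PI controller; k2 > 0 adds the second integration path.
    Returns the phase error after `frames` updates.
    """
    theta_pi = 0.0            # phase-interpolator setting (UI)
    s1 = s2 = 0.0             # single and double integrator states
    err = 0.0
    for i in range(frames):
        err = freq_offset * i - theta_pi         # ideal detector of theta_err
        s1 += err
        s2 += s1
        theta_pi = kp * err + ki * s1 + k2 * s2  # applied on the next frame
    return err

pi_only = final_phase_error(0.3, 0.05, 0.0, 1e-3)
with_second = final_phase_error(0.3, 0.05, 0.002, 1e-3)
print(round(pi_only, 4), abs(with_second) < 1e-6)  # 0.02 True
```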



FIGURE 13. Top-level block diagram of DMT and FBMC models.

#### **IV. SIMULATION RESULTS**

This section compares the proposed FBMC system against conventional DMT signalling. We then analyze the reduction in complexity from simplified FBMC coding before concluding with the analysis of the proposed gain controller.

As shown in Fig. 13, a top-level model is created whose coding can be set to either DMT or FBMC. Typical impairments are included: crosstalk from seven aggressor channels; a 7-bit 32GS/s DAC (ADC) with a 5.5-bit ENOB; $1.5\,\mathrm{mV}^2$ of input-referred noise from a Continuous-Time Linear Equalizer (CTLE); 0.01UI<sub>rms</sub> of random jitter; and 0.02UI<sub>pp</sub> of Dual-Dirac jitter (set to meet the CEI 56Gb/s specifications for channel compliance testing using a reference transmitter and receiver [32]). The victim and aggressor channels are publicly available through the IEEE 802.3 Ethernet Working Group [36]. For both modulations, a CTLE partially equalizes the received signal, which helps reduce DMT's CP lengths to 0UI, 1UI, 2UI, 2UI, and 3UI for the 0dB, 12dB, 22dB, 35dB, and 42dB channels, respectively. This optimization ensures the CP encompasses the majority of the pulse response without adding unnecessary overhead. Note that a CTLE is not required but improves the link's capacity. Furthermore, 2, 4, and 6 pilot carriers are employed for 16-, 32-, and 64-point DMT coding, respectively, to enable channel estimation and timing recovery. This number ensures sufficient averaging without introducing unnecessary overhead and is based on the analysis performed in [19]. The proposed FBMC system requires neither a CP nor bin reservation. Bit and power loading are optimized for a Bit-Error-Rate (BER) of $10^{-4}$, a typical pre-FEC target.
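To make the overhead argument concrete, the fraction of DMT signalling resources left for data can be approximated as the product of two factors. The time-domain factor is the frame samples divided by the frame-plus-CP samples. The frequency-domain factor is the data bins divided by the total bins. This first-order sketch ignores differences in bit and power loading. It also assumes one CP UI per sample. The bin count used in the usage note is a hypothetical parameter, not a value taken from the simulations. FBMC corresponds to `n_cp = 0` and `n_pilots = 0`.

```python
def dmt_payload_fraction(frame_len, n_cp, n_bins, n_pilots):
    """First-order fraction of DMT resources that carry data.

    frame_len -- samples per frame before the cyclic prefix is added
    n_cp      -- cyclic-prefix length in samples (1UI per sample assumed)
    n_bins    -- number of usable frequency bins per frame (hypothetical)
    n_pilots  -- bins reserved for pilot carriers
    """
    time_factor = frame_len / (frame_len + n_cp)     # CP overhead
    freq_factor = (n_bins - n_pilots) / n_bins       # pilot-bin overhead
    return time_factor * freq_factor
```

For instance, `dmt_payload_fraction(32, 2, 16, 4)` gives (32/34)(12/16), or about 0.71. In that hypothetical configuration, roughly 29% of the resources go to overhead before any loss from side-lobe interference is considered.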

Results are displayed in Fig. 14. Each column depicts one of the four comparison metrics: communication link capacity, power consumption efficiency, silicon area efficiency, and signal processing latency. The rows depict the swept simulation parameters: channel attenuation at Nyquist, IFFT (FFT) resolution, and frame overlap factor. For each experiment, when not swept, the channel attenuation is fixed at 35dB, the IFFT (FFT) resolution at 32, and the overlap at 3. We now analyze these results.

#### A. CAPACITY

Column *a* in Fig. 14 highlights the relative improvement in capacity with respect to a conventional DMT system. Notably, the proposed FBMC system outperforms DMT in all scenarios. While sweeping channel attenuation, we observe a maximum relative capacity of 2.01 with a 35dB channel. The improvement is expected to grow with channel attenuation, since higher attenuation lengthens the CP and adds overhead; apart from one anomaly, our results follow this trend. While sweeping FFT resolution, we observe a maximum of 2.01 at a resolution of 32. The improvement is expected to shrink as resolution increases, since CP and bin-reservation overhead are most pronounced with short frame lengths; this trend can be observed in the results. Finally, while sweeping the overlap factor, results show a maximum of 2.08 for O = 4. The improvement is expected to grow with larger overlap, since a longer filter further attenuates side-lobes, reducing interference among symbols [20], [21]; this trend is also noticeable in the results. Note that with an overlap of 1, we are comparing two DMT systems, one of which employs the proposed pilot-carrier-less channel estimation and timing recovery; this matches results from [19]. From our analysis, FBMC modulation itself improves capacity by 35%; the rest of the gain comes from eliminating the overhead.

#### B. POWER EFFICIENCY

Column *b* in Fig. 14 highlights the relative improvement in power efficiency with respect to a conventional DMT system, in terms of bits per unit energy. Power estimates are based on post-layout results from a synthesized 16nm FinFET DMT design. Results from this design are scaled using the first-order approximation $P = \mu CV^2 f$, which accounts for the transition activity $\mu$, the relative number of gates $C$, the supply voltage $V$, and the clock frequency $f$. We assume the DAC and ADC power is constant regardless of the design. Furthermore, although conventional DMT coding does not employ the proposed pilot-carrier-less channel estimation or timing recovery, it requires similar mechanisms; we therefore assume the same power consumption for these functions.
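The scaling step can be sketched as follows. A measured reference power is multiplied by the ratio of $\mu C V^2 f$ between the new and reference design points, and efficiency is then capacity divided by power. The class and function names are hypothetical, and only the DSP portion is meant to be scaled. Consistent with the assumption above, converter power would be added as a constant. All numbers in the usage note are illustrative, not taken from the synthesized design.

```python
from dataclasses import dataclass


@dataclass
class DesignPoint:
    activity: float  # transition activity, mu
    gates: float     # relative gate count, a proxy for switched capacitance C
    vdd: float       # supply voltage V, in volts
    f_clk: float     # clock frequency f, in hertz


def dynamic_power(d: DesignPoint) -> float:
    """First-order dynamic power P = mu * C * V^2 * f (arbitrary units)."""
    return d.activity * d.gates * d.vdd ** 2 * d.f_clk


def scale_power(p_ref: float, ref: DesignPoint, new: DesignPoint) -> float:
    """Scale a measured reference DSP power to a new design point."""
    return p_ref * dynamic_power(new) / dynamic_power(ref)


def relative_efficiency(cap_new, p_new, cap_ref, p_ref):
    """Ratio of bits per unit energy (capacity / power) of two designs."""
    return (cap_new / p_new) / (cap_ref / p_ref)
```

For example, suppose a hypothetical design used 10% more gates at the same activity, supply, and clock. Then `scale_power` returns 1.1 times the reference power. Doubling capacity at that power gives `relative_efficiency(2.0, 1.1, 1.0, 1.0)`, or about 1.82.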

As shown, the power efficiency is considerably improved in FBMC. The trend follows the increase in capacity, with a maximum improvement of 2.00 with a 35dB channel, 2.00 with a 32-point IFFT (FFT), and 2.02 with an overlap of 4. This result is expected, since the power consumption of the proposed FBMC system closely matches that of DMT while it delivers almost twice the capacity. Two factors are responsible for this. First, simplified FBMC coding reduces complexity, as will be analyzed shortly. Second, FBMC serialization and deserialization are far simpler than in DMT. Assume a parallelized DAC and ADC architecture is adopted, which inherently serializes and deserializes data to and from a bus of width 32. In a DMT system, each frame requires gearboxing to a width of $32 + N_{CP}$. This operation requires a complex FIFO circular buffer, since the gearing ratio is not a power of two. Post-layout results reveal that this module consumes more than 18% of the receiver's DSP power budget. In contrast, although FBMC frames are also stretched, they are overlapped such that the effective frame lengths remain unchanged, eliminating the need for gearboxing and enabling a simpler, less power-hungry design.

FIGURE 14. Comparison results showing the relative improvement of the proposed FBMC system with respect to a conventional DMT system. When not swept, channel attenuation is fixed at 35dB, IFFT (FFT) resolution is set to 32, and overlap is set to 3.
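Why the DMT gearbox is costly can be illustrated by counting how long it takes for frame boundaries to realign with a 32-wide bus. This sketch assumes, hypothetically, a CP of 2 samples, giving a 34-sample frame. It illustrates the alignment problem only and is not the FIFO design used in the synthesized receiver.

```python
from math import lcm


def gearbox_period(bus_width: int, frame_len: int) -> tuple[int, int]:
    """Return (bus words, frames) after which the gearbox alignment repeats."""
    period = lcm(bus_width, frame_len)
    return period // bus_width, period // frame_len


def frame_start_offsets(bus_width: int, frame_len: int) -> list[int]:
    """Offset within a bus word at which each frame of one period starts."""
    _, frames = gearbox_period(bus_width, frame_len)
    return [(j * frame_len) % bus_width for j in range(frames)]
```

With these definitions, `gearbox_period(32, 34)` is `(17, 16)`, meaning 17 bus words carry 16 frames before the pattern repeats. Those 16 frames start at 16 different offsets (0, 2, 4, ..., 30), so the datapath must support 16 distinct alignments. With FBMC's effective 32-sample frames, `gearbox_period(32, 32)` is `(1, 1)`, and every frame is bus-aligned.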

#### C. AREA EFFICIENCY

Column *c* in Fig. 14 highlights the relative improvement in area efficiency with respect to a conventional DMT system, in terms of bits per unit area. Silicon area estimates are also based on the same synthesized 16nm design. Similarly, we assume the analog area is unchanged and scale the DSP area based on the relative number of gates.

Much like the previous metric, results closely follow the capacity and power-efficiency trends, with a maximum improvement of 2.01 with a 35dB channel, 2.01 with a 32-point IFFT (FFT), and 2.02 with an overlap of 4. This is because the area of the proposed FBMC system is comparable to that of DMT while it delivers almost twice the capacity.

#### D. LATENCY

Column *d* in Fig. 14 highlights the relative change in system latency with respect to a conventional DMT system. Although transitioning from DMT to FBMC provides many benefits, it can incur a slight increase in latency. Results from this analysis are derived from the same 16nm design, which adopts a pipelined implementation set to meet post-layout timing requirements. As shown, while the relative processing latency is unaffected by channel attenuation and only slightly affected by IFFT (FFT) resolution, its increase with larger overlap factors is more pronounced. These results are expected, as the decoder must wait to receive the whole frame before processing. To a first-order approximation, latency is directly proportional to the frame length. Since an FBMC frame is often several times longer than a DMT frame, it experiences more latency. However, the effect is less pronounced than one might anticipate: results show a relative increase of only 13% (from 10.6ns to 12.0ns) even with a triple-length frame. The reason is that the coding of signals is not instantaneous; practical implementations take multiple clock cycles to complete. As a result, the increase in frame length contributes only a small percentage of the overall processing latency. Furthermore, the simplifications from the proposed FBMC coding and the removal of the gearbox reduce computation, contributing to less delay.
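The argument can be written as a two-term latency model: the time to receive a whole frame plus a fixed pipeline delay. The 32GS/s sample rate matches the converters above. The pipeline depth and clock in the usage note, however, are hypothetical values chosen for illustration, so the resulting percentages are not the reported simulation results. This model also ignores the reduced FBMC computation mentioned above.

```python
def latency_ns(frame_samples: float, fs_gsps: float,
               pipeline_cycles: int, f_clk_ghz: float) -> float:
    """First-order latency: wait for a whole frame, then traverse the pipeline."""
    frame_wait = frame_samples / fs_gsps      # samples / (GS/s) -> ns
    processing = pipeline_cycles / f_clk_ghz  # cycles / GHz -> ns
    return frame_wait + processing


def relative_increase(new: float, ref: float) -> float:
    """Fractional increase of `new` over `ref`."""
    return new / ref - 1.0
```

Consider a hypothetical 9-cycle pipeline at 1GHz. Here, `latency_ns(34, 32, 9, 1)` (a 34-sample DMT frame) is about 10.06ns, and `latency_ns(96, 32, 9, 1)` (a triple-length 32-sample FBMC frame) is 12.0ns. That is an increase of under 20% for a frame that is almost three times longer, because the fixed pipeline delay dominates.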

Note that this analysis depicts the relative change in latency between DMT and FBMC. If we instead observe the absolute latency across FBMC configurations, it increases in proportion to the coding resolution. For example, a 16-point FBMC system experiences a latency of 6.0ns, whereas a 64-point version experiences 24.0ns.

In general, transitioning from DMT to FBMC results in improved capacity, power efficiency, and area efficiency at the cost of a slight increase in latency. That said, there are scenarios where transitioning from conventional DMT to the proposed FBMC improves all aspects without compromise. One such example occurs with a 42dB channel when transitioning from 64-point DMT to 16-point FBMC with O = 2. Our simulations show a 30% improvement in capacity, a 6% improvement in power efficiency, a 29% improvement in area efficiency, and 3.8 times lower latency. This observation is consistent with results from [20] and highlights the considerable performance benefits of FBMC signalling.

FIGURE 15. IFFT complexity comparison between the proposed approach using a single IFFT versus the conventional approach using two redundant IFFTs.

IEEE Open Journal of Circuits and Systems

Furthermore, our analysis does not consider the complexities associated with clocking. As mentioned, a DMT frame is rarely a power of two in length. Assuming the DAC and ADC are implemented with a power-of-two number of slices, the output DSP clock will require a nontrivial division from $f_s/32$ to $f_s/(32 + N_{CP})$. This adds complexity and could further motivate the transition to FBMC signalling.

#### E. SIMPLIFIED FBMC CODING

We now analyze the reduction in complexity thanks to the proposed FBMC coding, where only a single IFFT (FFT) is needed to encode both the real and imaginary waveforms. This analysis assumes multiplication dominates complexity, and addition is comparatively negligible.
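The linearity identity behind encoding two real waveforms with one transform is sketched below. If two symbol vectors are each Hermitian-symmetric, their IFFTs are real. Placing one vector in the real part and the other in the imaginary part of a single complex IFFT input therefore yields both waveforms at once. This is the classical "two real transforms for the price of one complex transform" trick, shown only as an illustration; the paper's own coding may arrange the symbols differently.

```python
import numpy as np


def two_real_iffts(x1, x2):
    """Return ifft(x1) and ifft(x2) using a single complex IFFT.

    Both x1 and x2 must be Hermitian-symmetric (x[k] == conj(x[-k])), so
    each IFFT is real-valued. By linearity,
    ifft(x1 + 1j*x2) = ifft(x1) + 1j*ifft(x2), so the real and imaginary
    parts of the single result are the two desired waveforms.
    """
    y = np.fft.ifft(np.asarray(x1) + 1j * np.asarray(x2))
    return y.real, y.imag
```

A quick check is to build Hermitian spectra from two random real signals with `np.fft.fft` and confirm that `two_real_iffts` recovers both signals.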

Fig. 15 compares the number of real-valued multipliers required for the proposed single IFFT versus the conventional double IFFTs. Results show an improvement that is inversely proportional to resolution. For example, with a 32-point IFFT, the proposed approach reduces complexity by 29%. This improvement is also experienced in the receiver and is independent of the overlap factor.

To explain these results, a conventional pipelined IFFT (FFT) design requires approximately $N\log_2(2N)$ complex multipliers [37], each of which requires three real-valued multipliers. However, assuming Hermitian-symmetric symbols and thus a real-valued signal, redundant calculations can be removed; we refer to this as a redundant IFFT (FFT) [37]. As a side note, the reference DMT system adopts this simplification. Nevertheless, whereas conventional FBMC coding requires a pair of redundant IFFTs (FFTs), the proposed approach requires only a single conventional IFFT (FFT). As shown, this reduces complexity, keeping FBMC systems comparable to their DMT counterparts.
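The three-real-multiplier complex product assumed in this count can be realized with the well-known decomposition below, sometimes attributed to Gauss. It trades one multiplication for extra additions, which is consistent with the assumption above that multipliers dominate complexity. It is shown as one standard option; the cited IFFT architecture may organize its multipliers differently.

```python
def complex_mul_3(a: float, b: float, c: float, d: float) -> tuple[float, float]:
    """Compute (a + jb)(c + jd) with three real multiplications.

    Returns (real, imag) = (ac - bd, ad + bc).
    """
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2
```

For example, `complex_mul_3(1.0, 2.0, 3.0, 4.0)` returns `(-5.0, 10.0)`, matching (1 + 2j)(3 + 4j) = -5 + 10j.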



FIGURE 16. Noise tracking performance comparison between the proposed approach using adaptive loop coefficients and the conventional approach adopting LOG and EXP conversion.

#### F. GAIN CONTROL WITH ADAPTIVE LOOP COEFFICIENT

This section compares the noise-tracking performance of the conventional gain control system in [19] with the proposed one. As shown in Fig. 16, the Noise Transfer (NTRAN) and Noise Tracking (NTRACK) functions are overlaid for both designs. NTRAN describes the amount of noise recovered by the system, whereas NTRACK describes the amount of noise remaining at the output. For this analysis, we assume a 1GHz DSP clock. The loop gain in both systems is set to achieve the same amount of NTRACK peaking. From simulation results, the 3dB point of the proposed NTRACK curve, i.e., the frequency beyond which noise is no longer tracked, increases by 1.65 times, from 2.3MHz to 3.8MHz. This improvement results from the reduction in loop latency from 8 clock periods to 5, and it is constant regardless of the coding resolution or overlap factor. As such, the proposed approach outperforms the alternative while also reducing complexity.
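The link between loop latency and tracking bandwidth can be explored with a simplified model. This model is our assumption, not the controller of [19] or the proposed design. It is a first-order loop consisting of an accumulator with gain `k` and `d` cycles of latency, so that $\mathrm{NTRACK}(z) = 1/(1 + L(z))$ with $L(z) = k z^{-d}/(1 - z^{-1})$. For each latency, the gain is found by bisection so that both loops have the same NTRACK peaking. The sketch then reports, for a 1GHz clock, the frequency at which $|\mathrm{NTRACK}|$ first reaches -3dB. In this toy model, a shorter latency permits a larger gain for the same peaking and therefore a wider tracking bandwidth. The absolute numbers depend on the assumed 2dB peaking target and are not those of Fig. 16.

```python
import numpy as np

W = np.linspace(1e-5, np.pi, 200_000)  # normalized frequency grid (rad/sample)
Z_INV = np.exp(-1j * W)                # z^-1 evaluated on the unit circle


def ntrack_mag(k, d):
    """|NTRACK| of a hypothetical accumulator loop with gain k and d cycles of latency."""
    loop = k * Z_INV**d / (1 - Z_INV)
    return np.abs(1 / (1 + loop))


def peaking_db(k, d):
    """Maximum of |NTRACK| over the grid, in dB."""
    return 20 * np.log10(ntrack_mag(k, d).max())


def gain_for_peaking(d, target_db, iters=60):
    """Bisect for the gain giving target_db of NTRACK peaking (searched in k <= 1/d)."""
    lo, hi = 1e-6, 1.0 / d
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if peaking_db(mid, d) > target_db:
            hi = mid
        else:
            lo = mid
    return lo


def tracking_bw_hz(k, d, f_clk=1e9):
    """Frequency where |NTRACK| first rises to -3dB, i.e. noise stops being tracked."""
    mag = ntrack_mag(k, d)
    idx = np.argmax(mag >= 1 / np.sqrt(2))
    return W[idx] / (2 * np.pi) * f_clk


k5, k8 = gain_for_peaking(5, 2.0), gain_for_peaking(8, 2.0)
bw5, bw8 = tracking_bw_hz(k5, 5), tracking_bw_hz(k8, 8)
```

The search interval `k <= 1/d` is a conservative choice that keeps both loops stable in this model. Comparing `bw5` with `bw8` shows the latency-driven bandwidth gain qualitatively.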

#### **V. CONCLUSION**


This paper proposes an efficient multi-carrier system that combines filter-bank multi-carrier signalling, decision-directed channel estimation, and frequency-domain timing recovery to eliminate the overhead associated with cyclic prefix, large side-lobes, and pilot carriers. Furthermore, a technique is proposed to halve the required number of FFTs (IFFTs), reducing their complexity by 29% for a 32-point resolution; a method is proposed to correct tilt and stretch distortion; and a gain controller with adaptive loop coefficients is adopted to achieve the same stability but 65% higher tracking bandwidth regardless of the FFT size. A top-level model is created to enable an in-depth comparison between the proposed solution and a conventional discrete multi-tone system in terms of communication link capacity, power consumption efficiency, silicon area efficiency, and signal processing latency. Assuming a 32-point FFT, a 35dB channel, and an overlap factor of 3, results show a 101% improvement in capacity, a 100% improvement in power efficiency, and a 101% improvement in area efficiency, all while maintaining comparable latency. In general, we have demonstrated that transitioning from conventional DMT signalling to the proposed FBMC system improves the data rate without necessarily compromising power, area, or latency. The proposed system enables very low-resolution multi-carrier schemes, which were previously impractical due to the detrimental overhead.

#### ACKNOWLEDGMENT

The authors thank the anonymous reviewers for their valuable feedback on the first draft of this paper. They also thank Huawei Canada for their expertise and assistance throughout the course of this research. Access to CAD tools was provided by CMC Microsystems.

#### REFERENCES

- [1] Y. Segal *et al.*, "A 1.41pJ/b 224Gb/s PAM-4 SerDes receiver with 31dB loss compensation," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, vol. 65, 2022, pp. 114–116.
- [2] C. E. Shannon, "A mathematical theory of communication," *Bell Syst. Tech. J.*, vol. 27, pp. 1–55, Jul.–Oct. 1948.
- [3] G. Kim *et al.*, "30.2 A 161mW 56Gb/s ADC-based discrete multitone wireline receiver data-path in 14nm FinFET," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, 2019, pp. 476–478.
- [4] B. Vatankhahghadim, N. Wary, and A. C. Carusone, "Discrete multitone signalling for wireline communication," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, 2020, pp. 1–5.
- [5] J. Salinas, J. Cosson-Martin, M. Laghaei, H. Shakiba, and A. Sheikholeslami, "Performance comparison of baseband signaling and discrete multi-tone for wireline communication," *IEEE Open J. Circuits Syst.*, vol. 2, pp. 65–77, Jan. 2021. [Online]. Available: https://ieeexplore.ieee.org/document/9318036
- [6] N. Eiselt *et al.*, "Performance comparison of 112-Gb/s DMT, Nyquist PAM4, and partial-response PAM4 for future 5G Ethernet-based Fronthaul architecture," *J. Lightw. Technol.*, vol. 36, no. 10, pp. 1807–1814, May 15, 2018.
- [7] A. Sahin, I. Guvenc, and H. Arslan, "A survey on multicarrier communications: Prototype filters, lattice structures, and implementation aspects," *IEEE Commun. Surveys Tuts.*, vol. 16, no. 3, pp. 1312–1338, 3rd Quart., 2014.
- [8] B. Farhang-Boroujeny, "OFDM versus filter bank multicarrier," *IEEE Signal Process. Mag.*, vol. 28, no. 3, pp. 92–112, May 2011.
- [9] G. Kim, W. Kwon, T. Toifl, Y. Leblebici, and H. Bae, "Design considerations and performance trade-offs for 56Gb/s discrete multi-tone electrical link," in *Proc. IEEE 62nd Int. Midwest Symp. Circuits Syst.* (*MWSCAS*), 2019, pp. 1147–1150.
- [10] J. Armstrong, "OFDM for optical communications," *J. Lightw. Technol.*, vol. 27, no. 3, pp. 189–204, Feb. 1, 2009.
- [11] R. Nguyen, "8.6 A highly reconfigurable 40-97GS/s DAC and ADC with 40GHz AFE bandwidth and sub-35fJ/conv-step for 400Gb/s coherent optical applications in 7nm FinFET," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, 2021, pp. 136–138.
- [12] K. Mizutani, T. Matsumura, and H. Harada, "A comprehensive study of universal time-domain windowed OFDM-based LTE downlink system," in *Proc. 20th Int. Symp. Wireless Pers. Multimedia Commun.* (WPMC), 2017, pp. 28–34.
- [13] J. Nadal, C. A. Nour, and A. Baghdadi, "Design and evaluation of a novel short prototype filter for FBMC/OQAM modulation," *IEEE Access*, vol. 6, pp. 19610–19625, 2018.
- [14] J. Anderson and A. Svensson, *Coded Modulation Systems*. New York, NY, USA: Kluwer Acad., 2003.
- [15] D. K. Kim, S. H. Do, H. B. Cho, H. J. Chol, and K. B. Kim, "A new joint algorithm of symbol timing recovery and sampling clock adjustment for OFDM systems," *IEEE Trans. Consum. Electron.*, vol. 44, no. 3, pp. 1142–1149, Aug. 1998.
- [16] J. Cioffi. "Fundamentals of Synchronization." 2019. [Online]. Available: https://cioffi-group.stanford.edu/doc/book/chap6.pdf
- [17] I. Sudeep Bhoja. "FEC Codes for 400Gbps 802.3bs." [Online]. Available: https://www.ieee802.org/3/bs/public/14\_11/parthasarathy\_3bs\_01a\_1114.pdf (Accessed: Jul. 2022).
- [18] B. Ai, Z.-X. Yang, C.-Y. Pan, J.-H. Ge, Y. Wang, and Z. Lu, "On the synchronization techniques for wireless OFDM systems," *IEEE Trans. Broadcast.*, vol. 52, no. 2, pp. 236–244, Jun. 2006.

- [19] J. Cosson-Martin, H. Shakiba, and A. Sheikholeslami, "Timing recovery and adaptive equalization for discrete multi-tone signalling in wireline applications," *IEEE Open J. Circuits Syst.*, vol. 2, pp. 856–868, Nov. 2021. [Online]. Available: https://ieeexplore.ieee.org/document/9625630
- [20] K. W. Martin, "Small side-lobe filter design for multitone data-communication applications," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 45, no. 8, pp. 1155–1161, Aug. 1998.
- [21] S. Mirabbasi and K. Martin, "Design of prototype filter for near-perfect-reconstruction overlapped complex-modulated transmultiplexers," in *Proc. IEEE Int. Symp. Circuits Syst.*, vol. 1, 2002, p. 1.
- [22] B. Saltzberg, "Performance of an efficient parallel data transmission system," *IEEE Trans. Commun. Technol.*, vol. 15, no. 6, pp. 805–811, Dec. 1967.
- [23] Y. Dandach and P. Siohan, "FBMC/OQAM modulators with half complexity," in *Proc. IEEE Global Telecommun. Conf.*, 2011, pp. 1–5.
- [24] S. Mirabbasi and K. Martin, "Overlapped complex-modulated transmultiplexer filters with simplified design and superior stopbands," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 50, no. 8, pp. 456–469, Aug. 2003.
- [25] M. Bellanger, "FS-FBMC: An alternative scheme for filter bank based multicarrier transmission," in *Proc. 5th Int. Symp. Commun. Control Signal Process.*, 2012, pp. 1–4.
- [26] J.-H. Kim and W.-W. Kim, "Frequency domain-DFE coupled with common phase error tracking loop in OFDM systems," in *Proc. IEEE 61st Veh. Technol. Conf.*, vol. 2, 2005, pp. 1248–1252.
- [27] M. Engels, *Wireless OFDM Systems: How to Make Them Work?* Norwell, MA, USA: Kluwer Acad., 2002.
- [28] D. Chang, "Analysis and compensation of channel correction in pilot-aided OFDM systems with symbol timing offset," in *Proc. IEEE Int. Conf. Electro/Inf. Technol.*, 2006, pp. 324–329.
- [29] M. Carta, V. Lottici, and R. Reggiannini, "Frequency recovery for filter-bank multicarrier transmission on doubly-selective fading channels," in *Proc. IEEE Int. Conf. Commun.*, 2007, pp. 5212–5217.
- [30] G. Dainelli, V. Lottici, M. Moretti, and R. Reggiannini, "Efficient carrier recovery for FBMC systems over time-frequency selective channels," in *Proc. 8th Int. Workshop Multi-Carrier Syst. Solut.*, 2011, pp. 1–5.
- [31] M. Li and W. Zhang, "A novel method of carrier frequency offset estimation for OFDM systems," *IEEE Trans. Consum. Electron.*, vol. 49, no. 4, pp. 965–972, Nov. 2003.
- [32] "Common Electrical I/O (CEI) - Electrical and Jitter Interoperability Agreements for 6G+ bps, 11G+ bps, 25G+ bps I/O and 56G+ bps." 2017. [Online]. Available: https://www.oiforum.com/wp-content/uploads/2019/01/OIF-CEI-04.0.pdf
- [33] S. Arar. "An Introduction to the CORDIC Algorithm." 2017. AllAboutCircuits.com. [Online]. Available: https://www.allaboutcircuits.com/technical-articles/an-introduction-to-the-cordic-algorithm/
- [34] "CORDIC Part Two: Rectangular to Polar Conversion." Gisselquist Technology. 2017. [Online]. Available: https://zipcpu.com/dsp/2017/09/01/topolar.html
- [35] K. W. Martin and M. T. Sun, "Adaptive filters suitable for real-time spectral analysis," *IEEE J. Solid-State Circuits*, vol. 21, no. 1, pp. 108–119, Feb. 1986.
- [36] "IEEE P802.3ck Task Force—Tools and Channels." 2018. [Online]. Available: https://www.ieee802.org/3/ck/public/tools/
- [37] N. Ganesamoorthy, S. Deivasigamani, and K. Balasubadra, "An efficient multi-path delay commutator architecture," *Int. J. Comput. Appl.*, vol. 98, pp. 21–23, Jul. 2014.



**JEREMY COSSON-MARTIN** (Graduate Student Member, IEEE) received the B.A.Sc. degree in Electrical Engineering from Queen's University, Kingston, ON, Canada, in the spring of 2018. In the summer of 2018, he joined Huawei Canada, Toronto, ON, as an intern, where he was involved in creating in-lab measurement scripts for a prototype 56Gbps SerDes chip. He also gained experience in 7nm FinFET layout. In the fall of 2018, he began an M.A.Sc. program at the University of Toronto, ON, Canada, under the supervision of Professor Ali Sheikholeslami. In 2019, he transferred into a Ph.D. program. He is currently researching multi-tone schemes for ultra-high-speed wireline applications.





**HOSSEIN SHAKIBA** (Senior Member, IEEE) received his B.Sc. and M.Sc. degrees in Electrical Engineering from the Department of Electrical and Computer Engineering at the Isfahan University of Technology, Iran, in 1985 and 1989, respectively, and his Ph.D. degree in Electrical Engineering from the Department of Electrical and Computer Engineering at the University of Toronto, Canada, in 1997. He has over 35 years of teaching, research, design, and management experience in analog circuit and system design for various applications, with a focus on wireline communication, in both industry and academia. He is currently working on system and circuit development for next-generation serial links at Huawei Canada in collaboration with the wireline industry, with emphasis on link design, modeling, and analysis, including statistical and signal integrity analysis. He is also actively involved in conducting research with various universities and co-supervises several graduate students.



ALI SHEIKHOLESLAMI (Senior Member, IEEE) received the B.Sc. degree from Shiraz University, Iran, in 1990 and the M.A.Sc. and Ph.D. degrees from the University of Toronto, Canada, in 1994 and 1999, respectively, all in electrical engineering.

In 1999, he joined the Department of Electrical and Computer Engineering at the University of Toronto, where he is currently a Professor. He was on research sabbatical with Fujitsu Labs in 2005–2006, and with Analog Devices, Toronto, ON, Canada, in 2012–2013. His research interests are in analog and digital integrated circuits, high-speed signaling, and CMOS annealing. He has co-authored over 70 journal and conference papers, 10 patents, and a graduate-level textbook entitled "Understanding Jitter and Phase Noise".

Dr. Sheikholeslami served on the Memory, Technology Directions, and Wireline Subcommittees of the ISSCC in 2001–2004, 2002–2005, and 2007–2013, respectively. He was an SSCS Distinguished Lecturer in 2018–2019. He currently serves as the Education Chair for ISSCC and the vice president, Education, for SSCS. He is an Associate Editor for the *Solid-State Circuits Magazine*, in which he has a regular column entitled "Circuit Intuitions". He was an Associate Editor for the IEEE TCAS-I for 2010–2012, and the program chair for the 2004 IEEE ISMVL.

Dr. Sheikholeslami has received numerous teaching awards, including the 2005–2006 Early Career Teaching Award and the 2010 Faculty Teaching Award, both from the Faculty of Applied Science and Engineering at the University of Toronto. He is a registered professional engineer in Ontario, Canada.