# Low-Complexity Real-Time Receiver for Coherent Nyquist-FDM Signals

Benedikt Baeuerle, Arne Josten, Marco Eppenberger, David Hillerkuss, and Juerg Leuthold, Fellow, IEEE

Abstract—We propose and demonstrate a new low-complexity hardware architecture and digital signal processing (DSP) implementation for coherent reception of Nyquist frequency division multiplexed (Nyquist-FDM, digital subcarrier multiplexing) signals in real time. Key to achieving the lowest complexity is the combination of an optimized frequency domain and time domain processing block. In the frequency domain processing, we combine subcarrier equalization and timing recovery with a noninteger oversampling ratio of 16/15. In the time domain, we take advantage of polar coordinate processing for the carrier recovery to avoid complex multiplications. The receiver is optimized for flexible operation and allows the adaptation of filter coefficients and modulation format between 4QAM, hybrid 4/16QAM, and 16QAM within one clock cycle. The efficiency of the DSP is demonstrated by a real-time coherent receiver implementation on a single FPGA and is experimentally evaluated. Despite the limited hardware resources, the receiver can detect a 30 GBd Nyquist-FDM signal with four subcarriers and a net data rate of 60 Gb/s (4QAM), 90 Gb/s (4/16QAM), or 120 Gb/s (16QAM) sampled with 32 GSa/s and demodulate one of the subcarriers at a time. Transmission over 300 km of standard single mode fiber is demonstrated with a BER below the soft-decision forward error correction limit.

*Index Terms*—Coherent communication, coherent detection, digital signal processing, low power, optical coherent transceiver, optical fiber communication.

# I. INTRODUCTION

EFFICIENT real-time receivers and their low-complexity implementation are key to the large-scale deployment of coherent technologies in next-generation compact optical networks. Coherent optical detection [1], [2] enables the implementation of high-capacity communication systems by exploiting information encoding on all four dimensions of the optical field (quadrature and polarization). Today, coherent communication is the workhorse in optical long-haul transmission [3].

Manuscript received June 25, 2018; revised September 6, 2018 and October 18, 2018; accepted October 18, 2018. Date of publication October 24, 2018; date of current version November 28, 2018. This work was supported in part by the European Commission under FP7 Program, project FOX-C under Grant 318415, in part by the ERC Grant PLASILOR under Grant 670478, in part by the Xilinx University Program, and in part by Sterlite Technologies Limited. (*Corresponding author: Benedikt Baeuerle.*)

B. Baeuerle, A. Josten, M. Eppenberger, and J. Leuthold are with the ETH Zurich, Institute of Electromagnetic Fields, Zurich 8092, Switzerland (e-mail: bbaeuerle@ethz.ch; ajosten@ethz.ch; marco.eppenberger@ief.ee.ethz.ch; juerg.leuthold@ief.ee.ethz.ch).

D. Hillerkuss was with the ETH Zurich, Institute of Electromagnetic Fields, Zurich 8092, Switzerland. He is now with the Huawei Technologies Duesseldorf GmbH, Optical and Quantum Laboratory, Munich Office, German Research Center, München 80992, Germany (e-mail: david.hillerkuss@huawei.com).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JLT.2018.2877479

Recently, coherent technology has also been moving towards applications in metro networks [4], data center interconnects [5], [6], and access scenarios [7]–[9]. One of the critical aspects for the large-scale deployment of these applications is power dissipation [10], [11]. Careful digital signal processing (DSP) design, aiming at low hardware complexity, is one of the keys to achieving low cost, low power consumption, and small form factors [4], [12], [13].

One of the promising candidates as modulation format in future coherent optical communication systems is Nyquist-frequency division multiplexing (Nyquist-FDM), also known as digital subcarrier multiplexing (SCM) [14]-[16], mainly because it enables mitigation of non-linearity in optical transport systems [17], [18]. Nyquist-FDM is a digital subcarrier multiplexing scheme, which splits a high symbol rate single carrier signal into several lower symbol rate subcarriers with non-overlapping spectra. The granularity of the subcarriers allows control of the subcarrier symbol duration while keeping a constant overall bandwidth. This ability to optimize the symbol duration allows the signal to be adapted for non-linearity tolerance and can therefore be used to extend the reach of optical transport systems [19]. Furthermore, Nyquist-FDM allows the flexible adaptation of the bandwidth [20] and modulation format [21] of each individual subcarrier. On the one hand, this allows for a flexible design of the spectral efficiency and therefore a finer granularity in transmission reach. On the other hand, this increases the tolerance towards optical filtering in reconfigurable add-drop multiplexer systems [22]. Another advantage is the simplification of the digital chromatic dispersion compensation filter since the filter length scales quadratically with the symbol duration [21]. This enables a more efficient hardware implementation. A disadvantage of the Nyquist-FDM scheme is its larger sensitivity to transmitter impairments like IQ skew, which can be solved by precise calibration or by implementing additional calibration algorithms [23], [24].

The main challenge on the way towards commercialization of Nyquist-FDM in coherent optical communication systems beyond 100G is the complexity of a Nyquist-FDM real-time hardware implementation. FPGA prototyping is a suitable platform to test the efficiency of the various DSP algorithms. Indeed, FPGA-based programmable real-time receivers have been shown in several demonstrations for OFDM and single carrier signals. Real-time reception of 100 Gbit/s coherent OFDM signals was demonstrated [25] as well as a first field demonstration [26]. In the case of OFDM, a new efficient real-time processing architecture for beyond 100G has also been

This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/



Fig. 1. Hardware architecture. Two analog-to-digital converters (ADCs) are connected to a field programmable gate array (FPGA) board via multi-gigabit transceivers (MGT). The digital signal processing comprises frequency domain processing and time domain processing. The frequency domain processing comprises a 256-point discrete Fourier transform (DFT), Nyquist-FDM (NFDM) demultiplexing, static equalizer, timing recovery, and down-sampling. The time domain processing comprises carrier recovery, symbol decision, and bit error ratio tester (BERT). Block-wise processing with continuous convolution is ensured by the overlap-and-save method.

presented [27]. Coherent single carrier real-time reception has also been shown for 100 Gbit/s with 4QAM [28] and 16-APSK/QAM [29]. The question now is whether the complexity of Nyquist-FDM processing can be controlled to the extent that real-time processing can be implemented.

In this paper, we propose and demonstrate a new efficient hardware architecture for coherent reception with 32 GSa/s and a non-integer oversampling of 16/15. It enables the implementation and real-time demonstration of a Nyquist-FDM receiver on a single FPGA. The DSP chain includes static equalization, timing recovery, carrier frequency and phase recovery, hard symbol decision, and bit error ratio counting. Multi-format reception allows for dynamic switching between 4QAM, hybrid 4/16QAM, and 16QAM. In an experiment, we show the reception of a 30 GBd Nyquist-FDM frequency band with four subcarriers, which is sampled with only 32 GSa/s. The single-polarization receiver detects an individual subcarrier of the Nyquist-FDM signal with 4QAM (60 Gbit/s line rate), hybrid 4/16QAM (90 Gbit/s line rate), and 16QAM (120 Gbit/s line rate). In addition, the chromatic dispersion of a 300 km transmission is compensated in real time in the FPGA. This work is in part based on earlier work related to the first FPGA-based coherent multi-format real-time NFDM receiver [30], [31] and provides detailed insight into our DSP architecture.

# II. HARDWARE ARCHITECTURE

## A. Digital Signal Processing

In this section we introduce the efficient digital signal processing (DSP) architecture. The overview of the hardware design with its individual processing steps is depicted in Fig. 1. It comprises the analog-to-digital converters (ADC) and the field programmable gate array (FPGA) board, which are connected via multi-gigabit transceivers (MGT). The DSP is implemented on the FPGA for real-time processing. It consists of two main blocks, the frequency domain processing and the time domain processing. The frequency domain processing, where the filtering operations and the timing recovery take place, is crucial to achieve the non-integer oversampling. In the time domain we exploit polar coordinate processing, which enables the implementation of our carrier recovery with limited hardware resources [32].

Key to achieving low hardware complexity and high-throughput digital signal processing is a scheme that utilizes an oversampling factor  $\eta$  close to one. In our case the non-integer oversampling  $\eta = k/l$  with k and l integers is  $\eta = 16/15$ . This allows a sampling rate which is much smaller than required for the standard oversampling of  $\eta = 2$  [13]. The reduced data throughput directly translates into a significantly reduced hardware complexity and improved efficiency. Furthermore, the minimal oversampling eases the requirements on the high-speed ADCs and conversely maximizes the receivable symbol rate R for a given limited sampling rate  $f_s$ .

In our example we work with a system that operates with a sampling rate of  $f_{\rm S} = 32$  GSa/s. This allows us to operate the system at a symbol rate R = 30 GBd. Compared to a system that requires an oversampling of  $\eta = 2$  and is therefore limited to 16 GBd, the achievable signal bandwidth, and potentially the data rate, is increased by 87.5%. In the case of a Nyquist-FDM signal with four subcarriers, four individual subcarriers with a symbol rate  $R_{\rm SC}$  of 7.5 GBd can be allocated.
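The rate budget above can be checked with a few lines of exact arithmetic. This is only a sketch of the numbers quoted in the text; the variable names are illustrative.

```python
from fractions import Fraction

f_s = Fraction(32)          # ADC sampling rate in GSa/s (from the text)
eta = Fraction(16, 15)      # non-integer oversampling ratio (from the text)
n_subcarriers = 4

symbol_rate = f_s / eta                     # total symbol rate R in GBd
symbol_rate_eta2 = f_s / 2                  # reference system with eta = 2
gain_percent = (symbol_rate / symbol_rate_eta2 - 1) * 100
subcarrier_rate = symbol_rate / n_subcarriers

print(float(symbol_rate))       # 30.0
print(float(gain_percent))      # 87.5
print(float(subcarrier_rate))   # 7.5
```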

In the following we explain the different DSP blocks in detail. We distinguish between frequency domain processing and time domain processing. Frequency domain processing comprises a discrete Fourier transform (DFT), Nyquist-FDM demultiplexing, static equalization, timing recovery, and down-sampling. Time domain processing comprises an inverse discrete Fourier transform (IDFT), carrier recovery, symbol decision, and BER testing.

1) Frequency Domain Processing: The received time domain signal, sampled by the ADCs, is first transformed to the frequency domain. For block-wise processing and to ensure a continuous convolution in the frequency domain we use the overlap-and-save method [33], [34] with an overlap of 128 samples. The overlap module is built out of a shift register which stores the data blocks of three consecutive clock cycles. This allows overlap samples to be added at the edges of the processed block. Afterwards, a multiplexer selects which part of the samples in the shift register is processed further. This allows coarse timing corrections, applied via a feedback from the timing synchronization module in the frequency domain. To transform the 128 samples and the additional 128 overlap samples to the frequency domain, we use a 256-point fast Fourier transform (FFT) [35], [36].

Nyquist-FDM demultiplexing in the frequency domain can be implemented efficiently by discarding three-fourths of the samples, which results in decimation by a factor of four. In our implementation, only one out of four Nyquist-FDM channels is processed at a time because of limited hardware resources on the FPGA chip. The Nyquist-FDM subchannel is then shifted to the baseband by re-indexing the frequency samples. Subsequently, we continue with 64 samples, which represent one single Nyquist-FDM channel with 7.5 GBd and an oversampling ratio of 16/15.
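The demultiplexing step can be illustrated with a minimal sketch. It assumes, hypothetically, that subcarrier `index` occupies the contiguous FFT bins from `index*64` to `index*64 + 63`, centred at `index*64 + 32`; the actual bin allocation in the receiver is not stated here. The helper names `dft` and `demux_subcarrier` are illustrative.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT, adequate for a small illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def demux_subcarrier(spectrum, index, n_sub=4):
    """Keep the bins of one subcarrier and re-index them to baseband order.

    Assumed layout (hypothetical): subcarrier `index` occupies the bins
    [index*W, (index+1)*W) of the FFT output, centred at index*W + W/2.
    """
    width = len(spectrum) // n_sub
    block = spectrum[index * width:(index + 1) * width]
    half = width // 2
    # Bins above the subcarrier centre become positive baseband frequencies,
    # bins below the centre become negative baseband frequencies.
    return block[half:] + block[:half]

# Test tone 3 bins above the centre of subcarrier 1 (centre bin 64 + 32 = 96).
N = 256
tone = [cmath.exp(2j * cmath.pi * 99 * t / N) for t in range(N)]
baseband = demux_subcarrier(dft(tone), index=1)
peak_bin = max(range(len(baseband)), key=lambda k: abs(baseband[k]))
print(len(baseband), peak_bin)   # 64 3
```

No arithmetic is involved in the demultiplexing itself; it is a pure selection and re-ordering of samples, which is why it maps to wiring rather than logic in hardware.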

To ensure processing with low complexity, the following DSP steps, namely static equalization, timing recovery, and down-sampling, work on polar coordinate samples. This avoids complex multiplications. The coordinate transformation from Cartesian to polar coordinates is achieved by the CORDIC algorithm [37].
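As an illustration of the coordinate transformation, the following is a floating-point sketch of a vectoring-mode CORDIC. It is not the fixed-point implementation used on the FPGA; the function name and the iteration count are assumptions. In hardware the multiplications by `step` are bit shifts and the angles come from a small table.

```python
import math

def cordic_to_polar(x, y, iterations=16):
    """Vectoring-mode CORDIC: rotate (x, y) onto the positive x-axis with
    shift-and-add style micro-rotations; return (magnitude, phase)."""
    phase = 0.0
    # Pre-rotate by +/-90 degrees so the vector lies in the right half-plane.
    if x < 0:
        if y >= 0:
            x, y, phase = y, -x, math.pi / 2
        else:
            x, y, phase = -y, x, -math.pi / 2
    gain = 1.0
    for i in range(iterations):
        step = 2.0 ** -i            # a bit shift in fixed-point hardware
        angle = math.atan(step)     # read from a small table in hardware
        if y >= 0:                  # rotate clockwise
            x, y = x + y * step, y - x * step
            phase += angle
        else:                       # rotate counter-clockwise
            x, y = x - y * step, y + x * step
            phase -= angle
        gain *= math.sqrt(1.0 + step * step)
    # The accumulated CORDIC gain is a constant that can be compensated once.
    return x / gain, phase

magnitude, phase = cordic_to_polar(-1.0, 1.0)
```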

The static equalizer acts as a combined filter for ADC frequency response compensation, matched filtering, and chromatic dispersion compensation. The filter coefficients can be changed dynamically during operation. The overlap-and-save method [33], [34] relates the P-1 overlap samples directly to the maximum achievable length P of the filter's finite impulse response (FIR). The number of frequency domain samples is defined as N = L + P - 1, where L is the effective number of processed samples. In our design we utilize N = 64, L = 32, and P = 33 per subcarrier.
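The relation N = L + P - 1 can be made concrete with a short overlap-and-save sketch using N = 64, L = 32, and P = 33. The filter taps and the test signal are arbitrary placeholders, and the multiplication is done in Cartesian form for brevity, whereas the receiver described here works on polar samples.

```python
import cmath

def dft(x, inverse=False):
    """Direct O(N^2) DFT/IDFT, adequate for a small illustration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def overlap_save(signal, h, n_fft=64):
    p = len(h)                      # FIR length P (assumed > 1)
    l = n_fft - p + 1               # new samples per block, L = N - P + 1
    H = dft(list(h) + [0.0] * (n_fft - p))
    history = [0.0] * (p - 1)       # P-1 overlap samples from the previous block
    out = []
    for start in range(0, len(signal), l):
        new = list(signal[start:start + l])
        new += [0.0] * (l - len(new))           # pad the last block
        block = history + new
        Y = [a * b for a, b in zip(dft(block), H)]
        y = dft(Y, inverse=True)
        out.extend(y[p - 1:])       # the first P-1 outputs are wrap-around corrupted
        history = block[-(p - 1):]
    return out[:len(signal)]

def direct_fir(signal, h):
    """Reference: plain time-domain convolution."""
    return [sum(h[m] * signal[n - m] for m in range(len(h)) if n - m >= 0)
            for n in range(len(signal))]

h = [0.5 ** m for m in range(33)]               # arbitrary 33-tap example filter
x = [complex((7 * n) % 5 - 2, (3 * n) % 4 - 1) for n in range(96)]
max_err = max(abs(a - b) for a, b in zip(overlap_save(x, h), direct_fir(x, h)))
```

With P = 33 taps, exactly half of each 64-sample block is overlap, so every block delivers 32 valid output samples.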

Fig. 2 shows examples of the three different frequency responses in polar coordinates, i.e., in magnitude and phase, which are compensated during the real-time processing. Fig. 2(a) is the frequency response of the ADC for all four subcarriers. Depending on which subcarrier is demodulated the inverse of the specific part of the spectrum is used to compensate for the frequency dependent characteristics of the ADC. Fig. 2(b) shows the applied matched filter for one specific subcarrier in the base band. It has a square-root-raised-cosine shape with a roll-off factor of 1/15. Fig. 2(c) shows the dispersion compensation filter for a specific subcarrier in the baseband. The phase response has a quadratic dependence on the frequency and the amplitude resembles an all-pass. It is an example of a filter which compensates for the accumulated dispersion after a propagation length z of 300 km through standard single mode fiber (SSMF) with a dispersion coefficient  $D_{\rm C}$  of  $17 \frac{\rm ps}{\rm nm\cdot km}$ . The frequency domain representation of the dispersion compensation filter is mathematically described [38] as follows:

$$\mathbf{H}\left(f_{\rm s}, z\right) = \exp\left(\mathrm{j}\frac{D_{\rm C}\lambda_{0}^{2}\pi}{c_{0}}f_{\rm s}^{2}z\right),\tag{1}$$

where  $\lambda_0$  is the wavelength of the optical carrier,  $c_0$  is the speed of light and  $f_s$  is the sampling frequency or the frequency band of the



Fig. 2. Frequency responses of magnitude (blue) and phase (red). (a) Frequency response for magnitude (blue) and phase (red) of the ADC electronic circuit. (b) Matched filter for a square-root-raised-cosine shaped signal with a roll-off factor of 1/15 and a symbol rate of 7.5 GBd. (c) Compensation filter of chromatic dispersion over 300 km SSMF for a signal with a symbol rate of 7.5 GBd.

considered signal. The required number of filter taps  $P_{\rm C}$  of the finite impulse response (FIR) of such a chromatic dispersion compensation filter [38] is described by

$$P_{\rm C} = 2 \times \left\lfloor\frac{|D_{\rm C}| \lambda_0^{2} z}{2c_0} f_s^2\right\rfloor + 1 \tag{2}$$

From Eq. (2) it can be seen how the low oversampling factor and working with Nyquist-FDM subcarriers help to dramatically reduce the computational complexity. The number of filter taps  $P_{\rm C}$  needed to perform the dispersion compensation depends linearly on the transmission length z but quadratically on the sampling frequency  $f_s$ . Consequently, our design, which works with minimal oversampling and equalizes individual Nyquist-FDM subcarriers, reduces the required sampling frequency and thereby dramatically shortens the required impulse response length  $P_{\rm C}$ , where  $P \ge P_{\rm C}$ . The reduced length  $P_{\rm C}$  of the impulse response directly relaxes the number P-1 of required overlap samples and thus the necessary length N of the DFT. This ultimately minimizes the hardware usage. It is worth comparing the proposed scheme, which requires a sampling rate  $f_s$  of 8 GSa/s for each 7.5 GBd subcarrier, with a 30 GBd single carrier scheme with two-fold oversampling and therefore a sampling rate  $f_s$  of 60 GSa/s. According to Eq. (2), the ratio of the filter lengths is  $60^2/8^2 \approx 56$ . Consequently, one benefits from a reduction of the filter length by a factor of around 56. Although the scheme needs to be implemented four times in parallel to process all four subcarriers, it remains beneficial.
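The comparison can be reproduced numerically. The sketch below evaluates the tap-count estimate in its standard dimensionally consistent form, with the carrier wavelength squared and a floor operation. The wavelength of 1550 nm is an assumption, since the text does not state it; the remaining values are taken from the text.

```python
import math

C0 = 299_792_458.0     # speed of light in m/s
D_C = 17e-6            # 17 ps/(nm km) expressed in s/m^2
Z = 300e3              # fiber length in m
LAMBDA_0 = 1550e-9     # assumed carrier wavelength in m (not stated in the text)

def cd_taps(f_s):
    """Estimated FIR length for chromatic dispersion compensation."""
    return 2 * math.floor(abs(D_C) * LAMBDA_0 ** 2 * Z / (2 * C0) * f_s ** 2) + 1

taps_subcarrier = cd_taps(8e9)      # 7.5 GBd subcarrier at 16/15 oversampling
taps_single = cd_taps(60e9)         # 30 GBd single carrier at two-fold oversampling
ratio = (60e9 / 8e9) ** 2           # quadratic scaling with the sampling rate

print(taps_subcarrier, taps_single, ratio)   # 3 147 56.25
```

Under these assumptions the per-subcarrier estimate stays well below the P = 33 taps available in the static equalizer. The ratio of the actual tap counts deviates slightly from 56 because of the rounding in the estimate.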

Another complexity reduction has been obtained by performing the equalization in polar coordinates rather than Cartesian coordinates. Equalization in Cartesian coordinates would have required a complex multiplication per sample. In polar coordinates, it reduces to one real multiplication of the magnitudes and one addition of the phases.
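A minimal sketch of this saving, with arbitrary example values and an illustrative function name, compares the polar operation against the equivalent complex multiplication:

```python
import cmath

def equalize_polar(sample, coeff):
    """Both arguments are (magnitude, phase) pairs."""
    mag, phase = sample
    c_mag, c_phase = coeff
    return mag * c_mag, phase + c_phase     # one real multiply, one addition

x = cmath.rect(0.8, 0.3)                    # arbitrary frequency domain sample
h = cmath.rect(1.25, -1.1)                  # arbitrary equalizer coefficient
mag, phase = equalize_polar(cmath.polar(x), cmath.polar(h))
reference = x * h                           # four real multiplies, two additions
error = abs(cmath.rect(mag, phase) - reference)
```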



Fig. 3. Frequency domain downsampling of a raised cosine shape signal. The frequency components of upper and lower part are shifted towards each other and the aliasing components are added coherently.

In the next step, the timing error is estimated and corrected in the frequency domain. The timing estimation is done by the modified Godard technique [39]. This algorithm not only allows the timing error of a signal with non-integer oversampling to be estimated, but also does so without any multiplications in the frequency domain. In our case the oversampling is as small as 16/15. The timing correction in the frequency domain can be efficiently implemented by simply adding a linear phase slope onto the signal. The slope of the linear phase is defined by the estimated timing offset. In short, this multiplication-free timing error correction in the frequency domain is far less complex than a timing error correction in the time domain, where ideally a convolution with a finite impulse response (FIR) filter would be required. Only timing errors within two consecutive samples are corrected directly in the frequency domain. Coarse timing corrections are done with a feedback to the overlap module at the beginning of the DSP. We guarantee that the receiver is sampling slightly faster than the transmitter to avoid loss of data. Once the limit of the shift register is reached, the whole DSP is halted for one clock cycle and the shift register is moved by one clock cycle. This guarantees timing recovery without loss of data.
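The fine timing correction can be sketched as follows. Only the correction is shown, not the modified Godard estimator, and the timing offset `tau` is assumed to be known. The function names and the test tone are illustrative.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct O(N^2) DFT/IDFT, adequate for a small illustration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def shift_timing_polar(bins, tau):
    """Delay a block by `tau` samples by adding a linear phase ramp.

    bins: list of (magnitude, phase) pairs in FFT order. Only additions on
    the phase are needed; the magnitudes are left untouched.
    """
    n = len(bins)
    out = []
    for k, (mag, phase) in enumerate(bins):
        f = k if k < n // 2 else k - n          # signed frequency index
        out.append((mag, phase - 2 * math.pi * f * tau / n))
    return out

N = 64
x = [cmath.exp(2j * cmath.pi * 5 * t / N) for t in range(N)]
bins = [cmath.polar(v) for v in dft(x)]
shifted = shift_timing_polar(bins, tau=0.25)    # a quarter-sample delay
y = dft([cmath.rect(m, p) for m, p in shifted], inverse=True)
expected = [cmath.exp(2j * cmath.pi * 5 * (t - 0.25) / N) for t in range(N)]
max_err = max(abs(a - b) for a, b in zip(y, expected))
```

The phase increment per bin is constant for a given `tau`, so in hardware the ramp can be produced by an accumulator rather than by a multiplier per bin.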

In the last step of the frequency domain processing, we down-sample the signal to one sample per symbol. This is realized by shifting the frequency components of the lower and upper frequency band into each other and coherently adding up the aliasing components [34]. The down-sampling is mathematically described by

$$X_{2}[k] = \begin{cases} X_{1}[k], & 0 \le k < M \\ X_{1}[k] + X_{1}[k + N_{1} - N_{2}], & M \le k < \frac{N_{1}}{2} \\ X_{1}[k + N_{1} - N_{2}], & \frac{N_{1}}{2} \le k < N_{2} \end{cases} \tag{3}$$

where  $X_1[k]$  and  $X_2[k]$  are the frequency domain samples before and after down-sampling at discrete frequencies k. The variable M marks the beginning of the overlapping region and is defined as  $M = N_2 - N_1/2$ . The frequency domain resampling is depicted graphically in Fig. 3. The overlapping part is marked in red. This operation translates the original  $N_1 = 64$  samples into  $N_2 = 60$  samples. Again, resampling in the frequency domain has been efficiently implemented using only shifting and summation operations. Furthermore, upsampling can also be achieved efficiently in the frequency domain just by adding zeros between the lower and upper frequency band.
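Eq. (3) translates directly into a few lines of code. The sketch below applies it with N1 = 64 and N2 = 60 to a test tone and checks that the result is the same tone on the coarser sampling grid, scaled by N1/N2 because of the IDFT normalization. The function names and the test tone are illustrative.

```python
import cmath

def dft(x, inverse=False):
    """Direct O(N^2) DFT/IDFT, adequate for a small illustration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def downsample_fd(X1, n2):
    """Frequency domain down-sampling following Eq. (3)."""
    n1 = len(X1)
    m = n2 - n1 // 2                # start of the overlapping region
    shift = n1 - n2
    X2 = []
    for k in range(n2):
        if k < m:
            X2.append(X1[k])                    # lower band, unchanged
        elif k < n1 // 2:
            X2.append(X1[k] + X1[k + shift])    # aliasing components add up
        else:
            X2.append(X1[k + shift])            # upper band, shifted down
    return X2

N1, N2 = 64, 60
x1 = [cmath.exp(2j * cmath.pi * 5 * t / N1) for t in range(N1)]
X2 = downsample_fd(dft(x1), N2)
x2 = dft(X2, inverse=True)
expected = [N1 / N2 * cmath.exp(2j * cmath.pi * 5 * t / N2) for t in range(N2)]
max_err = max(abs(a - b) for a, b in zip(x2, expected))
```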



Fig. 4. Schematic of the 60-point DFT with a hierarchical split by factors of five and three into 15 parallel radix-4 FFTs.



Fig. 5. Carrier recovery. A second-order feedback loop for frequency offset tracking with an incorporated polar-based blind phase search (polar BPS) for carrier phase recovery. The frequency recovery comprises the estimation of the phase slope e[k], the second-order loop filter with a linear and an integral gain  $g_1$  and  $g_2$ , and the numerically controlled oscillator (NCO) implemented as a look-up table (LUT) with a controlled index i[k]. The inset of the 16QAM constellation diagram depicts the calculation of  $\Delta \varphi_l$ .

2) *Time Domain Processing:* The conversion to the time domain is achieved by a 60-point inverse discrete Fourier transform (IDFT).

We implemented the IDFT with help of the Winograd algorithm [40]–[42]. The 60-point IDFT is split in a top down hierarchy with a ratio of five and three into 15 radix-4 butterfly structures, see Fig. 4. The exact flow graph of the split-3 and split-5 structure can be found in [42].

To complete the overlap-and-save method in the time domain, we discard the 30 overlap samples, which leaves 30 symbols per clock cycle.

In the next step, we apply the carrier recovery for carrier frequency offset correction and carrier phase correction. We exploit processing in polar coordinates, which enables low complexity processing of the phase. Phase manipulation can be efficiently implemented by bit shifts and summations instead of multiplications, which again provides a low complexity implementation. Fig. 5 depicts the schematic of the complete carrier recovery, which can be separated into two parts, the frequency recovery and the phase recovery. The slow frequency changes are tracked and corrected by the second-order feedback loop. The fast phase changes are corrected by the feed-forward operation of the polar blind phase search (BPS) [32]. The polar BPS applies several test phases in parallel onto the signal and chooses the optimal correction phase by minimizing a cost function. We utilize polar coordinate processing for phase manipulation and the calculation of the cost function. To additionally correct the carrier frequency offset, we incorporated the polar BPS in a second-order feedback loop for frequency tracking. The second-order feedback loop reuses the phase differences  $\Delta \varphi_l$  (see inset in Fig. 5), which are calculated in the polar BPS. The  $\Delta \varphi_l$  is the phase difference between a reference symbol  $r_l$ , which has been identified as the closest, and the received symbol  $x_l$ . In the case of minimal frequency offset, the difference between the first phase difference estimate  $\Delta \varphi_1$  and the last  $\Delta \varphi_L$  in a block of L time-varying estimates should be minimal. Therefore, the phase slope  $e[k] = \Delta \varphi_1 - \Delta \varphi_L$  needs to be minimized during the feedback operation for carrier frequency synchronization. The phase slope is additionally averaged by a second-order loop filter. It includes a proportional and an integral gain  $g_1$  and  $g_2$ . The filtered phase slope controls the index i[k] at the kth clock cycle on a LUT with a set of different frequencies. The feedback operation of the frequency recovery is mathematically described as follows [43], [44]:

$$i[k] = i[k-1] + g_1(1+g_2)e[k] - g_1e[k-1]. \tag{4}$$
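The recursion in Eq. (4) can be sketched as a single update step. The assignment of the two gain values reported for the experiments to  $g_1$  and  $g_2$  is an assumption, the input sequence is arbitrary, and the index is kept as a real number, whereas the hardware would quantize it to address the LUT.

```python
def loop_filter_step(i_prev, e_now, e_prev, g1, g2):
    """One update of the second-order loop filter in Eq. (4)."""
    return i_prev + g1 * (1 + g2) * e_now - g1 * e_prev

g1, g2 = 0.0625, 0.125      # assumed order of the two reported gain values
i_k, e_prev = 0.0, 0.0
history = []
for e_now in [1.0, 1.0, 1.0, 0.0]:      # arbitrary phase slope samples
    i_k = loop_filter_step(i_k, e_now, e_prev, g1, g2)
    history.append(i_k)
    e_prev = e_now

print(history)   # [0.0703125, 0.078125, 0.0859375, 0.0234375]
```

The last value shows the second-order behaviour: once the phase slope returns to zero, the proportional part vanishes and only the integrated contribution of the three earlier samples remains in the index.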

The gain parameters  $g_1$  and  $g_2$  can be adapted in real time during operation with values between 0.0625 and 0.9375 with a resolution of 4 bits. This ensures the flexibility to adapt the frequency recovery for different scenarios and optimize the performance. Following the description in [43], the characteristics of the loop filter's transfer function can be estimated. The natural frequency  $\omega_n$  can be tuned between 75 MHz and 1.119 GHz, the damping factor  $\zeta$  between 0.53 and 0.97, and the noise equivalent bandwidth between 246 MHz and 13.93 GHz. In the experiments we used gain parameters of 0.0625 and 0.125, which we optimized in real time. The choice is a trade-off between a large loop bandwidth to track fast time variations and a small loop bandwidth to reduce the amount of noise passing through the loop. The LUT is implemented with 512 frequencies between -100 MHz and 100 MHz in a block RAM on the FPGA chip. The different correction frequencies can also be generated by direct digital synthesis (DDS). The optimal frequency is then applied to the incoming signal. The application of the frequency offset is implemented only with phase additions in polar coordinates. We tested the carrier recovery experimentally in an electrical back-to-back measurement. Fig. 6 shows the BER performance of the individual subcarriers measured for different values of the carrier frequency offset (CFO). The CFO was applied digitally onto the signal. Subcarriers one and four are the outer subcarriers, and two and three are the inner subcarriers. It can be observed that the BER performance is almost independent of the CFO in the designed range of the carrier recovery between



Fig. 6. BER as a function of different carrier frequency offsets (CFO) in an electrical back to back measurement. The BER is shown for the individual subcarriers.

-100 MHz and 100 MHz. The outer subcarriers have a worse performance since they suffer more from the frequency-dependent ENOB of the ADCs. For larger frequency offsets beyond -100 MHz and 100 MHz, the range of the test frequencies in the LUT can be increased at the cost of a larger block RAM and therefore a larger hardware utilization. If the CFO is beyond the tracking range of the feedback loop, an additional coarse frequency recovery can be applied in the frequency domain before the static filter. Such a frequency recovery can exploit the symmetry characteristic [45], [46] of the power levels of two mirrored subcarriers in the lower and upper sideband.

After carrier recovery, we transform the signal back to Cartesian coordinates by the CORDIC algorithm [37]. Demodulation is done by hard decision on the real and imaginary parts. The received bits are compared with a pseudo random bit sequence (PRBS) of length  $2^{11}$ -1 to calculate the BER. The BER is calculated for  $2^{30}$  received bits live on the FPGA. Due to the hardware constraints, we received only one Nyquist-FDM channel in a single polarization at a time.

Nevertheless, our frequency domain processing architecture can be extended for polarization multiplexing and polarization mode dispersion compensation by utilizing an adaptive frequency domain multiple input multiple output (MIMO) equalizer [47]–[49]. This requires two additional ADCs for the second polarization with the same DSP steps as described in Fig. 1. Besides, the static equalizer needs to be extended to an adaptive MIMO equalizer with the two complex data streams as input. The filter coefficients can be updated with a gradient descent algorithm where the error is calculated in time domain after the carrier recovery [48], [49].

## B. ADC-FPGA Interface

The receiver consists of two ADCs connected to an FPGA board. The ADCs have a sampling rate  $f_{\rm S}$  of 32 GSa/s and are connected to a single Xilinx Virtex 7 FPGA (xc7vx690t), see Fig. 1. The multi-gigabit transceivers (MGTs) integrated on the FPGA receive 48 parallel 8 Gbit/s digital lanes

TABLE I. HARDWARE UTILIZATION IN ABSOLUTE AND RELATIVE NUMBERS (% OF A XILINX XC7VX690T FPGA BOARD)

| DSP block        | Slice LUTs       | Slice Registers  | DSP48E (e.g., multiplier) | Resolution (bits) |
|------------------|------------------|------------------|---------------------------|-------------------|
| 256-FFT          | 75,099 (17.34%)  | 75,026 (8.66%)   | 1,656 (46.0%)             | 14                |
| FD Cordics       | 44,422 (10.26%)  | 47,558 (5.49%)   | 0 (0.0%)                  | 12                |
| NFDM Demux       | 1,408 (0.33%)    | 1,409 (0.16%)    | 0 (0.0%)                  | 14                |
| FD Equal.        | 1,634 (0.38%)    | 5,186 (0.60%)    | 64 (1.78%)                | 14                |
| Timing Recovery  | 7,670 (1.77%)    | 7,156 (0.83%)    | 0 (0.0%)                  | 14                |
| Down sampling    | 89 (0.02%)       | 1,079 (0.12%)    | 0 (0.0%)                  | 14                |
| 60-IDFT          | 13,720 (3.17%)   | 18,423 (2.13%)   | 216 (6.00%)               | 14                |
| Polar CFR        | 1,359 (0.36%)    | 4,715 (0.48%)    | 3 (0.48%)                 | 10                |
| Polar BPS        | 71,055 (16.40%)  | 100,972 (11.65%) | 0 (0.0%)                  | 10                |
| Demod. & BERT    | 1,802 (0.42%)    | 2,721 (0.31%)    | 0 (0.0%)                  | 11                |
| Misc.            | 47,187 (10.87%)  | 84,458 (9.75%)   | 0 (0.0%)                  | -                 |
| Total            | 265,445 (61.28%) | 348,703 (40.25%) | 1,939 (53.9%)             | -                 |

from both ADCs. The 48 parallel data streams need to be synchronized during the start-up procedure. The MGTs deserialize each lane by a factor of 32, which results in a total of 1536 parallel bits processed with a clock speed of 250 MHz. The bits represent 128 complex samples with a 6 bit resolution per dimension. The in-phase and quadrature components are deskewed once during start-up by a BER optimization.
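The interface figures quoted above are mutually consistent, as the following short check of the arithmetic shows. The variable names are illustrative.

```python
lanes = 48                  # parallel MGT lanes from both ADCs
deserialization = 32        # deserialization factor per lane
clock_mhz = 250             # on-chip processing clock
bits_per_dimension = 6      # resolution of I and Q, respectively

parallel_bits = lanes * deserialization                         # bits per clock cycle
complex_samples = parallel_bits // (2 * bits_per_dimension)     # I/Q sample pairs
sampling_rate_gsa = complex_samples * clock_mhz / 1000
lane_rate_gbit = deserialization * clock_mhz / 1000

print(parallel_bits, complex_samples, sampling_rate_gsa, lane_rate_gbit)
# 1536 128 32.0 8.0
```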

## C. Complexity and Hardware Utilization

The digital design of the coherent real-time receiver was realized in VHDL, synthesized, and implemented with the Xilinx Vivado Design tool for a Xilinx Virtex 7 chip. The on-chip processing clock speed is 250 MHz, and 30 symbols per subcarrier are processed in parallel in each clock cycle. The design can dynamically change the filter coefficients online within one clock cycle and switch between the modulation formats. With this, the design can demodulate one subchannel at a time. The carrier phase recovery utilizes 25 test phases. The hardware requirements for the final design and its DSP sub-blocks are shown in Table I. The table lists the absolute value and the relative utilization of the slice LUTs, slice registers, and the DSP units on the Xilinx Virtex 7 FPGA board used (xc7vx690t). In addition, it shows the different computational resolutions which are used for the DSP blocks. The word width of the signal path between the DSP blocks is resized after each step to keep the resolution within defined limits.



Fig. 7. Experimental setup. The optical coherent transmitter comprises two digital-to-analog converters (DAC), radio frequency (RF) driver amplifiers, an IQ Mach-Zehnder modulator (IQ-MZM), and an external cavity laser (ECL). The transmission line is built out of four standard single mode fiber (SSMF) spans of 75 km and four Erbium doped fiber amplifiers (EDFA). The received signal can be mixed with additional noise from a noise loading source and the optical signal-to-noise ratio (OSNR) is measured by an optical spectrum analyzer (OSA). The receiver comprises optical band pass filters (OBF), an EDFA, a polarization controller (PC), a local oscillator (LO), a 90° hybrid mixer, two balanced optical receivers (BOR), two analog-to-digital converters (ADC), and a receiver field programmable gate array (Rx-FPGA) board. The first inset shows an optical spectrum of the Nyquist-FDM signal with four subcarriers and a symbol rate of 30 GBd. The second inset shows the experimental setup of the coherent optical real-time receiver in the lab.

# III. EXPERIMENTS

## A. Experimental Setup

The real-time receiver is demonstrated in an experiment. Fig. 7 shows the experimental setup of the real-time experiment. It comprises an optical transmitter, a transmission line, and the optical real-time receiver. The optical transmitter includes an external cavity laser (ECL) with a specified linewidth below 100 kHz, which is modulated by a Lithium-Niobate IQ-modulator. The two outputs of an arbitrary waveform generator (AWG) are amplified by broadband RF amplifiers and generate the drive signal for the IQ-modulator. The AWG has a sampling rate of 64 GSa/s and an electrical bandwidth of 24 GHz. The electrical signal is generated offline and loaded into the memory of the DACs. We generate offline a Nyquist-FDM signal with four digitally multiplexed subcarriers. Each individual subcarrier has a symbol rate  $R_{SC}$  of 7.5 GBd, which results in a combined symbol rate R of 30 GBd. The subcarriers are pulse shaped by a square-root-raised-cosine with a roll-off factor of 0.0667. No guard intervals are used between the subcarriers. The subcarriers are modulated either by 4QAM, a combination of 4QAM (outer subcarriers) and 16QAM (inner subcarriers), or 16QAM. We used differential encoding and a PRBS of length 2<sup>11</sup>-1 as bit sequence. In addition, different power ratios between outer and inner subcarriers are used for optimal performance. In the case of pure 4QAM and 16QAM modulation, we emphasized the outer subcarriers by a factor of 1.05, and in the case of mixed 4/16QAM modulation we attenuated the outer subcarriers by a factor of 0.85. Furthermore, we used linear pre-distortion to compensate for the frequency-dependent impairments of the DACs, the RF amplifiers, and the IQ-modulator.

The transmission line comprises 300 km of standard single mode fiber (SSMF). We used four fiber spans of 75 km each and compensated the loss of each span with an Erbium doped fiber amplifier (EDFA). The input power into each fiber span can be adapted. For back-to-back measurements, we can optionally add white Gaussian noise to the signal in front of the receiver to set different optical signal-to-noise ratio (OSNR) levels. The OSNR is measured with an optical spectrum analyzer (OSA) with a spectral resolution of 0.1 nm.

Before the optical real-time receiver, we filter the signal with optical bandpass filters (OBF) with a bandwidth of 2 nm. An EDFA is used to set the optimal power level. A polarization controller (PC) aligns the signal polarization for best performance. The signal is combined with a second ECL, which serves as the local oscillator, in a 90° hybrid mixer. Two balanced optical receivers (BOR) convert the optical in-phase and quadrature components into the electrical domain and are connected to the electrical real-time receiver. The electrical real-time receiver comprises two analog-to-digital converters (ADCs) for coherent reception and an FPGA board. The ADCs have a sampling rate $f_S$ of 32 GSa/s, an electrical 6 dB bandwidth of more than 16 GHz, and a frequency-dependent effective number of bits (ENOB) between 4.5 bits and 3.5 bits.
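The relation between symbol rate, roll-off, and ADC sampling rate can be checked with exact fractions. The sketch below assumes that the quoted roll-off factor of 0.0667 is the rounded value of 1/15; under that assumption the four subcarriers occupy exactly the sampled bandwidth and the oversampling ratio is 16/15. All names are illustrative.

```python
from fractions import Fraction

# Sketch: spectral occupancy and oversampling ratio of the Nyquist-FDM signal.
# Assumption: the quoted roll-off of 0.0667 is the rounded value of 1/15.

N_SC = 4                       # number of subcarriers
R_SC = Fraction(15, 2)         # subcarrier symbol rate in GBd
ROLL_OFF = Fraction(1, 15)     # assumed exact roll-off factor
F_S = Fraction(32)             # ADC sampling rate in GSa/s

bw_subcarrier = R_SC * (1 + ROLL_OFF)   # two-sided bandwidth per subcarrier, GHz
bw_total = N_SC * bw_subcarrier         # no guard intervals between subcarriers
oversampling = F_S / (N_SC * R_SC)      # samples per symbol of the 30 GBd signal

print(bw_subcarrier, bw_total, oversampling)
```

A two-sided bandwidth of 32 GHz corresponds to a baseband bandwidth of 16 GHz on each of the in-phase and quadrature channels, which is the Nyquist bandwidth of a 32 GSa/s ADC.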

## B. Experimental Results

Fig. 8 shows the captured digital spectra and the corresponding constellation diagrams at the highest OSNR levels for 16QAM, 4/16QAM, and 4QAM. The digital spectra are the Fourier transforms of the raw time domain signals, which are sampled by the ADCs and loaded into a memory on the FPGA board without any DSP steps. The characteristic frequency dependence of the ADCs can be observed. It can be seen that the signal fits exactly into the 16 GHz baseband bandwidth of the ADCs, which sample at 32 GSa/s. The constellation diagrams are loaded from an onboard memory which captures the symbols directly after the carrier recovery. The memory can store 30720 symbols, and the plotted constellation diagrams overlay ten captures of the memory content taken at different instants of time. We show the calculated BER of each subcarrier individually. It can be observed that the performance of the outer subcarriers is degraded, which can be attributed to the frequency-dependent ENOB.
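To give a feeling for how much a one-bit ENOB difference matters, the sketch below converts the two ends of the quoted ENOB range into a signal-to-noise-and-distortion ratio (SINAD). It assumes the textbook relation for a full-scale sine wave, SINAD = 6.02 ENOB + 1.76 dB; the SNR actually available to a Nyquist-FDM signal is lower because of its higher peak-to-average power ratio, so the numbers are illustrative rather than measured.

```python
# Sketch: ADC-limited signal-to-noise-and-distortion ratio from the ENOB.
# Assumption: the textbook full-scale sine-wave relation applies.

def sinad_db(enob_bits):
    """SINAD in dB for a given effective number of bits."""
    return 6.02 * enob_bits + 1.76

best = sinad_db(4.5)    # ENOB at the better end of the quoted range
worst = sinad_db(3.5)   # ENOB at the worse end of the quoted range
print(f"{best:.2f} dB, {worst:.2f} dB, difference {best - worst:.2f} dB")
```

Under this assumption, one bit of ENOB corresponds to about 6 dB of SINAD, which is a large margin for 16QAM.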

Fig. 8. Spectra of the digitized waveforms and constellation diagrams after real-time processing. (a) 16QAM in blue, (b) 4/16QAM in green, (c) 4QAM in red. The spectra are plotted as power spectral density (PSD) as a function of frequency. The constellation diagrams are depicted for each subcarrier individually with the calculated bit error ratios (BER).

In the following experiments, we measured the bit error ratio (BER) under the influence of additive white Gaussian noise (AWGN), fiber nonlinearities, and emulated chromatic dispersion. Each BER measurement point is the average of the BERs of all subcarriers calculated in real time on the FPGA board. The BER is calculated over 2<sup>30</sup> received bits and is additionally averaged over one second. For the following experiments, we determined the optical signal power as the sum over the four individual subcarriers, which represents a 30 GBd signal. This value was used to calculate the OSNR and the fiber input power. The net data rates of the combined subcarriers are thus 60 Gbit/s (4QAM), 90 Gbit/s (4/16QAM), and 120 Gbit/s (16QAM). After the Nyquist-FDM demultiplexing, we switched dynamically between the different subcarriers, since the hardware resources of a single FPGA chip are insufficient to demodulate all subcarriers at the same time. The frequency offset between the transmitter laser and the receiver laser was kept below 50 MHz for all experiments. In a first step, we measured the back-to-back performance of the real-time receiver at different OSNR levels.

Fig. 9 shows the BER performance as a function of the optical signal-to-noise ratio (OSNR). All curves reach an error floor below the SD-FEC limit of $2 \times 10^{-2}$. Three curves are shown: in red for 4QAM with an error floor of $1.7 \times 10^{-7}$, in green for 4/16QAM with an error floor of $4.8 \times 10^{-4}$, and in blue for 16QAM with an error floor of $6.4 \times 10^{-3}$. The larger implementation penalty of 16QAM is mainly attributed to the modulation format's sensitivity to the limited ENOB at higher frequencies. Additionally, the carrier phase recovery induces a further penalty since higher order modulation formats are less robust against laser phase noise. In particular, the limited number of 25 test phases and the short block-wise estimation window of 30 symbols increase the penalty [32]. The error floor of the 4QAM signal is associated with a penalty from the timing recovery, which is not optimized to follow fast clock phase changes.
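The error floors can be related to the measurement window of 2<sup>30</sup> received bits per BER value. The sketch below estimates the BER that corresponds to a single bit error in such a window and the expected number of errors at each quoted error floor. It only restates the numbers from the text; the names are illustrative.

```python
# Sketch: expected error counts per BER measurement of 2**30 bits.

N_BITS = 2 ** 30

ERROR_FLOORS = {           # back-to-back error floors quoted in the text
    "4QAM": 1.7e-7,
    "4/16QAM": 4.8e-4,
    "16QAM": 6.4e-3,
}

resolution = 1 / N_BITS    # BER corresponding to a single bit error
expected_errors = {fmt: ber * N_BITS for fmt, ber in ERROR_FLOORS.items()}

print(f"BER resolution: {resolution:.2e}")
for fmt, n in expected_errors.items():
    print(f"{fmt}: about {n:.0f} errors per measurement")
```

Even at the 4QAM error floor, a single measurement window contains on the order of a hundred errors or more, so the reported floors are not limited by the counting resolution.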



Fig. 9. Experimental results of the back-to-back measurements. BER is shown as a function of the optical signal-to-noise ratio (OSNR) for 4QAM (red), 4/16QAM (green), and 16QAM (blue). The SD-FEC limit is shown in black at a BER of $2 \times 10^{-2}$. The net data rates of the combined subcarriers are 60 Gbit/s (4QAM), 90 Gbit/s (4/16QAM), and 120 Gbit/s (16QAM). The theoretical BER curves for 30 GBd signals with the associated modulation formats and differential encoding are depicted as dotted lines.
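For orientation, reference curves of this kind can be approximated with textbook expressions for additive white Gaussian noise. The sketch below is not the calculation used for the dotted lines in Fig. 9. It assumes a single polarization, a 12.5 GHz (0.1 nm) OSNR reference bandwidth, Gray mapping, and no differential encoding. Differential encoding raises the BER further, by roughly a factor of two for 4QAM at low error ratios. The function names are illustrative.

```python
import math

# Sketch: textbook AWGN bit error ratios versus OSNR.
# Assumptions: single polarization, 12.5 GHz (0.1 nm) reference bandwidth,
# Gray mapping, no differential encoding, no implementation penalty.

B_REF_GHZ = 12.5     # OSNR reference bandwidth
R_S_GBD = 30.0       # combined symbol rate
N_POL = 1            # assumed number of signal polarizations

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def snr_from_osnr_db(osnr_db):
    """Linear SNR per symbol for a given OSNR in dB."""
    osnr = 10.0 ** (osnr_db / 10.0)
    return osnr * 2.0 * B_REF_GHZ / (N_POL * R_S_GBD)

def ber_4qam(osnr_db):
    return q_func(math.sqrt(snr_from_osnr_db(osnr_db)))

def ber_16qam(osnr_db):
    # Nearest-neighbor approximation for Gray-mapped square 16QAM.
    return 0.75 * q_func(math.sqrt(snr_from_osnr_db(osnr_db) / 5.0))

for osnr_db in (10, 15, 20, 25):
    print(osnr_db, f"{ber_4qam(osnr_db):.2e}", f"{ber_16qam(osnr_db):.2e}")
```

Comparing such idealized curves with the measured error floors gives a rough sense of the implementation penalty of each format.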



Fig. 10. Experimental results of the fiber transmission experiment over 300 km SSMF. BER is shown as a function of fiber span input power for 4QAM (red), 4/16QAM (green), and 16QAM (blue). The SD-FEC limit is shown in black at a BER of $2 \times 10^{-2}$. The net data rates of the combined subcarriers are 60 Gbit/s (4QAM), 90 Gbit/s (4/16QAM), and 120 Gbit/s (16QAM).

Fig. 11. Experimental results for emulated transmission distances. Chromatic dispersion is introduced via pre-distortion to emulate transmission. BER is shown as a function of transmission distance for 4QAM (red), 4/16QAM (green), and 16QAM (blue). The SD-FEC limit is shown in black at a BER of $2 \times 10^{-2}$. The net data rates of the combined subcarriers are 60 Gbit/s (4QAM), 90 Gbit/s (4/16QAM), and 120 Gbit/s (16QAM).

Fig. 12. Experimental results of the long-term measurement over 16 hours for 16QAM. The left y-axis shows the BER over time in blue. The right y-axis shows the detected and corrected carrier frequency offset over time in red.

In the second step, we investigated the real-time receiver performance after transmission through 300 km of SSMF. Fig. 10 shows the results of the transmission experiments. The BER is measured as a function of the signal's input power into each fiber span. The optimal input power is found around -1 dBm. In the low power regime the signal quality is limited by additive white Gaussian noise, and in the high power regime by fiber nonlinearities. At the optimal input power we measured a BER of $3.3 \times 10^{-7}$ for 4QAM (red), $9.1 \times 10^{-4}$ for 4/16QAM (green), and $9.3 \times 10^{-3}$ for 16QAM (blue). All BERs are below the SD-FEC limit of $2 \times 10^{-2}$. Because of the limited amount of optical fiber in our lab, we are restricted to transmission distances of up to 300 km. To further analyze the performance of the static chromatic dispersion compensation filter, we emulated transmission by pre-distorting the transmitted signal with a defined amount of chromatic dispersion, i.e., the inverse operation of eq. (1). Fig. 11 shows the BER performance for different emulated transmission distances. The curves in red, green, and blue show the results for 4QAM, 4/16QAM, and 16QAM. We show BER performance below the SD-FEC limit for emulated transmission distances of 4500 km, 5000 km, and 6000 km for 4QAM, 4/16QAM, and 16QAM. Considering eq. (2) with $P = 33$, $f_s = 8$ GSa/s, and $D_C = 17\ \text{ps}/(\text{nm} \cdot \text{km})$, we can estimate a maximum transmission distance $z$ of around 3600 km up to which transmission without penalty can be assumed. This agrees with Fig. 11, where we observe an increasing penalty for all three modulation formats beyond a distance of around 3600 km.

To evaluate the long-term stability of the real-time processing, we measured the BER over almost 16 hours. The measurement data is collected in 5 minute intervals from the memory of the receiver. Fig. 12 shows the measured BER results for 16QAM over time in blue. In red we show the estimated and corrected carrier frequency offset (CFO). The mean value of the CFO over 16 hours is -6.9 MHz, and the CFO drifts between -42.5 MHz and 22.1 MHz. It can be seen that the BER stays continuously at around $7.5 \times 10^{-3}$ even with a drifting CFO.
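The order of magnitude of the penalty-free distance quoted from eq. (2) can be cross-checked with a common rule of thumb for dispersion-compensating filters (compare [38]). The sketch below does not reproduce eq. (2) itself. It assumes that the dispersion-induced spread, expressed in samples as $D_C \lambda^2 z f_s^2 / c$, must not exceed $P - 1$, and it assumes a carrier wavelength of 1550 nm, which is not stated in this section. All names are illustrative.

```python
# Sketch: longest fiber length whose dispersion spread fits into P taps.
# Assumptions: spread in samples = D * lambda^2 * z * f_s^2 / c <= P - 1,
# and a carrier wavelength of 1550 nm (not stated in the text).

C_M_PER_S = 299_792_458.0       # speed of light
WAVELENGTH_M = 1550e-9          # assumed carrier wavelength
D_S_PER_M2 = 17e-6              # 17 ps/(nm km) expressed in s/m^2
F_S_HZ = 8e9                    # sampling rate f_s used in the text
P_TAPS = 33                     # filter length P used in the text

def max_distance_km(p_taps, f_s_hz, d_s_per_m2, wavelength_m):
    """Distance at which the dispersion spread equals p_taps - 1 samples."""
    z_m = (p_taps - 1) * C_M_PER_S / (d_s_per_m2 * wavelength_m ** 2 * f_s_hz ** 2)
    return z_m / 1e3

z_max_km = max_distance_km(P_TAPS, F_S_HZ, D_S_PER_M2, WAVELENGTH_M)
print(f"{z_max_km:.0f} km")
```

Under these assumptions the estimate comes out at a little under 3700 km, which is consistent with the value of around 3600 km stated in the text.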

# IV. CONCLUSION

We proposed and demonstrated a new coherent hardware architecture. It combines frequency domain processing with non-integer oversampling for timing recovery and Nyquist-FDM subcarrier equalization with an efficient time domain processing in polar coordinates for carrier recovery. The coherent receiver processes the data with a non-integer oversampling of 16/15 or lower during the entire processing chain for lowest hardware utilization. The design includes Nyquist-FDM demultiplexing, subcarrier equalization, timing recovery, and down-sampling in the frequency domain. Carrier frequency and phase recovery, symbol hard decision, and bit error ratio calculation are performed in the time domain. We experimentally demonstrated real-time reception of a 30 GBd Nyquist-FDM signal with four subcarriers over 300 km of SSMF. We successfully received 60 Gbit/s (4QAM), 90 Gbit/s (4/16QAM), and 120 Gbit/s (16QAM) signals. The subcarriers were demodulated individually, one at a time. The proposed architecture shows that a careful design of low-complexity DSP is essential for the real-time implementation of lowest power receivers. The proposed frequency domain processing scheme with a non-integer oversampling factor can not only be used for Nyquist-FDM signals but can also be adapted for the reception of single carrier signals. A careful choice of the sizes of the DFT and inverse DFT allows for arbitrary oversampling factors.
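The last point can be illustrated with the ratio between forward and inverse DFT sizes. In frequency domain down-sampling, keeping `n_idft` of `n_dft` bins scales the sample rate by `n_idft / n_dft`. The DFT sizes in the sketch below are hypothetical and chosen only to make the ratios visible; they are not taken from the implementation described in this paper.

```python
from fractions import Fraction

# Sketch: resampling by using a shorter inverse DFT than forward DFT.
# The DFT sizes below are hypothetical and only illustrate the ratio.

def output_rate(f_in, n_dft, n_idft):
    """Sample rate after keeping n_idft of n_dft frequency bins."""
    return f_in * Fraction(n_idft, n_dft)

F_IN = Fraction(32)       # ADC sampling rate in GSa/s
R_SC = Fraction(15, 2)    # subcarrier symbol rate in GBd
N_DFT = 512               # hypothetical forward DFT size

# Keeping a quarter of the bins isolates one of four subcarriers.
f_sub = output_rate(F_IN, N_DFT, 128)
print(f_sub, f_sub / R_SC)        # rate in GSa/s, samples per symbol

# Keeping 120 bins instead reaches one sample per symbol.
f_sym = output_rate(F_IN, N_DFT, 120)
print(f_sym, f_sym / R_SC)
```

With these example sizes, a 128-point inverse DFT yields a subcarrier at 8 GSa/s, i.e., 16/15 samples per symbol, while a 120-point inverse DFT yields exactly one sample per symbol.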

# REFERENCES

- [1] K. Kikuchi, "Fundamentals of coherent optical fiber communications," *J. Lightw. Technol.*, vol. 34, no. 1, pp. 157–179, Jan. 2016.
- [2] E. Ip, A. P. T. Lau, D. J. F. Barros, and J. M. Kahn, "Coherent detection in optical fiber systems," *Opt. Express*, vol. 16, pp. 753–791, 2008.
- [3] K. Roberts et al., "High capacity transport—100G and beyond," J. Lightw. Technol., vol. 33, no. 3, pp. 563–578, Feb. 2015.
- [4] H. Zhang *et al.*, "Real-time transmission of 16 Tb/s over 1020 km using 200 Gb/s CFP2-DCO," *Opt. Express*, vol. 26, pp. 6943–6948, Mar. 19, 2018.
- [5] K. Zhong, X. Zhou, J. Huo, C. Yu, C. Lu, and A. P. T. Lau, "Digital signal processing for short-reach optical communications: a review of current technologies and future trends," *J. Lightw. Technol.*, vol. 36, no. 2, pp. 377–400, Feb. 2018.
- [6] J.-P. Elbers, N. Eiselt, A. Dochhan, D. Rafique, and H. Grießer, "PAM4 vs Coherent for DCI Applications," in *Proc. Signal Process. Photon. Commun.*, New Orleans, LA, USA, 2017, pp. 1–3, Art. no. SpTh2D.1.
- [7] D. Lavery, R. Maher, D. S. Millar, B. C. Thomsen, P. Bayvel, and S. J. Savory, "Digital coherent receivers for long-reach optical access networks," *J. Lightw. Technol.*, vol. 31, no. 4, pp. 609–620, Feb. 2013.
- [8] N. Suzuki, H. Miura, K. Matsuda, R. Matsumoto, and K. Motoshima, "100 Gb/s to 1 Tb/s based coherent passive optical network technology," *J. Lightw. Technol.*, vol. 36, no. 8, pp. 1485–1491, Apr. 2018.
- [9] A. Shahpari *et al.*, "Coherent access: A review," *J. Lightw. Technol.*, vol. 35, no. 4, pp. 1050–1058, Feb. 2017.
- [10] F. Frey, R. Elschner, and J. K. Fischer, "Estimation of trends for coherent DSP ASIC power dissipation for different bitrates and transmission reaches," in *Proc. ITG-Symp. Photon. Netw.*, May 11–12, 2017, pp. 1–8.
- [11] D. A. Morero, M. A. Castrillón, A. Aguirre, M. R. Hueda, and O. E. Agazzi, "Design tradeoffs and challenges in practical coherent optical transceiver implementations," *J. Lightw. Technol.*, vol. 34, no. 1, pp. 121–136, Jan. 2016.
- [12] J. C. Geyer, C. Rasmussen, B. Shah, T. Nielsen, and M. Givehchi, "Power efficient coherent transceivers," in *Proc. Eur. Conf. Opt. Commun.*, Sep. 18–22, 2016, pp. 1–3.
- [13] B. S. G. Pillai *et al.*, "End-to-end energy modeling and analysis of long-haul coherent transmission systems," *J. Lightw. Technol.*, vol. 32, no. 18, pp. 3093–3111, Sep. 2014.

- [14] R. Schmogrow *et al.*, "Nyquist frequency division multiplexing for optical communications," in *Proc. Conf. Lasers Electro-Opt.*, San Jose, CA, USA, May 6–11, 2012, pp. 1–2.
- [15] J. D. McNicol *et al.*, "Single-carrier versus sub-carrier bandwidth considerations for coherent optical systems," in *Proc. SPIE*, 2011, vol. 7960, pp. 796006-1–796006-11.
- [16] M. Mitchell *et al.*, "Optical integration and multi-carrier solutions for 100G and beyond," *Opt. Fiber Technol.*, vol. 17, pp. 412–420, 2011.
- [17] M. Qiu *et al.*, "Digital subcarrier multiplexing for fiber nonlinearity mitigation in coherent optical communication systems," *Opt. Express*, vol. 22, pp. 18770–18777, 2014.
- [18] P. Poggiolini *et al.*, "Analytical and experimental results on system maximum reach increase through symbol rate optimization," *J. Lightw. Technol.*, vol. 34, no. 8, pp. 1872–1885, Apr. 2016.
- [19] L. B. Du and A. J. Lowery, "Optimizing the subcarrier granularity of coherent optical communications systems," *Opt. Express*, vol. 19, pp. 8079–8084, 2011.
- [20] P. C. Schindler *et al.*, "Full flex-grid asynchronous multiplexing demonstrated with Nyquist pulse-shaping," *Opt. Express*, vol. 22, pp. 10923–10937, 2014.
- [21] D. Krause, A. Awadalla, A. Karar, H. H. Sun, and K.-T. Wu, "Design considerations for a digital subcarrier coherent optical modem," in *Proc. Opt. Fiber Commun. Conf. Exhib.*, Los Angeles, CA, USA, Mar. 19–23, 2017, pp. 1–3.
- [22] T. Rahman *et al.*, "Digital subcarrier multiplexed hybrid QAM for data-rate flexibility and ROADM filtering tolerance," in *Proc. Opt. Fiber Commun. Conf. Exhib.*, Anaheim, CA, USA, Mar. 20–24, 2016, pp. 1–3.
- [23] G. Bosco, S. M. Bilal, A. Nespola, P. Poggiolini, and F. Forghieri, "Impact of the transmitter IQ-Skew in multi-subcarrier coherent optical systems," in *Proc. Opt. Fiber Commun. Conf. Exhib.*, Anaheim, CA, USA, Mar. 20–24, 2016, pp. 1–3.
- [24] B. Baeuerle, A. Josten, R. Bonjour, D. Hillerkuss, and J. Leuthold, "Effect of transmitter impairments on Nyquist-FDM signals with increasing subband granularity," in *Proc. Signal Process. Photon. Commun.*, Vancouver, BC, Canada, Jul. 18–20, 2016, pp. 1–3, Art. no. SpW3F.4.
- [25] X. Xiao, F. Li, J. Yu, X. Li, Y. Xia, and F. Chen, "100-Gb/s single-band real-time coherent optical DP-16QAM-OFDM transmission and reception," in *Proc. Opt. Fiber Commun. Conf.*, San Francisco, CA, USA, Mar. 9–13, 2014, pp. 1–3.
- [26] N. Kaneda *et al.*, "Field demonstration of 100-Gb/s real-time coherent optical OFDM detection," *J. Lightw. Technol.*, vol. 33, no. 7, pp. 1365–1372, Apr. 2015.
- [27] A. Tolmachev, M. Meltsin, R. Hilgendorf, M. Orbah, and T. Birk, "Real-time hardware demonstration of 180 Gbps DFT-S OFDM receiver based on digital sub-banding," in *Proc. Eur. Conf. Opt. Commun.*, Dusseldorf, Germany, Sep. 18–22, 2016, pp. 1–3.
- [28] L. Mo, D. Ning, X. Qingsong, G. Guowei, F. Zhiyong, and C. Shiyi, "A 100-Gb/s real-time burst-mode coherent PDM-DQPSK receiver," in *Proc. Eur. Conf. Exhib. Opt. Commun.*, London, U.K., Sep. 22–26, 2013, pp. 1–3.
- [29] N. Kikuchi, T. Yano, and R. Hirai, "FPGA prototyping of single-polarization 112-Gb/s transceiver for optical multilevel signaling with intensity and delay detection," *J. Lightw. Technol.*, vol. 34, no. 8, pp. 1762–1769, Apr. 2016.
- [30] B. Baeuerle, A. Josten, M. Eppenberger, E. Dornbierer, D. Hillerkuss, and J. Leuthold, "FPGA-based real-time receiver for Nyquist-FDM at 112 Gbit/s sampled with 32 GSa/s," in *Proc. Opt. Fiber Commun. Conf.*, Los Angeles, CA, USA, Mar. 19–23, 2017, pp. 1–3.
- [31] B. Baeuerle, A. Josten, M. Eppenberger, D. Hillerkuss, and J. Leuthold, "FPGA-based real-time receivers for Nyquist-FDM," in *Proc. Signal Process. Photon. Commun.*, New Orleans, LA, USA, Jul. 24, 2017, pp. 1–3, Art. no. SpM3F.3.
- [32] B. Baeuerle *et al.*, "Multi-format carrier recovery for coherent real-time reception with processing in polar coordinates," *Opt. Express*, vol. 24, pp. 25629–25640, Oct. 31, 2016.
- [33] A. V. Oppenheim, *Discrete-Time Signal Processing*. Chennai, India: Pearson Education India, 1999.
- [34] M. Borgerding, "Turning overlap-save into a multiband mixing, downsampling filter bank," *IEEE Signal Process. Mag.*, vol. 23, no. 2, pp. 158–161, Mar. 2006.
- [35] P. Milder, F. Franchetti, J. C. Hoe, and M. Püschel, "Computer generation of hardware for linear digital signal processing transforms," *ACM Trans. Des. Autom. Electron. Syst.*, vol. 17, pp. 1–33, 2012.
- [36] 2017. [Online]. Available: www.spiral.net.

- [37] J. E. Volder, "The CORDIC trigonometric computing technique," *IEEE Trans. Electron. Comput.*, vol. EC-8, no. 3, pp. 330–334, Sep. 1959.
- [38] S. J. Savory, "Digital filters for coherent optical receivers," *Opt. Express*, vol. 16, pp. 804–817, 2008.
- [39] A. Josten, B. Baeuerle, E. Dornbierer, J. Boesser, D. Hillerkuss, and J. Leuthold, "Modified godard timing recovery for non integer oversampling receivers," *Appl. Sci.*, vol. 7, 2017, Art. no. 655.
- [40] H. Silverman, "An introduction to programming the Winograd Fourier transform algorithm (WFTA)," *IEEE Trans. Acoust., Speech, Signal Process.*, vol. 25, no. 2, pp. 152–165, Apr. 1977.
- [41] S. Winograd, "On computing the discrete Fourier transform," *Math. Comput.*, vol. 32, pp. 175–199, 1978.
- [42] K. Li, W. Zheng, and K. Li, "A fast algorithm with less operations for length-$N = q \times 2^m$ DFTs," *IEEE Trans. Signal Process.*, vol. 63, no. 3, pp. 673–683, Feb. 2015.
- [43] U. Mengali, *Synchronization Techniques for Digital Receivers*. New York, NY, USA: Springer, 2013.
- [44] I. Fatadin, D. Ives, and S. J. Savory, "Compensation of frequency offset for differentially encoded 16- and 64-QAM in the presence of laser phase noise," *IEEE Photon. Technol. Lett.*, vol. 22, no. 3, pp. 176–178, Feb. 2010.

- [45] T. Nakagawa *et al.*, "Wide-range and fast-tracking frequency offset estimator for optical coherent receivers," in *Proc. 36th Eur. Conf. Exhib. Opt. Commun.*, Sep. 19–23, 2010, pp. 1–3.
- [46] J. C. M. Diniz *et al.*, "Simple feed-forward wide-range frequency offset estimator for optical coherent receivers," *Opt. Express*, vol. 19, pp. B323–B328, 2011.
- [47] B. Spinnler, "Equalizer design and complexity for digital coherent receivers," *IEEE J. Sel. Topics Quantum Electron.*, vol. 16, no. 5, pp. 1180–1192, Sep./Oct. 2010.
- [48] M. S. Faruk and K. Kikuchi, "Adaptive frequency-domain equalization in digital coherent optical receivers," *Opt. Express*, vol. 19, pp. 12789–12798, 2011.
- [49] M. Paskov, D. Millar, and K. Parsons, "A fully-blind fractionally-oversampled frequency domain adaptive equalizer," in *Proc. Opt. Fiber Commun. Conf. Exhib.*, Vancouver, BC, Canada, 2016, pp. 1–3.

Authors' biographies not available at the time of publication.