# Design and Analysis of a Nanosecond Burst-Mode CDR Using MATLAB/Simulink and Opti-System Co-Simulation

Heng Zhang, Yuandong Li, Wenhe Yin, Yuan Du, Senior Member, IEEE, and Li Du, Member, IEEE

Abstract-Optical packet switching (OPS) networks are a promising way to accommodate growing traffic and reduce power consumption in data center communications. OPS networks with nanosecond switching time require nanosecond clock and data recovery (CDR) circuits. A nanosecond CDR can be achieved by using a global frequency-synchronized reference clock for both transmitters (TX) and receivers (RX) and adopting a phase compensation scheme, which makes frequency and phase predictable and manageable. However, the CDR still needs to be comprehensively evaluated under various sources of interference. We extend the analysis by developing a novel optoelectronic co-simulation system that combines Opti-system and MATLAB/Simulink. Using the simulation system, we set up a simple OPS network equipped with the CDR architecture. The feasibility of the CDR mechanism is validated, and various interference sources are then characterized to evaluate the CDR's stability, including variation in the location of the reference clock source, channel jitter, and carrier power variation.

Index Terms—Data center networks, optical packet switching network, clock and data recovery, MATLAB/Simulink, optisystem.

#### I. INTRODUCTION

DATA centers are being rapidly deployed in various organizations, including companies, institutions, and government offices. These centers host a growing number of applications such as scientific computing, deep learning, and financial analysis, resulting in increased demand for bandwidth in data center communication networks [1], [2], [3]. Current data center networks are dominated by multi-tier electrical switch networks. However, the bandwidth limitations of electrical processing chips and the power consumption associated with frequent electrical-optical conversions pose challenges [4], [5], [6]. To overcome these issues, optical packet switch (OPS) networks

Manuscript received 7 July 2023; accepted 19 August 2023. Date of publication 23 August 2023; date of current version 4 September 2023. This work was supported in part by the National Key Research and Development Program of China under Grant 2021YFA0717700 and in part by the National NSF of China under Grants 62211530492, 62141411, 62004096, and 62004097. (*Corresponding authors: Yuan Du; Li Du.*)


This article has supplementary downloadable material available at https://doi.org/10.1109/JPHOT.2023.3307687, provided by the authors.

Digital Object Identifier 10.1109/JPHOT.2023.3307687

have emerged as a promising solution due to their high bandwidth capacity and elimination of electrical-optical conversions [7], [8], [9]. Given that many applications in data centers produce short traffic packets, OPS networks with nanosecond configuration time are required, which also necessitates the development of CDR with nanosecond locking time [10]. Existing burst-mode CDR circuits, such as all-digital CDR [11], [12], gated VCO CDR [13], and over-sampling CDR [14], [15], [16], either lack practical integration in OPS transceivers or can only achieve microsecond-level data recovery [17], [18]. Therefore, there is an urgent need to develop efficient nanosecond CDRs to meet the demands of nanosecond OPS networks.

For packet-based optical switching networks, the variability in clock frequency and phase from packet to packet contributes to the lengthy locking time of CDR circuits. To address this issue, references [19], [20], [21] proposed a sub-nanosecond CDR architecture specifically designed for data center OPS networks by taking a network-level perspective. Briefly, frequency-synchronized reference clocks are used for both transmitters (TX) and receivers (RX) in the OPS network, so that only the phases need to be determined in the RX. By leveraging the observation that the phase offsets between specific pairs of transceivers remain relatively constant in a stable environment, the group first measures the phase in the RX and then applies phase compensation in the TX. This approach aligns the phases in the RX, thereby accelerating the locking process. The proposed CDR architecture demonstrated impressive performance in 25.6 Gbps burst-mode data transmission, achieving a locking time of 625 ps. However, various sources of interference can introduce noise into the phase offsets, potentially causing the phase compensation process to fail. Reference [20] thoroughly investigated the impact of temperature variation on phase offsets, considering it the primary concern. Nonetheless, it is crucial to further characterize the impact of other interference sources, such as reference clock variations, channel jitter, and carrier power variations.

To provide more insights into the nanosecond CDR, we design and analyze a 25.6 Gbps OPS network equipped with the proposed CDR architecture by developing a novel optoelectronic co-simulation system. The overall structure of our simulation is depicted in Fig. 1. Two software tools are used in the simulation: Opti-System for optical-related simulations and MATLAB/Simulink for electrical-related simulations. In

The authors are with the School of Electronic Science and Engineering, Nanjing University, Nanjing 210023, China (e-mail: yuandu@nju.edu.cn; ldu@nju.edu.cn).



Fig. 1. Block diagram of the entire simulation model. The green part is modeled in Opti-System and the blue part is modeled in MATLAB/Simulink.

MATLAB/Simulink, we model one TX node and one RX node for the OPS network. The TX and RX are interconnected by an optical channel, which is modeled using Opti-System. Two optical carriers are used and switched alternately to mimic the function of packet switching. We successfully validated the feasibility of the modeled nanosecond OPS network using the simulation test bench. We conducted an investigation into the influence of reference clock variations on the overall system performance. Additionally, we assessed the stability of the nanosecond OPS network under channel jitters and optical carrier power fluctuations. It is worth noting that the simulation system proposed in this paper can also be utilized to validate other electrical-optical codesigned modules.

#### II. DETAILS OF MODELED OPS NETWORK

#### A. Overview of the Simulation Model

The overall structure of the simulated 25.6 Gbps OPS network is illustrated in Fig. 1. Inside Opti-System, an 800 MHz optical reference clock is generated and transmitted to the TX and RX through two optical fibers. In MATLAB/Simulink, two phase-locked loops (PLLs) are modeled to multiply the 800 MHz clock to 12.8 GHz separately in the TX and RX. This ensures frequency synchronization of the reference clocks in both the TX and RX. The details of the PLL are shown in Fig. S5 (supplementary materials). The 12.8 GHz reference clock is adjusted with appropriate phase shifts to generate the required single-ended 25.6 GHz clock. Further details on this process are given in Section II-B. The TX then sends data clocked by the 25.6 GHz reference clock. Two optical carriers are utilized and switched by an optical switch at a fixed switching interval (100 ns). The optical signals are transmitted through an optical fiber modeled in Opti-System. The RX module includes a modified bang-bang phase detector (BBPD) and a finite state machine (FSM) for phase determination, and a PLL-based CDR for clock recovery and jitter tracking.

To achieve a short locking time of PLL-based CDR, the phases of the two wavelengths need to be aligned in RX. In our simulation, RX initially measures the phase offsets for the



Fig. 2. Generation of four-phase 12.8 GHz clock based on the 6-bit PI code.

 TABLE I

 Reference Clocks Selection Based on MSB\_2Bit

| MSB_2bit | clock_s1 | clock_s3 |
|----------|----------|----------|
| 00       | 0°       | 180°     |
| 01       | 45°      | 225°     |
| 10       | 90°      | 270°     |
| 11       | 135°     | 315°     |

TABLE II PHASE SELECTION BASED ON LSB\_4BIT

| LSB_4bit | clock_s1&clock_s3    | LSB_4bit | clock_s1&clock_s3     |
|----------|----------------------|----------|-----------------------|
| 0000     | $0 \times T_{delay}$ | 1000     | $8 \times T_{delay}$  |
| 0001     | $1 \times T_{delay}$ | 1001     | $9 \times T_{delay}$  |
| 0010     | $2 \times T_{delay}$ | 1010     | $10 \times T_{delay}$ |
| 0011     | $3 \times T_{delay}$ | 1011     | $11 \times T_{delay}$ |
| 0100     | $4 \times T_{delay}$ | 1100     | $12 \times T_{delay}$ |
| 0101     | $5 \times T_{delay}$ | 1101     | $13 \times T_{delay}$ |
| 0110     | $6 \times T_{delay}$ | 1110     | $14 \times T_{delay}$ |
| 0111     | $7 \times T_{delay}$ | 1111     | $15 \times T_{delay}$ |

 $T_{delay} = T_{12.8GHz}/64.$ 

two optical carriers. These phase offsets, represented as 6-bit PI codes, are then transmitted back to the transmitter (TX). Subsequently, the TX performs phase compensations for the corresponding wavelengths based on each packet, aligning the data delay in the RX for fast CDR locking.

#### B. TX Node Model

The TX transmits data at a rate of 25.6 Gbps and can adjust the phase of the reference clock with a resolution of  $1/64 \times 2\pi$ . With proper phase compensation, the phases of the two wavelengths can be aligned at the RX. The details of the phase interpolators are depicted in Fig. 2. The 12.8 GHz reference clock is fed into 8 identical delay modules, resulting in 8 reference clocks with phase shifts of 45°, 90°, 135°, 180°, 225°, 270°, 315°, and 360°. Assume that the PI codes of the two optical carriers have already been sent from the RX. The two most significant bits (MSB\_2bit) of the 6-bit code are used to select 2 clocks from the 8 reference clocks according to the correspondence specified in Table I. These selected clocks are labeled clock\_s1 and clock\_s3. For instance, if MSB\_2bit is '00', the clocks with phases of 0° and 180° are chosen. Subsequently, clock\_s1 and clock\_s3 serve as differential reference clocks. The four least significant bits (LSB\_4bit) are used to delay clock\_s1 and clock\_s3 simultaneously by preset phase shift values, as listed in Table II. The delayed clocks



Fig. 3. (a) Overview of TX. (b) Waveforms in generating the 25.6 GHz clock.

are further labeled as clock\_1 and clock\_3. Clock\_1 and clock\_3 are both delayed by another 90° to obtain clock\_2 and clock\_4.
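The mapping from a 6-bit PI code to the four clock phases can be sketched as follows. This is a minimal illustration that applies Tables I and II literally. The function name `decode_pi_code` and the choice to express the fine delay in degrees of the 12.8 GHz period (one $T_{delay}$ step = 360°/64 = 5.625°) are assumptions made for the example, not part of the original Simulink model.

```python
def decode_pi_code(pi_code: int) -> dict:
    """Map a 6-bit PI code to clock phases in degrees of the 12.8 GHz period.

    Hypothetical helper: follows Table I (MSB_2bit) and Table II (LSB_4bit)
    as printed, with T_delay = T_12.8GHz / 64.
    """
    if not 0 <= pi_code < 64:
        raise ValueError("PI code must fit in 6 bits")

    msb_2bit = pi_code >> 4        # upper two bits select the coarse clocks
    lsb_4bit = pi_code & 0b1111    # lower four bits select the fine delay

    # Table I: clock_s1 is 0, 45, 90 or 135 degrees; clock_s3 is 180 degrees later.
    clock_s1 = 45.0 * msb_2bit
    clock_s3 = clock_s1 + 180.0

    # Table II: both clocks are delayed by lsb_4bit * T_delay (5.625 degrees each).
    fine = lsb_4bit * 360.0 / 64

    clock_1 = (clock_s1 + fine) % 360.0
    clock_3 = (clock_s3 + fine) % 360.0
    # clock_2 and clock_4 are clock_1 and clock_3 delayed by a further 90 degrees.
    clock_2 = (clock_1 + 90.0) % 360.0
    clock_4 = (clock_3 + 90.0) % 360.0
    return {"clock_1": clock_1, "clock_2": clock_2,
            "clock_3": clock_3, "clock_4": clock_4}


if __name__ == "__main__":
    print(decode_pi_code(0b000000))
    print(decode_pi_code(0b010100))
```

For the code `000000` the sketch returns the nominal quadrature phases 0°, 90°, 180°, and 270°.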

Clock\_1, clock\_2, clock\_3, and clock\_4 are combined to generate a 25.6 GHz clock, as depicted in Fig. 3(a). Two D flip-flops are utilized, where clock\_1 and clock\_3 serve as the trigger clocks, while clock\_4 and clock\_2 serve as the reset clocks. Consequently, clock signals Q1 and Q2 are obtained, as shown in Fig. 3(b). Q1 and Q2 are then combined by an OR gate to produce the 25.6 GHz reference clock, which is used to send data. The 25.6 GHz reference clock possesses the desired phase shift.

In the simulation, we use repeated 16-bit sequences (0000,1111,0110,0101) to serve as the transmission data. This sequence is repeated 128 times to form a data packet of 2048 bits. Given that the transmission rate is set as 25.6 Gbps, it takes 80 ns to transmit a single data packet. An inter-packet gap of 20 ns is set to allow for tasks such as optical carrier tuning and phase compensation.
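The packet timing quoted above can be checked with a few lines of arithmetic. The sketch below only restates the numbers in the paragraph. The variable names are illustrative, and exact rational arithmetic is used to avoid floating-point rounding.

```python
from fractions import Fraction

BIT_RATE_BPS = Fraction(25_600_000_000)   # 25.6 Gbps line rate
PATTERN = "0000" "1111" "0110" "0101"      # the repeated 16-bit sequence
REPEATS = 128                              # repetitions per packet
GAP_NS = 20                                # inter-packet gap in nanoseconds

packet_bits = PATTERN * REPEATS
# Packet duration in nanoseconds: number of bits divided by the bit rate.
packet_ns = Fraction(len(packet_bits)) / BIT_RATE_BPS * 10**9
# One packet plus one gap forms a slot.
slot_ns = packet_ns + GAP_NS

print(len(packet_bits), float(packet_ns), float(slot_ns))  # 2048 80.0 100.0
```

The 100 ns slot (80 ns packet plus 20 ns gap) matches the 100 ns interval at which the optical switch alternates between the two carriers.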

#### C. RX Node Model

The RX has a BBPD and an FSM for phase determination and a PLL-based CDR for clock recovery. The phase determination mechanism in the RX is shown in Fig. 4. Optical signals from the channel are distributed to 4 D flip-flops and sampled by clock\_1, clock\_2, clock\_3, and clock\_4. These four clocks are generated in a similar manner as in the TX. Based on the 4-phase clock, specific logic is designed to determine whether the clock is leading or lagging the data. As shown in Fig. 5(a), clock\_1, clock\_2, and clock\_3 are first used to sample the data to obtain signals S1, S2, and S3. The pairs (S1, S2) and (S2, S3) are then passed through two XNOR



Fig. 4. Details in phase determination in RX.



Fig. 5. (a) Waveforms in the BBPD sampling process, (b) the circuit details of phase determination process.

logic modules, whose outputs are sampled by clock\_4 (red circles and arrows in Fig. 5(a)). If S1 XNOR S2 = 1 and S2 XNOR S3 = 0, the clock is leading the data, and total\_lead = total\_lead + 1. Conversely, if S1 XNOR S2 = 0 and S2 XNOR S3 = 1, the clock is lagging behind the data, and total\_lag = total\_lag + 1. A similar process is applied to S3, S4, and S1, whose XNOR outputs are sampled by clock\_2 (blue circles and arrows).
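The lead/lag rule described above can be written as a small truth-table function. This is a sketch of the decision logic only; the function names `xnor`, `bbpd_vote`, and `count_votes` are hypothetical, and the sampling by clock\_4 or clock\_2 is not modeled.

```python
def xnor(a: int, b: int) -> int:
    """Return 1 when the two sampled bits are equal, else 0."""
    return 1 if a == b else 0


def bbpd_vote(s1: int, s2: int, s3: int):
    """Classify one sample triple as 'lead', 'lag', or None (no vote)."""
    x12 = xnor(s1, s2)
    x23 = xnor(s2, s3)
    if x12 == 1 and x23 == 0:
        return "lead"   # clock is leading the data
    if x12 == 0 and x23 == 1:
        return "lag"    # clock is lagging behind the data
    return None         # no transition, or an ambiguous pattern


def count_votes(triples):
    """Accumulate total_lead and total_lag over a sequence of sample triples."""
    total_lead = sum(1 for t in triples if bbpd_vote(*t) == "lead")
    total_lag = sum(1 for t in triples if bbpd_vote(*t) == "lag")
    return total_lead, total_lag


if __name__ == "__main__":
    samples = [(1, 1, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]
    print(count_votes(samples))  # (2, 1)
```

Triples with no transition (for example 1, 1, 1) produce no vote, so only data edges contribute to the counts.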

The decision regarding the relative timing of the clock and data signals is made every 64 bits of data by comparing the total number of leads and lags, as depicted in Fig. 5(b). Within each 64-bit data segment, the lead and lag judgments are made 32 times, and the results are accumulated by two adders. If, after these 32

| Algorithm: PI_code_determination(Total_lead, Total_lag)                  |
|--------------------------------------------------------------------------|
| <b>Input:</b> Total_lead (total lead count), Total_lag (total lag count) |
| <b>Output:</b> PI_code                                                   |
| 1: Initialize PI_code_determination                                      |
| 2: Load Total_lead, Total_lag                                            |
| 3: <b>if</b> <i>Total_lead</i> < <i>Total_lag</i> <b>then</b>            |
| 4: PI_code $\leftarrow$ PI_code + 1                                      |
| 5: <b>end</b>                                                            |
| 6: <b>if</b> <i>Total_lead</i> > <i>Total_lag</i> <b>then</b>            |
| 7: PI_code $\leftarrow$ PI_code - 1                                      |
| 8: <b>end</b>                                                            |
| 9: <b>if</b> <i>Total_lead</i> = <i>Total_lag</i> <b>then</b>            |
| 10: PI_code $\leftarrow$ PI_code                                         |
| 11: <b>end</b>                                                           |
| 12: <b>return</b> PI_code                                                |

Fig. 6. Pseudocode of the FSM.

iterations, the total number of leads is equal to the total number of lags, the PI code remains unchanged. If the total number of leads is greater than the total number of lags, the PI code is decremented by 1. Conversely, if the total number of leads is less than the total number of lags, the PI code is incremented by 1. Subsequently, the updated PI codes are used to generate new sets of four-phase clocks. The Adders are reset to 0 and the lead or lag decisions are made again. The pseudocode of the FSM is presented in Fig. 6.
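The update rule stated in this paragraph can be sketched in a few lines. The function name `update_pi_code` is hypothetical, and the wrap-around modulo 64 at the ends of the 6-bit range is an assumption: the text does not say how the FSM behaves at code 0 or 63.

```python
def update_pi_code(pi_code: int, total_lead: int, total_lag: int, bits: int = 6) -> int:
    """Apply one FSM decision after a 64-bit segment.

    More leads than lags -> decrement; more lags than leads -> increment;
    equal counts -> unchanged. Wrap-around is an assumption of this sketch.
    """
    if total_lead > total_lag:
        pi_code -= 1
    elif total_lead < total_lag:
        pi_code += 1
    return pi_code % (1 << bits)


if __name__ == "__main__":
    print(update_pi_code(10, 28, 4))   # 9
    print(update_pi_code(10, 4, 28))   # 11
    print(update_pi_code(10, 16, 16))  # 10
```

After each call the two vote counters would be reset to 0 before the next 64-bit segment is evaluated.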

6-bit PI codes are used for precise phase control, resulting in a phase shifting precision of  $1/64 \times 2\pi$ . The range of phase adjustment during a single packet is  $2 \times 32 \times 1/64 \times 2\pi$  $= 2\pi$ , covering all possible conditions. Finally, the stable PI codes obtained through this iterative process are sent back to the transmitter (TX) for phase compensation, ensuring accurate phase alignment for the two wavelengths.
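As a quick check of the figures in this paragraph, the range calculation can be written out step by step. The sketch assumes the 2048-bit packet of Section II-B and reads the leading factor of 2 as covering both adjustment directions (up to 32 increments or 32 decrements per packet). That reading is an interpretation and is not stated explicitly in the text.

$$
N_{\mathrm{dec}} = \frac{2048}{64} = 32, \qquad \Delta\phi = \frac{2\pi}{64}, \qquad 2 \times N_{\mathrm{dec}} \times \Delta\phi = 2 \times 32 \times \frac{2\pi}{64} = 2\pi .
$$

In time units, one step corresponds to $T_{delay} = T_{12.8GHz}/64 = 78.125\ \mathrm{ps}/64 \approx 1.22\ \mathrm{ps}$.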

#### D. Optical Fiber Channel

The optical channel is modeled in Opti-system. As shown in Fig. S4 (supplementary materials), the data packet generated by the TX in Simulink is converted into NRZ electrical pulses by an NRZ pulse generator in Opti-System. Two continuous waves (CW) with wavelengths of 1550 nm and 1450 nm are generated using the laser modules to serve as optical carriers. These optical carriers are then routed through an optical switch, which alternates between the two carriers every 100 ns. The powers of the two CW laser modules vary during the transmission process (see Fig. S6 in supplementary materials). The modulated optical carrier is transmitted through an optical fiber modeled in Opti-system. The fiber parameters, such as length (1 km), reference wavelength (1500 nm), attenuation (0.2 dB/km), group velocity dispersion (17 ps/nm/km), and effective area (80  $\mu$m<sup>2</sup>), are set to mimic Corning SMF-28 optical fiber characteristics. To represent temperature-induced phase variation, an 80 ps delay module is included. A photodetector module with a responsivity of 1 A/W converts the optical signal back into an electrical signal. A low-pass Gaussian filter is applied to eliminate high-frequency noise, with a cutoff frequency set at 30 GHz and an



Fig. 7. (a) Signals of first two data packets, (b) lead votes, (c) lag votes, and (d) PI code altering.

insertion loss of 15 dB. A transimpedance amplifier (TIA) with a gain of 40 dB converts the photocurrent into a voltage. Finally, a limiting amplifier (LA) with a gain of 30 dB amplifies the voltage signal to a maximum of 1 V and a minimum of 0 V.

#### III. SIMULATION RESULTS AND DISCUSSIONS

#### A. A Typical Transmission Process

Repeated packets were utilized to validate the CDR. The initial two packets in each cycle were employed to measure the phase delays of the optical carriers, and the obtained phase shifts (PI codes) were sent back to the TX. Fig. 7 illustrates a typical phase determination process. Fig. 7(a) displays the RX signals of the first two data packets. Using the BBPD and FSM, the initial lead vote is observed to be 28, while the initial lag vote is 4 (see Fig. 7(b) and (c)). Based on these vote results, the PI code is decremented by 1 and the reference clock is delayed by  $1/64 \times 2\pi$ . The phase detection process is then repeated. After 5 decisions, the lag vote and lead vote become equal, indicating that a stable PI code has been reached (see Fig. 7(d)). It should be noted that slight fluctuations ( $\pm 1/64 \times 2\pi$ ) may occur after the lag votes and lead votes become equal. However, these fluctuations resemble the locking behavior of practical PI-based CDR circuits implemented on FPGAs and do not impact the data recovery process.

During the gaps of the subsequent 999998 packets, the TX periodically adjusts the phase of the reference clock to compensate for the phase offsets, allowing the PLL-based CDR to rapidly lock onto the correct phases of data packets for both wavelengths. The PI codes are updated every 1000000 packets (100 ms). The bit error ratio (BER) is calculated after a 100-second transmission test using a BER module modeled in MATLAB/Simulink (see



Fig. 8. Eye diagram of the received electrical signal with deterministic jitters.

Fig. S1). The BER module compares the recovered data with preset local data. With the phase compensation process, the received data is completely accurate, except for the first two packets, demonstrating an instant phase locking for the subsequent packets.
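To put the reported BER values in context, the following sketch estimates how many bits a 100-second test covers and hence the smallest non-zero BER it could register. It assumes a 100 ns slot per 2048-bit packet and ignores the two measurement packets at the start of each cycle, so the bit count is an upper bound. The variable names are illustrative.

```python
PACKET_BITS = 2048               # bits per data packet
SLOT_NS = 100                    # 80 ns packet + 20 ns gap
PACKETS_PER_CYCLE = 1_000_000    # PI codes are refreshed once per cycle
TEST_SECONDS = 100               # length of the BER test

# Duration of one PI-code update cycle in milliseconds.
cycle_ms = PACKETS_PER_CYCLE * SLOT_NS / 1e6
# Number of update cycles contained in the full test.
cycles = round(TEST_SECONDS * 1e3 / cycle_ms)
# Upper bound on the number of compared bits.
total_bits = PACKET_BITS * PACKETS_PER_CYCLE * cycles
# A single bit error in the whole test would give this BER.
min_nonzero_ber = 1 / total_bits

print(cycle_ms, cycles, total_bits, min_nonzero_ber)
```

Under these assumptions a single error over the whole test corresponds to a BER of roughly 4.9 × 10⁻¹³, so a recorded BER of 0 means no errors were observed at that resolution.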

#### B. Transceiver Performance Under Interference

Temperature variations are a significant factor affecting phase delays, as extensively studied in [20]. However, other parameters such as reference clock synchronization, channel jitter, and optical carrier power variations also impact the phases. To assess the stability of the OPS network with different reference clock source locations, we characterized the distribution of the 800 MHz optical reference clock. By introducing two optical fiber modules with varying lengths, we designed five cases to examine different scenarios (see Fig. S7 in supplementary materials). The PI codes varied among the cases, indicating changes in the detected phase offsets due to the different distances traveled by the optical reference clock. After performing the corresponding phase compensation in the first two data packets, the recorded BERs for subsequent data packets were consistently 0 across all cases. This implies that the phase-compensation CDR is insensitive to variations in the location of the optical reference clock source.

We further introduced deterministic jitters into the channel. Sinusoidal jitters with peak-to-peak amplitudes  $(SJ_{\rm pkpk})$  of 0.1UI, 0.3UI, and 0.5UI were separately added at a frequency of 1 MHz. The eye diagram of the received data is shown in Fig. 8. The PI codes produced by the FSM remained unchanged for the three deterministic jitters. This suggests that channel deterministic jitter of up to 0.5UI SJ<sub>pkpk</sub> has a negligible influence on the phase variation. The BER results after 1000 cycles of transmission tests showed a value of 0.
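One way to picture the injected sinusoidal jitter is as a slowly varying timing offset. The sketch below is an illustrative model, not the Opti-System jitter block: the function names, the zero initial phase, and the 80 ns observation window (one packet) are assumptions made for the example.

```python
import math

UI_PS = 1e12 / 25.6e9   # one unit interval at 25.6 Gbps, in picoseconds


def sj_offset_ui(t_s: float, pkpk_ui: float, freq_hz: float = 1e6) -> float:
    """Timing offset (in UI) of a sinusoidal jitter at time t_s."""
    return 0.5 * pkpk_ui * math.sin(2 * math.pi * freq_hz * t_s)


def max_drift_in_window_ui(pkpk_ui: float, window_s: float, freq_hz: float = 1e6) -> float:
    """Largest change of the offset across a window of at most half a jitter period.

    For a sinusoid of amplitude A, the maximum change over a window w is
    2 * A * sin(pi * f * w); here A = pkpk_ui / 2.
    """
    return pkpk_ui * math.sin(math.pi * freq_hz * window_s)


if __name__ == "__main__":
    for pkpk in (0.1, 0.3, 0.5):
        drift = max_drift_in_window_ui(pkpk, 80e-9)
        print(f"SJ {pkpk} UI pk-pk: up to {drift:.3f} UI "
              f"({drift * UI_PS:.2f} ps) drift within one 80 ns packet")
```

In this model the 1 MHz jitter period (1 µs) is much longer than a packet, so the offset changes by only a fraction of its peak-to-peak amplitude within one packet.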

Next, we introduced random jitters into the system with different root-mean-square ( $J_{RMS}$ ) values: 0UI, 0.05UI, 0.1UI, 0.15UI, 0.20UI, and 0.25UI. The resulting eye diagrams are displayed in Fig. 9. The BER values, shown in Fig. 10, demonstrate that when the RMS of the random jitter was below 0.1UI, the recorded BER remained at 0. However, as  $J_{RMS}$  exceeded 0.1UI, the BER increased exponentially with the RMS amplitude. The standard deviations were calculated from fifty sets of repeated experiments.

#### C. Influence of Carrier Power Variation

In the real data center environment, the carrier power may suffer from variations because of the mismatch of different light sources and semiconductor optical amplifiers. To evaluate the



Fig. 9. Eye diagram of the received electrical signal with random jitters of different RMS values.



Fig. 10. Relationship between J<sub>RMS</sub> and BER.

influence of optical carrier power variations, the optical carrier power is first set to 0 dBm. Then, after transmitting the first 500000 packets, the powers of both optical carriers are switched to 2 dBm for the following packets (Fig. S6 in supplementary materials). The eye diagram of the optical signals with no added jitter is shown in Fig. 11(a). Fig. 11(b) shows the two signals after passing through the LA. The 0 dBm signals suffer from an obvious reduction in duty cycle, which shrinks the data sampling interval. However, no additional phase offset is observed in Fig. 11(b), and the measured BER is 0 in all tests.

We further evaluate the performance of the CDR under the simultaneous influence of channel jitter and carrier power variations. As shown in Fig. 11(c), random jitter of 0.1UI RMS was added to the channel, and the optical signals suffered from obvious disturbances. After the signals pass through the LA, a narrower eye diagram is observed in Fig. 11(d). The BERs of the two carriers (0 dBm and 2 dBm) with the same random jitter, as recorded by the BER module, are  $6.2 \times 10^{-10}$  and  $2.7 \times 10^{-12}$ , respectively. The BER can be further decreased by inserting error-correcting code into the data packets.

#### IV. CONCLUSION

In this study, we have presented a novel optoelectronic co-simulation method that combines Opti-system for optical simulations and MATLAB/Simulink for electrical simulations.



Fig. 11. (a) Optical carrier of 0 dBm power and 2 dBm power with no jitters, (b) its amplified signals after passing LA, (c) optical carrier of 0 dBm power and 2 dBm power with 0.1UI RMS jitter, (d) its amplified signals after passing LA.

Using this simulation method, we have designed and analyzed a nanosecond optical packet switch (OPS) network equipped with a nanosecond phase-caching CDR. To provide more insight into the CDR, we investigated the impact of reference clock location variations and found that they do not disrupt the CDR locking. Furthermore, we have examined the stability of the CDR by introducing channel jitter and optical carrier power variations into the simulation system. Our results demonstrate that the nanosecond CDR can tolerate deterministic jitter up to 0.5UI and random jitter up to 0.1UI (RMS). However, when the random jitter exceeds an RMS value of 0.1UI, the bit error rate increases exponentially. Additionally, the CDR can handle optical carrier power variations from 0 dBm to 2 dBm. Based on this evaluation and analysis, we conclude that the phase-compensation CDR is a promising way to support nanosecond OPS networks for future data center communications, and that our simulation can assist in the design of application-specific integrated circuits (ASICs) tailored for this technology.

#### REFERENCES

- [1] H. J. S. Dorren, E. H. M. Wittebol, R. de Kluijver, G. Guelbenzu de Villota, P. Duan, and O. Raz, "Challenges for optically enabled high-radix switches for data center networks," *J. Lightw. Technol.*, vol. 33, no. 5, pp. 1117–1125, Mar. 2015.
- [2] M. D. Diego, B. F. Giovanni, M. Melkzedekue, R. Mônica, and J. Carmo, "Silicon modulator design using a system-oriented methodology for high-speed data center interconnect PAM-4 applications," *Opt. Commun.*, vol. 492, 2021, Art. no. 126977.

- [3] A. Guleria, J. Lakshmi, and C. Padala, "QuADD: Quantifying accelerator disaggregated datacenter efficiency," in *Proc. IEEE 12th Int. Conf. Cloud Comput.*, 2019, pp. 349–357.
- [4] G. Wu, H. Gu, K. Wang, X. Yu, and Y. Guo, "A scalable AWG-based data center network for cloud computing," *Opt. Switching Netw.*, vol. 16, pp. 46–51, 2015.
- [5] A. Singh et al., "Jupiter rising: A decade of Clos topologies and centralized control in Google's datacenter network," *Commun. Assoc. Comput. Machinery*, vol. 45, pp. 183–197, 2015.
- [6] Y. C. Huang, Y. Yoshida, S. Ibrahim, R. Takahashi, A. Hiramatsu, and K. Kitayama, "Bypassing route strategy for optical circuits in OPS-based data center networks," *IEEE Photon. J.*, vol. 8, no. 2, Apr. 2016, Art. no. 0601510.
- [7] F. Yan, W. Miao, H. Dorren, and N. Calabretta, "Novel flat data center network architecture based on optical switches with fast flow control," *IEEE Photon. J.*, vol. 8, no. 2, Apr. 2016, Art. no. 0601310.
- [8] Y. Mori and K. I. Sato, "High-port-count optical circuit switches for intra-datacenter networks," J. Opt. Commun. Netw., vol. 13, pp. D43–D52, 2021.
- [9] P. Andreades, K. Clark, P. M. Watts, and G. Zervas, "Experimental demonstration of an ultra-low latency control plane for optical packet switching in data center networks," *Opt. Switching Netw.*, vol. 32, pp. 51–60, 2019.
- [10] K. Clark et al., "Synchronous subnanosecond clock and data recovery for optically switched data centres using clock phase caching," *Nature Electron.*, vol. 3, pp. 426–433, 2020.
- [11] M. Verbeke et al., "A 25 Gb/s all-digital clock and data recovery circuit for burst-mode applications in PONs," *J. Lightw. Technol.*, vol. 36, no. 8, pp. 1503–1509, Apr. 2018.
- [12] S.-H. Chu, W. Bae, G.-S. Jeong, J. Joo, G. Kim, and D.-K. Jeong, "A 26.5 Gb/s optical receiver with all-digital clock and data recovery in 65nm CMOS process," in *Proc. IEEE Asian Solid-State Circuits Conf.*, 2014, pp. 101–104.
- [13] J. Terada, O. Yusuke, K. Nishimura, K. Hiroaki, S. Kimura, and Y. Naoto, "Jitter-reduction and pulse-width-distortion compensation circuits for a 10Gb/s burst-mode CDR circuit," in *Proc. IEEE Int. Solid-State Circuits Conf.*, 2009, pp. 104–105.
- [14] N. Suzuki, K. Nakura, T. Suehiro, S. Kozaki, M. Nogami, and J. Nakagawa, "Over-sampling based burst-mode CDR technology for high-speed TDM-PON systems," in *Proc. Opt. Fiber Commun. Conf. Expo. Nat. Fiber Opt. Engineers Conf.*, 2011, pp. 1–3.
- [15] D.-H. Kwon, Y.-S. Park, and W.-Y. Choi, "A clock and data recovery circuit with programmable multi-level phase detector characteristics and a built-in jitter monitor," *IEEE Trans. Circuits Syst. I: Reg. Papers*, vol. 62, no. 6, pp. 1472–1480, Jun. 2015.
- [16] A. Rylyakov et al., "25 Gb/s burst-mode receiver for low latency photonic switch networks," *IEEE J. Solid-State Circuits*, vol. 50, no. 12, pp. 3120–3132, Dec. 2015.
- [17] X. K Xue and N. Calabretta, "Nanosecond optical switching and control system for data center networks," *Nature Commun.*, vol. 13, 2022, Art. no. 2257.
- [18] X. Xue, B. Pan, X. Guo, and N. Calabretta, "Flow-controlled and clock-distributed optical switch and control system," *IEEE Trans. Commun.*, vol. 70, no. 5, pp. 3310–3319, May 2022.
- [19] H. Ballani et al., "Sirius: A flat datacenter network with nanosecond optical switching," in Proc. Annu. Conf. Assoc. Comput. Machinery Special Int. Group Data Commun. Appl., Technol., Architectures, Protoc. Comput. Commun., 2020, pp. 782–797.
- [20] K. Clark, H. Ballani, P. Bayvel, D. Cletheroe, and Z. Liu, "Sub-nanosecond clock and data recovery in an optically-switched data centre network," in *Proc. Eur. Conf. Opt. Commun.*, 2018, pp. 1–3.
- [21] K. A. Clark et al., "Low thermal sensitivity hollow core fiber for optically-switched data centers," *J. Lightw. Technol.*, vol. 38, no. 9, pp. 2703–2709, May 2020.