

Received 7 November 2022, accepted 21 November 2022, date of publication 24 November 2022, date of current version 30 November 2022.

Digital Object Identifier 10.1109/ACCESS.2022.3224451



# A Fast-Lock All-Digital Clock Generator for Energy Efficient Chiplet-Based Systems

JUNGHOON JIN, (Student Member, IEEE), SEUNGJUN KIM, (Student Member, IEEE), AND JONGSUN KIM, (Member, IEEE)

Department of Electronic and Electrical Engineering, Hongik University, Seoul 04066, South Korea

Corresponding author: Jongsun Kim (js.kim@hongik.ac.kr)

This work was supported in part by the Korea Institute for Advancement of Technology (KIAT) funded by the Korean Government (MOTIE) under Grant P0020966, in part by the National Research and Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT under Grant 2022M3I8A1077243, and in part by the EDA tools funded by IDEC.

**ABSTRACT** An all-digital clock frequency multiplier that achieves excellent locking time for energy-efficient chiplet-based system-on-chip (SoC) design is presented. The proposed architecture is based on an all-digital multiplying delay-locked loop (MDLL) that provides a fast locking time and a multiplied output clock frequency. The proposed MDLL has two operation modes: TDC tracking and sequential tracking. At the beginning of operation, the MDLL uses a cyclic Vernier time-to-digital converter (TDC) to detect the initial phase error between the reference clock and the output clock. The TDC then generates a digital code word (DCW) that controls the digitally controlled oscillator (DCO) to achieve a fast lock time. The gains of the TDC and DCO are designed to match each other, which enables phase and frequency locking in only two searches in the TDC tracking mode. After locking, the TDC is turned off and the MDLL switches to the sequential tracking mode. In this mode, a delta-sigma modulator (DSM)-based dithering scheme minimizes jitter. The prototype all-digital MDLL is fabricated in a 40-nm CMOS process and achieves a fast lock time of less than six reference clock cycles at 1.6 GHz from a 100 MHz reference clock. Even when the 100 MHz reference clock has a relatively high RMS jitter of 2.19 ps (peak-to-peak jitter = 15.74 ps), the measured RMS and peak-to-peak jitter values of the 1.6 GHz MDLL output clock are only 2.75 ps and 23.01 ps, respectively. The proposed all-digital MDLL occupies an active area of only 0.024 mm<sup>2</sup> and dissipates 3.56 mW at 1.6 GHz.

**INDEX TERMS** MDLL, multiplying delay-locked loop, clock generator, time-to-digital converter, TDC, digitally-controlled oscillator, PLL, chiplet, die-to-die interface, heterogeneous integration, SoC, SiP.

## I. INTRODUCTION

Recent advances in heterogeneous integration (HI) and packaging technologies make it possible to integrate separately manufactured smaller dies into a 2.5D or 3D system in package (SiP). Such an SiP provides enhanced functionality and improved performance at a lower production cost [1]. Multiple disparate chiplet dies (e.g., CPU, GPU, AI, analog, RF, memory, and ASIC) fabricated in different technology nodes can be linked horizontally and vertically with die-to-die interconnects, using interposer technology and through-silicon vias (TSVs). AMD, Intel, Marvell, and Nvidia have already released heterogeneous chiplet-based processor designs [2], [3], [4], [5], [6], [7], [8]. Splitting a large die into chiplets yields smaller dies with higher yield. However, signals that previously stayed on a single die must now cross die boundaries. Therefore, chiplet-based system design requires ultra-high-bandwidth die-to-die I/O interconnects between closely spaced dies. Ultra-fine-pitch interconnects based on silicon interposers [9], organic interposers [10], and silicon bridges [11], [12] have been developed to provide high die-to-die interconnect density. Current die-to-die I/O standards include BoW [13], [14], Intel's AIB [15], TSMC's LIPINCON [16], Synopsys's OpenHBI [17], and Universal Chiplet Interconnect Express (UCIe) [18].

The associate editor coordinating the review of this manuscript and approving it for publication was Teerachot Siriburanon.

Die-to-die I/O interfaces can be implemented as a serial link or a parallel bus to transfer massive amounts of data with low latency and without errors. A serial-link-based I/O interface minimizes the number of required lanes by using simple extra-short-reach/ultra-short-reach (XSR/USR) SerDes physical-layer devices (PHYs) with data rates of up to 112 Gbps per lane [19], [20], [21]. On the other hand, a parallel-bus-based I/O interface provides the required bandwidth using a large number of ultra-fine-pitch, low-speed (up to 16 Gbps/line) single-ended lines [22], [23], [24], [25], [26], [27].

FIGURE 1. Typical die-to-die I/O interface architectures based on (a) parallel bus (forwarded or source-synchronous clocking) (b) serial link (embedded clocking).

Furthermore, die-to-die I/O links generally have low-loss channels with shorter distances and lower latency than off-chip or off-package I/O links. This further reduces the complexity and power consumption of the die-to-die I/O PHY.

Fig. 1 shows two typical die-to-die I/O interface architectures and the clocking schemes used for inter-chiplet communication. Die-to-die I/O interfaces are similar to general chip-to-chip I/O structures [28] on conventional multi-chip module (MCM) substrates or printed circuit boards (PCBs). However, in recent advanced 2.5D or 3D integration, unpackaged flip-chip bare dies are bonded directly onto the die-to-die interconnect substrate. Therefore, the inter-die wires are much shorter (usually below 10 mm), and their inductance is negligible [29]. In addition, the required electrostatic discharge (ESD) protection overhead is much smaller than that of off-package I/Os.

The first, shown in Fig. 1(a), is a parallel bus die-to-die I/O interface that uses a forwarded clocking (or source synchronous clocking) scheme [22], [23], [26], [27]. Usually, one clock lane and multiple data lanes are placed in parallel.

Chiplet A's PLL receives a low-frequency reference clock from an external clock source (e.g., a crystal oscillator), multiplies it to a high frequency, and transmits it to Chiplet B through the clock lane at the full or a reduced rate. The data lanes are usually single-ended for area efficiency, whereas the clock lane uses a differential line for signal integrity. In Chiplet B, a delay-locked loop (DLL) can be used to de-skew the clock or to maintain proper phase alignment between data and clock regardless of process, voltage, and temperature (PVT) variations [14]. In some cases, quadrature clock-to-data timing is implemented between the forwarded clock and the data lanes using a matched-delay clock forwarding scheme without a DLL [26].

The second, shown in Fig. 1(b), is a serial-link die-to-die I/O interface that uses an embedded clocking scheme [19], [20], [21]. These structures are particularly effective when ultra-high shoreline (or die-edge) bandwidth density (in Gb/s/mm) is required. Chiplet A's PLL generates a high-frequency clock for driving the serializer and Tx driver, and no separate clock lane is used for data transmission. In Chiplet B, a clock and data recovery (CDR) circuit provides the clock that drives the Rx. The clock information is embedded in the transmitted data, and the receiver clock is recovered directly from the incoming data. This CDR-based serial link structure is often simply called a SerDes because it contains a serializer and a deserializer. Such a serial link can realize a very high data rate with a small number of data lanes and pins. However, its power consumption is high because of power-hungry high-speed building blocks such as equalizers (EQs), PLLs, and CDRs.

As the performance and power consumption of chiplet-based SoCs increase, the energy efficiency of the I/O interface and clocking becomes crucial. Dynamic voltage and frequency scaling (DVFS) and burst-mode communication can be considered as aggressive power management methods for reducing the energy consumed by chiplet I/O interfaces and clocking. However, these methods inevitably require a clock generator with a fast lock time or rapid power-on capability. It was shown that using a fast-lock DLL to eliminate idle-mode power in a mobile memory interface improves energy efficiency by about 30% (at 40% CPU utilization) [30]. Furthermore, another mobile memory interface architecture used a global synchronous PLL clock pause technique to enable rapid idle-to-active power state transitions and achieve power-efficient bandwidth scaling [31].

On-chip clock generators typically use phase-locked loops (PLLs) to provide the necessary frequency multiplication and phase alignment functions. The PLL output clocks are delivered to processor cores and communication I/O blocks through clock distribution networks. Reducing the lock time or power-on time of PLLs enables aggressive DVFS-based power management that lowers system power consumption [32]. It can even make per-core power management possible in multiprocessor SoCs [33]. A fast-lock PLL also plays an essential role in reducing I/O power in wireline applications built on high-speed serial links [34].

FIGURE 2. Proposed all-digital MDLL architecture.

Although many all-digital ring-oscillator-based PLLs targeting fast lock have been reported, most still need several tens or hundreds of reference clock cycles to lock [32], [35], [36]. Fast-lock MDLLs have also been proposed to further improve the locking time of clock generators [37], [38], [39], [40]. For example, [38] claims a lock time of two reference clock cycles using an analog voltage-controlled oscillator (VCO) and a bang-bang phase detector (BBPD). However, this VCO-based MDLL locks by reusing the frequency code stored before the previous turn-off, so fast locking cannot be guaranteed when the PVT conditions change. The MDLL in [39] achieves a lock time of 16 cycles using a modified SAR-based binary search, but this lock time is difficult to reduce further. Similarly, [40] achieves a lock time of 40 reference cycles using a SAR-based binary search. In short, conventional fast-lock PLLs and MDLLs generally need more than ten, and often several tens of, reference clock cycles to lock. Clock generators that claim a lock time of less than ten reference clock cycles usually reuse pre-recorded frequency codes [38], which is not a robust solution under PVT variations.
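A small sketch helps show why SAR-based locking scales with the width of the control word. It runs a generic successive-approximation (SAR) search for an unknown DCO code. This is a textbook SAR, not the modified searches of [39] or [40]. The `overshoots` test stands in for a hypothetical per-reference-cycle detector decision. The 9-bit width is borrowed from the DCW[8:0] word used later in this paper.

```python
def sar_search(target_code: int, bits: int = 9) -> tuple[int, int]:
    """Generic SAR search for an unknown DCO control code (illustrative only).

    target_code is the hypothetical code that would lock the loop; each
    comparison models one detector decision (at least one reference cycle).
    Returns (found_code, number_of_decisions).
    """
    if not 0 <= target_code < (1 << bits):
        raise ValueError("target code out of range")
    code = 0
    decisions = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)        # tentatively set the next bit
        decisions += 1                   # one decision per bit
        overshoots = trial > target_code
        if not overshoots:
            code = trial                 # keep the bit
    return code, decisions


code, n = sar_search(300)
print(code, n)  # 300 9
```

Every target costs exactly `bits` decisions. With at least one reference cycle per decision plus any settling, this is consistent with the 16 to 40 cycle lock times of the SAR-based designs above. A gain-matched TDC instead estimates the whole phase error from a single measurement.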

This paper presents a new all-digital MDLL-based fast-lock clock generator for low-power chiplet-based SoC design. The proposed all-digital MDLL measures the initial phase error using a wide-range, fine-resolution time-to-digital converter (TDC). It applies this information directly to the control of the digitally controlled oscillator (DCO), achieving a lock time of less than six reference clock cycles. This is the first all-digital MDLL-based clock frequency multiplier whose measurement results show how a TDC can be applied to a digital MDLL to achieve fast lock capability. The proposed chip was implemented with a conventional analog design flow.

The rest of the paper is organized as follows: Section II introduces the proposed all-digital MDLL architecture and circuit design, Section III presents the measurement results, and Section IV concludes the paper.

## II. PROPOSED ALL-DIGITAL MDLL ARCHITECTURE AND CIRCUIT DESIGN

### A. MDLL ARCHITECTURE AND OPERATION MODES

Fig. 2 shows the block diagram of the implemented all-digital MDLL. It is composed of the following blocks: a DCO, a select logic (SEL), a divide-by-N frequency divider (DIV) with N = 16, a bang-bang phase detector (BBPD), a cyclic Vernier TDC, two digital loop filters (9-bit DLF#1 and 4-bit DLF#2), three binary-to-thermometer decoders (5-to-31, 4-to-15, and 2-to-3), a second-order delta-sigma modulator (DSM), a lock detector, a DSM controller, and a start controller.

FIGURE 3. (a) Operation modes of the proposed all-digital MDLL (b) Locking process of the proposed MDLL.

FIGURE 4. Simulated DSM-based dithering jitter reduction operation.

FIGURE 5. Proposed digitally-controlled oscillator (DCO).

Fig. 3(a) is a flowchart of the operation modes of the proposed all-digital MDLL. The MDLL has two operation modes: TDC tracking and sequential tracking [41]. Fig. 3(b) describes the locking process in the time domain in more detail. When the MDLL is turned on, it starts at the maximum operating frequency and first performs the TDC tracking mode. The TDC measures the initial phase error ($\Delta t$) between the MDLL output clock (CLK<sub>OUT</sub>) and the reference input clock (CLK<sub>REF</sub>) through a coarse/fine TDC search. The resulting TDC output code, TDC[8:0], is then applied to the DCO control through DLF#1. The 9-bit DLF#1 filters the 9-bit TDC code and generates the DCW[8:0] code that drives the 5-to-31 and 4-to-15 binary-to-thermometer decoders. In the TDC tracking mode, DLF#1 has a gain of 1. One coarse/fine TDC search, including its application to the DCO, takes three reference clock cycles. In this design, the TDC search is performed at most twice, which corresponds to six reference clock cycles. For simplicity, Fig. 3(b) shows the case in which only one search is performed.
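The TDC tracking mode can be approximated by a simple behavioral model. Each search measures the residual phase error with the TDC, DLF#1 passes the code through with unity gain, and the DCO removes `code × dco_step_ps` of phase error. The sketch rests on several assumptions. It takes a TDC LSB of about 16.6 ps, which is our inference from the ~8.5 ns range of the 9-bit TDC described in Section II-B. It uses the 16.64 ps DCO phase step per fine code and the 10 ps phase lock condition stated in this section. It lets the TDC report signed errors for simplicity, and it ignores noise, nonlinearity, and the coarse/fine split of the code. The function name and parameters are hypothetical.

```python
import math


def tdc_tracking(dt0_ps: float,
                 tdc_lsb_ps: float = 8500 / 512,   # assumed: ~8.5 ns range / 2**9 codes
                 dco_step_ps: float = 16.64,       # phase shift per DCW code
                 lock_window_ps: float = 10.0,     # phase lock condition
                 max_searches: int = 2,
                 cycles_per_search: int = 3):
    """Idealized model of the TDC tracking mode.

    Returns (locked, searches_used, reference_cycles, residual_ps).
    """
    residual = dt0_ps
    for search in range(1, max_searches + 1):
        code = math.trunc(residual / tdc_lsb_ps)  # counters truncate the measurement
        residual -= code * dco_step_ps           # DLF#1 gain = 1, DCO applies the code
        if abs(residual) < lock_window_ps:
            return True, search, search * cycles_per_search, residual
    return False, max_searches, max_searches * cycles_per_search, residual


# Matched gains: locks after 1 search (3 reference cycles), residual ~1.5 ps.
print(tdc_tracking(3030.0))
# 10 % low DCO gain: not locked after 2 searches, residual ~34.8 ps.
print(tdc_tracking(3030.0, dco_step_ps=0.9 * 16.64))
```

The 3.03 ns starting error is the value measured in Section III. The idealized model locks in a single search, whereas the measured chip used both of the searches the design allows. The model does not capture what the real MDLL does when the window is still missed after two searches.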

In this design, the phase lock condition in Fig. 3 is met when the residual phase error $\Delta t$ falls below 10 ps. If the phase lock condition is met after the TDC search, the MDLL turns off the TDC and starts sequential tracking using the BBPD as follows. First, with the DSM turned off, the MDLL continues sequential tracking to further reduce the residual phase error. Second, when the COMP signal (the output of the BBPD) changes, the MDLL turns on the DSM. It then continues sequential tracking to maintain a closed loop and achieve dithering jitter reduction. In this phase, the 4-bit DLF#2 shown in Fig. 2 acts as an accumulator of the COMP signal and generates the DLF[3:0] signal. The second-order DSM produces the high-frequency DSM[1:0] signal at a rate 16 times higher than the reference clock. The 2-to-3 decoder converts it into the D[2:0] signal that controls the dithering cell of the DCO. Fig. 4 shows simulation results in which this DSM-based dithering scheme effectively improves the jitter performance. Without the DSM, F[14:0], which controls the fine cells of the DCO, toggles by at least one bit at every reference clock injection. This causes a large reference spur and increases the deterministic jitter [43].

FIGURE 6. Proposed select (SEL) logic (a) Schematic (b) initial operation timing diagram.

FIGURE 7. Proposed cyclic Vernier TDC (a) Architecture (b) initial operation.
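The dithering path can be illustrated with a generic second-order error-feedback delta-sigma modulator whose noise transfer function is $(1 - z^{-1})^2$. This is a sketch under stated assumptions, not the circuit of this design. We assume that DLF[3:0] is interpreted as a fraction `u/16` and that the modulator runs 16 times per reference period, as stated above. We further assume that its four output levels {-1, 0, 1, 2}, offset by +1, fit the 2-bit DSM[1:0] word. Through a hypothetical 2-to-3 thermometer mapping, that offset value selects how many dithering units are enabled.

```python
def dsm2(u: int, n_samples: int = 16, frac_bits: int = 4) -> list[int]:
    """Second-order error-feedback DSM with NTF = (1 - z^-1)^2 (illustrative).

    u is the fractional input in units of 2**-frac_bits (assumed DLF[3:0] meaning).
    Outputs lie in {-1, 0, 1, 2}; their mean differs from u / 2**frac_bits
    by less than 1 / n_samples.
    """
    full = 1 << frac_bits
    if not 0 <= u < full:
        raise ValueError("u must fit in frac_bits")
    e1 = e2 = 0                    # past quantization errors e[n-1], e[n-2]
    out = []
    for _ in range(n_samples):
        v = u + 2 * e1 - e2        # error feedback shapes noise by (1 - z^-1)^2
        y = v // full              # floor quantizer to an integer level
        e = v - y * full           # new error, 0 <= e < full
        out.append(y)
        e1, e2 = e, e1
    return out


def thermometer(level: int) -> str:
    """Hypothetical 2-to-3 decoding: level + 1 units enabled, D[2:0] MSB first."""
    ones = level + 1               # -1..2 -> 0..3 enabled units
    return "0" * (3 - ones) + "1" * ones


ys = dsm2(5)                       # 5/16 of a fine step over one reference period
print(ys, sum(ys) / len(ys))
print([thermometer(y) for y in ys[:4]])
```

Because the quantization error is differentiated twice, its energy is pushed toward high frequencies, where the loop and the reference injection filter it. Meanwhile, the average dithering setting tracks the fractional DLF value.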

Fig. 5 shows the block diagram of the implemented DCO, which consists of a pseudo-differential 2-to-1 multiplexer (MUX) and four stages of pseudo-differential delay cells. The outputs of the two decoders (5-to-31 and 4-to-15), C[30:0] and F[14:0], adjust the coarse cells and the fine cells of the DCO, respectively. The coarse delay resolution of each delay cell is about 1.9 ps, and the fine delay resolution is about 0.13 ps. The dithering cell has the same resolution as the fine cell. In this design, when F[14:0] increases by one bit, the phase shift of the DCO output is about 16.6 ps ($\approx 0.13\ \text{ps} \times 4\ \text{delay cells} \times 2 \times N$, where $N = 16$). The factor of 2 arises because each DCO period passes through the four delay cells twice. The factor N arises because the resulting period change accumulates over the N output cycles within one reference period.
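The 16.6 ps figure can be reproduced with a few lines of arithmetic. The sketch assumes the usual ring-oscillator relation: one period traverses the four delay cells twice, and the period change accumulates over the N = 16 output cycles between reference injections. Applying the same arithmetic to the 1.9 ps coarse step is our extrapolation, not a value stated in the text.

```python
def dco_phase_step_ps(cell_step_ps: float, n_cells: int = 4, n_div: int = 16) -> float:
    """Phase shift accumulated over one reference period for one control step.

    cell_step_ps: delay change of a single cell for one code step.
    Factor 2: each ring period traverses the delay cells twice (assumed).
    n_div: output cycles per reference period (N = 16 in this design).
    """
    period_change_ps = cell_step_ps * n_cells * 2   # change of one DCO period
    return period_change_ps * n_div                 # accumulated over N cycles


fine = dco_phase_step_ps(0.13)     # one F[14:0] step
coarse = dco_phase_step_ps(1.9)    # one C[30:0] step (our extrapolation)
print(f"fine: {fine:.2f} ps, coarse: {coarse:.1f} ps")  # fine: 16.64 ps, coarse: 243.2 ps
```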

Fig. 6(a) shows the structure of the select logic (SEL), which is similar to that used in [42]. Fig. 6(b) shows the initial operation timing diagram of the select logic before phase locking. When the output SEL signal is low, the DCO forms a closed loop and operates as a ring oscillator. When the SEL signal becomes high, the reference clock is injected, and the accumulated jitter is removed.

### B. GAIN MATCHED TDC ARCHITECTURE

Fig. 7(a) shows the simplified architecture of the proposed 9-bit cyclic Vernier TDC. It comprises a slow oscillator, a fast oscillator, an edge lock detector, a TDC\_OFF detector, a slow enable block, a fast enable block, a reset generator, 5-bit coarse and fine counters, and a divide-by-2 divider. The delay elements inside the slow and fast oscillators are identical to those used in the DCO. This minimizes mismatch and improves gain matching between the TDC and DCO.

The periods of the slow and fast oscillators are $T_{SOSC}$ and $T_{FOSC}$, respectively, and the TDC resolution is determined by $T_{SOSC} - T_{FOSC}$. The TDC gain ($K_{TDC}$) is defined as the ratio of the TDC[8:0] code value to the TDC input time difference. To achieve high linearity, the full-scale value of the 5-bit fine counter output F[4:0] should equal one least-significant bit (LSB) of the 5-bit coarse counter. The proposed cyclic Vernier TDC uses a ring oscillator structure, so increasing the number of coarse counter bits (K) increases the input detection range in proportion to $2^K$. In this design, the 5-bit C[4:0] is used for coarse counting, which gives an input time detection range of about 8.5 ns.

Fig. 7(b) shows the initial operation of the proposed TDC. When the MDLL is turned on, the TDC first asserts $C\_TDC_{EN}$ on the (n+1)th rising edge of $CLK_{OUT}$ to start the slow oscillator. Then, on the next rising edge of $CLK_{REF}$, $F\_TDC_{EN}$ is asserted to start the fast oscillator. The coarse counter counts the SOSC pulses between $C\_TDC_{EN}$ and $F\_TDC_{EN}$ to generate C[4:0]. The fine counter counts the FOSC pulses between the rising edge of $F\_TDC_{EN}$ and the rising edge of Detect to generate F[4:0].
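The two-step conversion can be sketched as an event-level model of a coarse/fine Vernier TDC. The periods used here (256 ps slow, 248 ps fast) are hypothetical round numbers, not the values of this design. They give an 8 ps Vernier step and 32 fine steps per coarse LSB, which satisfies the linearity condition above. The model ignores the divide-by-2 divider, the edge lock detector, and oscillator jitter. It assumes the slow oscillator starts at t = 0 ($C\_TDC_{EN}$) and the fast one at t = Δt ($F\_TDC_{EN}$).

```python
def cyclic_vernier_tdc(dt_ps: int, t_slow_ps: int = 256, t_fast_ps: int = 248,
                       coarse_bits: int = 5, fine_bits: int = 5) -> tuple[int, int]:
    """Return (coarse, fine) counts for a start-to-stop interval dt_ps.

    Hypothetical integer periods in ps; slow starts at 0, fast starts at dt_ps.
    """
    if dt_ps < 0:
        raise ValueError("interval must be non-negative")
    step = t_slow_ps - t_fast_ps          # Vernier resolution
    # Coarse: slow-oscillator edges (after the start edge) up to the fast start.
    coarse = 0
    while (coarse + 1) * t_slow_ps <= dt_ps:
        coarse += 1
    # Fine: the fast oscillator starts `lag` ps behind the latest slow edge and
    # gains `step` ps per cycle; count whole steps before it catches up.
    lag = dt_ps - coarse * t_slow_ps
    fine = 0
    while lag >= step:
        lag -= step
        fine += 1
    if coarse >= 1 << coarse_bits or fine >= 1 << fine_bits:
        raise OverflowError("interval exceeds the TDC range")
    return coarse, fine


def tdc_code(dt_ps: int, t_slow_ps: int = 256, t_fast_ps: int = 248) -> int:
    """Combine the two counts into one code in units of the Vernier step."""
    coarse, fine = cyclic_vernier_tdc(dt_ps, t_slow_ps, t_fast_ps)
    fine_per_coarse = t_slow_ps // (t_slow_ps - t_fast_ps)   # 32 here
    return coarse * fine_per_coarse + fine


print(cyclic_vernier_tdc(3030), tdc_code(3030))  # (11, 26) 378
```

Here, the fine full scale (32 × 8 ps) spans exactly one coarse LSB (256 ps), so the combined code is continuous. If the two did not match, a code gap or overlap would appear at every coarse transition and show up as DNL.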

Fig. 8(a) shows a simplified feedback loop between the TDC and DCO in this design. In a TDC-based MDLL, a fast lock time requires both a short TDC latency and a good match between $K_{TDC}$ (the TDC gain) and $K_{DCO}$ (the DCO gain). The TDC latency is the time required to generate the output code (TDC[8:0]) by comparing the two TDC inputs (CLK<sub>REF</sub> and CLK<sub>OUT</sub>). Fig. 8(b) shows post-layout simulation results for the gain characteristics of the TDC and DCO used in this design. The red line shows the output code (TDC[8:0]) versus the TDC input time difference, and its slope corresponds to the TDC gain, $K_{TDC}$. The blue line shows the DCO input code (DCW[8:0], on the y-axis) versus the DCO output delay shift, and its slope corresponds to the DCO gain, $K_{DCO}$. Because the gain of the loop filter (DLF#1) is one in the TDC tracking mode, the TDC and DCO gains can be compared directly. As shown in Fig. 8(b), the TDC and DCO gains are designed to match each other. The relation $K_{TDC} = 1/K_{DCO}$ (or $K_{TDC} \cdot K_{DCO} = 1$) holds over almost the entire range. If the gains deviate from this matching condition, each TDC search removes only part of the phase error, and the MDLL lock time may increase.

FIGURE 8. (a) Simplified feedback loop between the TDC and DCO (b) Simulated gain matched TDC-DCO characteristics.

FIGURE 9. Simulated DNL and INL of the proposed TDC.
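The sketch below gives a quick consistency check of the gain-matching condition. It also offers a continuous-valued view of why a mismatch costs lock time. Two assumptions are ours. First, the 9-bit TDC code is assumed to span its ~8.5 ns range uniformly, so that $K_{TDC} \approx 512/8500$ codes/ps. Second, quantization is ignored. With a loop gain $g = K_{TDC} \cdot K_{DCO}$, each search leaves a residual of $|1 - g| \cdot \Delta t$. The 3.03 ns initial error and the 10 ps window come from this paper, and the helper names are hypothetical.

```python
def loop_gain(tdc_range_ps: float = 8500.0, tdc_bits: int = 9,
              dco_step_ps: float = 16.64) -> float:
    """K_TDC * K_DCO, assuming a uniform TDC transfer curve."""
    k_tdc = (1 << tdc_bits) / tdc_range_ps   # codes per ps (assumed uniform)
    k_dco = dco_step_ps                      # ps of phase shift per code
    return k_tdc * k_dco


def searches_needed(g: float, dt0_ps: float = 3030.0, window_ps: float = 10.0,
                    max_searches: int = 100):
    """TDC searches until |residual| < window, ignoring quantization."""
    err = dt0_ps
    for n in range(1, max_searches + 1):
        err *= abs(1.0 - g)                  # each search removes g * err
        if err < window_ps:
            return n
    return None                              # |1 - g| too large to converge here


print(round(loop_gain(), 3))                                 # 1.002
print([searches_needed(g) for g in (1.0, 0.95, 0.9, 0.8)])   # [1, 2, 3, 4]
```

In this idealized view, matching to within about 5% keeps the lock inside the two searches the design allows, while a 10% error would already require a third search.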

The simulated differential nonlinearity (DNL) and integral nonlinearity (INL) of the TDC are shown in Fig. 9. The TDC achieves a maximum DNL of $\pm 0.368$ LSB and a maximum INL of $\pm 0.461$ LSB.
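For reference, DNL and INL figures like these are conventionally computed from the code transition levels of the converter's transfer curve. The sketch below uses an endpoint fit. The transition values in the example are made up for illustration and are not the simulated data of Fig. 9. The paper does not state which fit was used.

```python
def dnl_inl(transitions_ps: list[float]) -> tuple[list[float], list[float]]:
    """Endpoint-fit DNL and INL (in LSB) from code transition levels.

    transitions_ps[k] is the input time at which the output code changes
    from k-1 to k (hypothetical input data).
    """
    n = len(transitions_ps)
    if n < 3:
        raise ValueError("need at least three transition levels")
    lsb = (transitions_ps[-1] - transitions_ps[0]) / (n - 1)   # average code width
    dnl = [(transitions_ps[k + 1] - transitions_ps[k]) / lsb - 1 for k in range(n - 1)]
    inl = [(transitions_ps[k] - transitions_ps[0]) / lsb - k for k in range(n)]
    return dnl, inl


dnl, inl = dnl_inl([0.0, 10.0, 21.0, 29.0, 40.0])
print([round(x, 2) for x in dnl])   # [0.0, 0.1, -0.2, 0.1]
print([round(x, 2) for x in inl])   # [0.0, 0.0, 0.1, -0.1, 0.0]
```

With this definition, INL at code k equals the running sum of the DNL values below it, which is why a few small DNL errors of the same sign can add up to a larger INL.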

## III. MEASUREMENT RESULTS

The prototype all-digital MDLL chip was fabricated in a 40-nm CMOS process with an active area of 0.024 mm<sup>2</sup>. Fig. 10(a) shows the die, test board, and chip layout of the implemented MDLL. The chip is packaged in a quad flat no-lead (QFN) package. Fig. 10(b) shows the measurement setup used to test the prototype IC. The 100 MHz and 200 MHz input reference clocks (CLK<sub>REF</sub>) are generated by a TI LMK62XX PLL IC mounted on the test board. A Tektronix DPO71254C digital oscilloscope is used for the time-domain jitter measurements, and an Agilent E4440A spectrum analyzer is used to measure the reference spurs. The measurements are performed by on-chip probing on the test board. The proposed MDLL operates from 1.6 to 3.2 GHz with a 1.1 V supply. It consumes 3.56 mW at 1.6 GHz (N = 16) with a 100 MHz reference clock.

FIGURE 10. Proposed MDLL (a) die, test board, and chip layout (b) measurement setup.

Fig. 11 shows the measured locking process of the prototype all-digital MDLL operating at 1.6 GHz with a frequency multiplication factor N = 16. As shown in Fig. 11(a), a pre-run of three reference cycles was intentionally allocated for the test before starting the MDLL. During this period, the DCO inside the MDLL operates at its maximum frequency, and the initial phase error $\Delta t$ remains constant. Fig. 11(a) also shows that the TDC search was performed twice and that the MDLL achieved phase and frequency lock within six reference clock cycles. Fig. 11(b) shows that the measured initial phase error ($\Delta t$) of the MDLL is about 3.03 ns. Finally, Fig. 11(c) shows that the phases of the input and output clocks are well aligned, with almost zero phase difference after locking. It also confirms that the 1.6 GHz output clock (N = 16) is correctly generated. Assume a residual phase error of 10 ps after two TDC searches, accumulated over one 10 ns reference period. Under this assumption, the calculated maximum frequency error at the locking point is approximately 0.1%.
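The 0.1% figure follows from one line of arithmetic, sketched below under our reading of the text. A 10 ps phase error accumulated over one 10 ns reference period (100 MHz) corresponds, to first order, to a fractional period and frequency error of 10 ps / 10 ns. The same numbers give the 60 ns lock time listed in Table 1. The function name is hypothetical.

```python
def lock_metrics(residual_ps: float = 10.0, f_ref_hz: float = 100e6,
                 n_div: int = 16, lock_cycles: int = 6) -> tuple[float, float, float, float]:
    """Return (freq_error_percent, period_error_ps, output_period_ps, lock_time_ns)."""
    t_ref_ps = 1e12 / f_ref_hz                    # 10 000 ps at 100 MHz
    freq_err_pct = 100 * residual_ps / t_ref_ps   # first-order fractional error, in %
    t_out_ps = t_ref_ps / n_div                   # 625 ps at 1.6 GHz
    period_err_ps = residual_ps / n_div           # error shared by the N output cycles
    lock_time_ns = lock_cycles * t_ref_ps / 1e3   # six reference cycles
    return freq_err_pct, period_err_ps, t_out_ps, lock_time_ns


err, dper, tout, tlock = lock_metrics()
print(f"{err:.2f} %, {dper:.3f} ps per {tout:.0f} ps output cycle, lock in {tlock:.0f} ns")
# 0.10 %, 0.625 ps per 625 ps output cycle, lock in 60 ns
```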

FIGURE 11. Measured all-digital MDLL operation (a) full locking process with a lock time of less than 6 reference clock cycles (b) initial locking point (c) after locking.

|                                  | [32]       | [35]       | [36]       | [39]        | [40]                     | This work                                      |
|----------------------------------|------------|------------|------------|-------------|--------------------------|------------------------------------------------|
| Architecture                     | PLL        | PLL        | PLL        | MDLL        | MDLL                     | MDLL                                           |
| Process                          | 65 nm      | 90 nm      | 28 nm      | 350 nm      | 65 nm                    | 40 nm                                          |
| Supply [V]                       | -          | 1.0        | 1.0        | 3.3         | 1.0                      | 1.1                                            |
| Ref. Clock Freq. [MHz]           | 20–50      | 20         | 60         | 4–200       | 300–1200                 | 100–200                                        |
| Output Freq. [GHz]               | 1–2        | 0.06–0.6   | 0.6–2.6    | 0.06–0.45   | 0.7–2.0                  | 1.6–3.2<sup>a</sup>                            |
| Multiplication Factor [N]        | 20–40      | 2–128      | 10–43      | 2–15        | N=1,4,5,8,10/<br>M=1,2,3 | 16                                             |
| Lock Time [Ref. cycles]          | 35         | 4<sup>b</sup> | 60      | 16          | 40                       | 6                                              |
| Lock Time [ns]                   | -          | 800        | 1000       | -           | -                        | 60                                             |
| Peak-to-peak jitter [ps] @ x GHz | 7.55 @1.5  | 102 @0.6   | -          | 37.8 @0.45  | 22 @2                    | 23.01<sup>c</sup> @1.6<br>28.01<sup>c</sup> @3.2 |
| RMS jitter [ps] @ x GHz          | 3.09 @1.5  | 13.7 @0.6  | 2.53 @2.4  | 4.04 @0.45  | 2.8 @2                   | 2.75<sup>d</sup> @1.6<br>3.097<sup>d</sup> @3.2  |
| Power [mW] @ x GHz               | 10.8 @1.5  | 0.92 @0.6  | 2.89 @2.4  | 17 @0.45    | 3.31 @1                  | 3.56 @1.6                                      |
| Chip Area [mm²]                  | 0.0438     | 0.065      | 0.0255     | 0.216       | 0.019                    | 0.024                                          |
| FOM<sub>1</sub><sup>e</sup> [dB] | -219.86    | -217.62    | -227.33    | -215.56     | -225.86                  | -225.70 @1.6 GHz                               |
| FOM<sub>2</sub><sup>f</sup> [dB] | -349.76    | -339.56    | -347.33    | -351.48     | -363.36                  | -370.14 @1.6 GHz                               |

TABLE 1. Performance comparison of state-of-the-art all-digital fast-lock frequency multipliers employing ring-based PLLs and MDLLs.

- <sup>a</sup> 3.2 GHz output clock using a 200 MHz input clock.
- <sup>b</sup> Lock time of 4 cycles with 5% frequency error.
- <sup>c</sup> These p-p jitter values include the input clock's p-p jitter of 15.74 ps at 100 MHz (19.146 ps at 200 MHz).
- <sup>d</sup> These RMS jitter values include the input clock's RMS jitter of 2.19 ps at 100 MHz (2.045 ps at 200 MHz).
- <sup>e</sup> $FOM_1 = 10 \log_{10}\left[ \left( \frac{J_{RMS}}{1\,\mathrm{s}} \right)^2 \times \frac{P}{1\,\mathrm{mW}} \right]$
- <sup>f</sup> $FOM_2 = FOM_1 + 10 \log_{10}\left[ \left( \frac{t_{lock}}{1\,\mathrm{s}} \right)^2 \right]$

FIGURE 12. Measured jitter performance (a) 100 MHz input ( $CLK_{REF}$ ) and 1.6 GHz MDLL output ( $CLK_{OUT}$ ) (b) 200 MHz input ( $CLK_{REF}$ ) and 3.2 GHz MDLL output ( $CLK_{OUT}$ ).

FIGURE 13. Power consumption breakdown of the proposed MDLL at 1.6 GHz.

Fig. 12 shows the jitter measurement results. As shown in Fig. 12(a), the root-mean-square (RMS) and peak-to-peak (p-p) jitter of the 100 MHz input reference clock (CLK<sub>REF</sub>) provided by the PLL IC on the PCB are 2.19 ps and 15.74 ps, respectively. The relatively high jitter of the measured input $CLK_{REF}$ is mainly caused by imperfect channel termination on the board interconnect between the PLL IC and the MDLL chip (Fig. 10(b)). Even with this low-quality reference clock, the prototype MDLL achieves excellent jitter performance. As shown in Fig. 12(a), the measured RMS and p-p jitter of the 1.6 GHz output clock ($CLK_{OUT}$) are 2.75 ps and 23.01 ps, respectively. Fig. 12(b) shows the p-p (=28.1 ps) and RMS (=3.09 ps) jitter of the output clock at 3.2 GHz. If the input jitter is directly subtracted from the output jitter, the effective RMS and p-p jitter of the proposed MDLL (@3.2 GHz) are only 1.05 ps and 8.96 ps, respectively. Fig. 13 shows the power consumption breakdown of the proposed MDLL at 1.6 GHz.
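The "effective" jitter numbers above correspond to direct (linear) subtraction of the input jitter. The sketch below reproduces this with the 3.2 GHz RMS values from Table 1: 3.097 ps at the output and 2.045 ps at the 200 MHz input. For comparison, it also prints the root-sum-square (RSS) subtraction that would apply if the reference and MDLL contributions were independent random jitter. The measurements here do not establish which model better fits an injection-based MDLL, so treat the RSS figure only as an alternative estimate.

```python
import math


def intrinsic_jitter(out_ps: float, in_ps: float) -> tuple[float, float]:
    """Return (linear, rss) estimates of the MDLL's own jitter contribution."""
    linear = out_ps - in_ps                              # contributions add linearly
    rss = math.sqrt(max(out_ps**2 - in_ps**2, 0.0))      # independent random jitter
    return linear, rss


lin, rss = intrinsic_jitter(3.097, 2.045)   # RMS @ 3.2 GHz out, 200 MHz in (Table 1)
print(f"linear: {lin:.2f} ps, RSS: {rss:.2f} ps")   # linear: 1.05 ps, RSS: 2.33 ps
```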

Table 1 compares the performance of state-of-the-art all-digital fast-lock integer-N frequency multipliers employing ring-based PLLs and MDLLs. The design in [35] claims a lock time of four reference clock cycles, but its frequency error is as high as 5%, so its actual lock time is considerably longer, more than 1 $\mu$s. Therefore, to the best of our knowledge, the proposed all-digital MDLL achieves the shortest lock time, less than six reference clock cycles or 60 ns. Two figures of merit (FOMs) [36] are used to compare the clock frequency multipliers in Table 1. The proposed all-digital MDLL achieves the best FOM<sub>2</sub> despite using an input clock source with significant jitter.
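The two figures of merit defined in the notes to Table 1 can be evaluated directly. The sketch below reproduces the "This work" entries from the measured 2.75 ps RMS jitter, 3.56 mW power, and 60 ns lock time (six 100 MHz reference cycles). Both FOMs are computed from output jitter that still includes the reference clock's jitter (Table 1, note d).

```python
import math


def fom1(j_rms_s: float, power_mw: float) -> float:
    """FOM1 = 10 log10[(J_RMS / 1 s)^2 * (P / 1 mW)], in dB."""
    return 10 * math.log10(j_rms_s**2 * power_mw)


def fom2(j_rms_s: float, power_mw: float, t_lock_s: float) -> float:
    """FOM2 = FOM1 + 10 log10[(t_lock / 1 s)^2], in dB."""
    return fom1(j_rms_s, power_mw) + 10 * math.log10(t_lock_s**2)


j, p, t = 2.75e-12, 3.56, 6 / 100e6
print(f"FOM1 = {fom1(j, p):.2f} dB, FOM2 = {fom2(j, p, t):.2f} dB")
# FOM1 = -225.70 dB, FOM2 = -370.14 dB
```

Because the lock time enters FOM<sub>2</sub> squared, halving it improves FOM<sub>2</sub> by about 6 dB. This is why a short lock time dominates the ranking in the last row of Table 1.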



## IV. CONCLUSION

Fast-lock clock generators are essential for energy-efficient chiplet-based SoCs that require dynamic frequency scaling. This paper presents a new all-digital MDLL that uses a cyclic Vernier TDC to achieve an excellent locking time. The proposed all-digital MDLL is fabricated in 40-nm CMOS technology and achieves a lock time of less than six reference clock cycles at 1.6 GHz from a 100 MHz reference clock. The measured FOM<sub>2</sub> is –370.14 dB at 1.6 GHz, the best among the state-of-the-art all-digital integer-N clock generators compared in Table 1.

## ACKNOWLEDGMENT

The authors would like to thank MetaCNI for their support.

## REFERENCES

- [1] P. Vivet et al., "IntAct: A 96-core processor with six chiplets 3D-stacked on an active interposer with distributed interconnects and integrated power management," *IEEE J. Solid-State Circuits*, vol. 56, no. 1, pp. 79–97, Jan. 2021.
- [2] A. C. Durgun, Z. Qian, K. Aygun, R. Mahajan, T. T. Hoang, and S. Y. Shumarayev, "Electrical performance limits of fine pitch interconnects for heterogeneous integration," in *Proc. IEEE 69th Electron. Compon. Technol. Conf. (ECTC)*, Las Vegas, NV, USA, May 2019, pp. 667–673.
- [3] M. Wade et al., "TeraPHY: A chiplet technology for low-power, high-bandwidth in-package optical I/O," *IEEE Micro*, vol. 40, no. 2, pp. 63–71, Mar./Apr. 2020.
- [4] L. T. Su, S. Naffziger, and M. Papermaster, "Multi-chip technologies to unleash computing performance gains over the next decade," in *IEDM Tech. Dig.*, San Francisco, CA, USA, Dec. 2017, p. 1.
- [5] R. Viswanath, A. Chandrasekhar, S. Srinivasan, Z. Qian, and R. Mahajan, "Heterogeneous SoC integration with EMIB," in *Proc. IEEE Electr. Design Adv. Packag. Syst. Symp. (EDAPS)*, Chandigarh, India, Dec. 2018, pp. 1–3.
- [6] W. W.-M. Dai, "Historical perspective of system in package (SiP)," *IEEE Circuits Syst. Mag.*, vol. 16, no. 2, pp. 50–61, 2nd Quart., 2016.
- [7] G. Singh and S. Ahmad, "Xilinx 16 nm datacenter device family with in-package HBM and CCIX interconnect," in *Proc. IEEE Hot Chips Symp.*, Cupertino, CA, USA, 2017.
- [8] D. Stow, Y. Xie, T. Siddiqua, and G. H. Loh, "Cost-effective design of scalable high-performance systems using active and passive interposers," in *Proc. IEEE/ACM Int. Conf. Computer-Aided Design (ICCAD)*, Irvine, CA, USA, Nov. 2017, pp. 728–735.
- [9] S. S. Iyer, "Heterogeneous integration for performance and scaling," *IEEE Trans. Compon., Packag., Manuf. Technol.*, vol. 6, no. 7, pp. 973–982, Jul. 2016.
- [10] K. Oi, S. Otake, N. Shimizu, S. Watanabe, Y. Kunimoto, T. Kurihara, T. Koyama, M. Tanaka, L. Aryasomayajula, and Z. Kutlu, "Development of new 2.5D package with novel integrated organic interposer substrate with ultra-fine wiring and high density bumps," in *Proc. IEEE 64th Electron. Compon. Technol. Conf. (ECTC)*, Lake Buena Vista, FL, USA, May 2014, pp. 348–353.
- [11] H. Braunisch, A. Aleksov, S. Lotz, and J. Swan, "High-speed performance of silicon bridge die-to-die interconnects," in *Proc. IEEE 20th Conf. Electr. Perform. Electron. Packag. Syst.*, San Jose, CA, USA, Oct. 2011, pp. 95–98.
- [12] R. Mahajan, Z. Qian, R. S. Viswanath, S. Srinivasan, K. Aygün, W.-L. Jen, S. Sharan, and A. Dhall, "Embedded multidie interconnect bridge—A localized, high-density multichip packaging interconnect," *IEEE Trans. Compon., Packag., Manuf. Technol.*, vol. 9, no. 10, pp. 1952–1962, Oct. 2019.
- [13] S. Ardalan, R. Farjadrad, M. Kuemerle, K. Poulton, S. Subramaniam, and B. Vinnakota, "An open inter-chiplet communication link: Bunch of wires (BoW)," *IEEE Micro*, vol. 41, no. 1, pp. 54–60, Jan. 2021.
- [14] S. Ardalan, H. Cirit, R. Farjad, M. Kuemerle, K. Poulton, S. Subramanian, and B. Vinnakota, "Bunch of wires: An open die-to-die interface," in *Proc. IEEE Symp. High-Perform. Interconnects (HOTI)*, Piscataway, NJ, USA, Aug. 2020, pp. 9–16.

- [15] D. Kehlet. Advanced Interface Bus (AIB) Specifications, 1.2. (2019). Accelerating Innovation Through A Standard Chiplet Interface: The Advanced Interface Bus (AIB). Intel Corp. Santa Clara, CA, USA. Accessed: Oct. 1, 2022. [Online]. Available: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/accelerating-innovation-through-aib-whitepaper.pdf
- [16] M. Lin, T.-C. Huang, C.-C. Tsai, K.-H. Tam, C.-H. Hsieh, T. Chen, W.-H. Huang, J. Hu, Y.-C. Chen, S. K. Goel, C.-M. Fu, S. Rusu, C.-C. Li, S.-Y. Yang, M. Wong, S.-C. Yang, and F. Lee, "A 7 nm 4 GHz Arm-core-based CoWoS chiplet design for high performance computing," in *Proc. Symp. VLSI Circuits*, Kyoto, Japan, Jun. 2019, pp. C28–C29.
- [17] K. Ma, A. He, T. Wilson, M. S. Chae, J. Bostak, P. Nyasulu, B. Worobey, B. A. Nguyen, M. Mittal, and X. Wang. Open High Bandwidth Interconnect (OpenHBI) Specifications, 1.0. (2021). OpenHBI Specification Version 1.0. Xilinx Corp. San Jose, CA, USA. Accessed: Oct. 1, 2022. [Online]. Available: https://www.opencompute.org/documents/odsa-openhbi-v1-0-spec-rc-final-1-pdf
- [18] D. D. Sharma, G. Pasdast, Z. Qian, and K. Aygün, "Universal chiplet interconnect express (UCIe): An open industry standard for innovations with chiplets at package level," *IEEE Trans. Compon., Packag., Manuf. Technol.*, vol. 12, no. 9, pp. 1423–1431, Sep. 2022.
- [19] C. F. Poon, W. Zhang, J. Cho, S. Ma, Y. Wang, Y. Cao, A. Laraba, E. Ho, W. Lin, D. Z. Wu, K. H. Tan, P. Upadhyaya, and Y. Frans, "A 1.24-pJ/b 112-Gb/s (870 Gb/s/mm) transceiver for in-package links in 7-nm FinFET," *IEEE J. Solid-State Circuits*, vol. 57, no. 4, pp. 1199–1210, Apr. 2022.
- [20] G. Gangasani, D. Hanson, D. Storaska, H. H. Xu, M. Kelly, M. Shannon, M. Sorna, M. Wielgos, P. B. Ramakrishna, S. Shi, S. Parker, U. K. Shukla, W. Kelly, W. Su, and Z. Yu, "A 1.6Tb/s chiplet over XSR-MCM channels using 113Gb/s PAM-4 transceiver with dynamic receiver-driven adaption of TX-FFE and programmable roaming taps in 5 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2022, pp. 122–124.
- [21] R. Yousry, E. Chen, Y.-M. Ying, M. Abdullatif, M. Elbadry, A. ElShater, T.-B. Liu, J. Lee, D. Ramachandran, K. Wang, C.-H. Weng, M.-L. Wu, and T. Ali, "11.1 A 1.7 pJ/b 112 Gb/s XSR transceiver for intra-package communication in 7 nm FinFET technology," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2021, pp. 180–182.
- [22] B. Dehlaghi and A. C. Carusone, "A 0.3 pJ/bit 20 Gb/s/wire parallel interface for die-to-die communication," *IEEE J. Solid-State Circuits*, vol. 51, no. 11, pp. 2690–2701, Nov. 2016.
- [23] T. O. Dickson, Y. Liu, S. V. Rylov, B. Dang, C. K. Tsang, P. S. Andry, J. F. Bulzacchelli, H. A. Ainspan, X. Gu, L. Turlapati, M. P. Beakes, B. D. Parker, J. U. Knickerbocker, and D. J. Friedman, "An 8×10-Gb/s source-synchronous I/O system based on high-density silicon carrier interconnects," *IEEE J. Solid-State Circuits*, vol. 47, no. 4, pp. 884–896, Apr. 2012.
- [24] T. O. Dickson, Y. Liu, S. V. Rylov, A. Agrawal, S. Kim, P.-H. Hsieh, J. F. Bulzacchelli, M. Ferriss, H. A. Ainspan, A. Rylyakov, B. D. Parker, M. P. Beakes, C. Baks, L. Shan, Y. Kwark, J. A. Tierno, and D. J. Friedman, "A 1.4 pJ/bit, power-scalable 16×12 Gb/s source-synchronous I/O with DFE receiver in 32 nm SOI CMOS technology," *IEEE J. Solid-State Circuits*, vol. 50, no. 8, pp. 1917–1931, Aug. 2015.
- [25] J. W. Poulton, W. J. Dally, X. Chen, J. G. Eyles, T. H. Greer, S. G. Tell, J. M. Wilson, and C. T. Gray, "A 0.54 pJ/b 20 Gb/s ground-referenced single-ended short-reach serial link in 28 nm CMOS for advanced packaging applications," *IEEE J. Solid-State Circuits*, vol. 48, no. 12, pp. 3206–3218, Dec. 2013.
- [26] J. W. Poulton, J. M. Wilson, W. J. Turner, B. Zimmer, X. Chen, S. S. Kudva, S. Song, S. G. Tell, N. Nedovic, W. Zhao, S. R. Sudhakaran, C. T. Gray, and W. J. Dally, "A 1.17-pJ/b, 25-Gb/s/pin ground-referenced single-ended serial link for off- and on-package communication using a process- and temperature-adaptive voltage regulator," *IEEE J. Solid-State Circuits*, vol. 54, no. 1, pp. 43–54, Jan. 2019.
- [27] S. Ma, H. Yu, Q. J. Gu, and J. Ren, "A 5–10-Gb/s 12.5-mW source synchronous I/O interface with 3-D flip chip package," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 66, no. 2, pp. 555–568, Feb. 2019.
- [28] B. Casper and F. O'Mahony, "Clocking analysis, implementation and measurement techniques for high-speed data links—A tutorial," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 56, no. 1, pp. 17–39, Jan. 2009.
- [29] S. Pal and P. Gupta, "Pathfinding for 2.5D interconnect technologies," in *Proc. ACM/IEEE Int. Workshop Syst. Level Interconnect Predict. (SLIP)*, San Diego, CA, USA, Nov. 2020, pp. 1–8.
- [30] W. S. Choi, T. Anand, G. Shu, A. Elshazly, and P. K. Hanumolu, "A burst-mode digital receiver with programmable input jitter filtering for energy proportional links," *IEEE J. Solid-State Circuits*, vol. 50, no. 3, pp. 737–748, Mar. 2015.
- [31] B. Leibowitz, R. Palmer, J. Poulton, Y. Frans, S. Li, J. Wilson, M. Bucher, A. M. Fuller, J. Eyles, M. Aleksic, T. Greer, and N. M. Nguyen, "A 4.3 GB/s mobile memory interface with power-efficient bandwidth scaling," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 889–898, Apr. 2010.
- [32] F. Rahman, G. Taylor, and V. Sathe, "A 1–2 GHz computational-locking ADPLL with sub-20-cycle locktime across PVT variation," *IEEE J. Solid-State Circuits*, vol. 54, no. 9, pp. 2487–2500, Sep. 2019.
- [33] S. Höppner, S. Haenzsche, G. Ellguth, D. Walter, H. Eisenreich, and R. Schüffny, "A fast-locking ADPLL with instantaneous restart capability in 28-nm CMOS technology," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 60, no. 11, pp. 741–745, Nov. 2013.
- [34] T. Anand, M. Talegaonkar, A. Elkholy, S. Saxena, A. Elshazly, and P. K. Hanumolu, "A 7 Gb/s embedded clock transceiver for energy proportional links," *IEEE J. Solid-State Circuits*, vol. 50, no. 12, pp. 3101–3119, Dec. 2015.
- [35] C. C. Chung, W. S. Su, and C. K. Lo, "A 0.52/1 V fast lock-in ADPLL for supporting dynamic voltage and frequency scaling," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 24, no. 1, pp. 408–412, Jan. 2016.
- [36] H. H. Cheong and S. Kim, "A fast-locking all-digital PLL with triple-stage phase-shifting," *IEEE Access*, vol. 9, pp. 160224–160237, 2021.
- [37] T. Anand, A. Elshazly, M. Talegaonkar, B. Young, and P. K. Hanumolu, "A 5 Gb/s, 10 ns power-on-time, 36 μW off-state power, fast power-on transmitter for energy proportional links," *IEEE J. Solid-State Circuits*, vol. 49, no. 10, pp. 2243–2258, Oct. 2014.
- [38] D. Wei, T. Anand, G. Shu, J. E. Schutt-Ainé, and P. K. Hanumolu, "A 10-Gb/s/ch, 0.6-pJ/bit/mm power scalable rapid-ON/OFF transceiver for on-chip energy proportional interconnects," *IEEE J. Solid-State Circuits*, vol. 53, no. 3, pp. 873–883, Mar. 2018.
- [39] C.-K. Liang, R.-J. Yang, and S.-I. Liu, "An all-digital fast-locking programmable DLL-based clock generator," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 55, no. 1, pp. 361–369, Feb. 2008.
- [40] J. Kim and S. Han, "A fast-locking all-digital multiplying DLL for fractional-ratio dynamic frequency scaling," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 65, no. 3, pp. 276–280, Mar. 2018.
- [41] D. Park, S. Choi, and J. Kim, "A fast lock all-digital MDLL using a cyclic Vernier TDC for burst-mode links," *Electronics*, vol. 10, no. 2, p. 177, Jan. 2021.
- [42] B. M. Helal, M. Z. Straayer, G.-Y. Wei, and M. H. Perrott, "A highly digital MDLL-based clock multiplier that leverages a self-scrambling time-to-digital converter to achieve subpicosecond jitter performance," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 855–863, Apr. 2008.
- [43] D. Park and J. Kim, "A low-jitter 2.4 GHz all-digital MDLL with a dithering jitter reduction scheme for 256 times frequency multiplication," *IEICE Electron. Exp.*, vol. 17, no. 19, pp. 1349–2543, Sep. 2020.

**JUNGHOON JIN** (Student Member, IEEE) received the B.S. degree in electronics and electrical engineering from Hongik University, Seoul, South Korea, in 2020. He is currently pursuing the M.S. degree. His current research interests include high-speed I/O transceiver circuits, phase-locked loops (PLLs), delay-locked loops (DLLs), multiplying delay-locked loops (MDLLs), and frequency synthesizers.

**SEUNGJUN KIM** (Student Member, IEEE) received the B.S. degree in electronics and electrical engineering from Hongik University, Seoul, South Korea, in 2020. He is currently pursuing the M.S. degree. His current research interests include high-speed and low-power transceiver circuits, high-speed SerDes/CDRs, and jitter attenuating frequency multipliers.

**JONGSUN KIM** (Member, IEEE) received the Ph.D. degree in electrical engineering, in the field of integrated circuits and systems, from the University of California at Los Angeles (UCLA), Los Angeles, in 2006. He was a Postdoctoral Fellow at UCLA from 2006 to 2007. From 1994 to 2001 and from 2007 to 2008, he was with Samsung Electronics as a Senior Research Engineer in the DRAM Design Team, where he worked on the design and development of synchronous DRAMs, SGDRAMs, Rambus DRAMs, DDR3, and DDR4 DRAMs. He joined the School of Electronic and Electrical Engineering, Hongik University, in March 2008. His research interests include high-performance mixed-signal circuit and system design. His current research focuses on high-speed and low-power transceiver circuits for chip-to-chip and inter-chiplet communications, clock recovery and synchronization circuits (PLLs/DLLs/MDLLs), high-speed SerDes/CDRs, frequency synthesizers, signal integrity and power integrity, DDR/GDDR/LPDDR/HBM memories, and power-management ICs (PMICs).
