# 20-ps Resolution Clock Distribution Network for a Fast-Timing Single-Photon Detector

N. Egidos, R. Ballabriga, F. Bandi, M. Campbell, D. Gascón, S. Gómez, J. M. Fernández-Tenllado, X. Llopart, R. Manera, J. Mauricio, D. Sánchez, A. Sanmukh, and E. Santin

*Abstract*: The time resolution of active pixel sensors whose timestamp mechanism is based on time-to-digital converters is critically linked to the accuracy in the distribution of the master clock signal that latches the timestamp values across the detector. The clock distribution network (CDN) that delivers the master clock signal must compensate process-voltage-temperature variations to reduce static time errors (skew) and minimize the power supply bounce to prevent dynamic time errors (jitter). To achieve sub-100-ps time resolution within pixel detectors and thus enable a step forward in multiple imaging applications, the network latencies must be adjusted in steps well below that value. Power consumption must be kept as low as possible. In this work, a self-regulated CDN that fulfills these requirements is presented for the FastICpix single-photon detector aiming at a 65-nm process. A 40-MHz master clock is distributed to $64 \times 64$ pixels over an area of $2.4 \times 2.4 \text{ cm}^2$ using digital delay-locked loops, achieving clock leaf skew below 20 ps with a power consumption of 26 mW. Guidelines are provided to adapt the system to arbitrary chip area and pixel pitch values, yielding a versatile design with very fine time resolution.

*Index Terms*: Clock synchronization, delay-locked loop (DLL), fast timing, phase detector (PD), random jitter, skew.

Manuscript received November 25, 2020; revised January 27, 2021 and January 30, 2021; accepted February 2, 2021. Date of publication February 5, 2021; date of current version April 16, 2021. This work was supported by the ATTRACT Project funded by European Commission (EC) under Grant 777222.

N. Egidos, R. Ballabriga, M. Campbell, J. M. Fernández-Tenllado, and X. Llopart are with CERN, 1211 Meyrin, Switzerland (e-mail: nuria.egidos.plaja@cern.ch; rafael.ballabriga@cern.ch; michael.campbell@cern.ch; jose.fernandez@cern.ch; xavier.llopart@cern.ch).

F. Bandi is with IMSE-CNM, CSIC-Universidad de Sevilla, 41092 Seville, Spain (e-mail: nahuel@imse-cnm.csic.es).

D. Gascón, S. Gómez, R. Manera, J. Mauricio, D. Sánchez, and A. Sanmukh are with ICCUB, 08028 Barcelona, Spain (e-mail: dgascon@fqa.ub.edu; sgomez@fqa.ub.edu; rafelmanera@icc.ub.edu; jmauricio@fqa.ub.edu; dsanchez@fqa.ub.edu; asanmukh@fqa.ub.edu).

E. Santin was with CERN, 1211 Meyrin, Switzerland. He is now with AlpsenTek GmbH, 8050 Zürich, Switzerland (e-mail: edineisantin@gmail.com).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TNS.2021.3057581.

Digital Object Identifier 10.1109/TNS.2021.3057581

#### I. Introduction

Active pixel detectors with very fine time resolution are an attractive alternative in a wide range of fast-timing imaging systems, such as medical diagnosis with positron emission tomography (PET), molecular studies with mass spectrometry imaging (MSI), and particle tracking in high-energy physics (HEP). There is a lot of active research aimed at developing detectors with sub-100-ps time resolution, which can enable millimetric spatial resolution and real-time image processing, enhance molecule discrimination, and time tag an increasing number of particle collisions accurately, among others [1]–[5]. In the readout electronics, a time-to-digital converter (TDC) can be used per group of pixels to time stamp the particle arrival.
TDCs are dispersed across the pixel matrix and synchronized by means of a shared time reference (master clock). This signal is delivered by means of a clock distribution network (CDN). Due to process, voltage, or temperature (PVT) variations, the circuit elements that compose the CDN may have a slightly different delay in the various branches. As a result of these nonidealities, there is a static time error or skew in the actual latencies or propagation delays from the source to the TDCs. On top of this variability, the delays are also dynamically affected by perturbations in the supply voltage, voltage droop, or temperature gradients during operation, and by noise coupled mainly from the power supply as a result of the switching activity of the circuitry (also known as power supply-induced jitter or PSIJ). These effects manifest as jitter on the clock edges. Jitter can also enter the CDN superimposed on the clock source, as a result of the nonidealities of the clock generator [6].

With the goal of an accurate clock distribution, which is indispensable for a reliable timestamp, the CDN must include mechanisms to self-regulate the latencies, so as to reduce the impact of skew and jitter. In this work, such a CDN is proposed for the FastICpix chip [7], [8]. This ATTRACT phase-I-funded project consists of a reconfigurable single-photon pixel detector that can be tailored in area to different applications by means of adaptable pixel pitch and front-end signal summation, while providing a very fine single-photon time resolution (SPTR). The target SPTR (10 ps<sub>RMS</sub>) motivates a 20-ps TDC time bin. To achieve this time resolution, the latency of the CDN branches can be adjusted in steps finer than 20 ps, so that the maximum time error in the timestamp due to the CDN is $\pm 1$ TDC count. Since the CDN adapts to the chip area and pixel pitch, the concept is also suitable for other designs that pursue a comparable time resolution.
In this work, the CDN requirements and some architectural alternatives are discussed in Section II. The selected architecture is described in Section III. Guidelines are provided to scale the design to arbitrary chip area and pixel pitch values in Section IV, and the main contributions to the time errors are described in Section V. To reduce the impact of such errors, a strategy to update the CDN latencies is described in Section VI. The circuit simulated performance is summarized in Section VII, followed by a discussion on the obtained results.

## II. TOWARD A PROPOSAL OF CDN ARCHITECTURE

The CDN architecture must fulfill these conditions:

1) Adaptability to chip area (area across which the CDN spans) and pixel pitch (number of sinks or target TDCs).
2) Time error due to the CDN at each of its sinks lower than the TDC time bin (20 ps). This implies that: a) the total time error at each sink must be below 20 ps and b) the latency must be adjustable in steps finer than 20 ps.

CDNs have traditionally exploited the network symmetries to limit skew [9], [10]. However, open-loop strategies (trees, meshes, spines, etc.) become insufficient to achieve the aforementioned time errors in the envisaged large chip areas (few cm<sup>2</sup>). A solution based on free-running, mutually coupled oscillators distributed across the chip (the output of which becomes the master clock delivered to the corresponding TDC) has the potential to reduce both static and dynamic time errors to the required margin [11]. However, the associated power consumption may be a concern in the largest chip areas.

Alternatively, delay-locked loops (DLLs) can reduce skew in a wide range of areas [13]–[15]. In some solutions, a local control action is applied to compensate skew between adjacent sinks, with the DLLs embedded into an H-tree or a mesh structure [12]. This may result in area and power overhead with respect to using a controller per branch.
Besides, since the individual control actions are not synchronized, a stable latency from the clock source to the sinks cannot be guaranteed across PVT corners. As a result, it cannot be ensured that the clock will arrive distributed during one period to the different sinks, which might lead to PSIJ; and the timestamp error associated with the CDN can only be bound at a local level.

These nonidealities are prevented in the Timepix4 pixel detector [16]: the CDN branches consist of digital DLLs (dDLLs) and local clock trees to distribute a 40-MHz master clock across an area of close to 7 cm<sup>2</sup> with a skew in the order of 100 ps. Digital low-pass filtering is used to reduce the impact of jitter.

The aforementioned alternatives are benchmarked in Table I (see Appendix C for further details). The Timepix4 CDN has been selected as a starting point for this work. The complexity of scaling the CDN has been addressed by designing several dDLL flavors, as it will be seen in Section IV. Besides the low time errors, power consumption, and area overhead, this solution features a stable latency from the clock source to the sinks, which provides robustness to PVT variations in the TDC timestamp measurement, and it ensures that the clock arrival is distributed during one period, thus preventing PSIJ.

TABLE I BENCHMARK OF SEVERAL CDN CONFIGURATION ALTERNATIVES

| CDN config. | Power scaled to 40 MHz (mW/cm²) | Area (% w.r.t. chip area) | Largest skew (ps) | Ease of scalability with chip area and pixel pitch |
|---|---|---|---|---|
| dDLLs [16] | 25 | 2 | 100 | dDLL flavors, complexity of the local clock trees |
| Mut. coupled oscil. (8x8) [11] | 152 | 0.5 | 150 (600 Ω coupling resist.) | Interconnect more oscillators |
| Local deskewing [12] | - | 56 times more PDs than [16] | 13 | Interconnect more PDs and compensators |
| Grid [27] | 0.62 | | 75 | Potentially tool-automated |

#### III. FASTICPIX CDN ARCHITECTURE

Fig. 1 shows the CDN structure for the largest envisaged chip area $(2.4 \times 2.4 \text{ cm}^2)$. The clock source is an external reference; it is located at the center of the chip and distributed to the CDN branches by means of a clock tree. The branches, which consist of dDLLs, span across half the chip height and are mirrored with respect to the opposite half. In small chip areas, the clock source is located in a side periphery and the branches span across the full chip height.

Fig. 1. Sketch of the CDN structure for large chip areas.

An overview of the dDLL structure is provided on the right of Fig. 1. It consists of a phase detector (PD), a digitally controlled delay line (DCDL) whose nominal delay is 1 master clock period, and a controller ("Ctrl") that provides the bits to regulate the DCDL delay. In this figure, the DCDL includes 32 adjustable delay buffers (ADBs), highlighted in blue, half of them guiding the clock upward in a column of pixels (U0-U15), and the other half driving it downward in an adjacent column of pixels (D15-D0). The output of each ADB drives a local clock tree to deliver the clock to a group of TDCs (four TDCs in this case, although this number will depend on the pixel pitch).

The dDLL structure is shown in more detail in Fig. 2, and its principle of operation will be explained next. The PD compares the rising edge of the clock entering the DCDL (*ckin\_up*), which comes from the clock source, to the rising edge at the output of the DCDL (*ckout\_down*). A timing diagram illustrating the operation of the PD is shown in Fig. 3.
If the output edge arrives earlier than the input edge (the delay of the line is shorter than one master clock period), then the *up\_or\_downn* output is set to 1 so that the controller increases the delay of the line. In the case where the output edge arrives later than the *ckin\_up* edge, the *up\_or\_downn* output is cleared to 0 to reduce the line delay. The time resolution of the PD is $\sim$2 ADB LSB, and it changes accordingly with PVT corners. A pulse is generated at the *clk\_PD\_ready* output only if the separation between the input and output edges is larger than $\pm$1 ADB LSB, and its rising edge triggers the synchronous finite-state machine (FSM) of the controller. The PD outputs are digitally low-pass-filtered to reduce the impact of jitter. Since the delivery of the master clock is distributed during one master clock period, the power supply pull is spread out throughout the period, which prevents PSIJ.

Fig. 2. High-level diagram of the dDLL structure.

Fig. 3. Operation principle of the PD. (Timing diagrams of *ckin\_up*, *ckout\_down*, *up\_or\_downn*, and *clk\_PD\_ready* for *ckout\_down* arriving earlier or later than *ckin\_up*, with a time difference larger or smaller than ADB LSB.)

According to the *up\_or\_downn* value, the controller will update the delay of the line by changing the control bits of the ADBs until the total delay is 1 master clock period $\pm 1$ ADB LSB (lock is achieved). The adjustable delays are regulated by means of digital lines.
They are composed of a coarse section (largest LSB is 80 ps in the slow corner), which is updated simultaneously in all stages via the coarse control bits, and a fine section (largest LSB is 7 ps in the slow corner), which can be regulated independently with the fine control bits and hence provides a fine adjustment of the line delay. To regulate the fine sections individually, the controller broadcasts the fine control bits and the address to be updated; the latter is compared to the local address of each ADB and, if the comparison is successful, the value of the fine control bits is loaded to the selected stage. The delay cells were originally designed for Timepix4 and consist of full custom blocks that have been characterized with Cadence Liberate to be integrated in the digital-on-top implementation flow.

The demonstrator dDLL is implemented for a chip of $2.4 \times 2.4$ cm<sup>2</sup> ($64 \times 64$ pixels with 376-$\mu$m pixel pitch). This architecture can be applied to smaller chip sizes, for which an even better timing performance could be expected. A commercial 65-nm process will be used, with a 1.2-V voltage supply. The DCDL is composed of 32 ADBs, one per group of 4 pixels, and the master clock frequency is 40 MHz.

## IV. CDN SCALABILITY WITH CHIP AREA AND PIXEL PITCH

The presented architecture can be adapted to different chip area and pixel pitch dimensions as follows.

1) The number of DCDL stages and dDLLs increases with the chip area. To limit the number of DCDL flavors to be implemented, two situations are proposed: for small chip areas (up to 1.2 × 1.2 cm²), the master clock source is located on one side of the chip and the DCDLs span across the full chip height, while for greater chip areas, the clock source is at the center of the chip, as shown in Fig. 1.
2) The same ADB design can be used in all cases, except for the smallest chip area. In this case, the ADB introduces half the delay by reducing the coarse section contribution.
This choice is explained in the next point.
3) To reuse the ADB design for different chip areas, the master clock frequency increases for shorter DCDL lengths, so that the total delay can be adjusted to 1 period. The TDCs are based on a ring oscillator running at 2 GHz with a tap delay of 20 ps (the TDC time bin). The change in the master clock frequency will have an impact on the TDC output count. To avoid using TDCs with different measurement ranges and to limit the required ADB flavors, the variation in the master clock frequency is limited to a factor of 2 across the range of used frequencies.
4) The same PD design can be used in all cases.
5) The same controller design can be used in all cases (the ADB indexing shall be adapted to the DCDL length).

Table II compiles numeric examples of these guidelines.

Adaptation to the pixel pitch is handled at the local clock tree that starts at the output of each ADB and drives the TDCs in the corresponding group of pixels. For 376-$\mu$m pixel pitch, this clock tree drives four TDCs. For a smaller pitch, and for the same chip area, the number of sinks to be served by the local clock tree will increase by a certain factor (376 $\mu$m/new pixel pitch). The variation in power consumption associated with the different chip areas will be discussed in Section VII-C.

#### V. CDN TIME RESOLUTION

The main contributions to the dDLL time errors are the nonidealities of the DCDL and PD, as well as jitter. Section V-A introduces the time errors associated with the DCDL and the controller, while Section V-B is focused on the PD.

#### A. Time Errors in the DCDL

A different latency or propagation delay from the clock source to the output of the ADBs causes skew or time offset between sinks, which has two components: skew by design (the arrival of the master clock is distributed over a clock period along the line), which can be compensated offline; and the static time error on top of the skew by design.
The second is due to the following factors: 1) differences in the layout of the ADBs; 2) cell delay variation over PVT corners; and 3) divergence in the value of fine control bits along the line when lock is achieved, since the fine sections are regulated independently.

TABLE II GUIDELINES TO SCALE THE CDN WITH THE CHIP AREA

| Chip area (cm²) | Number of pixels (pixel pitch = 376 µm) | Number of DCDL stages | Master clock frequency (MHz) | Number of dDLLs in the CDN |
|---|---|---|---|---|
| 0.3x0.3 | 8x8 | 8<sup>a</sup> | 80<sup>c</sup> | 2 |
| 0.6x0.6 | 16x16 | 16<sup>a</sup> | 75 | 4 |
| 0.9x0.9 | 24x24 | 24<sup>a</sup> | 50 | 6 |
| 1.2x1.2 | 32x32 | 32<sup>a</sup> | 40 | 8 |
| 1.5x1.5 | 40x40 | 20<sup>b</sup> | 60 | 20 |
| 1.8x1.8 | 48x48 | 24<sup>b</sup> | 50 | 24 |
| 2.1x2.1 | 56x56 | 28<sup>b</sup> | 45 | 28 |
| 2.4x2.4 | 64x64 | 32<sup>b</sup> | 40 | 32 |

<sup>a</sup>Clock from one side of the chip (dDLL spans across full chip height).
<sup>b</sup>Clock from the center of the chip (dDLL spans across half the chip height).
<sup>c</sup>The ADB introduces half the delay it introduces in the rest of chip areas, so that the maximum spread in the range of master clock frequencies is bound to a factor 2 between the largest and the smallest frequencies.

A useful figure to understand the impact of skew is the integral nonlinearity (INL) of the DCDL when lock is achieved, which is calculated as

$$\text{INL}(k) = \sum_{i=\text{U1}}^{k} \text{DNL}(i) \tag{1}$$

$$\text{DNL}(k) = [l(k) - l(k-1)] - [l_i(k) - l_i(k-1)] \tag{2}$$

with $l$ being the actual latency, $l_i$ the ideal latency, and $k$, $i$ the indices representing the ADBs from U1 onward [17]. Note that in this work, the INL will be expressed in time units (picoseconds), and not normalized to the LSB.
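As a worked illustration of (1) and (2), consider a hypothetical line whose ideal stage-to-stage latency step is 100 ps and whose actual steps at stages $k = 1, 2, 3$ are 102, 97, and 101 ps. These numbers are assumed for the example only and are not taken from the simulated design.

$$\text{DNL}(1) = 102 - 100 = +2 \text{ ps}, \quad \text{DNL}(2) = 97 - 100 = -3 \text{ ps}, \quad \text{DNL}(3) = 101 - 100 = +1 \text{ ps}$$

$$\text{INL}(1) = +2 \text{ ps}, \quad \text{INL}(2) = 2 - 3 = -1 \text{ ps}, \quad \text{INL}(3) = 2 - 3 + 1 = 0 \text{ ps}$$

In this example the largest static time error is $\max(|\text{INL}(k)|) = 2$ ps, and the error returns to zero at the last stage because the total delay of the actual and ideal lines is the same.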
The ideal latency is obtained when all stages introduce the same delay (it represents the skew by design). Hence, the INL provides the distance between the ideal and actual latencies or, in other words, the static time error to be minimized. With this purpose, the ADBs are carefully laid out to ensure the physical symmetry between the stages that propagate the clock upward in the column of pixels (U0, U1, ...) and those that propagate it downward (... D1, D0). In addition, the controller updates the fine sections in an order that seeks to reduce the INL associated with the divergence in the fine control bit values along the line when lock is achieved. This algorithm will be explained in Section VI.

Fig. 4. Overview of the PD architecture.

Concerning dynamic time errors, the aim of this work is to provide a budget for jitter, which is modeled by adding a dynamic variation to the edges of *ckin\_up*. The half period of this signal changes as (ideal half period of the master clock + random delay), where random delay is a random magnitude with Gaussian distribution and zero mean. Different values of standard deviation of this magnitude are considered, to determine which is the largest variability for which the time error target is still met. The highest total time error must be bound to the TDC time bin

$$\max(|\text{INL}(k)|) + 3\sigma_i < 20 \text{ ps} \tag{3}$$

where $\max(|\text{INL}(k)|)$ represents the maximum of the absolute value of the INL among all stages, and $\sigma_i$ is the standard deviation of jitter. Since a Gaussian distribution is considered to model jitter, the variability is expected to be contained within three standard deviations (three-sigma rule of thumb [18]). The presence of random jitter leads to the PD behavior explained in Section V-B. This type of jitter is expected from the clock source, due to supply and temperature variations, and so on.
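The budget in (3) can be turned into a quick check. The sketch below is only illustrative: the function names are hypothetical, and the 8-ps INL used in the usage note is an assumed value, not a simulation result from this work.

```python
TDC_TIME_BIN_PS = 20.0  # TDC time bin stated in the text


def max_jitter_sigma_ps(max_abs_inl_ps, time_bin_ps=TDC_TIME_BIN_PS):
    """Exclusive upper bound on the jitter sigma allowed by
    max|INL| + 3*sigma < time bin. Returns 0.0 if the INL alone
    already consumes the whole budget."""
    margin_ps = time_bin_ps - max_abs_inl_ps
    if margin_ps <= 0:
        return 0.0
    return margin_ps / 3.0


def meets_budget(max_abs_inl_ps, sigma_ps, time_bin_ps=TDC_TIME_BIN_PS):
    """True if the strict inequality of the time error budget holds."""
    return max_abs_inl_ps + 3.0 * sigma_ps < time_bin_ps
```

For an assumed worst-case INL of 8 ps, `max_jitter_sigma_ps(8.0)` gives a bound of 4 ps, so a jitter standard deviation of 3.9 ps would meet the budget while 4 ps would not, since the inequality is strict.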
The clock lines are shielded to prevent the injection from (and to) other periodic signals, thus preventing periodic jitter.

#### B. Time Errors in the PD

To understand the origin of the PD nonidealities, an overview of its architecture (sketched in Fig. 4) will be provided first. A fully digital PD architecture has been selected, which is the most suitable for the digital-on-top approach followed for the dDLL implementation. The detection range is $\pm$ half the master clock period [19]. Standard cell flip-flops (FFs) sample the time difference between the input and output clocks of the DCDL [16], [20], [21]. Since these signals can have an arbitrary time difference depending on the delay of the DCDL and jitter, setup-and-hold time violations can occur in such FFs, which could lead to a metastable output. The propagation of a metastable signal is prevented by adding a second FF in a row, which samples the output of the first after a certain time, so that the metastable signal collapses to a stable 0 or 1 (which of the two cannot be foreseen) [22]. This yields a 2-FF synchronizer [23], denoted by B3 and B4 in Fig. 4. B3 is used to determine whether the delay of the line should increase or decrease (*up\_or\_downn\_aux* should be 1 or 0, respectively).

B1 and B2 are the same fine delay cells used to compose the fine section of the ADBs. The first introduces the smallest available delay plus ADB LSB, while the second introduces the smallest available delay. In practice, this means that an artificial offset of ADB LSB is introduced between the inputs of the 2-FF synchronizers denoted by B4. The purpose of this offset is to define the $\pm$ADB LSB target resolution window. Using the same cells as in the ADB enables tracking the variation of ADB LSB with the PVT corners.

Ideally, signal (a) in Fig. 4 will be 1 if *ckout\_down* arrives later than *ckin\_up* by a time difference larger than ADB LSB, while signal (b) in Fig. 4 will be 1 if *ckout\_down* arrives earlier than *ckin\_up* by a time difference larger than ADB LSB. These signals will be 0 if the aforementioned time differences are smaller than ADB LSB. As a result, the OR of signals (a) and (b) will be high only when the time difference between *ckin\_up* and *ckout\_down* is larger than ADB LSB (in absolute value), indicating that a pulse should be generated in *clk\_PD\_ready\_aux*. The generation of this pulse is triggered with the falling edge of the last clock to arrive, either *ckin\_up* or *ckout\_down*, which is selected with a multiplexer and some auxiliary logic in the "Generate trigger" block. This choice of polarity and clock ensures that the involved signals are stable when the trigger signal is to be selected, thus yielding a valid stimulus. One *clk\_PD\_ready* pulse should be generated per input clock pulse, as long as the time difference between *ckin\_up* and *ckout\_down* is larger than the sensitivity window of the PD.

Due to the jitter superimposed on *ckin\_up* (which is propagated to *ckout\_down*), the sampled time difference is distorted. In addition, when setup-and-hold time violations occur in the first FF of the synchronizers, its output, although having a stable value, might not have the right polarity. These two effects are reflected as ringing in *up\_or\_downn\_aux* and in the OR of signals (a) and (b), which leads to the presence of *clk\_PD\_ready\_aux* pulses when they should not be generated, or their absence when they should be generated.

Fig. 5. Impact of the low-pass filter on the PD performance.

A digital low-pass filter has been implemented to mitigate the errors in *up\_or\_downn\_aux* and *clk\_PD\_ready\_aux*. Its impact is shown in Fig. 5. The top half of the figure represents the outputs of an ideal PD.
The line delay is swept from values lower than the master clock period (*ckout\_down* arrives earlier than *ckin\_up*), for which *up\_or\_downn* is 1; toward values larger than one period (*ckout\_down* arrives later than *ckin\_up*), for which *up\_or\_downn* is 0. *clk\_PD\_ready* pulses are generated when *ckout\_down* arrives earlier (later) than *ckin\_up* by an amount larger than ADB LSB, which is labeled as S (E). S and E represent the start (−ADB LSB) and end (+ADB LSB) of the ideal sensitivity window of the PD, or the range of time differences for which no *clk\_PD\_ready* pulse is generated.

The bottom half of the figure represents the actual behavior of the PD in the presence of jitter and taking into account the setup-and-hold time violations of the FFs. Both effects are reflected in the unfiltered outputs, *up\_or\_downn\_aux* (ringing) and *clk\_PD\_ready\_aux* (generation of a pulse for small time differences or absence of a pulse for large time differences). As a result, lock cannot be achieved: the *clk\_PD\_ready\_aux* pulses trigger the controller and force a continuous change in the delay of the line, toggling between incrementing and decrementing 1 ADB LSB.

*up\_or\_downn\_aux* and *clk\_PD\_ready\_aux* are low-pass-filtered to reduce the ringing in the first and to reduce the range with wrong pulse generation (i.e., the range of time differences between points S and E), so that the sensitivity window after the filter approaches ±ADB LSB. The digital filter works as follows: if the value of *up\_or\_downn\_aux* remains stable for W consecutive *clk\_PD\_ready\_aux* pulses, one pulse is generated at *clk\_PD\_ready* and this value of *up\_or\_downn\_aux* is propagated to *up\_or\_downn*.
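A minimal behavioral sketch of this filter is given here, assuming a software model rather than the actual RTL. The class and method names are hypothetical, and two details are assumptions of the sketch: a toggle restarts the count at 1 (the new value counts as the first sample of a new window), and a fresh window starts after each output pulse.

```python
class PdLowPassFilter:
    """Behavioral model of the digital low-pass filter on the PD outputs."""

    def __init__(self, window=16):
        self.window = window      # W, depth of the filter window
        self.count = 0            # consecutive pulses with the same value
        self.last = None          # last sampled up_or_downn_aux value
        self.up_or_downn = None   # filtered output, None until first update

    def on_aux_pulse(self, up_or_downn_aux):
        """Process one clk_PD_ready_aux pulse.

        Returns True when a clk_PD_ready pulse is generated, in which case
        up_or_downn has just been updated with the stable value.
        """
        if up_or_downn_aux == self.last:
            self.count += 1
        else:
            # Value toggled: restart the window (assumed to restart at 1).
            self.last = up_or_downn_aux
            self.count = 1
        if self.count == self.window:
            self.up_or_downn = up_or_downn_aux
            # Assumption: a new window begins after each output pulse.
            self.count = 0
            self.last = None
            return True
        return False
```

With the default window, sixteen consecutive pulses carrying the same *up\_or\_downn\_aux* value produce a single output pulse, whereas an alternating (ringing) input never reaches the end of the window and leaves the outputs unchanged.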
If the value of *up\_or\_downn\_aux* toggles before completing the filter window, the count is reset and neither *up\_or\_downn* nor *clk\_PD\_ready* is updated. W (depth of the filter window) has been set to 16, the smallest value that yields the required time resolution after the filter, as it will be shown in Section VII.

The PD layout must prevent distorting the time difference between the input and output clocks of the line. On the one hand, the internal clock paths must be symmetric and, on the other hand, the parasitic load in the interface PD-DCDL must match the load of the interconnection between ADBs inside the DCDL.

Summarizing the concepts introduced in this section, the ideal time resolution of the PD is $\pm$ADB LSB, but it can be deteriorated due to the following sources of time error.

1) Node capacitance and resistance in the connection to the DCDL: the routing of *ckout\_down* and *ckin\_up* must be symmetric and introduce the same parasitics as the interconnection between the intermediate stages of the DCDL. Otherwise, an artificial offset is added to the time difference of interest.
2) Setup-and-hold window of the FFs that sample the time difference between *ckin\_up* and *ckout\_down*.
3) The jitter superimposed on *ckin\_up*, which is propagated and thus observed at *ckout\_down* as well. Jitter distorts the time difference to be measured and causes ringing in the PD outputs, which forces the unnecessary update of the controller and prevents the achievement of lock.

The impact of effect 1) can be reduced with a careful layout, and effects 2) and 3) can be mitigated by low-pass-filtering the PD outputs.

## VI. ALGORITHM TO UPDATE THE FINE CONTROL BITS TO MINIMIZE THE DCDL STATIC TIME ERROR

The controller can update the fine control bits of the ADBs individually by selecting the address of the concerned stage and sending the new value of fine control bits.
This enables the fine adjustment of the latencies in steps of ADB LSB, but it also opens the door to static time error (INL) on the intermediate stages of the DCDL. To understand the impact of the fine control bit distribution along the line on the INL, an ideal DCDL of eight stages is considered in this introduction. To achieve lock, four of the stages have their fine control bits at 0, and the other four have their fine control bits set to 1. Fig. 6 shows the DCDL INL for different distributions of the fine control bits along the line, as indicated in the subplot title. The shape of the error is relevant at this point, not its magnitude. From Fig. 6, we can conclude that:

1) The INL depends on the distribution of fine control bits along the line.
2) It is minimized when different values of fine control bits are evenly distributed [e.g., Fig. 6(c) and (f)].

Fig. 6. INL of an example DCDL of eight stages and different combinations of fine control bit values along the line.

The order in which the controller updates the ADB fine control bits until a given distribution is reached is called update sequence. It should guarantee that, every time the line delay is incremented or decremented by one ADB LSB, the new and the former fine control bit values are distributed as evenly as possible along the line. Next, an algorithm is proposed to determine an update sequence that:

1) Can be implemented with binary logic, yielding a low area, power consumption, and latency associated with the control action.
2) Ensures that only one ADB is modified between consecutive delay settings, while the rest of the stages retain the former value of fine control bits. This prevents transient fluctuations in the line delay and thus avoids switching noise and increasing the jitter to be handled by the PD.
3) Yields an INL, when lock is achieved, that is lower than the TDC time bin, 20 ps.
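The dependence of the INL on the fine control bit distribution can be reproduced with a small numerical sketch. It assumes an ideal eight-stage line in which each stage adds a base delay plus one fine LSB when its fine control bit is 1. The 100-ps base delay and the two bit patterns are hypothetical (they are not necessarily the panels of Fig. 6), and the 7-ps LSB is the largest fine LSB quoted for the slow corner. The function name is also hypothetical.

```python
from itertools import accumulate


def inl_ps(fine_bits, base_ps=100.0, lsb_ps=7.0):
    """INL in picoseconds for stages 1..N-1, following the DNL/INL
    definitions: the ideal line has equal stage delays and the same
    total delay as the actual line."""
    n = len(fine_bits)
    delays = [base_ps + bit * lsb_ps for bit in fine_bits]
    actual = list(accumulate(delays))                # l(k)
    ideal_step = sum(delays) / n
    ideal = [ideal_step * (k + 1) for k in range(n)]  # l_i(k)
    dnl = [
        (actual[k] - actual[k - 1]) - (ideal[k] - ideal[k - 1])
        for k in range(1, n)
    ]
    return list(accumulate(dnl))


clustered = inl_ps([0, 0, 0, 0, 1, 1, 1, 1])
alternating = inl_ps([0, 1, 0, 1, 0, 1, 0, 1])
worst_clustered = max(abs(x) for x in clustered)
worst_alternating = max(abs(x) for x in alternating)
```

Under these assumptions the clustered pattern accumulates an error of 1.5 LSB (10.5 ps) at mid-line, while the alternating pattern never exceeds 0.5 LSB (3.5 ps), which is consistent with the observation that an even distribution minimizes the INL.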
The algorithm is first derived in Section VI-A for a DCDL of four stages and then expanded to lines of arbitrary length in Section VI-B. The controller performs a random access to the fine control bits of one stage at a time, following the update sequence defined by this algorithm.

#### A. Updating the Fine Control Bits of a Four-Stage DCDL

This section is focused on an example DCDL composed of four stages (ADB<sub>0</sub>-ADB<sub>3</sub>), whose fine control bits can take the value 0 or 1. The aim is to define an optimal update sequence, which minimizes INL when lock is achieved.

TABLE III ALGORITHM TO UPDATE THE FINE CONTROL BITS (FOUR-STAGE DCDL)

| ADB<sub>0</sub> | ADB<sub>1</sub> | ADB<sub>2</sub> | ADB<sub>3</sub> | Update seq. | $o_1$ | $o_0$ | $b_1$ | $b_0$ |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | ADB<sub>1</sub> | 0 | 1 | 0 | 0 |
| 0 | 1 | 0 | 1 | ADB<sub>3</sub> | 1 | 1 | 0 | 1 |
| 1 | 1 | 0 | 1 | ADB<sub>0</sub> | 0 | 0 | 1 | 0 |
| 1 | 1 | 1 | 1 | ADB<sub>2</sub> | 1 | 0 | 1 | 1 |

Columns ADB<sub>0</sub>-ADB<sub>3</sub> show the evolution of the fine control bits, $o_1 o_0$ is the ordering code, and $b_1 b_0$ is the binary counter.

Initially, the controller clears the fine control bits of all stages to 0 and then proceeds to set them to 1, one stage at a time, until lock is achieved. Depending on which stage is updated first, there are four possible update sequences that pursue an even distribution of the fine control bits along the line. The four options are shown in Table IX (Appendix A). These alternatives have been expanded to the demonstrator DCDL size (32 stages) following the indications that will be provided in Section VI-B, and the resulting dDLLs have been simulated. Update sequence B yields the best performance in terms of INL (see Fig. 9 in Appendix B), so it will be used from here on as the optimal update sequence.
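The rows of Table III can be regenerated from a 2-bit counter. Reading the ordering-code and binary-counter columns of the table, the stage address follows $o_1 = b_0$, $o_0 = 1 - b_1$. The sketch below is a software illustration of that mapping under this reading; the function names are hypothetical and it is not the controller implementation.

```python
def stage_address(count):
    """Map a 2-bit counter word b1 b0 to the stage address o1 o0."""
    b1 = (count >> 1) & 1
    b0 = count & 1
    o1 = b0        # counter LSB selects the half of the line
    o0 = 1 - b1    # inverted counter MSB selects the stage within it
    return (o1 << 1) | o0


def update_sequence():
    """Order in which the four stages have their fine bit set to 1."""
    return [stage_address(count) for count in range(4)]


def fine_bit_evolution():
    """Fine control bits (ADB0..ADB3) after each update, starting from 0."""
    bits = [0, 0, 0, 0]
    rows = []
    for address in update_sequence():
        bits[address] = 1   # only one stage changes per step
        rows.append(tuple(bits))
    return rows
```

Calling `update_sequence()` returns the stage order ADB<sub>1</sub>, ADB<sub>3</sub>, ADB<sub>0</sub>, ADB<sub>2</sub>, and `fine_bit_evolution()` reproduces the four fine-bit rows of Table III, with exactly one stage modified between consecutive settings.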
Table III compiles the optimal update sequence of the four-stage DCDL and the evolution of the fine control bits along the line as the sequence is applied. The stages can be addressed by means of a 2-bit "Ordering code" depending on their location along the line. The optimal sequence can be implemented by means of a 2-bit binary ripple counter, also shown in the table for convenience. Each word of the 2-bit binary counter is translated to the stage address by means of the mapping function: $o_1 = b_0$, $o_0 = 1 - b_1$.

#### B. Updating the Fine Control Bits for Lines of Arbitrary Length

The algorithm explained in Section VI-A will first be expanded to the longest DCDL (32 stages), which can be addressed with five bits, and then generic expressions will be provided for the case of N-bit ordering codes (in the case of FastICpix, $N \in [3, 5]$ for the DCDL lengths defined in Table II). Here the fine control bits of the stages will take values 0 or 1 to simplify the algorithm, but the actual controller can replace 0 and 1 with any pair of consecutive values that can be covered by the 4-bit control words.

The 32-stage DCDL is divided into four quartiles, $Q_0$ (which comprises $ADB_0-ADB_7$) up to $Q_3$ (which comprises $ADB_{24}-ADB_{31}$). Analogous to the optimal sequence defined in Section VI-A, these quartiles will be updated starting with $Q_1$, then $Q_3$, $Q_0$, and $Q_2$. This is equivalent to applying the 2-bit ordering code defined in Table III to positions MSB $(o_4)$ and MSB-1 $(o_3)$ of the 5-bit ordering code. Inside each quartile, the eight corresponding stages are divided into subquartiles, which will also be updated following the aforementioned order.
This is equivalent to applying the 2-bit ordering code defined in Table III to positions $o_2$ and $o_1$ of the 5-bit ordering code (for every $o_4o_3$ combination). Finally, the order in which the two stages belonging to a subquartile are updated does not impact the peak of the INL, only its sign. This means that the LSB of the ordering code will be 0 for half the range and 1 for the other half, and which half comes first does not impact the resulting static time error.

In Table IV, the optimal update sequence is shown on the right of the evolution of fine control bits along the 32-stage DCDL. An even distribution of the initial and final fine control bit values is achieved in the middle of the update sequence. The 5-bit ordering code corresponding to this update sequence is shown in Table V. It can be implemented by means of a 5-bit binary ripple counter, which is also shown in the table for convenience, by applying the bit mapping shown in the rightmost column, for N=5. This mapping function can also be applied to the rest of DCDL lengths by adapting N to the number of bits required to address a particular length.

TABLE IV EVOLUTION OF FINE CONTROL BITS ALONG THE LINE AND OPTIMAL UPDATE SEQUENCE FOR THE 5-BIT ORDERING CODE

Each group lists the fine control bits of eight consecutive stages, from the lowest to the highest stage index. The first row is the initial state.

| $ADB_0$-$ADB_7$ | $ADB_8$-$ADB_{15}$ | $ADB_{16}$-$ADB_{23}$ | $ADB_{24}$-$ADB_{31}$ | Update sequence |
|---|---|---|---|---|
| 00000000 | 00000000 | 00000000 | 00000000 | |
| 00000000 | 00100000 | 00000000 | 00000000 | ADB10 |
| 00000000 | 00100000 | 00000000 | 00100000 | ADB26 |
| 00100000 | 00100000 | 00000000 | 00100000 | ADB2 |
| 00100000 | 00100000 | 00100000 | 00100000 | ADB18 |
| 00100000 | 00100010 | 00100000 | 00100000 | ADB14 |
| 00100000 | 00100010 | 00100000 | 00100010 | ADB30 |
| 00100010 | 00100010 | 00100000 | 00100010 | ADB6 |
| 00100010 | 00100010 | 00100010 | 00100010 | ADB22 |
| 00100010 | 10100010 | 00100010 | 00100010 | ADB8 |
| 00100010 | 10100010 | 00100010 | 10100010 | ADB24 |
| 10100010 | 10100010 | 00100010 | 10100010 | ADB0 |
| 10100010 | 10100010 | 10100010 | 10100010 | ADB16 |
| 10100010 | 10101010 | 10100010 | 10100010 | ADB12 |
| 10100010 | 10101010 | 10100010 | 10101010 | ADB28 |
| 10101010 | 10101010 | 10100010 | 10101010 | ADB4 |
| 10101010 | 10101010 | 10101010 | 10101010 | ADB20 |
| 10101010 | 10111010 | 10101010 | 10101010 | ADB11 |
| 10101010 | 10111010 | 10101010 | 10111010 | ADB27 |
| 10111010 | 10111010 | 10101010 | 10111010 | ADB3 |
| 10111010 | 10111010 | 10111010 | 10111010 | ADB19 |
| 10111010 | 10111011 | 10111010 | 10111010 | ADB15 |
| 10111010 | 10111011 | 10111010 | 10111011 | ADB31 |
| 10111011 | 10111011 | 10111010 | 10111011 | ADB7 |
| 10111011 | 10111011 | 10111011 | 10111011 | ADB23 |
| 10111011 | 11111011 | 10111011 | 10111011 | ADB9 |
| 10111011 | 11111011 | 10111011 | 11111011 | ADB25 |
| 11111011 | 11111011 | 10111011 | 11111011 | ADB1 |
| 11111011 | 11111011 | 11111011 | 11111011 | ADB17 |
| 11111011 | 11111111 | 11111011 | 11111011 | ADB13 |
| 11111011 | 11111111 | 11111011 | 11111111 | ADB29 |
| 11111111 | 11111111 | 11111011 | 11111111 | ADB5 |
| 11111111 | 11111111 | 11111111 | 11111111 | ADB21 |
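The mapping between the binary counter and the ordering code can be sketched in software as follows. This is only an illustrative model of the bit mapping for option B (counter bit $b_k$ drives ordering-code bit $o_{N-1-k}$, inverted when $k$ is odd), not the FSM implemented in the controller; the function names `ordering_code`, `update_sequence`, and `fine_bits_after` are hypothetical.

```python
def ordering_code(counter: int, n_bits: int) -> int:
    """Translate a binary counter word into the address of the stage to update.

    Counter bit b_k is placed at ordering-code position o_(N-1-k);
    bits with odd k are inverted (o = 1 - b), bits with even k are copied.
    """
    code = 0
    for k in range(n_bits):
        bit = (counter >> k) & 1
        if k % 2 == 1:
            bit = 1 - bit
        code |= bit << (n_bits - 1 - k)
    return code


def update_sequence(n_bits: int) -> list:
    """Stage indices in the order in which their fine control bits are updated."""
    return [ordering_code(c, n_bits) for c in range(1 << n_bits)]


def fine_bits_after(n_updates: int, n_bits: int) -> str:
    """Fine control bits along the line (stage 0 first) after n_updates steps."""
    bits = ["0"] * (1 << n_bits)
    for stage in update_sequence(n_bits)[:n_updates]:
        bits[stage] = "1"
    return "".join(bits)


print(update_sequence(2))       # [1, 3, 0, 2]
print(update_sequence(5)[:8])   # [10, 26, 2, 18, 14, 30, 6, 22]
print(fine_bits_after(16, 5))   # 10101010101010101010101010101010
```

For N=2 the sketch gives the order ADB1, ADB3, ADB0, ADB2, and for N=5 it starts with ADB10, ADB26, ADB2, ADB18; halfway through the 32-stage sequence the updated and non-updated stages alternate along the line.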
This algorithm has been implemented at the controller as a synchronous FSM, which updates the total DCDL delay following the aforementioned sequence. The benefits resulting from the algorithm action can be quantified from simulation, as will be shown in Fig. 7.

TABLE V ALGORITHM TO UPDATE THE FINE CONTROL BITS, LONGEST DCDL

| Binary counter $b_4 b_3 b_2 b_1 b_0$ | Ordering code $o_4 o_3 o_2 o_1 o_0$ | Mapping ordering code - binary counter |
|---|---|---|
| 00000 | 01010 | Odd stages: $o_{N-1-2i} = b_{2i}$, $i \in \left[0, \left\lfloor \frac{N-1}{2} \right\rfloor\right]$<br>Even stages: $o_{N-1-2i-1} = 1-b_{2i+1}$, $i \in \left[0, \left\lfloor \frac{N-2}{2} \right\rfloor\right]$<br>$\left\lfloor \frac{N-1}{2} \right\rfloor$ and $\left\lfloor \frac{N-2}{2} \right\rfloor$ stand for the integer part of these magnitudes |
| 00001 | 11010 | |
| 00010 | 00010 | |
| 00011 | 10010 | |
| 00100 | 01110 | |
| 00101 | 11110 | |
| 00110 | 00110 | |
| 00111 | 10110 | |
| 01000 | 01000 | |
| 01001 | 11000 | |
| 01010 | 00000 | |
| 01011 | 10000 | |
| 01100 | 01100 | |
| 01101 | 11100 | |
| 01110 | 00100 | |
| 01111 | 10100 | |
| 10000 | 01011 | |
| 10001 | 11011 | |
| 10010 | 00011 | |
| 10011 | 10011 | |
| 10100 | 01111 | |
| 10101 | 11111 | |
| 10110 | 00111 | |
| 10111 | 10111 | |
| 11000 | 01001 | |
| 11001 | 11001 | |
| 11010 | 00001 | |
| 11011 | 10001 | |
| 11100 | 01101 | |
| 11101 | 11101 | |
| 11110 | 00101 | |
| 11111 | 10101 | |

#### VII. TIME AND POWER PERFORMANCE OF THE DLL

The dDLL performance for a DCDL of 32 stages, 40-MHz master clock and ordering option B is presented. Three PVT corners are considered: slow (125 °C, 1.08 V, slow-slow (SS)), typical (25 °C, 1.2 V, typical-typical (TT)), and fast (-40 °C, 1.32 V, fast-fast (FF)). The following results have been obtained with a digital simulation of the postlayout netlist of the dDLL, flattened (taking into account the load effects from the interconnection of the different blocks), back-annotated (using the actual propagation delays of all cells and interconnects), with all timing checks enabled (including the setup-and-hold window limitation in the PD). Different values of the standard deviation of the jitter superimposed on $ckin\_up$, $\sigma_j$, are considered. A value change dump (VCD) file has been generated from these simulations, containing information on the switching activity of all nets in the circuit [24]. This file has been used to perform a static power analysis with Cadence Voltus [25].

Fig. 7. Total INL (in absolute value) of the DCDL when the dDLL is in lock (back-annotated simulation), for ordering option B and $\sigma_j = 3$ ps.

#### A. Time Performance

The ADB LSB and the range of adjustment of the DCDL delay are reported in Table VI.
TABLE VI TIME PERFORMANCE OF THE CDN

| Corner | ADB LSB (ps) | Min. line delay (ns) | Max. line delay (ns) | Clock cycles to lock, $\sigma_j$ = 1 ps | Clock cycles to lock, $\sigma_j$ = 2 ps | Clock cycles to lock, $\sigma_j$ = 3 ps | Clock cycles to lock, $\sigma_j$ = 4 ps |
|---|---|---|---|---|---|---|---|
| Fast | 4 | 11.14 | 26.24 | 6481 | 8389 | 10297 | 5512 |
| Typ | 5 | 16.21 | 41.04 | 11386 | 15113 | 17697 | 7105 |
| Slow | 7 | 24.64 | 67.57 | 14088 | 14182 | 14712 | 15144 |

Fig. 8. Start (S) and end (E) of the sensitivity window of the PD as a function of the standard deviation of the jitter superimposed on ckin\_up.

The latencies can be updated in steps finer than the TDC time bin (20 ps), and the master clock period (25 ns) can be accommodated in the range of available delays in all corners. The number of master clock cycles required to lock from the time when an asynchronous reset is applied is listed for different $\sigma_j$ values. The time required to lock 1) increases when $\sigma_j$ is comparable to the ADB LSB, because there is a more significant ringing in $up\_or\_downn\_aux$ and thus the counter of the PD filter is reset more often (more cycles need to be processed to generate a pulse at $clk\_PD\_ready$); and 2) depends on the corner according to the delay sweep performed by the controller: in the fast corner, the sweep relies mainly on the coarse control bits, while in the slow corner the controller sweeps mainly the fine control bits, which is a slower operation. The absolute value of the DCDL INL is represented for the different corners in Fig. 7, for $\sigma_j=3$ ps.
This result takes into account the nonidealities in the implementation of the dDLL (ADB layout imbalances, load effects in the interface PD-DCDL, etc.) and the divergence in the fine control bit values along the line. 3 ps is the largest standard deviation for which the time error target defined in (3) is met in all corners: the peak of the INL absolute value is at most 11 ps, which leaves a margin of 9 ps for jitter and other nonidealities.

TABLE VII POWER CONSUMPTION OF ONE DLL

| Corner | Power PD (μW) | Power controller (μW) | Power ADB (μW) | Power dDLL (μW) |
|---|---|---|---|---|
| Fast | 45.655 | 1.617 | 23.394 | 795.895 |
| Typ | 34.833 | 1.128 | 15.582 | 534.583 |
| Slow | 26.699 | 1.822 | 10.854 | 375.862 |

TABLE VIII ESTIMATED POWER CONSUMPTION OF THE CDN AT THE CHIP LEVEL

| Chip area (cm²) | 0.3 × 0.3 | 0.6 × 0.6 | 0.9 × 0.9 | 1.2 × 1.2 | 1.5 × 1.5 | 1.8 × 1.8 | 2.1 × 2.1 | 2.4 × 2.4 |
|---|---|---|---|---|---|---|---|---|
| $P_{\text{CDN}}$ (mW) | 0.6 | 1.7 | 3.7 | 6.4 | 10.3 | 14.6 | 19.7 | 25.5 |

#### B. Time Resolution of the PD

The time resolution of the PD is reported as the start and end of the sensitivity window (points *S* and *E* in Fig. 5), both before (*S before*, *E before*) and after the digital filter (*S after*, *E after*), to evaluate its impact on the time performance of the PD. In Fig. 8, these variables are depicted as a function of $\sigma_j$ for the three PVT corners considered. The horizontal, unbroken lines at the center represent the ideal start (*S ideal*) and end (*E ideal*) of the sensitivity window (-ADB LSB and +ADB LSB, respectively). The tilted lines in the top half of the image are the linear fit of *E before*, while the tilted lines in the bottom half of the image are the linear fit of *S before*.
*E before* and *S before* are shown with square, star, and dot markers, the trend of which is illustrated with the linear fit. When $\sigma_j = 0$ ps, the resolution window before the filter is dominated by the setup-and-hold window of the FFs that sample the time difference between the input and output clocks of the DCDL. As $\sigma_j$ increases, the resolution window before the filter is widened with a slope close to $3\sigma_j$ (as introduced in Section V-A, this is the largest expected time deviation caused by jitter). Due to the nonidealities of the PD and the jitter superimposed on the input clock, the resolution window before the filter clearly departs from $\pm$ADB LSB. The tilted lines closer to the center of the figure are the linear fit of *E after*. *S after* is not available from the performed simulations; given the symmetry between *E before* and *S before*, *S after* could be extrapolated as -*E after*. *E after* can be approximated as *E before*/4, where the reduction factor stands for the square root of the digital filter window, W=16. This is the smallest depth that yields the required sensitivity window after the filter. With this configuration, the digital filter provides a fourfold enhancement in the resolution with respect to the sensitivity window before the filter, which enables achieving the desired resolution of $\pm$ADB LSB.

#### C. Power Consumption

Table VII shows the total dDLL power consumption, including switching, leakage, and internal components. The case of the highest allowed $\sigma_j$ (3 ps) is reported. This result corresponds to a simulation in which the dDLL is reset, left running until lock is achieved, and then remains in lock for a few thousand cycles (the same number of cycles is reported for the three corners).

Table VIII shows the estimated CDN power consumption at the chip level for the different chip areas and 376-$\mu$m pixel pitch. It is calculated from the values reported in Table VII, for the worst case power consumption (fast corner) and scaling the consumption with the number of stages, number of dDLLs in the chip, and master clock frequency (according to the guidelines provided in Table II) as

$$P_{\text{CDN}} = k_{\text{dDLL}} \cdot P_{\text{dDLL}} \tag{4}$$

where $P_{\text{CDN}}$ is the estimated total power consumption of the CDN at the chip level, $k_{\text{dDLL}}$ is the number of dDLLs, and $P_{\text{dDLL}}$ is the estimated power consumption of one dDLL:

$$P_{\text{dDLL}} = k_f \cdot (P_{\text{ctrl}} + P_{\text{PD}} + k_{\text{ADB}} \cdot P_{\text{ADB}}) \tag{5}$$

1) $k_f$: scale factor related to the master clock frequency, calculated as frequency in the particular scenario (MHz)/40 MHz, since the switching power is the dominant contribution (over 90% of the power reported in Table VII, while leakage has a negligible contribution) and it scales linearly with frequency [26].
2) $P_{\text{ctrl}}$, $P_{\text{PD}}$, $P_{\text{ADB}}$: controller, PD, and ADB power consumption, respectively.
3) $k_{\text{ADB}}$: 0.5 for the smallest chip area, since in this case the ADBs introduce half the delay and thus have a smaller coarse section; 1 for the rest of scenarios.

The CDN power consumption is mainly related to the chip area. For a smaller pixel pitch, the power consumption due to the dDLL is not expected to change, since the DCDL, PD, and controller design will be the same.

#### VIII. DISCUSSION

A self-regulated CDN for the timestamp mechanism of the FastICpix chip has been presented. The selected architecture 1) can adapt to the chip area and pixel pitch and 2) is robust to static and dynamic time errors, so that the total time error in the delivery of the master clock to all target TDCs is bounded by the TDC time bin, 20 ps. The reported performance corresponds to the most challenging scenario: largest chip area; postlayout, back-annotated, flattened netlist of the dDLL. The CDN latencies can be adjusted in steps of 7 ps and the DCDL static time error is below 20 ps in all corners. In contrast with the starting point of this work, the Timepix4 CDN, the presented solution has the potential to enhance the accuracy in the master clock distribution by an order of magnitude, while providing the versatility to tailor the readout chip to the application to optimize the signal collection.

Fig. 9. Total INL (in absolute value) of the DCDL when the dDLL is in lock, for the four ordering options and different values of $\sigma_j$.

TABLE IX ALGORITHM TO UPDATE THE FINE CONTROL BITS IN THE FOUR-STAGE DCDL

| Order. option | Fine control bit $ADB_0$ | Fine control bit $ADB_1$ | Fine control bit $ADB_2$ | Fine control bit $ADB_3$ | Seq. update fine control bits | Order. code $o_1$ | Order. code $o_0$ | Binary counter $b_1$ | Binary counter $b_0$ | Map. binary counter - order. code |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 1 | 0 | 0 | 0 | $ADB_0$ | 0 | 0 | 0 | 0 | $o_1 = b_0$, $o_0 = b_1$ |
| | 1 | 0 | 1 | 0 | $ADB_2$ | 1 | 0 | 0 | 1 | |
| | 1 | 1 | 1 | 0 | $ADB_1$ | 0 | 1 | 1 | 0 | |
| | 1 | 1 | 1 | 1 | $ADB_3$ | 1 | 1 | 1 | 1 | |
| B | 0 | 1 | 0 | 0 | $ADB_1$ | 0 | 1 | 0 | 0 | $o_1 = b_0$, $o_0 = 1 - b_1$ |
| | 0 | 1 | 0 | 1 | $ADB_3$ | 1 | 1 | 0 | 1 | |
| | 1 | 1 | 0 | 1 | $ADB_0$ | 0 | 0 | 1 | 0 | |
| | 1 | 1 | 1 | 1 | $ADB_2$ | 1 | 0 | 1 | 1 | |
| C | 0 | 0 | 1 | 0 | $ADB_2$ | 1 | 0 | 0 | 0 | $o_1 = 1 - b_0$, $o_0 = b_1$ |
| | 1 | 0 | 1 | 0 | $ADB_0$ | 0 | 0 | 0 | 1 | |
| | 1 | 0 | 1 | 1 | $ADB_3$ | 1 | 1 | 1 | 0 | |
| | 1 | 1 | 1 | 1 | $ADB_1$ | 0 | 1 | 1 | 1 | |
| D | 0 | 0 | 0 | 1 | $ADB_3$ | 1 | 1 | 0 | 0 | $o_1 = 1 - b_0$, $o_0 = 1 - b_1$ |
| | 0 | 1 | 0 | 1 | $ADB_1$ | 0 | 1 | 0 | 1 | |
| | 0 | 1 | 1 | 1 | $ADB_2$ | 1 | 0 | 1 | 0 | |
| | 1 | 1 | 1 | 1 | $ADB_0$ | 0 | 0 | 1 | 1 | |
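The power scaling of (4) and (5) in Section VII-C can be sketched numerically with the fast-corner values of Table VII. Two points are assumptions made here and are not stated in the text: the ADB figure of Table VII is treated as the consumption of a single stage and multiplied by the number of stages (`n_stages`), and the number of dDLLs used in the example, 32, is a hypothetical value chosen for illustration rather than read from Table II. The function names are also hypothetical.

```python
def ddll_power_uw(p_ctrl, p_pd, p_adb_per_stage,
                  n_stages=32, f_mhz=40.0, k_adb=1.0):
    """Estimated power of one dDLL in microwatts, following the form of (5).

    Assumption: P_ADB in (5) is taken as n_stages * p_adb_per_stage.
    k_f is the master clock frequency normalized to 40 MHz.
    """
    k_f = f_mhz / 40.0
    return k_f * (p_ctrl + p_pd + k_adb * n_stages * p_adb_per_stage)


def cdn_power_mw(k_ddll, p_ddll_uw):
    """Estimated CDN power in milliwatts, following (4)."""
    return k_ddll * p_ddll_uw / 1000.0


# Fast-corner values from Table VII (microwatts).
FAST = {"p_ctrl": 1.617, "p_pd": 45.655, "p_adb_per_stage": 23.394}

p_ddll = ddll_power_uw(**FAST)
print(round(p_ddll, 2))                    # 795.88, close to the 795.895 of Table VII

# Hypothetical number of dDLLs (assumed, not taken from Table II).
print(round(cdn_power_mw(32, p_ddll), 1))  # 25.5
```

With these assumptions the single-dDLL estimate lands within a few hundredths of a microwatt of the value in Table VII, and the chip-level estimate comes out close to the 25.5 mW listed in Table VIII for the largest chip area.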
#### APPENDIX A

As introduced in Section VI-A, there are four possible update sequences for the four-stage DCDL that pursue an even distribution of the initial and final fine control bit values along the line. These update sequences are labeled as A–D in Table IX. This table compiles the evolution of the fine control bit values in every stage as the sequence is applied; the sequence itself; the encoding of such a sequence with bits $o_1$, $o_0$; and the relation between these bits and the 2-bit binary counter used to implement it.

As indicated in Section VI-B, the mini-matrix can be expanded to arbitrary DCDL lengths or, in other words, the 2-bit ordering code can be expanded to an ordering code of up to 5 bits by relating the ordering code to the binary counter, which is shown in Table X for the different ordering options. The resulting sequence in which the stages are updated is also shown, with the stages named according to the ADB nomenclature introduced in Fig. 1.

In Table XI, the mapping between the ordering code and the binary counter presented in Table X is expressed in a generic fashion for all the DCDL lengths to which the algorithm

TABLE X ORDERING CODES AND SEQUENCES FOR ALL ORDERING OPTIONS

| | 1 | T |
|------------------|--------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Ordering options | Mapping binary | Sequence to update | | | counter – ordering | fine control bits | | | ecunier cruering | | | A | $o_4 = b_0$ | $U0 \rightarrow D15 \rightarrow U8 \rightarrow D7 \rightarrow$ | | | $o_3 = b_1$ | $U4 \rightarrow D11 \rightarrow U12 \rightarrow D3 \rightarrow$ | | | | $U2 \rightarrow D13 \rightarrow U10 \rightarrow D5 \rightarrow$ | | | $o_2 = b_2$ | $U6 \rightarrow D9 \rightarrow U14 \rightarrow D1 \rightarrow$ | | | $o_1 = b_3$ | $U1 \rightarrow D14 \rightarrow U9 \rightarrow D6 \rightarrow$ | | | $o_0 = b_4$ | $U5 \rightarrow D10 \rightarrow U13 \rightarrow D2 \rightarrow$ | | | 00 04 | $U3 \rightarrow D12 \rightarrow U11 \rightarrow D4 \rightarrow$ | | | | $U7 \rightarrow D8 \rightarrow U15 \rightarrow D0$ $U10 \rightarrow D5 \rightarrow U2 \rightarrow D13 \rightarrow$ | | В | $o_4 = b_0$ | $\begin{array}{c} 0.10 \rightarrow D5 \rightarrow 0.2 \rightarrow D13 \rightarrow \\ 0.14 \rightarrow D1 \rightarrow 0.6 \rightarrow D9 \rightarrow 
\end{array}$ | | | $o_3 = 1 - b_1$ | $U8 \rightarrow D7 \rightarrow U0 \rightarrow D15 \rightarrow$ | | | $o_2 = b_2$ | $U12 \rightarrow D3 \rightarrow U4 \rightarrow D11 \rightarrow$ | | | | $0.12 \rightarrow 0.3 \rightarrow 0.4 \rightarrow 0.11 \rightarrow 0.11 \rightarrow 0.11 \rightarrow 0.4 \rightarrow 0.3 \rightarrow 0.12 \rightarrow 0.11 0.11$ | | | $o_1 = 1 - b_3$ | $U15 \rightarrow D0 \rightarrow U7 \rightarrow D8 \rightarrow$ | | | $o_0 = b_4$ | $U9 \rightarrow D6 \rightarrow U1 \rightarrow D14 \rightarrow$ | | | | $U13 \rightarrow D2 \rightarrow U5 \rightarrow D10$ | | C | $o_4 = 1 - b_0$ | $U5 \rightarrow D10 \rightarrow U13 \rightarrow D2 \rightarrow$ | | C | , , | $U1 \rightarrow D14 \rightarrow U9 \rightarrow D6 \rightarrow$ | | | $o_3 = b_1$ | $U7 \rightarrow D8 \rightarrow U15 \rightarrow D0 \rightarrow$ | | | $o_2 = 1 - b_2$ | $U3 \rightarrow D12 \rightarrow U11 \rightarrow D4 \rightarrow$ | | | $o_1 = b_3$ | $U4 \rightarrow D11 \rightarrow U12 \rightarrow D3 \rightarrow$ | | | | $U0 \rightarrow D15 \rightarrow U8 \rightarrow D7 \rightarrow$ | | | $o_0 = b_4$ | $U6 \rightarrow D9 \rightarrow U14 \rightarrow D1 \rightarrow$ | | | | $U2 \rightarrow D13 \rightarrow U10 \rightarrow D5$ | | D | $o_4 = 1 - b_0$ | $U15 \rightarrow D0 \rightarrow U7 \rightarrow D8 \rightarrow$ | | | $o_3 = 1 - b_1$ | $U11 \rightarrow D4 \rightarrow U3 \rightarrow D12 \rightarrow$ | | | 5 1 | $U13 \rightarrow D2 \rightarrow U5 \rightarrow D10 \rightarrow$ | | | $o_2 = 1 - b_2$ | $U9 \rightarrow D6 \rightarrow U1 \rightarrow D14 \rightarrow$ | | | $o_1 = 1 - b_3$ | $U14 \rightarrow D1 \rightarrow U6 \rightarrow D9 \rightarrow$ | | | $o_0 = b_4$ | $U10 \rightarrow D5 \rightarrow U2 \rightarrow D13 \rightarrow$ | | | 00 04 | $U12 \rightarrow D3 \rightarrow U4 \rightarrow D11 \rightarrow$ | | | ] | $U8 \rightarrow D7 \rightarrow U0 \rightarrow D15$ | TABLE XI ORDERING CODES FOR ALL ORDERING OPTIONS AND GENERIC DCDL LENGTH | Ordering options | Mapping binary counter – ordering 
| |------------------|------------------------------------------------------------------------------------------------| | A | $o_{N-1-2i} = b_{2i}, N \in [3,5], i \in \left[0, \left[\frac{N-1}{2}\right]\right]$ | | | $o_{N-1-2i-1} = b_{2i+1}$ , $N \in [3,5]$ , $i \in \left[0, \left[\frac{N-2}{2}\right]\right]$ | | В | $o_{N-1-2i} = b_{2i}, N \in [3,5], i \in \left[0, \left[\frac{N-1}{2}\right]\right]$ | | | $o_{N-1-2i-1} = 1 - b_{2i+1}, N \in [3,5], i \in \left[0, \left[\frac{N-2}{2}\right]\right]$ | | С | $o_{N-1-2i} = 1 - b_{2i}, N \in [3,5], i \in \left[0, \left[\frac{N-1}{2}\right]\right)$ | | | $o_0 = b_{N-1}$ | | | $o_{N-1-2i-1} = b_{2i+1}, N \in [3,5], i \in \left[0, \left[\frac{N-2}{2}\right]\right]$ | | D | $o_{N-1-2i} = 1 - b_{2i}, N \in [3,5], i \in \left[0, \left[\frac{N-1}{2}\right]\right)$ | | | $o_0 = b_{N-1}$ | | | $o_{N-1-2i-1} = 1 - b_{2i+1}, N \in [3,5], i \in \left[0, \left[\frac{N-2}{2}\right]\right]$ | will be applied. N represents the number of bits required to address a certain DCDL length; i represents the bit position in the ordering code or the binary counter; [(N-1)/2] and [(N-2)/2] represent the integer part of these magnitudes. The evolution of the fine control bits of every stage as the update sequence is applied (for the simplest scenario, when these bits can take the value 0 and 1) is shown in Tables XII–XV, for the four ordering options. The ideal scenario, in which the alternance in the update is maximized, is located in the middle of the update range. #### APPENDIX B Four flavors of dDLL with 32 DCDL stages and master clock frequency of 40 MHz have been implemented, so as TABLE XII EVOLUTION OF THE FINE CONTROL BITS AS THE UPDATE SEQUENCE IS APPLIED, ORDERING OPTION A TABLE XIII EVOLUTION OF THE FINE CONTROL BITS AS THE UPDATE SEQUENCE IS APPLIED, ORDERING OPTION B to evaluate the timing performance of the different ordering options (A–D). The INL obtained for these ordering options and different $\sigma_j$ is shown in Fig. 
9, for the same simulation conditions indicated in Section VII.

#### APPENDIX C

Table XVI expands Table I to clarify the criteria chosen to benchmark the different CDN configuration alternatives introduced in Section II, as well as to explain the performance reported for each of the alternatives.

TABLE XIV EVOLUTION OF THE FINE CONTROL BITS AS THE UPDATE SEQUENCE IS APPLIED, ORDERING OPTION C

TABLE XV EVOLUTION OF THE FINE CONTROL BITS AS THE UPDATE SEQUENCE IS APPLIED, ORDERING OPTION D

Four alternatives are benchmarked:

- 1) The Timepix4 CDN [16], in which the branches consist of dDLLs that span across half the chip height.
- 2) A solution based on mutually coupled oscillators [11]. The results shown here correspond to a matrix of $8 \times 8$ oscillators interconnected with a 600-Ω coupling resistance.
- 3) A solution based on a mesh, in which the nodes implement a local deskew action based on a PD and a compensator or adjustable delay [12].
- 4) The CDN for the Alpha 21264 microprocessor, which features hierarchical grid levels [27].

TABLE XVI BENCHMARK OF SEVERAL CDN CONFIGURATION ALTERNATIVES

| CDN configuration | Process node (nm) | Clock frequency | CDN power consumption (mW/cm²) | CDN power consumption scaled to 40 MHz (mW/cm²) | Chip area | CDN area (% with respect to chip area) | Largest skew (ps) | Ease of scalability with chip area and pixel pitch |
|---|---|---|---|---|---|---|---|---|
| dDLLs [16] | 65 | 40 MHz | 25 | 25 | 7 cm² | 2 | 100 | Design multiple dDLL flavors, change the complexity of the local clock trees |
| Mutually coupled oscillators (8 × 8 matrix) [11] | 65 | 500 MHz | $1.9 \times 10^{3}$ | 152 | 1.69 mm² | 0.5 (area of oscillators) | 150 (600-Ω coupling resistance) | Interconnecting more oscillators, no redesign required |
| Local deskewing [12] | 130 | GHz | - | - | - | 56 times more PDs than [16] | 13 | Interconnecting more PDs and compensators |
| Grid [27] | 350 | 600 MHz | $9.3 \times 10^{3}$ (CDN accounts for ~40% of total consumption) | 0.62 | 3.1 cm² | - | 75 | Potentially (at least partially) tool-automated |

The metrics used to perform the benchmark are the following.

#### A. CDN Power Consumption

The cited works report the total power consumption of the network at the clock frequency of operation, which differs among the considered solutions. Since the dynamic or switching component is usually dominant and scales linearly with frequency [26], the reported power is scaled to 40 MHz so as to compare all options in the scenario of interest for this work. The values are further normalized to the chip area and expressed in mW/cm² for a more meaningful comparison. The power consumption values used to perform the benchmark are listed in the "CDN power consumption scaled to 40 MHz (mW/cm²)" column. It can be seen that the microprocessor approach is the most power efficient, while the solution based on coupled oscillators is the most power hungry, which can be a concern for the largest areas envisaged.

#### B. CDN Area

The area overhead associated with the network components is expressed as a percentage of the total die area in the "CDN area (% with respect to chip area)" column. For [16], the CDN area includes the area of all PDs, controllers, and ADBs. For [11], the oscillator's area is considered, which stands for about one-fourth of the TDC area.
The remaining references do not provide the area associated with the CDN components, but the following extrapolation can be applied to relate [12] to the present work: in the selected configuration, the worst ratio between the number of ADBs and PDs (i.e., the situation in which more PDs are required) is one PD per eight ADBs, and it occurs for the smallest chip area reported in Table II. In [12], 28 PDs are used for 16 compensators (adjustable delays) or, alternatively, 10 PDs would be required for eight compensators, which implies a significant component overhead compared to the selected configuration.

#### C. Largest Static Time Error in the Network or Worst Skew

The largest skew achieved by the different solutions is listed under the "Largest skew (ps)" column. In [16], it corresponds to a distance comparable to half the chip height, which is the area across which each dDLL spans. In the rest of the cases, the worst skew occurs for sinks separated by the full chip height. The dDLL solution, which is selected for this work, presents a skew comparable to the other alternatives, if not better, for a similar physical separation of the sinks. Yet, it must be mentioned that [11] reports a reduction of $10 \log_{10}(\text{number of oscillators})$ dB in the phase noise or jitter, while the rest of the configurations do not reduce the jitter present in the clock delivered to the sinks. The excellent skew reported in [12] cannot be directly compared to the other results, due to the lack of area information. On top of a low skew, the dDLL solution offers a major advantage, which is key for a pixel detector: it can guarantee a stable value (with a bounded static time error) of the latency from the clock source to the sinks across PVT variations. All solutions can guarantee the relative latency, that is, a low skew, between the sinks, but only a dDLL-based solution can offer a stable propagation delay from the clock source to the sinks regardless of the corner.
A low skew between sinks translates into a low time error among the measurements of various TDCs for a particular corner, while a stable propagation latency across the corners translates into a low time error on the measurement provided by a particular TDC when PVT variations occur.

#### D. Ease of Scalability With the Chip Area and Pixel Pitch

The dDLL configuration might be the most complex to scale with the chip area and pixel pitch. In the rest of the scenarios, the network can be expanded by adding more nodes (more oscillators in [11], more PDs and compensators in [12], more buffers in [27]). To scale a solution based on dDLLs, however, different flavors are required to adapt to different areas, which means that some of the components ought to be redesigned; in addition, the local clock tree that starts at the output of each ADB has to increase in complexity to adapt to smaller pixel pitch values.

Despite the larger complexity of scaling the network with the chip area and pixel pitch, a solution based on dDLLs is preferred for this work because it offers a suitable tradeoff among the achievable skew, the power consumption, and the area overhead associated with the network components, and because it provides a stable latency from the clock source to the TDCs across PVT variations. The Timepix4 CDN, in which the branches are composed of dDLLs, is considered as the starting point of this work.

#### REFERENCES

- [1] J. H. Jungmann and R. M. A. Heeren, "Detection systems for mass spectrometry imaging: A perspective on novel developments with a focus on active pixel detectors," *Rapid Commun. Mass Spectrometry*, vol. 27, no. 1, pp. 1–23, Jan. 2013, doi: 10.1002/rcm.6418.
- [2] E. Berg and S. R. Cherry, "Innovations in instrumentation for positron emission tomography," *Seminars Nucl. Med.*, vol. 48, no. 4, pp. 311–331, Jul. 2018.
- [3] P. Lecoq, "Pushing the limits in time-of-flight PET imaging," *IEEE Trans. Radiat. Plasma Med. Sci.*, vol. 1, no. 6, pp. 473–485, Nov. 2017, doi: 10.1109/TRPMS.2017.2756674.
- [4] The 10 ps Challenge: A Step Towards Reconstruction-Less TOF-PET. Accessed: Aug. 10, 2020. [Online]. Available: https://the10pschallenge.org
- [5] B. Schmidt, "The high-luminosity upgrade of the LHC: Physics and technology challenges for the accelerator and the experiments," *J. Phys., Conf. Ser.*, vol. 706, Apr. 2016, Art. no. 022002, doi: 10.1088/1742-6596/706/2/022002.
- [6] T. Xanthopoulos, *Clocking in Modern VLSI Systems*. Boston, MA, USA: Springer, 2009.
- [7] D. Gascón. (2019). Integrated Signal Processing for a New Generation of Active Hybrid Single Photon Sensors With ps Time Resolution (FastICpix). [Online]. Available: https://attract-eu.com/selected-projects/integrated-signal-processing-for-a-new-generation-of-active-hybrid-single-photon-sensors-with-ps-time-resolution-fasticpix/
- [8] N. Egidos et al., "Self-regulated clock distribution network for a fast-timing active hybrid single photon detector," in *Proc. IEEE Nucl. Sci. Symp. Med. Imag. Conf. (NSS/MIC)*, Nov. 2020.
- [9] E. G. Friedman, "Clock distribution networks in synchronous digital integrated circuits," *Proc. IEEE*, vol. 89, no. 5, pp. 665–692, May 2001, doi: 10.1109/5.929649.
- [10] V. Oklobdzija, V. Stojanovic, and D. Markovic, *Digital System Clocking*. Hoboken, NJ, USA: Wiley, 2003.
- [11] A. Ximenes, P. Padmanabhan, and E. Charbon, "Mutually coupled time-to-digital converters (TDCs) for direct Time-of-Flight (dTOF) image sensors," *Sensors*, vol. 18, no. 10, p. 3413, Oct. 2018, doi: 10.3390/s18103413.
- [12] C. E. Dike, N. A. Kurd, P. Patra, and J. Barkatullah, "A design for digital, dynamic clock deskew," in *Symp. VLSI Circuits. Dig. Tech. Papers*, Jun. 2003, pp. 21–24.
- [13] Y.-B. Kim, "Signal de-skewing using programmable dual delay-locked loop," U.S. Patent 5 880 612 A, Mar. 9, 1999.
- [14] R. L. Aguiar and D. M. Santos, "Wide-area clock distribution using controlled delay lines," in *Proc. IEEE Int. Conf. Electron., Circuits Syst., Surfing Waves Sci. Technol.*, Sep. 1998, pp. 63–66, doi: 10.1109/ICECS.1998.814825.
- [15] R.-J. Yang and S.-I. Liu, "A 2.5 GHz all-digital delay-locked loop in 0.13 μm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 42, no. 11, pp. 2338–2347, Nov. 2007, doi: 10.1109/JSSC.2007.906183.
- [16] X. Llopart et al., "Study of low power front-ends for hybrid pixel detectors with sub-ns time tagging," *J. Instrum.*, vol. 14, no. 1, Jan. 2019, Art. no. C01024, doi: 10.1088/1748-0221/14/01/C01024.
- [17] S. Henzler, *Time-to-Digital Converters*. Dordrecht, The Netherlands: Springer, 2010, doi: 10.1007/978-90-481-8628-0.
- [18] G. Upton and I. Cook, *A Dictionary of Statistics*, 2nd ed. London, U.K.: Oxford Univ. Press, 2008.
- [19] Y. Moon, J. Choi, K. Lee, D.-K. Jeong, and M.-K. Kim, "An all-analog multiphase delay-locked loop using a replica delay line for wide-range operation and low-jitter performance," *IEEE J. Solid-State Circuits*, vol. 35, no. 3, pp. 377–384, Mar. 2000.
- [20] S. T. Ghasemi and A. Baradaranrezaeii, "A novel high speed, low power, and symmetrical phase frequency detector with zero blind zone and π phase difference detection ability," *Circuits, Syst., Signal Process.*, vol. 39, no. 6, pp. 2880–2899, Jun. 2020, doi: 10.1007/s00034-019-01312-w.
- [21] H. Lad Kirankumar, S. Rekha, and T. Laxminidhi, "A dead-zone-free zero blind-zone high-speed phase frequency detector for charge-pump PLL," *Circuits, Syst., Signal Process.*, vol. 39, no. 8, pp. 3819–3832, Aug. 2020, doi: 10.1007/s00034-020-01366-1.
- [22] D. Chen et al., "A comprehensive approach to modeling, characterizing and optimizing for metastability in FPGAs," in *Proc. 18th Annu. ACM/SIGDA Int. Symp. Field Program. Gate Arrays FPGA*, 2010, pp. 167–176.
- [23] S. L. Harris and D. M. Harris, "Sequential logic design," in *Digital Design and Computer Architecture*, 2nd ed. Amsterdam, The Netherlands: Elsevier, 2016, pp. 108–171.
- [24] S. K. Nithin, G. Shanmugam, and S. Chandrasekar, "Dynamic voltage (IR) drop analysis and design closure: Issues and challenges," in *Proc. 11th Int. Symp. Qual. Electron. Design (ISQED)*, Mar. 2010, pp. 611–617, doi: 10.1109/ISQED.2010.5450515.
- [25] Cadence. Voltus IC Power Integrity Solution. Accessed: Sep. 24, 2020. [Online]. Available: https://www.cadence.com/en_US/home/tools/digital-design-and-signoff/silicon-signoff/voltus-ic-power-integrity-solution.html
- [26] J. Rabaey, *Digital Integrated Circuits: A Design Perspective*, 2nd ed. Upper Saddle River, NJ, USA: Prentice-Hall, 2002.
- [27] P. E. Gronowski, W. J. Bowhill, R. P. Preston, M. K. Gowan, and R. L. Allmon, "High-performance microprocessor design," *High-Perform. Syst. Des. Circuits Log.*, vol. 33, no. 5, pp. 395–404, 1998, doi: 10.1109/9780470544846.ch4.