# A Low Nonlinearity, Missing-Code Free Time-to-Digital Converter Based on 28-nm FPGAs With Embedded Bin-Width Calibrations

Haochang Chen, Yongliang Zhang, and David Day-Uei Li

Abstract—This paper presents a low-nonlinearity, missing-code-free time-to-digital converter (TDC) implemented in a 28-nm field-programmable gate array (FPGA) device (Xilinx Virtex 7 XC7V690T) with novel direct bin-width calibrations. We combine tuned tapped delay lines (TDLs) and a modified direct-histogram architecture to correct the nonuniformity originating from carry chains, and use a multiphase sampling structure to minimize the skews of clock routes. Results of code density tests show that the proposed TDC has much better linearity than previously published TDCs. Moreover, our TDC does not generate missing codes. For a single TDL, the differential nonlinearity (DNL) is within [-0.38, 0.87] LSB (the least significant bit: 10.5 ps) with $\sigma_{\rm DNL} = 0.20$ LSB, and the integral nonlinearity (INL) is within [-1.23, 1.02] LSB with $\sigma_{\rm INL} = 0.50$ LSB. Based on the modified direct-histogram architecture, a direct bin-width calibration method was implemented and verified in the FPGA. With embedded bin-width calibrations, the histogram data of TDCs can be calibrated on the fly. After the calibration, $DNL_{\rm pk-pk}$ (peak-to-peak DNL) and $INL_{\rm pk-pk}$ (peak-to-peak INL) are reduced to 0.08 LSB with $\sigma_{\rm DNL} = 0.01$ LSB and 0.13 LSB with $\sigma_{\rm INL} = 0.02$ LSB, respectively.

Index Terms—Carry chains, code bin width, field-programmable gate arrays (FPGAs), multiphase clock, time-of-flight, time-to-digital converters (TDCs).

#### I. INTRODUCTION

Time-to-digital converters (TDCs) are required in many time-resolved applications due to their excellent timing resolution; they have been widely applied in space sciences [1]–[3], medical diagnosis and imaging [4]–[14], nuclear physics [15], [16], quantum communications [17]–[19], and time-of-flight detection [20]–[23]. TDCs are essentially high-precision (several picoseconds) stopwatches that are capable of time-tagging fast events and generating corresponding digital codes. For example, TDCs in time-correlated single photon counting instruments [24], [25] generate picosecond timestamps for photon events in fluorescence lifetime imaging microscopy (FLIM), fluorescence spectroscopy [9]–[13], or time-resolved luminescence experiments for characterizing solid-state materials [26].

Manuscript received October 11, 2016; accepted November 10, 2016. Date of publication April 12, 2017; date of current version June 7, 2017. This work was supported in part by the Royal Society under Grant IE140915, in part by the Engineering and Physical Sciences Research Council under Grant EPSRC: EP/M506643/1, and in part by the China Scholarship Council. The Associate Editor coordinating the review process was Dr. Niclas Bjorsell.

H. Chen and D. D.-U. Li are with the Centre for Biophotonics, Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow G4 0RE, U.K. (e-mail: haochang.chen@strath.ac.uk; david.li@strath.ac.uk).

Y. Zhang is with the Centre for Biophotonics, Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow G4 0RE, U.K., and also with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310007, China (e-mail: yongliang.zhang@strath.ac.uk).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIM.2017.2663498

With rapid advances in CMOS and digital technologies, TDCs can be implemented in application-specific integrated circuits (ASICs) [14], [20], [27]–[34] or field-programmable gate arrays (FPGAs) [35]–[59] to achieve subnanosecond resolution. Compared with FPGA-based TDCs, ASIC-based solutions usually have better precision and linearity [29], [30]. However, they are more expensive and time-consuming to develop, and are usually more suitable for large-scale commercial products. On the other hand, FPGA TDCs provide greater flexibility with a shorter development cycle for prototyping and verification. FPGAs are reprogrammable, easy to access (low cost), and promising for product development. Furthermore, recent advances in FPGAs have allowed tapped delay line (TDL) TDCs to achieve a resolution of less than 20 ps [45], [46].

The simplest digital TDCs can be implemented by clock-driven counters, but the time resolution of this type of TDC is limited by the clock frequency [29], [36]. To achieve a better resolution, vernier delay line (VDL) [35], [43] and TDL methods [35], [37], [47] have been widely used. Furthermore, coarse and fine code methods and interpolation methods [43], [50] have been proposed to achieve a larger measurement range with higher precision. In addition, cyclic pulse-shrinking [39] and dynamic reconfiguration methods [41] were proposed to explore different FPGA-TDC architectures. Over the past few years, the TDL has become a mainstream method for FPGA-TDC implementations [35], [46], [47], [51]–[58]. TDLs can be easily built using carry chains in most FPGA devices [37]. In a TDL, signals ("hits") with transitions (0-1 or 1-0) propagate along a carry chain, and they are sampled and registered by a clock at each tap.

Time intervals between the signal transitions and the rising edge of a sampling clock can be estimated through the registered codes. Thus, the resolution of a TDL-TDC is determined by the propagation speed of carry chains. In 1997, an FPGA-based VDL-TDC was reported [35] achieving 200 ps resolution. In 2006, Xilinx (Virtex-II) and Altera (ACEX 1K) FPGAs were used to achieve 69.4 and 112.5 ps, respectively [51]. Favi and Charbon implemented a 17-ps TDC in a 65-nm CMOS FPGA (Xilinx Virtex 5) in 2009 [47]. Fishburn et al. [46] used a 40-nm CMOS FPGA (Xilinx Virtex 6) to achieve 10 ps. For a raw TDL-TDC, the manufacturing process of the FPGA is the main factor that determines the resolution. Several methods have been presented to break this process-related limitation and achieve a better resolution, such as the wave union (WU) method [40], [53], [60] and the multichain averaging method [49], [52]. WU approaches, however, require extra binary converters and data processing units, whereas the multichain averaging method needs multichannel TDCs to serve one channel.

The dead time is the shortest time interval between the end of a measurement and the start of the next one [36], and it has been studied by various research groups [38], [54], [55], [58]. To reduce the dead time and increase the conversion rate, multichannel TDCs [40], [46], [57], [61] are commonly applied. Dutton *et al.* [54] proposed a multiple-event TDC using a new direct-histogram architecture in a single TDL, which allows multiple events to be captured in each sampling period and reduces the dead time to less than 500 ps in Virtex 5 FPGA devices.

The nonlinearity of TDCs is usually quantified by the differential nonlinearity (DNL) and integral nonlinearity (INL) and evaluated by statistical code density tests [36], [38]. A code density test feeds the TDC with a large number of hits that are random in time, so that the number of hits collected in a bin is proportional to the width of that code bin [56]. ASIC-based TDCs can achieve better linearity (DNL < 1 least significant bit (LSB) and INL < 1 LSB) by optimizing circuit design and layouts [29], [38]. The nonlinearity of FPGA-based TDCs is usually worse than that of ASIC-based solutions, due to the skews of sampling clocks and the nonuniformity of carry chains [36], [47], [58]. Won et al. [56] proposed a dual-phase method to reduce the nonlinearity originating from clock routes by placing two parallel TDLs in the central area of an FPGA. Downsampling or decimation methods can be used to reduce the nonuniformity of carry chains, but these methods also degrade the resolution [47], [61]. Wang and Liu proposed a way to reduce the nonuniformity of carry chains and the number of missing codes by realigning and reorganizing the output codes of TDLs [58]. In 2016, Won and Lee reported a tuned-TDL method and implemented it in Xilinx Kintex 7, Virtex 6, and Spartan 6 FPGAs [55]. Their work shows improvements in linearity and bin-width distributions, but it is unable to remove missing codes completely.
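The proportionality between hit counts and code bin widths can be illustrated with a small simulation. The following is a minimal sketch, not the authors' test setup: the bin widths, the number of hits, and the function names `simulate_code_density` and `estimate_widths` are hypothetical and chosen only for illustration.

```python
import bisect
import random

def simulate_code_density(bin_widths_ps, n_hits, seed=0):
    """Count uniformly distributed hits per bin of a delay line with given bin widths."""
    rng = random.Random(seed)
    # Upper edge of each bin along the delay line.
    edges = []
    total = 0.0
    for w in bin_widths_ps:
        total += w
        edges.append(total)
    counts = [0] * len(bin_widths_ps)
    for _ in range(n_hits):
        t = rng.random() * total            # hit time, uniform over the whole line
        k = bisect.bisect_right(edges, t)   # first bin whose upper edge exceeds t
        counts[min(k, len(counts) - 1)] += 1
    return counts

def estimate_widths(counts, total_ps):
    """Estimate each bin width as its share of the hits times the total delay."""
    n = sum(counts)
    return [c / n * total_ps for c in counts]

# Hypothetical, deliberately nonuniform bin widths in picoseconds.
widths = [10.5, 4.0, 21.0, 6.5]
counts = simulate_code_density(widths, n_hits=200_000)
estimated = estimate_widths(counts, sum(widths))
```

With enough hits, `estimated` approaches `widths`, which is why a code density histogram can be read directly as a bin-width measurement.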

Calibrations for process, voltage, and temperature variations and nonlinearities are necessary in FPGA-TDCs [59], [60]. Static nonlinearities caused by the nonuniformity of TDLs and clock distributions are commonly calibrated by applying on-line or off-line bin-by-bin calibration [37], [59].

In this paper, our contributions, made for the first time, are as follows.

- To combine the tuned-TDL [55] and the modified direct-histogram architectures [54] to enhance the linearity of carry chains, completely remove missing codes, and significantly suppress the number of very-narrow (code bin width <0.33 LSB) and very-wide (code bin width >2 LSB) bins that appear in almost all FPGA-TDCs (see Section III);
- To implement a multiphase sampling approach inspired by dual-phase TDL-TDCs [56] to minimize the clock skews (and therefore enhance the linearity) and to lower the requirements for clock frequencies (from 476 to 159 MHz for Virtex 7 FPGAs);
- To propose innovative on-line bin-width calibrations without any additional processing time by using hardware-friendly weighted addends and bit-shifting operations;
- To implement the proposed TDC in 28-nm Virtex 7 FPGAs.

Comparisons with previously reported TDCs are summarized in Table I. The proposed TDC clearly shows excellent linearity performance.

#### II. DESIGN AND ARCHITECTURE

We implemented the tuned-direct-histogram TDC in a 28-nm Xilinx Virtex-7 FPGA. The proposed TDC can be used to interface with CMOS single-photon avalanche diodes (SPADs) for ranging, FLIM, or positron emission tomography applications [7]. The output signals of CMOS SPADs [25] are compatible with the proposed TDC, requiring only very simple front-end circuitry to convert the SPAD signals into digital ones. The dead time of a SPAD can range from several to tens of nanoseconds. Benefiting from a short dead time (<500 ps), the proposed TDC can serve multichannel SPADs for high-speed time-resolved spectroscopy applications [7].

The architecture of the TDC is shown in Fig. 1. The 'start' port of the tuned-TDL is buffered by a hit-signal-driven inverter. The TDL is based on cascaded carry chain modules (called CARRY4 in Xilinx FPGAs) with modified outputs. The states along the TDL are registered by D flip-flops as thermometer codes (1111000... or 0000111...), which are converted to one-hot codes (0001000...) by an XOR-based edge detector to indicate the position of the transition. We applied the direct-histogram architecture; however, to enable the bin-width calibration, each bit of the one-hot code drives a synchronous counter instead of the ripple counter used in Dutton's original design [54]. The diagram in Fig. 1(b) shows that each synchronous counter has multiple count registers (selected according to the coarse code) to extend the measurement range. With this arrangement, the counters can also be employed for the novel bin-width calibrations detailed in Sections II-D and III-E. In order to cover the sampling clock period and reduce the length of the TDL at the same time, we propose a multiphase sampling architecture [Fig. 1(c)], extended from the dual-phase method [56]. The histogram data are stored in the counter registers, buffered in block RAM, and transferred to a PC via an on-board universal asynchronous receiver/transmitter (UART) module.

TABLE I
COMPARISON OF DIFFERENT FPGA TDCS

| Authors | Methods | Year | Device | Resolution (ps) | RMS uncertainty (ps) | DNL, DNL<sub>pk-pk</sub> (LSB) | INL, INL<sub>pk-pk</sub> (LSB) |
|---|---|---|---|---|---|---|---|
| J. Kalisz [35] | VDL | 1997 | QuickLogic | 200.0 | N/S | [-0.50, 0.50], 1.00 | [-0.20, 1.40], 1.60 |
| R. Szplet [43] | Time-coding delay line | 2000 | QuickLogic | 100.0 | 70.00 | N/S | N/S |
| J. Wu [37] | TDL | 2003 | ACEX 1K | 400.0 | 130.00 | N/S | N/S |
| J. Song [51] | TDL | 2006 | ACEX 1K | 65.0 | 129.40 | [-0.40, 0.80], 1.20 | [-0.60, 0.70], 1.30 |
|  |  |  | Virtex-II | 46.2 | 93.00 | [-1.00, 1.10], 2.10 | [-2.00, 1.90], 3.90 |
| J. Wu [53] | WU-A, TDL | 2008 | Cyclone-II | 30.0 | 25.00 | N/S | N/S |
|  | WU-B, TDL |  |  | N/S | 10.00 | N/S | N/S |
| A. Amiri [48] | Vernier matrix | 2009 | Spartan-3 | 75.0 | 300.00 | [-1.00, 2.50], 3.50 | [-2.50, 3.00], 5.30 |
| C. Favi [47] | TDL, Turbo Mode | 2009 | Virtex-5 | 17.0 | 24.20 | [-1.00, 3.55], 4.55 | [-3.00, 2.58], 5.58 |
| J. Wang [59] | Fledged TDC | 2010 | Virtex 4 | 50.0 | 25.00 | [-0.40, 1.40], 1.80 | [-1.30, 1.70], 3.00 |
| R. Szplet [39] | Pulse-shrinking | 2010 | Spartan-3 | 42.0 | 56.00 | N/S, 0.98 | N/S, 4.17 |
| M. Daigneault [41] | Dynamic reconfiguration | 2011 | Virtex-II Pro | 50.0 | 35.00 | [-0.80, 1.90], 2.70 | [-2.20, 1.60], 3.80 |
| M. Fishburn [46] | TDL | 2013 | Virtex 6 | 9.8 | 19.60 | [-1.00, 1.50], 2.50 | [-2.25, 1.61], 3.86 |
| N. Dutton [54] | TDL, direct histogram | 2014 | Virtex 5 | 16.3 | N/S | [-0.90, 3.00], 3.90 | [-1.50, 5.00], 6.50 |
| Q. Shen [52] | TDL, multichain averaging | 2015 | Virtex-6 | 1.5<br>(M=16) | 4.2<br>(M=16) | [-0.70, 0.80], 1.50<br>(M=8, LSB=24 ps) | [-1.00, 0.70], 1.70<br>(M=8, LSB=24 ps) |
| Y. Wang [58] | Bin realignment,<br>decimation, TDL | 2015 | Kintex-7 | 17.6 | 12.70 | [-1.00, 0.84], 1.84 <sup>a</sup> | [-0.81, 0.87], 1.68 <sup>a</sup> |
| J. Won [56] | Dual-phase, TDL | 2016 | Virtex 6 | 10.0 | 11.03 | [-1.00, 1.91], 2.91 | [-2.20, 3.93], 6.13 |
|  |  |  | Kintex-7 | 10.6 | 8.13 | [-1.00, 1.45], 2.45 | [-1.23, 4.30], 5.53 |
| J. Won [55] | Tuned-TDL | 2016 | Virtex-6 | 10.1 | 9.82 | [-1.00, 1.18], 2.18 | [-3.03, 2.46], 5.49 |
|  |  |  | Spartan-6 | 16.7 | 12.75 | [-1.00, 1.22], 2.22 | [-0.70, 2.54], 3.24 |
| This work | Tuned-TDL, direct-histogram, multiple-phase | 2017 | Virtex-7 | 10.5 | 5.11 | [-0.38, 0.87], 1.25 | [-1.23, 1.02], 2.25 |
|  | Bin-width calibration | 2017 | Virtex-7 | 10.5 | 4.42 | [-0.04, 0.04], 0.08 | [-0.09, 0.04], 0.13 |

N/S: not specified; TDL: tapped delay line; WU-A: wave union A (finite step response); WU-B: wave union B (infinite step response). <sup>a</sup> Estimated from figures in the paper.

#### A. CARRY4 Nonlinearity and Tuned-TDL

As Fig. 1 shows, each CARRY4 module includes four cascaded carry elements, each providing a direct output and an XORed output (labeled as 'C' and 'S' ports, respectively). The first and last elements in a CARRY4 have 'cin' and 'cout' ports, respectively, for connecting with adjacent CARRY4s. CARRY4 modules provide fast propagation, but the large nonuniformity between delay taps leads to poor linearity. Most previously published FPGA-TDCs used the four C-type ports ('CCCC'). The sampling patterns of delay elements were described in Won's recent work [55], and large nonuniformity can be observed when certain patterns of output ports are used (such as 'CCCC' or 'SSSS'). The principle and theoretical justification are given in [55], which suggests that C- and S-type outputs should be used alternately as 'SCSC' to obtain better performance (for Xilinx Kintex-7 and Virtex-6 devices). We also tested different patterns, and the 'SCSC' pattern likewise provides the best performance for our Xilinx Virtex-7 device (XC7V690T).

#### B. Missing Codes and Modified Direct-Histogram Architecture

Due to the nonuniformity of carry chains in a TDL, "bubbles" (1110100... or 00010111...) are generated in thermometer codes after sampling. Traditional TDL-TDCs convert one-hot codes to binary codes, and bubbles must be removed by bubble-proof circuits [47], [60]. However, after the bubble removal, missing codes (DNL  $\leq$  -0.9) appear because the corresponding taps are not able to detect enough hit events [58]. Some research groups proposed the bin realignment and tuned-TDL methods to reduce missing codes, but these methods are not able to remove missing codes completely.

Fig. 1. (a) Block diagram of the tuned-TDL with the direct histogram. (b) Block diagram of a histogram counter. (c) Triple-phase sampling architecture.

The direct-histogram architecture [54] used in this paper does not convert thermometer codes to binary codes, and the bubbles are counted into the histogram on purpose to remove the missing codes. When bubbles appear in thermometer codes, multihot codes (e.g., 0011100) are generated by the XOR edge detector, and the missing codes are compensated and filled up by the multihot codes. Dutton's direct-histogram design does not have missing codes; however, its linearity is not satisfactory. The proposed TDC combines the direct-histogram architecture with the tuned-TDL, not only removing the missing codes completely but also greatly enhancing the linearity (see Section III-C). Although bubbles introduce errors, our study shows that these errors are static and can be corrected easily by bin-width calibrations, and that missing codes have a more dominant effect on the linearity of a TDC. In addition, benefiting from the direct-histogram architecture, multiple events can be recorded simultaneously by the same TDL, and the dead time is reduced to only hundreds of picoseconds [54].
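How a bubble turns a one-hot code into a multihot code, and how both are counted directly into the histogram, can be sketched in a few lines. This is a behavioral illustration only: the function names are hypothetical, and XORing each tap with its immediate neighbor is an assumption about the edge detector wiring, not a description of the actual netlist.

```python
def xor_edge_detect(thermometer):
    """XOR each tap with its neighbour; a 1 marks a transition between adjacent taps."""
    return [a ^ b for a, b in zip(thermometer, thermometer[1:])]

def accumulate(histogram, hot_code):
    """Direct-histogram update: every set bit increments its own bin counter."""
    for k, bit in enumerate(hot_code):
        histogram[k] += bit

clean = [1, 1, 1, 1, 0, 0, 0, 0]      # ideal thermometer code
bubbled = [1, 1, 1, 0, 1, 0, 0, 0]    # same kind of hit, but with a bubble

one_hot = xor_edge_detect(clean)      # [0, 0, 0, 1, 0, 0, 0]
multi_hot = xor_edge_detect(bubbled)  # [0, 0, 1, 1, 1, 0, 0]

hist = [0] * 7
for code in (clean, bubbled):
    accumulate(hist, xor_edge_detect(code))
# hist is now [0, 0, 1, 2, 1, 0, 0]
```

In this sketch the bubbled sample adds counts to the neighboring bins as well, which is the mechanism by which bins that would otherwise be missing codes receive hits.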

#### C. Clock Distribution Routes and Multiphase Architecture

FPGA chips have well-designed clock routes and different clock regions (CRs), as shown in Fig. 2, to reduce clock skews [62]. In order to optimize the linearity of an FPGA-TDC, clock skews have to be considered carefully. The clock signal is delivered to a dedicated global buffer (BUFG) in the center of the FPGA chip. Clock signals then spread to the upper and lower parts of the chip along two vertical routes and branch into horizontal subroutes (at the middle nodes of each CR). The direction of the TDL is vertical, and a large skew exists between two delay cells that are located at the boundary of two adjacent CRs.

Fig. 2. Clock routes, CR, and the clock signal connections of a full length (2000 bins) TDL.

The length of the TDL (N, the number of bins in a TDL) should cover at least one period of the sampling clock,  $LSB \times N \geq \tau$  (where LSB is the average code bin width and  $\tau$  is the period of the sampling clock); otherwise, the TDC cannot capture events completely [56]. In Virtex 7 devices, the maximum clock frequency (depending on the speed grade) ranges from 450 MHz (2.22 ns) to 600 MHz (1.67 ns) [63]. Devices operating at higher clock frequencies are more prone to timing errors. On the other hand, TDLs have larger nonlinearity when they cross the boundaries of CRs. To avoid crossing CR boundaries, the length and the location of a TDL should be controlled properly. If a single TDL is used, a short TDL requires a high clock frequency, and it is hard to achieve both reliably. Won *et al.* [56] proposed a dual-phase method to reduce the length of the TDL and to allow the TDC to operate at a lower clock frequency. The dual-phase method uses two parallel TDLs sampled by two clocks with  $0^{\circ}$  and  $180^{\circ}$  phases, respectively, and therefore each TDL only needs to cover half of the clock period,  $LSB \times N \ge \tau/N_{\rm phase}$  (where  $N_{\rm phase}$  is the number of phases). The number of phases does not influence the linearity of the TDC directly. A larger number of phases lowers the required clock frequency, but it also increases the system complexity. The number of phases is therefore a trade-off and is selected according to the device or system specifications. After performing full-length TDL tests (which confirmed that dual-phase sampling causes more timing errors), we used the multiphase architecture with three sampling phases. Three parallel TDLs are sampled by three clock signals with 0°, 120°, and 240° phase shifts, respectively, and each TDL covers one-third of the clock period. The timing diagrams of the triple-phase architecture are shown in Fig. 3.

Fig. 3. Timing diagrams for the proposed TDL-TDC with triple-phase sampling architectures. The hit signals are sampled by three clock signals separately and recorded in the corresponding one-hot codes.
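The coverage condition $LSB \times N \geq \tau/N_{\rm phase}$ can be checked numerically. The sketch below is only an illustration of that inequality, using the 200-bin TDL length and 10.5-ps average bin width reported in Section III-A; the function names are hypothetical.

```python
import math

def min_clock_frequency_mhz(n_bins, lsb_ps, n_phase=1):
    """Lowest sampling frequency for which n_phase TDLs of n_bins cover one clock period."""
    covered_ps = n_bins * lsb_ps * n_phase   # LSB * N * N_phase >= tau
    return 1e6 / covered_ps                  # a period of 1 ps corresponds to 1e6 MHz

def min_bins(clock_mhz, lsb_ps, n_phase=1):
    """Smallest number of bins per TDL that still covers its share of the clock period."""
    tau_ps = 1e6 / clock_mhz
    return math.ceil(tau_ps / (n_phase * lsb_ps))

single = min_clock_frequency_mhz(200, 10.5)      # about 476 MHz
triple = min_clock_frequency_mhz(200, 10.5, 3)   # about 159 MHz
bins_needed = min_bins(159, 10.5, 3)             # 200 bins per TDL
```

These values reproduce the reduction from 476 to 159 MHz quoted in the contributions when three phases are used with 200-bin TDLs.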

#### D. Bin Width Calibration

The bin-by-bin calibration method has been widely used to enhance the linearity of TDCs [57], [58], [60]. In this method, the calibrated time of bin n, $t[n]$, is derived as

$$t[n] = \frac{W[n]}{2} + \sum_{k=0}^{n-1} W[k] \tag{1}$$

where W[n] and W[k] are the code bin widths of code bins n and k. The effect of applying (1) is equivalent to discarding all missing codes. However, removing missing codes also reduces the number of effective bins [58]. Equation (1) does not provide a significant improvement even after calibration; according to (3) in [40], the RMS error is $\sigma \sim W_{\rm max}/\sqrt{12}$ [60] ($\sim$0.6 LSB if $W_{\rm max} = 2$ LSB, where $W_{\rm max}$ is the maximum code bin width).
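As a minimal sketch of (1), the calibrated bin times can be computed with a running sum of the bin widths. The widths below are hypothetical and include one missing code (zero width) to show that such a bin contributes nothing to the time axis; the last line evaluates the RMS bound for a bin taken to be 2 LSB wide.

```python
import math

def bin_centers(widths):
    """Calibrated time of each bin per (1): half its own width plus all preceding widths."""
    centers = []
    elapsed = 0.0
    for w in widths:
        centers.append(elapsed + w / 2)
        elapsed += w
    return centers

widths_ps = [10.0, 0.0, 22.0, 8.0]   # hypothetical widths; bin 1 is a missing code
t = bin_centers(widths_ps)           # [5.0, 10.0, 21.0, 36.0]

# Quantization error of the widest bin, W_max / sqrt(12), for W_max taken as 2 LSB.
worst_rms_lsb = 2 / math.sqrt(12)    # about 0.58 LSB
```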

Another simple calibration approach can be derived directly from the definition of the DNL [54]. Unlike the original design, which was intended for post-calibration, we propose a new strategy that allows on-line calibration, denoted as bin-width calibration in this paper. Because the count in a histogram bin is proportional to the code bin width, the DNL is related to the actual count of code bin k, H[k], and the ideal count, H, as

$$DNL[k] = \frac{W[k] - Q}{Q} = \frac{H[k]}{H} - 1 \tag{2}$$

$$H = \frac{\sum_{k=0}^{N-1} H[k]}{N} \tag{3}$$

where Q is the ideal code bin width in a code density test, and N is the number of bins in a TDL. DNL[k] (k = 0, ..., N-1) can be obtained after the code density test and stored for on-line calibrations. To generate the calibrated count for bin k,  $H_{\text{cal}}[k]$, (2) can be rewritten as

$$H_{\text{cal}}[k] = \frac{H[k]}{DNL[k]+1} = \sum_{n=0}^{H[k]-1} \frac{1}{DNL[k]+1}$$
$$\Rightarrow 2^{M} \cdot H_{\text{cal}}[k] = \sum_{n=0}^{H[k]-1} \frac{2^{M}}{DNL[k]+1}. \tag{4}$$
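Equations (2) to (4) can be sketched in floating point before any hardware considerations. The histograms below are hypothetical, and the function names are introduced only for this illustration: the DNL is first extracted from a code density histogram, and a later measurement is then divided bin by bin by $DNL[k]+1$.

```python
def dnl_from_histogram(hist):
    """DNL[k] = H[k]/H - 1, with H the mean count per bin, as in (2) and (3)."""
    mean_count = sum(hist) / len(hist)
    return [h / mean_count - 1.0 for h in hist]

def calibrate(hist, dnl):
    """(4): every event in bin k contributes 1/(1 + DNL[k]) instead of 1."""
    return [h / (1.0 + d) for h, d in zip(hist, dnl)]

density = [1000, 500, 2000, 500]     # hypothetical code density histogram
dnl = dnl_from_histogram(density)    # [0.0, -0.5, 1.0, -0.5]

measured = [200, 100, 400, 100]      # later measurement of uniformly distributed hits
flat = calibrate(measured, dnl)      # [200.0, 200.0, 200.0, 200.0]
```

The division fails for a bin with DNL equal to -1, which is the arithmetic reason a missing code cannot be calibrated this way.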

To perform (4), TDCs should not contain any missing codes, and therefore most FPGA-based TDCs with missing codes are not able to apply (4) directly. To implement  $(1 + DNL[k])^{-1}$  in FPGAs with minimum resources, two steps are performed:

1) adding  $2^M \cdot (1 + DNL[k])^{-1}$  to the accumulator for each detected event in bin k, as

$$\overline{H}_{\text{cal}}^*[k] = \overline{H}_{\text{cal}}[k] + 2^M (1 + DNL[k])^{-1} \tag{5}$$

2) right-shifting the accumulator by M bits to obtain

$$H_{\text{cal}}[k][I-1:0] = \overline{H}_{\text{cal}}^*[k][I+M-1:M] \tag{6}$$

where  $H_{\text{cal}}[k]$  is stored in an *I*-bit register. The advantages of this method [(5) and (6)] are: 1) it is extremely easy to implement and 2) no post-processing is needed. The disadvantage is that more resources are required for a larger M.
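The two steps can be modeled in software as an integer accumulator per bin. This is a behavioral sketch under stated assumptions, not the FPGA implementation: the class and function names are hypothetical, the weight is assumed to be rounded to the nearest integer, and the finite register width *I* is not modeled.

```python
def calibration_weights(dnl, m_bits):
    """Integer addend round(2^M / (1 + DNL[k])) for each bin (rounding is an assumption)."""
    return [round((1 << m_bits) / (1.0 + d)) for d in dnl]

class CalibratedCounter:
    """One histogram bin: weighted accumulation per hit, bit shift on read-out."""

    def __init__(self, weight, m_bits):
        self.weight = weight
        self.m_bits = m_bits
        self.accumulator = 0

    def hit(self):
        self.accumulator += self.weight          # step 1, as in (5)

    def value(self):
        return self.accumulator >> self.m_bits   # step 2, as in (6)

dnl = [0.0, -0.5, 1.0, -0.5]        # hypothetical DNL values from a code density test
raw = [1000, 500, 2000, 500]        # raw hit counts per bin
M = 5
counters = [CalibratedCounter(w, M) for w in calibration_weights(dnl, M)]
for counter, n_hits in zip(counters, raw):
    for _ in range(n_hits):
        counter.hit()
calibrated = [c.value() for c in counters]   # [1000, 1000, 1000, 1000]
```

In this model a larger M only widens the accumulator and refines the weight quantization; no per-event division or post-processing step is needed.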

#### III. EXPERIMENTS AND RESULTS

To evaluate the proposed TDC, code density tests were performed. A raw-TDL and a tuned-TDL were tested with the traditional and the modified direct-histogram architectures, respectively (four combinations). The bin-width calibration method was tested and discussed as well. Two independent low-jitter crystal oscillators (DSC1103) were used as the signal sources for code density tests. The temperature and operating voltage on the FPGA chip were maintained within a stable range (temperature:  $30.1^{\circ}\text{C} \pm 0.3^{\circ}\text{C}$ , voltage:  $0.995 \text{ V} \pm 0.002 \text{ V}$ ).

#### A. Full-Length TDL Test

In order to determine the length and location of the TDL, a full-length TDL with 2000 bins (from bin 0 to bin 1999) was tested. The TDL fully covers a column of slices in the FPGA chip and crosses ten CRs, as shown in Fig. 2. The DNL plots in Fig. 4 show large nonlinearity (DNL > 2 LSB) at the boundaries of CRs (at bins 200, 400, 600, 800, 1200, 1400, 1600, and 1800), except at the boundary (bin 1000) between the two central CRs (CRX1Y4: bin 800 to bin 999; CRX1Y5: bin 1000 to bin 1199) shown in Fig. 2. The reason for the exception is that these two CRs are symmetrical in terms of clock routing. At bin 1100 (corresponding to Node *B*, the middle of CRX1Y5) and bin 900 (Node *A*), two smaller DNL peaks (DNL > 1 LSB) are noticeable (highlighted by the arrows), caused by the clock skews at Nodes A and B. In order to minimize nonlinearity, the length of a single TDL is set to 200 bins (from bin 900 to bin 1100). In the Virtex-7 FPGA, the average code bin width is 10.5 ps, so a single TDL with 200 bins has a propagation delay of 2.1 ns. Three parallel TDLs were implemented for the proposed multiphase method in the two central CRs (X1Y4 and X1Y5). Each TDL covers only one-third of the clock period. With this arrangement, the minimum frequency of the sampling clock signal can be reduced to 159 MHz.

Fig. 4. DNL plots of full-length (2000 bins) tuned-TDLs with the traditional and the direct-histogram architectures.

Fig. 5. DNL plots of (a) raw-TDL and (b) tuned-TDL. INL plots of (c) raw-TDL and (d) tuned-TDL.

#### B. Linearity Tests

In this paper, we compared the direct-histogram architecture with some traditional architectures. The tested TDLs are located between Slice-X106Y225 and Slice-X106Y274 (50 slices, 200 bins). The 'CCCC' output pattern of the CARRY4 was used for the raw-TDL, whereas the 'SCSC' output pattern was applied for the tuned-TDL. The DNL, INL, and their standard deviations ($\sigma_{\rm DNL}$ and $\sigma_{\rm INL}$) are the main parameters for evaluating the linearity. From the results shown in Fig. 5 and Table II, the linearity is greatly improved after applying the tuned-TDL and the direct-histogram methods. The DNL is reduced from [-1, 4.34] to [-0.96, 1.60] LSB for the raw-TDL and from [-1, 1.53] to [-0.38, 0.87] LSB for the tuned-TDL. $\sigma_{\rm DNL}$ decreases significantly from 1.20 to 0.61 LSB and from 0.58 to 0.20 LSB, respectively. The INL is reduced from [-6.85, 2.50] to [-2.85, 1.61] LSB for the raw-TDL and from [-2.66, 1.20] to [-1.23, 1.02] LSB for the tuned-TDL. $\sigma_{\rm INL}$ decreases from 1.85 to 0.92 LSB and from 0.81 to 0.50 LSB, respectively.

Fig. 6. Bin-width distributions using the traditional thermometer-to-binary method (red bar) and the direct-histogram architecture (black bar) for (a) raw-TDL and (b) tuned-TDL in Virtex-7 FPGAs.

Fig. 7. DNL and INL curves of a single tuned-TDC with the direct-histogram architecture after bin-width calibration with different M values (M = 0, 2, 5).

#### C. Bin Width Distribution

The integration of the tuned-TDL and the direct-histogram architecture also significantly improves the uniformity of the code bin widths. According to the bin-width distributions shown in Fig. 6, the traditional raw-TDL generates a large number of missing codes and shows poor uniformity ($\sigma_{bin-width} = 12.60$ ps). Even with the direct-histogram architecture applied to the raw-TDL, the improvement ($\sigma_{bin-width} = 6.40$ ps) is not significant, and very-wide (DNL > 2 LSB) and very-narrow (DNL < 0.33 LSB) bins still exist although the missing codes are removed. In Fig. 6(b), the tuned-TDL improves the bin-width distribution ($\sigma_{bin-width} = 5.98$ ps) and reduces the number of missing codes. Combining the direct-histogram architecture with the tuned-TDL, however, not only significantly improves the bin-width distribution ($\sigma_{bin-width} = 2.10$ ps) but also completely removes the missing codes and reduces the numbers of very-narrow and very-wide bins.

TABLE II
RESULTS OF CODE DENSITY TESTS FOR DIFFERENT ARCHITECTURES

| Unit: LSB       | Raw-TDL, traditional | Raw-TDL, direct histogram | Tuned-TDL, traditional | Tuned-TDL, direct histogram |
|-----------------|----------------------|---------------------------|------------------------|-----------------------------|
| DNL             | [-1, 4.34]           | [-0.96, 1.60]             | [-1, 1.53]             | [-0.38, 0.87]               |
| $DNL_{pk-pk}$   | 5.34                 | 2.56                      | 2.53                   | 1.25                        |
| $\sigma_{DNL}$  | 1.20                 | 0.61                      | 0.58                   | 0.20                        |
| INL             | [-6.85, 2.50]        | [-2.85, 1.61]             | [-2.66, 1.20]          | [-1.23, 1.02]               |
| $INL_{pk-pk}$   | 9.35                 | 4.47                      | 3.86                   | 2.25                        |
| $\sigma_{INL}$  | 1.85                 | 0.92                      | 0.81                   | 0.50                        |

TABLE III
EQUIVALENT BIN WIDTH AND EQUIVALENT STANDARD DEVIATION

| Unit: ps        | Raw-TDL, traditional | Raw-TDL, direct histogram | Tuned-TDL, traditional | Tuned-TDL, direct histogram |
|-----------------|----------------------|---------------------------|------------------------|-----------------------------|
| $w_{eq}$        | 27.57                | 15.65                     | 15.07                  | 11.15                       |
| $\sigma_{eq}$   | 7.95                 | 4.51                      | 4.35                   | 3.22                        |
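As a rough illustration of how the quantities discussed in this subsection can be obtained, the following Python sketch derives bin widths, DNL, INL, and their spreads from a code density histogram. It assumes the conventional definitions (under uniformly distributed events a bin's width is proportional to its hit count, and INL is the running sum of DNL) and uses population standard deviations. The function name, the `period_ps` parameter, and the sample counts are hypothetical and are not taken from the measurements reported here.

```python
from itertools import accumulate
from statistics import pstdev


def code_density_analysis(counts, period_ps):
    """Estimate bin widths, DNL and INL from code density test counts.

    counts    -- hits per TDC code, collected with uniformly distributed events
    period_ps -- time span covered by all codes (hypothetical parameter)
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    lsb_ps = period_ps / len(counts)  # average bin width, i.e. 1 LSB
    # A bin's share of the hits is proportional to its width.
    widths_ps = [c / total * period_ps for c in counts]
    dnl = [w / lsb_ps - 1.0 for w in widths_ps]  # in LSB; -1 marks a missing code
    inl = list(accumulate(dnl))                  # running sum of DNL, in LSB
    return {
        "widths_ps": widths_ps,
        "dnl": dnl,
        "inl": inl,
        "dnl_pk_pk": max(dnl) - min(dnl),
        "inl_pk_pk": max(inl) - min(inl),
        "sigma_dnl": pstdev(dnl),
        "sigma_bin_width_ps": pstdev(widths_ps),
        "missing_codes": sum(1 for c in counts if c == 0),
    }


# Made-up four-code histogram with one missing code and one double-width bin.
result = code_density_analysis([100, 0, 200, 100], period_ps=40.0)
print(result["dnl"], result["inl"], result["missing_codes"])
```

In this toy histogram the empty code yields a DNL of -1 LSB, which is how missing codes appear in the DNL ranges of Table II.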

#### D. Equivalent Bin Width and Equivalent Standard Deviation

The equivalent bin width  $w_{\rm eq}$  and the equivalent standard deviation  $\sigma_{\rm eq}$  were proposed by Wu for assessing the linearity performance of TDCs [64], defined as

$$\sigma_{\rm eq}^2 = \sum_i \left( \frac{W[i]^2}{12} \times \frac{W[i]}{W_{\rm total}} \right), \quad \text{where } W_{\rm total} = \sum_i W[i] \tag{7}$$

$$w_{\rm eq} = \sigma_{\rm eq} \sqrt{12} = \sqrt{\sum_i \left(\frac{W[i]^3}{W_{\rm total}}\right)}. \tag{8}$$

Applying (7) and (8) to the raw and tuned TDLs (with the traditional or the direct-histogram architecture) gives the $w_{\rm eq}$ and $\sigma_{\rm eq}$ values summarized in Table III. The proposed TDC shows the best results, with $w_{\rm eq} = 11.15$ ps and $\sigma_{\rm eq} = 3.22$ ps before calibration.
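Equations (7) and (8) can be evaluated directly from a list of measured bin widths. The short Python sketch below is one possible way to do so; the function name and the example width lists are illustrative assumptions, not data from this work.

```python
import math


def equivalent_bin_width(widths_ps):
    """Return (w_eq, sigma_eq) in ps for bin widths W[i], following (7) and (8)."""
    w_total = sum(widths_ps)
    if w_total <= 0:
        raise ValueError("total bin width must be positive")
    # (7): each bin contributes a uniform-quantization variance W[i]^2 / 12,
    # weighted by the probability W[i] / W_total that an event falls in it.
    sigma_eq_sq = sum((w ** 2 / 12.0) * (w / w_total) for w in widths_ps)
    sigma_eq = math.sqrt(sigma_eq_sq)
    # (8): width of a uniform bin giving the same quantization error.
    w_eq = sigma_eq * math.sqrt(12.0)
    return w_eq, sigma_eq


# Hypothetical examples: eight uniform 10-ps bins versus alternating 5/15-ps bins.
uniform = equivalent_bin_width([10.0] * 8)
uneven = equivalent_bin_width([5.0, 15.0] * 4)
print(uniform, uneven)
```

Both example lists have the same average bin width, yet the uneven one gives a larger $w_{\rm eq}$, which is why this metric penalizes nonuniform bins. Missing codes (zero-width bins) contribute nothing to either sum.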

#### E. Hardware Bin-Width Calibration

The proposed TDC has no missing codes, and therefore, from (3), the bin-width calibration, (3)–(5), can be implemented in FPGAs without extra processing time. A larger M leads to a better calibration of the code bin width, but M > 5 brings little further improvement. Fig. 7 and Table IV show the DNL and INL results for M from 0 to 7. Compared with the uncalibrated tuned-TDL (M = 0), the calibrated tuned-TDL (M = 5) shows significantly reduced DNL and INL. With M = 5, $DNL_{pk-pk}$ and $INL_{pk-pk}$ are reduced by about 16-fold (from 1.25 to 0.08 LSB) and 17-fold (from 2.25 to 0.13 LSB), respectively. The standard deviations, $\sigma_{\rm DNL}$ and $\sigma_{\rm INL}$, decrease by about 20-fold and 25-fold, respectively.

TABLE IV
RESULTS OF THE CALIBRATED TDC FOR DIFFERENT VALUES OF M

|                       | M = 0 | M = 1 | M = 2 | M = 3 | M = 4 | M = 5 | M = 6 | M = 7 |
|-----------------------|-------|-------|-------|-------|-------|-------|-------|-------|
| $DNL_{pk-pk}$ (LSB)   | 1.25  | 0.62  | 0.34  | 0.20  | 0.11  | 0.08  | 0.08  | 0.07  |
| $\sigma_{DNL}$ (LSB)  | 0.20  | 0.14  | 0.08  | 0.04  | 0.02  | 0.01  | 0.01  | 0.01  |
| $INL_{pk-pk}$ (LSB)   | 2.25  | 2.20  | 1.47  | 0.74  | 0.30  | 0.13  | 0.16  | 0.14  |
| $\sigma_{INL}$ (LSB)  | 0.50  | 0.58  | 0.37  | 0.18  | 0.02  | 0.02  | 0.04  | 0.03  |
| $w_{eq}$ (ps)         | 11.15 | 10.81 | 10.59 | 10.52 | 10.55 | 10.55 | 10.55 | 10.55 |
| $\sigma_{eq}$ (ps)    | 3.22  | 3.12  | 3.06  | 3.04  | 3.05  | 3.05  | 3.05  | 3.05  |
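The improvement factors quoted for M = 5 follow from the rounded entries of Table IV. The snippet below simply repeats that arithmetic; the dictionary layout and names are illustrative, and because the table values are rounded to two decimals the ratios are approximate.

```python
# (uncalibrated M = 0, calibrated M = 5) pairs copied from Table IV, in LSB.
table_iv = {
    "DNL_pk_pk": (1.25, 0.08),
    "sigma_DNL": (0.20, 0.01),
    "INL_pk_pk": (2.25, 0.13),
    "sigma_INL": (0.50, 0.02),
}

# Reduction factor = value before calibration / value after calibration.
factors = {name: before / after for name, (before, after) in table_iv.items()}

for name, factor in factors.items():
    print(f"{name}: {factor:.1f}-fold reduction")
```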

#### F. Time Interval Measurements

To verify the linearity of the proposed TDC, a programmable delay generator called IDELAYE2 in Virtex-7 FPGAs was used to generate known time intervals [65]. The delay of each tap in the IDELAYE2 was continuously calibrated by an IDELAYCTRL module based on a low-jitter reference clock. The tap delay is $39 \pm 5$ ps when the reference clock runs at 400 MHz [63]. Furthermore, external jitter and errors are minimized because the time intervals are generated inside the FPGA chip and sent to the TDC directly. A copy of the sampling clock was delayed by an IODELAY module and connected to the 'start' port of a single-channel TDL. By controlling the tap value of the IODELAY, time intervals from 1244 to 2464 ps in steps of about 38.1 ps were provided and measured by both the uncalibrated and the calibrated TDCs. The time intervals were also measured by a commercial time-correlated single-photon counting (TCSPC) module (PicoQuant PicoHarp 300, with 4 ps resolution and DNL < 5% peak, < 1% rms). Each measurement captured more than 100 000 samples, and the time intervals were calculated from the histogram. The measurement results and the differences between the measured and expected values for the uncalibrated and calibrated tuned-TDLs are shown in Fig. 8. The standard deviations of the measurements, calculated from these differences, are 5.11 and 4.42 ps for the uncalibrated and calibrated TDCs, respectively.
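The text does not spell out how each time interval is extracted from its histogram, so the following Python sketch is only one plausible reading: it takes the histogram centroid, placing every hit at the centre of its bin, and then computes the standard deviation of the measured-minus-expected differences. The function names, the centroid estimator, the use of the population standard deviation, and all numbers are assumptions for illustration.

```python
from statistics import pstdev


def histogram_mean_time(counts, widths_ps):
    """Centroid of a TDC histogram, with each hit placed at its bin centre."""
    if len(counts) != len(widths_ps):
        raise ValueError("counts and widths_ps must have the same length")
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    left_edge = 0.0
    weighted_sum = 0.0
    for hits, width in zip(counts, widths_ps):
        weighted_sum += hits * (left_edge + width / 2.0)
        left_edge += width  # bin edges accumulate the (possibly uneven) widths
    return weighted_sum / total


def deviation_std(measured_ps, expected_ps):
    """Standard deviation of the measured-minus-expected differences."""
    diffs = [m - e for m, e in zip(measured_ps, expected_ps)]
    return pstdev(diffs)


# Made-up data: a four-bin histogram with uniform 10-ps bins, and three
# hypothetical interval measurements compared with their expected values.
mean_time = histogram_mean_time([0, 10, 30, 0], [10.0, 10.0, 10.0, 10.0])
spread = deviation_std([10.0, 21.0, 29.0], [10.0, 20.0, 30.0])
print(mean_time, spread)
```

Passing measured (uneven) bin widths instead of a uniform LSB to `histogram_mean_time` is one way a bin-width-aware estimate could differ from an uncalibrated one.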

Fig. 8. Time interval measurements of an uncalibrated TDC (left) and a bin-width calibrated TDC (right).

#### IV. CONCLUSION

We integrate, for the first time, the tuned-TDL and the modified direct-histogram based on the multiphase architecture to implement a low-nonlinearity, missing-code-free TDC with fast bin-width calibration in FPGAs. The unique advantages are as follows.

- 1) The synergistic effects of this combination significantly suppress the nonuniformity, as shown by the tested DNL and INL, the measurement deviations, and the equivalent bin widths and their standard deviations. Moreover, the missing codes are completely removed.
- 2) The multiphase method, extended from the dual-phase method, provides extra design flexibility to minimize the nonlinearity caused by clock route skews and, at the same time, to lower the timing requirements on the clock frequency.
- 3) Based on the direct-histogram architecture and the missing-code-free feature, a novel bin-width calibration method can be applied; its performance was presented and evaluated. After calibration, $DNL_{pk-pk}$ and $INL_{pk-pk}$ are reduced to 0.08 and 0.13 LSB, respectively, and $\sigma_{DNL}$ and $\sigma_{INL}$ decrease to 0.10 and 0.21 ps, respectively.

In summary, a new design concept for FPGA-TDCs is presented and evaluated in this paper.

In the previously published literature, traditional thermometer-to-binary architectures have been studied thoroughly and their limitations evaluated clearly. The newly proposed direct-histogram architecture has not yet gained enough attention and has not been widely applied. This paper shows that the direct-histogram architecture can be applied to tuned-TDLs to achieve low-nonlinearity, missing-code-free TDCs with direct bin-width calibrations, providing distinct advantages over traditional methods. However, resource consumption is the main drawback of this architecture and needs to be noted. In the future, we will continue to investigate solutions to this drawback.

#### ACKNOWLEDGMENT

The authors would like to thank Professors Sheng-Di Lin and Chia-Ming Tsai, National Chiao-Tung University, Taiwan, for lending the PicoQuant PicoHarp 300 TCSPC module.

#### REFERENCES

- [1] J. F. Cavanaugh et al., "The Mercury laser altimeter instrument for the MESSENGER mission," Space Sci. Rev., vol. 131, no. 1, pp. 451–479, Aug. 2007.
- [2] M. T. Zuber et al., "Topography of the northern hemisphere of Mercury from MESSENGER laser altimetry," Science, vol. 336, pp. 217–220, Apr. 2012.
- [3] G. A. Neumann *et al.*, "Bright and dark polar deposits on Mercury: Evidence for surface volatiles," *Science*, vol. 339, no. 6117, pp. 296–300, 2013.
- [4] J. S. Karp, S. Surti, M. E. Daube-Witherspoon, and G. Muehllehner, "Benefit of time-of-flight in PET: Experimental and clinical results," *J. Nucl. Med.*, vol. 49, no. 3, pp. 462–470, Mar. 2008.
- [5] D. M. Kavanagh et al., "A molecular toggle after exocytosis sequesters the presynaptic syntaxin1a molecules involved in prior vesicle fusion," Nature Commun., vol. 5, no. 5774, Dec. 2014, Art. no. 6774.
- [6] L. H. C. Braga et al., "A fully digital 8 × 16 SiPM array for PET applications with per-pixel TDCs and real-time energy output," IEEE J. Solid-State Circuits, vol. 49, no. 1, pp. 301–314, Jan. 2014.
- [7] C. Bruschini et al., "SPADnet: Embedded coincidence in a smart sensor network for PET applications," Nucl. Instrum. Methods Phys. Res. A, Accel. Spectrom. Detect. Assoc. Equip., vol. 734, pp. 122–126, Jan. 2014.
- [8] N. Marino et al., "A multichannel and compact time to digital converter for time of flight positron emission tomography," *IEEE Trans. Nucl. Sci.*, vol. 62, no. 3, pp. 814–823, Jun. 2015.
- [9] G. Giraud et al., "Fluorescence lifetime biosensing with DNA microarrays and a CMOS-SPAD imager," Biomed. Opt. Exp., vol. 1, no. 5, pp. 1302–1308, 2010.
- [10] D. D.-U. Li et al., "Video-rate fluorescence lifetime imaging camera with CMOS single-photon avalanche diode arrays and high-speed imaging algorithm," J. Biomed. Opt., vol. 16, no. 9, 2011, Art. no. 096012.
- [11] D. Tyndall *et al.*, "A high-throughput time-resolved mini-silicon photo-multiplier with embedded fluorescence lifetime estimation in 0.13 μm CMOS," *IEEE Trans. Biomed. Circuits Syst.*, vol. 6, no. 6, pp. 562–570, Dec. 2012.
- [12] S. P. Poland et al., "Development of a fast TCSPC FLIM-FRET imaging system," Proc. SPIE, vol. 8588, pp. 85880X-1–85880X-8, Feb. 2013.
- [13] S. P. Poland *et al.*, "A high speed multifocal multiphoton fluorescence lifetime imaging microscope for live-cell FRET imaging," *Biomed. Opt. Exp.*, vol. 6, no. 2, pp. 277–296, 2015.
- [14] Z. Cheng, M. J. Deen, and H. Peng, "A low-power gateable Vernier ring oscillator time-to-digital converter for biomedical imaging applications," *IEEE Trans. Biomed. Circuits Syst.*, vol. 10, no. 2, pp. 445–454, Apr. 2016.
- [15] F. Bigongiari, R. Roncella, R. Saletti, and P. Terreni, "A 250-ps time-resolution CMOS multihit time-to-digital converter for nuclear physics experiments," *IEEE Trans. Nucl. Sci.*, vol. 46, no. 2, pp. 73–77, Apr. 1999.

- [16] K. Akiba et al., "The Timepix telescope for high performance particle tracking," Nucl. Instrum. Methods Phys. Res. Section A, Accel., Spectrometers, Detect. Assoc. Equip., vol. 723, pp. 47–54, Sep. 2013.
- [17] R. Ikuta *et al.*, "Wide-band quantum interface for visible-to-telecommunication wavelength conversion," *Nature Commun.*, vol. 2, no. 537, 2011, Art. no. 1544.
- [18] M. Förtsch et al., "A versatile source of single photons for quantum information processing," Nature Commun., vol. 4, May 2013, Art. no. 1818.
- [19] N. Matsuda, "Deterministic reshaping of single-photon spectra using cross-phase modulation," Sci. Adv., vol. 2, no. 3, 2016, Art. no. e1501223.
- [20] C. Veerappan et al., "A 160×128 single-photon image sensor with on-pixel 55ps 10b time-to-digital converter," in Proc. IEEE Solid-State Circuits Conf. (ISSCC) Tech. Dig. Papers, San Francisco, CA, USA, Feb. 2011, pp. 312–314.
- [21] G. Gariepy, F. Tonolini, R. Henderson, J. Leach, and D. Faccio, "Detection and tracking of moving objects hidden from view," *Nature Photon.*, vol. 10, pp. 23–26, Dec. 2015.
- [22] D. Shin et al., "Photon-efficient imaging with a single-photon camera," Nature Commun., vol. 7, Jun. 2016, Art. no. 12046.
- [23] P. Palojarvi, K. Maatta, and J. Kostamovaara, "Pulsed time-of-flight laser radar module with millimeter-level accuracy using full custom receiver and TDC ASICs," *IEEE Trans. Instrum. Meas.*, vol. 51, no. 5, pp. 1102–1108, Oct. 2002.
- [24] W. Becker, Advanced Time-Correlated Single Photon Counting Applications. Berlin, Germany: Springer, 2015.
- [25] N. A. W. Dutton et al., "A time-correlated single-photon-counting sensor with 14GS/S histogramming time-to-digital converter," in Proc. IEEE ISSCC, San Francisco, CA, USA, Feb. 2015, pp. 1–3.
- [26] D. Bi et al., "Efficient luminescent solar cells based on tailored mixed-cation perovskites," Sci. Adv., vol. 2, no. 1, 2016, Art. no. e1501170.
- [27] J.-P. Jansson, A. Mäntyniemi, and J. Kostamovaara, "A CMOS time-to-digital converter with better than 10 ps single-shot precision," IEEE J. Solid-State Circuits, vol. 41, no. 6, pp. 1286–1296, Jun. 2006.
- [28] J. Richardson et al., "A 32×32 50ps resolution 10 bit time to digital converter array in 130nm CMOS for time correlated imaging," in Proc. IEEE Custom Integr. Circuits Conf. (CICC), San Jose, CA, USA, Sep. 2009, pp. 77–80.
- [29] G. W. Roberts and M. Ali-Bakhshian, "A brief introduction to time-to-digital and digital-to-time converters," *IEEE Trans. Circuits Syst. II, Express Briefs*, vol. 57, no. 3, pp. 153–157, Mar. 2010.
- [30] B. Markovic, S. Tisa, F. A. Villa, A. Tosi, and F. Zappa, "A high-linearity, 17 ps precision time-to-digital converter based on a single-stage Vernier delay loop fine interpolation," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 60, no. 3, pp. 557–569, Mar. 2013.
- [31] Z. Cheng, X. Zheng, M. J. Deen, and H. Peng, "Recent developments and design challenges of high-performance ring oscillator CMOS time-to-digital converters," *IEEE Trans. Electron Devices*, vol. 63, no. 1, pp. 235–251, Jan. 2016.
- [32] I. Nissinen and J. Kostamovaara, "On-chip voltage reference-based time-to-digital converter for pulsed time-of-flight laser radar measurements," *IEEE Trans. Instrum. Meas.*, vol. 58, no. 6, pp. 1938–1948, Jun. 2009.
- [33] J.-P. Jansson, V. Koskinen, A. Mäntyniemi, and J. Kostamovaara, "A multichannel high-precision CMOS time-to-digital converter for laser-scanner-based perception systems," *IEEE Trans. Instrum. Meas.*, vol. 61, no. 9, pp. 2581–2590, Sep. 2012.
- [34] C.-Y. Yao, W.-C. Hsia, and Y.-J. Wen, "The soft-injection-locked ring oscillator and its application in a Vernier-based TDC," *IEEE Trans. Instrum. Meas.*, vol. 63, no. 8, pp. 2064–2071, Aug. 2014.
- [35] J. Kalisz, R. Szplet, J. Pasierbinski, and A. Poniecki, "Field-programmable-gate-array-based time-to-digital converter with 200-ps resolution," *IEEE Trans. Instrum. Meas.*, vol. 46, no. 1, pp. 51–55, Feb. 1997.
- [36] J. Kalisz, "Review of methods for time interval measurements with picosecond resolution," *Metrologia*, vol. 41, no. 1, pp. 17–32, 2004.
- [37] J. Wu, Z. Shi, and I. Y. Wang, "Firmware-only implementation of time-to-digital converter (TDC) in field-programmable gate array (FPGA)," in *Proc. IEEE Conf. Rec. NSS*, Portland, OR, USA, Oct. 2003, pp. 177–181.
- [38] P. Napolitano, A. Moschitta, and P. Carbone, "A survey on time interval measurement techniques and testing methods," in *Proc. IEEE Instrum. Meas. Techn. Conf. (IMTC)*, Austin, TX, USA, May 2010, pp. 181–186.
- [39] R. Szplet and K. Klepacki, "An FPGA-integrated time-to-digital converter based on two-stage pulse shrinking," *IEEE Trans. Instrum. Meas.*, vol. 59, no. 6, pp. 1663–1670, Jun. 2010.

- [40] E. Bayer and M. Traxler, "A high-resolution (<10 ps RMS) 48-channel time-to-digital converter (TDC) implemented in a field programmable gate array (FPGA)," *IEEE Trans. Nucl. Sci.*, vol. 58, no. 4, pp. 1547–1552, Aug. 2011.
- [41] M.-A. Daigneault and J. P. David, "A high-resolution time-to-digital converter on FPGA using dynamic reconfiguration," *IEEE Trans. Instrum. Meas.*, vol. 60, no. 6, pp. 2070–2079, Jun. 2011.
- [42] J. Wu, Y. Shi, and D. Zhu, "A low-power wave union TDC implemented in FPGA," J. Instrum., vol. 7, no. 1, 2012, Art. no. C01021.
- [43] R. Szplet, J. Kalisz, and R. Szymanowski, "Interpolating time counter with 100 ps resolution on a single FPGA device," *IEEE Trans. Instrum. Meas.*, vol. 49, no. 4, pp. 879–883, Aug. 2000.
- [44] R. Szplet, P. Kwiatkowski, Z. Jachna, and K. Rózyc, "An eight-channel 4.5-ps precision timestamps-based time interval counter in FPGA chip," *IEEE Trans. Instrum. Meas.*, vol. 65, no. 9, pp. 2088–2100, Sep. 2016.
- [45] J. Wang, S. Liu, L. Zhao, X. Hu, and Q. An, "The 10-ps multitime measurements averaging TDC implemented in an FPGA," *IEEE Trans. Nucl. Sci.*, vol. 58, no. 4, pp. 2011–2018, Aug. 2011.
- [46] M. W. Fishburn, L. H. Menninga, C. Favi, and E. Charbon, "A 19.6 ps, FPGA-based TDC with multiple channels for open source applications," *IEEE Trans. Nucl. Sci.*, vol. 60, no. 3, pp. 2203–2208, Jun. 2013.
- [47] C. Favi and E. Charbon, "A 17ps time-to-digital converter implemented in 65nm FPGA technology," in *Proc. FPGA*, Monterey, CA, USA, Feb. 2009, pp. 113–120.
- [48] A. M. Amiri, M. Boukadoum, and A. Khouas, "A multihit time-to-digital converter architecture on FPGA," *IEEE Trans. Instrum. Meas.*, vol. 58, no. 3, pp. 530–540, Mar. 2009.
- [49] M. Daigneault and J. P. David, "A novel 10 ps resolution TDC architecture implemented in a 130nm process FPGA," in *Proc. 8th Int. NEWCAS Conf.*, Montreal, QC, Canada, Jun. 2010, pp. 281–284.
- [50] R. Szplet, J. Kalisz, and Z. Jachna, "A 45 ps time digitizer with a two-phase clock and dual-edge two-stage interpolation in a field programmable gate array device," *Meas. Sci. Technol.*, vol. 20, no. 2, 2009, Art. no. 025108.
- [51] J. Song, Q. An, and S. Liu, "A high-resolution time-to-digital converter implemented in field-programmable-gate-arrays," *IEEE Trans. Nucl. Sci.*, vol. 53, no. 1, pp. 236–241, Feb. 2006.
- [52] Q. Shen et al., "A 1.7 ps equivalent bin size and 4.2 ps RMS FPGA TDC based on multichain measurements averaging method," IEEE Trans. Nucl. Sci., vol. 62, no. 3, pp. 947–954, Jun. 2015.
- [53] J. Wu and Z. Shi, "The 10-ps wave union TDC: Improving FPGA TDC resolution beyond its cell delay," in *Proc. IEEE Nucl. Sci. Symp. Conf. Rec.*, Dresden, Germany, Oct. 2008, pp. 3440–3446.
- [54] N. Dutton *et al.*, "Multiple-event direct to histogram TDC in 65nm FPGA technology," in *Proc. IEEE PRIME*, Grenoble, France, Jun. 2014, pp. 1–5.
- [55] J. Y. Won and J. S. Lee, "Time-to-digital converter using a tuned-delay line evaluated in 28-, 40-, and 45-nm FPGAs," *IEEE Trans. Instrum. Meas.*, vol. 65, no. 7, pp. 1678–1689, Jul. 2016.
- [56] J. Y. Won, S. I. Kwon, H. S. Yoon, G. B. Ko, J.-W. Son, and J. S. Lee, "Dual-phase tapped-delay-line time-to-digital converter with on-the-fly calibration implemented in 40 nm FPGA," *IEEE Trans. Biomed. Circuits Syst.*, vol. 10, no. 1, pp. 231–242, Feb. 2016.
- [57] C. Liu and Y. Wang, "A 128-channel, 710 M samples/second, and less than 10 ps RMS resolution time-to-digital converter implemented in a Kintex-7 FPGA," *IEEE Trans. Nucl. Sci.*, vol. 62, no. 3, pp. 773–783, Jun. 2015.
- [58] Y. Wang and C. Liu, "A nonlinearity minimization-oriented resource-saving time-to-digital converter implemented in a 28 nm Xilinx FPGA," IEEE Trans. Nucl. Sci., vol. 62, no. 5, pp. 2003–2009, Oct. 2015.
- [59] J. Wang, S. Liu, Q. Shen, H. Li, and Q. An, "A fully fledged TDC implemented in field-programmable gate arrays," *IEEE Trans. Nucl. Sci.*, vol. 57, no. 2, pp. 446–450, Apr. 2010.
- [60] J. Wu, "Several key issues on implementing delay line based TDCs using FPGAs," *IEEE Trans. Nucl. Sci.*, vol. 57, no. 3, pp. 1543–1548, Jun. 2010.
- [61] L. Zhao et al., "The design of a 16-channel 15 ps TDC implemented in a 65 nm FPGA," IEEE Trans. Nucl. Sci., vol. 60, no. 5, pp. 3532–3536, Oct. 2013.
- [62] Xilinx. (May 12, 2016). 7 Series FPGAs Clocking Resources User Guide, UG472 (V1.12). [Online]. Available: https://www.xilinx.com/support/documentation/user\_guides/ug472\_7Series\_Clocking.pdf
- [63] Xilinx. (May 16, 2016). Virtex-7 T and XT FPGAs Data Sheet: DC and AC Switching Characteristics, DS183 (V1.26). [Online]. Available: https://www.xilinx.com/support/documentation/data\_sheets/ds183\_Virtex\_7\_Data\_Sheet.pdf

- [64] J. Wu, "Uneven bin width digitization and a timing calibration method using cascaded PLL," in *Proc. IEEE 19th RT*, Nara, Japan, May 2014, pp. 1–4.
- [65] Xilinx. (May 5, 2016). 7 Series FPGAs SelectIO Resources User Guide, UG471 (V1.8). [Online]. Available: https://www.xilinx.com/support/documentation/user\_guides/ug471\_7Series\_SelectIO.pdf



Yongliang Zhang received the Ph.D. degree in information and communication engineering from the National University of Defense Technology, Changsha, China, in 2010.

He was a Visiting Scientist with the Center for Biophotonics, Strathclyde Institute of Pharmacy and Biomedical Science, University of Strathclyde, Glasgow, U.K., from 2015 to 2016. He has been an Associate Professor with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China, since 2014. His current research interests include developments of signal processing systems implemented in field-programmable gate array devices.



Haochang Chen was born in Shaanxi, China, in 1990. He received the B.Eng. degree in electronic design automation from the University of Central Lancashire, Preston, U.K., in 2012, the B.Eng. degree in electronics and information engineering from the North China University of Technology, Beijing, China, in 2012, and the M.S. degree in embedded digital systems from the University of Sussex, Brighton, U.K., in 2013. Since 2014, he has been pursuing the Ph.D. degree in time-correlated instrumentation with the Center for Biophotonics, University of Strathclyde, Glasgow, U.K.

His current research interests include field-programmable gate array system design for ranging and biomedical imaging applications.



David Day-Uei Li is currently a Senior Lecturer with the Center for Biophotonics, Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, U.K. He is interested in CMOS sensors and systems, mixed signal circuits, embedded systems, optical communications, FLIM systems and analysis, and field-programmable gate array/graphics processing unit computing. His research exploits advanced sensor technologies to reveal low-light but fast biological phenomena. He has authored more than 60 journal and conference papers and holds 12 patents.