# Exploring the Usage of Fast Carry Chains to Implement Multistage Ring Oscillators on FPGAs: Design and Characterization

Fanny Spagnolo[,](https://orcid.org/0000-0001-8689-4073) *Member, IEEE*, Stefania Perri[,](https://orcid.org/0000-0003-1363-9201) *Senior Member, IEEE*, Massimo Vatalaro, *Member, IEEE*, Fabio Frustaci[,](https://orcid.org/0000-0002-5011-6621) *Senior Member, IEEE*, Felice Crupi, *Senior Member, IEEE*, and Pasquale Corsonello[,](https://orcid.org/0000-0002-9528-1110) *Senior Member, IEEE*

*Abstract*— Ring oscillators (ROs) serve as basic building blocks in many application scenarios, where they must ensure high reliability, flexibility, and a low area/energy footprint. With the recent advances in Internet-of-Things (IoT) technology, in particular, the need to endow interconnected devices with security facilities has increased as well. In this context, the efficient implementation of ROs on field-programmable gate arrays (FPGAs) is crucial, even though it hides some pitfalls. This article presents a new design strategy for multistage ROs relying on the carry chains (CCs) available in modern FPGA devices. Several configurations of ROs designed as proposed here have been characterized in terms of hardware costs, jitter, and temperature/voltage sensitivity. In all the evaluated cases, the proposed design achieves predictable routing schemes through automatic place and route (P&R), while reducing slice occupancy and energy consumption by up to 50% and 44%, respectively, in comparison with traditional lookup table (LUT)-based ROs. When realized on an Artix-7 device, the basic version of the proposed oscillator using 33 inverting stages provides multiphase outputs oscillating at 29.7 MHz with a standard deviation of less than 10 kHz. The analysis also demonstrates the high flexibility of the novel circuits, such as the possibility to easily change their behavior depending on the target application requirements. As an example, by exploiting additional pass-through elements, the proposed scheme achieves a sensitivity of 49 kHz/°C, which is more than four times higher than that of the corresponding traditional LUT-based competitor, thus making it more suitable for thermal monitoring applications.

*Index Terms*— Carry chains (CCs), digital circuits, field-programmable gate array (FPGA), ring oscillators (ROs), temperature/voltage sensitivity analysis.

Manuscript received 20 November 2023; revised 28 February 2024 and 25 March 2024; accepted 20 April 2024. Date of publication 7 May 2024; date of current version 26 July 2024. This work was supported in part by PON Ricerca & Innovazione Ministero dell'Universita e della Ricerca under Grant 1062\_R24\_INNOVAZIONE; and in part by National Research Center for High Performance Computing, Big Data and Quantum Computing and by project SERICS under Grant PE00000014 within the Next Generation European Union (EU) Program. *(Corresponding author: Pasquale Corsonello.)*

Fanny Spagnolo, Massimo Vatalaro, Fabio Frustaci, Felice Crupi, and Pasquale Corsonello are with the Department of Informatics, Modeling, Electronics and Systems Engineering, University of Calabria, 87036 Arcavacata di Rende, Italy (e-mail: fanny.spagnolo@unical.it; massimo.vatalaro@unical.it; fabio.frustaci@unical.it; felice.crupi@unical.it; p.corsonello@unical.it).

Stefania Perri is with the Department of Mechanical, Energy and Management Engineering, University of Calabria, 87036 Arcavacata di Rende, Italy (e-mail: s.perri@unical.it).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TVLSI.2024.3395302.

Digital Object Identifier 10.1109/TVLSI.2024.3395302

## <span id="page-0-2"></span><span id="page-0-1"></span><span id="page-0-0"></span>I. INTRODUCTION

Ring oscillators (ROs) are well-established circuits widely adopted as basic building blocks in several applications, ranging from hardware security [\[1\],](#page-11-0) [\[2\],](#page-11-1) [\[3\],](#page-11-2) [\[4\],](#page-11-3) [\[5\],](#page-11-4) [\[6\],](#page-11-5) [\[7\]](#page-11-6) to on-chip testing and aging monitoring [\[8\],](#page-11-7) [\[9\],](#page-11-8) [\[10\],](#page-11-9) [\[11\].](#page-11-10) An RO generates an oscillatory signal by cascading an odd number of inverting stages, thus resulting in a highly flexible and compact solution compared with other oscillatory sources, such as *LC* resonators [\[12\],](#page-11-11) [\[13\].](#page-11-12) Such properties are crucial for the abovementioned applications, where ROs are mainly used to measure the effects of manufacturing process variations [\[1\],](#page-11-0) [\[2\],](#page-11-1) [\[3\],](#page-11-2) [\[4\],](#page-11-3) [\[5\]](#page-11-4) or to detect performance degradation [\[7\],](#page-11-6) [\[8\],](#page-11-7) [\[9\],](#page-11-8) [\[10\].](#page-11-9)

<span id="page-0-3"></span>With the rapid expansion of the Internet-of-Things (IoT) network, the need for miniaturized objects embedded with electronics, software, and sensors has increased as well [\[2\].](#page-11-1) In this context, the field-programmable gate array (FPGA) technology has consolidated as one of the most popular realization platforms because of its high flexibility and attractive computation capabilities, especially for the implementation of heterogeneous systems-on-chips (SoCs) that exploit both programmable logic fabric and microprocessors [\[14\].](#page-11-13) In the near future, it is expected that in several application fields, such as smart cities, connected vehicles, healthcare systems, and even data centers, more and more infrastructures will benefit from the synergy between the IoT approach and FPGA technologies. This increase in interconnected devices pushes the demand for protecting information down to the chip level. Indeed, despite the incessant progress in the fabrication of programmable logic devices, security is nowadays a crucial issue. As an example, preventing hard faults due to either external attacks or FPGA performance degradation caused by aging mechanisms is mandatory before enabling such infrastructures in our daily lives. Also, FPGA bitstream encryption is essential for anticounterfeiting purposes. However, the conventional nonvolatile memory (NVM)-based approach for storing the secret key is vulnerable to reverse engineering and side-channel attacks. In this regard, several approaches have gained popularity in the last decade. The most common solutions in the literature include physically unclonable functions (PUFs) for hardware authentication [\[1\],](#page-11-0) [\[2\],](#page-11-1) [\[3\],](#page-11-2) true random number generators (TRNGs) for on-the-fly session key generation [\[4\],](#page-11-3) [\[5\],](#page-11-4) voltage sensors for remote side-channel attack detection [\[7\],](#page-11-6) and on-chip aging estimation circuits [\[9\],](#page-11-8) [\[10\].](#page-11-9) All these methods rely on ROs that have to be efficiently implemented within the FPGA device.

© 2024 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

<span id="page-1-3"></span><span id="page-1-2"></span>Although the functionality of an RO is based on quite simple circuitry, its desired hardware characteristics, and consequently its design, depend strongly on the target application. Just as an example, ROs used to implement a PUF circuit should ideally be insensitive to environmental conditions (e.g., voltage variations) [\[15\],](#page-11-14) which is exactly the opposite of the behavior expected from a voltage sensor [\[7\].](#page-11-6) Often, such designs make use of multiple RO instances, for which ensuring nominally identical frequency behaviors is mandatory [\[16\],](#page-11-15) [\[17\].](#page-11-16) Therefore, achieving a relatively good prediction of the nominal RO frequency as a function of its length, at the design stage, is crucial. However, as is well known, SRAM-based FPGAs rely on programmable logic and configurable routing resources: during the implementation phase, in the absence of specific user constraints, the place and route (P&R) tool explores the available design space relative to the target chip and adopts the default floor-planning strategy to automatically select the most suitable resources, place them onto specific sites, and configure interconnections between logic blocks accordingly. As a result, implementing ROs on such devices with easy-to-tune, predictable, and repeatable behaviors is not trivial.

The above considerations motivate the focus of this work. Furthermore, many previous research works confirm that FPGA-based ROs designed by exploiting lookup table (LUT) primitives require manual P&R in order to achieve reliable and effective implementations [\[7\],](#page-11-6) [\[16\],](#page-11-15) [\[17\],](#page-11-16) [\[18\],](#page-11-17) [\[19\].](#page-11-18) In addition, because its circuitry consists just of chained LUTs and routing resources, the conventional RO design demonstrates poor adaptability to different application requirements and relatively low flexibility in fine-tuning the oscillator behavior. With the aim of overcoming these limitations, this article presents a new design methodology to efficiently deploy ROs on FPGA devices. It uses, in an unconventional manner, the dedicated fast carry chain (CC) resources available within modern FPGA chip families [\[20\],](#page-11-19) [\[21\]](#page-11-20) in place of configurable LUTs. As a result, the automatic P&R is driven by a dedicated interconnection scheme, thus ensuring predictable and repeatable behaviors without any user constraints. To the best of our knowledge, this is the first proposal of using CCs to realize multistage and multiphase ROs. The proposed solution significantly simplifies the design and allows better frequency fine-tuning and higher flexibility in meeting the specific application requirements. A comprehensive study, including hardware characterization, intra-/inter-die analysis, and evaluation of sensitivity to voltage/temperature variations, has been conducted to demonstrate the effectiveness of the proposed RO design methodology. Results highlight that, in addition to the advantages in terms of reduced design complexity, high flexibility, and performance that is independent of the output load, an RO designed as proposed here is cheaper and less energy consuming than the traditional LUT-based counterpart.

<span id="page-1-1"></span>

Fig. 1. LUT-based RO. (a) Design entry. (b) LUT internal structure.

The rest of this article is organized as follows. Section [II](#page-1-0) provides a brief background and overviews the state of the art. Section [III](#page-2-0) introduces the proposed RO design methodology, whereas its mathematical model for frequency estimation is described in Section [IV.](#page-3-0) Section [V](#page-4-0) presents experimental results obtained for ROs of different lengths based on the conventional and new design methodologies. An application case study exploiting ROs based on the proposed design methodology is discussed in Section [VI.](#page-9-0) Finally, conclusions are drawn in Section [VII.](#page-10-0)

## <span id="page-1-6"></span>II. BACKGROUND AND RELATED WORKS

<span id="page-1-4"></span><span id="page-1-0"></span>According to the literature review in [\[22\],](#page-11-21) LUTs are the most commonly adopted resources for implementing ROs on FPGAs. These hardware primitives can be configured at design time to map any *m*-input Boolean function, with *m* depending on the target FPGA technology. In order to implement an *N*-stage RO, multiple LUTs are connected in cascade, as shown in Fig. [1\(a\).](#page-1-1) In such a case, with the generic LUT*i* configured as depicted in Fig. [1\(b\),](#page-1-1) not(I0) is output, regardless of the values assumed by the I1 and I2 inputs. The frequency of the RO combinatorial loop depends on the propagation delay  $\tau_p$  across the overall path, including logic  $\tau_{\text{logic}}$  and net  $\tau_{\text{net}}$  contributions. The latter is, in turn, influenced by the FPGA sites selected for placement, the length of routed interconnections, and the number of pass transistors (PTs) enabled through the programmable switch matrices.
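The inverter configuration described above can be made concrete with a small truth-table sketch. The helper `lut_init` is a hypothetical name introduced here for illustration, and the bit ordering (I0 as the least significant address bit, bit *i* of the mask holding the output for input address *i*) is an assumption that follows the common vendor convention for LUT initialization masks; it is not taken from the article.

```python
def lut_init(func, n_inputs):
    """Return the truth-table mask of an n-input LUT.

    Assumption: bit i of the mask is the LUT output for input address i,
    with I0 as the least significant address bit.
    """
    mask = 0
    for addr in range(2 ** n_inputs):
        bits = [(addr >> k) & 1 for k in range(n_inputs)]  # bits[0] is I0
        if func(*bits):
            mask |= 1 << addr
    return mask


# Inverting stage: output = not(I0); I1 and I2 are don't-care inputs.
inverter_mask = lut_init(lambda i0, i1, i2: not i0, 3)
print(f"{inverter_mask:#04x}")  # 0x55
```

The mask has a 1 in every even address, which is exactly the statement that the output follows not(I0) whatever I1 and I2 are.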

<span id="page-1-8"></span><span id="page-1-7"></span><span id="page-1-5"></span>One of the major challenges in designing these architectures lies in the fact that the layout obtained by the P&R phase is not known a priori, unless specific user constraints are applied. The automatic P&R floor-planning strategy exploits complex heuristic searches aimed at identifying a limited design space with balanced characteristics in terms of delay and interconnection density [\[24\].](#page-11-22) Interfering with this process in order to drive the P&R toward a desired and predictable implementation is not easy to put into practice. The complexity of the problem significantly increases for those applications that involve the usage of multiple identical ROs [\[1\],](#page-11-0) [\[4\],](#page-11-3) [\[7\],](#page-11-6) [\[8\],](#page-11-7) [\[9\],](#page-11-8) [\[10\],](#page-11-9) [\[11\],](#page-11-10) running at a specific frequency and in conjunction with surrounding logic circuitry [\[25\].](#page-11-23) In such a case, specific placement constraints that assign RO resources to a locked region within the FPGA chip could be used in order to avoid undesired packing of multiple LUTs. However, such a strategy may not be enough. Indeed, neither the position of the generic LUT*i* nor the configured interconnections are locked within the constrained region, potentially leading to different RO frequencies at each new implementation run. As a result, the design of LUT-based ROs with predictable and repeatable frequency behaviors must necessarily pass through manual P&R [\[7\],](#page-11-6) [\[16\],](#page-11-15) [\[17\],](#page-11-16) [\[18\],](#page-11-17) [\[19\].](#page-11-18) Moreover, given that the RO output is usually retrieved by tapping the combinatorial loop, the propagation delay  $\tau_p$  is also influenced by the load capacitance [\[26\],](#page-11-24) [\[27\],](#page-11-25) [\[28\].](#page-11-26) This aspect represents a further challenge for many applications. Just as an example, in the case of PUFs, multiple ROs must have identical nominal characteristics, as well as marginal and balanced contributions due to the load capacitance, so that the small frequency difference due to the process variations can be detected adequately [\[26\].](#page-11-24)

<span id="page-2-1"></span>

Fig. 2. State-of-the-art ROs. (a) IOBUF-based. (b) DSP-based. (c) Mux-based.

<span id="page-2-3"></span><span id="page-2-2"></span>In the recent past, various alternatives to conventional LUT-based oscillators have been demonstrated [\[8\],](#page-11-7) [\[22\],](#page-11-21) [\[29\],](#page-11-27) [\[30\],](#page-11-28) [\[31\].](#page-11-29) Burgiel et al. [\[29\]](#page-11-27) proposed to use input/output buffer (IOBUF) primitives available in Xilinx FPGAs inside the loop of a conventional LUT-based multistage RO. As illustrated in Fig. [2\(a\),](#page-2-1) this proposal uses the IOBUF element to drive the I/O pad pin, thus allowing the RO frequency to be tuned by changing the drive strength and slew rate. However, the usage of an IOBUF element makes the RO more sensitive to temperature variations, still requires manual P&R, and does not allow granular fine-tuning of the RO frequency. In fact, changing the slew rate of the IOBUF from SLOW to FAST introduces a scaling of approximately 3–10 MHz that cannot be modulated (similar considerations arise for the changes of the drive strength).

All other alternative proposals discuss the realization of single-stage oscillators. Some of them [\[8\],](#page-11-7) [\[30\],](#page-11-28) [\[31\]](#page-11-29) implement noncombinatorial loops by using sequential elements. Their main advantages are compactness and simplicity of the circuitry that, in its most basic design, consists of just one feedback latch/flip-flop. However, the effectiveness of this architecture has been successfully demonstrated just for TRNG applications [\[30\],](#page-11-28) where the metastability produced by the feedback latch is exploited as an entropy source. La et al. [\[22\],](#page-11-21) instead, evaluated the possibility of implementing combinatorial loop ROs by using digital signal processing (DSP) and Mux resources available within modern FPGA devices, as schematized in Fig. [2\(b\)](#page-2-1) and [\(c\).](#page-2-1) Both these schemes rely on dedicated resources that reduce the design effort mainly because of their compactness, but, as a drawback, they are suitable only for the implementation of single-stage ROs having a fixed frequency. On the contrary, the design methodology proposed in this article exploits CCs in a more effective manner and enables the realization of multistage ROs whose behavior can be easily configured to comply with the requirements of the target application.

## III. PROPOSED RO DESIGN

<span id="page-2-0"></span>Nowadays, CCs are available to various extents in most FPGA devices produced by many vendors [\[20\],](#page-11-19) [\[21\].](#page-11-20) They consist of hard-wired resources specialized for efficient on-chip implementation of arithmetic operations, such as additions and multiplications. A CC typically consists of *k* internal stages of cascaded multiplexers that, in combination with auxiliary XOR gates, implement as many full adders, each exploiting the basic 1-bit carry look-ahead logic. Most importantly, CCs are arranged regularly within the chip, so that longer chains can be implemented by cascading multiple instances through dedicated routing. Our proposal aims at exploiting this unique property for the implementation of ROs. Indeed, during the automatic P&R step, the position of the first CC can be used as the "anchor point" for the whole design; then, both cascaded CCs and nets are placed and routed through a delay-deterministic scheme, thus avoiding the need for manual P&R.

According to the above consideration, the design illustrated in Fig. [3](#page-3-1) (in the following, named CC\_ROv1) is here proposed as the basic configuration to enable the CC unit to operate as an oscillator. Without loss of generality, the architecture includes *x* CCs, each composed of  $k = 4$  internal stages. In order to enable the propagation of the oscillating signal along the circuit, all the internal stages must be used for each CC, except for the last one, where the designer can choose the convenient number of stages to be exploited. To this end, the selectors of the multiplexers (MXs), except the first one, are set to constant values following the scheme shown in Fig. [3.](#page-3-1) Thus, while the MXs in the even positions propagate the signal coming from the previous multiplexer, those in the odd positions transfer onto the carry line the signal coming from the XOR gate in the previous stage. In this way, a chain of XOR gates, each acting as an inverter stage, is formed. Finally, to make the number of inverting stages odd, the first LUT, highlighted in gray, maps a NAND function: the low-to-high transition of the *en* signal triggers the first inversion through the LUT, which is then propagated to the XOR gate in order to produce  $O_0$ . It is important to note that the function mapped within the first LUT has to be chosen based on the configuration of the other components (e.g., selectors and inputs of the multiplexers). We verified that such changes in the first LUT content do not significantly affect the behavior of the RO architecture. In the subsequent chain positions, the proposed scheme allows the oscillating signal to be propagated to the next stage by alternating the outputs from the XOR gate and the multiplexer, respectively. As a result, the CC\_ROv1 configuration actually includes  $y = 2x + 1$  inverting stages, highlighted in gray in Fig. [3.](#page-3-1)

<span id="page-3-1"></span>

Fig. 3. Proposed CC-based multistage RO design (CC\_ROv1 configuration). Inverting stages are realized through the XOR gates and the LUT highlighted in gray. Shadowed LUTs are exploited to permanently set the selectors of corresponding multiplexers. Forward propagation of the oscillation signal relies on the chain of MX elements.

<span id="page-3-2"></span>

Fig. 4. Proposed CC-based multistage RO design. (a) CC\_ROv2. (b) CC\_ROv3. Additional LUTs are used as pass-through elements to enable fine-tuning of the RO characteristics.  $O'_i$  signals are delayed copies of  $O_i$ .

From the schematic of Fig. [3,](#page-3-1) it can be observed that the remaining LUTs are not necessary to produce the oscillation, but they could be exploited as pass-through elements to fine-tune the RO period. In such a case, the external multiplexers aligned to the CC even positions can be used to select the corresponding LUT outputs, as depicted in Fig. [4.](#page-3-2) This choice expands the design space, leading to the new configurations CC\_ROv2 and CC\_ROv3, where one or two LUTs per CC are enabled, respectively. As analyzed in depth in the following, the proposed configurations exhibit different characteristics in terms of oscillation frequency and sensitivity to voltage/temperature variations.
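As a rough illustration of how the three configurations scale, the sketch below derives the stage and LUT counts from the number of CCs. It uses $y = 2x + 1$ from the text and counts one NAND LUT plus zero, one, or two pass-through LUTs per CC; this counting matches the LUT figures listed in Table III for $y = 3, 5, 9, 17$, but extending it to other lengths is an assumption. The function name `cc_ro_resources` is hypothetical.

```python
def cc_ro_resources(x, version):
    """Estimate resources of a CC-based RO built from x fully used CCs.

    version: 1, 2, or 3 for CC_ROv1, CC_ROv2, CC_ROv3.
    Assumption: LUT count = 1 (first NAND LUT) + (version - 1) pass-through
    LUTs per CC, as suggested by the article's Table III.
    """
    if x < 1:
        raise ValueError("at least one carry chain is required")
    if version not in (1, 2, 3):
        raise ValueError("version must be 1, 2, or 3")
    return {
        "inverting_stages": 2 * x + 1,   # y = 2x + 1
        "carry_chains": x,               # one slice per CC
        "luts": 1 + (version - 1) * x,
    }


for version in (1, 2, 3):
    print(version, cc_ro_resources(8, version))
```

For `x = 8` this reproduces the 17-stage rows of Table III (1, 9, and 17 LUTs on 8 slices).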

Besides the interesting properties mentioned above, as shown in Fig. [3,](#page-3-1) the proposed scheme can provide the oscillating output *RO out* through one or more of the unused XOR gates, such as that at the position  $kx - 1$ , thus avoiding interference with the interconnect loop. This allows realizing efficient multiphase oscillators suitable for applications that need load-independent RO frequencies [\[1\],](#page-11-0) [\[26\],](#page-11-24) [\[28\].](#page-11-26)

## IV. MODELING THE PROPAGATION DELAY

<span id="page-3-0"></span>In this section, we introduce the mathematical model for the frequency estimation of the proposed RO scheme. Let us consider the path involved in the combinatorial loop of the CC\_ROv1 configuration, as highlighted in blue in Fig. [5.](#page-4-1) Then, the RO frequency  $f_{\text{RO}}$  can be computed as  $1/(2\tau_p)$  by applying [\(1\)](#page-3-3) and [\(2\)](#page-3-4) for modeling  $\tau_p$ . Here,  $\tau_{i\rightarrow j}$  represents the delay contribution associated with the generic segment  $i \rightarrow j$  in the path,  $\tau_{\text{LUT}}$  is the LUT access delay, and  $\tau_{\text{loop}}$  is the interconnection delay due to the net named *loop* in Fig. [5:](#page-4-1)

<span id="page-3-4"></span><span id="page-3-3"></span>
$$
\tau_p = \tau_{\text{LUT}} + \tau_{S_0 \to O_0} + \tau_d + (x-1)\left(\tau_{\text{CO}_3 \to \text{CI}} + \tau_{\text{CI} \to O_0} + \tau_d\right) + \tau_{\text{loop}} \quad (1)
$$

$$
\tau_d = \tau_{O_0 \to B_1} + \tau_{B_1 \to O_2} + \tau_{O_2 \to B_3} + \tau_{B_3 \to \text{CO}_3}. \quad (2)
$$

According to the adopted FPGA technology, the delay switching characteristics at the nominal conditions are provided by the vendor for most of the contributions highlighted in Fig. [5,](#page-4-1) except  $\tau_{O_0 \to B_1}$ ,  $\tau_{O_2 \to B_3}$ , and  $\tau_{\text{loop}}$ . Table [I](#page-4-2) summarizes the values of each contribution  $\tau_{i \to j}$ , with reference to the FPGA chips belonging to the Xilinx Artix-7 (speed grade −1) family; a similar approach could be adopted with different devices. The four examined cases, in the following, named fast–fast (FF), fast–slow (FS), slow–fast (SF), and slow–slow (SS), take into account, respectively, the process corner (first letter) and the standard delay format (SDF) adopted for delay modeling (second letter).

Table [II,](#page-4-3) instead, reports the  $\tau_{O_0 \to B_1}$  and  $\tau_{O_2 \to B_3}$  delays related to the specific tracks illustrated in Fig. [5.](#page-4-1) Although related to programmable routing, such contributions can be considered as constants, regardless of the RO length, since the start point and the endpoint of the interconnection are fixed by the adopted architecture, as it will be detailed later.
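A minimal sketch of how (1) and (2) could be evaluated numerically is given below. The dictionary reproduces the Table I values (in ns); the two intraslice routing delays and $\tau_{\text{loop}}$ are passed as arguments because their numeric values are not repeated here, and the values used in the usage lines are purely hypothetical placeholders. Function and key names are illustrative choices, not identifiers from the article.

```python
# Table I of the article: nominal delays in ns (Artix-7, speed grade -1).
TABLE_I_NS = {
    "FF": {"LUT": 0.045, "S0_O0": 0.060, "B1_O2": 0.110,
           "B3_CO3": 0.088, "CO3_CI": 0.009, "CI_O0": 0.054},
    "FS": {"LUT": 0.056, "S0_O0": 0.079, "B1_O2": 0.186,
           "B3_CO3": 0.140, "CO3_CI": 0.009, "CI_O0": 0.085},
    "SF": {"LUT": 0.100, "S0_O0": 0.170, "B1_O2": 0.358,
           "B3_CO3": 0.248, "CO3_CI": 0.009, "CI_O0": 0.150},
    "SS": {"LUT": 0.124, "S0_O0": 0.223, "B1_O2": 0.554,
           "B3_CO3": 0.385, "CO3_CI": 0.009, "CI_O0": 0.235},
}


def tau_d_ns(corner, tau_o0_b1, tau_o2_b3):
    """Eq. (2): delay from O0 to CO3 inside one fully used CC (ns)."""
    t = TABLE_I_NS[corner]
    return tau_o0_b1 + t["B1_O2"] + tau_o2_b3 + t["B3_CO3"]


def tau_p_ns(x, corner, tau_o0_b1, tau_o2_b3, tau_loop):
    """Eq. (1): loop propagation delay of CC_ROv1 with x CCs (ns)."""
    if x < 1:
        raise ValueError("x must be at least 1")
    t = TABLE_I_NS[corner]
    d = tau_d_ns(corner, tau_o0_b1, tau_o2_b3)
    first_cc = t["LUT"] + t["S0_O0"] + d
    other_ccs = (x - 1) * (t["CO3_CI"] + t["CI_O0"] + d)
    return first_cc + other_ccs + tau_loop


def f_ro_mhz(tau_p):
    """f_RO = 1 / (2 * tau_p), with tau_p in ns and the result in MHz."""
    return 1e3 / (2.0 * tau_p)


# Hypothetical routing and loop delays (ns), for illustration only.
tp = tau_p_ns(x=2, corner="FF", tau_o0_b1=0.15, tau_o2_b3=0.15, tau_loop=0.33)
print(f"tau_p = {tp:.3f} ns, f_RO = {f_ro_mhz(tp):.1f} MHz")
```

Keeping the routing delays as explicit arguments makes it clear which terms come from vendor switching characteristics and which must be extracted from the implemented design.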

<span id="page-4-1"></span>

Fig. 5. Propagation delay path for the CC\_ROv1 design. Red labels identify the interconnection segments as reported in the technical documentation for delay estimation purposes.

<span id="page-4-2"></span>TABLE I NOMINAL DELAY SWITCHING CHARACTERISTICS (XILINX ARTIX-7 SPEED GRADE −1)

| $\tau_{i \to j}$                   | FF (ns) | FS (ns) | SF (ns) | SS (ns) |
|------------------------------------|---------|---------|---------|---------|
| $\tau_{S_0 \to O_0}$               | 0.060   | 0.079   | 0.170   | 0.223   |
| $\tau_{B_1 \to O_2}$               | 0.110   | 0.186   | 0.358   | 0.554   |
| $\tau_{B_3 \to \text{CO}_3}$       | 0.088   | 0.140   | 0.248   | 0.385   |
| $\tau_{\text{CO}_3 \to \text{CI}}$ | 0.009   | 0.009   | 0.009   | 0.009   |
| $\tau_{\text{CI} \to O_0}$         | 0.054   | 0.085   | 0.150   | 0.235   |
| $\tau_{\text{LUT}}$                | 0.045   | 0.056   | 0.100   | 0.124   |

<span id="page-4-3"></span>TABLE II INTRASLICE ROUTING DELAY (XILINX ARTIX-7 SPEED GRADE −1)

It is worth noting that the  $\tau_{\text{loop}}$  delay depends not only on how many stages are involved in the oscillator, but also on the interconnection length and the number of pass transistors crossed along the loop path. Without loss of generality, the dependence between  $\tau_{\text{loop}}$  and the number of inverting stages *y* is here derived for the proposed RO design by adopting a procedure similar to [\[32\].](#page-11-30) Fig. [6](#page-4-4) and [\(3\)](#page-4-5) report the results of fitting the delay extracted at different corners for the Xilinx Artix-7 (speed grade −1) FPGA devices. As expected, the loop contribution grows as *y* increases, because of the longer interconnection, with the fast process corner exhibiting less steep changes:

<span id="page-4-6"></span><span id="page-4-5"></span>
$$
\begin{aligned}
\tau_{\text{loop}}(y)|_{\text{FF}} &= 0.009y + 0.285 \\
\tau_{\text{loop}}(y)|_{\text{FS}} &= 0.011y + 0.343 \\
\tau_{\text{loop}}(y)|_{\text{SF}} &= 0.018y + 0.662 \\
\tau_{\text{loop}}(y)|_{\text{SS}} &= 0.021y + 0.806.
\end{aligned} \quad (3)
$$

The above model has been validated by comparing its predictions with results obtained from postimplementation timing reports, for several CC\_ROv1 samples, with *y* ranging from 3 to 33. On average, the achieved error is lower than 0.25%. It is worth noting that the model discussed in this section relies on production-level device specifications furnished by the manufacturer. To provide high accuracy, such parameters are released once enough production silicon of a particular device family member has been characterized; thus, no significant risk of underestimating the delays exists.
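The linear fits in (3) are straightforward to tabulate. The sketch below encodes the four slope/intercept pairs and evaluates the loop delay at the two ends of the validated range ($y = 3$ and $y = 33$). The results are assumed to be in ns, consistent with the units of Table I; `LOOP_FIT` and `tau_loop_ns` are hypothetical names.

```python
# (slope, intercept) of the fits in eq. (3); delays assumed to be in ns.
LOOP_FIT = {
    "FF": (0.009, 0.285),
    "FS": (0.011, 0.343),
    "SF": (0.018, 0.662),
    "SS": (0.021, 0.806),
}


def tau_loop_ns(y, corner):
    """Loop interconnection delay for y inverting stages at a given corner."""
    slope, intercept = LOOP_FIT[corner]
    return slope * y + intercept


for y in (3, 33):
    row = {corner: round(tau_loop_ns(y, corner), 3) for corner in LOOP_FIT}
    print(y, row)
```

Comparing the two rows shows the trend stated in the text: going from 3 to 33 stages adds about 0.27 ns at the FF corner and about 0.63 ns at the SS corner.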

<span id="page-4-4"></span>

Fig. 6.  $\tau_{\text{loop}}$  as a function of *y* at different corners.

## V. EXPERIMENTAL RESULTS

### <span id="page-4-0"></span>*A. Hardware Implementation*

The CC- and LUT-based designs with various lengths have been implemented by using the Vivado 2018.3 Development Tool. As an example, Fig. [7](#page-5-0) illustrates the layout obtained for the proposed CC\_ROv1 configuration with  $y = 5$  when the Xilinx xc7a100tcsg324-1 FPGA is selected as the target device. In all analyzed cases, using the same hardware description language coding, the P&R tool operates without any additional manual constraints and achieves a predictable routing scheme. This property comes from the carry chain itself, which self-constrains both the placement and the routing paths between the internal nodes of the oscillator architecture. On the contrary, conventional LUT-based designs require the designer to manually place and route each stage of the oscillator, using a regular distribution over consecutive slices. In particular, from Fig. [7,](#page-5-0) it can be noted that, while the connection between adjacent CCs is based on hard-wired routing (blue line), the interconnections between the XOR gates and the multiplexers rely on fast neighborhood routing tracks and proceed through identical paths across multiple CC stages (purple lines). On the other hand, interconnections external to the CCs, i.e.,  $O_0 \rightarrow B_1$ ,  $O_2 \rightarrow B_3$ , and the *loop*, rely on the programmable routing. However, as visible in Fig. [7,](#page-5-0) such nets connect neighboring elements through a substantially (and automatically) constrained path.

<span id="page-5-0"></span>

Fig. 7. Layout obtained by P&R of the CC\_ROv1 design  $(y = 5)$ . The inset reports the PIP locations configured to route the  $O_0 \rightarrow B_1$  interconnection. PIPs (1)–(3) and (6)–(8) have fixed locations.

This happens because input and output signals of the CC are routed to specific nodes of the switch box that are predetermined by the topology of the architecture. Just as an example, the  $O_0 \rightarrow B_1$  routing is always configured through the programmable interconnect point (PIP) junctions (1)–(3) and (6)–(8) for input and output connections, respectively. The remaining PIPs (4) and (5) are actually the only programmable junction points. However, under default routing order rules, the P&R tool selects them as preferred waypoints. In very congested layouts, as is common practice, a physical block (PBlock) that prevents other nets from being routed through the switch boxes used by the oscillator could be easily exploited.

Table [III](#page-5-1) summarizes nominal frequency at the four corners, area occupancy, and energy per transition (EPT) of the proposed RO designs. The latter exhibits different energy/area characteristics at a given *y*, depending on the number of additional LUTs used in the various configurations. In general, the higher the number of stages of the RO, the higher the cost to fine-tune the behavior of the oscillator. Just as an example, the energy overheads of the CC\_ROv2 configuration with respect to CC\_ROv1 one are 26%, 34.8%, and 38.79%, for  $y = 5$ ,  $y = 9$ , and  $y = 17$ , respectively. Similarly, when moving from the CC\_ROv2 to the CC\_ROv3 design, the 22.2%, 25.4%, and 28.2% energy overheads must be payed for the  $y = 5$ ,  $y = 9$ , and  $y = 17$  lengths, respectively. Table [III](#page-5-1) also compares the new ROs with the LUT-based counterparts considering different lengths. At a glance, it can be observed that the proposed CC-based ROs reduce both the amount of occupied slices and the energy consumption with respect to the LUT-based counterparts at similar frequencies. Just as an example, the  $CC_ROv2$  ( $y = 5$ ) design runs at a frequency close to the nine-stage LUT-based RO, but it uses 33.3% less slices and dissipates 13% lower energy. Overall, the slice and energy savings exhibited by the proposed CC\_ROv1 and CC\_ROv2 over the LUT-based implementations span, respectively, from 11.1% to 50% and from 13% to 44%. On the other hand, the CC\_ROv3 circuits allow expanding the space of possible frequencies without requiring additional slices, in contrast to the conventional RO designs. To achieve

<span id="page-5-1"></span>TABLE III IMPLEMENTATION RESULTS (XILINX XC7A100TCSG324-1). FREQUENCY VALUES IN MHZ

|        |          | $f_{FF}$ | $f_{FS}$ | $f_{SF}$ | $f_{SS}$ | LUTs, Slices<sup>1</sup> | EPT (pJ) |
|--------|----------|----------|----------|----------|----------|--------------------------|----------|
|        | CC\_ROv1 | 517      | 394      | 202      | 155      | 1, 1                     | 0.70     |
| $y=3$  | CC\_ROv2 | 423      | 325      | 168      | 130      | 2, 1                     | 0.84     |
|        | CC\_ROv3 | 372      | 287      | 142      | 100      | 3, 1                     | 0.96     |
|        | CC\_ROv1 | 311      | 233      | 120      | 90       | 1, 2                     | 1.00     |
| $y=5$  | CC\_ROv2 | 251      | 187      | 97       | 74       | 3, 2                     | 1.26     |
|        | CC\_ROv3 | 220      | 162      | 80       | 55       | 5, 2                     | 1.54     |
|        | CC\_ROv1 | 173      | 128      | 66       | 49       | 1, 4                     | 1.58     |
| $y=9$  | CC\_ROv2 | 138      | 101      | 51       | 40       | 5, 4                     | 2.13     |
|        | CC\_ROv3 | 121      | 87       | 43       | 29       | 9, 4                     | 2.67     |
|        | CC\_ROv1 | 92       | 68       | 35       | 26       | 1, 8                     | 2.81     |
| $y=17$ | CC\_ROv2 | 73       | 52       | 27       | 21       | 9, 8                     | 3.90     |
|        | CC\_ROv3 | 64       | 45       | 22       | 15       | 17, 8                    | 5.00     |
|        | $N=5$    | 469      | 385      | 197      | 161      | 6, 2                     | 1.25     |
| LUT    | $N=9$    | 265      | 218      | 111      | 91       | 10, 3                    | 1.45     |
|        | $N=17$   | 159      | 131      | 67       | 57       | 18, 5                    | 2.30     |
|        | $N=35$   | 79       | 65       | 33       | 27       | 36, 9                    | 4.50     |
|        | $N=43$   | 65       | 56       | 27       | 22       | 44, 11                   | 5.57     |

<sup>1</sup> The number of slices coincides with the number of CCs for the proposed implementations.

a frequency around 65 MHz at the FF corner, for instance, a 43-stage LUT-based RO would occupy 11 slices and consume 5.57 pJ, which is 37.5% more area and 11.4% more energy than the proposed CC\_ROv3 configuration with $y = 17$.
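The percentages quoted in this comparison follow directly from the entries of Table III. The short Python sketch below recomputes a few of them; the dictionary layout and the helper names `overhead_pct` and `saving_pct` are illustrative choices and not part of the original work.

```python
# EPT values (pJ) copied from Table III for the proposed designs.
EPT = {
    ("CC_ROv1", 5): 1.00, ("CC_ROv2", 5): 1.26, ("CC_ROv3", 5): 1.54,
    ("CC_ROv1", 9): 1.58, ("CC_ROv2", 9): 2.13, ("CC_ROv3", 9): 2.67,
    ("CC_ROv1", 17): 2.81, ("CC_ROv2", 17): 3.90, ("CC_ROv3", 17): 5.00,
}


def overhead_pct(new, ref):
    """Relative increase of `new` over `ref`, in percent."""
    return 100.0 * (new - ref) / ref


def saving_pct(new, ref):
    """Relative reduction of `new` with respect to `ref`, in percent."""
    return 100.0 * (ref - new) / ref


for y in (5, 9, 17):
    v1, v2, v3 = (EPT[(name, y)] for name in ("CC_ROv1", "CC_ROv2", "CC_ROv3"))
    # Energy overhead when moving v1 -> v2 and v2 -> v3 at the same length y.
    print(y, round(overhead_pct(v2, v1), 1), round(overhead_pct(v3, v2), 1))

# CC_ROv2 (y = 5) versus the nine-stage LUT-based RO: 2 vs 3 slices, 1.26 vs 1.45 pJ.
print(round(saving_pct(2, 3), 1), round(saving_pct(1.26, 1.45), 1))

# 43-stage LUT-based RO versus CC_ROv3 (y = 17): 11 vs 8 slices, 5.57 vs 5.00 pJ.
print(round(overhead_pct(11, 8), 1), round(overhead_pct(5.57, 5.00), 1))
```

The same two helpers can be applied to any other pair of rows in Table III to check a claimed saving at a matching frequency.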

## *B. Test Setup*

<span id="page-5-2"></span>In the next subsections, to characterize the behavior of both CC- and LUT-based ROs and to assess the general validity of the proposed methodology, we report the results of experimental tests performed on 28-nm CMOS Artix-7 xc7a100tcsg324-1, 28-nm CMOS Zynq xc7z045ffg900-2, 40-nm CMOS Virtex-6 xc6vlx240tffg1156-1, and 16-nm FinFET Ultrascale+ xck26sfvc784-2lvc devices. To this purpose, a Tektronix MSO64 mixed signal oscilloscope (2.5 GHz) was used to capture and analyze the output signals. A Keithley 2280S-60-3 precision measurement dc supply was used to power the devices under test, while also monitoring the current drawn by the FPGA boards. For a reliable analysis, frequency measurements were repeated for at least $10^5$ cycles; then, the average values and the standard deviations were recorded, thus avoiding artifacts due to stochastic fluctuations. An oscilloscope screenshot is depicted in Fig. [8.](#page-6-0) In this case, the configuration under test is the CC\_ROv1 with $y = 33$, obtained by cascading $x = 16$ CCs. Two phase-shifted output signals are retrieved from the XOR gates in the 31st and 63rd positions within the chain, as shown in Fig. [8\(c\).](#page-6-0) As expected, these signals have the same frequency, with a mean value $\mu$ around 29.7 MHz and a standard deviation $\sigma$ of ~10 kHz. The histogram plot in Fig. [8\(b\)](#page-6-0) reports the Gaussian frequency distribution retrieved from 115 080 repeated measurements, which achieves a σ/µ of ∼0.03%. The latter is in line with the literature [\[33\]](#page-11-31) for conventional LUT-based ROs. In the analyzed case, the time difference between the two output signals is $\sim$7.6 ns, and the histogram plot in Fig. [8\(a\)](#page-6-0) illustrates the

<span id="page-6-0"></span>

Fig. 8. Oscilloscope capture during the acquisition of multiphase outputs from the CC\_ROv1 (*y* = 33) configuration. (a) Skew distribution. (b) Frequency distribution. (c) Plot of the 31st and 63rd output signals.

statistical skew analysis of the proposed multiphase oscillator. It is worth noting that the model extracted in Section [IV](#page-3-0) could also be easily exploited to estimate the skew in the case of multiphase ROs based on the proposed design. With reference to the CC\_ROv1 ($y = 33$) configuration under analysis, since the distance between the two output signals consists of eight CCs, the time shift can be modeled as $8(\tau_{\text{CO}_3 \rightarrow \text{CI}} + \tau_{\text{CI} \rightarrow \text{O}_0}) + 7\tau_d + 3\tau_{\text{MX}}$, with $\tau_{\text{MX}}$ being ~30 ps.
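As a quick numerical check of the figures reported in this subsection, the following Python sketch computes the relative frequency spread and converts the measured skew into a phase angle. It also wraps the skew expression in a function. Generalizing the expression from eight CCs to `n_cc` CCs with `n_cc - 1` interconnect terms is an assumption made here for illustration, and all function and parameter names are hypothetical; the delay terms other than $\tau_{\text{MX}}$ must be taken from the model of Section IV.

```python
def relative_sigma_pct(sigma_hz, mu_hz):
    """Relative standard deviation sigma/mu, in percent."""
    return 100.0 * sigma_hz / mu_hz


def skew_to_phase_deg(skew_s, freq_hz):
    """Phase shift (degrees) corresponding to a time skew at a given frequency."""
    return 360.0 * skew_s * freq_hz


def skew_model(n_cc, tau_co3_ci, tau_ci_o0, tau_d, tau_mx=30e-12):
    """Time shift between two taps separated by n_cc carry chains.

    For n_cc = 8 this reduces to the expression in the text:
    8*(tau_co3_ci + tau_ci_o0) + 7*tau_d + 3*tau_mx.
    The (n_cc - 1) factor for other distances is an assumption.
    """
    return n_cc * (tau_co3_ci + tau_ci_o0) + (n_cc - 1) * tau_d + 3 * tau_mx


# Measured values quoted in the text for CC_ROv1 (y = 33).
print(relative_sigma_pct(10e3, 29.7e6))    # sigma/mu in percent
print(skew_to_phase_deg(7.6e-9, 29.7e6))   # phase between the two outputs
```

With the measured skew of about 7.6 ns and a period of roughly 33.7 ns, the two outputs are a little less than a quarter of a period apart.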

## *C. Temperature Sensitivity*

As is well known, temperature variations have a twofold impact on MOSFET devices, with the carrier mobility and the threshold voltage decreasing as the temperature increases due to the scattering and to the Fermi level and bandgap energy shift, respectively. At the device level, such mechanisms coexist, impacting the current in opposite directions. However, while in static CMOS gates operating in the above-threshold region the first one is dominant, thus leading to an increase of the delay as the temperature increases, in PT logic circuits they balance out quite differently. In such a case, the carrier mobility reduction is significantly counteracted by the lower threshold voltage, which, in turn, leads to a propagation delay reduction. This behavior is expected to be much more evident when ROs are realized on FPGA platforms, which make massive use of PT logic circuits [\[34\],](#page-11-32) [\[35\].](#page-11-33) By transistor-level simulations performed using a standard CMOS process technology, we investigated the effects of the coexistence of PT and static CMOS logic stages, which typically occurs in the target platform. Although based on some speculations on the FPGA internal architecture [\[36\],](#page-11-34) simulation results show that, in such a case, the temperature influences the RO frequency quite differently than in traditional static CMOS circuits. Fig. [9](#page-6-1) illustrates the implementation of the single LUT [\[36\]](#page-11-34) adopted

<span id="page-6-1"></span>

Fig. 9. Evaluation of mixed PTL and static CMOS logic. (a) Referenced LUT architecture. (b) RO frequency over temperature changing.

in the simulations and the plots of the RO frequency versus temperature. It can be observed that, due to the massive usage of PT stages within the FPGA fabric, the effect of the threshold voltage initially prevails. Then, as the temperature increases, the two phenomena tend to balance each other out, until the carrier mobility reduction becomes the dominant effect. Moreover, our simulations verified that increasing the number of PT stages in the RO path moves the local maximum toward higher temperatures.

<span id="page-6-3"></span><span id="page-6-2"></span>In order to evaluate the impact of temperature on the analyzed ROs, we performed a measurement campaign using the ACS DY16-T climatic chamber, varying the temperature from 5 °C to 75 °C. The acquisition of the RO outputs was always performed after the thermal transient had concluded. To this purpose, the die temperature was monitored through the internal sensor and the precision measurement dc supply, which also allowed verifying that the standard deviation of the absorbed current was below 10 nA. Besides the LUT-based reference designs, the CC\_ROv1, CC\_ROv2, and CC\_ROv3 configurations have been characterized for $y = 3, 5, 9$, and 17 at the temperatures $T = 5$ °C, 25 °C, 50 °C, and 75 °C.

<span id="page-7-0"></span>

Fig. 10. Frequency variation measured for the CC- and LUT-based ROs under different operating temperatures (xc7a100tcsg324-1 FPGA device). Subplots refer to different RO lengths:  $y = 3$  (top left),  $y = 5$  (top right),  $y = 9$  (bottom left), and  $y = 17$  (bottom right). Reference LUT-based counterparts make use of 5, 9, 17, and 35 inverting stages, respectively.

<span id="page-7-2"></span>TABLE IV FREQUENCY MEASUREMENTS (MHZ) UNDER DIFFERENT TEMPERATURE CONDITIONS AND FPGA SLICE TYPES (CC\_ROv2, $y = 17$)

|        | $T = 5^{\circ}C$ | $T=25^{\circ}C$ | $T = 50^{\circ}C$ | $T=75^{\circ}C$ | $\mathrm{VI}_{temp}(\%)$ |
|--------|------------------|-----------------|-------------------|-----------------|--------------------------|
| Down-L | 46.22            | 46.29           | 46.34             | 46.25           | 0.102                    |
| Down-M | 46.44            | 46.49           | 46.49             | 46.38           | 0.083                    |
| Up-L   | 44.6             | 44.61           | 44.58             | 44.41           | 0.226                    |

Fig. [10](#page-7-0) plots the frequency variation normalized to the 25 °C condition (i.e., $100 \times ((f_T - f_{25})/f_{25})$) for ROs having similar nominal frequency. It can be observed that, in general, the CC\_ROv1 and LUT-based designs exhibit quite similar trends, reaching the plateau around 25 °C and then decreasing their frequency with somewhat different variability indexes ($VI_{temp}$). The latter is here defined as in [\(4\)](#page-7-1) and provides a synthetic metric for evaluating how, in the ranges of observation, the RO frequency changes as a consequence of the operating temperature with respect to the nominal condition

<span id="page-7-1"></span>
$$
VI_{temp} = 100 \times \frac{S}{f_{25}}, \quad S = \sqrt{\frac{\sum_{T} (f_T - f_{25})^2}{4}}.
$$
 (4)
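The metric in (4) can be evaluated directly from a set of frequency measurements. The Python sketch below is a minimal illustration applied to the Down-L row of Table IV; it assumes that the denominator 4 in (4) corresponds to the four temperature points of the campaign, and the function names are hypothetical. Because the table entries are rounded, the recomputed value can differ slightly in the last digit from the tabulated one.

```python
import math


def normalized_variation_pct(f_t, f_25):
    """Frequency variation normalized to the 25 C condition, in percent."""
    return 100.0 * (f_t - f_25) / f_25


def vi_temp_pct(freqs_by_temp, t_nominal=25):
    """Variability index of (4): 100 * S / f_25.

    S is the root mean square deviation from f_25 over all temperature
    points (four in the measurement campaign, assumed to be the 4 in (4)).
    """
    f_nom = freqs_by_temp[t_nominal]
    sq_sum = sum((f - f_nom) ** 2 for f in freqs_by_temp.values())
    s = math.sqrt(sq_sum / len(freqs_by_temp))
    return 100.0 * s / f_nom


# Down-L placement, CC_ROv2 (y = 17), frequencies in MHz from Table IV.
down_l = {5: 46.22, 25: 46.29, 50: 46.34, 75: 46.25}

print(vi_temp_pct(down_l))
print([normalized_variation_pct(f, down_l[25]) for f in down_l.values()])
```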

As evidence of the fact that more logic and routing resources contribute to the RO frequency, Fig. [10](#page-7-0) highlights that the $VI_{temp}$ increases with the RO length for both the CC\_ROv1 and LUT-based designs. Even more interesting are the results obtained for the proposed CC\_ROv2 and CC\_ROv3 configurations. As shown by the preliminary analysis discussed at the beginning of this section, the synergy between CCs and LUTs could be exploited to modulate the temperature behavior of the RO by using some LUTs as pass-through elements, as shown in Fig. [4.](#page-3-2) In such a case, the additional PT stages are responsible for the flipped trend within the range 25 °C–50 °C. As illustrated in Fig. [10,](#page-7-0) depending on the number of additional pass-through LUTs, the temperature behavior can be fine-tuned to increase or decrease the RO sensitivity with respect to the CC\_ROv1 circuit. Indeed, the CC\_ROv3 configuration exhibits a $VI_{temp}$ up to 0.4% and a sensitivity of 0.049 MHz/°C ($y = 3$ in the range 5 °C–50 °C), which is 4.45 times higher than that of the five-stage LUT-based counterpart. As a consequence, such a design is well suited to those applications that aim at monitoring performance degradation due to the temperature [\[8\],](#page-11-7) [\[9\],](#page-11-8) [\[10\],](#page-11-9) [\[11\].](#page-11-10) Conversely, the CC\_ROv2 configuration achieves the lowest $VI_{temp}$, regardless of the target length, thus becoming an effective candidate for the implementation of circuits requiring high resilience to temperature variations [\[1\],](#page-11-0) [\[2\],](#page-11-1) [\[3\].](#page-11-2)

For a deeper analysis, we also investigated the impact of using different types of slices to implement the proposed ROs. The Xilinx 7 Series FPGA devices, for example, are organized in configurable logic blocks (CLBs) that include two slices for each switch matrix, placed up and down, as shown in Fig. [7.](#page-5-0) Moreover, they include both logic-only slices (type L) and more complex slices that

<span id="page-8-0"></span>

Fig. 11. CC\_ROv2 intradie analysis (bolded red values refer to  $\mu$  in MHz).

can also be used as distributed memory (type M). While the above characterization refers to the usage of L slices positioned at the bottom of the CLBs, Table [IV](#page-7-2) reports the RO frequency measured at different temperature points when the Up-L and Down-M placement schemes are chosen on the xc7a100tcsg324-1 chip. It can be appreciated that, even though the three implementations are located within the same die region, the RO frequencies differ in accordance with the slice type. This is more evident in the Up-L case, since the connections between each slice and the corresponding switch matrix follow different paths.

An intradie analysis has also been carried out by considering the Down-L placement for the proposed CC\_ROv2 configuration within four different FPGA regions (i.e., 1: X0Y3, 2: X1Y2, 3: X1Y1, and 4: X0Y0). Fig. [11](#page-8-0) plots the $\sigma/\mu$ (%) metric as a function of the temperature and points out two important observations. The first is that, as expected, different sites exhibit different absolute RO frequencies. Moreover, each site has its own characteristic under different temperature conditions, which suggests a uniqueness related to the process variations. The second is that, for a given site, the standard deviation $\sigma$ remains almost stable, indicating that temporal fluctuation has little impact on the measured frequency. It is important to underline that this behavior has not been verified for the LUT-based RO, which demonstrated $\sigma$ values up to 36% higher under the same placement and temperature conditions.

In order to investigate the interdie behavior, the new CC\_ROv2 design has been characterized on ten xc7a100tcsg324-1 chips at different temperatures. The resulting statistics are summarized in Fig. [12,](#page-8-1) where the point indicates the mean frequency value, computed by averaging the $\mu$ results from the ten chips, and the bars report the distance with respect to the highest/lowest observed frequency. At a glance, it can be noted that the mean frequency follows a trend similar to that observed so far. The range given by the bars, instead, demonstrates that the RO frequency may vary significantly from die to die as the result of global variations [\[33\].](#page-11-31) At the same time, when looking at different temperature conditions, the frequency range seems to be unchanged, thus suggesting that the temperature equally affects all the chips.
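The way each point and its bars in Fig. 12 are built from per-chip results can be sketched in a few lines of Python. The per-chip frequencies below are purely hypothetical placeholders chosen for illustration (the measured per-chip values are not listed in the text), and the function name is likewise an assumption.

```python
from statistics import mean


def interdie_summary(mu_per_chip):
    """Return (mean, upper bar, lower bar) for one temperature point.

    The point is the average of the per-chip mean frequencies; the bars are
    the distances from that average to the highest and lowest chip.
    """
    center = mean(mu_per_chip)
    return center, max(mu_per_chip) - center, center - min(mu_per_chip)


# HYPOTHETICAL per-chip mean frequencies (MHz) for ten chips at one temperature.
example_chips = [45.8, 46.1, 46.3, 46.0, 46.6, 45.9, 46.2, 46.4, 46.1, 46.6]

center, upper, lower = interdie_summary(example_chips)
print(center, upper, lower)
```

Repeating the call for each temperature and comparing `upper + lower` across temperatures is one way to check whether the die-to-die range stays unchanged, as observed in the text.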

<span id="page-8-1"></span>

Fig. 12. CC\_ROv2 interdie analysis.

<span id="page-8-2"></span>TABLE V  $\sigma_p/\mu_p$  (%) MEASURED UNDER DIFFERENT TEMPERATURE CONDITIONS

|                       | $T = 5^{\circ}C$ | $T=25^{\circ}C$ | $T=50^{\circ}C$ | $T=75^{\circ}C$ |
|-----------------------|------------------|-----------------|-----------------|-----------------|
| CC\_ROv1 ($y=17$)     | 0.113            | 0.107           | 0.102           | 0.101           |
| CC\_ROv2 ($y=17$)     | 0.099            | 0.093           | 0.088           | 0.095           |
| CC\_ROv3 ($y=17$)     | 0.101            | 0.095           | 0.090           | 0.089           |
| LUT-based $(N=35)$    | 0.116            | 0.107           | 0.103           | 0.102           |

Finally, with the aim of generalizing the applicability of the proposed strategy, and to show its thermal behavior on very different process technology nodes, Fig. [13](#page-9-1) plots the temperature sensitivity of the 40-nm CMOS Virtex-6, 28-nm CMOS Artix-7, and 16-nm FinFET Ultrascale+ implementations of the CC\_ROv1 design with $y = 5$. It can be clearly observed that the sensitivity to temperature variations is significantly reduced when advanced technologies, such as the Ultrascale+, are employed. This finding is in accordance with the conclusions reached in previous works [\[38\],](#page-11-35) [\[39\].](#page-11-36)

#### <span id="page-8-4"></span>*D. Jitter Analysis*

<span id="page-8-3"></span>To evaluate the jitter behavior of the proposed ROs, we analyzed the variations of the running period within a certain time interval, i.e., the long-term jitter [\[37\].](#page-11-37) Table [V](#page-8-2) reports the relative standard deviation of the RO period ($\sigma_p/\mu_p$) measured for the proposed circuits at length $y = 17$ and for a 35-stage LUT-based sample, considering the xc7a100tcsg324-1 FPGA device. All the ROs under analysis have shown a Gaussian distribution of the oscillation period, which typically identifies random jitter. In general, it can be observed that the compared designs exhibit quite similar characteristics, with the CC\_ROv2 and CC\_ROv3 configurations slightly reducing the $\sigma_p/\mu_p$ with respect to the LUT-based counterpart, regardless of the temperature.
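The jitter metric of Table V can be reproduced from a list of measured periods, and the table itself can be used to quantify the improvement over the LUT-based RO. The Python sketch below illustrates both steps; the use of the sample standard deviation and the helper names are assumptions made for illustration, since the text does not specify the estimator.

```python
from statistics import mean, stdev


def relative_period_jitter_pct(periods):
    """Relative standard deviation of the period, sigma_p / mu_p, in percent."""
    return 100.0 * stdev(periods) / mean(periods)


# sigma_p / mu_p (%) from Table V at T = 5, 25, 50, and 75 C.
TABLE_V = {
    "CC_ROv1": [0.113, 0.107, 0.102, 0.101],
    "CC_ROv2": [0.099, 0.093, 0.088, 0.095],
    "CC_ROv3": [0.101, 0.095, 0.090, 0.089],
    "LUT": [0.116, 0.107, 0.103, 0.102],
}


def reduction_vs_lut_pct(name):
    """Per-temperature reduction of sigma_p / mu_p with respect to the LUT-based RO."""
    return [100.0 * (lut - x) / lut for x, lut in zip(TABLE_V[name], TABLE_V["LUT"])]


for design in ("CC_ROv2", "CC_ROv3"):
    print(design, [round(r, 1) for r in reduction_vs_lut_pct(design)])
```

All reductions for CC\_ROv2 and CC\_ROv3 come out positive, which matches the statement that these two configurations slightly improve on the LUT-based counterpart at every temperature.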

#### *E. Voltage Sensitivity*

To perform the voltage sensitivity analysis, we used the TI UCD90120A power controller to vary the internal core voltage within the range 0.8–1.12 V, with 1 V being the nominal condition. Because of the aggressive voltage scaling, the frequency variation measured for the new and LUT-based RO designs reaches a few tens of MHz. To better highlight the different behaviors exhibited by the referenced

<span id="page-9-2"></span>TABLE VI FREQUENCY VARIATION NORMALIZED TO THE 1-V CONDITION $[100 \times ((f_a - f_{1V})/f_{1V})]$ UNDER DIFFERENT SUPPLY VOLTAGES *a*

|                    | 0.8 V  | 0.85 V | 0.9 V  | 0.95 V | 1.05 V | 1.1 V | 1.12 V | $\mathrm{VI}_{volt}(\%)$ |
|--------------------|--------|--------|--------|--------|--------|-------|--------|--------------------------|
| CC\_ROv1 ($y=3$)   | -43.82 | -31.87 | -20.58 | -9.91  | 9.19   | 17.33 | 19.82  | 23                       |
| LUT ($N=5$)        | -43.97 | -31.77 | -20.83 | -9.99  | 9.36   | 17.76 | 20.20  | 23.15                    |
| CC\_ROv2 ($y=3$)   | -44.79 | -32.85 | -21.31 | -10.31 | 9.60   | 18.29 | 20.96  | 23.75                    |
| LUT ($N=7$)        | -43.60 | -31.78 | -20.49 | -9.88  | 9.11   | 17.31 | 19.76  | 22.92                    |
| CC\_ROv1 ($y=5$)   | -43.14 | -31.38 | -20.19 | -9.70  | 8.85   | 16.57 | 18.91  | 22.52                    |
| LUT ($N=9$)        | -43.54 | -31.65 | -20.41 | -9.80  | 9.07   | 17.23 | 19.56  | 22.84                    |
| CC\_ROv2 ($y=5$)   | -44.45 | -32.50 | -21.04 | -10.17 | 9.38   | 17.87 | 20.44  | 23.46                    |
| LUT ($N=11$)       | -43.41 | -31.63 | -20.34 | -9.80  | 9.01   | 17.12 | 19.58  | 22.78                    |
| CC\_ROv1 ($y=9$)   | -42.93 | -31.12 | -19.96 | -9.53  | 8.55   | 16.25 | 18.47  | 22.3                     |
| LUT ($N=17$)       | -43.66 | -31.80 | -20.48 | -9.83  | 9.12   | 17.30 | 19.70  | 22.93                    |
| CC\_ROv2 ($y=9$)   | -44.34 | -32.39 | -21.26 | -10.09 | 9.33   | 17.69 | 20.22  | 23.39                    |
| LUT ($N=21$)       | -43.37 | -31.61 | -20.37 | -9.78  | 9.01   | 17.03 | 19.46  | 22.75                    |
| CC\_ROv1 ($y=17$)  | -42.79 | -31.07 | -19.89 | -9.46  | 8.56   | 16.15 | 18.45  | 22.24                    |
| LUT ($N=35$)       | -43.51 | -31.64 | -20.34 | -9.78  | 8.94   | 17.07 | 19.21  | 22.76                    |
| CC\_ROv2 ($y=17$)  | -44.33 | -32.39 | -20.95 | -10.08 | 9.33   | 17.67 | 20.20  | 23.35                    |
| LUT ($N=41$)       | -43.64 | -31.67 | -20.46 | -9.86  | 8.99   | 17.13 | 19.43  | 22.85                    |

<span id="page-9-1"></span>

Fig. 13. Frequency variation measured for the CC\_ROv1 under different operating temperatures and FPGA technologies (Virtex-6 xc6vlx240t-1ffg1156, Artix-7 xc7a100tcsg324-1, and Ultrascale+ xck26-sfvc784-2lv-c). The Virtex-6 device stops working above 62 °C.

implementations, Fig. [14](#page-10-1) plots the absolute frequency values, whereas their percentage variations with respect to the nominal condition are reported in Table [VI.](#page-9-2) The latter also shows the $VI_{volt}$ defined in [\(5\).](#page-9-3) From this metric, it can be noted that the frequency of each implementation changes following a similar trend. The CC\_ROv1 sensitivity to voltage variations appears to be slightly lower than that of the LUT-based counterparts at $N = 5$, 9, 17, and 35. The opposite trend is exhibited by the CC\_ROv2 designs compared with the LUT-based ROs at $N = 7$, 11, 21, and 41. We speculate that this is due to the higher number of PT stages used in the path of the latter configuration.

<span id="page-9-3"></span>
$$
VI_{volt} = 100 \times \frac{S}{f_{1}}, \quad S = \sqrt{\frac{\sum_{V} (f_V - f_{1})^2}{8}}.
$$
 (5)
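Since Table VI already lists the variations normalized to the 1-V condition, (5) can be evaluated from those percentages alone: moving the factor $100/f_1$ inside the square root turns $VI_{volt}$ into the root mean square of the tabulated percentages over eight points (the seven scaled voltages plus the nominal one, whose contribution is zero). The Python sketch below applies this rearrangement to two rows of Table VI; the function name is a hypothetical choice.

```python
import math


def vi_volt_pct(pct_variations, n_points=8):
    """VI_volt of (5) computed from percentage variations w.r.t. the 1-V condition.

    n_points = 8 counts the seven non-nominal voltages plus the nominal
    1-V point, which adds zero to the sum.
    """
    return math.sqrt(sum(p ** 2 for p in pct_variations) / n_points)


# Rows of Table VI at 0.8, 0.85, 0.9, 0.95, 1.05, 1.1, and 1.12 V.
lut_n7 = [-43.60, -31.78, -20.49, -9.88, 9.11, 17.31, 19.76]
cc_rov1_y17 = [-42.79, -31.07, -19.89, -9.46, 8.56, 16.15, 18.45]

print(round(vi_volt_pct(lut_n7), 2))
print(round(vi_volt_pct(cc_rov1_y17), 2))
```

Both results agree with the last column of Table VI to two decimal places.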

## <span id="page-9-5"></span><span id="page-9-0"></span>VI. CASE STUDY: THE CC\_ROV1 DESIGN FOR TRNGS

As an application example and to verify the effectiveness of the proposed architecture, we used the CC-based RO to realize a TRNG. To this purpose, we slightly modified the design presented in [\[23\]](#page-11-38) to accommodate a CC\_ROv1 oscillator with $y = 9$. According to the results in Table [V,](#page-8-2) such a configuration indeed exhibits a jitter quite similar to that of

TABLE VII NIST RANDOMNESS TEST RESULTS

<span id="page-9-4"></span>

| Statistical test                       | Proportion | *P*-value<sub>*T*</sub> | Result      |
|----------------------------------------|------------|-------------------------|-------------|
| Frequency                              | 64/64      | 0.4372                  | <b>PASS</b> |
| BlockFrequency                         | 64/64      | 0.3504                  | <b>PASS</b> |
| Cumulative Sums <sup>1</sup>           | 63/64      | 0.3241                  | <b>PASS</b> |
| Runs                                   | 64/64      | 0.0602                  | <b>PASS</b> |
| LongestRun                             | 64/64      | 0.4372                  | <b>PASS</b> |
| Rank                                   | 63/64      | 0.2535                  | <b>PASS</b> |
| FFT                                    | 63/64      | 0.8881                  | <b>PASS</b> |
| Non Overlapping Template <sup>1</sup>  | 61/64      | 0.0392                  | <b>PASS</b> |
| Overlapping Template                   | 63/64      | 0.4372                  | <b>PASS</b> |
| Universal                              | 63/64      | 0.2327                  | <b>PASS</b> |
| Approx. Entropy                        | 64/64      | 0.0668                  | <b>PASS</b> |
| Random Excursions <sup>1</sup>         | 36/37      | 0.0314                  | <b>PASS</b> |
| Random Excursions Variant <sup>1</sup> | 35/37      | 0.7727                  | <b>PASS</b> |
| Serial <sup>1</sup>                    | 61/64      | 0.2999                  | <b>PASS</b> |
| Linear Complexity                      | 64/64      | 0.0821                  | <b>PASS</b> |

<sup>1</sup> For tests with more than one subtest, the result of the subtest with the minimum proportion is reported.

the LUT-based oscillators, which can be exploited as an entropy source to generate random bit sequences. Fig. [15](#page-10-2) illustrates the block diagram of the realized TRNG. The CC\_ROv1 block is used to generate seven multiphase oscillating outputs at 108 MHz. Such output signals are sampled by FFs receiving a clock signal generated by the DCM. The basic idea exploited in [\[23\]](#page-11-38) is to tune the phase of such a clock through the DCM in order to force one or more FFs to enter the metastability region. The input clock *clk\_in* running at 62.5 MHz is sent to the DCM that, according to the control signal *psen* given by the FSM block, performs a phase shift every 20 cycles to produce the *clk\_out* used as the clock signal for the sampling FFs. The phase shifting performed by the DCM continues until the signal *T* resulting from the XOR operation changes its value, meaning that a violation has occurred at the first stage of FFs. Random bits are, therefore, generated and stored within a 32-bit shift register that feeds the on-chip postprocessing circuit demonstrated in [\[23\].](#page-11-38)

Table [VII](#page-9-4) summarizes the results obtained by the NIST SP800-22 statistical tests for 64 Mbit of random data acquired as 64 consecutive 1-Mbit sequences. All the statistical tests show

<span id="page-10-1"></span>

Fig. 14. Frequency plot obtained for the CC- and LUT-based ROs under different supply voltages (xc7z045ffg900-2 FPGA device). Subplots refer to different lengths and similar nominal frequencies:  $y = 3$  (top left),  $y = 5$  (top right),  $y = 9$  (bottom left), and  $y = 17$  (bottom right). For reference LUT-based designs, the number of inverting stages that best fits the nominal frequency of the corresponding CC\_ROv1 and CC\_ROv2 was chosen.

<span id="page-10-2"></span>

Fig. 15. TRNG architecture from [\[23\]](#page-11-38) modified to include the CC\_ROv1 block.

a proportion and a *P* value<sub>*T*</sub> higher than the minimum pass threshold set by the NIST SP800-22 standard, demonstrating that the proposed oscillator can be effectively employed in the design of a TRNG able to produce a random bit sequence through a low-complexity design consisting of 37 LUTs, 124 FFs, 4 CCs, and 1 DSP. Finally, we performed the AIS-31 T8 test, which measures the randomness of the output bitstream in terms of byte entropy. The AIS test shows that the TRNG architecture of Fig. [15](#page-10-2) achieves a byte entropy of 8.0, exceeding that of the recent LUT-based competitor scheme [\[6\],](#page-11-5) which reaches a byte entropy of 7.996, with a somewhat similar energy efficiency.
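The pass thresholds mentioned above can be made explicit. NIST SP 800-22 accepts a test when the proportion of passing sequences lies within $\hat{p} \pm 3\sqrt{\hat{p}(1-\hat{p})/m}$, with $\hat{p} = 1 - \alpha$ and $m$ the number of sequences, and when the uniformity *P* value<sub>*T*</sub> is at least 0.0001. The Python sketch below checks a subset of the rows of Table VII against these criteria, assuming the default significance level $\alpha = 0.01$ (the text does not state the value used); the function names are hypothetical.

```python
import math

ALPHA = 0.01          # assumed significance level (NIST SP 800-22 default)
P_VALUE_T_MIN = 1e-4  # uniformity threshold for the P-value_T


def min_pass_count(n_sequences, alpha=ALPHA):
    """Smallest number of passing sequences inside the acceptance interval."""
    p_hat = 1.0 - alpha
    lower = p_hat - 3.0 * math.sqrt(p_hat * alpha / n_sequences)
    return math.ceil(lower * n_sequences)


def passes(passed, total, p_value_t):
    """Apply both the proportion and the uniformity criteria to one test."""
    return passed >= min_pass_count(total) and p_value_t >= P_VALUE_T_MIN


# (passed, total, P-value_T) for a subset of the rows of Table VII.
TABLE_VII = {
    "Frequency": (64, 64, 0.4372),
    "Runs": (64, 64, 0.0602),
    "Non Overlapping Template": (61, 64, 0.0392),
    "Random Excursions": (36, 37, 0.0314),
    "Random Excursions Variant": (35, 37, 0.7727),
    "Serial": (61, 64, 0.2999),
}

print(min_pass_count(64), min_pass_count(37))
print(all(passes(*row) for row in TABLE_VII.values()))
```

Under this assumption, the lowest proportions in Table VII (61/64 and 35/37) sit exactly at the minimum acceptable counts, so every listed test passes.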

# VII. CONCLUSION

<span id="page-10-0"></span>This work presented a new design methodology to realize multistage ROs by means of the CC resources available within modern FPGA chip families. ROs realized as proposed here show the following benefits.

- 1) They enable automatic P&R while ensuring predictable and repeatable behaviors because of the dedicated interconnection scheme they rely on. Furthermore, their oscillation frequency is not affected by the load on the output node. These results are in contrast to those discussed in several prior works dealing with tedious manual layout actions of LUT-based ROs [\[7\],](#page-11-6) [\[16\],](#page-11-15) [\[17\],](#page-11-16) [\[18\],](#page-11-17) [\[19\].](#page-11-18)
- 2) Because of the possibility to strategically add pass-through elements to the oscillating path, they can be easily configured to adjust their thermal/voltage behavior and/or their nominal frequency to comply with the specific application requirements. For instance, our experiments show that one of the proposed configurations is more suitable to work as a temperature sensor, whereas the other two solutions can better fit a PUF application because of the reduced temperature sensitivity.
- 3) The hardware description of the proposed ROs is straightforward, and its portability is ensured. To demonstrate this aspect, we characterized it on different chip technologies, including 40-nm CMOS Virtex-6, 28-nm CMOS Artix-7 and Zynq-7000, and finally 16-nm FinFET Ultrascale+.
- 4) As a final remark, the above benefits are achieved while reducing the LUT count and the energy consumption by up to 83% and 44%, respectively, with respect to the LUT-based solution.

#### **REFERENCES**

- <span id="page-11-0"></span>[\[1\]](#page-0-0) F. Kodýtek, R. Lórencz, and J. Buček, "Three counter value based ROPUFs on FPGA and their properties," *Microprocessors Microsystems*, vol. 88, Feb. 2022, Art. no. 104375.
- <span id="page-11-1"></span>[\[2\]](#page-0-0) B. Halak, M. Zwolinski, and M. S. Mispan, "Overview of PUF-based hardware security solutions for the Internet of Things," in *Proc. IEEE 59th Int. Midwest Symp. Circuits Syst. (MWSCAS)*, Abu Dhabi, UAE, Oct. 2016, pp. 1–4.
- <span id="page-11-2"></span>[\[3\]](#page-0-0) M. Barbareschi, G. Di Natale, L. Torres, and A. Mazzeo, "A ring oscillator-based identification mechanism immune to aging and external working conditions," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 65, no. 2, pp. 700–711, Feb. 2018.
- <span id="page-11-3"></span>[\[4\]](#page-0-0) B. Yang, V. Rožic, M. Grujic, N. Mentens, and I. Verbauwhede, "ES-TRNG: A high-throughput, low-area true random number generator based on edge sampling," *IACR Trans. Cryptograph. Hardw. Embedded Syst.*, vol. 10, pp. 267–292, Aug. 2018.
- <span id="page-11-4"></span>[\[5\]](#page-0-0) M. A. Prada-Delgado, C. Martínez-Gómez, and I. Baturone, "Autocalibrated ring oscillator TRNG based on jitter accumulation," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, Seville, Spain, Oct. 2020, pp. 1–4.
- <span id="page-11-5"></span>[\[6\]](#page-0-0) M. Grujic and I. Verbauwhede, "TROT: A three-edge ring oscillator based true random number generator with time-to-digital conversion," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 69, no. 6, pp. 2435–2448, Jun. 2022.
- <span id="page-11-6"></span>[\[7\]](#page-0-0) S. Moini et al., "Voltage sensor implementations for remote power attacks on FPGAs," *ACM Trans. Reconfigurable Technol. Syst.*, vol. 16, no. 1, pp. 1–21, Dec. 2022.
- <span id="page-11-7"></span>[\[8\]](#page-0-1) I. Giechaskiel, K. B. Rasmussen, and J. Szefer, "Measuring long wire leakage with ring oscillators in cloud FPGAs," in *Proc. 29th Int. Conf. Field Program. Log. Appl. (FPL)*, Barcelona, Spain, Sep. 2019, pp. 45–50.
- <span id="page-11-8"></span>[\[9\]](#page-0-1) M. Ebrahimi and Z. Navabi, "Selecting representative critical paths for sensor placement provides early FPGA aging information," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 39, no. 10, pp. 2976–2989, Oct. 2020.
- <span id="page-11-9"></span>[\[10\]](#page-0-1) M. M. Alam, M. Tehranipoor, and D. Forte, "Recycled FPGA detection using exhaustive LUT path delay characterization and voltage scaling," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 27, no. 12, pp. 2897–2910, Dec. 2019.
- <span id="page-11-10"></span>[\[11\]](#page-0-1) T. Kilian, D. Tille, M. Huch, M. Hanel, and U. Schlichtmann, "Performance screening using functional path ring oscillators," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 31, no. 6, pp. 711–724, Jun. 2023.
- <span id="page-11-11"></span>[\[12\]](#page-0-2) S. Bae, M. Lee, S.-M. Yoo, and J.-Y. Sim, "A temperature compensated ring oscillator with LC-based period error detection," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 31, no. 12, pp. 2152–2156, Dec. 2023, doi: [10.1109/TVLSI.2023.3322709.](http://dx.doi.org/10.1109/TVLSI.2023.3322709)
- <span id="page-11-12"></span>[\[13\]](#page-0-2) B. Razavi, "The ring oscillator [a circuit for all seasons]," *IEEE Solid-State Circuits Mag.*, vol. 11, no. 4, pp. 10–81, Jun. 2019.
- <span id="page-11-13"></span>[\[14\]](#page-0-3) M. Elnawawy, A. Farhan, A. A. Nabulsi, A. R. Al-Ali, and A. Sagahyroon, "Role of FPGA in Internet of Things applications," in *Proc. IEEE Int. Symp. Signal Process. Inf. Technol. (ISSPIT)*, Ajman, United Arab Emirates, Dec. 2019, pp. 1–6.
- <span id="page-11-14"></span>[\[15\]](#page-1-2) N. N. Anandakumar, M. S. Hashmi, and M. Tehranipoor, "FPGA-based physical unclonable functions: A comprehensive overview of theory and architectures," *Integration*, vol. 81, pp. 175–194, Nov. 2021.
- <span id="page-11-15"></span>[\[16\]](#page-1-3) C. Gu, C. H. Chang, W. Liu, N. Hanley, J. Miskelly, and M. O'Neill, "A large scale comprehensive evaluation of single-slice ring oscillator and PicoPUF bit cells on 28 nm Xilinx FPGAs," in *Proc. 3rd ACM Workshop Attacks Solutions Hardw. Secur. Workshop*, London, U.K., Nov. 2019, pp. 101–106.
- <span id="page-11-16"></span>[\[17\]](#page-1-3) A. Wild, G. T. Becker, and T. Güneysu, "On the problems of realizing reliable and efficient ring oscillator PUFs on FPGAs," in *Proc. IEEE Int. Symp. Hardw. Oriented Security Trust (HOST)*, McLean, VA, USA, 2016, pp. 103–108.
- <span id="page-11-17"></span>[\[18\]](#page-1-4) H. Kareem and D. Dunaev, "Towards performance optimization of ring oscillator PUF using Xilinx FPGA," in *Proc. 17th Iberian Conf. Inf. Syst. Technol. (CISTI)*, Madrid, Spain, Jun. 2022, pp. 1–6.
- <span id="page-11-18"></span>[\[19\]](#page-1-4) O. Petura, U. Mureddu, N. Bochard, V. Fischer, and L. Bossuet, "A survey of AIS-20/31 compliant TRNG cores suitable for FPGA devices," in *Proc. 26th Int. Conf. Field Program. Log. Appl. (FPL)*, Lausanne, Switzerland, Aug. 2016, pp. 1–10.
- <span id="page-11-19"></span>[\[20\]](#page-1-5) AMD. *Xilinx 7 Series FPGAs Data Sheet: Overview DS180 (v2.6.1)*. Accessed: May 4, 2024. [Online]. Available: https://docs.amd. com/v/u/en-US/ds180\_7Series\_Overview
- <span id="page-11-20"></span>[\[21\]](#page-1-5) *Intel Arria 10 Core Fabric and General Purpose I/Os Handbook*. Accessed: May 4, 2024. [Online]. Available: https://cdrdv2.intel. com/v1/dl/getContent/666662?fileName=a10\_handbook-683461- 666662.pdf
- <span id="page-11-21"></span>[\[22\]](#page-1-6) T. M. La et al., "FPGADefender: Malicious self-oscillator scanning for Xilinx UltraScale+ FPGAs," *ACM Trans. Reconf. Techn. Syst.*, vol. 13, no. 15, pp. 1–31, 2020.
- <span id="page-11-38"></span>[\[23\]](#page-9-5) F. Frustaci, F. Spagnolo, S. Perri, and P. Corsonello, "A high-speed FPGA-based true random number generator using metastability with clock managers," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 70, no. 2, pp. 756–760, Feb. 2023.
- <span id="page-11-22"></span>[\[24\]](#page-1-7) Y. Zhuo, H. Li, Q. Zhou, Y. Cai, and X. Hong, "New timing and routability driven placement algorithms for FPGA synthesis," in *Proc. 17th ACM Great Lakes Symp. VLSI*, Stresa-Lago Maggiore, Italy, Mar. 2007, pp. 570–575.
- <span id="page-11-23"></span>[\[25\]](#page-1-8) M. Barbareschi, G. Di Natale, and L. Torres, "Implementation and analysis of ring oscillator circuits on Xilinx FPGAs," in *Hardware Security and Trust*. Cham, Switzerland: Springer, 2017, pp. 237–251, doi: [10.1007/978-3-319-44318-8\_12](http://dx.doi.org/10.1007/978-3-319-44318-8_12).
- <span id="page-11-24"></span>[\[26\]](#page-2-2) Z. Zulfikar, N. Soin, and S. W. M. Hatta, "Capacitance effects of ring oscillator's waveform quality in designing physically unclonable functions," in *Proc. IEEE Int. Conf. Semiconductor Electron. (ICSE)*, Kuala Lumpur, Malaysia, Aug. 2018, pp. 113–116.
- <span id="page-11-25"></span>[\[27\]](#page-2-2) S. Deepthi, S. R. Ramesh, and M. N. Devi, "Hardware trojan detection using ring oscillator," in *Proc. 6th Int. Conf. Commun. Electron. Syst. (ICCES)*, Coimbatore, India, Jul. 2021, pp. 362–368.
- <span id="page-11-26"></span>[\[28\]](#page-2-2) Z. Huang, J. Bian, Y. Lin, H. Liang, and T. Ni, "Design guidelines and feedback structure of ring oscillator PUF for performance improvement," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, early access, Aug. 2, 2024, doi: [10.1109/TCAD.2023.3301386](http://dx.doi.org/10.1109/TCAD.2023.3301386).
- <span id="page-11-27"></span>[\[29\]](#page-2-3) J. Burgiel, D. Esguerra, I. Giechaskiel, S. Tian, and J. Szefer, "Characterization of IOBUF-based ring oscillators," in *Proc. Int. Conf. Field-Programmable Technol. (ICFPT)*, Auckland, New Zealand, Dec. 2021, pp. 1–4.
- <span id="page-11-28"></span>[\[30\]](#page-2-3) T. Sugawara, K. Sakiyama, S. Nashimoto, D. Suzuki, and T. Nagatsuka, "Oscillator without a combinatorial loop and its threat to FPGA in data centre," *Electron. Lett.*, vol. 55, no. 11, pp. 640–642, 2019.
- <span id="page-11-29"></span>[\[31\]](#page-2-3) R. D. Sala, D. Bellizia, and G. Scotti, "A novel ultra-compact FPGA-compatible TRNG architecture exploiting latched ring oscillators," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 69, no. 3, pp. 1672–1676, Mar. 2022.
- <span id="page-11-30"></span>[\[32\]](#page-4-6) Y. Zhou, P. Maidee, C. Lavin, A. Kaviani, and D. Stroobandt, "RWRoute: An open-source timing-driven router for commercial FPGAs," *ACM Trans. Reconf. Techn. Syst.*, vol. 15, no. 8, pp. 1–27, 2021.
- <span id="page-11-31"></span>[\[33\]](#page-5-2) V.-T. Tran, Q.-K. Trinh, and V.-P. Hoang, "Enhanced ID authentication scheme using FPGA-based ring oscillator PUF," in *Proc. IEEE 13th Int. Symp. Embedded Multicore/Many-core Systems-on-Chip (MCSoC)*, Singapore, Oct. 2019, pp. 320–327.
- <span id="page-11-32"></span>[\[34\]](#page-6-2) A. Alzahrani and R. F. DeMara, "Process variation immunity of alternative 16 nm HK/MG-based FPGA logic blocks," in *Proc. IEEE 58th Int. Midwest Symp. Circuits Syst. (MWSCAS)*, Fort Collins, CO, USA, Aug. 2015, pp. 1–4.
- <span id="page-11-33"></span>[\[35\]](#page-6-2) A. S. Iyengar, D. Vontela, I. Reddy, S. Ghosh, S. Motaman, and J.-W. Jang, "Threshold defined camouflaged gates in 65 nm technology for reverse engineering protection," in *Proc. Int. Symp. Low Power Electron. Design*, Seattle, WA, USA, Jul. 2018, pp. 1–6.
- <span id="page-11-34"></span>[\[36\]](#page-6-3) M. H. Mottaghi, M. Sedighi, and M. S. Zamani, "Aging mitigation in FPGAs considering delay, power, and temperature," *IEEE Trans. Rel.*, vol. 69, no. 2, pp. 833–844, Jun. 2020.
- <span id="page-11-37"></span>[\[37\]](#page-8-3) M. Shimanouchi, "An approach to consistent jitter modeling for various jitter aspects and measurement methods," in *Proc. Int. Test Conf.*, Baltimore, MD, USA, 2001, pp. 848–857.
- <span id="page-11-35"></span>[\[38\]](#page-8-4) N. Rahmanikia, A. Amiri, H. Noori, and F. Mehdipour, "Performance evaluation metrics for ring-oscillator-based temperature sensors on FPGAs: A quality factor," *Integration*, vol. 57, pp. 81–100, Mar. 2017.
- <span id="page-11-36"></span>[\[39\]](#page-8-4) B. You, S. Feng, Z. Yao, X. Lv, Y. Wang, and Y. Zhou, "Realization of parallel distributed temperature sensor network based on field programmable gate array," in *Proc. 5th Int. Conf. Electron. Eng. Informat. (EEI)*, Wuhan, China, Jun. 2023, pp. 33–36.



Fanny Spagnolo (Member, IEEE) was born in Belvedere Marittimo, Cosenza, Italy, in April 1991. She received the master's degree in electronics engineering and the Ph.D. degree in information and communication technologies from the University of Calabria, Arcavacata di Rende, Italy, in 2016 and 2019, respectively.

She is currently appointed as an Assistant Professor with the Department of Informatics, Modeling, Electronics and System Engineering (DIMES), University of Calabria. Her research interests include VLSI architectures for image processing, high-performance reconfigurable circuits, embedded systems design, emerging technologies, and approximate computing techniques for low-power deep neural networks. She has coauthored more than 30 articles in these fields.

Dr. Spagnolo serves as a peer reviewer for several VLSI journals. She is an Associate Editor of *Integration, the VLSI Journal*.



Fabio Frustaci (Senior Member, IEEE) received the M.S. and Ph.D. degrees in electronic engineering from the University Mediterranea of Reggio Calabria, Reggio Calabria, Italy, in 2003 and 2007, respectively.

In 2006, he was a Visiting Scholar with the ECE Department, University of Rochester, Rochester, NY, USA. From 2011 to 2013, he was a Visiting Researcher with the EECS Department, University of Michigan, Ann Arbor, MI, USA. He is an Associate Professor with the Computer Science, Electronics, Modeling and Systems Department, University of Calabria, Arcavacata di Rende, Italy. He has authored more than 70 articles in the field of VLSI design. His research interests include low-power and high-performance VLSI circuits, design techniques for emerging technologies, reconfigurable architectures, and embedded systems.

Dr. Frustaci is currently a member of the editorial board of *Microelectronics Journal*.



Stefania Perri (Senior Member, IEEE) was born in Cosenza, Italy, in April 1971. She received the master's degree in computer science engineering from the University of Calabria, Arcavacata di Rende, Italy, in 1996, and the Ph.D. degree in electronics engineering from the University Mediterranea of Reggio Calabria, Reggio Calabria, Italy, in 2000.

In 1996, she joined the Department of Electronics, Computer Sciences and Systems, University of Calabria, as an Associate Researcher. In 2002, she was appointed as an Assistant Professor of Electronics with the Department of Electronics, Computer Science and Systems, University of Calabria. In Summer 2004, she was a Visiting Researcher with the Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, USA, where in 2005, she was appointed as an Adjunct Assistant Professor for four years. In 2010, she was appointed as an Associate Professor of Electronics with the Department of Electronics, Computer Sciences and Systems, University of Calabria, where she joined the Department of Mechanical, Energy and Management Engineering, in 2017. Her current research interests include quantum-dot cellular automata (QCA) based circuits, high-performance embedded systems, low-power design, VLSI circuits for image processing and multimedia, reconfigurable computing, and VLSI design. She is the coauthor of more than 140 technical articles and holds two patents in these fields.

Dr. Perri serves on technical committees for several VLSI conferences and as a peer reviewer for several VLSI journals. She is an Associate Editor of the *Journal of Low Power Electronics and Applications* and *Sensors*.



Massimo Vatalaro (Member, IEEE) received the Ph.D. degree from the University of Calabria, Arcavacata di Rende, Italy, in 2023.

He is currently a Postdoctoral Researcher with the Department of Computer Engineering, Modeling, Electronics and Systems Engineering (DIMES), University of Calabria. His research interests include circuit design in CMOS and emerging technologies and hardware-level security.



Felice Crupi (Senior Member, IEEE) is currently a Full Professor of Electronics with the University of Calabria, Arcavacata di Rende, Italy. His research interests include electronic device reliability, design of ultralow-power analog circuits, and early assessment of emerging technologies for logic and memory applications.

Dr. Crupi was a Technical Program Committee Member of the International Electron Devices Meeting and the International Reliability Physics Symposium.



Pasquale Corsonello (Senior Member, IEEE) was born in Cosenza, Italy, in May 1964. He received the master's degree in electronics engineering from the University of Naples Federico II, Naples, Italy, in 1988.

He joined the Institute of Research on Parallel Computers, National Council of Research of Italy, Naples, where he was working on the design and modeling of electronic transducers for high-precision measurement, receiving a post-graduate two-year grant. In 1992, he joined the Department of Electronics, Computer Science and Systems, University of Calabria, Arcavacata di Rende, Italy, as a Research Associate. In 1997, he was appointed as an Assistant Professor of Electronics with the Department of Electronics Engineering and Applied Mathematics, University of Reggio Calabria, Reggio Calabria, Italy, where he was the Director of the Microelectronics Laboratory. In 2001, he was appointed as an Associate Professor of Electronics and the Chair of the Ph.D. Program in Electronics Engineering, University of Reggio Calabria. In Summer 2004, he was a Visiting Researcher with the Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, USA. In 2005, he was appointed as an Adjunct Associate Professor with the Department of Electrical and Computer Engineering. He is currently a Full Professor of Electronics with the Department of Informatics, Modeling, Electronics and System Engineering (DIMES), University of Calabria. His research interests include embedded systems design, low-power design, VLSI architecture for image processing, and quantum-dot cellular automata (QCA) based circuits. He has coauthored over 180 technical articles and holds two patents in these fields.

Dr. Corsonello serves on technical committees for several VLSI conferences and as a peer reviewer for several VLSI journals. He served as the Editor-in-Chief for the *Journal of Low Power Electronics and Applications* and the Associate Editor-in-Chief for IEEE TRANSACTIONS ON VLSI SYSTEMS. He is currently a Senior Area Associate Editor of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS and a member of the Steering Committee of IEEE TRANSACTIONS ON VLSI SYSTEMS.