

Received February 4, 2020, accepted February 17, 2020, date of publication February 27, 2020, date of current version March 6, 2020. *Digital Object Identifier* 10.1109/ACCESS.2020.2976773

# A Fully Static True-Single-Phase-Clocked Dual-Edge-Triggered Flip-Flop for Near-Threshold Voltage Operation in IoT Applications

YONGMIN LEE, (Student Member, IEEE), GICHEOL SHIN, (Student Member, IEEE), AND YOONMYUNG LEE, (Senior Member, IEEE)

College of Information and Communication Engineering, Sungkyunkwan University, Suwon 16419, South Korea

Corresponding author: Yoonmyung Lee (yoonmyung@skku.edu)

This work was supported in part by the Basic Science Research Program under Grant 2019R1A2C4070438, and in part by the Basic Research Lab Program through the National Research Foundation of Korea (NRF) under Grant 2017R1A4A1015400.

**ABSTRACT** A Dual-Edge-Triggered (DET) flip-flop (FF) that can reliably operate at low voltage is proposed in this paper. Unlike conventional Single-Edge-Triggered (SET) flip-flops, DET-FFs can improve energy efficiency by latching input data at both clock edges. When combined with aggressive voltage scaling, significant efficiency improvement is expected. However, prior DET-FF designs were susceptible to Process, Voltage and Temperature (PVT) variations, which limited their operation in low-voltage regimes. A fully static true-single-phase-clocked DET-FF is proposed to achieve reliable operation at voltages down to the near-threshold regime. Instead of the two-phase or pulsed clocking scheme of conventional DET-FFs, a True-Single-Phase-Clocking (TSPC) scheme is adopted to overcome clock overlap issues and enable low-power operation. The fully static implementation also enables robust operation in the low-voltage regime. The proposed DET-FF is designed in 28nm CMOS technology, and a comprehensive analysis, including post-layout Monte Carlo simulations over wide PVT ranges, is performed to validate the design approach. Extensive analysis and comparison with prior-art DET-FFs confirm that the proposed DET-FF can operate at the lowest voltage, 0.28 V, over a temperature range of -40 °C to 120 °C while maintaining near-best energy efficiency and power-delay product.

**INDEX TERMS** Flip-flop, dual-edge-triggered (DET), single phase, low power, near-threshold voltage.

## I. INTRODUCTION

Flip-flops are among the most essential circuit elements in modern digital design, since they provide local data storage and synchronize the data flow. Because such synchronization must be performed throughout the entire clock domain, a typical processor contains a large number of flip-flops, often hundreds of thousands [1]. Given their sheer number, flip-flops account for a large portion of the overall circuit in terms of both power and physical area. Therefore, reducing the power consumption of flip-flops has a great impact on system-level energy efficiency, especially for energy-constrained IoT applications. Minimizing power consumption in IoT Integrated Circuits (ICs) is very important for maximizing battery life [2], and flip-flops, which are toggled by the clock every cycle, consume a large portion of the dynamic power in a synchronous system. For this reason, many studies have sought to develop flip-flops with lower power consumption and better energy efficiency [3]–[27].

The associate editor coordinating the review of this manuscript and approving it for publication was Gian Domenico Licciardo.

The most popular approach for implementing a flip-flop is to use either the rising or the falling edge of the clock to trigger the latching operation. Such flip-flops are called Single-Edge-Triggered flip-flops (SET-FFs), and they are widely used for their simple implementation and ease of timing characterization [3]–[7]. In contrast, Dual-Edge-Triggered flip-flops (DET-FFs) take advantage of the clock edge left unused by SET-FFs, i.e., the latching operation is triggered at both the rising and falling clock edges. Utilizing both clock edges can improve energy efficiency because it provides twice the data throughput of a SET-FF at the same clock frequency; equivalently, the same throughput can be achieved with the clock frequency halved, which reduces clock power. Therefore, DET-FFs have been proposed as an alternative sequential element for reducing power consumption compared to SET-FFs [8]. To implement dual-edge sensitivity, various topologies, including complementary latch pairs and pulse-triggered latches, have been studied [9]–[27], as discussed in Section II.
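To make the difference in edge sensitivity concrete, the following is a minimal zero-delay behavioral sketch in Python. It is not a circuit model, and the function names and sampled waveforms are illustrative assumptions. Clock and data are lists of 0/1 samples on a common time grid; the SET-FF captures D only at rising edges, whereas the DET-FF captures it at every edge, so over the same number of clock periods it stores twice as many data values.

```python
def edges(clk):
    """Return (index, kind) for every clock transition; kind is 'rise' or 'fall'."""
    out = []
    for t in range(1, len(clk)):
        if clk[t - 1] == 0 and clk[t] == 1:
            out.append((t, "rise"))
        elif clk[t - 1] == 1 and clk[t] == 0:
            out.append((t, "fall"))
    return out


def sample(clk, d, dual_edge):
    """Capture d at each triggering edge; a SET-FF uses rising edges only."""
    captured = []
    for t, kind in edges(clk):
        if dual_edge or kind == "rise":
            # The value D had just before the edge is the one that gets latched.
            captured.append(d[t - 1])
    return captured


# Four clock periods, two samples per clock phase (hypothetical waveforms).
clk = [0, 0, 1, 1] * 4 + [0]
d = [0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]

set_q = sample(clk, d, dual_edge=False)
det_q = sample(clk, d, dual_edge=True)
print(len(set_q), len(det_q))  # 4 8
```

Read the other way around, the DET-FF needs only half the clock frequency to capture the same four values, which is where the clock-power saving comes from.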

Meanwhile, a simple but widely used technique for maximizing the energy efficiency of an electronic system is voltage scaling [28]–[30]. Because dynamic switching energy is proportional to the square of the supply voltage, scaling down the supply voltage yields a quadratic reduction in dynamic energy consumption. Therefore, voltage scaling is widely adopted in energy-constrained IoT applications [31], [32] and battery-operated miniature sensors [33]–[35]. For flip-flops to benefit from such voltage scaling, they must remain functional at aggressively scaled supply voltages. However, due to increased sensitivity to process variation in low-voltage regimes, guaranteeing robust flip-flop operation at low supply voltage is a non-trivial challenge, especially in advanced technologies, where process variation is becoming increasingly prominent.
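As a rough worked example of the quadratic dependence, the sketch below evaluates the standard switching-energy expression $E = \alpha C V_{DD}^2$ at the 0.95 V nominal supply and at the 0.28 V minimum supply reported in this paper. The 1 fF load is a hypothetical placeholder that cancels in the ratio. Leakage energy, which becomes relatively more important at low voltage, is deliberately ignored, so the result describes only the dynamic part.

```python
def dynamic_energy(c_load, vdd, alpha=1.0):
    """Dynamic switching energy per clock cycle, E = alpha * C * VDD**2 (J)."""
    return alpha * c_load * vdd ** 2


C_LOAD = 1e-15  # hypothetical 1 fF switched capacitance (cancels in the ratio)
e_nominal = dynamic_energy(C_LOAD, 0.95)  # nominal supply used in Section IV
e_ntv = dynamic_energy(C_LOAD, 0.28)      # lowest supply reported for the proposed FF
ratio = e_ntv / e_nominal
print(f"{ratio:.3f}")  # 0.087, i.e. roughly an 11.5x reduction in dynamic energy
```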

In this paper, a DET-FF suitable for aggressive voltage scaling down to the near-threshold voltage regime is presented. By adopting a single-phase clocking scheme and removing all internal dynamic nodes, robust operation at voltages as low as 0.28 V is verified with Monte Carlo simulations for temperatures ranging from -40 °C to 120 °C.

The rest of this paper is organized as follows: Section II analyzes the prior-art DET-FFs, categorized into three types, and discusses their strengths and weaknesses. Section III presents the proposed fully static true-single-phase-clocked DET-FF designed for robust operation in low-voltage regimes. Section IV presents simulation results and comparisons with other DET-FFs, and Section V concludes the paper.

## II. REVIEW OF THE PRIOR-ART DUAL-EDGE-TRIGGERED FLIP-FLOPS

The prior-art DET-FFs can be categorized into three types: Latch-Multiplexer, Pulsed-Latch, and C-element. The analysis of these three topologies below highlights the characteristics and limitations of conventional DET-FF designs. A new DET-FF design that overcomes these limitations is introduced in the following section.

#### A. LATCH-MULTIPLEXER-TYPE DET-FF

Fig. 1 shows two Latch-Multiplexer (MUX) Dual-Edge-Triggered flip-flops (LM-DET FFs). In this type of DET-FF, a pair of latches is used to latch data at the positive and negative clock edges, and a multiplexer selects one of the latches to drive the output.

Fig. 1(a) shows one example of an LM-DET FF, the Static DET FF (SDET FF) presented in [20]. In this FF, each latch consists of back-to-back connected inverters and a transmission gate. The original SDET FF in [20] uses a single NMOS (or PMOS) transistor as each pass gate, which makes it vulnerable to process and voltage variations. In this paper, for comparison with other DET-FFs with improved reliability, these pass gates are replaced with transmission gates for analysis and simulation. The two latches and the multiplexer are controlled by complementary internal clock signals (CKB, CKI), which are generated from the CLK signal by a two-stage inverter chain.

FIGURE 1. Latch-MUX-type DET-FFs: (a) Static DET-FF (SDET FF) [20] (b) True-single-phase-clock SDET FF (TSPC-SDET FF) [27].

When CLK = '1', the input transmission gate of the positive latch is turned on, and the input data is stored in the positive latch. Meanwhile, the input transmission gate of the negative latch is turned off, so the negative latch keeps the value stored in the previous cycle, and the multiplexer forwards this value to the output for as long as CLK stays high. At the falling transition of CLK ('1' $\rightarrow$ '0'), the multiplexer switches to forward the data in the positive latch to the output node Q. At the same time, the input transmission gate of the positive latch is turned off so that the current input data is safely latched, completing the latching operation at the falling edge of CLK. The negative latch performs the complementary operation, capturing subsequent input data for the next latching operation at the rising transition of CLK.
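The complementary operation described above can be summarized with a zero-delay behavioral sketch, under the simplifying assumptions that there is no clock overlap and that each latch is ideal (the function name and initial latch contents are hypothetical). The key point is that the MUX always forwards the latch that is currently opaque, so Q changes only at clock edges and takes the value D had just before each edge.

```python
def latch_mux_det(clk, d, pos=0, neg=0):
    """Zero-delay sketch of a Latch-MUX DET-FF such as the SDET FF in Fig. 1(a).

    pos/neg are hypothetical initial contents of the positive/negative latches.
    """
    q_trace = []
    for c, x in zip(clk, d):
        if c:
            pos = x          # positive latch transparent, negative latch holds
        else:
            neg = x          # negative latch transparent, positive latch holds
        # The MUX forwards whichever latch is currently opaque (holding).
        q_trace.append(neg if c else pos)
    return q_trace


clk = [0, 0, 1, 1] * 3
d = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
q = latch_mux_det(clk, d)
for t in range(1, len(clk)):
    if clk[t] != clk[t - 1]:
        assert q[t] == d[t - 1]   # Q equals D sampled just before each edge
print(q)  # [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1]
```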

In this SDET FF, two-phase clock signals are required to control the tri-state inverters and prevent contention, because the two latches must operate in a complementary manner. However, since CKB and CKI are generated by a two-stage inverter chain, they can overlap for a short time. This problem can worsen in low-voltage regimes, where delay variation is exacerbated. As demonstrated in [27], the two transmission gates that form the multiplexer can be turned on at the same time during clock overlap, making it possible for one latch's data to upset the other latch's data.
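The overlap mechanism can be illustrated with integer time steps. In this idealized sketch, the unit delays `d1` and `d2` are hypothetical: CKB is CLK inverted after delay `d1`, and CKI is CKB inverted after a further delay `d2`. The two signals are meant to be complementary, but right after every CLK edge they are equal for exactly `d2` steps. Assuming the usual transmission-gate wiring, in which the two MUX gates receive CKB and CKI with opposite polarity, both gates can conduct during that window; because inverter delay grows at low supply voltage, so does the window.

```python
def delayed(sig, n, init):
    """Shift a sampled waveform right by n steps, padding the start with init."""
    return [init] * n + sig[: len(sig) - n]


def overlap_windows(clk, d1, d2):
    """Return (start, length) for every interval in which CKB == CKI."""
    ckb = [1 - v for v in delayed(clk, d1, clk[0])]   # first inverter
    cki = [1 - v for v in delayed(ckb, d2, ckb[0])]   # second inverter
    windows, start = [], None
    for t, (b, i) in enumerate(zip(ckb, cki)):
        if b == i and start is None:
            start = t                                  # overlap begins
        elif b != i and start is not None:
            windows.append((start, t - start))         # overlap ends
            start = None
    if start is not None:
        windows.append((start, len(clk) - start))
    return windows


clk = [0] * 10 + [1] * 10 + [0] * 10
print(overlap_windows(clk, d1=2, d2=3))  # [(12, 3), (22, 3)]
```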

A True-Single-Phase-Clock static DET flip-flop (TSPC-SDET FF) [27], shown in Fig. 1(b), was proposed to address this issue. Its positive and negative latches operate in a complementary manner, just as in [20]. Unlike the SDET FF, however, it adopts a single-phase clocking scheme similar to C2MOS [36], which fundamentally eliminates the two-phase clock overlap issue at the cost of an increased number of transistors.

However, weak pull-up and pull-down in the TSPC-SDET FF limit its voltage scalability and make it prone to variation. For example, when CLK = '0' and SNP = '1' in the positive latch, the gate of transistor MP is not fully pulled down to ground because the pull-down path contains a PMOS, which can discharge a node only to about one threshold voltage above ground. Therefore, node XP is only weakly pulled up by MP. Similarly, when CLK = '1' and SNN = '0' in the negative latch, node XN is only weakly pulled down. Therefore, although it is called a 'static' DET FF, it is not fully static (i.e., its internal nodes do not swing across the full voltage range), which makes it unsuitable for aggressive voltage scaling and near-threshold voltage operation.

#### B. PULSED-LATCH-TYPE DET-FF

As alternatives to the Latch-Multiplexer-type DET-FF, many pulse-triggered DET FFs have been proposed [9]–[18]. In this topology, a pulse generator produces a clock-synchronized pulse at both the rising and falling clock edges, and these pulses trigger the latching operation of the pulsed latches.

Pulsed-Latch-type DET FFs (PL-DET FFs) consist of a pulse generator and a pulsed latch and can be classified into two types: implicit pulse [13]–[18] and explicit pulse [9]–[12]. Fig. 2 shows a PL-DET FF using an implicit pulse (Fig. 2(a)) [17] and one using an explicit pulse (Fig. 2(b)) [11]. The Dual-edge Conditional Pre-charge flip-flop (DECP FF) [17] shown in Fig. 2(a) uses control signals derived from the input CLK: a delayed version of CLK (CK4), an inverted version of CLK (CKD), and CLK itself together generate a short transparency window after both the rising and falling edges of CLK. During these windows, the pulsed latch forwards the input data to the output node Q. This implicit pulse-type DET-FF requires a robust timing window for reliable operation, so careful transistor sizing is an important design issue.

An example of a pulsed-latch-type DET-FF with an explicit pulse is the Sense-Amplifier DET flip-flop (SA-DET FF) [11] shown in Fig. 2(b). This DET-FF uses pulses generated from complementary two-phase clock signals, CLK1 and CLK2. Because these signals are produced by series-connected inverters, they are briefly equal after each clock transition; during this short period, a pulse generator circuit produces a short pulse that triggers the latching operation in a pulsed latch. Since pulses are generated whenever CLK1 and CLK2 are both '1' or both '0', the data is latched at both CLK edges.



FIGURE 2. Pulsed-Latch-type DET-FF. (a) Dual-Edge Conditional Pre-charge (DECP) FF [17] (b) Sense-Amplifier (SA) DET FF [11].

Pulsed-latch-type DET-FFs require pulse-generating circuits. Since the transparency window of a pulsed latch is set by the pulse length, pulse timing is critical. However, because typical pulse-generating circuits rely on transistor delay to set the pulse length, the pulse characteristics are easily affected by PVT variations. Therefore, the pulse-generation circuit must be carefully designed to be robust against PVT variations, which often requires large margins for variation tolerance, resulting in large power and area overhead.
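The dependence of the transparency window on delay can be seen in a generic dual-edge pulse generator that XORs CLK with a delayed copy of itself. This is an illustrative stand-in, not the DECP or SA-DET circuit, and the delays below are hypothetical integer time steps chosen to mimic fast, nominal, and slow corners. Each clock edge produces one pulse whose width equals the delay-chain delay, so any PVT shift in that delay directly changes the latch's transparency window.

```python
def det_pulses(clk, delay):
    """Pulse = CLK XOR (CLK delayed by `delay` steps): one pulse per clock edge."""
    late = [clk[0]] * delay + clk[: len(clk) - delay]
    return [a ^ b for a, b in zip(clk, late)]


def pulse_widths(p):
    """Lengths of consecutive runs of 1s in a pulse train."""
    widths, run = [], 0
    for v in p + [0]:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    return widths


clk = [0] * 8 + [1] * 8 + [0] * 8
for delay in (1, 2, 4):   # hypothetical fast, nominal, and slow delay-chain values
    print(delay, pulse_widths(det_pulses(clk, delay)))
```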

#### C. C-ELEMENT-TYPE DET-FF

Another recently developed type of DET-FF includes a circuit element called a 'C-element' [23], [24]. As illustrated in Fig. 3(a), when the two inputs of a C-element are the same, the output is updated to the input value, whereas the previous output value is retained when the two inputs differ. Because a C-element can either latch new data or retain its data depending on its input conditions, many different DET-FF topologies have been built around it.
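The C-element behavior described above reduces to a one-line state update. The sketch below is a functional illustration of the textbook (Muller) C-element, not a transistor-level model, and prints its truth table in the spirit of Fig. 3(a).

```python
def c_element(a, b, prev):
    """Two-input C-element: follow the inputs when they agree,
    otherwise keep the previous output."""
    return a if a == b else prev


# (A, B) -> next Q for previous Q = 0 and previous Q = 1
for a in (0, 1):
    for b in (0, 1):
        print(a, b, [c_element(a, b, prev) for prev in (0, 1)])
```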

FIGURE 3. C-element-type DET-FF. (a) Truth table and operation waveforms of C-element (b) NCDDR FF [23] (c) FN\_C-DET FF [24].

Figs. 3(b) and (c) show examples of DET-FF designs using C-elements. The New low-power C-element Dual Data Rate flip-flop (NCDDR FF) [23] shown in Fig. 3(b) includes two latches that store the input values in opposite phases. The latched inputs are stored in nodes A and B, which serve as the inputs of the C-element. In this manner, the C-element can latch input data at both clock edges. However, in this DET-FF design, the latched data is held by back-to-back connected weak inverters, which incur contention when new data is written to the storage node by overpowering the feedback inverter. For reliable operation, the feedback inverters and the overriding transistors must be carefully sized. In addition, the two input transmission gates controlled by the two-phase clock can potentially cause a clock overlap issue, as observed for the SDET FF.
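The principle of two opposite-phase latches feeding a C-element can be sketched at the functional level. This is an idealized illustration, not the NCDDR transistor circuit: the assignment of latch A to the CLK = 1 phase is an assumption, the reset state is hypothetical, and the storage-node contention discussed above is not modeled. In this zero-delay model, D is kept unchanged at the step on which the clock toggles, mimicking a hold-time constraint; with that assumption, both inputs agree right at each edge, so the C-element output takes the value D had at that edge and then holds it.

```python
def c_element(a, b, prev):
    """Two-input C-element: follow the inputs when they agree, else hold."""
    return a if a == b else prev


def latch_pair_c_det(clk, d):
    """Zero-delay sketch of two opposite-phase latches feeding a C-element."""
    a = b = q = 0                    # hypothetical reset state
    out = []
    for c, x in zip(clk, d):
        if c:
            a = x                    # A tracks D, B holds the value from the last edge
        else:
            b = x                    # B tracks D, A holds the value from the last edge
        q = c_element(a, b, q)
        out.append(q)
    return out


clk = [0, 0, 1, 1] * 3
# D is kept unchanged across each clock edge (a hold-time-like assumption).
d = [0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1]
print(latch_pair_c_det(clk, d))  # [0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
```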

The Floating Node C-element DET flip-flop (FN\_C-DET FF) [24] shown in Fig. 3(c) uses five C-elements in total: two 2-input C-elements at the input, two 3-input C-elements in the middle, and one output C-element. At the input C-elements, D, CKB and CKI signals determine the logical values of the internal nodes A and B. These A and B signals control the output C-element, which determines the value of Q. The node X is used as a feedback signal at the inner C-element pair. By using five C-elements, an FN\_C-DET FF forwards the input data at both clock edges. However, when the value of A or B switches, contention can occur at the data storage node of the C-element. To reduce contention, the two inner C-elements at the input so that the internal storage data can be flipped correctly.

Although C-element-type DET-FFs offer some advantages, their use of back-to-back inverters for retaining data [23], [24] creates inevitable contention. Therefore, C-element-based DET-FFs require careful design to remain reliable under aggressive voltage scaling.



FIGURE 4. Simplified illustration of the operation principle of Latch-MUX-type DET-FFs. (a) SDET FF [20] (b) TSPC-SDET FF [27] (c) proposed FS-TSPC-DET FF.

## III. PROPOSED FULLY STATIC TRUE-SINGLE-PHASE-CLOCKED DUAL-EDGE-TRIGGERED FLIP-FLOP

For energy-efficient digital circuit operation with aggressive voltage scaling in IoT applications, DET-FFs that are robust against PVT variations are required. Among the three types of DET-FFs discussed in Section II, pulsed-latch-type DET-FFs require precisely timed short pulses, which are prone to PVT variation, and C-element-type DET-FFs suffer from weak feedback, which inevitably creates contention during the latching operation. As a result, these two types of DET-FFs have limited voltage scalability and are not suitable for low-voltage operation. Therefore, a new DET-FF based on the Latch-MUX topology is proposed that allows aggressive voltage scaling by avoiding contention during the latching operation. The proposed DET-FF is 1) fully static, without weak pull-up/pull-down, 2) true-single-phase clocked, and 3) contention-free. Therefore, the proposed DET-FF is robust against PVT variations and does not require upsized transistors to overcome contention, which allows compact sizing and keeps the area and power overhead small. These advantages enable reliable operation with aggressive voltage scaling across a wide temperature range.

Fig. 4 shows block-level diagrams of the conventional Latch-MUX-type DET-FFs and the proposed Fully Static True-Single-Phase-Clock DET flip-flop (FS-TSPC-DET FF). In the SDET FF [20], shown in Fig. 4(a), a pair of complementary-phase clocks is applied to complementary latches so that only one of the latches is enabled at any given time. At the same time, a multiplexer (MUX), which is a pair of transmission gates controlled by a two-phase clock, forwards the data held in one of the latches to output Q. The TSPC-SDET FF [27] in Fig. 4(b) uses a single-phase clock so that it avoids the clock-overlap issue and operates reliably at low supply voltage. However, as stated earlier, the TSPC-SDET FF still has limitations in near-threshold operation due to its weak pull-up/pull-down.

FIGURE 5. Positive latch in proposed DET-FF. (a) Transistor-level schematic (b) Gate-level schematic (c) Waveforms of positive latch.

For better voltage scalability with robust low-voltage operation, the FS-TSPC-DET FF is proposed, whose concept is shown in Fig. 4(c). Like the TSPC-SDET FF, the proposed DET-FF uses only a single-phase clock. Unlike the TSPC-SDET FF, however, it uses a complementary CMOS logic gate as the multiplexer, which avoids the potential issues of transmission-gate and tri-state-inverter MUXes. To make the CMOS logic gate function as a MUX, two clock-conditioned data signals, CDP and CDN, are generated from the clock and the latched data and used as the MUX inputs. These signals simultaneously 1) forward the latched values to the output and 2) control the behavior of each latch without weak pull-up or pull-down. The details of the proposed latches and MUX are described in the following subsections.

#### A. POSITIVE AND NEGATIVE LATCH

Figs. 5(a) and (b) show schematics of the positive latch in the proposed DET-FF. CDP is a positive clock-conditioned data signal generated by a NOR operation on CLK and the latched data (DP). When CLK = '1', CDP is always '0' regardless of the DP value. In this condition, the input clocked inverter (M in Fig. 5(b)) is activated, while the feedback clocked inverter is deactivated, making the positive latch transparent $(QP = \overline{DP} = D)$. In contrast, when CLK = '0', the NOR gate acts as an inverter $(CDP = \overline{DP})$. In this case, two scenarios (S1, S2) are possible depending on the DP value: S1) if DP = '0', hence CDP = '1', the back-to-back connected inverters are activated, and the DP and QP nodes remain statically driven; or S2) if DP = '1', hence CDP = '0', the pull-down path of the feedback clocked inverter (N in Fig. 5(b)) is off because its NMOS controlled by CDP is turned off. However, the pull-up network of the feedback clocked inverter is on because QP = CLK = '0', so DP is still actively held at '1' and both DP and QP remain static. As a result, every node in this positive latch is always statically driven, which is desirable for stable operation at low voltage.
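The case analysis above can be checked with a small functional sketch. It models the positive latch at the logic level and, as an assumption inferred from the text rather than taken from the schematic, treats the feedback clocked inverter N as a pull-up of series PMOS gated by QP and CLK and a pull-down of series NMOS gated by QP and CDP. Under that assumption, exactly one half of N drives DP to its stored value in every hold state, and both halves are off while the latch is transparent, so the input inverter never fights the feedback path. All function names are hypothetical.

```python
def nor(a, b):
    return 1 - (a | b)


def positive_latch_step(clk, d, dp):
    """One zero-delay evaluation of the positive latch; returns (dp, qp, cdp)."""
    if clk:
        dp = 1 - d          # input clocked inverter M drives DP = NOT D
    qp = 1 - dp             # QP = NOT DP
    cdp = nor(dp, clk)      # CDP = NOR(DP, CLK)
    return dp, qp, cdp


def feedback_drive(clk, qp, cdp):
    """Which half of the feedback clocked inverter N conducts (assumed topology):
    pull-up = series PMOS gated by QP and CLK; pull-down = series NMOS gated by QP and CDP."""
    pull_up = qp == 0 and clk == 0
    pull_down = qp == 1 and cdp == 1
    return pull_up, pull_down


for clk in (1, 0):
    for dp_old in (0, 1):
        for d in (0, 1):
            dp, qp, cdp = positive_latch_step(clk, d, dp_old)
            up, down = feedback_drive(clk, qp, cdp)
            print(f"CLK={clk} D={d} DP={dp} QP={qp} CDP={cdp} pull-up={up} pull-down={down}")
```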

Fig. 5(c) shows the simulation waveforms of the positive latch, which clearly show that the latch is transparent while CLK = '1'. The negative latch in the proposed DET-FF is complementary to the positive latch: it uses a NAND gate instead of a NOR gate to generate the negative clock-conditioned data signal (CDN) and becomes transparent while CLK = '0'. These two static latches are symmetric elements required for complementary operation.

#### B. FULLY STATIC COMPLEMENTARY CMOS MULTIPLEXER

Fig. 6(a) shows an LM DET-FF topology using the positive and negative latches described earlier, whose outputs are combined by a transmission-gate-based multiplexer. As in the prior SDET FF [20], the data latched at either the QP or QN node is transferred to the output Q through the MUX, so that the circuit functions as a DET-FF. Controlling the transmission-gate-based MUX requires two-phase clock signals, which can be generated by a simple inverter chain.

However, as discussed in Section II, such a topology can incur data upset during clock overlap due to the delay mismatch between CLK and CLKB at the transmission-gate MUX. Using a tri-state-inverter-based MUX [27] instead still results in weak pull-up/pull-down.

Therefore, in the proposed DET-FF, a fully static complementary CMOS MUX is utilized to 1) prevent data upset due to clock overlap and 2) keep every node fully static (full swing) at any given time.

The fully static complementary CMOS MUX is derived by logic minimization, in which the multiplexing operation is performed using only existing internal signals. Since all nodes in the positive/negative latches are static with full voltage swing, the complementary CMOS MUX can operate without any stability issue. The logic minimization process for the complementary CMOS MUX is as follows.

Firstly, the latched data DP/DN and their inverted values QP/QN can be written as

$$DP = \overline{CDP} \cdot \overline{CLK} + \overline{D} \cdot \overline{CDP} \cdot CLK \tag{1}$$

$$DN = \overline{CDN} \cdot CLK + \overline{D} \cdot CDN \cdot \overline{CLK} \tag{2}$$

$$QP = \overline{DP} \tag{3}$$

$$QN = \overline{DN} \tag{4}$$



**FIGURE 6.** (a) The Latch-MUX DET-FF with the proposed two static latches and the transmission gate MUX (before logic minimization) (b) proposed fully static DET-FF with the fully static complementary CMOS MUX using TSPC.

Secondly, the clock-conditioned data signals CDP/CDN are obtained with NOR/NAND operations as

$$CDP = \overline{DP + CLK} \tag{5}$$

$$CDN = \overline{DN \cdot CLK} \tag{6}$$

Meanwhile, the final output Q should be the output of the MUX, which selectively forwards QP or QN depending on the clock phase. Its logic function can be written as

$$Q = QP \cdot \overline{CLK} + QN \cdot CLK \tag{7}$$

In this equation, the term related to QP can be rewritten as follows by using (3) and (5):

$$QP \cdot \overline{CLK} = \overline{DP} \cdot \overline{CLK} = \overline{DP + CLK} = CDP \tag{8}$$

Therefore, equation (7) can be rewritten as

$$Q = CDP + QN \cdot CLK \tag{9}$$

Therefore, the transmission-gate-based MUX in Fig. 6(a) can be replaced with a complementary CMOS AO (AND-OR) gate, as shown in Fig. 6(b). The proposed DET-FF uses 36 transistors in total, two fewer than the 38 transistors of the prior TSPC-SDET FF [27].
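The minimization from (7) to (9) can be confirmed by brute force over all values of DP, DN, and CLK. The sketch below (an illustrative check, not part of the original design flow) evaluates the transmission-gate MUX function (7), the identity (8), and the AO-gate form (9) from the definitions (3)-(5).

```python
from itertools import product


def check_mux_minimization():
    """Exhaustively confirm (8) and that the AO form (9) equals the MUX (7)."""
    for dp, dn, clk in product((0, 1), repeat=3):
        qp, qn = 1 - dp, 1 - dn                  # (3), (4)
        cdp = 1 - (dp | clk)                     # (5)
        q_mux = (qp & (1 - clk)) | (qn & clk)    # (7): transmission-gate MUX
        q_ao = cdp | (qn & clk)                  # (9): AND-OR gate
        assert (qp & (1 - clk)) == cdp           # (8)
        assert q_mux == q_ao
    return True


print(check_mux_minimization())  # True
```

Only CDP, QN, and CLK appear in (9), which is why the MUX can be built from signals that already exist inside the latches.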

Fig. 7 illustrates the operation of the FS-TSPC-DET FF for each clock and input data condition. The top row shows the case where the initial DP value is '0' and the input D is '0', whereas the bottom row shows the case where the initial DP is '1' and the input D is '1'. The detailed behavior is presented as CLK toggles from '0' to '1' and then from '1' to '0'.

In the initial state (CLK = '0'), the positive latch holds the internal data (DP). Note that CDP = '0' when DP = '1' (bottom left in Fig. 7); hence, the stacked NMOS in the feedback clocked inverter is turned off. However, since QP = '0', only the pull-up path of the feedback inverter matters, and all nodes are still statically driven to $V_{DD}$ or GND. Meanwhile, the output QB is determined solely by CDP, because the QN term is ANDed with CLK = '0' in the MUX.

When the clock rises (CLK = '1'), the data stored in the negative latch is forwarded to QB through the MUX, since CDP is forced to '0' and CLK is '1'. The internal data in the positive latch (DP) can be updated because 1) the input clocked inverter is fully on (CDP = '0') and 2) the feedback clocked inverter is fully off (CLK = '1' and CDP = '0').

At the falling clock transition, the negative latch becomes transparent again, and the updated data in the positive latch is forwarded to the output QB/Q. Although the TSPC-SDET FF [27] uses the same single-phase clocking method, the clock-conditioned data signals (CDP/CDN) in the proposed DET-FF reduce the number of transistors connected to the input clock (from 14 to 10), resulting in lower power consumption due to clock transitions.
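The walkthrough above can be condensed into a zero-delay functional sketch of the whole flip-flop: a NOR-based positive latch, a NAND-based negative latch, and the AO output gate of (9). The initial DP/DN values and all function names are hypothetical, and timing, sizing, and the QB inverter are not modeled. The sketch compares the result with an ideal DET reference in which Q takes the value D had just before every CLK transition, first on a fixed pattern and then on random, irregular clock waveforms.

```python
import random


def fs_tspc_det(clk, d, dp=1, dn=1):
    """Zero-delay functional model of the FS-TSPC-DET FF; returns the Q trace.

    dp and dn are hypothetical power-up values of the latched nodes DP and DN.
    CDN only steers the negative latch's own feedback, so it is not needed here.
    """
    q_trace = []
    for c, x in zip(clk, d):
        if c:
            dp = 1 - x                   # positive latch transparent while CLK = 1
        else:
            dn = 1 - x                   # negative latch transparent while CLK = 0
        qn = 1 - dn                      # (4)
        cdp = 1 - (dp | c)               # (5) CDP = NOR(DP, CLK)
        q_trace.append(cdp | (qn & c))   # (9) AO output gate
    return q_trace


def ideal_det(clk, d, q0=0):
    """Reference: Q takes the value D had just before every CLK transition."""
    q, out = q0, []
    for t, c in enumerate(clk):
        if t > 0 and c != clk[t - 1]:
            q = d[t - 1]
        out.append(q)
    return out


clk_demo = [0, 0, 1, 1] * 4
d_demo = [1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0]
# Compare from the first clock edge (t = 2) onward, after initial states settle.
print(fs_tspc_det(clk_demo, d_demo)[2:] == ideal_det(clk_demo, d_demo)[2:])  # True

# Randomized check with irregular clock waveforms.
rng = random.Random(0)
for _ in range(1000):
    clk_r = [rng.randint(0, 1) for _ in range(40)]
    d_r = [rng.randint(0, 1) for _ in range(40)]
    edge_times = [t for t in range(1, 40) if clk_r[t] != clk_r[t - 1]]
    if edge_times:
        first = edge_times[0]
        assert fs_tspc_det(clk_r, d_r)[first:] == ideal_det(clk_r, d_r)[first:]
```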

## IV. SIMULATION ANALYSIS AND COMPARISON

The proposed FS-TSPC-DET FF is designed and laid out in 28nm LP bulk CMOS technology for detailed analysis. In addition, the six other DET-FFs described in Section II (SDET FF [20], TSPC-SDET FF [27], DECP FF [17], SA-DET FF [11], NCDDR FF [23], and FN\_C-DET FF [24]) are also designed for comparison. In advanced technology nodes, the reliability and performance of digital circuits degrade rapidly in the low-voltage regime due to exacerbated PVT variations. Therefore, to verify the proposed design under challenging conditions, a relatively advanced process node (28nm), among those accessible, is selected for design and analysis. In addition, since the proposed FF is targeted at IoT applications, the Low-Power (LP) process, rather than the General Purpose (GP) process, is chosen.

FIGURE 7. FS-TSPC-DET FF gate-level schematic and its operation.

Fig. 8 presents the layouts of the 7 DET-FFs designed for post-layout analysis. All DET-FF designs are carefully sized to guarantee reliable operation at room temperature and the standard supply voltage (0.95 V) while keeping their size as small as possible for energy efficiency. RC parasitics are extracted from the layouts shown in Fig. 8 so that post-layout simulations can be performed under various operating conditions.

Among the three types of DET-FFs, the pulsed-latch-type DET-FFs (DECP FF [17], SA-DET FF [11]) and the C-element-type DET-FFs (NCDDR FF [23], FN\_C-DET FF [24]) are more susceptible to PVT variations than the Latch-MUX-type DET-FFs (SDET FF [20], TSPC-SDET FF [27], and FS-TSPC-DET FF), since the pulsed-latch-type DET-FFs require precise pulse timing and the C-element-type DET-FFs suffer from contention with their weak inverters. Therefore, these DET-FFs require proper transistor sizing for stable operation. However, upsizing transistors to ensure a minimum reliable pulse length in pulsed-latch-type DET-FFs, or making the feedback inverters sufficiently weak in C-element-type DET-FFs, incurs greater power and area overhead. Due to this trade-off, the transistor sizes of these two types of DET-FFs are carefully chosen so that stable operation at the standard supply voltage and room temperature is guaranteed while keeping the area compact and the energy efficiency reasonable. This approach includes the use of non-minimum-length transistors to obtain reliable pulse lengths in the pulsed-latch-type DET-FFs.

#### A. STANDARD SUPPLY VOLTAGE OPERATION

Fig. 9 shows the simulation setup used for fair measurement and comparison, which is similar to that used in [16], [37]. Two inverters are used as input drivers for the clock (CLK) and data (D) inputs. To precisely model the power consumption of the DET-FFs, the power consumed for driving each pin is estimated from the power consumed by these drivers. For example, the power consumed for driving the CLK input ($P_{CLK}$) can be calculated as

$$P_{CLK} = P_{Driver\_CLK} - P_{Driver\_CLK\_int} \tag{10}$$

where $P_{Driver\_CLK}$ is the total power drawn from the power supply of the CLK-driving inverter, and $P_{Driver\_CLK\_int}$ is the intrinsic power consumption of the CLK-driving inverter, obtained by measuring its power consumption without the loading of the DET-FF's CLK pin. Similarly, the power consumed for driving the D input ($P_D$) can be calculated as

$$P_D = P_{Driver\_D} - P_{Driver\_D\_int} \tag{11}$$

The power consumed by the internal switching of the flip-flop ($P_{Flip-Flop}$) is measured as the power drawn from the flip-flop's own power supply.

FIGURE 8. Layouts of the 7 DET-FFs. (a) SDET FF (b) TSPC-SDET FF (c) DECP FF (d) SA-DET FF (e) NCDDR FF (f) FN\_C-DET FF (g) FS-TSPC-DET FF.

FIGURE 9. DET-FF simulation setup including the input drivers.

The total power consumption of a DET-FF can then be represented as

$$\begin{aligned} P_{Total} &= P_{Flip-Flop} + P_D + P_{CLK} \\ &= P_{Flip-Flop} + (P_{Driver\_D} - P_{Driver\_D\_int}) + (P_{Driver\_CLK} - P_{Driver\_CLK\_int}) \end{aligned} \tag{12}$$
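The bookkeeping in (10)-(12) can be expressed as a small helper. This is a sketch for illustration only: the function name is hypothetical, and the microwatt readings in the example are made-up values, not results from Fig. 10. Each pin's contribution is the extra power its driver draws because of the DET-FF's pin loading, which is why the intrinsic driver power measured without the DET-FF attached is subtracted.

```python
def det_ff_total_power(p_flip_flop, p_driver_d, p_driver_d_int,
                       p_driver_clk, p_driver_clk_int):
    """Return (P_Total, P_D, P_CLK) following (10)-(12).

    Each pin's power is the loaded driver power minus the intrinsic
    (unloaded) driver power.
    """
    p_clk = p_driver_clk - p_driver_clk_int         # (10)
    p_d = p_driver_d - p_driver_d_int               # (11)
    return p_flip_flop + p_d + p_clk, p_d, p_clk    # (12)


# Hypothetical readings in microwatts, for illustration only.
total, p_d, p_clk = det_ff_total_power(
    p_flip_flop=2.0, p_driver_d=0.9, p_driver_d_int=0.7,
    p_driver_clk=3.5, p_driver_clk_int=2.0,
)
print(f"P_Total = {total:.2f} uW (P_D = {p_d:.2f}, P_CLK = {p_clk:.2f})")
```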

Meanwhile, the potential wire capacitance and resistance outside the DET-FF layout are not included in the simulation since they can vary significantly depending on the implementation.

To evaluate operation at the standard supply voltage, each of the 7 DET-FFs is simulated at the nominal supply voltage (0.95 V) and room temperature (27 °C), assuming the TT (typical) process corner. A clock frequency of 1 GHz is used, and data switching activity ratios ($\alpha$) of 10%, 25%, and 50% are applied. Fig. 10 shows the simulated power consumption of the DET-FFs, broken down into $P_{Flip-Flop}$, $P_D$, and $P_{CLK}$. Several interesting trends can be observed in this power consumption analysis.

Firstly, the power consumption of the clock (CLK) input driver ($P_{CLK}$) is roughly proportional to the number of transistors at the clock input. For example, the three DET-FFs using complementary two-phase clocking (SDET FF, NCDDR FF, and FN\_C-DET FF) consume the least CLK driver power, since they have only 2 transistors at the CLK input. In contrast, the proposed FS-TSPC-DET FF has 10 transistors and the TSPC-SDET FF has 14 transistors at the CLK input, resulting in relatively higher CLK driver power consumption. Note that if contention exists on the CLK node, the CLK driver can consume more power than its transistor count alone would suggest. For example, the DECP FF, shown in Fig. 2(a), has 7 transistors and the SA-DET FF has 6 transistors on the CLK input. However, the pulse generator in the SA-DET FF has contention at the PULS node during pulse generation. Therefore, the SA-DET FF consumes more CLK driver power than the DECP FF despite its lower CLK input transistor count. Since the CLK node activity does not depend on $\alpha$, neither does $P_{CLK}$.

Secondly, the power consumption of the data (D) input driver ($P_D$) is directly proportional to the activity ratio ($\alpha$) and accounts for a relatively small percentage of the total power consumption if there is no contention at the D input. $P_D$ is less than 10% of the total for the TSPC-SDET FF, DECP FF, FN\_C-DET FF, and FS-TSPC-DET FF. In the SDET FF, SA-DET FF, and NCDDR FF, the D input can be directly connected to a data retention node during certain phases or for a short period, which creates contention and draws more power from the D input driver. In the SDET FF and NCDDR FF, the two latches and the D node can be shorted together due to clock-phase misalignment, which makes $P_D$ increase sharply at $\alpha$ = 25% and 50% compared to $\alpha$ = 10%.

Thirdly, the power consumed through the flip-flop's power supply ($P_{Flip-Flop}$) increases with $\alpha$, but less than proportionally. This is because the internal switching of a flip-flop is driven by both the clock and the data. For example, the power consumed for driving an inverted clock signal (CLKB) in flip-flops with a two-phase clocking scheme is independent of $\alpha$, whereas the power consumed for switching the latched data scales directly with $\alpha$. Therefore, the increase in $P_{Flip-Flop}$ with higher $\alpha$ can be limited by minimizing the data-related internal switching activity.
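The less-than-proportional growth follows from splitting $P_{Flip-Flop}$ into an $\alpha$-independent clock-driven part and a data-driven part that scales with $\alpha$. The two-term model and its numbers below are hypothetical (arbitrary units, not values from Fig. 10); they only show why a 5x increase in $\alpha$ yields a much smaller increase in $P_{Flip-Flop}$ when the clock-driven part dominates.

```python
def p_flip_flop(alpha, p_clock_part, p_data_full):
    """Hypothetical two-part model: an alpha-independent clock-driven part plus
    a data-driven part that scales linearly with the activity ratio alpha."""
    return p_clock_part + alpha * p_data_full


# Illustrative numbers only (arbitrary units), not values from Fig. 10.
low = p_flip_flop(0.10, p_clock_part=10.0, p_data_full=8.0)
high = p_flip_flop(0.50, p_clock_part=10.0, p_data_full=8.0)
print(round(high / low, 3))  # 1.296, although alpha grows 5x
```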

Regarding these trends, the following observations can be made for the proposed FS-TSPC-DET FF: 1) A large portion of the proposed DET-FF's power consumption is due to the CLK driver, because of the large number of clock-loading transistors. 2) Negligible power is consumed by the D input, thanks to the absence of contention. 3) The baseline $P_{Flip-Flop}$ is relatively small compared with those of the other DET-FFs, because there is no potential contention within the flip-flop. Overall, the proposed FS-TSPC-DET FF has low power consumption compared with the other analyzed DET-FFs: the second lowest when $\alpha = 10\%$ or $25\%$ and the lowest when $\alpha = 50\%$. This means that its penalty for increased data activity is smaller, thanks to the reduced internal switching activity.

FIGURE 11. CLK-to-Q delay versus D-to-CLK delay. (a) All output transitions at both clock edges in the proposed FS-TSPC-DET FF. (b) Setup time comparison with 7 DET-FFs.

Fig. 11 shows the CLK-to-Q delay ($T_{cq}$) as a function of the D-to-CLK delay at the standard supply voltage ($V_{DD} = 0.95$ V). For accurate setup/hold time measurement, the D-to-CLK delay was swept in 0.02-ps steps, and more than 10,000 data points were measured. Fig. 11(a) shows the $T_{cq}$ values of the proposed FS-TSPC-DET FF for all 4 transition cases. Among them, the case in which Q rises at the rising CLK edge determines $T_{cq}$ and the setup time. Fig. 11(b) shows $T_{cq}$ for all 7 DET-FFs. In general, the Latch-MUX-type DET-FFs (SDET FF, TSPC-SDET FF and FS-TSPC-DET FF) provide a shorter $T_{cq}$ than the other types of DET-FFs.
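The setup-time criterion is not restated in this section. A common convention, assumed here, defines the setup time as the smallest D-to-CLK delay at which $T_{cq}$ stays within 10% of its nominal value (the value when data arrives well before the clock edge). The sketch below applies that assumed criterion to one swept curve; the synthetic curve `synthetic_tcq` is hypothetical and only mimics the general shape of Fig. 11.

```python
import math


def setup_time(d_to_clk_ps, tcq_ps, degradation=0.10):
    """Smallest D-to-CLK delay whose T_cq is within (1 + degradation) x nominal.

    d_to_clk_ps must be sorted in ascending order, and the T_cq at the largest
    D-to-CLK delay (taken as the nominal value) must be finite, so that sample
    always passes. A failed capture is encoded as math.inf.
    """
    limit = tcq_ps[-1] * (1.0 + degradation)
    setup = d_to_clk_ps[-1]
    # Walk from the largest margin toward smaller margins and stop at the first
    # sample that exceeds the limit: the result is the edge of the passing region.
    for d, t in zip(reversed(d_to_clk_ps), reversed(tcq_ps)):
        if t > limit:
            break
        setup = d
    return setup


def synthetic_tcq(d_ps):
    """HYPOTHETICAL T_cq curve (ps): flat at 80 ps, rising as data arrives later."""
    if d_ps >= 100:
        return 80.0
    if d_ps >= 60:
        return 80.0 + (100 - d_ps) * 0.3
    return math.inf  # data too late: the wrong value is captured


if __name__ == "__main__":
    sweep = list(range(0, 201))  # 1-ps steps for brevity (the paper swept in 0.02-ps steps)
    curve = [synthetic_tcq(d) for d in sweep]
    print("setup time:", setup_time(sweep, curve), "ps")
```

In practice the function would be applied to each of the four transition cases and the largest result kept, since the worst case sets the setup time.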

#### B. NEAR-THRESHOLD SUPPLY VOLTAGE OPERATION

The DET-FFs are simulated at near-threshold supply voltages to examine their voltage scalability for low-voltage operation in IoT applications. To evaluate near-threshold operation, each DET-FF is simulated with a supply voltage of 0.3 V at room temperature (27 °C) and a clock frequency of 1 MHz, assuming the TT (typical) process corner. Under these conditions, 3 DET-FFs (SA-DET FF, DECP FF and NCDDR FF) failed to function properly for at least one clock edge due to failure in pulse generation or data contention. Therefore, only the remaining 4 DET-FFs (SDET FF, TSPC-SDET FF, FN\_C-DET FF and FS-TSPC-DET FF) are reported in this section.

**FIGURE 12.** Comparison of 4 DET-FFs at low supply voltages. (a) Total power consumption (P<sub>total</sub>) at varying switching activity ratios. (b) Supply voltage versus CLK-to-Q delay (T<sub>cq</sub>).

FIGURE 13. Comparison of the power-delay product (PDP) of 4 DET-FFs at low supply voltage.

Fig. 12 shows the total power consumption ($P_{Total}$) and CLK-to-Q delay ($T_{cq}$) of these 4 DET-FFs with varying switching activities ($\alpha = 10\%$, $25\%$, $50\%$). In Fig. 12(a), the proposed FS-TSPC-DET FF shows the second lowest $P_{Total}$ at low $\alpha$ (10% and 25%) and the lowest $P_{Total}$ at high $\alpha$ (50%), similar to the results obtained at the standard voltage. In addition to $P_{Total}$, another metric that requires attention in low-voltage operation is $T_{cq}$. With aggressive voltage scaling down to the near-threshold regime, $T_{cq}$ can increase exponentially and degrade the overall performance of digital circuits. Fig. 12(b) shows how $T_{cq}$ changes with aggressive voltage scaling. As the supply voltage is decreased from 0.4 V to 0.3 V, $T_{cq}$ increases rapidly. $T_{cq}$ increases more sharply for the FN\_C-DET FF than for the other 3 DET-FFs, and the designs with true-single-phase clocking (TSPC-SDET FF and FS-TSPC-DET FF) remain faster than the other DET-FFs even at low supply voltages.
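The steep growth of $T_{cq}$ near threshold can be illustrated with a simple first-order gate-delay model, $t_d \propto C V_{DD} / I_{on}$, where $I_{on}$ follows the EKV-style interpolation $I_{on} \propto \left[\ln\left(1 + e^{(V_{DD}-V_{th})/(2 n U_T)}\right)\right]^2$, which is exponential below threshold and quadratic well above it. This model is not used in the paper, and the threshold voltage `V_TH` and slope factor `N_SLOPE` below are hypothetical round numbers chosen only to show the trend.

```python
import math

V_TH = 0.35     # HYPOTHETICAL threshold voltage (V)
N_SLOPE = 1.4   # HYPOTHETICAL subthreshold slope factor
U_T = 0.02585   # thermal voltage kT/q at about 300 K (V)


def relative_on_current(vdd):
    """EKV-style interpolation of the drive current (arbitrary units), with V_GS = VDD."""
    x = (vdd - V_TH) / (2.0 * N_SLOPE * U_T)
    return math.log1p(math.exp(x)) ** 2


def relative_delay(vdd):
    """First-order gate delay ~ C * VDD / I_on, with C folded into the units."""
    return vdd / relative_on_current(vdd)


if __name__ == "__main__":
    ref = relative_delay(0.95)
    for vdd in (0.95, 0.5, 0.4, 0.35, 0.3):
        print(f"VDD = {vdd:.2f} V: delay x{relative_delay(vdd) / ref:6.1f} vs. 0.95 V")
```

With these assumed parameters, lowering $V_{DD}$ from 0.4 V to 0.3 V raises the delay by roughly 5×, which illustrates why $T_{cq}$ rises sharply in Fig. 12(b); the actual factor depends on the process and on each flip-flop's topology.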

The total power consumption ($P_{Total}$) and CLK-to-Q delay ($T_{cq}$) are both important metrics for DET-FF operation in low-voltage conditions, but there can be a trade-off between them. For this reason, the power-delay product (PDP $= P_{Total} \times T_{cq}$) is often used to evaluate FFs [11], [23], [27]. Fig. 13 shows the PDPs of the 4 DET-FFs under near-threshold voltage operation. Although the FN\_C-DET FF has the lowest $P_{Total}$ at low $\alpha$, the TSPC-SDET FF and FS-TSPC-DET FF achieve significantly better PDPs regardless of $\alpha$ owing to their much lower $T_{cq}$. This implies that the proposed FS-TSPC-DET FF can be a suitable solution for low-voltage operation in IoT applications.

#### C. MONTE CARLO SIMULATION WITH SUPPLY VOLTAGE AND TEMPERATURE SCALING

Even though the proposed FS-TSPC-DET FF has low power consumption and small delay, its robustness against PVT variations must be verified for near-threshold operation, since the impact of variation is amplified in this voltage regime. To evaluate the robustness of the DET-FFs, Monte Carlo simulations (covering process and mismatch variations) are performed with aggressive supply voltage scaling over a wide temperature range. The 4 DET-FF designs evaluated for near-threshold operation in the previous subsection are examined again.

\* 0 and 1 denote GND and VDD, respectively (logic values).

FIGURE 15. Monte Carlo simulation results of 4 DET-FFs with aggressive voltage and temperature scaling.

Fig. 14 shows the input test pattern used for the Monte Carlo simulations, which is designed to check the functionality of each DET-FF for all possible input combinations. As shown in the truth table, the pattern includes all possible transition scenarios of inputs D and CLK (up to 3 transitions of one signal while the other signal is unchanged). The input clock frequency is set to 1 kHz, which is low enough that functionality can be checked even in cases with very long delays. A long delay between the CLK and data transitions is assumed so that only functional failures unrelated to setup- or hold-time violations are checked. If a DET-FF operates without functional failure, the output Q matches the expected waveform in Fig. 14, capturing the input data at each marked point (a–f).
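The expected Q waveform for such a test pattern can be generated with an ideal, zero-delay DET-FF reference model: Q takes the value of D at every CLK transition, rising or falling, and holds otherwise. The sketch below is a hypothetical checker, not the authors' testbench; it assumes digitized 0/1 samples on a common time grid and, like the test pattern, that D never changes at a CLK edge.

```python
def det_ff_reference(clk, d, q_init=0):
    """Ideal DET-FF: on every CLK edge (0->1 or 1->0), Q <- D sampled just before the edge."""
    if len(clk) != len(d):
        raise ValueError("clk and d must have the same length")
    q = [q_init]
    for i in range(1, len(clk)):
        if clk[i] != clk[i - 1]:   # rising or falling clock edge: capture D
            q.append(d[i - 1])
        else:                      # no edge: hold the stored value
            q.append(q[-1])
    return q


def count_mismatches(q_simulated, q_expected):
    """Number of samples where the simulated Q differs from the reference."""
    return sum(a != b for a, b in zip(q_simulated, q_expected))


if __name__ == "__main__":
    # Hypothetical stimulus: D toggles while CLK is steady (Q must hold),
    # and D is stable around each of the three CLK edges (samples 3, 5, 7).
    clk = [0, 0, 0, 1, 1, 0, 0, 1, 1]
    d = [0, 1, 1, 1, 0, 0, 0, 0, 1]
    print(det_ff_reference(clk, d))
```

A Monte Carlo sample would count as a pass only if `count_mismatches` returns 0 over the whole pattern. Unlike a single-edge-triggered reference, this model also requires Q to update at the falling edge (sample 5 in the stimulus).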

TABLE 1. DET-FF comparison of simulation results at standard conditions.

| DET-FF<br>design | SDET<br>[20] | TSPC-SDET<br>[27] | DECP<br>[17] | SA-DET<br>[11] | NCDDR<br>[23] | FN_C-DET<br>[24] | FS-TSPC-DET<br>This work |
|---|---|---|---|---|---|---|---|
| # of Tr. | 26 | 38 | 35 | 22 | 28 | 34 | 36 |
| Type | Latch-MUX | Latch-MUX | Pulsed-latch | Pulsed-latch | C-element | C-element | Latch-MUX |
| Clock phase<br>(Trigger type) | Two-phase<br>(Complementary) | Single-phase | Two-phase<br>(Implicit pulse) | Two-phase<br>(Explicit pulse) | Two-phase<br>(Complementary) | Two-phase<br>(Complementary) | Single-phase |
| Normalized<br>area | 0.79 | 0.96 | 1.29 | 0.92 | 1.20 | 1.19 | 1 |
| Power<sub>0.1</sub> (µW) | 6.01 | 5.52 | 9.38 | 9.91 | 6.47 | 4.31 | 5.03 |
| Power<sub>0.25</sub> (µW) | 7.27 | 6.58 | 10.5 | 11.2 | 10.5 | 6.20 | 6.30 |
| Power<sub>0.5</sub> (µW) | 9.35 | 8.55 | 12.4 | 13.7 | 17.1 | 9.28 | 7.71 |
| PDP<sub>0.1</sub> (fJ) | 0.73 | 0.41 | 1.62 | 1.39 | 1.44 | 0.71 | 0.41 |
| PDP<sub>0.25</sub> (fJ) | 0.88 | 0.49 | 1.81 | 1.58 | 2.33 | 1.03 | 0.51 |
| PDP<sub>0.5</sub> (fJ) | 1.13 | 0.63 | 2.14 | 1.92 | 3.80 | 1.53 | 0.62 |
| T<sub>cq</sub> (ps) | 121 | 75.0 | 173 | 140 | 223 | 166 | 81.6 |
| Setup time (ps) | 20.9 | 88.9 | 8.32 | -0.34 | 111 | 46.2 | 82.5 |
| Hold time (ps) | 50.5 | 2.77 | 143 | 128 | 218 | 114 | 11.8 |
| Minimum V<sub>DD</sub><br>(Temp. -40 to 120 °C) | > 0.5 V | 0.36 V | – | – | – | > 0.5 V | 0.28 V |

\* Power<sub>α</sub>, PDP<sub>α</sub>: $\alpha$ is the switching activity ratio.

\* Power includes the CLK and D drivers' power, excluding the drivers' internal power.

\* Minimum $V_{\text{DD}}$: Monte Carlo simulation results in Fig. 15.

Monte Carlo simulations were performed over a supply voltage range of 0.2 V to 0.5 V in 20-mV steps and a temperature range of -40 °C to 120 °C in 10 °C steps. For each condition, 10k Monte Carlo samples were simulated for accurate evaluation. The Shmoo plots of the 4 DET-FFs are shown in Fig. 15. In these Shmoo plots, a green box indicates that all 10k Monte Carlo samples passed, and a red box indicates that at least one test case failed. Among the 4 evaluated DET-FFs, the SDET FF and FN\_C-DET FF have limited functionality at low voltage and low temperature. This is because these DET-FFs use a two-phase clocking scheme, in which the clock overlap issue can worsen under PVT variations. Furthermore, in the FN\_C-DET FF, the strong and weak C-elements contend at the data storage node, making it susceptible to PVT variations. Thanks to their single-phase clocking scheme, the TSPC-SDET FF and FS-TSPC-DET FF operate without failure at lower voltages and temperatures. However, the TSPC-SDET FF cannot function below 0.3 V due to the weak pull-up/pull-down (at node XP or XN in Fig. 1(b)). Thanks to its fully static implementation, the proposed FS-TSPC-DET FF has the lowest minimum operating voltage: it operates without failure at 0.28 V over the entire tested temperature range.
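The sweep and pass criterion described above can be expressed compactly: a (VDD, temperature) grid point passes only if all of its 10k Monte Carlo samples pass, and the minimum V<sub>DD</sub> in Table 1 is the lowest voltage at which every temperature passes. The sketch below additionally assumes that all higher tested voltages must also pass, which the text does not state explicitly; `passes` is a hypothetical callback standing in for the Monte Carlo results, and the predicate `toy_passes` is invented for the example.

```python
from typing import Callable, Optional

VDDS_MV = list(range(200, 501, 20))   # 0.2 V to 0.5 V in 20-mV steps (16 points)
TEMPS_C = list(range(-40, 121, 10))   # -40 degC to 120 degC in 10 degC steps (17 points)


def min_vdd_mv(passes: Callable[[int, int], bool],
               vdds_mv=VDDS_MV, temps_c=TEMPS_C) -> Optional[int]:
    """Lowest VDD (mV) such that it and every higher tested VDD pass at all temperatures.

    passes(vdd_mv, temp_c) should return True only if all Monte Carlo samples
    at that grid point pass the functional test pattern.
    """
    lowest = None
    for vdd in sorted(vdds_mv, reverse=True):
        if all(passes(vdd, t) for t in temps_c):
            lowest = vdd
        else:
            break
    return lowest


def toy_passes(vdd_mv: int, temp_c: int) -> bool:
    """HYPOTHETICAL pass/fail map: low temperature needs a higher supply."""
    return vdd_mv >= 280 or (vdd_mv >= 240 and temp_c >= 0)


if __name__ == "__main__":
    print(len(VDDS_MV) * len(TEMPS_C), "grid points")
    print("minimum VDD:", min_vdd_mv(toy_passes), "mV")
```

The grid also conveys the scale of the evaluation: 16 voltages × 17 temperatures × 10k samples amounts to about 2.7 million transient simulations per flip-flop design.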

Table 1 summarizes the evaluation results under standard conditions ($V_{DD} = 0.95$ V, input clock frequency = 1 GHz, temperature = 27 °C, TT corner). The normalized areas of the 7 DET-FFs are compared based on the layout sizes shown in Fig. 8 (normalized to the FS-TSPC-DET FF). The area is not directly proportional to the transistor count, since some designs require careful transistor sizing for robust operation, especially when they have contention. Thanks to their symmetric and static structure, the Latch-MUX-type DET-FFs (SDET FF, TSPC-SDET FF and FS-TSPC-DET FF) achieve compact layouts for their transistor counts; for example, the 36-transistor FS-TSPC-DET FF occupies less area than the 35-transistor DECP FF and the 34-transistor FN\_C-DET FF.
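The claim that Latch-MUX-type designs are compact for their transistor counts can be checked directly from Table 1 by dividing each normalized area by its transistor count. This area-per-transistor ratio is a rough illustrative metric introduced here, not one used in the paper, and it ignores differences in individual transistor sizing.

```python
# Transistor count and normalized layout area (FS-TSPC-DET FF = 1), from Table 1.
AREA_TABLE = {
    "SDET FF": (26, 0.79),
    "TSPC-SDET FF": (38, 0.96),
    "DECP FF": (35, 1.29),
    "SA-DET FF": (22, 0.92),
    "NCDDR FF": (28, 1.20),
    "FN_C-DET FF": (34, 1.19),
    "FS-TSPC-DET FF": (36, 1.00),
}


def area_per_transistor(table):
    """Normalized area divided by transistor count, for each design."""
    return {name: area / n_tr for name, (n_tr, area) in table.items()}


if __name__ == "__main__":
    ratios = area_per_transistor(AREA_TABLE)
    for name in sorted(ratios, key=ratios.get):
        print(f"{name:15s} {ratios[name] * 1000:5.1f} (x1e-3 normalized area per Tr.)")
```

The three Latch-MUX-type designs come out with the three smallest ratios, consistent with the text, even though the 22-transistor SA-DET FF is smaller in absolute area than the TSPC-SDET FF and FS-TSPC-DET FF.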

The proposed FS-TSPC-DET FF has low power consumption comparable to that of the best-performing FN\_C-DET FF. It also has a low CLK-to-Q delay comparable to that of the best-performing TSPC-SDET FF, resulting in the lowest PDP for $\alpha = 10\%$ (tied with the TSPC-SDET FF at the reported precision) and $\alpha = 50\%$, and the second lowest PDP for $\alpha = 25\%$. Besides this excellent power-performance trade-off, the proposed FS-TSPC-DET FF also operates robustly at low voltage. Thanks to its fully static, contention-free design, its minimum operating supply voltage is as low as 0.28 V. Such high energy efficiency and wide voltage scalability indicate that the proposed FS-TSPC-DET FF not only consumes relatively low power compared with prior-art designs at a given supply voltage but also operates at the lowest supply voltage among them. These properties make the proposed FF an excellent candidate as an energy-efficient sequential element for energy/cost-sensitive IoT applications with aggressive dynamic voltage scaling down to a near-threshold voltage of 0.28 V.
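The PDP entries in Table 1 follow from PDP $= P_{Total} \times T_{cq}$; with power in µW and delay in ps, the product is in units of $10^{-18}$ J, i.e., $10^{-3}$ fJ. The sketch below recomputes the PDPs from the tabulated power and $T_{cq}$ and ranks the designs at each $\alpha$. Because the inputs are rounded, a few recomputed values may differ from Table 1 in the last digit.

```python
# Power (uW) at alpha = 0.1 / 0.25 / 0.5 and T_cq (ps), copied from Table 1.
PERF_TABLE = {
    "SDET FF": ((6.01, 7.27, 9.35), 121.0),
    "TSPC-SDET FF": ((5.52, 6.58, 8.55), 75.0),
    "DECP FF": ((9.38, 10.5, 12.4), 173.0),
    "SA-DET FF": ((9.91, 11.2, 13.7), 140.0),
    "NCDDR FF": ((6.47, 10.5, 17.1), 223.0),
    "FN_C-DET FF": ((4.31, 6.20, 9.28), 166.0),
    "FS-TSPC-DET FF": ((5.03, 6.30, 7.71), 81.6),
}
ALPHAS = (0.10, 0.25, 0.50)


def pdp_fj(power_uw, tcq_ps):
    """Power-delay product in fJ: 1 uW x 1 ps = 1e-18 J = 1e-3 fJ."""
    return power_uw * tcq_ps * 1e-3


def ranking(alpha_index):
    """Designs sorted from lowest to highest PDP at ALPHAS[alpha_index]."""
    return sorted(
        PERF_TABLE,
        key=lambda name: pdp_fj(PERF_TABLE[name][0][alpha_index], PERF_TABLE[name][1]),
    )


if __name__ == "__main__":
    for i, alpha in enumerate(ALPHAS):
        best_two = ranking(i)[:2]
        summary = ", ".join(
            f"{n} ({pdp_fj(PERF_TABLE[n][0][i], PERF_TABLE[n][1]):.3f} fJ)" for n in best_two
        )
        print(f"alpha = {alpha:.2f}: {summary}")
```

The recomputed values reproduce the ordering stated above: the FS-TSPC-DET FF is lowest at $\alpha = 10\%$ (about 0.410 fJ versus 0.414 fJ, both of which round to 0.41 fJ) and at $\alpha = 50\%$, and second to the TSPC-SDET FF at $\alpha = 25\%$.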

#### V. CONCLUSION

A dual-edge-triggered flip-flop for high energy efficiency and robust near-threshold operation is proposed in this paper. Unlike most conventional DET-FFs, such as Pulsed-latch-type DET-FFs or Latch-MUX-type DET-FFs using a complementary two-phase clock, the proposed FS-TSPC-DET FF uses a true single-phase clock and a fully static structure without floating nodes or contention. A comprehensive post-layout analysis of DET-FFs implemented in 28-nm CMOS technology showed that the proposed FS-TSPC-DET FF achieves an excellent power-delay trade-off and high tolerance to PVT variations, thanks to its simple and fully static implementation. Extensive comparison with prior-art DET-FFs confirmed that the proposed DET-FF can operate at the lowest voltage of 0.28 V over a temperature range of -40 °C to 120 °C, making it an excellent candidate as an energy-efficient flip-flop for energy/cost-sensitive IoT applications.

#### ACKNOWLEDGMENT

The authors would like to thank the IC Design Education Center (IDEC), South Korea, for providing EDA tools.

#### REFERENCES

- [1] J. L. Shin, R. Golla, H. Li, S. Dash, Y. Choi, A. Smith, H. Sathianathan, M. Joshi, H. Park, M. Elgebaly, S. Turullols, S. Kim, R. Masleid, G. K. Konstadinidis, M. J. Doherty, G. Grohoski, and C. McAllister, "The next generation 64b SPARC core in a T4 SoC processor," *IEEE J. Solid-State Circuits*, vol. 48, no. 1, pp. 82–90, Jan. 2013.
- [2] E. D. Korczynski, "IoT demands: Are we ready?" *Solid State Technol.*, vol. 59, no. 4, pp. 19–23, Jun. 2016.
- [3] Y. Kim, "A static contention-free single-phase-clocked 24T flip-flop in 45 nm for low-power applications," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2014, pp. 466–467.
- [4] N. Kawai, S. Takayama, J. Masumi, N. Kikuchi, Y. Itoh, K. Ogawa, A. Ugawa, H. Suzuki, and Y. Tanaka, "A fully static topologically-compressed 21-transistor flip-flop with 75% power saving," *IEEE J. Solid-State Circuits*, vol. 49, no. 11, pp. 2526–2533, Nov. 2014.
- [5] F. Stas and D. Bol, "A 0.4-V 0.66-fJ/cycle retentive true-single-phase-clock 18T flip-flop in 28-nm fully-depleted SOI CMOS," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 65, no. 3, pp. 935–945, Mar. 2018.
- [6] Y. Cai, A. Savanth, P. Prabhat, J. Myers, A. S. Weddell, and T. J. Kazmierski, "Ultra-low power 18-transistor fully static contention-free single-phase clocked flip-flop in 65-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 54, no. 2, pp. 550–559, Feb. 2019.
- [7] P. Xu, C. Gimeno, and D. Bol, "Optimizing TSPC frequency dividers for always-on low-frequency applications in 28 nm FDSOI CMOS," in *Proc. IEEE SOI-3D-Subthreshold Microelectron. Technol. Unified Conf. (SS)*, Burlingame, CA, USA, Oct. 2017, pp. 1–2.
- [8] M. Alioto, E. Consoli, and G. Palumbo, "DET FF topologies: A detailed investigation in the energy-delay-area domain," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, Rio de Janeiro, Brazil, May 2011, pp. 563–566.
- [9] T. A. Johnson and I. S. Kourtev, "A single latch, high speed double-edge triggered flip-flop (DETFF)," in *Proc. 8th IEEE Int. Conf. Electron., Circuits Syst. (ICECS)*, Malta, Sep. 2001, vol. 1, pp. 189–192.
- [10] Y.-Y. Sung and R. C. Chang, "A novel CMOS double-edge triggered flip-flop for low-power applications," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, May 2004, p. 665.
- [11] M. W. Phyu, K. Fu, W. L. Goh, and K.-S. Yeo, "Power-efficient explicit-pulsed dual-edge triggered sense-amplifier flip-flops," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 19, no. 1, pp. 1–9, Jan. 2011.
- [12] X.-X. Wu and J.-Z. Shen, "Low-power explicit-pulsed triggered flip-flop with robust output," *Electron. Lett.*, vol. 48, no. 24, pp. 1523–1525, Nov. 2012.
- [13] H. Partovi, R. Burd, U. Salim, F. Weber, L. DiGregorio, and D. Draper, "Flow-through latch and edge-triggered flip-flop hybrid elements," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 1996, pp. 138–139.
- [14] F. Klass, C. Amir, A. Das, K. Aingaran, C. Truong, R. Wang, A. Mehta, R. Heald, and G. Yee, "A new family of semidynamic and dynamic flip-flops with embedded logic for high-performance processors," *IEEE J. Solid-State Circuits*, vol. 34, no. 5, pp. 712–716, May 1999.
- [15] C. Kim and S.-M. Kang, "A low-swing clock double-edge triggered flip-flop," *IEEE J. Solid-State Circuits*, vol. 37, no. 5, pp. 648–652, May 2002.
- [16] P. Zhao, J. McNeely, P. Golconda, M. A. Bayoumi, R. A. Barcenas, and W. Kuang, "Low-power clock branch sharing double-edge triggered flip-flop," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 15, no. 3, pp. 338–345, Mar. 2007.
- [17] N. Nedovic, M. Aleksic, and V. G. Oklobdzija, "Conditional pre-charge techniques for power-efficient dual-edge clocking," in *Proc. Int. Symp. Low Power Electron. Design (ISLPED)*, 2002, pp. 56–59.
- [18] J.-F. Lin, "Low-power pulse-triggered flip-flop design using gated pull-up control scheme," *Electron. Lett.*, vol. 47, no. 24, p. 1313, 2011.
- [19] A. Gago, R. Escano, and J. A. Hidalgo, "Reduced implementation of D-type DET flip-flops," *IEEE J. Solid-State Circuits*, vol. 28, no. 3, pp. 400–402, Mar. 1993.
- [20] R. Hossain, L. D. Wronski, and A. Albicki, "Low power design using double edge triggered flip-flops," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 2, no. 2, pp. 261–265, Jun. 1994.
- [21] R. P. Llopis and M. Sachdev, "Low power, testable dual edge triggered flip-flops," in *Proc. Int. Symp. Low Power Electron. Design (ISLPED)*, 1996, pp. 341–345.

- [22] S. Muthukumar and G. Choi, "Low-power and area-efficient 9-transistor double-edge triggered flip-flop," *IEICE Electron. Express*, vol. 10, no. 18, 2013, Art. no. 20130639.
- [23] S. V. Devarapalli, P. Zarkesh-Ha, and S. C. Suddarth, "A robust and low power dual data rate (DDR) flip-flop using c-elements," in *Proc. 11th Int. Symp. Qual. Electron. Design (ISQED)*, San Jose, CA, USA, Mar. 2010, pp. 147–150.
- [24] S. Lapshev and S. M. R. Hasan, "New low glitch and low power DET flip-flops using multiple C-elements," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 63, no. 10, pp. 1673–1681, Oct. 2016.
- [25] M. Afghahi and J. Yuan, "Double-edge-triggered D-flip-flops for high-speed CMOS circuits," *IEEE J. Solid-State Circuits*, vol. 26, no. 8, pp. 1168–1170, 1991.
- [26] J.-S. Wang, "A new true-single-phase-clocked double-edge-triggered flip-flop for low-power VLSI designs," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, Jun. 1997, pp. 1896–1899.
- [27] A. Bonetti, A. Teman, and A. Burg, "An overlap-contention free true-single-phase clock dual-edge-triggered flip-flop," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, Lisbon, May 2015, pp. 1850–1853.
- [28] T. D. Burd, T. A. Pering, A. J. Stratakos, and R. W. Brodersen, "A dynamic voltage scaled microprocessor system," *IEEE J. Solid-State Circuits*, vol. 35, no. 11, pp. 1571–1580, Nov. 2000.
- [29] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-power CMOS digital design," *IEEE J. Solid-State Circuits*, vol. 27, no. 4, pp. 473–484, Apr. 1992.
- [30] W. Wang and P. Mishra, "System-wide leakage-aware energy minimization using dynamic voltage scaling and cache reconfiguration in multitasking systems," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 20, no. 5, pp. 902–910, May 2012.
- [31] S. Paul, V. Honkote, R. Kim, T. Majumder, P. Aseron, V. Grossnickle, R. Sankman, D. Mallik, S. Jain, S. Vangal, J. Tschanz, and V. De, "An energy harvesting wireless sensor node for IoT systems featuring a near-threshold voltage IA-32 microcontroller in 14nm tri-gate CMOS," in *Proc. IEEE Symp. VLSI Circuits (VLSI-Circuits)*, Honolulu, HI, USA, Jun. 2016, pp. 1–2.
- [32] N. Lotze and Y. Manoli, "Ultra-sub-threshold operation of always-on digital circuits for IoT applications by use of schmitt trigger gates," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 64, no. 11, pp. 2920–2933, Nov. 2017.
- [33] Y. Shi, M. Choi, Z. Li, Z. Luo, G. Kim, Z. Foo, H.-S. Kim, D. Wentzloff, and D. Blaauw, "A 10 mm<sup>3</sup> inductive-coupling near-field radio for syringe-implantable smart sensor nodes," *IEEE J. Solid-State Circuits*, vol. 51, no. 11, pp. 2570–2583, Sep. 2016.
- [34] Y. Lee, S. Bang, I. Lee, Y. Kim, G. Kim, M. H. Ghaed, P. Pannuto, P. Dutta, D. Sylvester, and D. Blaauw, "A modular 1 mm<sup>3</sup> die-stacked sensing platform with low power I<sup>2</sup>C inter-die communication and multi-modal energy harvesting," *IEEE J. Solid-State Circuits*, vol. 48, no. 1, pp. 229–243, Jan. 2013.
- [35] M. Cho, "A 6×5×4 mm<sup>3</sup> general purpose audio sensor node with a 4.7µW audio processing IC," in *Proc. Symp. VLSI Circuits*, Kyoto, Japan, 2017, pp. C312–C313.
- [36] Y. Suzuki, K. Odagawa, and T. Abe, "Clocked CMOS calculator circuitry," *IEEE J. Solid-State Circuits*, vol. 8, no. 6, pp. 462–469, Dec. 1973.
- [37] M. Alioto, E. Consoli, and G. Palumbo, "Analysis and comparison in the energy-delay-area domain of nanometer CMOS flip-flops: Part I—Methodology and design strategies," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 19, no. 5, pp. 725–736, May 2011.



**YONGMIN LEE** (Student Member, IEEE) received the B.S. degree in biomedical engineering from Konyang University, Daejeon, South Korea, in 2016, and the M.S. degree in electronic, electrical, and computer engineering from Sungkyunkwan University, Suwon, South Korea, in 2019. His research interests include low-power circuit design and digital VLSI design.



**GICHEOL SHIN** (Student Member, IEEE) received the B.S. degree in electrical engineering from Sungkyunkwan University, Suwon, South Korea, in 2017, where he is currently pursuing the joint M.S./Ph.D. degree in electronic, electrical, and computer engineering. His research interests include ultra-low-power circuit and hardware accelerator design for deep learning applications.



**YOONMYUNG LEE** (Senior Member, IEEE) received the B.S. degree in electronic and electrical engineering from the Pohang University of Science and Technology (POSTECH), Pohang, South Korea, in 2004, and the M.S. and Ph.D. degrees in electrical engineering from the University of Michigan, Ann Arbor, in 2008 and 2012, respectively.

From 2012 to 2015, he was a Research Faculty with the University of Michigan and performed research on ultra-low-power circuit design for mm-scale sensor platforms. In 2013, he co-founded CubeWorks Inc., a start-up company specialized in mm-scale sensor platforms. In 2015, he joined Sungkyunkwan University, Suwon, South Korea, where he is currently an Associate Professor. His current research interests include energy-efficient integrated circuits design for low-power high-performance VLSI systems and millimeter-scale wireless sensor systems. He has been serving as a Technical Program Committee Member for A-SSCC, since 2017. He has received numerous awards and scholarships, including the Distinguished Undergraduate Scholarship from the Korea Foundation for Advanced Studies, in 2001, the Samsung Scholarship, in 2005, the Best Paper Award in ISLPED, in 2009, the DAC/ISSCC Student Design Contest, in 2009 and 2011, the Intel Ph.D. Fellowship, in 2011, and the Samsung Human Tech Thesis Contest Silver Award, in 2012. He has been serving as an Associate Editor for the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS, since 2019.

...