

Received 4 October 2023, accepted 25 October 2023, date of publication 30 October 2023, date of current version 6 November 2023. Digital Object Identifier 10.1109/ACCESS.2023.3328563

# **RESEARCH ARTICLE**

# Low-Power High-Speed Sense-Amplifier-Based Flip-Flops With Conditional Bridging

# BOMIN JOO AND BAI-SUN KONG, (Member, IEEE)

Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon 16419, South Korea

Corresponding author: Bai-Sun Kong (bskong@skku.edu)

This work was supported in part by Samsung Electronics Company Ltd.; in part by the Institute of Information & Communications Technology Planning & Evaluation under Grant 2019-0-00421; in part by the Korea Institute for Advancement of Technology (KIAT) under Grant P0012451; and in part by the Integrated Circuits Design Education Center (IDEC), South Korea, for EDA tool support.

**ABSTRACT** Conventional high-performance flip-flops suffer from large power consumption in the nominal supply region and unreliable operation in the low-voltage region. To overcome these drawbacks, this paper proposes conditional-bridging flip-flops (CBFFs), which activate the shorting device in the sense-amplifier stage only conditionally. Two versions of the proposed flip-flop are presented. The single-ended version (CBFF-S) adopts a single-ended latching stage optimized for power consumption and area. For applications requiring high speed with differential outputs, a speed-optimized differential version (CBFF-D) is also proposed. Since the shorting device is adaptively turned on only when necessary, the flip-flops operate fully statically with reduced switching power consumption. The conditional bridging technique also minimizes the effective parasitic capacitance associated with the shorting device by removing the need to weaken it, resulting in further power reduction. In addition, the technique completely separates the complementary precharge nodes of the sense-amplifier stage during input sampling, achieving fast and reliable operation. To further reduce power consumption and latency, the latching stage is designed to be free of glitches and signal fighting and to be driven by the first-stage output without signal inversion. Moreover, the conditionally bridged sense-amplifier stage, which provides a reliable pull-down of the precharge nodes, and the contention-free latching stage allow the flip-flops to operate stably down to the near-threshold voltage (NTV) region. The proposed flip-flops were designed in a 28-nm CMOS process. Performance evaluation results indicated that the power consumption of CBFF-S is reduced by up to 56.2% compared to conventional single-ended flip-flops at 0.1 switching activity, and that its minimum DQ latency is reduced by up to 33.6%. They also indicated that CBFF-D offers up to 33.8% less power at 0.1 switching activity and up to 24.1% lower minimum DQ latency than conventional differential flip-flops. The resulting power-delay product (PDP) of the CBFFs was at least 27.8% lower than those of conventional flip-flops. Monte-Carlo simulation results considering process, voltage, and temperature (PVT) variations indicated that the CBFFs could operate reliably down to a supply voltage of 0.3 V.

**INDEX TERMS** Flip-flop, pulsed latch, sense amplifier, high performance, low power, low voltage.

#### I. INTRODUCTION

The demand for high-speed operation of electronic systems requires higher clock frequencies and tighter timing specifications [1]. To satisfy the relevant timing requirements, high-speed circuits have been employed in these systems, which results in large power consumption. At the same time, energy-efficient computation is becoming more important as mobile electronic systems are now widely used [2], [3]. To cope with energy constraints, low-power design techniques, including voltage scaling to minimize switching power [4], [5] and/or conditional operations to eliminate redundant power [6], [7], can be used at the cost of speed. When both speed and power are important, as is usually the case in high-performance mobile applications, a deliberate trade-off between these metrics must be made.

The associate editor coordinating the review of this manuscript and approving it for publication was Mohammad Hossein Moaiyeri.

In synchronous digital integrated systems, flip-flops and latches play an important role in governing state transitions and synchronizing data flow. The high-speed design of flip-flops is important because they usually lie in timing-critical signal paths that determine the maximum operating frequency [8]. Moreover, since a single processor can contain millions of flip-flops [9], whose collective power consumption can reach 20-40% of the total power [9], [10], [11], the low-power design of flip-flops has become equally important. Therefore, minimizing both the power consumption and the latency of flip-flops and latches, two goals that are hard to attain simultaneously, is a critical issue in high-speed mobile electronic system design.

The transmission-gate flip-flop (TGFF) [12] (FIGURE 1), traditionally used in synchronous digital ICs, has a master-slave structure and provides moderate data-to-output (DQ) latency and power consumption. Its reliable operation in the near-threshold voltage (NTV) region also allows power savings through voltage scaling. In TGFF, input data are sampled in the master stage and captured in the slave stage; this separation leaves room for reducing the DQ latency, which has motivated so-called pulse-based techniques [13], [14], [15]. One example is the transmission-gate pulsed latch (TGPL) [13], which has a single latching stage with a pulse generator. Since the input data is transferred directly to the output during a narrow pulse triggered by the clock edge, the master stage of TGFF can be removed, reducing the DQ latency. Despite its high-speed operation, TGPL can suffer from large power consumption due to the circuit overhead for generating the brief pulse. Moreover, pulse-width variability due to process variations may cause unreliable operation, especially in the NTV region. Although recent circuit techniques resolve some issues related to pulse generation [14], [15], managing internally delayed local clocks for pulsed operation makes their total power consumption larger than that of TGFF.

Another way to improve the speed of timing elements is the sense-amplifier-based flip-flop (SAFF) technique [17]. By adopting a differential precharged circuit in the first stage and a symmetric latch in the second stage, the flip-flop can quickly sample and capture input data at the triggering clock edge, resulting in high-speed operation. Although further power and speed improvements can be achieved by modifying the structure of the latching stage, undesired signal fighting in the latching stage may impose a trade-off between power consumption and latency [18]. In addition, the use of a weak shorting device to ensure static operation makes these flip-flops susceptible to the increased variability of the NTV region. Although this problem can be addressed by detecting the transitions of the precharged nodes [19], the overheads in power and latency are high.

In this paper, sense-amplifier-based flip-flops adopting a conditional bridging technique (CBFFs) are presented for improving speed and reducing power consumption. The proposed conditional bridging technique solves the shorting-device issues described above without power or speed overhead. Two versions of the flip-flop are proposed: a single-ended version (CBFF-S) for reducing power consumption and a differential version (CBFF-D) for lowering latency. In addition to low power and high speed, CBFFs are free of contention, which supports reliable operation in the NTV region. Section II presents state-of-the-art high-performance flip-flops and describes the causes of their degradation in power, speed, and reliability. Section III introduces the proposed flip-flops and explains how they overcome the issues of conventional flip-flops. Performance evaluation results are presented in Section IV, and Section V concludes the paper.

FIGURE 2. Pulsed latch-based FFs: (a) TGPL [13], (b) STPL [14], and (c) DCPL [15].

#### II. CONVENTIONAL HIGH-SPEED FLIP-FLOPS

### A. PULSED LATCH-BASED FLIP-FLOPS

The transmission-gate pulsed latch (TGPL) [13], illustrated in FIGURE 2(a), is composed of a single latching stage and an explicit pulse generator. By eliminating the master stage and delivering input *DB* directly to internal node *QN* while the brief pulse *PCK* is high, TGPL reduces the DQ latency. To prevent unreliable data capture caused by an overly narrow pulse at the worst PVT corners, TGPL may use many inverters (five to eleven in [14] and [15]) in the pulse generator, which increases power consumption and hold time. Moreover, simply widening the pulse may not be a solution in the NTV region because of the increased variability.

The pulse-width variability issue of TGPL can be resolved by the self-timed pulsed latch (STPL) [14], illustrated in FIGURE 2(b). Using a dynamic XOR circuit, STPL generates conditional pulses TRANSb and TRANS by detecting whether Q has transitioned and become equal to D, which lets QN finish capturing DB regardless of PVT conditions. Similar techniques have been used previously [20], [21]. Unlike previous designs, however, the repeated precharge and discharge operations of the dynamic XOR circuit and undesired glitches on TRANSb caused by the precharged DQEQb and the rising CK edge increase the switching power consumption, as mentioned in [14]. Moreover, a functional failure can occur in the NTV region because the dynamic XOR circuit is driven by the output of a single-ended latch, as explained in [15]. Specifically, after QN captures a high DB at the rising clock edge, it drives QI low through $I_4$ and discharges DQEQb by turning $M_3$ on. If DQEQb is discharged so fast that TRANSb becomes high before QI becomes fully low, $M_{16}$ may stay on, and QN can become unstable. This situation occurs more easily when the flip-flop operates in the NTV region, where circuit variability is high.

The differential contention-free pulsed latch (DCPL) [15], illustrated in FIGURE 2(c), can resolve the reliability issues of STPL mentioned above by ensuring that the falling transitions of QN and QI occur earlier than the rising transitions. However, as mentioned in [15], DCPL exhibits increased latency compared to STPL and larger power consumption compared to TGFF. The indirect pull-up of QN, in which QI turns $M_{10}$ on, may limit the pull-down speed of the output, requiring large transistors along the pull-down path of QI to compensate for the latency degradation. Moreover, as in STPL, the repeated toggling of DCK can increase the switching power consumption.

### B. SENSE-AMPLIFIER-BASED FLIP-FLOPS

The schematic diagrams of sense-amplifier-based flip-flops (SAFFs) are illustrated in FIGURE 3. In Nikolić's SAFF [17], shown in FIGURE 3(a), SB and RB are precharged high while *CK* is low, and the differential sense-amplifier (SA) stage quickly samples the input data at the rising edge of CK. The speed bottleneck of the output SR latch in the original SAFF [16] is overcome by adopting a symmetric latch. However, in every cycle, one of SB and RB and its inverted signal (S or R) make a transition regardless of whether the output changes, which causes large power consumption. Besides the large power consumption, the inverters in the latching stage may increase the overall latency, as mentioned in [18]. Moreover, the always-on shorting device ($M_4$) that connects nodes X and Y in the SA stage to maintain static operation must be weak to allow fast and reliable sampling of input data. Weakening $M_4$, either by increasing its transistor length or by connecting several minimum-sized transistors in series between X and Y, may result in large area and power overheads and/or latency degradation.

FIGURE 3. Sense-amplifier-based FFs: (a) Nikolić's SAFF [17], (b) Strollo's SAFF [18], and (c) SAFF-TCD [19].

Strollo's SAFF [18], shown in FIGURE 3(b), seeks to reduce power consumption and DQ latency by driving the symmetric latch directly with only *SB* and *RB*, removing their inverted signals. Despite the power reduction from this modification, signal fighting in the latching stage may increase latency. For example, assume that a high input arrives when Q is low. While QB is being pulled down through $M_{19}$-$M_{21}$ after the rising clock edge, $M_{16}$ stays on and fights against the pull-down of QB until SB is discharged to raise Q, which may lengthen the falling latency of the outputs. To reduce this contention, as mentioned in [18], $M_{10}$ and $M_{18}$ can be enlarged to pull Q and QB up faster, which may increase power consumption because of the larger load capacitance on SB and RB. Moreover, as in Nikolić's SAFF, the shorting device ($M_4$) still causes overheads in power, latency, and area.

The sense-amplifier-based flip-flop with transition completion detection (SAFF-TCD) [19], shown in FIGURE 3(c), overcomes the shorting-device problem by detecting the transitions of SB and RB. When CK=0, both SB and RB are precharged high, which keeps the transition completion signal TC low and $M_4$ off. Hence, SB and RB are evaluated independently of each other, and $M_4$ needs no special sizing. After SB or RB is pulled down, TC goes high to turn $M_4$ on for static operation. Although SAFF-TCD avoids the sizing issue of $M_4$, its power consumption and latency show limited improvements compared to Strollo's SAFF, as mentioned in [19]. That is, the increased capacitive loads at SB and RB for driving the NAND gate and the large load capacitance of TC (note that TC also drives the symmetric latch) increase power consumption. Note also that TC, as well as SB or RB, transitions every cycle. Moreover, because TC is used in the latching stage, the NAND gate delay from the pull-down of RB or SB to the pull-up of TC increases the falling latency of Q and QB.

#### III. PROPOSED SENSE-AMPLIFIER-BASED FLIP-FLOP

### A. CONDITIONAL BRIDGING

To resolve the shorting-device ($M_4$) issues of conventional SAFFs in a more power-efficient manner, a conditional bridging technique is proposed. The technique is based on the observation that, to completely eliminate the associated redundant transitions, the shorting device needs to be turned on only when D changes after being captured by Q. Otherwise, the device is better left off, which avoids the burden of using a weak device and prevents the internal node (X or Y) on the opposite branch from being redundantly discharged. The SA stage with a circuit supporting conditional bridging is shown in FIGURE 4, where the output (CBG) of the circuit drives the shorting device. By monitoring D, DB, SB, and RB, the proposed conditional bridging circuit turns $M_4$ on during CK=1 only when D changes and becomes different from Q. When CK is low, CBG is kept low regardless of D since SB and RB are precharged high, turning $M_{13}$, $M_{17}$, and one of $M_{12}$ and $M_{16}$ on. At the rising clock edge, SB or RB discharges according to the value of D. Assuming that SB is discharged, D = RB = 1 keeps *CBG* low through $M_{16}$ and $M_{17}$. If *D* then changes to low, *CBG* goes high through $M_{14}$ and $M_{15}$, turning $M_4$ on to provide a DC path to ground and ensuring static operation.

FIGURE 4. The SA stage with conditional bridging circuit.

FIGURE 5. Single-ended version of the proposed flip-flop.
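The CBG behavior described above can be summarized as a Boolean function of CK, D, SB, and RB. The sketch below is a behavioral abstraction inferred from the text, not a transistor-level description of FIGURE 4. It assumes, as in the example above, that SB discharges when D = 1 is sampled and RB discharges when D = 0 is sampled; the function name `cbg` is hypothetical.

```python
def cbg(ck: int, d: int, sb: int, rb: int) -> int:
    """Behavioral (not transistor-level) model of the conditional-bridging output CBG.

    Assumption: SB discharges when a 1 is sampled, RB when a 0 is sampled.
    """
    if not ck:
        # CK = 0: SB and RB are precharged high and CBG is held low.
        return 0
    captured_one = sb == 0   # SB discharged: a 1 was sampled
    captured_zero = rb == 0  # RB discharged: a 0 was sampled
    # Bridge (turn M4 on) only when D now differs from the sampled value.
    return int((captured_one and d == 0) or (captured_zero and d == 1))


# Reachable CK = 1 states: before evaluation (SB = RB = 1) or one node discharged.
for sb, rb in ((1, 1), (0, 1), (1, 0)):
    for d in (0, 1):
        print(f"SB={sb} RB={rb} D={d} -> CBG={cbg(1, d, sb, rb)}")
```

In this model CBG is high only in the two mismatch rows, which is the condition under which $M_4$ must provide the DC path for static operation.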

### B. STRUCTURE AND OPERATION

Adopting the conditional bridging technique described above, two versions of the conditional-bridging flip-flop (CBFF) are proposed. FIGURE 5 depicts the single-ended version (CBFF-S), composed of a sense-amplifier stage $(M_0 - M_9 \text{ and } I_0)$ with the conditional bridging circuit $(M_{11}-M_{16})$ and a single-ended latching stage $(M_{18}-M_{23}, I_1, \text{ and } I_2)$. The conditional bridging circuit is modified to reduce the total number of transistors in the flip-flop. Specifically, the sources of $M_{11}$ and $M_{15}$ are driven directly by D and DB, respectively, and $M_{17}$, driven by RB in FIGURE 4, is merged with $M_{21}$ in the latch in FIGURE 5. To optimize the latching stage for power consumption and device count, a glitch- and contention-free single-ended latch driven by the SA stage without inversion is used, as shown on the right of FIGURE 5. After the rising clock edge, QN is pulled up by $M_{18}$ and pulled down by $M_{19}$-$M_{21}$, both controlled only by RB. $M_{20}$, driven by D, is inserted to eliminate glitches on QN caused by the precharged high value of RB at the start of the clock-high period. SB drives the source of $M_{22}$ so that QN is pulled down without contention, and the source of $M_{23}$ is connected to node A so that QN is pulled up without contention. Note that the latching stage of CBFF-S in FIGURE 5 differs from traditional pulsed latches [13], [14], [15] because no pulsed operation is involved.

CBFF-S has advantages in power consumption, latency, and operational reliability. Because the conditional bridging circuit turns the shorting device ($M_4$) on only when necessary, redundant transitions on CBG are completely eliminated. Since CBG transitions only when D changes after Q has captured D during *CK*=1, the conditional bridging circuit yields a substantial power reduction, especially at low switching activities [22], [23]. The circuit also removes the need to weaken the shorting device, so the device can be minimum-sized, resulting in further power reduction. Another reason for the reduced power consumption of CBFF-S is that the opposite precharge node (X or Y) is discharged only when D changes, which is rare at typical low input switching activities; in conventional SAFFs, as mentioned earlier, these nodes are precharged and discharged every clock cycle. As for speed, the reduced parasitic capacitance of the minimum-sized shorting device allows timing-critical signals such as SB and RB to be pulled down faster, lowering the latency. Completely turning off the shorting device also prevents signal fighting between SB and RB during input sampling, further improving the speed. To minimize the clock-to-output (CQ) latency, the latching stage is driven directly by RB without signal inversion or contention, as mentioned earlier. Completely eliminating contention in the SA stage also provides a reliable pull-down of the precharged nodes in the low supply voltage region. Consisting of the conditional-bridging SA stage and a contention-free latching stage that lets the output capture input data reliably, CBFF-S can operate stably down to the NTV region, where variability is large.
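The activity argument can be illustrated with a cycle-level behavioral sketch that counts, for a given input stream, in how many cycles the SAFF-TCD transition-completion signal TC pulses versus in how many cycles CBG pulses. This is an idealized model under stated assumptions, not a circuit simulation: TC is taken as NAND(SB, RB) as described for SAFF-TCD, CBG is modeled as high during CK=1 only when the resolved value disagrees with D, SB discharges when a 1 is sampled, and the next data value is assumed to arrive during the clock-high phase. The pattern generator reproduces the "1111100000..." style of stimulus used in Section IV and assumes 1/α is an integer. The names `pattern` and `count_pulses` are hypothetical.

```python
from typing import List, Tuple


def pattern(alpha: float, n: int) -> List[int]:
    """Data stream with switching activity alpha, e.g. 0.2 -> 1111100000...

    Assumes 0 < alpha <= 1 and that 1/alpha is an integer.
    """
    run = round(1 / alpha)  # cycles per run of identical bits
    return [1 - (i // run) % 2 for i in range(n)]


def count_pulses(data: List[int]) -> Tuple[int, int]:
    """Count cycles in which TC (SAFF-TCD) and CBG (CBFF) go high."""
    tc_pulses = cbg_pulses = 0
    for d_now, d_next in zip(data, data[1:]):
        # Rising CK edge: the SA stage resolves d_now
        # (SB discharges for a 1, RB discharges for a 0).
        sb, rb = (0, 1) if d_now else (1, 0)
        # TC = NAND(SB, RB) rises after every evaluation.
        tc_pulses += int(not (sb and rb))
        # During CK = 1, D moves to d_next; CBG rises only on a mismatch.
        cbg_pulses += int((sb == 0 and d_next == 0) or (rb == 0 and d_next == 1))
        # Falling CK edge: SB = RB = 1 again, so TC and CBG both return low.
    return tc_pulses, cbg_pulses


for alpha in (0.1, 0.2, 1.0):
    print(alpha, count_pulses(pattern(alpha, 101)))
# 0.1 (100, 10)
# 0.2 (100, 20)
# 1.0 (100, 100)
```

Under these assumptions, TC pulses once per cycle regardless of the data, whereas CBG pulses in proportion to α, which is the mechanism the text credits for the power savings at low switching activity.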

The differential version (CBFF-D) of the proposed flip-flop is shown in FIGURE 6. Thanks to the symmetric differential structure, the conditional bridging circuit can save one more transistor ($M_{13}$ in FIGURE 5) by merging it with $M_{30}$ in FIGURE 6. In the latching stage, some transistors are added and the output inverter ($I_2$ in FIGURE 5) is removed so that differential outputs Q and QB are driven directly by SB and RB, respectively. To improve the pull-down speed of the outputs by preventing fighting against the pull-up keeper transistors $M_{22}$ and $M_{25}$, $M_{24}$, driven by CK, is inserted in series with them. Although a circuit structure similar to that of the single-ended version ($M_{22}$ driven by SB in FIGURE 5) is effective at the nominal supply voltage, our Monte-Carlo simulation results indicate that a reliability issue can occur at the worst corners, so a clocked transistor ($M_{24}$) is inserted instead. CBFF-D has almost all the features of its single-ended counterpart and shares its advantages of reduced power and improved speed. Although the overall power consumption of CBFF-D may be somewhat larger than that of CBFF-S due to the larger CK load capacitance for driving the differential latch, its power advantage over other differential flip-flops still holds at low switching activity, mainly owing to the conditional bridging operation. CBFF-D is expected to be faster than CBFF-S since the differential latch outputs, Q and QB, are driven directly and in parallel by the outputs of the SA stage.



FIGURE 6. Differential version of the proposed flip-flop.

In general, the setup and hold times can be obtained by sweeping the input arrival time and finding the time points at which Q fails to capture D [14]. The setup-hold window, which represents the minimum required input pulse width, can then be written as

$$T_{input\_width} = T_{setup} + T_{hold} \tag{1}$$

where $T_{setup}$, $T_{hold}$, and $T_{input\_width}$ are the setup time, the hold time, and the minimum input pulse width, respectively. These timing parameters of the proposed flip-flops are similar to those of the conventional SAFFs because they have similar SA stage structures for sampling the input data. The minimum DQ latency can be obtained by sweeping the input arrival time to find the minimum time difference between an input arrival and the corresponding output change, and can be written as

$$T_{DQ\_min} = \min\{T_{D-CK}(t_a) + T_{CQ}(t_a)\} \tag{2}$$

where $\min\{a\}$ returns the minimum of all possible values of *a*, and $T_{D-CK}(t_a)$ and $T_{CQ}(t_a)$ are, respectively, the time from a valid input change to the corresponding clock transition and the CQ latency at a given input arrival time $t_a$. From (2), the proposed flip-flops can be expected to achieve a lower minimum DQ latency because conditional bridging without contention or signal inversion reduces the CQ latency. The overall power consumption of a flip-flop results from charging and discharging the relevant node capacitances ($P_{CH}$), short-circuit current ($P_{SC}$), and device leakage ($P_{LK}$), so it can be written as

$$P_{all} = P_{CH} + P_{SC} + P_{LK} \tag{3}$$

Since $P_{CH}$ and $P_{SC}$ result from signal transitions, the proposed conditional bridging technique, which eliminates redundant transitions and reduces the parasitic capacitances at internal nodes, reduces the overall power consumption of the proposed flip-flops.
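Equations (1) and (2) can be illustrated with a small sweep-based sketch. This is an illustration under assumptions, not the authors' characterization flow: the sweep values below are hypothetical placeholder numbers in picoseconds, not results from this paper, and the helper names are made up for the example.

```python
from typing import Sequence, Tuple


def input_width(t_setup: float, t_hold: float) -> float:
    """Eq. (1): setup-hold window, i.e., the minimum input pulse width."""
    return t_setup + t_hold


def min_dq_latency(t_dck: Sequence[float], t_cq: Sequence[float]) -> Tuple[float, float]:
    """Eq. (2): minimum over the sweep of T_D-CK(t_a) + T_CQ(t_a).

    t_dck[i] is the D-to-CK time and t_cq[i] the CQ latency measured at the
    i-th input arrival time t_a. Returns (T_DQ_min, T_D-CK at the minimum).
    """
    if len(t_dck) != len(t_cq) or not t_dck:
        raise ValueError("need two non-empty sequences of equal length")
    dq = [a + b for a, b in zip(t_dck, t_cq)]
    best = min(range(len(dq)), key=lambda i: dq[i])
    return dq[best], t_dck[best]


# Hypothetical sweep (ps): the CQ latency grows as D arrives closer to the clock edge.
t_dck = [40.0, 30.0, 20.0, 10.0, 5.0, 2.0]
t_cq = [50.0, 50.0, 51.0, 55.0, 62.0, 75.0]
print(min_dq_latency(t_dck, t_cq))  # (65.0, 10.0)
print(input_width(10.0, 5.0))       # 15.0
```

The minimum of (2) lies where moving D any closer to the clock edge costs more CQ latency than it saves in $T_{D-CK}$, so a CQ latency that degrades less near the edge directly lowers $T_{DQ\_min}$.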

#### IV. PERFORMANCE EVALUATION

FIGURE 7. Flip-flop simulation environment.

To assess performance, the proposed and conventional flip-flops were designed in a 28-nm CMOS process. The threshold voltages of the p- and n-type devices are 0.26 V and 0.34 V, respectively. The transistor sizes in each flip-flop are optimized for power consumption, latency, and layout area [24]. The flip-flops were laid out, and their parasitic resistances and capacitances were extracted. Using the RC-extracted netlists, timing simulations are performed with Cadence Spectre. The simulation environment for performance evaluation is shown in FIGURE 7 [25]. To estimate timing parameters such as the setup/hold times and CQ/DQ latencies, each flip-flop under test receives data D and clock CK from independent drivers $I_0$ and $I_1$. D and CK are buffered by a pair of inverters from ideal signals $D_{IN}$ and $CK_{IN}$, respectively, so that they have realistic transitions shaped by the capacitive loads of the input and clock nodes. Each flip-flop drives an identical FO4 load to estimate speed performance. To estimate power components such as input, clock, and internal power consumption, the supply rails of the last input and clock inverter stages and of the flip-flop circuit itself are separated into VDD\_D, VDD\_CK, and VDD\_INT, respectively, as shown in FIGURE 7. The overall power consumption in (3) can then be measured by summing the power consumed in these supply rails; the switching power due to the FO4 load is not included. To obtain the setup time, D is set to transition to a value different from Q around the rising edge of CK. To estimate the hold time, D is set to change its value after Q captures the valid input. The transition timing point of D is then swept with a 0.1-ps interval to determine how much the CQ latency increases from its nominal value. To reflect the correlation between input arrival time and CQ latency, the setup and hold times are taken as the points at which the CQ latency increases by 10% from its nominal value [28], [29]. To estimate the CQ latency, the clock is triggered after all internal nodes have become stable following an input arrival, and the propagation delay from the rising clock edge to the output is measured.
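The 10%-degradation criterion above can be sketched as a post-processing step on the swept CQ latencies. This is a hypothetical helper, not the authors' script; it assumes the CQ latency is monotonic in the swept D transition time, marks a failed capture as an infinite CQ latency, and uses made-up sample values. For the setup time, the offset is how long before the rising CK edge D transitions; for the hold time, it is how long after the edge D changes.

```python
import math
from typing import Optional, Sequence


def timing_constraint(offsets: Sequence[float], cq: Sequence[float],
                      cq_nominal: float, degradation: float = 0.10) -> float:
    """Smallest swept offset whose CQ latency stays within (1 + degradation) x nominal.

    offsets: distance (ps) of the D transition from the rising CK edge,
             larger = safer (earlier D for setup, later D change for hold).
    cq:      CQ latency (ps) at each offset; math.inf marks a failed capture.
    """
    limit = (1.0 + degradation) * cq_nominal
    last_ok: Optional[float] = None
    # Walk from the safe side toward the clock edge and stop at the first
    # point whose CQ latency exceeds the limit.
    for off, t in sorted(zip(offsets, cq), reverse=True):
        if t > limit:
            break
        last_ok = off
    if last_ok is None:
        raise ValueError("no offset meets the CQ-degradation limit")
    return last_ok


# Hypothetical setup sweep (ps); a real sweep would use a 0.1-ps step.
offsets = [10.0, 8.0, 6.0, 4.0, 2.0, 0.0]
cq = [40.0, 40.5, 41.5, 43.9, 47.0, math.inf]
print(timing_constraint(offsets, cq, cq_nominal=40.0))  # 4.0
```

Stopping at the first violation, rather than taking the minimum over all passing points, keeps the result conservative if a noisy sweep happens to pass again closer to the edge.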
To measure the minimum DQ latency given in (2), the input data is prepared as in the setup-time test condition, and the lowest data-to-output latency is obtained by sweeping the input arrival time [26], [27]. The power components of each flip-flop are estimated by separately measuring the current drawn from supply rails VDD\_D, VDD\_CK, and VDD\_INT during data capture operations. For power comparison at various input switching activities, input data patterns with various switching activity values ($\alpha$) at a 1-GHz clock frequency are used. For example, data patterns "1111100000..." and "1010101010..." are used for switching activities of 0.2 and 1, respectively. To check the reliability of flip-flop operation under variations, Monte-Carlo simulations are executed for all possible input, clock, and output transition cases. If a flip-flop fails to sample and capture the desired value at least once in the 5000 iterations, the flip-flop is considered not to operate reliably.

FIGURE 8. Input switching activity versus power consumption for (a) single-ended and (b) differential flip-flops at TT corner, 1-V supply voltage, and 27°C temperature.
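The reliability criterion above, combined with the operability results reported in FIGURE 11, yields a minimum operating voltage with a few lines of post-processing. The sketch below is illustrative only: the helper name `min_operating_vdd` is hypothetical, each "P" stands for a supply voltage at which all 5000 Monte-Carlo iterations captured the correct value, and the strings transcribe a few columns of FIGURE 11 at its 0.05-V resolution.

```python
from typing import Dict, List, Optional, Sequence

# Supply voltages swept in FIGURE 11, from the nominal 1 V down to 0.25 V.
VDDS: List[float] = [round(1.0 - 0.05 * i, 2) for i in range(16)]

# Operability transcribed from FIGURE 11, one character per entry of VDDS.
# 'P' = all Monte-Carlo iterations passed, 'F' = at least one failure.
RESULTS: Dict[str, str] = {
    "TGFF": "P" * 14 + "F" * 2,
    "TGPL": "P" * 5 + "F" * 11,
    "STPL": "P" * 11 + "F" * 5,
    "CBFF-S": "P" * 15 + "F" * 1,
    "CBFF-D": "P" * 15 + "F" * 1,
}


def min_operating_vdd(vdds: Sequence[float], flags: str) -> float:
    """Lowest VDD reached from the nominal supply before the first FAIL."""
    lowest: Optional[float] = None
    for vdd, flag in zip(vdds, flags):
        if flag != "P":
            break
        lowest = vdd
    if lowest is None:
        raise ValueError("fails already at the nominal supply")
    return lowest


for name, flags in RESULTS.items():
    print(name, min_operating_vdd(VDDS, flags))
```

Scanning downward from the nominal supply and stopping at the first failure treats the operable range as contiguous, which matches how the PASS/FAIL columns in FIGURE 11 are arranged.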

The power consumption of the flip-flops at a typical process corner with a 1-V supply voltage at room temperature is compared in FIGURE 8. Input switching activity versus power consumption for flip-flops with single-ended and differential outputs is plotted in FIGURE 8(a) and (b), respectively. As expected, the overall power consumption of each flip-flop increases with switching activity. Among the single-ended flip-flops (FIGURE 8(a)), TGPL has the largest power consumption at all switching activity values because it uses multiple inverters to generate a narrow local pulse. STPL and DCPL substantially reduce power consumption compared to TGPL by adopting a dynamic XOR gate for pulsed operation, but their power consumption is still slightly larger than that of TGFF. By replacing the power-consuming pulse generation circuits of TGPL, STPL, and DCPL with the conditionally bridged SA stage, CBFF-S consumes at least 18.7% less power than conventional single-ended flip-flops over the entire switching activity range. Among the differential flip-flops (FIGURE 8(b)), Nikolić's SAFF and SAFF-TCD consume large power due to the inverters between stages and the NAND gate driving the shorting device and latch, respectively. In Strollo's and Nikolić's SAFFs, the non-negligible parasitic capacitance of the long, always-on shorting devices increases power consumption. Meanwhile, CBFF-D consumes less power than the conventional differential flip-flops at lower switching activities because the conditional bridging operation eliminates redundant transitions. In addition, it consumes less power than Nikolić's SAFF [17] and SAFF-TCD [19] because it has neither the inter-stage inverters of the former nor the NAND gate of the latter. These features allow CBFF-D to consume up to 33.8% less power at 0.1 switching activity. Note that the power consumption of CBFF-D is slightly larger than that of its single-ended counterpart because of the increased clock load. In return, it provides a lower latency, as shown below.

FIGURE 9. Timing performance of flip-flops: (a) CQ latency, (b) minimum DQ latency, and (c) setup and hold times at TT corner, 1-V supply voltage, and 27°C temperature.

FIGURE 10. Input switching activity versus power-delay product (PDP) of flip-flops at TT corner, 1-V supply voltage, and 27°C temperature.

| VDD (V) | TGFF | TGPL | STPL | DCPL | CBFF-S | Nikolić's<br>SAFF | Strollo's<br>SAFF | SAFF-TCD | CBFF-D |
|--------|------|------|------|------|---------|-------------------|-------------------|--------------|--------|
| 1      | PASS | PASS | PASS | PASS | PASS    | PASS              | PASS              | PASS         | PASS   |
| 0.95   | PASS | PASS | PASS | PASS | PASS    | PASS              | PASS              | PASS         | PASS   |
| 0.9    | PASS | PASS | PASS | PASS | PASS    | PASS              | PASS              | PASS         | PASS   |
| 0.85   | PASS | PASS | PASS | PASS | PASS    | PASS              | PASS              | PASS         | PASS   |
| 0.8    | PASS | PASS | PASS | PASS | PASS    | PASS              | PASS              | PASS         | PASS   |
| 0.75   | PASS | FAIL | PASS | PASS | PASS    | FAIL              | FAIL              | PASS         | PASS   |
| 0.7    | PASS | FAIL | PASS | PASS | PASS    | FAIL              | FAIL              | PASS         | PASS   |
| 0.65   | PASS | FAIL | PASS | PASS | PASS    | FAIL              | FAIL              | PASS         | PASS   |
| 0.6    | PASS | FAIL | PASS | PASS | PASS    | FAIL              | FAIL              | PASS         | PASS   |
| 0.55   | PASS | FAIL | PASS | PASS | PASS    | FAIL              | FAIL              | PASS         | PASS   |
| 0.5    | PASS | FAIL | PASS | PASS | PASS    | FAIL              | FAIL              | PASS         | PASS   |
| 0.45   | PASS | FAIL | FAIL | PASS | PASS    | FAIL              | FAIL              | PASS         | PASS   |
| 0.4    | PASS | FAIL | FAIL | PASS | PASS    | FAIL              | FAIL              | PASS         | PASS   |
| 0.35   | PASS | FAIL | FAIL | PASS | PASS    | FAIL              | FAIL              | PASS         | PASS   |
| 0.3    | FAIL | FAIL | FAIL | FAIL | PASS    | FAIL              | FAIL              | PASS         | PASS   |
| 0.25   | FAIL | FAIL | FAIL | FAIL | FAIL    | FAIL              | FAIL              | FAIL         | FAIL   |

FIGURE 11. Operability of flip-flops at scaled supply voltages obtained by 5000 Monte-Carlo simulations.

The CQ latency, minimum DQ latency, and setup/hold times of the flip-flops are compared in FIGURE 9(a), (b), and (c), respectively. As shown in FIGURE 9(a), CBFF-S and CBFF-D have the smallest CQ latencies among the single-ended and differential flip-flops, respectively, owing to the fast pull-down in the SA stage enabled by conditional bridging, the absence of inverters between stages, and the contention-free operation of the latching stage, as explained earlier. As illustrated in FIGURE 9(b), reflecting the relationship between input arrival time and CQ latency in (2), the minimum DQ latency of CBFF-S is lower than those of the conventional single-ended flip-flops except TGPL. TGPL has the lowest minimum DQ latency because it has a negative setup time at the expense of hold time and power consumption; recall, however, that TGPL consumes the largest power among all flip-flops. Among the differential flip-flops, Nikolić's and Strollo's SAFFs have longer latencies due to inverter delay and signal contention, respectively. Moreover, the relatively large parasitic capacitance of their always-on shorting devices prevents fast evaluation in the SA stage. Although SAFF-TCD cuts off the shorting device using a NAND gate during input sampling, the gate lies in the timing-critical path, increasing the overall latency. The fast pull-down in the SA stage and the contention-free operation of the latching stage allow CBFF-D to achieve 13.5%, 18.1%, and 24.1% minimum DQ latency reductions from

|                                                   | TGPL<br>[13] | STPL<br>[14] | DCPL<br>[15] | CBFF-S<br>(prop.)     | Nikolić's<br>[17] | Strollo's<br>[18] | SAFF-<br>TCD<br>[19] | CBFF-D<br>(prop.) | pCNTFF<br>[30] | TNVFF<br>[31]     |
|---------------------------------------------------|--------------|--------------|--------------|-----------------------|-------------------|-------------------|----------------------|-------------------|----------------|-------------------|
| Technology                                        | 28-nm CMOS   |              |              |                       |                   |                   |                      | CNT               |                |                   |
| Туре                                              | Pulse-based  |              |              | Sense-amplifier-based |                   |                   |                      |                   | Pulsed         | Ternary           |
| Output type                                       | Single-ended |              |              | Differential          |                   |                   |                      |                   | Single-ended   |                   |
| # of transistors                                  | 32           | 27           | 26           | 27                    | 29                | 24                | 26                   | 29                | 21             | 16                |
| Supply voltage                                    | 1            | 1            | 1            | 1                     | 1                 | 1                 | 1                    | 1                 | 1              | 0.6               |
| Minimum VDD                                       | 0.8          | 0.5          | 0.34         | 0.3                   | 0.8               | 0.8               | 0.3                  | 0.3               | -              | -                 |
| Setup time <sup>1)</sup> (ps)                     | -21.32       | -8.17        | 1.30         | 6.13                  | -0.87             | 0.46              | 1.20                 | 6.37              | -80            | -                 |
| Hold time <sup>1)</sup> (ps)                      | 72.5         | 75.05        | 71.75        | 35.03                 | 48.06             | 47.43             | 41.31                | 39.83             | 118.2          | -                 |
| Setup-hold window <sup>1)</sup> (ps)              | 51.18        | 66.88        | 73.06        | 41.16                 | 47.19             | 47.89             | 42.51                | 46.20             | 38.20          | -                 |
| CQ latency <sup>1)</sup> (ps)                     | 75.55        | 79.16        | 75.91        | 60.71                 | 69.64             | 76.82             | 79.93                | 52.36             | -              | 39                |
| Min. DQ latency <sup>1)</sup> (ps)                | 61.28        | 74.63        | 84.74        | 71.99                 | 75.70             | 79.93             | 86.31                | 65.50             | 85.17          | -                 |
| Power consumption (µW)<br>@0.1 switching activity | 7.865        | 4.831        | 4.746        | 3.439                 | 5.199             | 4.292             | 5.717                | 3.781             | 0.267          |                   |
| Power consumption (µW)<br>@0.5 switching activity | 8.493        | 6.489        | 6.214        | 4.873                 | 6.131             | 5.973             | 6.614                | 5.606             | 0.297          | 3.82)             |
| Power consumption (µW)<br>@1.0 switching activity | 9.251        | 8.279        | 8.098        | 6.552                 | 7.247             | 7.904             | 7.673                | 7.812             | 0.352          |                   |
| PDP (µW×ps)<br>@0.1 switching activity            | 481.96       | 360.53       | 402.18       | 247.64                | 393.56            | 343.06            | 493.43               | 247.66            | 22.070         |                   |
| PDP (µW×ps)<br>@0.5 switching activity            | 520.45       | 484.27       | 526.57       | 350.81                | 464.08            | 477.38            | 570.85               | 367.19            | 25.029         | 148 <sup>2)</sup> |
| PDP (µW×ps)<br>@1.0 switching activity            | 566.84       | 617.86       | 686.22       | 471.68                | 548.60            | 631.77            | 662.26               | 511.69            | 29.098         |                   |
| Layout area (µm <sup>2</sup> )                    | 14.19        | 12.46        | 12.17        | 12.17                 | 12.95             | 11.59             | 11.88                | 12.46             | -              | -                 |

#### TABLE 1. Performance comparison of FLIP-FLOPS.

1) Worst between rising and falling inputs, 2) Switching activity is not available

Nikolić's SAFF, Strollo's SAFF, and SAFF-TCD, respectively. The worst-case setup and hold times of each flip-flop are compared in FIGURE 9(c). For the single-ended flipflops, TGPL, STPL, and DCPL show negative or near-zero setup time and large hold time compared to TGFF because of a pulsed operation with no master stage. Thanks to the SA stage fast sampling the input at the rising clock edge, both CBFFs exhibit a very small positive setup time. Moreover, unlike pulsed flip-flops that require the input to remain unchanged until the end of the pulse, resulting in a large hold time, CBFFs have shorter hold times by adopting the SA stage, where the input data can change right after input sampling. By using an identical SA stage for sampling the input, all differential flip-flops have similar setup and hold time values, resulting in a similar setup-hold window given in (1).
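As a quick cross-check, the setup-hold window can be recomputed from the Table 1 timing entries. The sketch below assumes that (1) defines the window as the sum of the worst-case setup and hold times, which is consistent with the tabulated values; the dictionary labels and the helper `setup_hold_window` are illustrative, not taken from the paper.

```python
# Worst-case (setup, hold) times in ps, copied from Table 1.
TIMING_PS = {
    "TGPL": (-21.32, 72.5),
    "STPL": (-8.17, 75.05),
    "DCPL": (1.30, 71.75),
    "CBFF-S": (6.13, 35.03),
    "Nikolic": (-0.87, 48.06),
    "Strollo": (0.46, 47.43),
    "SAFF-TCD": (1.20, 41.31),
    "CBFF-D": (6.37, 39.83),
}


def setup_hold_window(setup_ps: float, hold_ps: float) -> float:
    """Width of the interval around the clock edge in which D must be stable.

    Assumed to be setup + hold; a negative setup time (data may arrive after
    the clock edge) shrinks the window.
    """
    return setup_ps + hold_ps


windows = {name: round(setup_hold_window(*t), 2) for name, t in TIMING_PS.items()}
for name, w in sorted(windows.items(), key=lambda kv: kv[1]):
    print(f"{name:>8}: {w:6.2f} ps")
```

With the rounded Table 1 inputs, every window matches the tabulated value except DCPL, which evaluates to 73.05 ps rather than 73.06 ps, presumably because the underlying setup and hold times were rounded before tabulation.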

The power-delay product (PDP) values, computed from the power and latency data in FIGURE 8 and FIGURE 9(b) for various input switching activities, are compared in FIGURE 10. CBFF-S and CBFF-D achieve the lowest PDP in the single-ended (hollow) and differential (filled) categories, respectively, with improvements of at least 31.3% and 27.8% at  $\alpha = 0.1$ . At the maximum input switching activity ( $\alpha = 1.0$ ), the improvements are still 16.7% and 6.7%, respectively.
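The "at least" improvements above are measured against the lowest-PDP conventional design in each category. The sketch below recomputes them from the Table 1 PDP entries; the dictionary labels and the helper `min_improvement_pct` are illustrative names, not from the paper.

```python
# PDP values (uW x ps) copied from Table 1 for the two activities quoted in the text.
PDP_TABLE = {
    0.1: {"TGPL": 481.96, "STPL": 360.53, "DCPL": 402.18, "CBFF-S": 247.64,
          "Nikolic": 393.56, "Strollo": 343.06, "SAFF-TCD": 493.43,
          "CBFF-D": 247.66},
    1.0: {"TGPL": 566.84, "STPL": 617.86, "DCPL": 686.22, "CBFF-S": 471.68,
          "Nikolic": 548.60, "Strollo": 631.77, "SAFF-TCD": 662.26,
          "CBFF-D": 511.69},
}

# Each proposed design paired with the conventional designs in its category.
CATEGORIES = {
    "CBFF-S": ("TGPL", "STPL", "DCPL"),            # single-ended
    "CBFF-D": ("Nikolic", "Strollo", "SAFF-TCD"),  # differential
}


def min_improvement_pct(pdp: dict, proposed: str, rivals: tuple) -> float:
    """Percentage PDP reduction versus the best (lowest-PDP) rival."""
    best_rival = min(pdp[name] for name in rivals)
    return 100.0 * (1.0 - pdp[proposed] / best_rival)


improvements = {
    (alpha, proposed): min_improvement_pct(PDP_TABLE[alpha], proposed, rivals)
    for alpha in PDP_TABLE
    for proposed, rivals in CATEGORIES.items()
}
for (alpha, proposed), pct in improvements.items():
    print(f"alpha={alpha}: {proposed} PDP is at least {pct:.2f}% lower")
```

This reproduces 31.3% and 27.8% at α = 0.1 and 6.7% for CBFF-D at α = 1.0. For CBFF-S at α = 1.0, the rounded table entries give about 16.79%, which the text reports as 16.7%, presumably by truncation.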

To check the operability of the flip-flops at scaled supply voltages, 5000 Monte Carlo simulations considering process, voltage, and temperature (PVT) variations were performed; the results are presented in FIGURE 11. Like TGFF, DCPL, and SAFF-TCD, CBFF-S and CBFF-D operate reliably at supply voltages scaled down to the NTV region. Because TGPL suffers from pulse-width variation and Nikolić's and Strollo's SAFFs suffer from the shorting-device sizing issue, their operating voltage is limited to 0.8 V. Although STPL resolves the pulse-width variation issue, its dynamic XOR circuit, which receives the latch output, was found to malfunction at supply voltages below 0.5 V. DCPL solves this issue and functions at supply voltages as low as 0.35 V. Note that CBFF-S, CBFF-D, and SAFF-TCD, which address the shorting-device sizing issue, can function at 0.3 V.
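The minimum operating voltage in FIGURE 11 can be read off a PASS/FAIL sweep by scanning down from the nominal supply until the first failure. The sketch below assumes that a voltage counts as PASS only if every Monte Carlo sample functions (the pass criterion is not stated here) and uses an assumed 0.05-V grid; the sweeps are hypothetical and only encode the lowest passing voltages quoted in the text.

```python
from typing import Dict, List, Optional


def passes(sample_ok: List[bool]) -> bool:
    """Assumed PASS criterion: every Monte Carlo sample at this VDD functions."""
    return all(sample_ok)


def min_operating_vdd(results: Dict[float, bool]) -> Optional[float]:
    """Scan down from the highest VDD and return the last voltage before the first FAIL."""
    vmin = None
    for vdd in sorted(results, reverse=True):
        if not results[vdd]:
            break  # the operable range ends here; isolated passes below are ignored
        vmin = vdd
    return vmin


# Assumed 0.05-V sweep grid from 1.0 V down to 0.25 V.
GRID = [round(1.0 - 0.05 * i, 2) for i in range(16)]


def sweep_from_limit(limit_v: float) -> Dict[float, bool]:
    """Hypothetical PASS/FAIL sweep that passes at and above limit_v."""
    return {v: v >= limit_v - 1e-9 for v in GRID}


# Lowest passing voltages as quoted in the text (the sweeps are illustrative).
SWEEPS = {
    "TGPL": sweep_from_limit(0.8),
    "STPL": sweep_from_limit(0.5),
    "DCPL": sweep_from_limit(0.35),
    "CBFF-S": sweep_from_limit(0.3),
}
for name, res in SWEEPS.items():
    print(f"{name:>6}: minimum VDD = {min_operating_vdd(res):.2f} V")

# A single failing sample out of 5000 turns the 0.30-V point into a FAIL.
samples = {0.40: [True] * 5000, 0.35: [True] * 5000, 0.30: [True] * 4999 + [False]}
demo = {v: passes(s) for v, s in samples.items()}
print(f"demo: minimum VDD = {min_operating_vdd(demo):.2f} V")  # 0.35 V
```

Stopping at the first failure, rather than taking the lowest passing point anywhere in the sweep, avoids reporting an operable voltage below a failing one.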

Table 1 summarizes the overall performance of the flip-flops. Note that TGPL, STPL, Nikolić's SAFF, and Strollo's SAFF cannot operate in the NTV region. Compared with DCPL and SAFF-TCD, which can function in the NTV region, the proposed CBFFs have substantially lower power consumption and minimum DQ latency. For example, at 0.1 switching activity, CBFF-S consumes 27.5% less power than DCPL with a 15.0% lower minimum DQ latency, and CBFF-D consumes 33.9% less power than SAFF-TCD with a 24.1% lower minimum DQ latency. Owing to their lower DQ latency and smaller power consumption, the CBFFs have the lowest PDP values in their respective categories. The layout areas of the proposed CBFFs are comparable to or slightly larger than those of conventional flip-flops. Flip-flops can also be designed in emerging technologies for further performance enhancement; examples are the pulse-triggered CNTFET flip-flop (pCNTFF) [30] and the ternary nonvolatile flip-flop (TNVFF) [31] (right two columns in Table 1), which outperform the CMOS flip-flops. However, mass production of highly integrated digital circuits in such technologies is still difficult and costly because their fabrication technology is immature. Since the CMOS process provides a cost-effective implementation of large-scale synchronous systems, in which flip-flops play a major role, performance innovation in CMOS flip-flops remains important and attractive.

**FIGURE 12.** Minimum DQ latency versus power consumption of single-ended (hollow) and differential (filled) flip-flops with 0.1 input switching activity at the TT corner, 1-V supply voltage, and 27°C temperature.

FIGURE 12 shows the minimum DQ latency versus power consumption of the flip-flops at 0.1 switching activity. Each curve in the figure connects points with identical PDP. Because of their high speed, all conventional single-ended and differential flip-flops are located to the left of TGFF. However, their large power consumption places them above TGFF, limiting their PDP improvement. In contrast, CBFF-S achieves a noticeable power reduction compared to TGFF, mainly owing to the conditional bridging technique. This power reduction places CBFF-S far below TGFF, yielding a significant PDP improvement. Similarly, CBFF-D stands out among the differential flip-flops owing to its reductions in both power and latency, which place it at the lowest PDP point.
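The iso-PDP curves in FIGURE 12 follow from PDP = P × t: for a fixed level K, the curve is the hyperbola P = K / t. The sketch below uses the Table 1 entries at 0.1 switching activity (TGFF is omitted because its values are not tabulated here) to find the lowest-PDP design in each category and to show how far each competitor lies above that design's curve. The labels and helper names are illustrative.

```python
# (minimum DQ latency in ps, power in uW) at 0.1 switching activity, from Table 1.
POINTS = {
    "TGPL": (61.28, 7.865), "STPL": (74.63, 4.831), "DCPL": (84.74, 4.746),
    "CBFF-S": (71.99, 3.439),
    "Nikolic": (75.70, 5.199), "Strollo": (79.93, 4.292),
    "SAFF-TCD": (86.31, 5.717), "CBFF-D": (65.50, 3.781),
}
CATEGORIES = {
    "single-ended": ("TGPL", "STPL", "DCPL", "CBFF-S"),
    "differential": ("Nikolic", "Strollo", "SAFF-TCD", "CBFF-D"),
}


def pdp(latency_ps: float, power_uw: float) -> float:
    """Power-delay product in uW x ps."""
    return latency_ps * power_uw


def iso_pdp_power(level: float, latency_ps: float) -> float:
    """Power on the iso-PDP curve P = level / t at latency t."""
    return level / latency_ps


best_by_category = {}
for category, names in CATEGORIES.items():
    best = min(names, key=lambda n: pdp(*POINTS[n]))
    level = pdp(*POINTS[best])
    best_by_category[category] = best
    print(f"{category}: lowest PDP = {best} ({level:.1f} uW x ps)")
    for name in names:
        t, p = POINTS[name]
        on_curve = iso_pdp_power(level, t)
        # A ratio above 1 means the point lies above the best design's curve.
        print(f"  {name:>8}: {p:.3f} uW vs {on_curve:.3f} uW on curve "
              f"(ratio {p / on_curve:.2f})")
```

The ratio printed for each design equals its PDP divided by the best PDP in the category, which is why a point above the lowest curve always has a larger PDP regardless of its latency.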

# **V. CONCLUSION**

This paper presents low-power, high-performance, and reliable sense-amplifier-based flip-flops. The proposed conditional bridging technique activates the shorting device only when necessary, guaranteeing static operation with no redundant transitions. The shorting device can therefore be minimum-sized, reducing the effective parasitic capacitance along the timing-critical signal paths. Power consumption and latency are further reduced by driving the latching stage directly, without glitches or contention. To optimize power consumption and area, the single-ended version of the proposed flip-flop adopts a modified latching stage. To optimize speed and support differential operation, a differential version with a differential latching stage is also presented. In addition to the improvements in power and latency, the proposed flip-flops operate reliably down to the NTV region. Performance evaluation in a 28-nm CMOS process indicates that the proposed flip-flops are good candidates for low-power high-speed digital applications.

#### REFERENCES

- [1] F. S. Ayatollahi, M. B. Ghaznavi-Ghoushchi, N. Mohammadzadeh, and S. F. Ghamkhari, "AMPS: An automated mesochronous pipeline scheduler and design space explorer for high performance digital circuits," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 69, no. 4, pp. 1681–1692, Apr. 2022, doi: 10.1109/TCSI.2021.3138139.
- [2] Y. D. Kim, W. Jeong, L. Jung, D. Shin, J. G. Song, J. Song, H. Kwon, J. Lee, J. Jung, M. Kang, J. Jeong, Y. Kwon, and N. H. Seong, "A 7 nm high-performance and energy-efficient mobile application processor with tri-cluster CPUs and a sparsity-aware NPU," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2020, pp. 48–50, doi: 10.1109/ISSCC19947.2020.9062907.
- [3] J. P. Cerqueira, T. J. Repetti, Y. Pu, S. Priyadarshi, M. A. Kim, and M. Seok, "Catena: A near-threshold, sub-0.4-mW, 16-core programmable spatial array accelerator for the ultralow-power mobile and embedded Internet of Things," *IEEE J. Solid-State Circuits*, vol. 55, no. 8, pp. 2270–2284, Aug. 2020, doi: 10.1109/JSSC.2020.2978137.
- [4] S. Jain et al., "A 280 mV-to-1.2 V wide-operating-range IA-32 processor in 32 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2012, pp. 66–68, doi: 10.1109/ISSCC.2012.6176932.
- [5] V. De, S. Vangal, and R. Krishnamurthy, "Near threshold voltage (NTV) computing: Computing in the dark silicon era," *IEEE Des. Test*, vol. 34, no. 2, pp. 24–30, Apr. 2017, doi: 10.1109/MDAT.2016.2573593.
- [6] C.-R. Huang and L.-Y. Chiou, "An energy-efficient conditional biasing write assist with built-in time-based write-margin-tracking for low-voltage SRAM," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 29, no. 8, pp. 1586–1590, Aug. 2021, doi: 10.1109/TVLSI.2021.3084041.
- [7] Y.-W. Kim, J.-S. Kim, J.-W. Kim, and B.-S. Kong, "CMOS differential logic family with conditional operation for low-power application," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 55, no. 5, pp. 437–441, May 2008, doi: 10.1109/TCSII.2007.914414.
- [8] C. Giacomotto, N. Nedovic, and V. G. Oklobdzija, "The effect of the system specification on the optimal selection of clocked storage elements," *IEEE J. Solid-State Circuits*, vol. 42, no. 6, pp. 1392–1404, Jun. 2007, doi: 10.1109/JSSC.2007.896516.
- [9] J. L. Shin, R. Golla, H. Li, S. Dash, Y. Choi, A. Smith, H. Sathianathan, M. Joshi, H. Park, M. Elgebaly, S. Turullols, S. Kim, R. Masleid, G. K. Konstadinidis, M. J. Doherty, G. Grohoski, and C. McAllister, "The next generation 64b SPARC core in a T4 SoC processor," *IEEE J. Solid-State Circuits*, vol. 48, no. 1, pp. 82–90, Jan. 2013, doi: 10.1109/JSSC.2012.2223036.
- [10] H. McIntyre, S. Arekapudi, E. Busta, T. Fischer, M. Golden, A. Horiuchi, T. Meneghini, S. Naffziger, and J. Vinh, "Design of the two-core x86-64 AMD 'Bulldozer' module in 32 nm SOI CMOS," *IEEE J. Solid-State Circuits*, vol. 47, no. 1, pp. 164–176, Jan. 2012, doi: 10.1109/JSSC.2011.2167823.
- [11] D. Pan, C. Ma, L. Cheng, and H. Min, "A highly efficient conditional feedthrough pulsed flip-flop for high-speed applications," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 28, no. 1, pp. 243–251, Jan. 2020, doi: 10.1109/TVLSI.2019.2934899.

- [12] J. M. Rabaey, A. Chandrakasan, and B. Nikolić, *Digital Integrated Circuits: A Design Perspective*. Upper Saddle River, NJ, USA: Prentice-Hall, 2002.
- [13] S. D. Naffziger, G. Colon-Bonet, T. Fischer, R. Riedlinger, T. J. Sullivan, and T. Grutkowski, "The implementation of the Itanium 2 microprocessor," *IEEE J. Solid-State Circuits*, vol. 37, no. 11, pp. 1448–1460, Nov. 2002, doi: 10.1109/JSSC.2002.803943.
- [14] H. Jeong, J. Park, S. C. Song, and S.-O. Jung, "Self-timed pulsed latch for low-voltage operation with reduced hold time," *IEEE J. Solid-State Circuits*, vol. 54, no. 8, pp. 2304–2315, Aug. 2019, doi: 10.1109/JSSC.2019.2907774.
- [15] G. Shin, M. Jeong, D. Seo, S. Han, and Y. Lee, "A variation-tolerant differential contention-free pulsed latch with wide voltage scalability," in *Proc. IEEE Asian Solid-State Circuits Conf. (A-SSCC)*, Taipei, Taiwan, Nov. 2022, pp. 1–3, doi: 10.1109/A-SSCC56115.2022.9980703.
- [16] J. Montanaro, R. T. Witek, K. Anne, A. J. Black, E. M. Cooper, D. W. Dobberpuhl, P. M. Donahue, J. Eno, W. Hoeppner, D. Kruckemyer, T. H. Lee, P. C. M. Lin, L. Madden, D. Murray, M. H. Pearce, S. Santhanam, K. J. Snyder, R. Stephany, and S. C. Thierauf, "A 160-MHz, 32-b, 0.5-W CMOS RISC microprocessor," *IEEE J. Solid-State Circuits*, vol. 31, no. 11, pp. 1703–1714, Nov. 1996, doi: 10.1109/JSSC.1996.542315.
- [17] B. Nikolic, V. G. Oklobdzija, V. Stojanovic, W. Jia, J. K.-S. Chiu, and M. M.-T. Leung, "Improved sense-amplifier-based flip-flop: Design and measurements," *IEEE J. Solid-State Circuits*, vol. 35, no. 6, pp. 876–884, Jun. 2000, doi: 10.1109/4.845191.
- [18] A. G. M. Strollo, D. De Caro, E. Napoli, and N. Petra, "A novel high-speed sense-amplifier-based flip-flop," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 13, no. 11, pp. 1266–1274, Nov. 2005, doi: 10.1109/TVLSI.2005.859586.
- [19] H. Jeong, T. W. Oh, S. C. Song, and S.-O. Jung, "Sense-amplifier-based flip-flop with transition completion detection for low-voltage operation," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 26, no. 4, pp. 609–620, Apr. 2018, doi: 10.1109/TVLSI.2017.2777788.
- [20] F. Klass, "Semi-dynamic and dynamic flip-flops with embedded logic," in *Symp. VLSI Circuits Dig. Tech. Papers*, Honolulu, HI, USA, 1998, pp. 108–109, doi: 10.1109/VLSIC.1998.688018.
- [21] S.-D. Shin, H. Choi, and B.-S. Kong, "Variable sampling window flip-flop for low-power application," in *Proc. Int. Symp. Circuits Syst. (ISCAS)*, Bangkok, Thailand, 2003, p. 5, doi: 10.1109/ISCAS.2003.1206247.
- [22] C. K. Teh, T. Fujita, H. Hara, and M. Hamada, "A 77% energy-saving 22-transistor single-phase-clocking D-flip-flop with adaptive-coupling configuration in 40nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2011, pp. 338–340, doi: 10.1109/ISSCC.2011.5746344.
- [23] G. Shin, E. Lee, J. Lee, Y. Lee, and Y. Lee, "An ultra-low-power fully static contention-free flip-flop with complete redundant clock transition and transistor elimination," *IEEE J. Solid-State Circuits*, vol. 56, no. 10, pp. 3039–3048, Oct. 2021, doi: 10.1109/JSSC.2021.3077074.
- [24] M. Alioto, E. Consoli, and G. Palumbo, "Analysis and comparison in the energy-delay-area domain of nanometer CMOS flip-flops: Part I—Methodology and design strategies," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 19, no. 5, pp. 725–736, May 2011, doi: 10.1109/TVLSI.2010.2041376.
- [25] Y. Lee, G. Shin, and Y. Lee, "A fully static true-single-phase-clocked dual-edge-triggered flip-flop for near-threshold voltage operation in IoT applications," *IEEE Access*, vol. 8, pp. 40232–40245, 2020, doi: 10.1109/ACCESS.2020.2976773.
- [26] V. Stojanovic and V. G. Oklobdzija, "Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems," *IEEE J. Solid-State Circuits*, vol. 34, no. 4, pp. 536–548, Apr. 1999, doi: 10.1109/4.753687.
- [27] P. Zhao, T. K. Darwish, and M. A. Bayoumi, "High-performance and low-power conditional discharge flip-flop," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 12, no. 5, pp. 477–484, May 2004, doi: 10.1109/TVLSI.2004.826192.

- [28] T. Okumura and M. Hashimoto, "Setup time, hold time and clock-to-Q delay computation under dynamic supply noise," in *Proc. IEEE Custom Integr. Circuits Conf.*, San Jose, CA, USA, Sep. 2010, pp. 1–4, doi: 10.1109/CICC.2010.5617426.
- [29] D. Markovic, B. Nikolic, and R. W. Brodersen, "Analysis and design of low-energy flip-flops," in *Proc. Int. Symp. Low Power Electron. Design (ISLPED)*, Huntington Beach, CA, USA, 2001, pp. 52–55, doi: 10.1109/LPE.2001.945371.
- [30] A. Karimi, A. Rezai, and M. M. Hajhashemkhani, "Ultra-low power pulse-triggered CNTFET-based flip-flop," *IEEE Trans. Nanotechnol.*, vol. 18, pp. 756–761, 2019, doi: 10.1109/TNANO.2019.2929233.
- [31] M. H. Moaiyeri, M. K. Q. Jooq, A. Al-Shidaifat, and H. Song, "Breaking the limits in ternary logic: An ultra-efficient auto-backup/restore nonvolatile ternary flip-flop using negative capacitance CNTFET technology," *IEEE Access*, vol. 9, pp. 132641–132651, 2021, doi: 10.1109/ACCESS.2021.3114408.



**BOMIN JOO** received the B.S. degree in electronics and electrical engineering from Sungkyunkwan University, Suwon, South Korea, in 2015, where he is currently pursuing the Ph.D. degree with the Department of Electrical and Computer Engineering.

His current research interests include the design of analog integrated circuits, clocked storage circuits, and hardware-friendly neural networks.



**BAI-SUN KONG** (Member, IEEE) received the Ph.D. degree in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejon, South Korea, in 1996.

In 1996, he joined LG Semicon Company (currently SK Hynix Corporation), Seoul, South Korea, as a Senior Design Engineer, where he worked on the design of high-density and high-bandwidth DRAMs. In 2000, he joined Korea Aerospace University, Goyang, South Korea, as a Faculty Member and an Assistant Professor with the School of Electronics, Telecommunication and Computer Engineering. In 2005, he moved to Sungkyunkwan University, Suwon, South Korea, where he is currently a Professor with the College of Information and Communication Engineering. From 2018 to 2019, he was with the Center for Nanotechnology, NASA Ames Research Center, CA, USA, and the Nanoelectronic Integrated Systems Laboratory, University of California at Santa Cruz, CA, USA, where he was involved in collaborative research on neuromorphic circuit and system design. His research interests include high-speed low-power processor and memory design, high-bandwidth wireline transceiver design, fast-transient high-efficiency DC/DC converter design, and neuromorphic IC design for bio-inspired applications.