Received 13 September 2017; revised 22 January 2018; accepted 26 February 2018. Date of publication 5 March 2018; date of current version 16 April 2018.

Digital Object Identifier 10.1109/JXCDC.2018.2812242

# Improving Energy Efficiency of Low-Voltage Logic by Technology-Driven Design

KAUSHIK VAIDYANATHAN<sup>1</sup>, (Member, IEEE), DANIEL H. MORRIS<sup>2</sup>, (Member, IEEE), UYGAR E. AVCI<sup>2</sup>, (Member, IEEE), HUICHU LIU<sup>1</sup>, (Member, IEEE), TANAY KARNIK<sup>2</sup>, (Fellow, IEEE), HONG WANG<sup>1</sup>, AND IAN A. YOUNG<sup>2</sup>, (Fellow, IEEE)

> <sup>1</sup> Intel Corporation, Santa Clara, CA 95054 USA <sup>2</sup> Intel Corporation, Hillsboro, OR 97124 USA CORRESPONDING AUTHOR: K. VAIDYANATHAN (kaushik.vaidyanathan@intel.com)

**ABSTRACT** Reducing $V_{DD}$ while keeping leakage current low is critical for minimizing energy consumption in systems across the compute continuum, especially in IoT. Emerging low-$V_{DD}$ logic devices such as tunnel FETs (TFETs) offer better low-$V_{DD}$ performance than conventional MOSFETs but lack performance at high $V_{DD}$. To assess TFETs, and other transistors optimized for low-$V_{DD}$ operation, we propose a technology-driven design framework. Our framework adapts standard industry flows and tools to optimize the design of logic blocks with full consideration of the tradeoffs possible with future-generation device I-V characteristics and interconnect. The proposed approach optimizes design implementation to improve the projected power-performance and area, as well as the expected accuracy of the projections. TFET improves energy efficiency by 2.35x and 1.35x over MOSFET at low- and high-performance points, respectively, for industrial design test cases. Accuracy of the energy efficiency and performance projections is improved by 71% and 40%, respectively.

**INDEX TERMS** Cell library, circuit-device interaction, interconnect, Internet of Things (IoT), synthesis, tunnel FETs (TFETs).

#### I. INTRODUCTION

LOW-VOLTAGE logic transistors, such as tunnel FETs (TFETs), are best suited to augment MOSFETs because of their steep subthreshold slope (SS) [1], [2]. With an SS steeper than that of MOSFETs, such emerging logic devices enable higher performance and reduced energy consumption at low supply voltage [9]. Several low-voltage emerging logic devices, however, have limited drive current at higher supply voltages. For example, the GaSb-InAs TFET is expected to be slower than the MOSFET when operated above 0.5 V [2]. Such inherent power/performance tradeoffs at the device level have to be carefully considered as we optimize logic block implementation. To this end, prior works [4], [5], [7] have compared designs implemented with emerging logic devices, but they suffer from two major shortcomings that introduce severe inaccuracies in power-performance projections. First, prior works fail to optimize a design's implementation with full consideration of an emerging device's unique I-V characteristics. Second, prior works use simple, nonrigorous design-optimization methods to estimate power-performance. For instance, prior research did not use logic synthesis and relied on small cell libraries and rudimentary wire RC models. As a result, prior research underpredicted TFET energy efficiency and performance compared to MOSFET by 71% and 40%, respectively, for Internet of Things (IoT) system-on-chip (SoC) subblocks (Fig. 15).

To address these shortcomings, we propose a technology-driven design approach. The proposed approach enables us to optimize logic paths in a design, cognizant of an emerging device's power-performance characteristics. Furthermore, it leverages cutting-edge, high-productivity tools and methods to synthesize technology-optimal logic blocks. Specifically, technology-driven design uses comprehensive technology-optimized logic libraries, product-like design flows for logic synthesis, and a physical implementation environment with interconnect *RC* models (Fig. 1).

Logic blocks optimized with product-like design flows that consider an emerging device's power-performance characteristics, i.e., this paper, have superior energy efficiency compared to logic blocks generated by merely swapping MOSFET devices with emerging devices, i.e., "swap-and-simulate" [4], [5]. Our experimental results suggest that technology-driven design, compared to "swap-and-simulate," improves performance/watt by 22% at low performance, and frequency by 10% at higher performance points for IoT SoC subblocks (Fig. 16). Furthermore, technology-specific design-optimization improves frequency projection accuracy over prior works [5] by 76%, underscoring the significance of the proposed approach in emerging device benchmarking [19].

FIGURE 1. Technology-driven design, highlighting the steps that have been considered rigorously in the low-voltage logic evaluations in this paper.

The key contributions of this paper are as follows.

- 1) A novel technology-driven design approach is described to accurately benchmark emerging low-voltage logic devices, spanning from transistor/wire *RC* models through synthesis and physical design. Methods to fully co-optimize device *I-V* characteristics and logic design are presented in detail.
- 2) The efficacy of the proposed approach over prior works is demonstrated by thoroughly evaluating the power-performance tradeoffs associated with low-voltage logic devices and interconnects (as applicable to the ITRS [1] 2018 node) using industrial models, flows, and designs.

#### **II. RELATED WORK**

TFET is a leading low-voltage steep-SS emerging logic device candidate. Designs implemented with TFETs have been investigated by prior works [4], [7], [20]. Swaminathan et al. [4], [7] use circuits constructed from simple logic gates to assess TFET/MOSFET power-performance-area (PPA) tradeoffs. They use the "swap-and-simulate" method, wherein MOSFETs in a logic path are replaced by TFETs, to estimate the TFET's impact on the block's power-performance [4], [5]. While quick, the "swap-and-simulate" technique precludes any technology-specific design-optimization, wherein the circuit implementation for a specific logic function is optimized to best leverage inherent device characteristics. Sharma et al. [5] evaluate TFET/MOSFET tradeoffs for synthesized design blocks. However, their use of nonrigorous design methods, such as wire-load models for interconnects, makes their power-performance estimates inaccurate. We illustrate the significance of technology-specific design optimizations with four examples.

1) Synthesis with technology-specific circuit libraries: Synthesizing designs with a TFET cell library results in >76% higher performance compared to a design that just swaps MOSFETs with TFETs, as in swap-and-simulate [4] (Fig. 9).

2) Size of logic cell library: Limiting synthesis to extremely small cell libraries with <10 logic cells (as in [5]) can result in >150% degradation in performance compared to using comprehensive libraries (containing cells with different functions and drive strengths), making TFET/MOSFET design comparisons synthesized from small cell libraries unrealistic (Figs. 4 and 15).

3) **Technology-optimized cell library:** Logic circuits in a cell library can be optimized to leverage a device's unique characteristics. Logic libraries containing circuits optimized for TFETs [3], [6], such as flip-flops, multiplexers, and full adders, can further improve design performance and energy (Fig. 7).

4) Wire RC considerations: Power-performance benchmarking has to comprehend the interaction between an emerging logic device and scaled interconnect. However, prior works use wire-load models (prelayout wire models based on fan-out and gate count) to capture interconnect RC in logic synthesis, resulting in inaccurate projections (lower frequency, higher area). Wire-load models fail to capture the physical notion of interconnect with high fidelity because they lack physical placement information (Fig. 17).

Overall, there is a dearth of prior works that explore the interplay between emerging device I-V characteristics and logic design implementation (i.e., cell library design, logic synthesis, and physical synthesis). We propose technology-driven design to co-optimize logic implementation for emerging devices with full consideration of their unique power/performance characteristics using industrial tools and methods.

#### **III. TECHNOLOGY-DRIVEN DESIGN**

SoCs are designed to achieve power/performance targets by optimizing the logic implementation to the underlying device technology. Logic blocks in modern SoCs are predominantly synthesized from circuit libraries using high-productivity design-automation tools and flows. However, reusing such product-design methods as-is for early technology benchmarking proves to be very time-consuming and inefficient. To bridge the gap between low-effort, low-fidelity approaches such as "swap-and-simulate" and a high-effort, high-fidelity approach akin to product development, we propose the technology-driven design approach. The proposed framework adapts key steps from an industrial design flow while reducing complexity to efficiently analyze low-voltage logic implementation tradeoffs (Fig. 1).

In this section, we describe the proposed approach and illustrate the significance of each step with three SoC design blocks built with low-voltage MOSFET and TFET.

- 1) *CASE-A-3K* is a simple datapath circuit with 3K gates, similar to those used by prior works [4].
- 2) *CASE-B-110K* is an industrial low-power, medium-sized, logic-dominated IoT SoC block with 110K+ gates.
- 3) *CASE-C-1.5M* is an industrial high-throughput, interconnect-dominated large block with 1.5M+ gates and embedded memories.

Designs with different characteristics have been considered to: 1) show the interaction between design parameters (size, power, performance, area) and emerging device characteristics (I-V, C-V) and 2) highlight common pitfalls of drawing conclusions based on a single, and often simple, design (such as *CASE-A-3K*).

#### A. DEVICE MODELING AND CIRCUIT SIMULATION

Device modeling and circuit simulation form the cornerstone of technology-driven design. In this paper, models for TFET and MOSFET devices as applicable to the ITRS 2018 node are used [2]. A 4.7-nm square nanowire with 13-nm gate length [1] and an equivalent oxide thickness (EOT) of 0.8 nm was used for both MOSFET and TFET. For the MOSFET, the conventional silicon material is chosen. n-TFET uses GaSb as p+ source, intrinsic InAs as channel and doped n+ InAs as drain, to enable highest possible drive current [2]. MOSFET is modeled using drift-diffusion simulations and TFET is modeled with atomistic simulations. Device parameters and electrical characteristics are based on those described under the 2018 node (M1 half pitch = 15 nm) in the 2011 issue of the ITRS [2]. Fringing and gate-to-contact capacitances suitable for the technology node are subsequently added to create circuit models for both MOSFET and TFET.

With nonidentical source and drain doping, the source and drain terminals of a TFET are not interchangeable as they are in a MOSFET. With nonoptimized source doping, the III-V p-TFET may be limited to an SS of just 60 mV/dec. However, with optimized lower p-TFET source doping, it is possible to achieve steep SS without sacrificing substantial $I_{\rm on}$ current [9]. Thus, n-TFET and p-TFET $I_{\rm ds}-V_{\rm gs}$ symmetry (i.e., the same or similar SS) was assumed for the purposes of this paper.

A table-based model implemented in Verilog-A was used with the Cadence Spectre simulator [15] to enable circuit simulation using the device electrical characteristics predicted by the atomistic simulation [3], [9] (Fig. 2). TFETs have significantly lower delay compared to MOSFETs at lower voltages (<0.5 V). Circuit simulation of a 35-stage ring oscillator with RC load and 0.2% activity factor illustrates the supply voltages where TFET may be a preferred alternative to MOSFET (Fig. 3). For a given performance and power target, total energy is minimized by optimizing $V_{DD}$ and $I_{off}$ (by tuning the work function as shown in [13]). The benefits of steep SS may be seen in either leakage energy (TFET with lower $I_{off}$) or in dynamic energy (with lower $V_{DD}$). We trade off a slight increase in leakage energy at 0.32 V for TFETs for significant savings in dynamic energy to minimize total energy. Compared to MOSFET circuits with a supply of 0.45 V, TFET circuits with a supply of 0.32 V consume half the energy yet have similar performance (within 10%). Hence, unless explicitly mentioned otherwise, our TFET designs are optimized at 0.32 V and our MOSFET designs at 0.45 V, i.e., at isofrequency with 2x energy savings.
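The idea behind a table-based device model can be illustrated with a minimal Python sketch. It is not the Verilog-A model used in this paper: it only shows one plausible way to look up drain current between tabulated gate voltages, interpolating on a logarithmic current axis because subthreshold current changes exponentially with gate voltage. The table values are placeholders generated from an idealized 30 mV/dec slope, not data from [2], and the names `IV_TABLE`, `interp_ids`, and `subthreshold_slope_mv_per_dec` are hypothetical.

```python
import bisect
import math

# Placeholder (V_gs [V], I_ds [A]) samples for an idealized device with a
# constant 30 mV/dec subthreshold slope. These are NOT values from the paper.
I_OFF = 1e-9
IV_TABLE = [(0.03 * k, I_OFF * 10 ** k) for k in range(6)]


def interp_ids(vgs, table=IV_TABLE):
    """Log-linear interpolation of I_ds at vgs; clamps outside the table."""
    volts = [v for v, _ in table]
    if vgs <= volts[0]:
        return table[0][1]
    if vgs >= volts[-1]:
        return table[-1][1]
    hi = bisect.bisect_right(volts, vgs)
    (v0, i0), (v1, i1) = table[hi - 1], table[hi]
    frac = (vgs - v0) / (v1 - v0)
    log_i = math.log10(i0) + frac * (math.log10(i1) - math.log10(i0))
    return 10 ** log_i


def subthreshold_slope_mv_per_dec(v0, v1, table=IV_TABLE):
    """Average slope between two gate voltages, in mV per decade of current."""
    decades = math.log10(interp_ids(v1, table)) - math.log10(interp_ids(v0, table))
    return 1000.0 * (v1 - v0) / decades


if __name__ == "__main__":
    print(interp_ids(0.015))
    print(subthreshold_slope_mv_per_dec(0.0, 0.15))
```

A production model would also tabulate the $V_{DS}$ dependence and capacitances; this sketch covers a single $V_{DS}$ slice only.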



FIGURE 2. Simulated TFET characteristics [2]. (a)  $I_{DS}-V_{GS}$  at  $V_{DS} = 0.45$  V. (b) Asymmetric  $I_{DS}-V_{DS}$  results in  $I_{DS} < 1$  nA at low negative  $V_{DS}$ .



FIGURE 3. Power-performance tradeoffs for TFETs/MOSFETs, simulated based on [2]. Here, circuit is 35-stage FO4 inverter with wire *RC* load and 0.2% activity factor and leakage.

The structural difference of the MOSFET device versus the TFET device has implications for cell library layout. The MOSFET device's source/drain has identical composition. For example, the source and drain of nMOS may both be n-doped silicon. Thus, the source and drain of two serially connected nMOSs may directly abut and share a contact. In contrast, the TFET source/drain has different materials and doping types. The source of n-TFET may be p+ doped GaSb and drain may be n+ doped InAs. Thus, the source and drain of two serially connected n-TFETs may not directly abut and share a contact. This asymmetry of source/drain doping results in reduced layout density. The resulting area increase, however, depends on the specific design rules of the process technology.

#### B. CELL LIBRARY DESIGN AND CHARACTERIZATION

Any logic block described in a hardware-description language can be mapped to logic gates in a cell library with synthesis. Logic cell library design and their characterization for power-performance-area (PPA) using transistor models is an important step in technology-driven design. We discuss several cell library considerations in detail.

1) **Cell Library Size:** Industrial-class logic libraries have 1000s of cells to meet diverse power, performance, and area specifications [10]. However, prior works build TFET/MOSFET designs using small cell libraries containing a few basic cells. Limiting library contents to nand, nor, INV, and FLOP, as in [5], leads to highly suboptimal designs. To illustrate the importance of library size, we synthesized *CASE-B-110K* using cell libraries with different numbers of cells, namely *small\_lib* (as in [5]), *medium\_lib* (50+ cells), and *large\_lib* (500+ cells). *Small\_lib* contains the most basic cells (nand, nor, INV, and FLOP), *medium\_lib* contains the cells in *medium\_lib\_func* in Fig. 5, and *large\_lib* contains over 110 cell functions with various drive strengths. Our results indicate that an industrial design synthesized with *large\_lib* is far superior to one synthesized with *small\_lib*, with 2.61x lower delay, 1.46x lower area, and 1.71x lower energy-per-op (Fig. 4). This result is expected, as logic synthesis relies on a rich library containing cells of different functional types and drive strengths (e.g., the cell function type is aoi22; its drive strengths are aoi22x1, aoi22x2, and so on). Without such a large, diverse library, critical path logic depth (the number of logic stages between flip-flops) increases by 4.96x.

FIGURE 4. Small library (*small\_lib*) size has a detrimental effect on a design's PPA ($1.71 \times$ higher energy). Lack of key cells in the library increases logic depth by $4.96 \times$ and indirectly degrades the design's PPA. The design synthesized with *medium\_lib* has PPA similar to that of the large library. Design is *CASE-B-110K*.

FIGURE 5. Logic cell library with more cell functions (medium\_lib\_func) is $1.35 \times$ more energy efficient than a library with fewer cell functions (medium\_lib\_drv). Design used is CASE-B-110K.

2) **Cell Library Contents:** Both the size of the library and the selection of library cells are critical. To illustrate this point, we created two medium-sized cell libraries, one with few drive strengths but many cell functions (*medium\_lib\_func*), and another with fewer cell functions but many drive strengths per cell function (*medium\_lib\_drv*) (Fig. 5). Both cell libraries have several drive strengths of inverters and buffers. Results from designs synthesized with these two libraries show that *medium\_lib\_func* is 1.35x more energy efficient than *medium\_lib\_drv* (Fig. 5). To select the composition of *large\_lib*, we analyzed industrial designs and cell libraries and created a library that has 500+ cells with 110+ different cell functions. The cells fall into five classes: basic combinational (e.g., nand), complex combinational (e.g., fadd), sequential (e.g., dff), clock (e.g., clkgate or cg), and repeaters (e.g., buf).
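The distinction between cell functions and drive strengths can be made concrete with a short sketch. It assumes cell names follow the `<function>x<drive>` pattern seen in the examples in this paper (aoi22x1, nor2x12, invx12); the helper `summarize_library` and the cell list are hypothetical and do not reproduce the contents of any library used here.

```python
import re
from collections import defaultdict

# Assumed naming pattern: a function name, the letter "x", then a drive strength.
CELL_NAME = re.compile(r"^(?P<func>[a-z0-9]+?)x(?P<drive>\d+)$")


def summarize_library(cell_names):
    """Map each cell function to the sorted drive strengths available for it."""
    drives = defaultdict(set)
    for name in cell_names:
        match = CELL_NAME.match(name)
        if match is None:
            raise ValueError(f"unrecognized cell name: {name}")
        drives[match["func"]].add(int(match["drive"]))
    return {func: sorted(strengths) for func, strengths in drives.items()}


# Illustrative cell list only.
example = ["aoi22x1", "aoi22x2", "nor2x12", "invx1", "invx12", "xor2x4"]
summary = summarize_library(example)

if __name__ == "__main__":
    print(len(example), "cells,", len(summary), "functions")
    for func, strengths in sorted(summary.items()):
        print(func, strengths)
```

Counting distinct functions versus drive strengths per function in this way is one simple means of telling a function-rich library (like *medium\_lib\_func*) from a drive-rich one (like *medium\_lib\_drv*).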

3) **Technology-Optimized Cell Library Circuits:** Most common logic circuit topologies with complementary and dual pull-up and pull-down networks work correctly when a MOSFET is replaced with a TFET. The same is true for transmission gate-based circuits that require only unidirectional conduction, such as mux, xor, latches, and flip-flops, as long as the source and drain terminals of the TFET are oriented correctly. Furthermore, prior works [3], [13] present TFET-based flip-flop and multiplexer circuits that achieve lower delay/power/area by leveraging the TFET's unidirectional conduction (Fig. 6). While TFET-optimized circuits have clear benefits over baseline (nonoptimized) TFET circuits, block-level impact assessment is essential.

| Cell            | Delay | Transistor count | Energy/switch | Leakage |
|-----------------|-------|------------------|---------------|---------|
| FLIP-FLOP [13]  | 0.87x | 0.69x            | 0.6x          | 1x      |
| MUX [3]         | 0.92x | 0.8x             | 0.87x         | 0.625x  |
| FULL-ADDER      | 0.95x | 1x               | 0.73x         | 0.92x   |

FIGURE 6. PPA benefits of key TFET-optimized logic circuits. Values are for the TFET-optimized circuit, normalized to the baseline circuit.

To this end, we synthesized *CASE-A-3K* using both *tfet-opt* (a TFET cell library that includes TFET-optimized circuits) and *tfet-baseline* (CMOS circuits with each MOSFET replaced by a TFET device) and present the results in Fig. 7(a). The superior energy-per-op (up to 23%) of the design synthesized with *tfet-opt* comes exclusively from extensive use of the TFET-optimized full adder. However, to generalize the benefits of TFET-optimized circuits at the block level, we next synthesized the *CASE-C-1.5M* design. Results show that dynamic energy reduces by 3% and performance improves by 10% (Fig. 7(b)). This exercise emphasizes the need to use different design test cases to assess the impact of design-technology optimizations.

4) Cell Library Characterization: Our cell library contains 510 cells of 115 types. Based on Fig. 3, we characterize TFETs at 0.32, 0.41, and 0.45 V and MOSFETs at 0.45, 0.5, and 0.65 V in the TTTT corner at 25 °C. Manual/scripting approaches to characterization, as in prior works, are tedious and inefficient [4]. We use a commercial library characterization tool with in-house customization for characterizing our TFET and MOSFET libraries (Fig. 8). The characterization tool takes as input a transistor model, netlists of all cells in the library, the preferred circuit simulator (Spectre [15]), and cell and library templates to guide characterization [12]. We use published Verilog-A MOSFET and TFET models (which handle the TFET's asymmetry). Cell area estimates for TFETs and MOSFETs are the same and do not account for the area impact of asymmetric TFET circuits, as it depends on process design rules [9].
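The characterization points listed above can be enumerated with a small sketch, for instance to drive a batch of characterization runs. The voltages, corner, temperature, and cell count come from the text; everything else is an assumption for illustration, including the `library_name` scheme (modeled on labels such as TFET\_v032 that appear in this paper's figures) and the premise of one library per device and voltage. No interface of the commercial characterization tool is implied.

```python
# Supply voltages characterized per device, as listed in the text.
VOLTAGES = {"TFET": [0.32, 0.41, 0.45], "MOSFET": [0.45, 0.50, 0.65]}
CORNER = "TTTT"
TEMP_C = 25
NUM_CELLS = 510


def library_name(device, vdd):
    """Hypothetical naming scheme, e.g. TFET at 0.32 V -> TFET_v032."""
    return f"{device}_v{round(vdd * 100):03d}"


# One characterization point per (device, voltage) pair.
points = [
    {"library": library_name(dev, v), "vdd": v, "corner": CORNER, "temp_c": TEMP_C}
    for dev, volts in VOLTAGES.items()
    for v in volts
]

# Assumes every cell is characterized once at every point.
total_cell_runs = NUM_CELLS * len(points)

if __name__ == "__main__":
    for point in points:
        print(point)
    print("cell characterizations:", total_cell_runs)
```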

#### C. LOGIC SYNTHESIS WITH LOW-VOLTAGE LOGIC

1) **Why Synthesis:** Most SoC blocks are synthesized using CAD tools. Synthesis takes as input the behavioral description of a design block and the liberty timing model of a cell library, and generates a gate-level netlist that meets the PPA constraints. Synthesis optimizes the circuit implementation for the device technology characteristics. Skipping this design-optimization process by simply swapping TFET for MOSFET devices in a MOSFET-optimized design (as in "swap-and-simulate" [4], [5]) precludes us from gaining insight into the advantages and disadvantages of an emerging device for implementing logic blocks. We illustrate this point by first synthesizing CASE-B-110K with a large cell library in node N. Next, the node N devices in the optimized design are substituted by devices of the N + 2 generation in [1]. We compare the PPA of this "swap-and-simulate" design with the PPA of a design synthesized using the node N + 2 cell library. Note that the cell compositions of the N and N + 2 libraries are identical for a fair comparison. The optimized design has a 1.76x higher frequency compared to the "swap-and-simulate" design. This is because the node N and N + 2 technologies have different $V_{DD}$, I-V, and C-V, so reoptimization (with synthesis using a technology-optimized library) is essential to accurately estimate the PPA landing zone of emerging logic devices (Fig. 9).

FIGURE 7. (a) CASE-A-3K synthesized with the TFET-optimized cell library (*tfet-opt*) has 23% lower energy compared to *tfet-baseline*. Benefits are from extensive use of the TFET-optimized full adder. (b) CASE-C-1.5M synthesized with *tfet-opt* has a 10% performance improvement over *tfet-baseline*.

FIGURE 8. Logic library characterization flow to generate .lib.

[Figure legend: (A) Synthesis\_with\_N\_library\_timed\_with\_N+2\_library; (B) Synthesis\_with\_N+2\_library\_and\_timed\_with\_N+2\_library.]

FIGURE 10. TFET/MOSFET implementation tradeoffs for low power block after logic synthesis. Energy savings for TFET over MOSFET vary with frequency targets. Maximum energy savings occur at low/medium frequency targets.

2) Logic Synthesis Results For *Case-B-110K*: In this paper, we adapt synthesis flows used in product designs to work with our TFET/MOSFET libraries, supporting scan insertion, clock gating, dynamic power optimization, leakage power optimization, and area recovery. Power is measured by generating switching activities from running a real workload/benchmark on the implemented design. The power, performance, and area design spaces for TFET at 0.32 V and MOSFET at 0.45 V for the *CASE-B-110K* block after logic synthesis are shown in Fig. 10. Three observations are in order.

- a) TFET implementations on average consume $0.44 \times$ the energy of MOSFET implementations. This result is intuitive: with energy proportional to $C_{\rm dyn}$ and $V_{\rm DD}^2$, a TFET design's energy relative to a MOSFET design is expected to be about $0.5 \times$ ($0.32^2/0.45^2$) because of the TFET's lower operating $V_{\rm DD}$.
- b) Interestingly, TFET designs optimized for lower performance points show higher energy savings (60%) over MOSFET implementations compared to higher performance points (48%). This is apparent given that TFET at 0.32 V is slightly slower than MOSFET at 0.45 V (Fig. 3). However, what is nonobvious is that synthesis allows a TFET design at 0.32 V to achieve about the same frequency as MOSFET at 0.45 V, but at the cost of area, dynamic power, and hence increased energy. As shown in Figs. 11 and 12, TFETs' lower intrinsic capacitance allows synthesis to choose large drive-strength cells to achieve frequency targets at the cost of area and dynamic power. Such tradeoffs between performance and area for emerging low-voltage devices can only be observed by co-optimizing logic design and devices together, which is enabled by the proposed technology-driven design approach.
- c) The areas consumed by TFET and MOSFET designs are similar except at high-performance points, where the TFET design uses large drive-strength cells and repeaters to meet high-performance targets.
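The first-order expectation quoted in observation a) follows from switching energy scaling as $C_{\rm dyn} V_{\rm DD}^2$. The sketch below reproduces that arithmetic; the function name and the `cdyn_ratio` parameter are illustrative. The last step, which attributes the whole gap between the expected and observed ratios to switched capacitance, is an assumption made only for illustration, since leakage and activity also contribute.

```python
def dynamic_energy_ratio(vdd_new, vdd_ref, cdyn_ratio=1.0):
    """Ratio of switching energies, assuming E is proportional to C_dyn * V_DD**2."""
    return cdyn_ratio * (vdd_new / vdd_ref) ** 2


# TFET at 0.32 V relative to MOSFET at 0.45 V, equal switched capacitance.
ratio = dynamic_energy_ratio(0.32, 0.45)

# Illustrative assumption: if the observed 0.44x were due to capacitance alone,
# this would be the implied ratio of switched capacitance.
implied_cdyn_ratio = 0.44 / ratio

if __name__ == "__main__":
    print(f"expected energy ratio: {ratio:.3f}")
    print(f"implied C_dyn ratio:   {implied_cdyn_ratio:.2f}")
```

The voltage term alone gives roughly 0.51, consistent with the "about $0.5 \times$" figure in the text.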



FIGURE 11. Post-synthesis TFET/MOSFET designs' cell area histogram shows differences in cells chosen by logic synthesis (e.g., aoi112x1 and aoai13x1 not chosen for MOSFET designs).



FIGURE 12. Histogram of gates in critical path for post-synthesis TFET/MOSFET designs. High drive-strength gates (e.g., nor2x12 and invx12) are preferred in TFET design to meet delay targets due to their lower self-loading capacitance.

3) Interaction Between Logic Synthesis And Device I-V: The logic optimization methods of synthesis interact with inherent device characteristics. As a result, TFET and MOSFET designs are optimized differently by logic synthesis. For instance, cells such as aoi112x1 and aoai13x1 are only used in the TFET-based design (Fig. 11). Inspecting the cells that make up the critical paths, we find that TFET design critical paths use large drive-strength cells more liberally than MOSFET design critical paths (Fig. 12). This trend is due to TFETs having lower device capacitance compared to MOSFETs [2]. Lower intrinsic device capacitance reduces the self-loading of large drive-strength cells, resulting in up to 15% lower delay for reasonable load conditions. This makes large drive-strength cells more attractive for synthesis optimizations for TFETs, but not for MOSFETs.
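The link between device capacitance and preferred drive strength can be illustrated with a textbook first-order RC model. This is a simplification, not the delay model used by synthesis or by this paper: a unit gate drives a gate of size k, which drives a fixed load. Upsizing the second gate speeds up its own stage but loads the stage before it. All values are in arbitrary units, the two capacitance settings are placeholders rather than TFET or MOSFET data, and drive current is deliberately held equal so that only the capacitance effect is visible.

```python
def path_delay(k, c_dev, c_load, r_unit=1.0):
    """Two-stage delay: a unit gate drives a size-k gate that drives c_load.

    c_dev stands for both the input and the parasitic capacitance of a unit
    gate (a simplifying assumption of this sketch).
    """
    upstream = r_unit * (c_dev + k * c_dev)        # unit gate: own parasitic + k-sized input
    driver = (r_unit / k) * (k * c_dev + c_load)   # size-k gate: self-load + external load
    return upstream + driver


def best_drive(c_dev, c_load, strengths=(1, 2, 4, 8, 12, 16)):
    """Pick the drive strength with the lowest modeled path delay."""
    return min(strengths, key=lambda k: path_delay(k, c_dev, c_load))


C_LOAD = 64.0
high_cap_choice = best_drive(c_dev=1.0, c_load=C_LOAD)  # higher device capacitance
low_cap_choice = best_drive(c_dev=0.5, c_load=C_LOAD)   # lower device capacitance

if __name__ == "__main__":
    print("higher-capacitance device picks x", high_cap_choice)
    print("lower-capacitance device picks x", low_cap_choice)
```

In this toy model, halving the device capacitance shifts the delay-optimal drive strength upward, which mirrors the qualitative trend reported for Fig. 12.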

4) Impact Of Embedded Memory Leakage and Design Size On Power/Performance Tradeoffs: Given the large size and physical footprint of *CASE-C-1.5M*, we use physically aware logic synthesis (Synopsys dc-Topographical [17]). Even at the logic synthesis stage, we provide the tool with a high-level floorplan and customized block-specific wire-load models to improve the accuracy and quality of results. Results are shown in Fig. 13.

- a) While TFET at 0.32 V can operate at frequencies similar to MOSFET at 0.45 V, it consumes $0.6 \times$ the energy-per-op. While the energy savings of TFETs over MOSFETs are still substantial, they are not as large as for *CASE-B-110K* (Fig. 10). The benefits of steep SS may be seen in either leakage power (TFET with lower $I_{off}$) or in dynamic power (with lower $V_{DD}$). For *CASE-C-1.5M* and *CASE-B-110K*, transistor $I_{off}$ was targeted to be equivalent for MOSFET and TFET. *CASE-B-110K* has a dominant active power component and sees the full benefit of the transistor $I_{off}$ targets. *CASE-C-1.5M* has larger leakage components (with an additional 20% from embedded memories); as such, the energy reduction is less significant for this specific $I_{off}$ targeting (Fig. 13(b)). Evidence for this reasoning is shown in Fig. 13(b), where the switching energy of TFET relative to MOSFET is $0.49 \times$, in line with expectations.
- b) TFET at 0.32 V has marginally lower performance compared to MOSFET at 0.45 V for CASE-C-1.5M. Nontrivial wire RC (due to large block size) and tight power-performance constraints require TFET designs to consume 14% more area than MOSFET designs (Fig. 14). This trend was not seen in CASE-B-110K because that block was optimized for low power and not high performance.

FIGURE 13. (a) Energy-per-op benefits of TFETs for CASE-C-1.5M are lowered due to embedded memory leakage. (b) Switching energy-per-op benefits for TFET at 0.32 V over MOSFET at 0.45 V are in line with expectations. Using the TFET-optimized cell library (TFET\_OPT\_v032) enables 10% higher frequency and 4% lower energy compared to the baseline TFET library (TFET\_v032).

FIGURE 14. Synthesis trades off area for performance by offsetting the slightly lower performance of TFET at 0.32 V relative to MOSFET at 0.45 V for *CASE-C-1.5M*.

5) **Impact Of Library/Synthesis on Emerging Device Benchmarking:** Results presented so far using the proposed framework underscore the need for industrial design blocks, flows, and large libraries for accurate design projections with emerging devices. But is such a framework necessary to make relative comparisons between emerging devices? We synthesized CASE-B-110K across several frequency targets using small and large TFET and MOSFET cell libraries. We observed that using small libraries to synthesize high-frequency designs was more detrimental to TFET at 0.32 V than to MOSFET at 0.45 V. Specifically, using small TFET libraries to synthesize designs underprojects a TFET design's maximum attainable frequency by 40% (Fig. 15). This result is in line with the expectation that a cell library with a reasonable number and variety of cells (functions, drive strengths) is key to the efficiency of the design generated by logic synthesis, which in turn impacts absolute and relative power/performance projections for emerging logic devices. The technology-driven design approach co-optimizes logic design and device I-V characteristics, resulting in technology-optimal designs and accurate power-performance projections for emerging device benchmarking.



FIGURE 15. Synthesis/technology-specific comprehensive cell libraries are essential to make accurate absolute and relative projections for emerging devices. Technology-driven design approach improves accuracy of relative comparisons of delay and energy/op by 40% and 71%.

6) Sensitivity of Performance and Energy Efficiency to Supply Voltage: Prior works have used "swap-and-simulate" to understand the sensitivity of a block's performance and energy efficiency to operating voltage for different emerging low-voltage logic devices. While simple, "swap-and-simulate" fails to comprehend the interaction between an emerging device's I-V characteristics, operating voltage, and design-optimization methods such as logic synthesis. Our proposed technology-driven design approach addresses this pitfall of "swap-and-simulate." Fig. 16 shows performance/watt improvements of TFET over MOSFET at different operating frequencies as estimated by "swap-and-simulate" and technology-driven design. Technology-driven design, by optimizing logic design with device I-V, can improve performance/watt by 22% at low performance, and frequency by 10% at higher performance points, illustrating the efficacy of the proposed approach.

#### D. PHYSICAL SYNTHESIS WITH TFETS AND MOSFETS

FIGURE 16. Technology-driven design approach enables 22% better performance/watt and 10% higher frequency than swap-and-simulate. Improvements are achieved by co-optimizing logic design with *I–V* characteristics for emerging low-voltage devices.

FIGURE 17. Physical synthesis is necessary to capture interconnect *RC* impact on a design's frequency and area (e.g., physical synthesis frequency estimate is 9% lower than logic synthesis).

1) Why Physical Synthesis? Interconnect scaling, with tight pitches and acceptable resistance and capacitance per $\mu$m of wire, is becoming increasingly challenging [11], [18]. As interconnect and devices together determine the efficiency of ICs, any holistic evaluation of emerging devices has to be fully cognizant of interconnect RC scaling. While prior works have taken a step in the right direction, their interconnect considerations are rudimentary. For instance, a recent prior work has tried to account for interconnect RC with wire-load models in logic synthesis [5]. Although logic synthesis with wire-load models provides a notion of interconnect loading on gates (based on gate count and fan-out), the lack of placement information makes them highly inaccurate compared to physical synthesis (Fig. 17). To work around this problem, a naïve approach is to adapt physical synthesis flows used in products for emerging low-voltage devices. However, that is impractical, as it requires development of complete physical technology collaterals. We use the methodology described in the supplementary section to assess the effect of future-generation interconnect (node N + 1) using existing physical synthesis flows (in node N).
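The limitation of wire-load models can be shown with a minimal sketch that contrasts a fan-out-based lookup with a placement-aware estimate. The placement-aware side uses half-perimeter wirelength (HPWL), a common textbook estimate that stands in here for the extracted RC of physical synthesis; it is not the methodology of this paper. The table entries, the capacitance per micron, and the pin coordinates are all hypothetical.

```python
# Hypothetical wire-load table: fan-out -> estimated wire capacitance (fF).
WIRE_LOAD_FF = {1: 0.5, 2: 0.9, 3: 1.3, 4: 1.8}
CAP_PER_UM_FF = 0.2  # illustrative value, not a technology parameter


def wire_load_cap(fanout):
    """Pre-layout estimate: depends only on fan-out (must be >= 1)."""
    top = max(WIRE_LOAD_FF)
    if fanout <= top:
        return WIRE_LOAD_FF[fanout]
    # Extrapolate linearly past the last table entry.
    slope = WIRE_LOAD_FF[top] - WIRE_LOAD_FF[top - 1]
    return WIRE_LOAD_FF[top] + slope * (fanout - top)


def hpwl_cap(pins):
    """Placement-aware estimate from the half-perimeter of the pin bounding box."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    hpwl_um = (max(xs) - min(xs)) + (max(ys) - min(ys))
    return CAP_PER_UM_FF * hpwl_um


# Two nets with the same fan-out (2) but very different placements (um).
near = [(0, 0), (2, 1), (1, 3)]
far = [(0, 0), (40, 5), (10, 60)]

if __name__ == "__main__":
    print("wire-load model:", wire_load_cap(2), "fF for both nets")
    print("HPWL estimate:  ", hpwl_cap(near), "fF (near) vs", hpwl_cap(far), "fF (far)")
```

Both nets receive the same wire-load capacitance because the model sees only fan-out, while the placement-aware estimate separates them by more than an order of magnitude.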

2) Interaction of Low-Voltage Logic and Scaled Interconnect: Next, we analyzed the impact of scaled interconnect RC (as applicable to the ITRS 2018 node) in *CASE-C-1.5M*. Given the large physical size of this block, it is meaningful to analyze the interaction of low-voltage logic devices and interconnects. We observe that the lower supply voltage of TFETs lends them a marginal advantage over MOSFETs in tackling future interconnect *RC* effects (Fig. 18).

| Attribute (TFET @ 0.32 V normalized to MOSFET @ 0.45 V) | Logic Synthesis | Physical Synthesis |
|---------------------------------------------------------|-----------------|--------------------|
| Frequency                                               | 1.0             | 0.98               |
| Energy                                                  | 0.6             | 0.6                |
| Area                                                    | <u>1.14</u>     | <u>1.07</u>        |

FIGURE 18. Post-layout timing and power estimations show different relative effects between TFET/MOSFET that are not identified with prelayout analysis (synthesis with [17]).
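As a small worked illustration, the normalized values reported in Fig. 18 can be combined into derived figures of merit. This sketch assumes that the "Energy" row denotes energy per operation, which the text does not state explicitly. Under that assumption, operations per joule is the reciprocal of energy, and an energy-delay product follows from energy divided by frequency. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedMetrics:
    """TFET @ 0.32 V metrics normalized to MOSFET @ 0.45 V (values from Fig. 18)."""
    frequency: float
    energy: float   # assumed here to be energy per operation
    area: float

    @property
    def ops_per_joule(self) -> float:
        # Performance/watt = (ops/s) / (J/s) = ops/J = 1 / (energy per op).
        return 1.0 / self.energy

    @property
    def energy_delay_product(self) -> float:
        # Delay per operation is 1 / frequency, so EDP = energy / frequency.
        return self.energy / self.frequency


def percent_shift(before: float, after: float) -> float:
    """Relative change, in percent, when moving from one estimate to another."""
    return 100.0 * (after - before) / before


logic = NormalizedMetrics(frequency=1.0, energy=0.6, area=1.14)
physical = NormalizedMetrics(frequency=0.98, energy=0.6, area=1.07)

# How much the relative TFET/MOSFET picture moves once layout is considered.
frequency_shift = percent_shift(logic.frequency, physical.frequency)
area_shift = percent_shift(logic.area, physical.area)
```

The derived quantities are only as valid as the energy-per-operation assumption; they are not results reported by this work.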

# **IV. CONCLUSION**

Owing to their superior performance and leakage at low supply voltage, steep-SS devices such as TFETs are being actively considered to augment MOSFETs in energy-constrained SoCs. To fully understand and leverage the unique opportunities enabled by low-voltage logic devices, it is necessary to consider the full range of logic and circuit design optimizations. To the best of our knowledge, this is the first work that co-optimizes logic design with the unique characteristics of emerging devices, resulting in superior design comparisons and more accurate design projections. The technology-driven design framework enables device engineers and designers to develop a heterogeneous compute substrate for future applications. Key insights from the technology-driven design case studies are as follows.

- "Swap-and-simulate" is effective only for first-order relative comparisons between emerging devices as it does not consider technology-specific design optimizations.
- Technology-optimized designs with emerging devices are as different as the underlying device *I-V* characteristics.
- Power-performance tradeoffs associated with different emerging logic devices are also dependent on design attributes and design-optimization methods.
- Comparing emerging logic devices with consideration of interconnect *RC* characteristics is critical. Interaction between logic device and interconnect *RC* is best studied in blocks with large physical footprint.
- Low-voltage logic devices, such as TFETs, improve energy efficiency. Heterogeneous integration of such devices with MOSFET is necessary to achieve high performance [13].

#### REFERENCES

- [1] ITRS. *ITRS Roadmap.* 2012. [Online]. Available: http://www.itrs.net/reports.html
- [2] U. E. Avci *et al.*, "Energy efficiency comparison of nanowire heterojunction TFET and Si MOSFET at L<sub>g</sub>=13 nm, including P-TFET and variation considerations," in *IEDM Tech. Dig.*, Dec. 2013, pp. 33.4.1–33.4.4.
- [3] D. H. Morris, U. E. Avci, R. Rios, and I. A. Young, "Design of low voltage tunneling-FET logic circuits considering asymmetric conduction characteristics," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 4, no. 4, pp. 380–388, Dec. 2014.

- [4] K. Swaminathan *et al.*, "Modeling steep slope devices: From circuits to architectures," in *Proc. Conf. Design, Autom. Test Europe (DATE)*, 2014, pp. 1–6.
- [5] A. Sharma, A. Arun Goud, and K. Roy, "Sub-10 nm FinFETs and tunnel-FETs: From devices to systems," in *Proc. Design, Autom. Test Europe Conf. (DATE)*, 2005, pp. 1443–1448.
- [6] D. H. Morris, U. E. Avci, and I. A. Young, "Variation-tolerant dense TFET memory with low V<sub>MIN</sub> matching low-voltage TFET logic," in *Proc. Symp. VLSI Technol.*, 2015, pp. T24–T25.
- [7] K. Swaminathan, E. Kultursay, V. Saripalli, V. Narayanan, M. T. Kandemir, and S. Datta, "Steep-slope devices: From dark to dim silicon," *IEEE Micro*, vol. 33, no. 5, pp. 50–59, Sep. 2013.
- [8] R. Aitken *et al.*, "Physical design and FinFETs," in *Proc. Int. Symp. Phys. Design*, 2012, pp. 65–68.
- [9] U. E. Avci, D. H. Morris, and I. A. Young, "Tunnel field-effect transistors: Prospects and challenges," *IEEE J. Electron Devices Soc.*, vol. 3, no. 3, pp. 88–95, May 2015.
- [10] ARM Standard Cell Libraries. Accessed: 2016. [Online]. Available: https://www.arm.com/products/physical-ip/logic-ip/standard-cell-libraries.php
- [11] J. S. Clarke, C. George, C. Jezewski, A. M. Caro, D. Michalak, and J. Torres, "Process technology scaling in an increasingly interconnect dominated world," in *Proc. Symp. VLSI Technol.*, 2014, pp. 1–2.
- [12] Synopsys Liberty NCX. Accessed: 2015. [Online]. Available: https://www.synopsys.com/Tools/Implementation/SignOff/Documents/liberty\_ncx\_ds.pdf
- [13] D. H. Morris, K. Vaidyanathan, U. E. Avci, H. Liu, T. Karnik, and I. A. Young, "Enabling high-performance heterogeneous TFET/CMOS logic with novel circuits using TFET unidirectionality and low-V<sub>DD</sub> operation," in *Proc. Symp. VLSI Technol.*, 2016, pp. 1–2.
- [14] T. Thiel, "Have I really met timing?—Validating primetime timing reports with SPICE," in *Proc. Design, Autom. Test Europe*, 2004, pp. 114–119.
- [15] Cadence Spectre Simulator. Accessed: 2015. [Online]. Available: http://www.cadence.com/products/cic/spectre\_circuit/pages/default.aspx
- [16] Synopsys Primetime. Accessed: 2015. [Online]. Available: http://www.synopsys.com/Tools/Implementation/SignOff/Pages/PrimeTime.aspx
- [17] Synopsys DC Topographical. Accessed: 2015. [Online]. Available: https://www.synopsys.com/apps/support/training/designcompilertop\_fcd.html
- [18] C. Hou, "A smart design paradigm for smart chips," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2017, pp. 8–13.
- [19] D. E. Nikonov and I. A. Young, "Benchmarking of beyond-CMOS exploratory devices for logic integrated circuits," *IEEE J. Explor. Solid-State Computat. Devices Circuits*, vol. 1, pp. 3–11, 2015.
- [20] B. Sedighi, X. S. Hu, H. Liu, J. J. Nahas, and M. Niemier, "Analog circuit design using tunnel-FETs," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 62, no. 1, pp. 39–48, Jan. 2015.
- [21] J. Min and P. M. Asbeck, "Compact modeling of distributed effects in 2-D vertical tunnel FETs and their impact on DC and RF performances," *IEEE J. Explor. Solid-State Computat. Devices Circuits*, vol. 3, pp. 18–26, 2017.



**KAUSHIK VAIDYANATHAN** (M'14) received the B.E. degree in electronics and communication from the Madras Institute of Technology, Chennai, India, and the M.S. and Ph.D. degrees in electrical and computer engineering from Carnegie Mellon University (CMU), Pittsburgh, PA, USA, in 2012 and 2014, respectively.

At CMU, in collaboration with IBM, he researched and developed design technology co-optimization (DTCO) methods for cost-effective scaling beyond the 20-nm node. His research on DTCO was one of the most downloaded SPIE papers in 2015. He has co-authored a book for SPIE Press and co-taught a tutorial on DTCO in SPIE Advanced Lithography in 2016. Since 2015, he has been a Research Scientist with the Microarchitecture Research Lab, Intel Labs, Santa Clara, CA, USA. He is part of the Process-Optimized Microarchitecture Research Group, where his role is to understand and quantify the implications of emerging devices, interconnect technologies, and integration schemes at different design abstractions (circuit, block, and microarchitecture).

Dr. Vaidyanathan received the University Gold Medal for his B.E. degree.



**DANIEL H. MORRIS** (M'12) received the B.S. degree from Northwestern University, Chicago, IL, USA, and the M.S. and Ph.D. degrees from Carnegie Mellon University (CMU), Pittsburgh, PA, USA.

At CMU, he researched novel all-magnetic logic circuits which were designed to compute without tightly integrated CMOS for low cost and ultra-low voltage operation. He was involved in design enablement and design-technology co-optimization of 14-nm logic and memory circuits in a joint IBM/CMU Project on cost-effective scaling. He is with the Exploratory Integrated Circuit Group, Components Research Department, Intel Corporation, Hillsboro, OR, USA. With a focus on circuit and architectural aspects, he is responsible for design and benchmarking of new devices in both conventional and emerging application areas. He researches the tunneling FET and other charge-based or spin-based beyond-CMOS technologies.



**UYGAR E. AVCI** (M'05) received the double major B.S. degrees in physics and electrical engineering from Bogazici University, Istanbul, Turkey, and the M.S. and Ph.D. degrees in applied physics from Cornell University, Ithaca, NY, USA, in 2003 and 2005, respectively. During his Ph.D. degree, he demonstrated the first experimental realization of backside flash memory.

He joined Intel's Components Research in 2005, leading floating body cell (FBC) memory experimental device design and scaling that demonstrated industry-leading FBC memory cells. Since 2010, he has been involved in the opportunities that beyond-CMOS devices offer to either replace or augment CMOS. He is currently a Principal Engineer, leading the research for charge-based beyond-CMOS devices and circuits.

Dr. Avci was the recipient of the President's Award. He served as the Fundamentals Class Chair and the Short Course Chair for the International SOI Conference in 2012 and 2013, respectively. He is an Associate Editor for the IEEE TRANSACTIONS ON ELECTRON DEVICES.



**HUICHU LIU** (S'11–M'15) received the B.S. degree in microelectronics from Peking University, Beijing, China, in 2009, and the Ph.D. degree in electrical engineering from Pennsylvania State University, University Park, PA, USA, in 2014.

She interned at IBM T. J. Watson Research Center, Yorktown Heights, NY, USA, in 2011, and Globalfoundries, Santa Clara, CA, USA, in 2014. She joined Intel Labs, Santa Clara, CA, USA, as a Research Scientist in 2015. Her current research interests include device-circuit co-design for low power applications.

Dr. Liu was one of the winners of the IBM Ph.D. Fellowship Award from 2011 to 2012.



**TANAY KARNIK** (F'14) received the Ph.D. degree in computer engineering from the University of Illinois at Urbana-Champaign, Champaign, IL, USA.

He was the Director of Intel's University Research Office. He joined Intel Corporation, Hillsboro, Oregon, USA, in 1995. He is a Principal Engineer with the Microarchitecture Research Lab, Intel Labs. His current research interests include the areas of 3-D architectures, variation tolerance, power delivery, and architectures for novel devices. He has published over 80 technical papers and has 74 issued and 40 pending patents in these areas. Dr. Karnik is an ISQED Fellow. He was a member of the ISSCC, DAC, ICCAD, ICICDT, ISVLSI, ISCAS, 3DIC, and ISQED program committees, and JSSC, TCAD, TVLSI, and TCAS review committees. He received the Intel Achievement Award for the pioneering work on integrated power delivery. He has presented several keynotes, invited talks, and tutorials, and has served on seven Ph.D. students' committees. He was the General Chair of ISLPED'14, ASQED'10, ISQED'09, ISQED'08, and ICICDT'08. He was a Guest Editor for the *Journal of Solid-State Circuits*. He is an Associate Editor for TRANSACTIONS ON VERY LARGE SCALE INTEGRATION and a Senior Advisory Board Member of *Journal on Emerging and Selected Topics in Circuits and Systems*.



**HONG WANG** received the bachelor's degree in computer engineering from the Harbin Institute of Technology, Harbin, China, and the Ph.D. degree in electrical engineering from The University of Rhode Island, Kingston, RI, USA.

He is an Intel Fellow and the Director of the Microarchitecture Research Lab, Intel Labs Organization, Intel Corporation, Santa Clara, CA, USA. He manages microarchitecture research for processors and other key intellectual property core designs. He has published over 50 technical papers and has been granted 145 patents, with another 80 patents pending, in areas including processor architecture and microarchitecture. His current research interests include developing synthesizable, configurable designs that support low-power and energy-efficient system-on-chip (SoC) integration on multiple process technologies. This line of research has led to transformative technologies used in the Intel Quark family of embedded Internet of Things products and Intel's SoFIA family of SoC designs for mobile computing based on Intel Atom processors.

Dr. Wang has been honored by Intel with three Intel Achievement Awards: for his work on platform-level software simulation in 1999, for making Atom processors ready for FPGA emulation in 2008, and for developing the sub-Atom core inside the Quark products, thereby creating a foundation for Intel's low-end processor roadmap, in 2011. He was also a recipient of the 2011 Mahboob Khan Outstanding Industry Liaison Award from the Semiconductor Research Corporation.



**IAN A. YOUNG** (F'99) received the B.E.E. and M.Eng.Sc. degrees from the University of Melbourne, Melbourne, VIC, Australia, and the Ph.D. degree in electrical engineering from the University of California at Berkeley, Berkeley, CA, USA.

He is a Senior Fellow and the Director of Exploratory Integrated Circuits in the Technology and Manufacturing Group, Intel Corporation, Hillsboro, Oregon, USA. He joined Intel in 1983 and his technical contributions have been in the design of DRAMs, SRAMs, microprocessor circuit design, phase locked loops and microprocessor clocking, mixed-signal circuits for microprocessor high speed I/O links, RF CMOS circuits for wireless transceivers, and research for chip to chip optical I/O. He has also contributed to the definition and development of Intel's process technologies. He is currently leading a research group exploring the future options for the integrated circuit in the beyond CMOS era.

Dr. Young was a recipient of the 2009 International Solid-State Circuits Conference's Jack Raper Award for Outstanding Technology Directions Paper and the 2018 IEEE Frederik Philips Award for leadership in research and development on circuits and processes for the evolution of microprocessors. He served on the Technical Program Committee of the International Solid-State Circuits Conference from 1992 to 2005 and on the Symposium on VLSI Circuits Technical Program Committee from 1991 to 1998, where he was the Technical Program Committee and Symposium Chairman. He is a three-time Guest Editor for the IEEE Journal of Solid-State Circuits and the Guest Editor of the IEEE Journal of Selected Topics in Quantum Electronics. He is currently the Editor-in-Chief of the IEEE Journal on Exploratory Solid-State Computational Devices and Circuits.