

Received 10 June 2024, accepted 3 July 2024, date of publication 12 July 2024, date of current version 23 July 2024. *Digital Object Identifier* 10.1109/ACCESS.2024.3427332

# **RESEARCH ARTICLE**

# **Design Technology Co-Optimization and Time-Efficient Verification for Enhanced Pin Accessibility in the Post-3-nm Node**

JAEHOON JEONG<sup>1</sup>, (Graduate Student Member, IEEE), YUNJEONG SHIN<sup>2</sup>, (Graduate Student Member, IEEE), HYUNDONG LEE<sup>2</sup>, (Graduate Student Member, IEEE), JONGHYUN KO<sup>2</sup>, (Graduate Student Member, IEEE), JONGBEOM KIM<sup>2</sup>, (Graduate Student Member, IEEE), AND TAIGON SONG<sup>2</sup>, (Member, IEEE)

<sup>1</sup>Foundry Design Service Team, Samsung Electronics, Gyeonggi-do 18448, South Korea
<sup>2</sup>School of Electronic and Electrical Engineering, Kyungpook National University (KNU), Daegu 41566, South Korea

Corresponding author: Taigon Song (tsong@knu.ac.kr)

This work was supported in part by the National Research and Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and Information and Communications Technology (ICT) under Grant 2020M3H2A107804514; in part by the BK21 Four Project funded by the Ministry of Education, South Korea, under Grant 4199990113966; in part by the Basic Science Research Program through NRF funded by the Ministry of Education under Grant 2022R1I1A3073214; in part by Samsung Electronics Company Ltd.; in part by the Technology Innovation Program (Customizable Low-Power Artificial Intelligence (AI) Accelerator and Software Based on Resistive Analog Computing in-Memory (ACiM) for Edge Devices) funded by the Ministry of Trade, Industry and Energy (MOTIE), South Korea, under Grant 00236772; in part by the Technology Innovation Program (Public-Private Joint Investment Semiconductor Research and Development Program (K-CHIPS) to Foster High-Quality Human Resources) funded by MOTIE under Grant RS-2023-00236772 and Grant 1415187507.

**ABSTRACT** As technology nodes approach 3 nm and beyond, nanosheet FETs (NSFETs) are replacing FinFETs. However, despite the migration of devices from FinFETs to NSFETs, few studies report the impact of NSFETs from the digital VLSI perspective. In this paper, we present a study of how the latest device technology, back end of line (BEOL), and NSFET designs aid each other to enhance pin accessibility in layout and standard cell library design, yielding less routing congestion and lower power consumption. For this objective, 1) we discuss five layout design methodologies that are co-optimized with device technology to tackle the pin accessibility issues that arise in standard cell designs in environments with extremely few routing resources (e.g., four signal tracks), 2) we introduce pin accessibility analysis procedures before chip P&R, and 3) we report how local trench contact (LTC) helps in reducing cell tracks for cells with five tracks or fewer. Using our methodology, we improve design metrics such as power consumption, total area, and wirelength by 11.0%, 13.2%, and 16.0%, respectively, in full-chip scale designs. With our study, we expect the additional routing congestion issues that occur in advanced technology nodes to be handled and better full-chip designs to be achieved at 3 nm and beyond.

**INDEX TERMS** NSFET, nanosheet, pin optimization, standard cell layout, library.

#### **I. INTRODUCTION**

Semiconductor chips have developed tremendously over the past half-century. This development is the result not only of remarkable growth in a specific research domain but also of continuous efforts in various fields.

The associate editor coordinating the review of this manuscript and approving it for publication was Jagadheswaran Rajendran.

New types of transistors (FinFET, Nanosheet FET) have solved the scaling limitations of planar-MOSFET technology, such as short-channel effects and leakage power [1], [2], [3], [4], and new process technologies such as double-patterning and EUV have overcome the challenges of interconnect scaling, which, however, has not kept pace with device scaling [5], [6].



FIGURE 1. Front end and middle end of line structure of 3 nm NanoSheet FET (NS3K, [12]). Metals and devices are connected with local trench contact (LTC). Figure modified from [13].

From the circuit design perspective, the direction of development over the past 50 years is also impressive. From an architectural standpoint, designers use modeling tools to analyze design factors such as total power, area, and timing [7]. In circuit design, numerous designers have made great advances such as automated layout design via machine learning and a standard cell synthesis framework for automated placement and routing at sub-7 nm technology [8], [9]. Also, from a digital design perspective, we must discuss the development of standard cells, which are the core of all digital design, and the importance of electronic design automation (EDA). A standard cell (SDC) is the smallest unit of a digital design that implements all logic, and SDC optimization (= the number of these SDCs, the length of the wires on the chip connecting them all) directly affects the chip's power, performance, and area (PPA). The design and optimization of these SDCs cannot be done without EDA tools, and these EDA tools have evolved well with advances in various fields [10], [11].

Pin optimization is a matter of considering where every input/output (I/O) pin is optimally placed on an SDC. In digital design, place and route is no longer done manually, so the routing that connects all SDC pins is now handled by EDA tools. Also, several studies have confirmed that the PPA of a chip design depends on the optimal pin placement [14], [15]. In other words, SDCs can be connected only if the router has access to SDC I/O pins, and better access to the pins improves the quality of the chip design. Conventionally, pin optimization focuses on checking the routability (= cell density, congestion) of the total chip by evaluating whether the router has easy or difficult access to I/O pins (= pin accessibility) [16]. However, as technology advances, access to SDC I/O pins is becoming more challenging.

Improving pin accessibility in SDC layout requires extensive studies from various fields such as device development, manufacturing process, cell library design, and computer-aided design (CAD) tool algorithms [16], [17], [18]. For example, a compact transistor model is designed through design technology co-optimization (DTCO) in close collaboration between device and process engineers [19], and pin accessibility is improved with detailed placement optimizations in CAD tools [16]. However, NSFET and FinFET, which are emerging device candidates for post-planar-MOSFET processes, differ in many ways from manufacturing to library design [20], [21]. For this reason, it is hard to directly apply conventional planar-MOSFET-based pin optimization methods.

Many of the recent studies published in academia mainly focus on FinFET-based pin optimization from the CAD perspective. For example, Chung et al. [22] proposed an SDC pin placement algorithm that extracts all net segments for in-cell routing to enhance pin accessibility. Seo et al. [23] presented an SDC layout re-design method that chooses the candidate cells for re-designing. Tai et al. [24] proposed designing I/O pins to be long or short to achieve pin optimization, and Clark et al. [25] modified the shape of an SDC layout in 7 nm FinFET (ASAP7, [21]) based on a local interconnect gate. In [11], Xu et al. improved pin accessibility by reducing the total number of SDC metal tracks. Although the semiconductor industry has recently begun mass production with nanosheet FETs (NSFETs), the latest studies in pin accessibility enhancement do not include NSFET-based designs.

Therefore, in this paper, we study the pin accessibility of NSFETs in a holistic procedure considering devices, BEOL, and CAD. The pin access of NSFETs differs in many ways from that of conventional FinFETs: 1) NSFET SDCs should be designed with very few routing tracks (fewer than five), 2) pin optimization is limited due to the extra front end or middle end of line (FEOL or MEOL) layer, and 3) layout design rules are more complex due to the different device structures. Therefore, designers must clearly understand these differences to design a chip with the best PPA in the latest process. We highlight our finding that NSFET enables efficient pin placement by modifying the FEOL layer (= local trench contact in Fig. 1) and reduces the number of SDC in-route tracks compared to FinFET. To optimize the NSFET SDC I/O pins by taking advantage of these points, we present the following contributions:

- 1) To the best of the authors' knowledge, we propose the first study on I/O pin optimization of SDCs in the 3 nm NSFET technology node that visualizes the potential of LTCs in SDC design.
- 2) We discuss five layout design methodologies incorporating devices and interconnects that increase pin access points (pin accessibility) and show significant design enhancement in 3 nm NSFET.
- 3) We present new pin accessibility analysis methods based on how routers access pins. This method allows a fair comparison by quantifying the SDC layout pin accessibility of the proposed scheme against the original scheme.
- 4) We present a methodology to analyze pin accessibility at the cell-to-cell level before the chip-level P&R analysis.

| 3 nm NSFET design rule | Dimension |
|------------------------|-----------|
| (a) M0 width           | 12 nm     |
| (b) M0 spacing         | 12 nm     |
| (c) M1 width           | 16 nm     |
| (d) M1 spacing         | 16 nm     |
| (e) LTC width          | 12 nm     |
| (f) LTC2M0 width       | 10 nm     |
| (g) PTC width          | 10 nm     |
| (h) VIA0 width         | 10 nm     |

FIGURE 2. Key design rules of 3 nm NSFET and INV layout [12]. Each design rule is indicated as (a), (b), etc., and the corresponding values are noted in the table on the right-hand side. The total number of tracks is five: four signal routing tracks and one power rail track. All metals are designed for uni-directional routing (Metal 0 is horizontal, Metal 1 is vertical).

TABLE 1. Details of our 5nm/3nm back end of line structure [12].

| Layer  | 5 nm pitch (nm) | 5 nm $R_{BEOL}$ $(\Omega/\mu m)$ | 3 nm pitch (nm) | 3 nm $R_{BEOL}$ $(\Omega/\mu m)$ |
|--------|-----------------|----------------------------------|-----------------|----------------------------------|
| M0     | 28              | 147.8                            | 24              | 210.8                            |
| M1-3   | 40              | 50.50                            | 32              | 88.13                            |
| M4-6   | 76              | 10.16                            | 64              | 16.25                            |
| M7-8   | 96              | 6.350                            | 80              | 9.650                            |
| M9-10  | 160             | 2.040                            | 120             | 3.830                            |
| M11-12 | 720             | 0.090                            | 600             | 0.120                            |

Cell-to-cell level analysis shortens the turnaround time for SDC pin accessibility analysis with better accuracy.

- 5) We introduce an algorithm for checking track reduction from 5-track to less for 3 nm NSFET layouts.
- 6) Compared to a previous study of 3 nm layouts [12] and traditional layout design methods [24], [25], our I/O pin-optimized 3 nm NSFET SDC library reports 11.0%, 13.2% and 16.0% improvements in power, area, and total wire length, respectively.

#### **II. BACKGROUND**

In this section, we review the design rules of the 3 nm NSFET SDC library (NS3K, [12]) used in this study. Also, we provide definitions of 1) the pin access point and 2) situations where SDCs have low pin accessibility. Then, we explain why pin accessibility is important in the process of designing an SDC layout.

#### A. KEY DESIGN RULES OF NS3K LAYOUT

In our study, we use 3 nm NSFETs as the baseline technology node [12]. Fig. 1 illustrates the front and middle end of line structure of this technology node.<sup>1</sup> The number of metal tracks per standard cell (SDC) depends on the specific node (five tracks in [12]), and this leads to challenges in designing SDC layout due to the limited cell height (i.e., horizontal track count). Signal and power pins are typically routed using the closest metal to the devices in an SDC layout. Typically, power pins are assigned to metal tracks at the top and bottom of an SDC (e.g., VDD/VSS). The remaining metal tracks are used for the signal pins. Unlike the conventional Metal 0 (M0) or Metal 1 (M1) for connecting devices, 3 nm NSFET technology provides Local Trench Contact (LTC) and LTC-to-M0 (LTC2M0) vias for additional routing resources.

**FIGURE 3.** 3 nm NSFET standard cell layout and connections with surrounding cells. (a) shows a situation where the lower metal (Metal 2) is used and (b) shows the situation where the upper metal (Metal 4) is used to connect the I/O pins between the cells. We should use upper metal (Metal 4) to connect C<sub>out</sub> (cell C output) with D<sub>in</sub> (cell D input), since Metal 1 and Metal 2 are already placed.

Fig. 2 shows some key design rules and parameters of the 3 nm NSFET layout. As shown on the right-hand side of Fig. 2, the metal spacing between tracks is very small compared to FinFETs [12], [21] (e.g., M0 spacing is 12 nm in NSFET and 14 nm in FinFET; see Table 1 for details). The detailed design rules are discussed in [12]. In addition, this technology node assumes a uni-directional interconnect (i.e., M0 is horizontal, M1 is vertical, M2 is horizontal, etc.). The total number of tracks is five (5T). Four tracks are for signal routing, and one track is used for VDD/VSS. Note that signal routing of SDCs is done in an environment with extremely few routing tracks. In this study, we focus on optimizing SDC layout pins to meet all design rules of Fig. 2.
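As a quick consistency check on the numbers above, the following minimal Python sketch derives each layer's track pitch as width plus spacing from Fig. 2 and compares it with the 3 nm pitch listed in Table 1. The dictionary layout and function names are hypothetical and are not part of the NS3K library; only the numeric values come from Fig. 2 and Table 1.

```python
# Design-rule values from Fig. 2 (nm). The data layout is illustrative only.
RULES_NM = {
    "M0": {"width": 12, "spacing": 12},
    "M1": {"width": 16, "spacing": 16},
}

# 3 nm pitches from Table 1 (nm); M1 belongs to the "M1-3" row.
TABLE1_PITCH_NM = {"M0": 24, "M1": 32}

TOTAL_TRACKS = 5  # 5T cell
POWER_TRACKS = 1  # one track is used for VDD/VSS


def pitch_nm(layer: str) -> int:
    """Pitch of a uni-directional layer, taken as line width + line spacing."""
    rule = RULES_NM[layer]
    return rule["width"] + rule["spacing"]


def signal_tracks() -> int:
    """Tracks left for signal routing after removing the power track."""
    return TOTAL_TRACKS - POWER_TRACKS


if __name__ == "__main__":
    for layer in RULES_NM:
        print(layer, pitch_nm(layer), pitch_nm(layer) == TABLE1_PITCH_NM[layer])
    print("signal tracks:", signal_tracks())
```

Under these values, the width-plus-spacing pitch of M0 and M1 agrees with Table 1, and four tracks remain for signal routing.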

#### B. PIN ACCESS POINTS AND PIN ACCESSIBILITY

As mentioned in Sec. II-A, 3 nm NSFET SDCs face an extremely low track count (5-track) when routing signals. Due to the reduced number of tracks, Metal 2 (M2) will be used more frequently if the pins of SDCs in M1 are not designed properly, which inevitably leads to using upper metal layers such as Metal 3 (M3) or Metal 4 (M4). For example, Fig. 3 illustrates how the I/O pins of each cell are connected when placing 3 nm NSFET SDCs. Fig. 3 (a) shows the situation where the output pin of cell A ($A_{out}$) and the input pin of cell B ($B_{in}$) are connected. $B_{out}$ is pre-placed between the two pins, but both pins (red font) can be connected with M2 in track #1 (using track #2, #3, or #4 is also possible). However, in the case of Fig. 3 (b), the other I/O pins of cell C and cell D are pre-placed with M2. In particular, when connecting the output pin of cell C ($C_{out}$) and the input pin of cell D ($D_{in}$), it is inevitable to use M4 to connect these two pins (red font) due to the pre-placed M2 wires existing in #1 to #4. Also, note that $B_{in}$ in cell B is placed from track #1 to #4, whereas $D_{in}$ in cell D is placed only at #3 and #4. Thus, there are four and two points at which the router (= commercial tool) can access $B_{in}$ and $D_{in}$, respectively. We define these points as 'pin access points'. As illustrated in Fig. 3, we define as 'lower pin accessibility' the situations in which upper metals (M3 or M4) should be used for routing due to lower metals being occupied by other nets. Thus, designers should understand that commercial routers use higher metal layers (e.g., M3 and M4) when it is difficult to route pins using lower metal layers (e.g., M1 and M2). In order to reduce routing congestion and save metal resources, SDC designs with more pin access points are crucial.

<sup>1</sup>In [12] and [26], buried-power-rail (BPR) related research is intensively studied, so we skip the BPR-related contents in this study.
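The notion of pin access points can be illustrated with a small Python sketch. It is a simplified model, not the behavior of any commercial router: a pin is represented only by the set of signal tracks it spans on M1, and an M2 connection between two pins is assumed to need one common track that is free of pre-placed M2. The track spans of $A_{out}$ and $C_{out}$ and all function names are assumptions made for illustration; only the spans of $B_{in}$ (#1 to #4) and $D_{in}$ (#3, #4) come from the text.

```python
from typing import Set

SIGNAL_TRACKS = frozenset({1, 2, 3, 4})  # tracks #1-#4 of the 5T cell


def access_points(pin_tracks: Set[int]) -> int:
    """Number of signal tracks at which the M1 pin can be reached."""
    return len(set(pin_tracks) & SIGNAL_TRACKS)


def free_access_points(pin_tracks: Set[int], blocked_m2: Set[int]) -> Set[int]:
    """Access points whose M2 track is not occupied by another net."""
    return (set(pin_tracks) & SIGNAL_TRACKS) - set(blocked_m2)


def needs_upper_metal(pin_a: Set[int], pin_b: Set[int], blocked_m2: Set[int]) -> bool:
    """True if the two pins share no free M2 track (simplified model)."""
    common = free_access_points(pin_a, blocked_m2) & free_access_points(pin_b, blocked_m2)
    return not common


b_in = {1, 2, 3, 4}  # from the text: B_in spans #1 to #4
d_in = {3, 4}        # from the text: D_in spans #3 and #4
a_out = {1, 2, 3, 4}  # assumed span, for illustration only
c_out = {1, 2, 3, 4}  # assumed span, for illustration only

if __name__ == "__main__":
    print(access_points(b_in), access_points(d_in))
    # Fig. 3 (a)-like case: no M2 track is blocked, so M2 suffices.
    print(needs_upper_metal(a_out, b_in, blocked_m2=set()))
    # Fig. 3 (b)-like case: M2 is pre-placed on #1 to #4.
    print(needs_upper_metal(c_out, d_in, blocked_m2={1, 2, 3, 4}))
```

In this model, a pin with more access points is less likely to lose every free track to neighboring nets, which is the intuition behind 'lower pin accessibility' above.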

#### III. FIVE DESIGN METHODOLOGIES CONSIDERING DTCO (DESIGN-TECHNOLOGY CO-OPTIMIZATION)

Pin accessibility of a 3 nm NSFET layout is a challenging issue, and conventional methods have shown limitations on the latest technology nodes [24], [25]. This is due to the following two reasons: 1) fewer signal routing tracks (< five) and 2) different layer structures of the front end and middle end of line.

Lengthening the input/output (I/O) pins of standard cells is one of the most common ways to increase pin accessibility. For example, when we make the M1 pin longer, the upper routing resource, M2, becomes available from all points of access to the I/O. Although this traditional method is effective, it does not provide greater accessibility to the pins on the NSFET layout.

For example, if a six-I/O cell is designed in a layout where four signal routing tracks (M1 layer) are available for design (# of I/Os $\geq$ # of M1 available in layout, details in Sec. III-C), it may not be possible to design long pins due to layout constraints. In addition, the LTC that connects the device to the interconnect is closely associated with pin design. The length of the LTC determines the number of metal tracks that M1 must use to access the sheet. For example, if the designer uses a 30 nm LTC for the connection between M1 and the devices, M1 requires 4 tracks for device connection. However, if a 36 nm LTC is used, M1 can be designed using 2 tracks only (detailed in Sec. III-A).

Therefore, enhancing the pin accessibility of SDCs in NSFET requires a holistic approach that considers both devices and interconnects for design. Understanding this, we discuss five layout design methodologies that enhance pin accessibility of 3nm NSFETs.

An important layer for this task is the LTC (local trench contact). As a part of the 3 nm NSFET standard cell library (NS3K), the LTC provides an additional layer of connectivity between the active region and M0 [12]. We discuss how the LTC layer is effectively used.

#### A. DESIGN #1 (D1)-OUTPUT PIN FLEXIBILITY

The LTC modification allows for more flexibility in the design of the output pins. In Sub-FinFET technology, the initial metal (= M0 in the NS3K library) is not fabricated directly over the active (= diffusion) region [25]. This requires the initial metal to connect through the poly gate using an additional layer (LTC in NS3K [12] and Local-Interconnect Source-Drain in ASAP7 [21]). In [12], LTCs are designed for the purpose of providing a strict connection between nanosheets and M0. However, LTC can be extended for better output pin design [21].

**FIGURE 4.** (a) Original 3 nm NSFET NOR2 $\times$ 1 layout (b) Modified 3 nm NSFET NOR2 $\times$ 1 layout. Due to the LTC modification, the output pins are not necessarily designed to be long. Figure modified from [13].

**FIGURE 5.** (a) Original 3 nm NSFET OAI21 × 1 layout, (b) Re-designed 3 nm NSFET OAI21 × 1 layout. By modifying LTC, transistors $\alpha$ and $\beta$ have better Source/Drain connections, which allows more freedom for pin designs (B1 and B2 are now able to use all #1-#4 tracks). Figure modified from [13].

Extending the LTC gives the opportunity to freely design the output pins. Fig. 4 (a) shows the original 3 nm NOR2 $\times$ 1 layout, and the NOR2 $\times$ 1 layout in Fig. 4 (b) illustrates the modification of the LTC to shorten the output pin length. In the original 3 nm NSFET layout, the LTC exists only in the active region. As a result, the LTC fully overlaps only tracks #1 and #4. Thus, we must design a long output pin to connect the PFET and the NFET (ZN pin from #1 to #4). Note that it may be inefficient for an output pin to occupy all possible routing tracks. To resolve this issue, we expand the LTC above the active region, as shown in Fig. 4 (b). As a result of this LTC extension, the output pin ZN can now be freely placed at #2 and #3, as desired by the designer. Therefore, this method gives more flexibility in designing the length of the output pins.

#### B. DESIGN #2 (D2)-INCREASING PIN ACCESS POINTS

The method of modifying LTC affects both the output pin length and the Source/Drain connection between transistors when redesigning the layout. **D1** (Design #1) refers to a reduction in the pin size, whereas **D2** refers to an increase in the pin size in order to ensure optimal performance. In the original 3 nm NSFET layout, as shown in Fig. 5 (a), we use two tracks (#3 and #4) to connect the source of transistor $\alpha$ and the drain of transistor $\beta$. Due to the number of M1s that overlap with the active region (mentioned in Sec. III-A), connections between transistors frequently require detours. To connect two transistors ($\alpha$ and $\beta$), two M1s and one M0 must be used, as indicated by the red arrows in Fig. 5 (a). This connection affects the pin layout. Considering the connection between transistors, input pins B1 and B2 are limited to two pin access points (= #1, #2) in Fig. 5 (a). As a result of the internal transistor connections, B1 and B2 have a limited pin length of two tracks.

In this situation, LTC can provide a breakthrough. The Source/Drain connections of transistors  $\alpha/\beta$  can be designed using only one track (= #4 track) by using a longer LTC, as illustrated in Fig. 5 (b). Furthermore, B1 and B2 now have more pin access points that cover all tracks (#1 to #4). Therefore, proper LTC design can significantly improve pin accessibility.

#### C. DESIGN #3 (D3)-USING MORE WIRE SEGMENTS FOR A PIN

Compared to FinFETs, NSFETs show better performance in the same footprint [27]. Based on this fact, NSFET cells adopt a smaller cell size and therefore have very limited metal resources. Considering that NSFETs have extremely low metal resources (4-track for signal routing), long pins have a maximum of 4 pin access points, and long pin designs are no longer conducive to improving pin accessibility. Knowing that there is a trade-off between pin access points and limited metal resources in NSFETs, we apply an appropriate method through case classification.

Because of their original schematics, complex cells (cells that consist of more than two native cells, such as BUF, AND2, and OR2) generate empty spaces where diffusion is not placed. For example, Fig. 6 (a) shows the 3 nm NSFET OR3 $\times$ 1 layout applying the **D2** methodology. All I/O pins have the maximum number of pin access points within a limited cell height. However, notice that there is no M1 in the empty area (see the red box), and this may or may not be favorable from the I/O accessibility perspective. We separate the pin-accessibility issue into two cases:

- **Case 1:** I/Os fewer than tracks. Let us assume a BUFX1 design where the layout can use three M1, but it needs two M1 for pins A and Z. In this case, since Tracks(4) > I/Os(2), BUFX1 can be routed using M1 and M2, assuming that the surroundings of BUFX1 are not congested.
- **Case 2:** I/Os more than tracks. AOI221 has 6 I/Os (5 inputs and 1 output), so Tracks(4) < I/Os(6). The number of pins in AOI221 is greater than the number of tracks in M0, which means the router may use M3 and M4 for routing.

For **Case 1**, it is typically preferable to leave the empty area since an empty area may allow the router to perform a better M0-M1 routing, whereas for **Case 2**, the opposite is true. Since the router faces many I/Os for connection in **Case 2**, using more wire segments for a pin is better in terms of routing quality. Therefore, we find that **D3** is useful only in **Case 2**. Regarding this, Fig. 6 (b) shows the AOI221 $\times$ 1 layout with one additional segment placed in the empty area. By placing one more segment for input pin C1 on top of input pin B1, **Case 2** can retain better pin access points. Note that placing one more segment for a pin does not increase the transistor count. As illustrated in Fig. 6 (b), we add metal to the poly used by input C1.

**FIGURE 6.** (a) OR3 $\times$ 1 layout having natural 'empty area'. (b) AOI221 $\times$ 1 layout with input pin B2 added in the 'empty area.' A significant improvement in pin accessibility is achieved by increasing the number of I/O access points (D2), as well as by arranging additional pins in empty areas of (b). Figure modified from [13].

**FIGURE 7.** Re-designed standard cell layout of AOI22 $\times$ 1. The gray box represents the original AOI22 $\times$ 1 layout. Since this cell has more than five I/O pins (A1, A2, B1, B2, and ZN), we increase the cell area solely for the purposes of adding additional A1 and A2 pins. Figure modified from [13].

Note that, when using Design #3, the designer should understand which added pin segment provides the best results. As shown in Fig. 6 (b), the number of access points of input pin B1 becomes smaller than that of the other pins because of the additional segment of input C1. In this case, we need to decide whether we want to add a segment for input pin C2 or another input pin (such as A, B, or C1). Sec. IV discusses how to resolve this issue.
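The case classification used for D3 can be sketched in a few lines of Python. This is an illustrative helper, not a tool from the paper: the function names are hypothetical, and the situation where the I/O count equals the track count is returned as a separate label because the two cases above do not cover it.

```python
SIGNAL_TRACKS = 4  # signal routing tracks of the 5T NSFET cell


def classify_cell(num_io_pins: int, signal_tracks: int = SIGNAL_TRACKS) -> str:
    """Classify a cell by comparing its I/O pin count with the signal tracks."""
    if num_io_pins < signal_tracks:
        return "Case 1"  # I/Os fewer than tracks: keep the empty area
    if num_io_pins > signal_tracks:
        return "Case 2"  # I/Os more than tracks: extra segments help
    return "boundary"    # equal counts: not covered by the two cases


def d3_is_useful(num_io_pins: int, signal_tracks: int = SIGNAL_TRACKS) -> bool:
    """D3 (more wire segments for a pin) is applied only to Case 2 cells."""
    return classify_cell(num_io_pins, signal_tracks) == "Case 2"


if __name__ == "__main__":
    print("BUFX1 :", classify_cell(2), d3_is_useful(2))   # pins A and Z
    print("AOI221:", classify_cell(6), d3_is_useful(6))   # 5 inputs, 1 output
```

With the two examples from the text, BUFX1 (2 I/Os) falls into Case 1 and AOI221 (6 I/Os) falls into Case 2, so only the latter is a candidate for an additional pin segment.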

#### D. DESIGN #4 (D4)-WIDER CELL AREA FOR BETTER PIN PLACEMENT

Increasing the SDC layout area is another method of improving pin accessibility for **Case 2** cells. When designing traditional SDCs, it is essential to minimize SDC area according to the Pin-Area Cost (PAC) and Pin-Resolution Cost (PRC) values [14]. However, studies reported that SDC area does not have to be the minimum to achieve the smallest overall chip area [18]. Also, as mentioned in Sec. III-C, NSFETs have a smaller cell height while improving performance compared to FinFETs. This means that the area (width) increase needed for additional pins is a relatively minor trade-off compared to the larger cells of the previous node. Understanding this, we generate SDCs having pin-only areas and increase the number of pin access points. Fig. 7 shows the original 3 nm NSFET AOI22 $\times$ 1 layout (grey box) and a re-designed AOI22 $\times$ 1 layout with a wider area. Due to the difficulty of designing a long A1 input pin in the conventional 3 nm layout, we secure an additional area for the A1 pin. Increasing the pin access points of a specific I/O pin by extending the SDC area is also a way to increase pin accessibility.

FIGURE 8. A layout connecting two BUFX4 cells. (a) requires M2 and M3 for I/O pin connection or detours for its connection, and (b) doesn't. Figure modified from [13].

#### E. DESIGN #5 (D5)-SWITCHING I/O PIN LOCATIONS

Optimization of pin location is also an important strategy for resolving the problem of cells with hard-to-access I/O pins. In the 7 nm FinFET PDK [21], the pins are designed as bi-directional. The use of bi-directional pins enables the router to have more than one pin access point, which is extremely advantageous during the detailed routing process. However, as process technology advances, pin direction should be uni-directional [28]. For a uni-directional pin, the location of the I/O pins is an essential factor in the actual layout. Fig. 8 shows how pin optimization can be done with the uni-directional layout we applied. In Fig. 8 (a), where cells are placed on the top and bottom of each other, the router should use upper metal (= M2, M3) or a detour to connect ZN (the output pin of cell B), rather than M1 (= the signal routing metal for NS3K). The use of upper metal increases the number of vias, and the minimum area of M1 metal also increases, thereby raising the process difficulty [28]. In addition, considering that SDCs use a smaller number of routing tracks (e.g., < 4), the impact of a detouring pin is more critical in the latest technology nodes. Due to these issues, as shown in Fig. 8 (b), a re-designed layout can allow routing to be done using M1 only by properly allocating I/O pin locations.

TABLE 2. Notations for calculating the pin access probability.

| Term                                       | Description                                         |
|--------------------------------------------|-----------------------------------------------------|
| $T$                                        | Total set of SDC signal tracks                      |
| $P$                                        | Total set of SDC I/O pins                           |
| $t$                                        | Subset of the set $T$                               |
| $p_{\mathrm{i}}, p_{\mathrm{o}}$           | Elements of the set $P$                             |
| $p_{\mathrm{i,n}}, p_{\mathrm{o,n}}$       | $n^{\text{th}}$ input and output pins of an SDC 'X' |
| $M_1$                                      | The weight of the M1 pin access probability         |
| $t_{\rm p}^{\rm i,n}, t_{\rm p}^{\rm o,n}$ | Pin access points of the cross point                |
| $S(p_{i/o,n})$                             | Total sum of the input pin access probability       |
| $C(P(X))$                                  | Total pin access probability of an SDC 'X'          |

#### F. COMBINING DESIGN METHODOLOGIES

The design methodologies (from D1 to D5) we discussed do not always help each other from the pin accessibility perspective. For example, it is hard to add one more pin (D3) while increasing the access points of all I/O pins (D2). Also, when increasing the area of the pins (D4), it is not possible to increase the access points of all I/O pins (D2). Thus, we group the five design methodologies into three schemes to fit NSFET.

- Scheme A: All\_Access, using D1-D2-D5 methods
- Scheme B: More\_pins, using D1-D3-D5 methods
- Scheme C: Area\_Increase (Area\_Incr), using D1-D4 methods

We combine the design methodologies in each scheme. **Scheme A** (All\_Access) increases the access points of all I/O pins, **Scheme B** (More\_pins) uses one more pin, and **Scheme C** (Area\_Incr) increases the pin-only area. Note that, in **Scheme C**, we assume that D5 is not used. Since we apply D4 only to cells corresponding to Case 2 (more I/Os than tracks), it is challenging to consider the connection between other cells (D5). We use the above three schemes in all sections from now on.
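The grouping of methodologies into schemes can be written down as a small data structure, which makes the stated incompatibilities easy to check. The sketch below is illustrative only: the dictionary, the conflict list, and the function name are hypothetical, and the conflicts encode just the two pairs named in the text (D2 with D3, and D2 with D4).

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Scheme definitions taken from the list above.
SCHEMES: Dict[str, Tuple[str, ...]] = {
    "All_Access": ("D1", "D2", "D5"),  # Scheme A
    "More_pins": ("D1", "D3", "D5"),   # Scheme B
    "Area_Incr": ("D1", "D4"),         # Scheme C (D5 assumed unused)
}

# Method pairs described as hard to combine (assumed to be the only conflicts).
CONFLICTS: Set[FrozenSet[str]] = {
    frozenset({"D2", "D3"}),
    frozenset({"D2", "D4"}),
}


def is_consistent(methods: Iterable[str]) -> bool:
    """True if the chosen methods contain no conflicting pair."""
    chosen = set(methods)
    return not any(pair <= chosen for pair in CONFLICTS)


if __name__ == "__main__":
    for name, methods in SCHEMES.items():
        print(name, methods, is_consistent(methods))
```

Each of the three schemes passes this check, whereas a combination such as D1-D2-D3 would be rejected.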

#### **IV. ANALYTIC MODELING OF PIN ACCESSIBILITY**

This section presents a thorough analysis of the pin accessibility of our five layout design methods. There are two main approaches to analyzing pin accessibility: algorithmic and data-driven [29]. The algorithmic approach analyzes the pin accessibility of an SDC without tampering design rule violations (DRVs) [30]. The data-driven method selects a feature tile and then analyzes the DRVs reported on the tile [31]. We adopt both approaches and propose a novel pin accessibility analysis method that can be used in SDCs with extremely low track counts. In this section, we present all our analysis examples based on the 3 nm AOI221 $\times$ 1 layout for a clear understanding and comparison.

#### A. BASIC TERMS AND GOALS OF THE PROPOSED ANALYTIC MODELING

FIGURE 9. (a) AOI221 × 1 pin access probability notation using Scheme A. (b-c) The step of Sec. IV-B2 using the original 3 nm NSFET AOI221 × 1 layout. (d) Calculation step of Sec. IV-B3 and IV-B4. The proposed method, pin access probability calculation (PAPC), should follow four processes.

Fig. 9 (a) illustrates the AOI221 $\times$ 1 layout using the basic terms in Table 2. Our technology node uses 4 signal tracks; thus, $|T|$ is 4. $t$ denotes a subset of tracks, and tracks #1 to #4 are denoted as a, b, c, d ($t = \{a\},\{b\},\{c\},\{d\}$). AOI221 $\times$ 1 has five input pins and one output pin, so $P = \{p_{i,1}, p_{i,2}, p_{i,3}, p_{i,4}, p_{i,5}, p_{o,1}\}$. We define $|S(p_{i/o,n})|$ and $|C(P(X))|$ using the following equations:

$$|\mathbf{S}(p_{\mathbf{i}/\mathbf{o},n})| = \sum_{t \in T} (t_p^{\mathbf{i}/\mathbf{o},n}) \quad (1)$$

$$|\mathbf{C}(P(X))| = |\mathbf{T}| \times \sum_{k=1}^{|\mathbf{T}|} \sum_{n=1}^{P-1} (|\mathbf{S}(p_{n,k})|) + \mathbf{M}_1 \quad (2)$$

$$Target\_goal = max(|C(P(X))|) \quad (3)$$

$|S(p_{i/o,n})|$ is the sum of all pin access probabilities for each I/O pin. $|C(P(X))|$ is the sum of the pin access probability values of all I/O pins in an SDC plus the pin access probability weight of M1. Therefore, as in Equation 3, a higher $|C(P(X))|$ indicates better pin accessibility.

#### B. MATHEMATICAL FORMULA OF PIN ACCESS PROBABILITY

Two representative methods exist for algorithm-based analysis. The first calculates the remaining access points of pin p when the router approaches a specific pin p [23]. The second calculates the pin access points by assigning a penalty when obstacles exist in the pattern of pin p [31]. However, [23] does not take vertical access to the pin into account, and [31] fixes only one net connected to pin p, so it does not consider the other directions from which the router can access the pin (e.g., up, down, etc.).

Our proposed placement method of I/O pins in layout, pin access probability calculation (PAPC), considers all directions and probabilities when the router accesses a pin.

The main goal is to find a high value of |C(P(X))| as in Equation 3. We illustrate the four steps when performing PAPC:

#### 1) EXTRACTION

We extract all pin cross points on the track to be counted. Fig. 9 (b) shows the pin cross points of track c (grey box). Input pins B1, B2, A, C1, and C2 of AOI221 × 1 are extracted as $c_p^{i,1}$, $c_p^{i,2}$, $c_p^{i,3}$, $c_p^{i,4}$, and $c_p^{i,5}$, and output pin ZN is extracted as $c_p^{o,1}$, respectively.

#### 2) FIND THE CASES

This probability is calculated by the directions left/right (L/R) or top/bottom (T/B), i.e., the direction from which the router enters the I/O pin and the direction in which it is blocked by metals. First, we determine the priority pin to calculate. For example, as shown in Fig. 9 (c), the baseline is $c_p^{o,1}$ (grey box). Second, we divide this into two cases: one direction is L/R (= Case A), and the other direction is T/B (= Case B). Case A is a combination of L/R in the router direction and L/R in the metal blocking direction, so there are four cases as follows. Note that, to represent the values as integers, we multiply all of them by $2^{P-1}$.

- Case A-1: $t_p^{i,n}$, $t_p^{o,n}$ is placed from 'L', router direction 'R'
- Case A-2: $t_p^{i,n}$, $t_p^{o,n}$ is placed from 'L', router direction 'L'
- Case A-3: $t_p^{i,n}$, $t_p^{o,n}$ is placed from 'R', router direction 'L'
- Case A-4: $t_p^{i,n}$, $t_p^{o,n}$ is placed from 'R', router direction 'R'

In Case B, there is the same number of cases as in Case A, but the directions to consider are 'T' and 'B'. Note that Case B uses Equation 4. As illustrated in Fig. 9 (c), if there is a pre-placed M1 in the path where the router is entering, we assign $M_1 = -1$. Otherwise, we assign $M_1 = 0$.

$$M_1 \in \mathbb{Z}, \quad M_1 \in [-1, 0] \quad (4)$$

#### 3) SUMMATION

This step sums all the values calculated from 2) Find the cases  $(= |S(p_{i/o,n})|)$ . In Case B, we compute the  $|S(p_{i/o,n})|$  using Equation 5.

$$|S(p_{i/o,n})| = \min(2 + M_1, 2) \quad (5)$$
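As a minimal sketch of how Equations 4 and 5 interact in Case B, the snippet below evaluates $|S(p_{i/o,n})|$ for a blocked and an unblocked router path. The function names and the boolean flag are illustrative assumptions; only the assignment of $M_1$ and the `min` expression come from the text.

```python
def m1_weight(m1_in_router_path: bool) -> int:
    """Equation 4: M1 = -1 if a pre-placed M1 lies in the router's path, else 0."""
    return -1 if m1_in_router_path else 0


def case_b_access_value(m1_in_router_path: bool) -> int:
    """Equation 5: |S| = min(2 + M1, 2) for the top/bottom (Case B) direction."""
    return min(2 + m1_weight(m1_in_router_path), 2)


print(case_b_access_value(True))   # path blocked by a pre-placed M1
print(case_b_access_value(False))  # free path
```

Because $M_1$ is never positive, the `min` in Equation 5 simply caps the Case B value at 2 and lowers it by one when the path is blocked.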

#### 4) ITERATION AND CALCULATING THE PAPC

We calculate the total PAPC (= |C(P(X))|) by repeating steps 1) to 3) for all I/O pins of the SDC (see Fig. 9 (d)). We use Equation 2 to compute |C(P(X))|.

#### C. PIN ACCESS PROBABILITY CALCULATION (PAPC) RESULTS

Table 3 shows the PAPC results of AOI221 $\times$ 1. We refer to the calculated value of each I/O pin as its 'Routability Value'. We confirm that $|C(P(AOI221 \times 1))|$ of Scheme A and B increases by about 1.7 times, which indicates better pin accessibility than the original NS3K. Also, the routability values of the internal pins are lower in the original NS3K and Scheme A (A = 11, 22 and C1 = 11, 22). This indicates that it is difficult for the router to access the internal I/O pins. Since Scheme B places one more internal pin C1 on the outside, the value of pin C1 is larger. However, we confirm that the value of pin B1 is relatively small compared to the other methods. In other words, the increased routability value of pin C1 offsets this decrease, so the overall pin accessibility does not decrease (Scheme A and B have the same |C(P(X))| value). Therefore, when adding a pin as in Scheme B, it is more reasonable to add an internal pin (A or C1), as mentioned in Sec. III-C.

TABLE 3. Pin access probability calculation (PAPC) results<sup>a</sup>. Bigger number is better.

| $p_{\rm i}, p_{\rm o}$ | $\lvert S(p_{\rm i/o,n}) \rvert$ | Original<br>NS3K | Scheme A<br>(All_Access) | Scheme B<br>(More_Pins) |
|------------------------|---------------------------------|------------------|--------------------------|-------------------------|
| $p_{i,1} = B1$         | S(B1)                           | 39 + 1           | 62 + 2                   | 31 + 1                  |
| $p_{i,2} = B2$         | S(B2)                           | 20 + 1           | 32 + 2                   | 32 + 2                  |
| $p_{i,3} = A$          | S(A)                            | 10 + 1           | 20 + 2                   | 20 + 2                  |
| $p_{i,4} = C1$         | S(C1)                           | 10 + 1           | 20 + 2                   | 51 + 3                  |
| $p_{i,5} = C2$         | S(C2)                           | 39 + 1           | 62 + 2                   | 62 + 2                  |
| $p_{o,1} = ZN$         | S(ZN)                           | 16 + 2           | 32 + 2                   | 32 + 2                  |
| $\lvert C(P(\mathrm{AOI221} \times 1)) \rvert$ |         | 141 (134+7)      | 240 (228+12)             | 240 (228+12)            |

<sup>a</sup> This table does not describe Scheme C because the pin access point is the same as Scheme A.
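The totals in Table 3 can be re-derived from the per-pin entries. The sketch below is only a bookkeeping check under the assumption that each entry is read as a pair of terms exactly as printed (for example `39 + 1`); the variable and function names are illustrative.

```python
# Per-pin entries of Table 3 as (first term, second term),
# in the pin order B1, B2, A, C1, C2, ZN.
TABLE3 = {
    "Original NS3K": [(39, 1), (20, 1), (10, 1), (10, 1), (39, 1), (16, 2)],
    "Scheme A": [(62, 2), (32, 2), (20, 2), (20, 2), (62, 2), (32, 2)],
    "Scheme B": [(31, 1), (32, 2), (20, 2), (51, 3), (62, 2), (32, 2)],
}


def total_papc(entries):
    """Sum both printed terms over all six I/O pins."""
    return sum(first + second for first, second in entries)


totals = {label: total_papc(entries) for label, entries in TABLE3.items()}
ratio = totals["Scheme A"] / totals["Original NS3K"]

print(totals)
print(round(ratio, 2))
```

The check makes the trade-off in Scheme B visible: B1 drops while C1 rises, and the two schemes end at the same total.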

#### V. CELL BASED PIN ACCESSIBILITY CHECK

This section verifies the proposed schemes through individual pin accessibility analysis based on a single SDC and an SDC-to-SDC unit. When analyzing the pin accessibility of an SDC unit in a full-chip layout, a typical method is to extract and analyze a particular region of the GDSII file. Based on the extracted region, the designer analyzes the SDC or SDC-to-SDC pin accessibility by checking the locations of SDC pins and the connections between the pins. However, this method has two limitations: First, the extracted GDSII regions are random in nature. Thus, P&R results (and the accessibility analysis) differ depending on the extracted region. Second, significant time is required before performing the place and route (P&R) stage because numerous techfiles are required (e.g., the parasitic information file (.tluplus), the design constraints file (.sdc), and the SDC information file (.lef)). Therefore, this section proposes a fast verification method for cell-based pin accessibility.

#### A. COMPARISON OF THE CELL TO ALL-LIBRARY-CELLS

We propose a method of pin accessibility analysis that places surrounding SDCs around a victim SDC. This method surrounds a target SDC in all eight directions (top, bottom, left, right, and all diagonal directions) using the SDCs (= all cells) in the library and performs P&R to another victim surrounded by neighbor cells (see Fig. 10). This method provides two advantages: First, it requires minimal technology files (.lef of the interconnect and SDCs), so fast analysis is possible. Second, it checks all possibilities for routing blockages and pin accessibility. Our method provides a fast and accurate pin accessibility analysis because it requires neither a separate extraction of the layout region nor a variety of techfiles.

TABLE 4. Total wirelength of cell-based pin accessibility check. Conv means conventional. $\triangle 1$ and $\triangle 2$ represent the difference of the NS3K designs to the conventional scheme and to Scheme A (All\_Access), respectively.

| Five<br>Basic<br>SDCs<sup>a</sup> | Original<br>NS3K<br>(µm) | Conv<br>Method<sup>b</sup><br>(µm) | Δ1<br>(%) | Scheme A<br>(All_Access)<br>(µm) | Δ2<br>(%) |
|-----------------------------------|--------------------------|------------------------------------|-----------|----------------------------------|-----------|
| INV                               | 253.44                   | 239.16                             | -5.6      | 235.02                           | -7.3      |
| BUF                               | 257.6                    | 248.78                             | -3.4      | 242.4                            | -6        |
| NAND2                             | 325.9                    | 310.77                             | -4.6      | 304.47                           | -6.6      |
| NOR2                              | 324.9                    | 311.42                             | -4.1      | 305.28                           | -6        |
| DFF                               | 604.92                   | 592.47                             | -2.1      | 589.22                           | -2.6      |
| Average Value                     |                          |                                    | -4.0      |                                  | -5.7      |

<sup>a</sup> The drive strength of all five cells is uniformly X1.

<sup>b</sup> We choose the worst case of short or long method [24], [25].

Fig. 10 (a) illustrates the details of our method. It shows a situation where BUFX1 in Set 1 is surrounded by OR2 $\times$ 1 and another BUFX1 in Set 2 is surrounded by OR2 $\times$ 2. The BUFX1 cells of the two sets connect to each other. Note that, in addition to Fig. 10 (a), we create all situations for the victim (BUFX1) and the neighbors (OR2 $\times$ 1 or OR2 $\times$ 2) using all SDCs in the library (e.g., set 3: BUFX1-DFFX1, set 4: BUFX1-NAND3 $\times$ 1). Besides BUFX1 in Fig. 10 (a) (= baseline SDC), any SDC in the library, such as INVX1 and NAND2 $\times$ 1, can be placed as the baseline cell. For a fair comparison, we set all schemes (Scheme A, NS3K, and conventional method) to have the same distance for each set.

In the case of a well-pin-optimized SDC layout, the upper metal (M3) is used relatively less because the router can access the cell's I/O pins well. Fig. 10 (b) shows the GDSII result of the actual P&R of (a). Due to the blockage of neighboring cells, it is inevitable to use a higher metal (M3). However, we confirm that the proposed Scheme A uses only 96 nm of M3 (= green box), which is about 320 nm less than NS3K (= 416 nm). Furthermore, we verify that the router accesses the I/O pins of the proposed Scheme A layout well and is able to route without using much upper metal.

Fig. 11 (a), (b), and (c) show the metal usage when performing the P&R phase as in Fig. 10 (b).<sup>2</sup> As illustrated in Fig. 11 (a), the proposed Scheme A has a higher M0 usage rate than the other methods. This indicates that the router uses the lower metal (= M0) more owing to the enhanced pin accessibility. In contrast, for M1 and M4, we confirm that the proposed Scheme A (= All\_Access) shows the lowest usage, as shown in Fig. 11 (b-c): since the router uses M0 more, M1 and M4 are used relatively less. Therefore, in the proposed Scheme A, the router uses more lower metals due to the improved pin accessibility compared to other methods.

<sup>&</sup>lt;sup>2</sup>In addition to BUFX1 as the baseline cell, we simulated all SDC cells in the library. However, we illustrate the results of only five basic SDCs (= INVX1, BUFX1, NAND2  $\times$  1, NOR2  $\times$  1, DFFX1) required for P&R.

Table 4 shows the total wire length used in Fig. 10 (a). We analyze the five basic SDCs (= INVX1, BUFX1, NAND2 × 1, NOR2 × 1, DFFX1) placed in the middle. As shown in Table 4, DFFX1 shows a relatively small wire length reduction rate compared to the other cells. DFFX1 has about 20 more transistors than the other basic cells, which makes it difficult to design optimized pins. In addition, access to its I/O pins is more challenging due to blockages (i.e., internal routing) inside the layout. With our proposed Scheme A, the total wire length of each cell is reduced by 3% to 7% compared to the original NS3K. We confirm that our proposed method (Scheme A), with an average of about -6% (= $\Delta$2), performs well on pin optimization compared to the conventional method (-4%).
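The $\Delta$1 and $\Delta$2 columns of Table 4 follow from the wirelength columns as relative changes against the original NS3K. The following sketch recomputes them; the dictionary layout and helper name are illustrative, and the numbers are copied from Table 4.

```python
# Total wirelength in µm from Table 4:
# (Original NS3K, conventional method, Scheme A).
TABLE4 = {
    "INV": (253.44, 239.16, 235.02),
    "BUF": (257.6, 248.78, 242.4),
    "NAND2": (325.9, 310.77, 304.47),
    "NOR2": (324.9, 311.42, 305.28),
    "DFF": (604.92, 592.47, 589.22),
}


def pct_change(baseline: float, value: float) -> float:
    """Relative change of `value` against `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


delta1 = {cell: pct_change(ns3k, conv) for cell, (ns3k, conv, _) in TABLE4.items()}
delta2 = {cell: pct_change(ns3k, sch_a) for cell, (ns3k, _, sch_a) in TABLE4.items()}
avg_delta1 = sum(delta1.values()) / len(delta1)
avg_delta2 = sum(delta2.values()) / len(delta2)

for cell in TABLE4:
    print(f"{cell}: d1 = {delta1[cell]:.1f}%, d2 = {delta2[cell]:.1f}%")
print(f"average: d1 = {avg_delta1:.1f}%, d2 = {avg_delta2:.1f}%")
```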

#### **VI. CHIP LEVEL PLACEMENT AND ROUTING**

This section compares the full-chip results of the conventional short/long pin methods with our three schemes (Scheme A, B, and C), which combine the five methodologies of Sec. III, in 3 nm NSFET technology.<sup>3</sup> Before analyzing the full-chip results, we verify the full-chip routing congestion through the congestion map of the GDSII. In addition, we analyze the wire length of the smallest (AES) and largest (FFT) benchmarks, as well as the overall full-chip results. For a fair comparison, we keep the standard cell count at 49 by using the 49 standard cells from [12]. All benchmarks have a core utilization of 80% and a clock period of 0.25 ns. Our benchmarks are from [32], and we perform placement & routing (P&R) and power measurement on the benchmarks using Synopsys IC Compiler II, StarRC, and PrimeTime. We also checked all the routing DRC violations of the benchmarks with Synopsys IC Validator, and all layouts are DRC clean.

### A. COMPARING SMALL AND LARGE BENCHMARKS

We compare the wire length of the original NS3K, the conventional method, and the proposed Scheme A using the smallest (AES, cell count $\approx$ 12K) and the largest benchmark (FFT, cell count $\approx$ 720K). Fig. 12 shows the wire length of benchmarks AES and FFT. Above all, we report that our Scheme A has the shortest total wirelength compared to NS3K and the conventional method (= the worst case of the short or long method). As shown in Fig. 12 (a), the wirelength of the proposed Scheme A is the shortest in all metal layers. In benchmark FFT, as in Fig. 12 (b), the wirelength of the metals from M0 to M4 is longer than in the other methods, but the wirelength of M5 to M9 is shorter. The increased length of the lower metal set means that the router has better access to the pins and performs fewer detours. Thus, it uses fewer upper metal sets. Therefore, as shown in Fig. 12 (b), we demonstrate that the pin accessibility of the proposed Scheme A is improved because fewer wires are used in the upper metal set even though the length of the lower metal set is slightly increased.

#### B. IMPROVEMENTS IN FULL-CHIP PPA (POWER, PERFORMANCE, AND AREA)

Table 5 shows our results for the full-chip benchmarks. Our baseline is the full-chip results from the original 3 nm NSFET layout (NS3K) [12]. Note that, in this study, the proposed methodologies are not used alone. We provide the following results for comparison: Short [24], [25] (previous study based on short pins), Long [24], [25] (previous study based on long pins), Area\_Incr (**Scheme C**, area-increased cells based on D1 and D4), More\_Pins (**Scheme B**, increased pin count based on D1, D3, and D5), and All\_Access (**Scheme A**, applying D1, D2, and D5). Based on our results, layouts applying All\_Access improve the power and the area by 11.0% and 13.2% on average, respectively. Additionally, the total number of cells and the wirelength decrease on average by 12.0% and 16.0%, respectively. This indicates that the All\_Access approach is the best of the compared solutions in the 3 nm NSFET layout.
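The average reduction ratios quoted above can be approximately reproduced from the Original NS3K and All\_Access columns of Table 5. The sketch below assumes the average is a plain mean of per-benchmark relative changes, which is not stated explicitly in the text; the benchmark labels, dictionary layout, and helper names are illustrative.

```python
# (Original NS3K, All_Access) pairs copied from Table 5.
TABLE5 = {
    "AES": {"wl": (30.06, 23.81), "cells": (14368, 11670),
            "area": (328.437, 274.064), "power": (14.1, 11.9)},
    "DES": {"wl": (86.14, 75.31), "cells": (53066, 49976),
            "area": (2169.51, 1870.889), "power": (125.8, 122.3)},
    "M256": {"wl": (499.44, 440.2), "cells": (136814, 136047),
             "area": (8328.665, 7408.732), "power": (27.4, 23.6)},
    "JPEG": {"wl": (662.8, 520.67), "cells": (288442, 201947),
             "area": (13974.55, 12091.2), "power": (195, 172)},
    "FFT": {"wl": (1772.46, 1537.72), "cells": (712751, 678473),
            "area": (31346.7, 27751.6), "power": (606, 540)},
}


def reduction_pct(ns3k: float, all_access: float) -> float:
    """Relative change of All_Access against NS3K, in percent."""
    return (all_access - ns3k) / ns3k * 100.0


def average_reduction(metric: str) -> float:
    """Unweighted mean of the per-benchmark reductions (assumed averaging rule)."""
    values = [reduction_pct(*row[metric]) for row in TABLE5.values()]
    return sum(values) / len(values)


for metric in ("wl", "cells", "area", "power"):
    print(metric, round(average_reduction(metric), 2))
```

Under this assumption, each computed mean lands within about 0.1 percentage point of the averages reported in the text; small differences can arise depending on whether rounded or unrounded per-benchmark ratios are averaged.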

Benchmark M256 shows a significant reduction in area (-11%) and power (-14%) relative to its reduction in total cells (-1%) and wires (-8%) in our All\_Access results. Regarding these results, we note that 1) the significant area and power reduction in the M256 benchmark comes from reduced parasitic components due to fewer detoured routes, and 2) better pin accessibility leads to the use of buffer or inverter cells with a weaker drive strength (= less power consumption). This is why the area and power reductions are significant even though the cell count decreases by only 1%.

Although the results of the **Scheme A** (= All\_Access) method are the best, the results of the proposed **Scheme B** (= More\_Pins) are also remarkable (benchmark M256 power, area, etc.). Note that, when there are more I/Os than tracks (= Case 2 in Sec. III-C), we confirm that applying **Scheme B** is a suitable method on the actual layout for better pin accessibility. Therefore, we highlight that the layout methodology of placing one more pin has a very high potential for use in future technology nodes.

#### VII. 4-TRACK CELL DESIGN IN 3 NM AND BEYOND

As devices shrink with technology node scaling, standard cells should also scale. In 3 nm NSFET, SDCs are expected to be designed in 5 tracks [12], [33], [34]. For further track reduction, studies are focusing on new types of transistors such as the forksheet FET (FSFET) and the complementary FET (CFET) [35], [36], [37]. When designing an SDC layout with 4 tracks (or fewer), routing resources are extremely limited [12]. In particular, for complex cells such as MUX and D flip-flop, there is a limit to designing a 4-track SDC layout in 1 height because multiple connections must be made in the horizontal direction. Therefore, designing SDCs in multiple heights (= multi-height) [38], [39], [40] or utilizing the frontend metal resources to relieve the limited routing resources [12] could be a good solution.

<sup>&</sup>lt;sup>3</sup>Note that, since Scheme A shows the best index compared to other schemes in Table 5, we compare and analyze focusing on Scheme A.



FIGURE 10. (a) Experiment setting of the cell to all-library-cells for pin accessibility analysis. (b) Actual P&R simulation result of (a). In the case of NS3K, more metal 3 (= red box) was used because of the blockage by the surrounding cells.



FIGURE 11. Result of the five basic SDC (= INVX1, BUFX1, NAND2 × 1, NOR2 × 1, DFFX1) cell to all-library-cells analysis.

Our proposed LTC modification provides a breakthrough in reducing the number of SDC tracks. Using the method proposed in this study, we can implement 1-height 4-track SDCs without using new types of devices such as FSFETs and CFETs. Fig. 13 shows NOR2 $\times$ 1 with one track removed through the LTC modification method. If we move the input pin A1 located on track #1 to track #2 and use Design #1 mentioned in Sec. III, track #1 in the layout of Fig. 13 (a) becomes unnecessary. Therefore, a 1-height 4-track SDC layout design is possible by fully utilizing the proposed method without using FSFET or CFET. However, as illustrated in Fig. 13 (b), whether a 4-track SDC can be designed in 1 height currently depends only on the intuitive judgment of the designer. Therefore, we propose an algorithm that can determine whether a 4-track SDC in 1 height is possible using LTC modification.

#### A. ALGORITHM FOR 1-HEIGHT 4-TRACK LAYOUT

Algorithm 1 checks whether an SDC can be designed as a 1-height 4-track cell. Before discussing the details of our algorithm, we emphasize that LTC modification is a critical process in designing a 4T cell. In the 3 nm technology presented in [12], the LTC layer covers the active region only. Therefore, additional track reduction is not possible even in the INVX1 gate without LTC modification, and our algorithm assumes that LTC modification is possible. Regarding Algorithm 1, we first separate the pull-up and pull-down networks in an SDC netlist (= input data) and calculate the total number of transistors. Then, we separate the clusters using the netlist of the SDC. A cluster is a concept introduced in [41], where a set of transistors starting from the power rails becomes an input to another set of transistors. For example, NAND2 $\times$ 1 has one cluster because all transistors contribute to one output. However, in AND3 $\times$ 1, the output net of NAND3 $\times$ 1 goes into the input net of INVX1, so AND3 $\times$ 1 has two clusters (= NAND3 $\times$ 1 and INVX1). In this concept, we define an INTERNAL (INT) net as the output net of a cluster, which connects clusters in a multi-cluster SDC. So, in NAND2 $\times$ 1, ZN is the INT net (see Fig. 14 (a)). In AND3 $\times$ 1 of Fig. 14 (c), the cluster net ZN (INT1: output of the NAND3 inside AND3 $\times$ 1) and ZN (output of the INV inside AND3 $\times$ 1) are the INT nets. If the INT count of an SDC is 1, it must satisfy the *ORfunction* or *Deviation* to be designed as a 1-height 4-track cell. Note that a 4T cell consists of only 3 signal routing tracks. In addition, two of these 3 signal tracks are above the active regions of the PFET and NFET. This means that signal routing that has no connection to the device (e.g., an INT net) has only one routing track to use.

#### B. LTC\_MODIFICATION FOR TRACK REDUCTION

TABLE 5. Full-chip results of the proposed methodologies and original 3 nm node (NS3K) at the same clock frequency. WL - wire length, # wires - number of total wire count, # cells - number of cell count, # Pins - number of pin count, # Input/Output - number of I/O pin count, A - chip area, P - power. Decreased Ratio 1, 2, 3 (%) represent the difference of the original NS3K designs to the short [24], [25] scheme, the long [24], [25] scheme, and our best results (= All\_Access), respectively. The total benchmark average reduction ratio represents the average improvement we gain through full-chip design in benchmarks based on our best results (All\_Access, Scheme A). The Scheme A method combining D1-D2-D5 shows the best design improvement compared to the NS3K.

| Benchmark | Metric        | Original<br>NS3K | Short<br>[24], [25] | Decreased<br>Ratio 1 | Long<br>[24], [25] | Decreased<br>Ratio 2 | Area_Incr<br>Scheme C | More_Pins<br>Scheme B | All_Access<br>Scheme A | Decreased<br>Ratio 3 |
|-----------|---------------|------------------|---------------------|----------------------|--------------------|----------------------|-----------------------|-----------------------|------------------------|----------------------|
| AES       | WL (mm)       | 30.06            | 28.87               | -4%                  | 26.45              | -12%                 | 26.35                 | 27.72                 | 23.81                  | -21%                 |
|           | # wires       | 127K             | 122K                | -4%                  | 108K               | -15%                 | 105K                  | 108K                  | 97K                    | -23%                 |
|           | # Cells       | 14368            | 13758               | -4%                  | 12687              | -12%                 | 12011                 | 12847                 | 11670                  | -19%                 |
|           | # Pins        | 73895            | 71513               | -3%                  | 67168              | -10%                 | 70528                 | 67850                 | 63097                  | -15%                 |
|           | # Input       | 58997            | 57225               | -3%                  | 53951              | -9%                  | 55987                 | 54473                 | 50897                  | -14%                 |
|           | # Output      | 14898            | 14288               | -4%                  | 13217              | -11%                 | 14541                 | 13377                 | 12200                  | -18%                 |
|           | A $(\mu m^2)$ | 328.437          | 312.286             | -5%                  | 296.546            | -10%                 | 336.869               | 314.413               | 274.064                | -17%                 |
|           | P (mW)        | 14.1             | 13.4                | -5%                  | 12.2               | -13%                 | 12.4                  | 12.3                  | 11.9                   | -16%                 |
| DES       | WL (mm)       | 86.14            | 79.99               | -7%                  | 79.15              | -8%                  | 79.82                 | 78.7                  | 75.31                  | -13%                 |
|           | # wires       | 377K             | 385K                | +2%                  | 366K               | -3%                  | 339K                  | 332K                  | 330K                   | -12%                 |
|           | # Cells       | 53066            | 51183               | -4%                  | 51267              | -3%                  | 52544                 | 51627                 | 49976                  | -6%                  |
|           | # Pins        | 282099           | 274597              | -3%                  | 274903             | -2%                  | 280011                | 274343                | 250739                 | -11%                 |
|           | # Input       | 220225           | 214576              | -3%                  | 214828             | -2%                  | 218659                | 221908                | 196955                 | -11%                 |
|           | # Output      | 61874            | 59991               | -3%                  | 60075              | -3%                  | 61352                 | 52435                 | 56784                  | -8%                  |
|           | A $(\mu m^2)$ | 2169.51          | 2212.573            | +2%                  | 2103.689           | -3%                  | 2124.234              | 2209.199              | 1870.889               | -14%                 |
|           | P (mW)        | 125.8            | 123.8               | -2%                  | 124.1              | -1%                  | 126.9                 | 126.3                 | 122.3                  | -3%                  |
| M256      | WL (mm)       | 499.44           | 480.33              | -4%                  | 479.59             | -4%                  | 446.79                | 467.61                | 440.2                  | -12%                 |
|           | # wires       | 1378K            | 1418K               | +3%                  | 1341K              | -3%                  | 1348K                 | 1235K                 | 1273K                  | -8%                  |
|           | # Cells       | 136814           | 136913              | +1%                  | 136913             | +1%                  | 136528                | 136299                | 136047                 | -1%                  |
|           | # Pins        | 830846           | 837774              | +1%                  | 831242             | +1%                  | 829702                | 828786                | 829618                 | -1%                  |
|           | # Input       | 670025           | 667721              | +1%                  | 670322             | +1%                  | 669167                | 668480                | 669105                 | -1%                  |
|           | # Output      | 160821           | 170053              | -1%                  | 160920             | +1%                  | 160535                | 160306                | 160513                 | -1%                  |
|           | A $(\mu m^2)$ | 8328.665         | 8479.213            | +1%                  | 8476.148           | +1%                  | 7408.732              | 8993.107              | 7408.732               | -11%                 |
|           | P (mW)        | 27.4             | 24.6                | -10%                 | 24.2               | 12%                  | 23.6                  | 23.6                  | 23.6                   | -14%                 |
| JPEG      | WL (mm)       | 662.8            | 602                 | -10%                 | 563.03             | -15%                 | 531.96                | 579.3                 | 520.67                 | -21%                 |
|           | # wires       | 2262K            | 1771K               | -22%                 | 1672K              | -26%                 | 1642K                 | 1571K                 | 1567K                  | -31%                 |
|           | # Cells       | 288442           | 212092              | -27%                 | 208792             | -28%                 | 212057                | 212088                | 201947                 | -30%                 |
|           | # Pins        | 1550708          | 1245308             | -20%                 | 1248908            | -19%                 | 1245168               | 1245292               | 1244728                | -20%                 |
|           | # Input       | 1201044          | 971994              | -20%                 | 958894             | -20%                 | 971889                | 971982                | 971559                 | -19%                 |
|           | # Output      | 349664           | 273314              | -22%                 | 270014             | -23%                 | 273279                | 273310                | 273169                 | -22%                 |
|           | A $(\mu m^2)$ | 13974.55         | 13041.687           | -7%                  | 12545.6            | -10%                 | 12327.22              | 13526.65              | 12091.2                | -13%                 |
|           | P (mW)        | 195              | 207                 | +6%                  | 187                | -4%                  | 173                   | 192                   | 172                    | -12%                 |
| FFT       | WL (mm)       | 1772.46          | 2139.5              | +20%                 | 2070.8             | +17%                 | 1566.26               | 1612.11               | 1537.72                | -13%                 |
|           | # wires       | 6696K            | 5947K               | -11%                 | 5689K              | -15%                 | 4906K                 | 4882K                 | 4737K                  | -29%                 |
|           | # Cells       | 712751           | 686018              | -4%                  | 677600             | -5%                  | 678396                | 678508                | 678473                 | -5%                  |
|           | # Pins        | 4033131          | 3925851             | -3%                  | 3892527            | -3%                  | 3895711               | 3896159               | 3896019                | -3%                  |
|           | # Input       | 3153786          | 3073239             | -3%                  | 3048333            | -3%                  | 3050721               | 3051057               | 3050952                | -3%                  |
|           | # Output      | 879345           | 852612              | -3%                  | 844194             | -4%                  | 844990                | 845102                | 845067                 | -4%                  |
|           | A $(\mu m^2)$ | 31346.7          | 39586.673           | +26%                 | 29499.26           | -6%                  | 27749.13              | 32179.37              | 27751.6                | -11%                 |
|           | P (mW)        | 606              | 628                 | +3%                  | 610                | -9%                  | 551                   | 603                   | 540                    | -11%                 |
|           |               |                  |                     |                      |                    |                      | Wire length           | # Cells               | Area                   | Power                |
| Total benchmark average reduction ratio (D1-D2-D5) | | |              |                      |                    |                      | -16.0%                | -12.0%                | -13.2%                 | -11.0%               |

The basis of our algorithm is to check how many paths exist from VDD/VSS to the cluster output (INT) for LTC adjustment. We use the Euler path to calculate the paths between the vertices. The Euler path provides an optimal transistor placement for source/drain sharing [42]. In addition, this function reports the possible edges for track reduction using LTC modification. Fig. 14 shows an example of how *LTC\_Modification* works in Algorithm 1. Vertices such as VDD in the pull-up network (PFET.VDD), VSS in the pull-down network (NFET.VSS), and INTERNAL (INT) are the input variables. As shown in Fig. 14 (a), NAND2 × 1 consists of one cluster, INT is ZN, and there is one path from INT to VSS, which matches *ORfunction* (line 21 of Algorithm 1). In INVX2 of Fig. 14 (b), there are two paths from VDD to INT and two paths from VSS to INT. That is, the number of paths in the pull-up and pull-down networks is identical, and this case matches the function *Deviation* (line 26 of Algorithm 1).

As shown in the green box (for edges) of Fig. 14 (a) and (b), the edges inside the green box can be used for 4-track design using LTC modification. As mentioned in Fig. 4 (b) (= D1 Output pin flexibility in Sec. III), our LTC modification extends the LTC that connects the output net. Therefore, SDCs in (a) NAND2  $\times$  1 and (b) INVX2 can use the LTC modification method through INT net and we can design those SDCs in 1-height 4-track.<sup>4</sup>

<sup>4</sup>SDC track reduction using LTC modification must be supported by the latest process and strict design rules [37]





**FIGURE 12.** Total wirelength (illustrated next to the library type) and wirelength in each metal: (a) Benchmark AES (cell count  $\approx$  12K) and (b) Benchmark FFT (cell count  $\approx$  720K).



**FIGURE 13.** (a) Modified 3 nm NSFET NOR2  $\times$  1 layout using Design #1. (b) Reduced track version of NOR2  $\times$  1 layout. If we use LTC modification and move the pin placement, we can reduce one SDC track. Note that, if we want to design like (b), we should establish accurate design rules first [37].

However, if there are two clusters, both the output net of cluster 1 (INT1) and the output net of cluster 2 (INT2) are INT, as shown for AND3X1 in Fig. 14 (c). If we compute the paths for INT1 and INT2 and add them with the function *Summation*, the pull-up network has 4 paths and the pull-down network has 2 paths. As shown in line 25, since the difference in the number of paths between the pull-up and pull-down networks is two or more, (c) AND3X1 cannot be designed as a 1-height 4-track cell. Also, complex cells with a large number of transistors (DFFX1, MUX2X1) require a multi-height cell design, since the number of cluster-to-cluster paths is more than two (DFFX1 path count: 5, MUX2X1: 3).

**Algorithm 1** Checking the 1-Height 4-Track SDC Layout Design

**Data:** Standard cell netlist

**Result:** 1-height 4-track SDCs

```
1  PFET = pull_up_network(SDC netlist);
2  NFET = pull_down_network(SDC netlist);
3  Separate standard cell clusters;
4  i = 0;
5  INTERNAL[i] = Cluster-to-Cluster net;
6  if HowMany(Cluster-to-Cluster Path) < 2 then
7      if (the number of the cluster != 1) then
8          INTERNAL[i+1] = internal output net;
9      end
10     if ((ORfunction(U/D network, INTERNAL) || Deviation(U/D network, INTERNAL)) == 1) then
11         Output = True;
12     end
13 else
14     Output = False;
15 end
16 Function Summation (U/D network, INTERNAL):
18     return Summation
19 Function ORfunction (U/D network, INTERNAL):
20     if ((Summation(PFET.VDD, INTERNAL) || Summation(NFET.VSS, INTERNAL)) == 1) then
21         ORfunction = True;
22     end
23     return ORfunction
24 Function Deviation (U/D network, INTERNAL):
25     if (|Summation(PFET.VDD, INTERNAL) - Summation(NFET.VSS, INTERNAL)|) ≤ 1 then
26         Deviation = True;
27     end
28     return Deviation
29 *TR: transistor, U/D: Pull-up/down
```
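The decision logic of Algorithm 1 can be sketched in a few lines, assuming the path counts have already been extracted from the netlist. This is an illustrative reading, not the authors' code: the class `CellPaths` and its field names are hypothetical, the body of *Summation* is not reproduced and is replaced by precomputed counts, and a cell that fails both *ORfunction* and *Deviation* is assumed to return False, in line with the AND3X1 discussion. The NAND2X1 pull-up count and the cluster-to-cluster counts of NAND2X1, INVX2, and AND3X1 are assumed values; the remaining numbers are the ones quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellPaths:
    """Pre-extracted path counts for one standard cell (hypothetical container)."""
    name: str
    cluster_to_cluster_paths: int  # HowMany(Cluster-to-Cluster Path)
    pull_up_paths: int = 0         # Summation(PFET.VDD, INTERNAL)
    pull_down_paths: int = 0       # Summation(NFET.VSS, INTERNAL)


def or_function(cell):
    """True when either network reaches INTERNAL through exactly one path."""
    return cell.pull_up_paths == 1 or cell.pull_down_paths == 1


def deviation(cell):
    """True when the pull-up and pull-down path counts differ by at most one."""
    return abs(cell.pull_up_paths - cell.pull_down_paths) <= 1


def is_one_height_four_track(cell):
    """Return True if the cell qualifies for a 1-height 4-track layout."""
    if cell.cluster_to_cluster_paths >= 2:
        return False  # too many cluster-to-cluster paths: multi-height needed
    return or_function(cell) or deviation(cell)


# Example cells. Values marked "assumed" are not stated in the text.
CELLS = [
    CellPaths("NAND2X1", 0, pull_up_paths=2, pull_down_paths=1),  # 0 and 2 assumed
    CellPaths("INVX2", 0, pull_up_paths=2, pull_down_paths=2),    # 0 assumed
    CellPaths("AND3X1", 1, pull_up_paths=4, pull_down_paths=2),   # 1 assumed
    CellPaths("DFFX1", 5),   # pull-up/down counts unused once the first test fails
    CellPaths("MUX2X1", 3),
]

if __name__ == "__main__":
    for cell in CELLS:
        style = "1-height" if is_one_height_four_track(cell) else "multi-height"
        print(f"{cell.name}: {style}")
```

Keeping the two tests as separate functions mirrors the structure of the pseudocode, so each condition can be checked against the corresponding line of Algorithm 1.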

Tab. 6 shows the result of applying Algorithm 1 to 49 SDCs [12]. Of these, 17 SDCs can be designed as 1-height 4-track cells, and 32 SDCs must use 2-height to enable a 4-track design. For example, AND3X1 and MUX2X1 should be designed as 2-height cells, as shown in Fig. 15 (a) and (c). In these cases, the 2-height design increases the area by 2% and 1%, respectively, compared to the previous 5-track SDC layouts in Fig. 15 (b) and (d). However, since a 1-height version of these SDCs would leave too few tracks for I/O pin placement and internal routing, they must be designed with 2-height.



**FIGURE 14.** Details of Algorithm 1. (a) NAND2X1, (b) INVX2, (c) AND3X1. When there is only one VDD/VSS-INT path as in (a), or the counts of VDD/VSS-INT paths are the same as in (b), the total number of SDC tracks can be reduced by one. As in (c), if the difference in the number of pull-up/down paths is two or more (pull-up network path: 3, pull-down network path: 1), the corresponding SDC should be designed as multi-height (2-height).



**FIGURE 15.** 5-track standard cell and 2-height 4-track standard cell designs. (a) 2-height 4-track MUX2X1. (b) NS3K MUX2X1. (c) AND3X1 layout of Fig. 14 (c) using 2-height 4-track. (d) NS3K AND3X1.

**TABLE 6.** 1-height and 2-height 4-track design of SDCs.

| Design style | Standard cells | # of cells |
|--------------|----------------|------------|
| 1-height | AND2X1, AND2X2, AOI21X1, AOI22X1, AOI211X1, BUFX1, BUFX2, INVX1, INVX2, INVX4, NAND2X1, NAND3X1, NOR2X1, NOR3X1, OAI21X1, OR2X1, OR2X2 | 17 |
| 2-height | AND2X4, AND3X1, AOI221X1, AOI222X1, BUFX4, BUFX8, BUFX16, BUFX32, DFFX1, DFFX2, FAX1, HAX1, INVX8, INVX16, INVX32, INVX32_2, INVX32_3, MUX2X1, MUX2X2, NAND2X2, NAND2X4, NOR2X2, NOR2X4, OAI22X1, OAI221X1, OAI222X1, OR2X4, OR3X1, XNOR2X1, XNOR2X2, XOR2X1, XOR2X2 | 32 |
| Total |  | 49 |

Thus, 2-height designs (= 2 or more heights) slightly increase the area, but they are the only feasible design style for these cells in 4-track and beyond SDC libraries. Therefore, to design a proper 4-track NSFET standard cell library, we should mix 1-height and multi-height cells based on the horizontal track requirement [43].

#### VIII. CONCLUSION

In this study, we proposed five novel layout design methodologies that can be applied to the 3 nm Nanosheet FET (NSFET) library. These methodologies are enabled by the Local Trench Contact (LTC) modification method. Full-chip experiments that combine the methodologies showed that they significantly enhance chip PPA (power, performance, and area). Thanks to better pin accessibility, our improved 3 nm library reduces the total number of wires by 20.7%, total wirelength by 16.0%, number of cells by 11.3%, area by 15.9%, and power by 11.0%. We also proposed a method for verifying SDC pin accessibility before chip-level design. This fast analysis requires only minimal technology files and covers all possible cases in the actual layout. We highlight that this is the first study reporting a methodology to optimize pin accessibility and shrink routing tracks in layout for the 3 nm technology node and beyond. Our future work includes: 1) studying a general rule for post-3 nm SDC design for optimal pin accessibility, 2) analyzing 3 nm SDC pin optimization for CPU, GPU, and memory designs, and 3) finding new SDC design methodologies for future transistors (e.g., Forksheet FET and CFET).

#### ACKNOWLEDGMENT

The EDA tool was supported by the IC Design Education Center (IDEC), South Korea.

(Jaehoon Jeong and Yunjeong Shin contributed equally to this work.)

#### REFERENCES

- [1] S.-Y. Wu et al., "A 16nm FinFET CMOS technology for mobile SoC and computing applications," in *IEDM Tech. Dig.*, Dec. 2013, pp. 911–914.
- [2] C. C. Wu et al., "High performance 22/20nm FinFET CMOS devices with advanced high-K/metal gate scheme," in *IEDM Tech. Dig.*, Dec. 2010, pp. 2711–2714.
- [3] G. Bae et al., "3nm GAA technology featuring multi-bridge-channel FET for low power and high performance applications," in *IEDM Tech. Dig.*, Dec. 2018, pp. 2871–2874.
- [4] A. Mocuta, P. Weckx, S. Demuynck, D. Radisic, Y. Oniki, and J. Ryckaert, "Enabling CMOS scaling towards 3nm and beyond," in *Proc. IEEE Symp. VLSI Technol.*, Jun. 2018, pp. 147–148.
- [5] M. Drapeau, V. Wiaux, E. Hendrickx, S. Verhaegen, and T. Machida, "Double patterning design split implementation and validation for the 32nm node," in *Design for Manufacturability Through Design-Process Integration*, vol. 6521. Bellingham, WA, USA: SPIE, 2007, p. 652109.
- [6] V. M. B. Carballo et al., "Single exposure EUV patterning of BEOL metal layers on the IMEC iN7 platform," in *Extreme Ultraviolet (EUV) Lithography*, vol. 10143. Bellingham, WA, USA: SPIE, 2017, p. 1014318.
- [7] Y. N. Wu, J. S. Emer, and V. Sze, "Accelergy: An architecture-level energy estimation methodology for accelerator designs," in *Proc. IEEE/ACM Int. Conf. Computer-Aided Design (ICCAD)*, Nov. 2019, pp. 1–8.
- [8] T.-C. Yu, S.-Y. Fang, H.-S. Chiu, K.-S. Hu, P. H. Tai, C. C. Shen, and H. Sheng, "Pin accessibility prediction and optimization with deep-learning-based pin pattern recognition," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 40, no. 11, pp. 2345–2356, Nov. 2021.
- [9] D. Park, D. Lee, I. Kang, S. Gao, B. Lin, and C.-K. Cheng, "SP&R: Simultaneous placement and routing framework for standard cell synthesis in sub-7nm," in *Proc. 25th Asia South Pacific Design Autom. Conf. (ASP-DAC)*, Jan. 2020, pp. 345–350.
- [10] X. Xu, B. Cline, G. Yeric, B. Yu, and D. Z. Pan, "Self-aligned double patterning aware pin access and standard cell layout co-optimization," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 34, no. 5, pp. 699–712, May 2015.

- [11] X. Xu, N. Shah, A. Evans, S. Sinha, B. Cline, and G. Yeric, "Standard cell library design and optimization methodology for ASAP7 PDK: (Invited paper)," in *Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD)*, Nov. 2017, pp. 999–1004.
- [12] T. Kim, J. Jeong, S. Woo, J. Yang, H. Kim, A. Nam, C. Lee, J. Seo, M. Kim, S. Ryu, Y. Oh, and T. Song, "NS3K: A 3-nm nanosheet FET standard cell library development and its impact," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 31, no. 2, pp. 163–176, Feb. 2023.
- [13] J. Jeong, J. Ko, and T. Song, "A study on optimizing pin accessibility of standard cells in the post-3 nm node," in *Proc. ACM/IEEE Int. Symp. Low Power Electron. Design*, Aug. 2022, pp. 1–20.
- [14] T. Taghavi, Z. Li, C. Alpert, G.-J. Nam, A. Huber, and S. Ramji, "New placement prediction and mitigation techniques for local routing congestion," in *Proc. IEEE/ACM Int. Conf. Comput.-Aided Design* (*ICCAD*), Nov. 2010, pp. 621–624.
- [15] C. Han, A. B. Kahng, L. Wang, and B. Xu, "Enhanced optimal multi-row detailed placement for neighbor diffusion effect mitigation in sub-10 nm VLSI," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 38, no. 9, pp. 1703–1716, Sep. 2019.
- [16] A. B. Kahng, J. Kuang, W.-H. Liu, and B. Xu, "In-route pin access-driven placement refinement for improved detailed routing convergence," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 41, no. 3, pp. 784–788, Mar. 2022.
- [17] S. I. Heo, A. B. Kahng, M. Kim, L. Wang, and C. Yang, "Detailed placement for IR drop mitigation by power staple insertion in sub-10nm VLSI," in *Proc. Design, Autom. Test Eur. Conf. Exhibition*, Mar. 2019, pp. 830–835.
- [18] M.-K. Hsu, N. Katta, H. Y. Lin, K. T. Lin, K. H. Tam, and K. C. Wang, "Design and manufacturing process co-optimization in nano-technology (designer track paper)," in *Proc. IEEE/ACM Int. Conf. Computer-Aided Design (ICCAD)*, Nov. 2014, pp. 574–581.
- [19] L. Liebmann, J. Zeng, X. Zhu, L. Yuan, G. Bouche, and J. Kye, "Overcoming scaling barriers through design technology CoOptimization," in *Proc. IEEE Symp. VLSI Technol.*, Jun. 2016, pp. 1–2.
- [20] Y. M. Lee, M. H. Na, A. Chu, A. Young, T. Hook, L. Liebmann, E. J. Nowak, S. H. Baek, R. Sengupta, H. Trombley, and X. Miao, "Accurate performance evaluation for the horizontal nanosheet standard-cell design space beyond 7nm technology," in *IEDM Tech. Dig.*, Dec. 2017, p. 29.
- [21] L. T. Clark, V. Vashishtha, L. Shifren, A. Gujja, S. Sinha, B. Cline, C. Ramamurthy, and G. Yeric, "ASAP7: A 7-nm finFET predictive process design kit," *Microelectron. J.*, vol. 53, pp. 105–115, Jul. 2016.
- [22] S. Chung, J. Jeong, and T. Kim, "Improving performance and power by co-optimizing middle-of-line routing, pin pattern generation, and contact over active gates in standard cell layout synthesis," in *Proc. ACM/IEEE Int. Symp. Low Power Electron. Design*, Aug. 2022, pp. 1–12.
- [23] J. Seo, J. Jung, S. Kim, and Y. Shin, "Pin accessibility-driven cell layout redesign and placement optimization," in *Proc. 54th ACM/EDAC/IEEE Design Autom. Conf. (DAC)*, Jun. 2017, pp. 1–6.
- [24] C.-W. Tai and R.-B. Lin, "Morphed standard cell layouts for pin length reduction," in *Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI)*, Jul. 2019, pp. 94–99.
- [25] S.-R. Fang, C.-W. Tai, and R.-B. Lin, "On benchmarking pin access for nanotechnology standard cells," in *Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI)*, Jul. 2017, pp. 237–242.
- [26] D. Prasad, S. S. Teja Nibhanupudi, S. Das, O. Zografos, B. Chehab, S. Sarkar, R. Baert, A. Robinson, A. Gupta, A. Spessot, P. Debacker, D. Verkest, J. Kulkarni, B. Cline, and S. Sinha, "Buried power rails and back-side power grids: Arm CPU power delivery network design beyond 5nm," in *IEDM Tech. Dig.*, Dec. 2019, pp. 1911–1914.
- [27] N. Loubet et al., "Stacked nanosheet gate-all-around transistor to enable scaling beyond FinFET," in *Proc. Symp. VLSI Technol.*, Jun. 2017, pp. T230–T231.
- [28] K. Vaidyanathan, L. Liebmann, A. Strojwas, and L. Pileggi, "Sub-20 nm design technology co-optimization for standard cell logic," in *Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD)*, Nov. 2014, pp. 124–131.
- [29] S. Kim and T. Kim, "Pin accessibility-driven placement optimization with accurate and comprehensive prediction model," in *Proc. Design, Autom. Test Eur. Conf. Exhibition*, Mar. 2022, pp. 778–783.
- [30] Y. Ding, C. Chu, and W.-K. Mak, "Pin accessibility-driven detailed placement refinement," in *Proc. ACM Int. Symp. Phys. Design*, Mar. 2017, pp. 1–26.

- [31] W.-T.-J. Chan, P.-H. Ho, A. B. Kahng, and P. Saxena, "Routability optimization for industrial designs at sub-14nm process nodes using machine learning," in *Proc. ACM Int. Symp. Phys. Design*, Mar. 2017, pp. 1–17.
- [32] OpenCores. Accessed: Nov. 9, 2002. [Online]. Available: https://openCores.org/
- [33] J.-S. Yoon, J. Jeong, S. Lee, J. Lee, S. Lee, R.-H. Baek, and S. K. Lim, "Performance, power, and area of standard cells in sub 3 nm node using buried power rail," *IEEE Trans. Electron Devices*, vol. 69, no. 3, pp. 894–899, Mar. 2022.
- [34] A. Veloso, T. Huynh-Bao, P. Matagne, D. Jang, G. Eneman, N. Horiguchi, and J. Ryckaert, "Nanowire & nanosheet FETs for ultra-scaled, high-density logic and memory applications," *Solid-State Electron.*, vol. 168, Jun. 2020, Art. no. 107736.
- [35] B. Chehab, P. Weckx, J. Ryckaert, D. Jang, D. Verkest, and A. Spessot, "Standard cell architectures for N2 node: Transition from FinFET to nanosheet and to forksheet device," in *International Society for Optics and Photonics*, vol. 11328. Bellingham, WA, USA: SPIE, 2020, p. 1132807.
- [36] E. Park and T. Song, "An optimized standard cell design methodology targeting low parasitics and small area for complementary FETs (CFETs)," in *Proc. 18th Int. Soc Design Conf. (ISOCC)*, Oct. 2021, pp. 395–396.
- [37] M. K. Gupta, P. Weckx, P. Schuddinck, D. Jang, B. Chehab, S. Cosemans, J. Ryckaert, and W. Dehaene, "A comprehensive study of nanosheet and forksheet SRAM for beyond N5 node," *IEEE Trans. Electron Devices*, vol. 68, no. 8, pp. 3819–3825, Aug. 2021.
- [38] S. A. Dobre, A. B. Kahng, and J. Li, "Design implementation with noninteger multiple-height cells for improved design quality in advanced nodes," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 37, no. 4, pp. 855–868, Apr. 2018.
- [39] U. Brenner, "BonnPlace legalization: Minimizing movement by iterative augmentation," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 32, no. 8, pp. 1215–1227, Aug. 2013.
- [40] N. K. Darav, A. Kennings, D. Westwick, and L. Behjat, "High performance global placement and legalization accounting for fence regions," in *Proc. IEEE/ACM Int. Conf. Computer-Aided Design (ICCAD)*, Nov. 2015, pp. 514–519.
- [41] T. Song, "Opportunities and challenges in designing and utilizing vertical nanowire FET (V-NWFET) standard cells for beyond 5 nm," *IEEE Trans. Nanotechnol.*, vol. 18, pp. 240–251, 2019.
- [42] T. Uehara and W. M. vanCleemput, "Optimal layout of CMOS functional arrays," *IEEE Trans. Comput.*, vol. C-30, no. 5, pp. 305–312, May 1981.
- [43] S. Abazyan, "Standard cell library enhancement for mixed multi-height cell design implementation," in *Proc. IEEE East-West Design Test Symp.* (*EWDTS*), Sep. 2021, pp. 1–4.



**JAEHOON JEONG** (Graduate Student Member, IEEE) received the B.S. and M.S. degrees in electrical engineering from the School of Electronics Engineering, Kyungpook National University (KNU), South Korea, in 2021 and 2023, respectively. He is currently a Circuit Design Engineer with the Foundry Design Service Team, Samsung Electronics. His main research interests include the physical/logical design beyond 7 nm, VLSI/SOC physical implementation, DTCO, and commercial CAD tools.




**YUNJEONG SHIN** (Graduate Student Member, IEEE) received the B.S. degree in electrical engineering from the School of Electronics Engineering, Kyungpook National University (KNU), Daegu, South Korea, in 2023. She is currently pursuing the M.S. degree with the Intelligent Three-Dimensional Very Large Scale Integrated Circuits (I3D VLSI) Laboratory, KNU.

Her research interest includes advanced PDK development in 3/2 nm.



**JONGBEOM KIM** (Graduate Student Member, IEEE) received the B.S. degree in electrical engineering from the School of Electronics Engineering, Kyungpook National University (KNU), Daegu, South Korea, in 2022, where he is currently pursuing the M.S. degree with the Intelligent Three-Dimensional Very Large Scale Integrated Circuits (I3D VLSI) Laboratory.

His research interest includes the multi-valued logic design and highly easy-to-implement design of ternary circuits.



**HYUNDONG LEE** (Graduate Student Member, IEEE) received the B.S. degree in electrical engineering from the School of Electronics Engineering, Kyungpook National University (KNU), Daegu, South Korea, in 2022. He is currently pursuing the M.S. degree with the Intelligent Three-Dimensional Very Large Scale Integrated Circuits (I3D VLSI) Laboratory, KNU. His current research interests include the circuit-level design of the ternary processor and PDK development in 5/3 nm.



**JONGHYUN KO** (Graduate Student Member, IEEE) received the B.S. degree in electrical engineering from the School of Electronics Engineering, Kyungpook National University (KNU), Daegu, South Korea, in 2021, where he is currently pursuing the M.S. degree with the Intelligent Three-Dimensional Very Large Scale Integrated Circuits (I3D VLSI) Laboratory. His research interest includes the multi-valued logic design and fabrication-friendly design of ternary circuits.



**TAIGON SONG** (Member, IEEE) received the B.S. degree in electrical engineering from Yonsei University, Seoul, South Korea, in 2007, the M.S. degree in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2009, and the Ph.D. degree from the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA, in 2015.

He joined the School of Electronics Engineering, Kyungpook National University (KNU), Daegu, South Korea, as an Assistant Professor, in 2019. Prior to joining KNU, he was a Senior Research and Development Engineer with Synopsys Inc. His research interests include modeling, design, and analysis in advanced VLSI technologies, including 3D integrated circuits (3D ICs) and standard cells of advanced transistor technology.