

Received April 25, 2021, accepted May 12, 2021, date of publication May 17, 2021, date of current version May 26, 2021. *Digital Object Identifier* 10.1109/ACCESS.2021.3080710

# A 120-150 GHz Power Amplifier in 28-nm CMOS Achieving 21.9-dB Gain and 11.8-dBm P<sub>sat</sub> for Sub-THz Imaging System

# JINCHENG ZHANG<sup>1</sup>, (Student Member, IEEE), TIANXIANG WU<sup>1</sup>, (Student Member, IEEE), LIHE NIE<sup>1</sup>, SHUNLI MA<sup>1</sup>, (Member, IEEE), YONG CHEN<sup>2,3</sup>, (Senior Member, IEEE), AND JUNYAN REN<sup>1</sup>, (Member, IEEE)

<sup>1</sup>State Key Laboratory of ASIC and System, Fudan University, Shanghai 201203, China
<sup>2</sup>State-Key Laboratory of Analog and Mixed-Signal VLSI, University of Macau, Macau 999078, China
<sup>3</sup>Department of Electrical and Computer Engineering, Faculty of Science and Technology, University of Macau, Macau 999078, China

Corresponding authors: Shunli Ma (shunlima@fudan.edu.cn) and Junyan Ren (junyanren@fudan.edu.cn)

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 61934008, and in part by the National Key Research and Development Program of China under Grant 2018YFB2202500.

**ABSTRACT** This paper presents a high-gain D-band power amplifier (PA) fabricated in 28-nm CMOS technology for a sub-terahertz frequency modulated continuous wave imaging system. It adopts two-channel power combining using artificial transmission lines to absorb the parasitic capacitance of the ground-signal-ground pad. The layout of the transistors and neutralization capacitors is optimized to improve the maximum stable gain, stability, and robustness. Asymmetrical magnetically coupled resonators are used in the inter-stage and input matching networks to extend the operating bandwidth. The PA achieves a peak power gain of 21.9 dB and a maximum output power of 11.8 dBm with a power-added efficiency of 10.7%. The PA also delivers more than 10 dBm of output power over the frequency range of 120-150 GHz.

**INDEX TERMS** D-band, power amplifier (PA), sub-terahertz (sub-THz), CMOS, power combining, imaging system, frequency modulated continuous wave (FMCW).

#### I. INTRODUCTION

Millimeter-wave technology has been widely exploited for medical imaging and security scanning applications thanks to its significant advantages in security over previous techniques. To obtain better image resolution of the observed object, a higher operating frequency is preferred [1]. The D-band (110-170 GHz) is well suited to sub-terahertz (sub-THz) imaging because it is an atmospheric window with a path attenuation of less than 1 dB/km [2]. Consequently, D-band imaging systems around 140 GHz have attracted considerable research interest in recent years.

With the rapid development of solid-state circuits over the past decade, several sub-THz integrated circuits have been reported [3]–[7]. However, most of these works are based on III-V technology because of the poor available gain and efficiency of silicon-based devices in the sub-THz band, which results in higher fabrication cost and lower integration. This situation has been changing in recent years with the scaling of silicon processes. With the maximum oscillation frequency $(f_{\text{max}})$ and cut-off frequency $(f_{\text{T}})$ of advanced CMOS processes reaching the THz band, CMOS is gradually emerging as a viable technology for sub-THz imaging systems. The power amplifier (PA) is one of the most critical sub-blocks of a D-band imaging system. Yet it is challenging for a CMOS PA to achieve sufficient output power and power gain because of the low breakdown voltage and the limited $f_{\text{max}}$. Meanwhile, PAs in imaging systems need a compact chip size and a broad operating bandwidth to achieve sufficient range resolution and system integration. Together, these constraints place comprehensive requirements on the design of the PA in a sub-THz imaging system. Previously reported D-band CMOS PAs [8]-[10] suffer from the poor available gain of the silicon device and therefore have to cascade more stages, which reduces the efficiency significantly.

The associate editor coordinating the review of this manuscript and approving it for publication was Giambattista Gruosso.

This paper reports a three-stage D-band PA targeting the sub-THz frequency modulated continuous wave (FMCW) imaging system. The PA is based on a two-channel power-combining scheme. Optimizing the size and layout topology of the differential pairs, including the neutralization capacitors, raises the maximum stable gain (MSG) and improves the stability of the device. All matching networks are designed with custom low-coupling transformers to minimize the insertion loss and chip area. The PA achieves a saturated output power ($P_{sat}$) of 11.8 dBm at 135 GHz and more than 10 dBm from 120 to 150 GHz. The maximum power-added efficiency (PAE) is 10.7% with 140 mW power consumption.

FIGURE 1. The top architecture of the proposed D-band PA.

This paper is organized as follows. Section II shows the top topology of the PA. Section III introduces the considerations in active device optimization. Section IV details the design procedure of the passive device. Section V presents the measurement setup and results. Finally, conclusions are given in Section VI.

# **II. TOP ARCHITECTURE OF THE PROPOSED PA**

Fig. 1 shows the overall topology of the proposed D-band CMOS PA, which adopts two-channel power combining. Each branch of the PA employs a three-stage common-source pseudo-differential scheme with neutralization capacitance to improve the MSG and stability. The input and output impedances of each single PA are matched to 100 $\Omega$. Artificial transmission lines with 100 $\Omega$ characteristic impedance perform the power splitting and power combining while absorbing the 18 fF parasitic capacitance of the ground-signal-ground (GSG) pad.

The transistor sizes of the three stages are 18-$\mu$m/30-nm ($M_{1a,1b}$), 18-$\mu$m/30-nm ($M_{2a,2b}$), and 45-$\mu$m/30-nm ($M_{3a,3b}$). The size of the output-stage transistor is determined by the desired saturation output power, so a larger transistor is commonly preferred for the output stage. However, because the transformers in the output matching network must resonate out the transistor capacitance, their inductance decreases as the transistor size increases. Fig. 2 shows the inductance and insertion loss of the stacked transformers versus the coil radius for a metal width of 5 $\mu$m at 140 GHz. The insertion loss becomes significant when the radius of the stacked transformer is below 15 $\mu$m, which seriously degrades the output power and efficiency of the PA. Meanwhile, phase imbalance and parasitic effects between different fingers also become pronounced in large transistors and degrade the performance of the active device [12]. In our design, a pair of 45-$\mu$m/30-nm transistors is adopted in the output stage. The equivalent output capacitance of the differential pair is 24.5 fF, and the radius of the corresponding transformers for output matching is about 17 $\mu$m.
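The trade-off between device size and transformer inductance can be illustrated with a first-order estimate. The sketch below assumes the transformer winding simply resonates with the quoted 24.5 fF output capacitance at 140 GHz ($L = 1/(\omega^2 C)$); it ignores the coupled-resonator detuning and layout parasitics treated in Section IV, and the doubled capacitance is a hypothetical larger device, not a design point from this work.

```python
import math

def resonant_inductance(c_farad, f_hz):
    """Inductance that resonates with c_farad at f_hz: L = 1 / (w^2 * C)."""
    w = 2 * math.pi * f_hz
    return 1.0 / (w ** 2 * c_farad)

F0 = 140e9          # frequency at which Fig. 2 is evaluated
C_OUT = 24.5e-15    # equivalent output capacitance quoted in the text

l_out = resonant_inductance(C_OUT, F0)
# Hypothetical device with twice the capacitance (assumption for illustration).
l_double = resonant_inductance(2 * C_OUT, F0)

print(f"L for {C_OUT * 1e15:.1f} fF: {l_out * 1e12:.2f} pH")
print(f"L for {2 * C_OUT * 1e15:.1f} fF: {l_double * 1e12:.2f} pH")
```

Under this assumption the required inductance halves when the capacitance doubles, which pushes the coil radius toward the lossy region of Fig. 2.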

The size of the driver stages is chosen to maximize the PAE without limiting the linearity of the PA. This requires the output 1-dB compression point (OP<sub>1dB</sub>) of the previous stage to be much larger than the input 1-dB compression point (IP<sub>1dB</sub>) of the following stage. A pair of 18-$\mu$m/30-nm transistors is selected for the second stage in this design. The total gate width of the first-stage transistor is also set to 18 $\mu$m to ease the optimization of the input matching network. Although a smaller transistor would improve the overall efficiency of the PA, the equivalent resistance of its input impedance would be much larger than 100 $\Omega$. This implies a very high impedance transformation ratio in the input matching, which would significantly increase the insertion loss and design complexity of the input matching network. The neutralization capacitors are implemented with NMOS transistors instead of MOM capacitors, because they match the NMOS differential pairs better and therefore give more robust neutralization across process, voltage, and temperature (PVT) variations [13]. The sizes of the neutralization capacitors are 9.5-$\mu$m/30-nm ($C_{n1a,n1b}$), 9.5-$\mu$m/30-nm ($C_{n2a,n2b}$), and 25-$\mu$m/30-nm ($C_{n3a,n3b}$) for $M_{1a,1b}$, $M_{2a,2b}$, and $M_{3a,3b}$, respectively.

FIGURE 2. The simulated insertion loss and self-inductance of the stack transformers versus radius of the coils.

# **III. OPTIMIZATION OF THE ACTIVE DEVICE**

# A. TRANSISTOR LAYOUT OPTIMIZATION

As the operating frequency increases, the parasitic effects of the peripheral interconnect in a silicon process have an increasingly significant impact on the performance of the active device. In the sub-THz band, the layout footprint must be carefully designed to preserve the intrinsic performance of the transistor. The $f_{max}$ and $f_t$ are commonly used to evaluate the high-frequency performance of the active device. They can be calculated as follows [14]:

$$f_{max} = \frac{f_t}{2\sqrt{g_{ds} \cdot (R_g + R_s + r_{ch}) + g_m \cdot R_g C_{gd} / C_{gg}}} \tag{1}$$

$$f_t = \frac{g_m}{2\pi \cdot C_{gg}} \tag{2}$$

where $R_g$ is the gate resistance, $R_s$ is the source resistance, $r_{ch}$ is the channel resistance, $g_m$ is the trans-conductance, $C_{gd}$ is the gate-drain capacitance, and $C_{gg}$ is the total gate capacitance. As CMOS technology evolves, $C_{gg}$ decreases with the shrinking poly gate, so $f_t$ continues to improve. At the same time, $f_{max}$ is degraded by the resistance of the interconnecting wires through $R_g$ and $R_s$. In an ultra-scaled CMOS technology, the narrow gate length of the transistor usually increases the resistance of the peripheral metal wires [15]. Thus, the improvement of $f_{max}$ shows diminishing returns as the CMOS process scales. The layout topology plays an important role in the $f_{max}$ of the active device [16], so $f_{max}$ can be used to assess the merit of a transistor layout for high-frequency performance. As can be seen from (1), $R_g$ has the most significant influence on $f_{max}$. It is therefore crucial to minimize the gate contact resistance when optimizing the transistor layout.
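To make the dependence on $R_g$ concrete, the following sketch evaluates (1) and (2) for a sweep of gate resistance. All small-signal values are hypothetical placeholders chosen only to give plausible magnitudes; they are not extracted from the 28-nm process used in this work.

```python
import math

def f_t(gm, cgg):
    """Cut-off frequency, Eq. (2)."""
    return gm / (2 * math.pi * cgg)

def f_max(gm, cgg, cgd, gds, rg, rs, rch):
    """Maximum oscillation frequency, Eq. (1)."""
    loss = gds * (rg + rs + rch) + gm * rg * cgd / cgg
    return f_t(gm, cgg) / (2 * math.sqrt(loss))

# Hypothetical small-signal parameters (assumptions, not process data):
# gm in S, capacitances in F, gds in S, resistances in ohm.
params = dict(gm=20e-3, cgg=12e-15, cgd=4e-15, gds=2e-3, rs=2.0, rch=5.0)

# Sweep the gate resistance while everything else is held fixed.
sweep = {rg: f_max(rg=rg, **params) for rg in (5.0, 10.0, 20.0)}

print(f"f_t = {f_t(params['gm'], params['cgg']) / 1e9:.1f} GHz")
for rg, fm in sweep.items():
    print(f"Rg = {rg:4.1f} ohm -> f_max = {fm / 1e9:6.1f} GHz")
```

Because $R_g$ enters both terms under the square root, $f_{max}$ falls monotonically as $R_g$ grows while $f_t$ is unaffected, which is the behavior the layout comparison that follows relies on.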

Fig. 3 shows four representative styles of transistor layout: single-end gate contact and series feed (SS) [17], single-end gate contact and parallel feed (SP) [18], double-end gate contact and series feed (DS) [19], and double-end gate contact and parallel feed (DP). With the single-end gate contact, the gates of the transistors are connected from a single end of the poly to the bottom metal through contact vias, and then to the top metal through metal vias. The double-end gate contact instead uses the bottom metal to fan out both ends of the poly simultaneously. In the series-feed topology, the overall signal direction is perpendicular to the direction of the gate fingers, whereas the two are parallel in the parallel-feed topology. The MSG of the four configurations is compared for a transistor with a gate length of 30 nm and a gate width of $20 \times 0.5\ \mu$m in 28-nm CMOS technology. This process provides eight metal layers: two thick-metal layers (3.5 $\mu$m for Metal 8 and 0.85 $\mu$m for Metal 7) and six thin-metal layers (0.09 $\mu$m for Metal 1 to Metal 6).

Fig. 4 shows the MSG of the four layout topologies obtained from post-layout simulation. At 140 GHz, the MSG of the SS, SP, DS, and DP styles is 5.27, 5.51, 6.52, and 6.66 dB, respectively. The MSG of the double-end gate contact styles is much larger than that of the single-end gate contact styles, while the influence of the feed mode is relatively small for the given 10-$\mu$m/30-nm transistor. This is because the resistance of the gate poly and the poly contact via (50 $\Omega$/square and 100 $\Omega$/via in this process) is much greater than that of the metal wire and the metal via (0.45 $\Omega$/square and 4.5 $\Omega$/via in this process). Although the single-end gate contact layout is more compact and simpler, the double-end gate contact is necessary for sub-THz circuit design because of its significant MSG advantage.

Fig. 4 also shows that the MSG of the DP configuration is about 0.1-0.15 dB larger than that of the DS configuration over the full band for the given transistor size. This is because the average gate-connection resistance in parallel feed is lower than that in series feed. For a MOSFET with N gate fingers, the average external gate resistance of a single finger remains constant in the parallel-feed type, whereas that of the series-feed type is proportional to N. Fig. 5 shows the MSG versus total gate width for the DP and DS topologies with a fixed single-finger width of 0.5 $\mu$m. For small transistors, the total gate resistance of the DP and DS configurations is roughly equal, so the difference between their MSG is not obvious. However, the influence of the external gate resistance becomes significant as the transistor size increases. The MSG of the DP style is 0.5 dB larger than that of the DS style for a 20 $\mu$m transistor. Although the DP configuration shows a larger MSG, the width of the transistor perpendicular to the gate direction increases with the finger number N. For the 45 $\mu$m transistor used in the output stage of this work, the finger number is up to 90 with a single-finger width of 0.5 $\mu$m. The width of the transistors would be 20 $\mu$m in the DP style, which makes interconnection in the layout difficult. DS transistors are convenient to combine in parallel, and the total width does not become too large because a single DS transistor is narrow. Therefore, the DP style is recommended in the driver stages for the highest gain, and the DS style is recommended in the output stage to simplify the interconnection and reduce the phase imbalance between the gate fingers of the transistors.

FIGURE 3. The evolution of four representative layout styles: (a) single-end gate contact and series feed (SS), (b) single-end gate contact and parallel feed (SP), (c) double-end gate contact and series feed (DS), and (d) double-end gate contact and parallel feed (DP).

FIGURE 4. The MSG comparison of the four layout styles in Fig. 3.

#### **B. NEUTRALIZATION CAPACITANCE DESIGN**

FIGURE 5. The MSG versus the transistor size for DP and DS topologies.

Equation (1) shows that the gate-drain parasitic capacitor $C_{gd}$ significantly reduces the $f_{max}$ of the transistor and degrades the high-frequency gain [20]. This capacitor also leads to poor reverse isolation and causes instability. To mitigate these effects, the neutralization technique is widely used in the design of mm-wave PAs. Theoretically, if the neutralization capacitance ($C_n$) is equal to $C_{gd}$, the feedback path of the transistors is eliminated and the MSG of the differential pair reaches its maximum. In practice, however, the intrinsic model of the transistor is as shown in Fig. 6 [21]. $C_{gd}$ and $R_{gd}$ jointly create the feedback path between the gate and drain, and together they are equivalent to a capacitor with a finite quality factor, which can be calculated as:

$$Q = \frac{1}{\omega C_{gd} \cdot R_{gd}} \tag{3}$$

where $C_{gd}$ and $R_{gd}$ are intrinsic parameters of the transistor and are therefore independent of the operating frequency [21]. Consequently, the Q of this equivalent capacitance decreases as the operating frequency increases. Meanwhile, the quality factor of the neutralization capacitor is also poor in the sub-THz band. The neutralization capacitance therefore cannot completely eliminate the feedback path between the gate and drain, especially in the sub-THz band. Fig. 7 shows the stability factor and MSG of the 45 $\mu$m transistor used in the output stage versus the neutralization capacitance at 120 GHz, 140 GHz, and 160 GHz. It can be seen that as the frequency increases, the poles of the stability factor (Kf) and the MSG keep moving away from each other [22]. At lower frequencies, the range of neutralization capacitance that keeps the transistor stable is smaller because the Q of the equivalent capacitance is higher. The range of capacitance that keeps the stability factor greater than unity over 120-160 GHz is 10-17 fF. The maximum MSG is obtained when $C_n$ is set to 17 fF; however, Kf is slightly less than unity in this condition. Considering the PVT variation of the process, the $C_n$ of the output stage is set to 15 fF in this work, which leaves a margin to keep the differential pair stable within the operating frequency range. Fig. 8 shows the layout of the differential pair together with the neutralization capacitors. The layout of the transistor-based neutralization capacitors also adopts the DS style to reduce the gate contact resistance and to achieve better consistency with the common-source stage.

FIGURE 6. Small-signal model of the transistors.

FIGURE 7. MSG and Kf with respect to the neutralization capacitance.

# C. OPTIMUM LARGE-SIGNAL IMPEDANCE

To find the optimum source and load impedances of the differential pairs, we performed load-pull simulations on the differential pairs together with their neutralization capacitance in differential mode. The center frequency of the simulations is chosen as 145 GHz to account for the degradation of gain and output power as the operating frequency increases. Table 1 shows the optimum source/load impedance and the output power of the three stages at the given input power. The input power of the output stage is set equal to its $IP_{1dB}$. The insertion loss of the inter-stage matching networks is estimated as 3 dB, so in the load-pull simulations the output power of each driver stage must be 3 dB larger than the input power of the next stage.

FIGURE 8. Layout of the differential pair at the output stage.

TABLE 1. Load-pull parameters at 145 GHz.

| Stage | $P_{in}$ (dBm) | $P_{out}$ (dBm) | $Z_{in}$ ($\Omega$) | $Z_{out}$ ($\Omega$) |
|-------|----------------|-----------------|---------------------|----------------------|
| 1st   | -20            | -7.8            | $29.5 + j \cdot 120$  | $45 + j \cdot 111$     |
| 2nd   | -10.8          | 1.1             | $31.2 + j \cdot 126$  | $44.7 + j \cdot 109$   |
| 3rd   | -1.9           | 8.5             | $9.4 + j \cdot 42.5$  | $18.4 + j \cdot 38$    |
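The 3 dB inter-stage budget can be checked directly against Table 1. The short sketch below copies the tabulated powers and confirms that each stage output, reduced by the estimated 3 dB loss, equals the input power assumed for the next stage; the helper name `budget_errors` is only an illustrative choice.

```python
# (P_in, P_out) in dBm for the three stages, copied from Table 1.
STAGES = {"1st": (-20.0, -7.8), "2nd": (-10.8, 1.1), "3rd": (-1.9, 8.5)}
INTERSTAGE_LOSS_DB = 3.0   # estimate quoted in the text

def budget_errors(stages, loss_db):
    """Per stage boundary: P_out(prev) - loss - P_in(next), in dB."""
    names = list(stages)
    errors = {}
    for prev, nxt in zip(names, names[1:]):
        delivered = stages[prev][1] - loss_db
        errors[f"{prev}->{nxt}"] = delivered - stages[nxt][0]
    return errors

errors = budget_errors(STAGES, INTERSTAGE_LOSS_DB)
# Large-signal power gain of each differential pair at its simulated drive level.
stage_gains = {name: p_out - p_in for name, (p_in, p_out) in STAGES.items()}

for boundary, err in errors.items():
    print(f"{boundary}: mismatch {err:+.2f} dB")
for name, gain in stage_gains.items():
    print(f"{name} stage gain: {gain:.1f} dB")
```

A zero mismatch at both boundaries indicates that the tabulated drive levels are mutually consistent with the 3 dB loss assumption.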

#### **IV. DESIGN CONSIDERATIONS OF PASSIVE DEVICE**

# A. ASYMMETRICAL MCR MATCHING NETWORKS

Transformers are widely used in mm-wave circuits because they can realize impedance matching, DC feed, and single-ended-to-differential conversion within an ultra-small chip area [9]. Moreover, a high-order magnetically coupled-resonator (MCR) matching network can be realized with a single transformer to implement broadband impedance matching. Compared with other wideband techniques, the transformer-based MCR matching network is more compact and simpler in layout, which implies less insertion loss in mm-wave circuit design. The symmetrical MCR provides the maximum trans-impedance, and its design considerations have been analyzed in [23]. However, the quality factors of the two terminal impedances are commonly different in inter-stage matching. Building a symmetrical MCR would require adding a resistor or capacitor to the inter-stage matching network, which introduces extra insertion loss, and the values of on-chip capacitors and resistors are imprecise. In this design, asymmetrical MCR-based matching networks are adopted and analyzed for the inter-stage and input matching networks, and they show a flatter in-band frequency response than the symmetrical MCR.

FIGURE 9. The lumped model of the MCR.

Fig. 9 shows the ideal model of an MCR [23], consisting of two *RLC* resonators coupled by mutual inductance. In inter-stage matching, the output impedance of the previous stage and the input impedance of the following stage can each be represented as a resistor in parallel with a capacitor, which can be absorbed into the resistance ($R_1$ and $R_2$) and capacitance ($C_1$ and $C_2$) of the MCR. A transformer with primary self-inductance $L_1$, secondary self-inductance $L_2$, and coupling coefficient k is inserted to realize the MCR. Under the condition $2Q^2 \gg 1 - k^2$, the two pole frequencies of the MCR are given by:

$$\omega_{H}^{2} = \frac{\omega_{1}^{2} + \omega_{2}^{2} + \sqrt{(\omega_{1}^{2} + \omega_{2}^{2})^{2} - 4(1 - k^{2})\omega_{1}^{2}\omega_{2}^{2}}}{2(1 - k^{2})} \quad (4)$$

$$\omega_L^2 = \frac{\omega_1^2 + \omega_2^2 - \sqrt{(\omega_1^2 + \omega_2^2)^2 - 4(1 - k^2)\omega_1^2\omega_2^2}}{2(1 - k^2)} \quad (5)$$

where $\omega_1$ and $\omega_2$ are the resonant frequencies of the two *RLC* resonators, which can be calculated as (6),

$$\begin{cases} \omega_1 = \frac{1}{\sqrt{L_1 C_1}} \\ \omega_2 = \frac{1}{\sqrt{L_2 C_2}} \end{cases} \tag{6}$$

The trans-impedance $Z_{21}$ at the two pole frequencies is given in (7), listed after (11), where $Q_1$ and $Q_2$ are the quality factors of the two resonators. To achieve a flat frequency response, the peak magnitudes of $Z_{21}$ at the two pole frequencies should be equal, for which two solutions can be calculated as:

$$\omega_1 = \frac{1}{\sqrt{L_1 C_1}} = \frac{1}{\sqrt{L_2 C_2}} = \omega_2 \tag{8}$$

$$\frac{Q_1}{\omega_1} = R_1 C_1 = R_2 C_2 = \frac{Q_2}{\omega_2} \tag{9}$$

Fig. 10(a) shows the magnitude of $Z_{21}$ versus frequency for different $Q_2$ at an equal $\omega/Q$ ratio, with the parameters of the primary resonator held constant. The peak magnitude of $Z_{21}$ degrades significantly as the mismatch between $Q_1$ and $Q_2$ increases. The calculated $|Z_{21}|$ for unequal Q with $\omega_1$ and $\omega_2$ kept equal is shown in Fig. 10(b). The peak magnitude of $|Z_{21}|$ is almost equal to that of the symmetrical MCR only if (8) is satisfied. Therefore, $\omega_1$ and $\omega_2$ must be kept equal in MCR design; a mismatch between them leads to significant insertion loss, which is unacceptable in sub-THz circuit design. Fig. 10(b) also shows that a lower $Q_2$ always exhibits a smaller in-band ripple. Therefore, it is not necessary to insert a capacitor to build a symmetrical MCR in inter-stage matching: a mismatch between $Q_1$ and $Q_2$ is acceptable as long as $\omega_1 = \omega_2$.

**FIGURE 10.** The simulated Z<sub>21</sub> with different Q<sub>2</sub> (constant Q<sub>1</sub> of 3.6 and $f_1$ of 140 GHz). (a) keeping $\omega_2/\omega_1 = Q_2/Q_1$. (b) keeping $\omega_2 = \omega_1$.

The pole frequencies and the corresponding peak amplitude in the condition of  $\omega_1 = \omega_2$  can be calculated as:

$$\begin{cases} \omega_H = \frac{\omega_0}{\sqrt{1-k}} \\ \omega_L = \frac{\omega_0}{\sqrt{1+k}} \end{cases} \tag{10}$$

$$Z_{21}(j\omega_H) = Z_{21}(j\omega_L) = \frac{\sqrt{R_1R_2}}{\sqrt{Q_1/Q_2} + \sqrt{Q_2/Q_1}} \tag{11}$$

$$\begin{cases} Z_{21} (j\omega_H) = \frac{k\omega_1\omega_2\sqrt{R_1R_2Q_1Q_2\omega_1\omega_2}}{-(\omega_2Q_1 + \omega_1Q_2)\left(1 - k^2\right)\omega_H^2 + \omega_1\omega_2\left(\omega_1Q_1 + \omega_2Q_2\right)} \\ Z_{21} (j\omega_L) = \frac{k\omega_1\omega_2\sqrt{R_1R_2Q_1Q_2\omega_1\omega_2}}{-(\omega_2Q_1 + \omega_1Q_2)\left(1 - k^2\right)\omega_L^2 + \omega_1\omega_2\left(\omega_1Q_1 + \omega_2Q_2\right)} \end{cases} \tag{7}$$

From (11), we can see that the maximum peak amplitude is obtained only when $Q_1$ is equal to $Q_2$, that is, when the MCR is symmetrical, which is consistent with Fig. 10(b). The asymmetrical MCR with a lower $Q_2$ obtains a flatter frequency response at the cost of a slightly lower peak amplitude at the pole frequencies, as shown in Fig. 10(b). Because the quality factor of the output matching network is commonly smaller than that of the inter-stage and input matching networks, the output matching network adopts the symmetrical MCR, while the asymmetrical MCR is adopted in the inter-stage and input matching networks to obtain a larger operating bandwidth. The in-band valley amplitude and its corresponding frequency $\omega_c$ are given as follows,

$$Z_{21}(\omega_c) = \frac{k\sqrt{(1-k^2)Q_1Q_2R_1R_2}}{1-k^2+k^2Q_1Q_2} \tag{12}$$

$$\omega_c = \frac{\omega_0}{\sqrt[4]{1-k^2}} \tag{13}$$

where $\omega_c$ is the geometrical mean of the two pole frequencies, so it can be taken as the center frequency of the MCR. Setting $|Z_{21}|$ at the pole frequencies equal to $|Z_{21}|$ at the center frequency yields the no-ripple condition (14),

$$k^{2} \cdot \left(Q_{\rm m}^{2} + 1\right) = 1 \tag{14}$$

where  $Q_{\rm m}$  is the smaller value of  $Q_1$  and  $Q_2$ .
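The relations above can be cross-checked numerically. The sketch below, using hypothetical resonator values that are not taken from this design, confirms that (4) and (5) reduce to (10) when $\omega_1 = \omega_2$, that the geometric mean of the two poles in (10) equals $\omega_0/\sqrt[4]{1-k^2}$, and that choosing k from (14) makes the peak amplitude (11) equal to the valley amplitude (12).

```python
import math

def mcr_poles(w1, w2, k):
    """Pole frequencies (w_L, w_H) from Eqs. (5) and (4)."""
    s = w1 ** 2 + w2 ** 2
    root = math.sqrt(s ** 2 - 4 * (1 - k ** 2) * w1 ** 2 * w2 ** 2)
    den = 2 * (1 - k ** 2)
    return math.sqrt((s - root) / den), math.sqrt((s + root) / den)

def z21_peak(q1, q2, r1, r2):
    """|Z21| at the pole frequencies for w1 = w2, Eq. (11)."""
    return math.sqrt(r1 * r2) / (math.sqrt(q1 / q2) + math.sqrt(q2 / q1))

def z21_valley(q1, q2, r1, r2, k):
    """|Z21| at the in-band valley, Eq. (12)."""
    num = k * math.sqrt((1 - k ** 2) * q1 * q2 * r1 * r2)
    return num / (1 - k ** 2 + k ** 2 * q1 * q2)

# Hypothetical asymmetric resonators (assumed values for illustration only).
w0 = 2 * math.pi * 134e9
q1, q2, r1, r2 = 2.5, 4.0, 300.0, 200.0
k = 1 / math.sqrt(1 + min(q1, q2) ** 2)   # no-ripple condition, Eq. (14)

w_low, w_high = mcr_poles(w0, w0, k)
w_center = math.sqrt(w_low * w_high)       # geometric mean of the two poles
peak = z21_peak(q1, q2, r1, r2)
valley = z21_valley(q1, q2, r1, r2, k)

print(f"k = {k:.4f}")
print(f"f_L = {w_low / 2 / math.pi / 1e9:.2f} GHz, "
      f"f_H = {w_high / 2 / math.pi / 1e9:.2f} GHz")
print(f"peak |Z21| = {peak:.2f} ohm, valley |Z21| = {valley:.2f} ohm")
```

With k set by (14), the peak and valley values coincide, which is the flat-response condition the design example that follows relies on.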

Take the second inter-stage matching network as an example. The output impedance of the driving stage and the input impedance of the last stage are $44.7 + j \cdot 109\ \Omega$ and $9.4 + j \cdot 42.5\ \Omega$, respectively. Accordingly, $R_1$, $C_1$, $R_2$, and $C_2$ are calculated as 313 $\Omega$, 8.9 fF, 201 $\Omega$, and 25.5 fF. To achieve the maximum bandwidth, no extra capacitance is inserted into the matching network. The center frequency $f_c$ is 140 GHz, and the quality factors $Q_1$ and $Q_2$ can be deduced as,

$$\begin{cases} Q_1 = \sqrt{\frac{-1 + \sqrt{1 + 4 (\omega_c R_1 C_1)^4}}{2}} \\ Q_2 = \sqrt{\frac{-1 + \sqrt{1 + 4 (\omega_c R_2 C_2)^4}}{2}} \end{cases} \tag{15}$$

In this design, $Q_1$ and $Q_2$ are calculated as 2.35 and 4.45. The coupling coefficient then follows from (14) with $Q_m = Q_1$ as 0.39. The resonant frequencies of the two resonators are,

$$f_1 = f_2 = f_c \sqrt[4]{1 - k^2} \tag{16}$$

which is 134 GHz in this case, and the inductance  $L_1$  and  $L_2$  can be calculated as 157.8 pH and 55.1 pH.
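The numbers of this design example can be reproduced step by step. The sketch below starts from the quoted $R_1$, $C_1$, $R_2$, and $C_2$ and applies (15), (14), (16), and (6) in turn; the function name `loaded_q` is an illustrative choice, and the results are compared with the values quoted in the text only to within rounding.

```python
import math

def loaded_q(w_c, r, c):
    """Quality factor of a parallel RC termination, Eq. (15)."""
    x = w_c * r * c
    return math.sqrt((-1 + math.sqrt(1 + 4 * x ** 4)) / 2)

F_C = 140e9                  # center frequency
R1, C1 = 313.0, 8.9e-15      # driver-stage output (parallel equivalent)
R2, C2 = 201.0, 25.5e-15     # output-stage input (parallel equivalent)

w_c = 2 * math.pi * F_C
q1 = loaded_q(w_c, R1, C1)
q2 = loaded_q(w_c, R2, C2)

k = 1 / math.sqrt(1 + min(q1, q2) ** 2)   # no-ripple condition, Eq. (14)
f1 = F_C * (1 - k ** 2) ** 0.25           # resonator frequency, Eq. (16)

w1 = 2 * math.pi * f1
l1 = 1 / (w1 ** 2 * C1)                   # from Eq. (6)
l2 = 1 / (w1 ** 2 * C2)

print(f"Q1 = {q1:.2f}, Q2 = {q2:.2f}, k = {k:.2f}")
print(f"f1 = f2 = {f1 / 1e9:.1f} GHz")
print(f"L1 = {l1 * 1e12:.1f} pH, L2 = {l2 * 1e12:.1f} pH")
```

The inductances are computed from the unrounded resonant frequency, which is why they agree with the quoted 157.8 pH and 55.1 pH more closely than a calculation at exactly 134 GHz would.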

# **B. LUMPED MODEL OF THE TRANSFORMER**

The above analysis does not consider the parasitic parameters of the transformers. However, parasitic effects are particularly significant in the sub-THz band, so they must be taken into account to design the transformers more precisely.





FIGURE 11. The lumped-element model of the (a) transformer and (b) the inter-stage matching network with simplified transformer model.

Fig. 11(a) shows a lumped model of the on-chip transformer that is applicable up to the sub-THz band [24]. $C_{pox,n}$ and $C_{sox,n}$ (n = 1 or 2) represent the oxide-layer capacitance of the primary/secondary coil of the transformer. $C_{psi,n}$, $C_{ssi,n}$, $R_{psi,n}$, and $R_{ssi,n}$ represent the parasitic capacitance and resistance to the silicon substrate, respectively. $L_P$ and $L_S$ are the inductances of the primary/secondary coils, and their resistances are $R_P$ and $R_S$, respectively. The coupling capacitance between the two coils ($C_c$) and that of the differential input/output transmission line ($C_t$) are also shown in Fig. 11(a). The values of these parasitic parameters can be extracted from the S-parameters of the transformer [25]. However, this model is complex and incompatible with the model of the MCR, so the parasitic parameters need to be simplified. In this work, low-k transformers are used instead of stack transformers, so $C_c$ is negligible. The other parasitic capacitances can be combined into a single parallel capacitor in differential mode, and the effect of the parasitic resistance is likewise equivalent to a parallel resistance. Fig. 11(b) shows the simplified lumped model of the inter-stage matching network. The parameters of the MCR consist of the parasitic parameters of the transformers and the equivalent parallel networks of the terminal impedances.

Fig. 12 compares the insertion loss of the two lumped-parameter models with that of the EM model when each is inserted into the inter-stage matching network discussed above. The insertion loss of the simplified model is about 0.7 dB less than that of the EM model, while the frequency range is consistent. The simulated insertion loss of this inter-stage matching network is below 3 dB from 120 to 150 GHz. All of the transformers used in this work are shown in Fig. 1, where their corresponding parasitic parameters are also summarized.

**FIGURE 12.** The comparison of insertion loss between the lumped-element model and the EM model.

### C. POWER SPLITTING AND COMBINING

Fig. 13(a) shows the overall three-dimensional model of the input power splitting network and the output power combining network. The input and output impedances of the unit PA are matched to 100 $\Omega$ using transformer-based matching networks. Each branch is then connected to the GSG pad through a transmission line with a characteristic impedance of 100 $\Omega$ to realize power splitting/combining, which presents an impedance of 50 $\Omega$ at the junction node of the power splitter/combiner. However, the parasitic capacitance of the GSG pad introduces an impedance mismatch, which is especially critical in the sub-THz band. In this work, an artificial transmission line consisting of a shorter line and two shunt capacitances is used to absorb the parasitic capacitance of the GSG pad, as shown in Fig. 13(b).

**FIGURE 13.** (a) The layout of the power splitting and power combining network and (b) transmission line and its equivalent artificial transmission line.

The admittance matrices of the two circuits in Fig. 13(b) are:

$$Y_{0} = \begin{bmatrix} \frac{\cos(\theta_{0})}{jZ_{0} \cdot \sin(\theta_{0})} & \frac{j}{Z_{0} \cdot \sin(\theta_{0})} \\ \frac{j}{Z_{0} \cdot \sin(\theta_{0})} & \frac{\cos(\theta_{0})}{jZ_{0} \cdot \sin(\theta_{0})} \end{bmatrix} \tag{17}$$

$$Y_e = \begin{bmatrix} \frac{\cos(\theta)}{jZ \cdot \sin(\theta)} + j\omega C & \frac{j}{Z \cdot \sin(\theta)} \\ \frac{j}{Z \cdot \sin(\theta)} & \frac{\cos(\theta)}{jZ \cdot \sin(\theta)} + j\omega C \end{bmatrix} \tag{18}$$

where $Z_0$ and $\theta_0$ are the characteristic impedance and electrical length of the equivalent line, and $Z$ and $\theta$ are those of the shortened line. Comparing (17) and (18), we obtain the relation:

$$\begin{cases} Z_0 \cdot \sin(\theta_0) = Z \cdot \sin(\theta) \\ \cos(\theta_0) = \cos(\theta) - Z \cdot \sin(\theta) \cdot \omega C \end{cases} \tag{19}$$

Given $Z_0 = 100\ \Omega$ and C = 9 fF for a single branch in this work, the Z and $\theta$ of the shortened line are calculated as 111 $\Omega$ and 60°. Therefore, a transmission line with 111 $\Omega$ characteristic impedance and 60° electrical length, shunted by two 9 fF capacitances, shows the same admittance matrix as a transmission line with 100 $\Omega$ characteristic impedance and 105° electrical length at 140 GHz. The two 9-fF capacitances at the GSG-pad end can be absorbed into the pad parasitic capacitance. The metal-line widths for M7 and M8 that give a characteristic impedance of 111 $\Omega$ are 12 $\mu$m and 10.5 $\mu$m in the process used in this work. Fig. 14 shows the simulated S-parameters of the power combiner. It shows good matching, with reflection coefficients at the GSG-pad end better than -10 dB and an insertion loss of less than 0.5 dB from 110 GHz to 160 GHz.
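The shortened-line parameters can be obtained by solving (19) in closed form. The sketch below takes the 100 $\Omega$, 105° reference line and the 9 fF shunt capacitance quoted above, solves for Z and $\theta$ at 140 GHz, and then checks that the admittance entries of (17) and (18) agree; the function names are illustrative, and lossless lines are assumed.

```python
import math

def shortened_line(z0, theta0_deg, c, f):
    """Solve Eq. (19) for (Z, theta in degrees) of the capacitively loaded line."""
    w = 2 * math.pi * f
    theta0 = math.radians(theta0_deg)
    z_sin = z0 * math.sin(theta0)                 # equals Z*sin(theta), first line of (19)
    cos_theta = math.cos(theta0) + z_sin * w * c  # second line of (19), rearranged
    theta = math.acos(cos_theta)
    return z_sin / math.sin(theta), math.degrees(theta)

def line_y(z, theta_deg, f=0.0, c=0.0):
    """(Y11, Y12) of a lossless line with shunt C at both ports, Eqs. (17)-(18)."""
    theta = math.radians(theta_deg)
    y11 = math.cos(theta) / (1j * z * math.sin(theta)) + 1j * 2 * math.pi * f * c
    y12 = 1j / (z * math.sin(theta))
    return y11, y12

F0 = 140e9        # design frequency
C_SHUNT = 9e-15   # shunt capacitance per branch

z, theta = shortened_line(100.0, 105.0, C_SHUNT, F0)
y_ref = line_y(100.0, 105.0)               # plain 100-ohm, 105-degree line
y_art = line_y(z, theta, F0, C_SHUNT)      # artificial line with shunt capacitors

print(f"Z = {z:.1f} ohm, theta = {theta:.1f} deg")
print(f"|dY11| = {abs(y_ref[0] - y_art[0]):.2e} S, "
      f"|dY12| = {abs(y_ref[1] - y_art[1]):.2e} S")
```

The solution lands within about 1 $\Omega$ and 1° of the quoted 111 $\Omega$ and 60°, and the two admittance matrices match at the design frequency; away from 140 GHz the equivalence is only approximate because the shunt susceptance scales with frequency.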



FIGURE 14. The simulated S-parameters of the power combiner.

# **V. MEASUREMENT RESULTS**

The proposed D-band PA is fabricated in a 28-nm bulk CMOS technology. The chip photo is shown in Fig. 15(a). The core area of the PA including the GSG pad is 600  $\mu$ m × 400  $\mu$ m, while the total area is 600  $\mu$ m × 590  $\mu$ m. The chip is mounted on a PCB and the DC pads are bonded to the PCB to provide 0.7-V bias voltage and 1-V supply voltage. The total power consumption is 140 mW.

Fig. 15(b) shows the block diagram of the measurement setup of the PA. The D-band input signal was generated using

#### TABLE 2. Performance summary and comparison of broadband D-Band PAs.

| Ref.         | Technology      | Freq.<br>(GHz) | Gain<br>(dB) | Stages | Gain BW <sup>**</sup><br>(GHz) | P <sub>sat</sub> BW <sup>**</sup><br>(GHz) | Core Area<br>(mm <sup>2</sup> ) | P <sub>sat</sub><br>(dBm) | PAE<br>(%) | FoM*** |
|--------------|-----------------|----------------|--------------|--------|--------------------------------|--------------------------------------------|---------------------------------|---------------------------|------------|--------|
| [8]          | 40-nm<br>CMOS   | 133            | 16.0         | 6      | 13                             | /                                          | 0.13*                           | 8.6                       | 7.4        | 75.8   |
| [9]          | 28-nm<br>CMOS   | 132            | 22.5         | 4      | 22                             | 45*                                        | 0.0265+                         | 8.0                       | 6.6        | 81.1   |
| [10]         | 65-nm<br>CMOS   | 150            | 8.2          | 3      | 27                             | /                                          | 0.16                            | 6.3                       | 10         | 68     |
| [11]         | 90-nm SiGe      | 116            | 15           | 3      | 15                             | 24*                                        | 4.95                            | 20.8                      | 7.6        | 85.9   |
| [26]         | 16-nm<br>FinFET | 135            | 19.0         | 3      | 16                             | 35*                                        | 0.062                           | 13.1                      | 11.0       | 85.1   |
| This<br>work | 28-nm<br>CMOS   | 135            | 21.9         | 3      | 20                             | 41                                         | 0.24                            | 11.8                      | 10.7       | 86.6   |

<sup>#</sup> Output power where PAE is at maximum value

<sup>+</sup> GSG pad are excluded

\* Graphically estimated

\*\* Gain BW is 3-dB bandwidth of power gain, Psat BW is 3-dB bandwidth of saturation output power.

\*\*\* FoM =  $P_{sat}$  [dBm] + Gain [dB] + 20log(Frequency [GHz]) + 10log(PAE<sub>max</sub> [%]).
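The FoM defined in the last footnote can be re-evaluated from the tabulated columns. The sketch below is a minimal Python check, assuming base-10 logarithms and using the Freq., Gain, P<sub>sat</sub>, and PAE entries of Table 2 as inputs; the function name `fom` and the dictionary layout are illustrative.

```python
import math


def fom(psat_dbm, gain_db, freq_ghz, pae_percent):
    """FoM = Psat [dBm] + Gain [dB] + 20log10(f [GHz]) + 10log10(PAE [%])."""
    return (psat_dbm + gain_db
            + 20.0 * math.log10(freq_ghz)
            + 10.0 * math.log10(pae_percent))


# (Psat [dBm], Gain [dB], Freq. [GHz], PAE [%]) copied from Table 2.
table2 = {
    "[8]": (8.6, 16.0, 133, 7.4),
    "[9]": (8.0, 22.5, 132, 6.6),
    "[10]": (6.3, 8.2, 150, 10.0),
    "[11]": (20.8, 15.0, 116, 7.6),
    "[26]": (13.1, 19.0, 135, 11.0),
    "This work": (11.8, 21.9, 135, 10.7),
}

for ref, row in table2.items():
    print(f"{ref:>9}: FoM = {fom(*row):.1f}")
```

The recomputed values should agree with the FoM column to within rounding.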






**FIGURE 15.** (a) Chip photo and (b) block diagram of the measurement setup of the PA.

the vector signal source (Agilent E8267D) followed by the VDI frequency extension module WR-6.5 SG. The output power of the PA was characterized with a signal analyzer (Keysight N9040B) together with the VDI frequency extension module WR-6.5 SA. The loss of the probes and interconnects was measured and calibrated out under the same setup, using the connection on a calibration substrate.

The simulated peak gain of the PA is 26.9 dB, with a 3-dB bandwidth of 20 GHz from 125 to 145 GHz, as shown



FIGURE 16. Simulated S-parameters.



FIGURE 17. Simulated and measured  $\mathrm{P}_{out},$  PAE and power gain versus  $\mathrm{P}_{in}$  at 135 GHz.

in Fig. 16. Fig. 17 and Fig. 18 show the simulated and measured power performance of the PA. The PA achieves a saturated output power above 10 dBm and a PAE above 8% from 120 GHz to 150 GHz. Meanwhile, a power gain above 19 dB is obtained from 128 to 147 GHz, with a peak gain of 21.9 dB at 135 GHz. The measured power performance at the center frequency of 135 GHz versus



FIGURE 18. Simulated and measured large-signal performance versus frequency of the proposed PA.

input power is shown in Fig. 17. The PA achieves a maximum output power of 11.8 dBm and an output 1-dB compression point of 7.5 dBm with a peak PAE of 10.7%.

Table 2 summarizes the overall performance of our PA and compares it with the state of the art. The PA shows the highest per-stage gain and a competitive FoM compared with previously published works, thanks to the well-designed differential pairs, the low-k transformer-based broadband matching networks, and the artificial-line-based power splitter/combiner.
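The per-stage gain comparison follows directly from the Gain and Stages columns of Table 2. A short Python sketch of that arithmetic is given here, assuming per-stage gain is simply the peak gain in dB divided by the number of stages; the helper name `per_stage_gain` is illustrative.

```python
def per_stage_gain(gain_db, stages):
    """Average gain per amplifier stage, in dB."""
    return gain_db / stages


# (Gain [dB], Stages) copied from Table 2.
gain_and_stages = {
    "[8]": (16.0, 6),
    "[9]": (22.5, 4),
    "[10]": (8.2, 3),
    "[11]": (15.0, 3),
    "[26]": (19.0, 3),
    "This work": (21.9, 3),
}

ranking = sorted(
    ((per_stage_gain(g, n), ref) for ref, (g, n) in gain_and_stages.items()),
    reverse=True,
)
for value, ref in ranking:
    print(f"{ref:>9}: {value:.2f} dB/stage")
```

With these inputs, this work comes out at 7.3 dB per stage, ahead of the 16-nm FinFET design in [26] at roughly 6.3 dB per stage.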

#### **VI. CONCLUSION**

This article presented the optimization procedures and design considerations of a D-band PA in detail. The transistor size of each stage is selected considering the performance of the active device and the insertion loss of the passive device. The layout of the differential pairs, involving the transistors and neutralization capacitors, is carefully optimized to maximize the MSG, stability, and robustness. The matching networks are all designed based on the MCR realized by low-k transformers to improve the operating bandwidth. A 3-stage D-band PA was fabricated in a 28-nm bulk CMOS technology to verify our design methodology. The PA achieves a peak gain of 21.9 dB, a saturated output power of 11.8 dBm, a peak PAE of 10.7%, and an FoM of 86.6. This PA prototype shows the highest per-stage gain thanks to the well-designed differential pairs and MCR-based matching networks.

#### REFERENCES

- [1] D. G. Long and D. L. Daum, "Spatial resolution enhancement of SSM/I data," *IEEE Trans. Geosci. Remote Sens.*, vol. 36, no. 2, pp. 407–417, Mar. 1998.
- [2] T. Schneider, A. Wiatrek, S. Preussler, M. Grigat, and R.-P. Braun, "Link budget analysis for terahertz fixed wireless links," *IEEE Trans. Terahertz Sci. Technol.*, vol. 2, no. 2, pp. 250–256, Mar. 2012.
- [3] V. Radisic, D. W. Scott, A. Cavus, and C. Monier, "220-GHz high-efficiency InP HBT power amplifiers," *IEEE Trans. Microw. Theory Techn.*, vol. 62, no. 12, pp. 2801–2805, Dec. 2014.
- [4] S. Koch, M. Guthoerl, I. Kallfass, A. Leuther, and S. Saito, "A 120–145 GHz heterodyne receiver chipset utilizing the 140 GHz atmospheric window for passive millimeter-wave imaging applications," *IEEE J. Solid-State Circuits*, vol. 45, no. 10, pp. 1961–1967, Oct. 2010.
- [5] M. Cwiklinski, P. Bruckner, S. Leone, C. Friesicke, H. Mabler, R. Lozar, S. Wagner, R. Quay, and O. Ambacher, "D-band and G-band high-performance GaN power amplifier MMICs," *IEEE Trans. Microw. Theory Techn.*, vol. 67, no. 12, pp. 5080–5089, Dec. 2019.

- [6] V. Radisic, K. M. K. H. Leong, S. Sarkozy, X. Mei, W. Yoshida, P.-H. Liu, W. R. Deal, and R. Lai, "220-GHz solid-state power amplifier modules," *IEEE J. Solid-State Circuits*, vol. 47, no. 10, pp. 2291–2297, Oct. 2012.
- [7] M. Ćwikliński, "190-GHz G-band GaN amplifier MMICs with 40GHz of bandwidth," in *IEEE MTT-S Int. Microw. Symp. Dig.*, Jun. 2019, pp. 1257–1260.
- [8] K. Katayama, M. Motoyoshi, K. Takano, L. C. Yang, and M. Fujishima, "133GHz CMOS power amplifier with 16dB gain and +8dBm saturated output power for multi-gigabit communication," in *Proc. Eur. Microw. Integr. Circuit Conf. (EuMIC)*, Oct. 2013, pp. 69–72.
- [9] X. Tang, J. Nguyen, A. Medra, K. Khalaf, A. Visweswaran, B. Debaillie, and P. Wambacq, "Design of D-Band transformer-based gain-boosting class-AB power amplifiers in silicon technologies," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 67, no. 5, pp. 1447–1458, May 2020.
- [10] M. Seo, "A 1.1 V 150 GHz amplifier with 8dB gain and +6dBm saturated output power in standard digital 65nm CMOS using dummy-prefilled microstrip lines," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2009, pp. 484–485.
- [11] H.-C. Lin and G. M. Rebeiz, "A 112–134 GHz SiGe amplifier with peak output power of 120 mW," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Jun. 2014, pp. 163–166.
- [12] D. Zhao and P. Reynaert, "An E-Band power amplifier with broadband parallel-series power combiner in 40-nm CMOS," *IEEE Trans. Microw. Theory Techn.*, vol. 63, no. 2, pp. 683–690, Feb. 2015.
- [13] M. Abdulaziz, H. V. Hünerli, K. Buisman, and C. Fager, "Improvement of AM–PM in a 33-GHz CMOS SOI power amplifier using pMOS neutralization," *IEEE Microw. Wireless Compon. Lett.*, vol. 29, no. 12, pp. 798–801, Dec. 2019.
- [14] A. F. Tong, W. M. Lim, C. B. Sia, K. S. Yeo, Z. L. Teng, and P. F. Ng, "RFCMOS unit width optimization technique," *IEEE Trans. Microw. Theory Techn.*, vol. 55, no. 9, pp. 1844–1853, Sep. 2007.
- [15] C.-Y. Chan, S.-C. Chen, M.-H. Tsai, and S. Hsu, "Wiring effect optimization in 65-nm low-power NMOS," *IEEE Electron Device Lett.*, vol. 29, no. 11, pp. 1245–1248, Nov. 2008.
- [16] H.-S. Kim, K. Park, H. Oh, and E. Seung Jung, "Effective gate layout methods for RF performance enhancement in MOSFETs," *IEEE Electron Device Lett.*, vol. 30, no. 10, pp. 1105–1107, Oct. 2009.
- [17] C. Liang and B. Razavi, "A layout technique for millimeter-wave PA transistors," in *Proc. IEEE Radio Freq. Integr. Circuits Symp.*, Jun. 2011, pp. 1–4.
- [18] A. Ali, J. Yun, F. Giannini, H. J. Ng, D. Kissinger, and P. Colantonio, "168-195 GHz power amplifier with output power larger than 18 dBm in BiCMOS technology," *IEEE Access*, vol. 8, pp. 79289–79299, May 2020.
- [19] D. Parveg, M. Varonen, D. Karaca, A. Vahdati, M. Kantanen, and K. A. I. Halonen, "Design of a D-band CMOS amplifier utilizing coupled slow-wave coplanar waveguides," *IEEE Trans. Microw. Theory Techn.*, vol. 66, no. 3, pp. 1359–1373, Mar. 2018.
- [20] Z. Wang, P.-Y. Chiang, P. Nazari, C.-C. Wang, Z. Chen, and P. Heydari, "A CMOS 210-GHz fundamental transceiver with OOK modulation," *IEEE J. Solid-State Circuits*, vol. 49, no. 3, pp. 564–580, Mar. 2014.
- [21] S. Kawai, S. Sato, S. Maki, K. K. Tokgoz, K. Okada, and A. Matsuzawa, "Accurate transistor modeling by three-parameter pad model for millimeter-wave CMOS circuit design," *IEEE Trans. Microw. Theory Techn.*, vol. 64, no. 6, pp. 1736–1744, Jun. 2016.

- [22] Z. Wang and P. Heydari, "A study of operating condition and design methods to achieve the upper limit of power gain in amplifiers at near-*f<sub>max</sub>* frequencies," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 64, no. 2, pp. 261–271, Feb. 2017.
- [23] H. Jia, C. C. Prawoto, B. Chi, Z. Wang, and C. P. Yue, "A full Ka-band power amplifier with 32.9% PAE and 15.3-dBm power in 65-nm CMOS," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 65, no. 9, pp. 2657–2668, Sep. 2018.
- [24] B. Leite, E. Kerherve, J.-B. Begueret, and D. Belot, "An analytical broadband model for millimeter-wave transformers in silicon technologies," *IEEE Trans. Electron Devices*, vol. 59, no. 3, pp. 582–589, Mar. 2012.
- [25] Z. Gao, K. Kang, C. Zhao, Y. Wu, Y. Ban, L. Sun, W. Hong, and Q. Xue, "A broadband and equivalent-circuit model for millimeter-wave on-chip M:N six-port transformers and baluns," *IEEE Trans. Microw. Theory Techn.*, vol. 63, no. 10, pp. 3109–3121, Oct. 2015.
- [26] B. Philippe and P. Reynaert, "A 15 dBm 12.8%-PAE compact D-band power amplifier with two-way power combining in 16 nm FinFET CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2020, pp. 374–376.



**JINCHENG ZHANG** (Student Member, IEEE) received the B.S. degree in electronic science and engineering from the University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2016. He is currently pursuing the Ph.D. degree in micro-electronics with Fudan University, Shanghai, China.

His research interests include millimeter-wave circuits and systems, including power amplifiers, distributed amplifiers, and high-speed switches.



**TIANXIANG WU** (Student Member, IEEE) received the B.S. degree from Anhui University, Anhui, China, in 2014, and the M.S. degree from Southeast University, Jiangsu, China, in 2017. He is currently pursuing the Ph.D. degree in micro-electronics engineering with Fudan University, China.

His current research interests include PLL design and MoS<sub>2</sub> circuit design.



**LIHE NIE** received the B.S. degree in electronic engineering from the Dalian University of Technology, Dalian, China, in 2018, and the M.S. degree in microelectronics from Fudan University, Shanghai, China, in 2020.

Her research interests include millimeter-wave circuits, particularly phase shifters, oscillators, and super-regenerative receivers.



**SHUNLI MA** (Member, IEEE) received the B.S. degree in micro-electronics engineering from Shanghai Jiao Tong University, Shanghai, China, in 2011, and the Ph.D. degree in micro-electronics engineering from Fudan University, Shanghai, in 2016.

From 2012 to 2014, he was a Project Officer with Nanyang Technological University, Singapore. From 2016 to 2017, he worked in industry and designed FMCW PLLs for automotive radar sensors. He is currently an Assistant Professor with the State Key Laboratory of ASIC and System, Fudan University. He has published many high-performance mm-wave circuit papers at top conferences, including ESSCIRC, CICC, RFIC, ASSCC, and IMS. His research interests include 2D MoS<sub>2</sub> chip design and mm-wave integrated circuit design, including mm-wave imaging sensing, mm-wave PLLs, high-speed samplers in ADCs, and biomedical RF circuits for cancer detection. His article was a Finalist at IMS 2015. He received the 2015 ISSCC Student Research Preview, the ISSCC STGA Award, and the Distinguished Designer for mm-wave PLL design for automotive radar.



**YONG CHEN** (Senior Member, IEEE) received the B.Eng. degree in electronic and information engineering from the Communication University of China (CUC), Beijing, China, in 2005, and the Ph.D. in Engineering degree in microelectronics and solid-state electronics from the Institute of Microelectronics, Chinese Academy of Sciences (IMECAS), Beijing, in 2010.

From 2010 to 2013, he worked as a Postdoctoral Researcher with the Institute of Microelectronics, Tsinghua University, Beijing. From 2013 to 2016, he was a Research Fellow with VIRTUS/EEE, Nanyang Technological University, Singapore. Since March 2016, he has been an Assistant Professor with the State Key Laboratory of Analog and Mixed-Signal VLSI (AMSV), University of Macau, Macau, China. His research interests include integrated circuit designs involving analog/mixed-signal/RF/mm-wave/sub-THz/wireline.

Dr. Chen serves as a member for the IEEE Circuits and Systems Society; the Circuits and Systems for Communications (CASCOM) Technical Committee (2020-2021); and the Technical Program Committee (TPC) of A-SSCC (2021), APCCAS (2019-2020), ICTA (2020-2021), NorCAS (2020-2021), and ICSICT ('20); and a Review Committee Member for ISCAS (2021). He was a recipient of the Haixi (three places across the Straits) Postgraduate Integrated Circuit Design Competition (Second Prize), in 2009; a co-recipient of the Best Paper Award at the IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), in 2019; and a co-recipient of the Macao Science and Technology Invention Award (First Prize), in 2020. His team reported three chip inventions at the IEEE International Solid-State Circuits Conference-ISSCC (Chip Olympics): mm-wave PLL (2019) and VCO (2019), and radio-frequency VCO (2021). He has been serving as an Associate Editor for IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, since 2019; IEEE Access, since 2019; and IET Electronics Letters (EL), since 2020; an Editor for the International Journal of Circuit Theory and Applications (IJCTA), since 2020; and a Guest Editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II: EXPRESS BRIEFS, in 2021. He also serves as the Vice-Chair for the IEEE Macau CAS Chapter (2019-2021), the Tutorial Chair for ICCS (2020), a conference local organization committee of A-SSCC (2019), and the TPC Co-Chair for ICCS (2021).



**JUNYAN REN** (Member, IEEE) received the B.S. degree in physics and the M.S. degree in electronic engineering from Fudan University, Shanghai, China, in 1983 and 1986, respectively.

From 1986 to 2000, he was with the Department of Electronic Engineering, Fudan University, where he has been a Full Professor with the School of Microelectronics, since 2000. Since 1992, he has also been with the State Key Laboratory of ASIC and System. His research interests include data converters, analog/RF/mixed-signal circuits, and micromachined ultrasound transducers.

Prof. Ren is a Senior Member of the China Institute of Communications. He received the Excellent Subject Chief Scientists Award, the Education Excellence Awards, the Distinguished Young Faculty from Shanghai Municipal Government, and the Excellent Graduate Advisor from Fudan University.