# An Active-Under-Coil RFDAC With Analog Linear Interpolation in 28-nm CMOS

Feifei Zhang, Student Member, IEEE, Peng Chen, Member, IEEE, Jeffrey S. Walling, Senior Member, IEEE, Anding Zhu, Senior Member, IEEE, and Robert Bogdan Staszewski, Fellow, IEEE

Abstract—This paper demonstrates a wideband 2.4 GHz 2 × 9-bit Cartesian radio-frequency digital-to-analog converter (RFDAC). Active-under-coil integration is introduced in the physical implementation, where all key active circuitry is located underneath the matching-network transformer, achieving a core area of merely 0.35 mm<sup>2</sup>. An 8× analog linear interpolation at the RF rate is proposed to suppress replicas close to the carrier while avoiding any high-order, high-speed digital filters in the digital processing back-end. A multi-port transformer is adopted in the matching network to improve the back-off efficiency. The measured peak output power and drain efficiency at the center frequency of 2.4 GHz are 17.47 dBm and 17.6%, respectively, while the peak efficiency is 19.03%. Moreover, the 6-dB back-off efficiency is 66% of that at the peak output power. The active-under-coil integration helps this RFDAC achieve the smallest area among comparable prior art.

*Index Terms*—RFDAC, analog linear interpolation (ALI), class-E power amplifier, multi-port transformer, active-under-coil integration.

### I. INTRODUCTION

WIRELESS systems in high data-rate applications are increasingly required to support a wide signal bandwidth and complex modulation schemes, which usually feature high peak-to-average power ratio (PAPR). At the same time, they are pushed towards higher levels of system integration and smaller silicon die area for the sake of cost. As the RF transmitter is considered the most power-hungry block of the entire RF system, intensive research has been focusing on realizing fully integrated all-digital transmitters or RFDACs with high output power and high efficiency, while occupying a low silicon area. Three popular RFDAC architectures have been intensively studied: polar [1]–[3], Cartesian (a.k.a.

Manuscript received June 4, 2020; revised November 3, 2020 and January 18, 2021; accepted February 7, 2021. Date of publication February 23, 2021; date of current version April 27, 2021. This work was supported by the Microelectronic Circuits Centre Ireland (MCCI) through Enterprise Ireland under Grant TC-2015-0019. This article was recommended by Associate Editor H. Stratigopoulos. (*Corresponding author: Peng Chen.*)

Feifei Zhang was with MCCI, Dublin D04, Ireland, and also with the School of Electrical and Electronic Engineering, University College Dublin, Dublin D04, Ireland. She is now with Silicon Austria Labs, 4040 Linz, Austria (e-mail: feifei.zhang@silicon-austria.com).

Peng Chen is with LTH, Lund University, 221 00 Lund, Sweden (e-mail: dhcchp@gmail.com).

Jeffrey S. Walling was with MCCI, Cork 021, P51 R206 Ireland. He is now with Skyworks Solutions, Inc., Woburn, MA 01801 USA.

Anding Zhu and Robert Bogdan Staszewski are with the School of Electrical and Electronic Engineering, University College Dublin, Dublin D04, Ireland (e-mail: robert.staszewski@ucd.ie).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCSI.2021.3059368.

Digital Object Identifier 10.1109/TCSI.2021.3059368

I/Q) [4], [5] and outphasing [6]. Cartesian RFDACs feature identical I and Q paths, avoiding the bandwidth expansion, mismatch impairments and timing misalignments that appear in the polar architecture. Various types of digitally controlled power amplifiers (DPA) or RF power generation cells are widely used in the RFDAC. Inverse class-D [7], class-E [4], and switched-capacitor (SC) [8] topologies are implemented according to different specifications. Among the available switch-mode topologies that can, in theory, achieve high efficiency, the class-E DPA has the advantage of high tolerance to parasitics [9] and contains only one type of transistor (i.e., NMOS). Thus, further analysis of class-E mode DPAs is warranted. Amplitude control there is achieved through resistance modulation [4].

There are various methods to suppress the noise floor, mainly by increasing the resolution or adopting digital filters with high sample rate in baseband. However, the resolution is limited by the intrinsic matching accuracy of semiconductor devices [10], while the high-speed digital interface is challenging and bulky [11]. In this situation, exploiting an on-chip interpolation for the baseband signals at the RFDAC side can relax the speed and complexity of the digital interface.

Improving the power efficiency of PAs in back-off has gained a lot of attention. As perhaps the best embodiment of these efforts, the Doherty PA has been well developed and analyzed [12]. However, the passive components that realize the impedance transformation there tend to be very bulky. A hint on how to reduce the silicon area in a CMOS implementation was given in 1997 in [13]: "The large silicon area under the inductor can potentially be used for device fabrication." This indicated the possibility of an active-under-coil integration of passive and active components to improve the silicon utilization in RFDAC designs. It was not until 2015 that [14] published an LC-tank PLL which demonstrated a vertical layout integration of some digital logic and DC bias circuitry underneath the tank's inductor. Reference [15] has exploited this methodology further in a Bluetooth low energy transceiver where active circuits are located underneath the transformers in both the DCO and DPA. However, both of these designs deliver very limited RF power. Among state-of-the-art RFDAC designs, there are two categories distinguished by whether the matching network is off-chip or on-chip. References [16], [17] have their matching network off-chip. They both report 8 dBm peak output power while their core active circuits occupy 0.72 mm<sup>2</sup> and 0.25 mm<sup>2</sup>, respectively. References [2]–[4], [8], [18] are state-of-the-art RFDAC designs with the

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/



Fig. 1. Functional diagram of the Cartesian RFDAC.

matching network on-chip and they deliver maximum output power as high as 14.6 dBm, 16.8 dBm, 22.8 dBm, 20.5 dBm and 13.5 dBm, respectively. In those designs, the active circuitry and the passive matching network components occupy separate locations on the floor plan, with the matching network occupying almost the same area as the DPA array. An active-under-coil integration of the matching network and transistors in systems delivering high RF output power would offer significant cost savings but certainly not without numerous challenges.

Towards that goal, a fully integrated Cartesian RFDAC is proposed in this paper. The design of this RFDAC demonstrates three innovations: 1) Active devices are placed *underneath* the impedance-transformation matching network, while delivering >17 dBm of RF output power. 2) In the matching network, a *multi-port* transformer is designed such that the transformer's turns-ratio can be changed according to the input signal level for the purpose of improving the back-off efficiency. 3) Furthermore, a linear interpolation is incorporated *inside* the RFDAC, in order to avoid the conventional high-order digital filters of typically high complexity.

This paper is organized as follows. Section II introduces the replica suppression from the RFDAC system point-of-view, as well as a class-E mode power combiner. Section III presents circuit details, followed by the active-under-coil integration described in Section IV. The measurement results are shown in Section V.

### II. SYSTEM ANALYSIS

#### A. Function of All-Digital Cartesian RFDAC

According to the digital communication theory, to achieve high bandwidth efficiency, the baseband information can be represented by two orthogonal signals: in-phase (I) and quadrature (Q). In a differential transmitter, the I and Q baseband signals modulate the four-phase carrier signal and then are summed. Based on this, the concept of a Cartesian RFDAC working with the four-phase carrier (i.e., differential quadrature) was described in [4] as:

$$IQ = \text{filter}\{(I_{\text{BB-up}} \times LO_{\text{IP}} + Q_{\text{BB-up}} \times LO_{\text{QP}}) - (I_{\text{BB-up}} \times LO_{\text{IN}} + Q_{\text{BB-up}} \times LO_{\text{QN}})\} \tag{1}$$

which is the linear model of the RFDAC shown in Fig. 1. In the digital signal processor (DSP), the in-phase and quadrature baseband data samples  $I_{BB}$  and  $Q_{BB}$ , respectively, are



Fig. 2. Utilization of AND-AND-OR logic in RFDAC.



Fig. 3. Output spectrum emissions of the RFDAC.

generated in the signal generator at the rate of $f_{BB}$, then upsampled to the $f_s$ rate by the FIR filters, producing the baseband up-sampled data $I_{BB-up}$ and $Q_{BB-up}$. LO<sub>IP</sub>, LO<sub>IN</sub>, LO<sub>QP</sub> and LO<sub>QN</sub> are a square-shaped four-phase carrier from an LO generator. Operators "×" and "+" in (1) represent orthogonal modulation and component summation, respectively. To maintain the orthogonality, the aforementioned four-phase carrier clocks use 25% duty-cycle waveforms to avoid overlap. The *AND* logic gates in the blue box represent the unit modulators. To achieve efficient summation, class-E mode power combining is implemented in this design.

Since a quad-phase carrier with 25% duty-cycle is adopted in the differential quadrature operation, in any quarter cycle of the carrier period, effectively only one PA unit operates while the other three are idling. Therefore, the number of PA units can be reduced by the *AND-AND-OR* logic gates [19], as shown in Fig. 2, while maintaining the same function as in (1).
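The unit-sharing argument can be sanity-checked with a minimal Python sketch that models one carrier period as four quarter-period slots. The slot ordering and the function names (`lo_phase`, `separate_units`, `shared_unit`) are illustrative assumptions and are not taken from the paper or from [19].

```python
# Toy model: one carrier period split into four quarter-period slots.
# Assumed slot order (illustrative only): IP, QP, IN, QN.
SLOTS = 4

def lo_phase(k):
    """25% duty-cycle clock that is high only in slot k."""
    return [1 if s == k else 0 for s in range(SLOTS)]

LO_IP, LO_QP = lo_phase(0), lo_phase(1)

def separate_units(i_bit, q_bit):
    """Two dedicated PA units (one AND gate each); outputs are summed."""
    return [(i_bit & a) + (q_bit & b) for a, b in zip(LO_IP, LO_QP)]

def shared_unit(i_bit, q_bit):
    """One PA unit driven by AND-AND-OR logic."""
    return [(i_bit & a) | (q_bit & b) for a, b in zip(LO_IP, LO_QP)]

# Non-overlapping phases never conduct in the same slot ...
assert all((a & b) == 0 for a, b in zip(LO_IP, LO_QP))
# ... so OR-ing the gated clocks equals summing two separate units.
for i_bit in (0, 1):
    for q_bit in (0, 1):
        assert shared_unit(i_bit, q_bit) == separate_units(i_bit, q_bit)
```

Because the gated I and Q clocks are never high in the same slot, the logical OR and the arithmetic sum coincide in this model, which is why a single unit can serve both paths.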

#### B. Replica Suppression

Let us assume the carrier frequency of LO in Fig. 1 is  $f_c = 2.4$  GHz. To simplify the description below, let us further assume that  $I_{\rm BB}$  and  $Q_{\rm BB}$  are sinusoidal signals with the same amplitude at  $f_{\rm BB}$  and the baseband upsampling rate is  $f_s$ . Hence, both  $I_{\rm BB-up}$  and  $Q_{\rm BB-up}$  have the main tone at  $f_{\rm BB}$  and replicas at  $nf_s \pm f_{\rm BB}$ ,  $n = 1, 2, 3, \ldots$ , so the output spectrum of the RFDAC can be worked out



Fig. 4. Comparison between increasing the sampling rate  $f_s$  and engaging ALI: (a) increasing  $f_s$  from  $f_{s,B}$  to  $4f_{s,B}$ ; (b) engaging  $4 \times$  ALI after upsampling at  $f_s = f_{s,B}$ .

as in Fig. 3, where $f_s$ and $f_{BB}$ are set to 150 MHz and 1.1719 MHz, respectively. Additionally, the spectra around $f_c$ and $f_c + f_s$ are zoomed in and shown within the blue and green boxes in Fig. 3. The carrier harmonics and sampling replicas are undesired emissions. A number of approaches are available to suppress the harmonics. For example, a differential structure reduces the amplitude of even-order harmonics (ideally eliminating them) and a bandpass output matching network suppresses the amplitudes of both even- and odd-order harmonics. However, the sampling replicas centered around $f_c \pm n \cdot f_s$, $n = 1, 2, 3, \ldots$, are difficult to filter, especially when $n$ is small, since filters with sufficiently sharp stopbands are impractical to achieve with passive filtering alone.

The sampling replicas are inherently produced during the sample generation process. Fig. 4(a) illustrates the baseband upsampling operation in the DSP of Fig. 1. The baseband symbols  $A_1$ ,  $A_2$  and  $A_3$  are produced by the signal generator and a (e.g. raised-cosine) FIR pulse-shaping filter up-samples them as  $B_x = \{B_1, B_2, \ldots, B_5\}$  with a sampling rate of  $f_s = f_{s,B}$ , while  $C_x = \{C_1, C_2, \ldots, C_{17}\}$  can be obtained if the sampling rate is increased to  $f_s = 4f_{s,B}$ .

Considering the effect of the zero-order hold (ZOH), the up-sampled signals $B_x$ or $C_x$ that modulate the carriers are represented by the solid gray/blue continuous waveforms in Fig. 4(a) rather than by the discrete data.

A mathematical explanation of the baseband up-sampling operation is given in [20]. Increasing  $f_s$  can push the closest replica further away from the main signal tone.

Aiming to avoid the need for high-speed digital operation, Fig. 4(b) [21] presents a method that suppresses the replicas while keeping the sampling rate at $f_s$. In this proposed method, $A_1$, $A_2$ and $A_3$ as well as $B_x$ play the same roles as in Fig. 4(a). Rather than modulating the carrier directly, the linear interpolation is implemented based on $B_x$. Taking the first two points, $B_1$ and $B_2$, as an example, by dividing the segment $B_1B_2$ into four smaller segments, samples $D_1$, $D_2$, $D_3$, $D_4$ and $D_5$ can be obtained and used to replace $B_1$ and $B_2$ for modulating the carrier. By applying the described operation to all the points in $B_x$, a new waveform (marked in red) containing $D_x = \{D_1, D_2, \dots, D_{17}\}$ is obtained, with $D_1, D_5, D_9, D_{13}$ and $D_{17}$ coinciding with the previous $B_1, B_2, B_3, B_4$ and $B_5$ points. The translation from $A_1, A_2$ and $A_3$ to $D_x$ can be summarized as up-sampling the symbols to $B_x$ in DSP and applying $4 \times$ linear interpolation to get $D_x$. It is interesting to note that there is no need to realize this linear interpolation in DSP. Since the proposed interpolation is implemented by analog circuitry in this work, it is named analog linear interpolation (ALI).
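The construction of $D_x$ from $B_x$ can be illustrated numerically. The sketch below is a purely digital illustration of the interpolation arithmetic, not of the analog circuit; the function name `linear_interpolate` and the sample values in `B` are hypothetical placeholders.

```python
def linear_interpolate(b, factor=4):
    """Insert factor-1 equally spaced points between consecutive samples."""
    d = []
    for left, right in zip(b, b[1:]):
        step = (right - left) / factor
        d.extend(left + k * step for k in range(factor))
    d.append(b[-1])  # keep the final original sample
    return d

# Hypothetical up-sampled values B1..B5 (placeholders, not from the paper).
B = [0.0, 1.0, 0.5, -0.5, 0.0]
D = linear_interpolate(B, factor=4)

assert len(D) == 17   # D1..D17
assert D[0::4] == B   # D1, D5, D9, D13, D17 coincide with B1..B5
```

Five input samples yield 17 output samples for a factor of four, matching the count of $D_x$ points described for Fig. 4(b).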

Generalizing it for a  $2^m \times$  ALI, the frequency response is

$$H_{2^{m} \times \text{ALI}}(f) = \operatorname{sinc}\left(\frac{\pi f}{f_{s}}\right) \prod_{k=1}^{m} \cos\left(\frac{\pi f}{2^{k} f_{s}}\right) \tag{2}$$
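A standard trigonometric identity makes the null locations of (2) explicit. Assuming the unnormalized definition $\operatorname{sinc}(x) = \sin(x)/x$, writing $x = \pi f/f_s$, and applying $\sin 2\theta = 2\sin\theta\cos\theta$ repeatedly, the cosine product collapses to

$$\prod_{k=1}^{m} \cos\left(\frac{x}{2^{k}}\right) = \frac{\sin x}{2^{m} \sin\left(x/2^{m}\right)}$$

The product therefore vanishes at $f = nf_s$ for every integer $n$ that is not a multiple of $2^m$, and has unit magnitude where $n$ is a multiple of $2^m$. Combined with the zeros of the sinc term at all nonzero $nf_s$, this gives second-order nulls at $n = 1, \ldots, 2^m - 1$ but only a first-order null at $n = 2^m$.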

As expected, the  $2^m \times$  linear interpolation can provide suppression to the signals located at  $nf_s \pm f_{BB}$ ,  $n = 1, 2, ..., 2^m - 1$  [20], [22]. The signal experiencing  $2^m \times$  ALI can be expressed as

$$S_{2^{m} \times \text{ALI}}(f) = D_{s}(f) \cdot H_{2^{m} \times \text{ALI}}(f) \tag{3}$$

The suppression of  $2^m \times$  ALI can be calculated using the following equation:

$$\operatorname{Sup}_{2^{m} \times \operatorname{ALI}}(f) = 20 \log \left| \operatorname{sinc} \left( \frac{\pi f}{f_{s}} \right) \prod_{k=1}^{m} \cos \left( \frac{\pi f}{2^{k} f_{s}} \right) \right| \tag{4}$$

Substituting $f = nf_s - f_{BB}$ into (4) for $m = 0$ (i.e., ZOH) and $m = 1, 2, 3$, the corresponding amplitudes of the upsampled baseband signals after ZOH, $2\times$, $4\times$ and $8\times$ ALI are obtained and plotted as gray circles in Fig. 5 for $f_{BB} = 1.1719$ MHz and $f_s = 150$ MHz. These circles are further connected by gray dotted lines to show the trend. Meanwhile, to verify these theoretical results in a more practical system, a Verilog-A model of the RFDAC with ZOH, $2\times$, $4\times$ and $8\times$ ALI is built and simulated. It should be noted that this RFDAC model contains the traditional class-E mode matching network with bandpass filtering (BPF), which is structured as later shown in Fig. 7(a). The simulation conditions for the input are set as in the theoretical calculations. The output spectra with $f_c = 2.4$ GHz are plotted in Fig. 5, while the spectra around $f_c$ and $f_c + f_s$ are zoomed in to show more detail. Both the theory and simulations verify the suppression of the first to $(2^m - 1)$-th replicas. Even though the class-E matching network exhibits some BPF effects in suppressing the replicas located beyond 3 GHz, it can hardly suppress the replicas closer to the main tone at $f_c$. In contrast, the proposed $2^m \times$ ALI has the potential to also suppress these close-in replicas. Superimposed in Fig. 5(d) are the *measured* amplitudes of the first to eighth replicas of the $8 \times$ ALI, which are obtained from Fig. 20 in Section V.
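The theoretical points can be reproduced approximately with a few lines of Python. This is a minimal sketch that evaluates (4) directly, assuming the unnormalized definition $\operatorname{sinc}(x) = \sin(x)/x$; the helper names `suppression_db` and `replica_table` are illustrative, and the sketch does not model the matching network or the Verilog-A simulation.

```python
import math

def sinc(x):
    """Unnormalized sinc, sin(x)/x, as assumed for (4)."""
    return 1.0 if x == 0 else math.sin(x) / x

def suppression_db(f, fs, m):
    """Evaluate (4): m = 0 is ZOH only, m = 3 is 8x ALI."""
    x = math.pi * f / fs
    h = sinc(x)
    for k in range(1, m + 1):
        h *= math.cos(x / 2**k)
    return 20 * math.log10(abs(h))

FS, FBB = 150e6, 1.1719e6  # values used for Fig. 5

def replica_table(m, n_max=8):
    """Suppression at n*fs - fBB for n = 1..n_max."""
    return [suppression_db(n * FS - FBB, FS, m) for n in range(1, n_max + 1)]

for m in range(4):
    row = ", ".join(f"{s:7.1f}" for s in replica_table(m))
    print(f"{2**m}x: {row} dB")
```

With these inputs, the first replica drops by roughly 38 dB when going from ZOH to $2\times$ ALI and changes by only a few dB for higher orders, since the additional cosine factors are close to unity at that frequency.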

Even though the ALI is demonstrated with a single modulating tone, the conclusions remain valid when the baseband signal is spread over a frequency band. Two phenomena related to Fig. 5 and (2) are worth noting. Let us define $r = f_s/f_{BB}$, which is a measure of how wideband the signal is relative to the sampling frequency. First, for the same $r$, the first replica, which is located at $f_c + f_s - f_{BB}$ and receives the least suppression from the ZOH, is not further significantly suppressed beyond $2\times$ ALI. This can be observed from the



Fig. 5. Theoretically calculated baseband amplitude and RFDAC output spectra from Verilog-A model simulations with different interpolation orders: (a) without ALI; (b)  $2 \times$  ALI; (c)  $4 \times$  ALI; (d)  $8 \times$  ALI. The measured results of  $8 \times$  ALI are superimposed in (d).



Fig. 6. Amplitude suppression of the signal components located at  $nf_s - f_{BB}$ , n = 1, 2, 3, ..., under  $8 \times ALI$  with different *r*.

zoomed-in spectra around $f_c + f_s \pm f_{BB}$ in Fig. 5. The reason is that the $\operatorname{sinc}[(\pi f)/f_s]$ function in (2) is common for all $k = 1, \dots, 2^m$ linear interpolation orders, while the $\cos[(\pi f)/(2^k f_s)]$ factors are not effective in filtering at lower values of $f$. Second, for a $2^m$-order ALI, $r$ must be higher than a certain value to make sure the first replica has a lower amplitude than the $2^m$-th replica after the suppression. One example is shown in Fig. 6, where a modulating signal with frequency $f_{BB}$ is up-sampled with $f_s = 150$ MHz and interpolated by the $8\times$ ALI. In this example, $r \ge 11$ would be necessary for $f_s = 150$ MHz in order to achieve a lower amplitude for the first replica than the eighth replica.

Compared to the higher sampling-rate operation, such as $4f_{s,B}$ in Fig. 4(a), which calls for a high-speed, high-order complex digital filter, the proposed ALI can utilize a lower sampling-rate filter by moving the higher-frequency operation to the analog domain, where the high-frequency carrier is available (e.g., via an edge division of the LO clock). Thus, to realize the benefits of ALI, the $8\times$ interpolation is chosen for the silicon implementation.

#### C. Multi-Port Transformer-Based Class-E Mode Power Combiner

The schematic of a conventional transformer-based class-E power combiner in [23] can be simplified as in Fig. 7(a). In this architecture, the RFDAC is terminated by  $R_{\text{load}} = 50 \ \Omega$  and the switches in each of the four branches operate in the 25% duty-cycle non-overlapping mode. The transformer's turns-ratio can be expressed as

$$n_{\rm tr} = \sqrt{\frac{L_{\rm tn}}{L_s}} = \frac{N_2 \times k_m}{N_1} \tag{5}$$

where  $N_1$  and  $N_2$  denote the number of turns in the primary and secondary stage, respectively, while  $k_m$  is the coupling factor.

The values of  $C_s$ ,  $L_s$ ,  $L_{tn}$  and  $C_{tn}$  need to be determined before physically implementing the power combiner. The above values are easy to obtain in the single-ended class-E mode power combiners shown in Fig. 7(e) since they have been well researched starting with [9], [25]. To extend the conclusions for the circuits shown in Fig. 7(a), it would be beneficial to analyze the conversion from the quad differentially ended branches to one single-ended branch.

As an equivalent circuit of Fig. 7(a), Fig. 7(b) is constructed by replacing the real transformer with an ideal one in which the turns-ratio is set to be one. Consequently, the values of inductance, capacitance and load resistance are modified to the depicted values. By applying the half-circuit concept as well as omitting the ideal transformer, Fig. 7(c) is achieved. Fig. 7(d) differs from Fig. 7(c) by including the separate I and Q paths which are not merged. Fig. 7(e) is obtained by taking the I-path only. One point worth mentioning is that the load stays at  $R_{\text{load}}/2n_{\text{tr}}^2$  in Figs. 7(c)–(e). With the above conversion and based on the analysis in [9] and [25], the component values can be calculated by the following formulas, which are specifically derived for the 25% duty-cycle class-E mode network:

$$L_s = \frac{3.56}{2\pi f_c} \cdot \frac{R_{\text{load}}}{2n_{\text{tr}}^2} \tag{6}$$

$$C_s = \frac{0.21}{2\pi f_c} \cdot \frac{2n_{\rm tr}^2}{R_{\rm load}} \tag{7}$$

where $R_{\text{load}} = 50\ \Omega$. Combining (5) and (6),

$$L_{\rm tn} = \frac{3.56}{2\pi f_c} \cdot \frac{R_{\rm load}}{2} \tag{8}$$

can be obtained. In Fig. 7(e), the inductance valued  $L_{\rm tn}/n_{\rm tr}^2$  and the capacitance valued  $n_{\rm tr}^2 C_{\rm tn}$  compose the LC tank resonating at  $f_c$ , creating an infinite impedance for the fundamental signal. Thus,

$$C_{\rm tn} = \frac{1}{3.56\pi f_c R_{\rm load}} \tag{9}$$

One conclusion that can be drawn from (8), (9) and Fig. 7(a) is that the inductance of the secondary stage  $L_{\text{tn}}$  as well as capacitance  $C_{\text{tn}}$  are firmly established by the fixed system



Fig. 7. Equivalent transformation between differential-ended class-E mode network and single-ended class-E mode network: (a) differential-ended class-E mode network with I- and Q-paths; (b) equivalent model of (a); (c) and (d) single-ended class-E mode network with I- and Q-path; (e) single-ended class-E mode network with I-path.



Fig. 8. Model simulation results of the PA efficiency versus the number of turned-on switches at different turns-ratios.

specifications, such as duty-cycle, operating frequency and the load value, which is different from  $L_s$  and  $C_s$  that have one additional constraint from the transformer's turns-ratio.
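As a worked example of (6)–(9), the following sketch evaluates the component values for $f_c = 2.4$ GHz and $R_{\text{load}} = 50\ \Omega$. The function name `class_e_values` is illustrative, and $n_{\rm tr} = 2$ is merely one assumed operating point (it corresponds to $1/n_{\rm tr} = 1.5{:}3$); the results are ideal-formula values, not the fabricated component values.

```python
import math

def class_e_values(fc, r_load, n_tr):
    """Component values from (6)-(9) for the 25% duty-cycle class-E network."""
    w = 2 * math.pi * fc
    r_eq = r_load / (2 * n_tr**2)              # load seen by one branch
    l_s = 3.56 / w * r_eq                      # (6)
    c_s = 0.21 / w / r_eq                      # (7)
    l_tn = 3.56 / w * r_load / 2               # (8)
    c_tn = 1 / (3.56 * math.pi * fc * r_load)  # (9)
    return l_s, c_s, l_tn, c_tn

# fc and R_load follow the text; n_tr = 2 is an assumed example value.
l_s, c_s, l_tn, c_tn = class_e_values(fc=2.4e9, r_load=50.0, n_tr=2.0)
print(f"Ls={l_s*1e9:.2f} nH  Cs={c_s*1e12:.2f} pF  "
      f"Ltn={l_tn*1e9:.2f} nH  Ctn={c_tn*1e12:.3f} pF")

# Consistency checks: (5) and the tank resonance at fc.
assert math.isclose(l_tn, 2.0**2 * l_s)
assert math.isclose(1 / (2 * math.pi * math.sqrt(l_tn * c_tn)), 2.4e9)
```

Under these ideal formulas, $L_{\rm tn}$ evaluates to about 5.9 nH and $C_{\rm tn}$ to about 0.75 pF regardless of the turns-ratio, whereas $L_s$ and $C_s$ scale with $n_{\rm tr}^2$.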

Based on the above results and Fig. 7(a), the relationship between the drain efficiency and the turns-ratio is explored. The cascode-structured NMOS switches (shown later in Fig. 13) replace the ideal switches in Fig. 7(a) and will be further analyzed in Section III. The component values are calculated using the above equations, with $1/n_{\rm tr}$ being set to 1:3, 1.2:3, 1.5:3, 1.7:3 and 2:3 in each simulation round. The trends of the drain efficiency versus the number of turned-on switches at different $1/n_{\rm tr}$ are plotted in Fig. 8 as curves of different colors. At this point, it is necessary to mention that in these simulations the transformer coils have zero resistance and do not cause any power loss, while the switches have the same structure as that shown in Fig. 13 and are constructed from transistors without considering the layout routing parasitics. Even though (8) and (9) are based on the zero-voltage-switching (ZVS) operation of class-E, they are still practical in this analysis.

Taking the green curve with $1/n_{\rm tr} = 1.5{:}3$ as an example, the PA drain efficiency at first increases as the number of on-switches rises. However, when the number goes beyond 38, the efficiency starts to drop. A similar trend can be observed for the other curves, yet with different peak points. To analyze these results, the following steps can be taken. Firstly, mark the intersections of the adjacent curves as *A*, *B* and *C* in Fig. 8. Secondly, draw three thin vertical black lines through *A*, *B* and *C*, respectively. Thus, the whole operational space is divided into four zones. Thirdly, pick the highest curve segment in each zone and combine the segments into a new composite curve, denoted by the dashed black curve. This dashed black contour indicates the possibility of achieving a high back-off drain efficiency by adjusting the transformer's turns-ratio appropriately according to the input data during operation.
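The three steps above amount to taking the pointwise upper envelope of the efficiency curves. A minimal sketch follows; the function name `best_turns_ratio`, the labels, and in particular the numbers in `toy_curves` are made-up placeholders for illustration and are not data from Fig. 8.

```python
def best_turns_ratio(curves):
    """Pointwise upper envelope of efficiency curves.

    curves maps a turns-ratio label to a list of efficiencies, one entry
    per number of turned-on switches. Returns (envelope, chosen_labels).
    """
    labels = list(curves)
    n_points = len(curves[labels[0]])
    envelope, chosen = [], []
    for i in range(n_points):
        best = max(labels, key=lambda lab: curves[lab][i])
        chosen.append(best)
        envelope.append(curves[best][i])
    return envelope, chosen

# Placeholder numbers for illustration only; they are NOT data from Fig. 8.
toy_curves = {
    "ratio_a": [0.10, 0.18, 0.16, 0.12],
    "ratio_b": [0.06, 0.15, 0.22, 0.20],
    "ratio_c": [0.03, 0.09, 0.19, 0.26],
}
envelope, chosen = best_turns_ratio(toy_curves)
```

The points at which `chosen` changes label play the role of the intersections *A*, *B* and *C*, and `envelope` corresponds to the dashed composite curve.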

The number of turns in the primary/secondary stage and the coupling factor are two aspects that can influence the turns-ratio, as illustrated by (5). Thus, to design a class-E RFDAC with an improved drain efficiency, a transformer-based matching network can be considered with the following features: 1) the matching-network transformer has multiple primary coils and one secondary coil, which creates the possibility of varying the transformer's turns-ratio by adding/reducing the number of primary coils; 2) different groups of switches connect with different primary coils, which allows the turns-ratio to be controlled by the input data; 3) the values of the individual inductances and capacitances, depending on their roles in the matching network, still meet the requirements of the class-E operational principle.

### III. CIRCUIT DETAILS

#### A. Topology and Functions of Top Circuits

The overall architecture of the Cartesian RFDAC is shown in Fig. 9. The carrier of four 25% duty-cycle clock phases, $LO_{\rm IP}$, $LO_{\rm IN}$, $LO_{\rm QP}$ and $LO_{\rm QN}$, is generated by the LO generator from a single-ended external input $LO_{\rm ref}$. The I/Q baseband data streams are sent into the chip through the low-voltage differential-signaling (LVDS) interface. The data streams are parallelized into $2 \times 9$-bit binary code in the serial-to-parallel converter (S2P). The external signal *sync\_s* and the internal signal *sync\_f* from the LO generator are used to parallelize and synchronize the binary data. To control the 6-bit/2-bit segmented digital-to-RF-amplitude converter (DRAC), the 8-bit amplitude binary code is split into 6 MSB bits and 2 LSB bits, which are then separately converted into the MSB and LSB thermometer codes in the encoder (ENC).
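The segmented encoding can be sketched in a few lines. The function name `encode_amplitude` and the bit ordering of the thermometer vectors are assumptions made for illustration; the sketch covers only the 8-bit amplitude and leaves the sign bit aside.

```python
def encode_amplitude(code):
    """Split an 8-bit amplitude code into MSB/LSB thermometer codes."""
    if not 0 <= code <= 255:
        raise ValueError("amplitude code must fit in 8 bits")
    msb, lsb = code >> 2, code & 0b11  # 6 MSBs, 2 LSBs
    msb_thermo = [1 if i < msb else 0 for i in range(63)]
    lsb_thermo = [1 if i < lsb else 0 for i in range(3)]
    return msb_thermo, lsb_thermo

msb_t, lsb_t = encode_amplitude(0b10110110)  # 182 = 45 * 4 + 2
assert sum(msb_t) == 45 and sum(lsb_t) == 2
# Each MSB cell carries four times the weight of an LSB cell.
assert all(4 * sum(m) + sum(l) == c
           for c in range(256)
           for m, l in [encode_amplitude(c)])
```

The 63-entry and 3-entry vectors correspond to the thermometer-coded MSB and LSB control buses, and the weighted sum recovers the original amplitude code.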



Fig. 9. Top architecture of the designed Cartesian RFDAC.

As the key building block in the proposed RFDAC, the DRAC comprises a P-array and an N-array. The P- and N-arrays have a fully symmetric topology and connect to the positive and negative terminals of the matching network, respectively. According to (1), the P-array operates with $LO_{\rm IP}$ and $LO_{\rm QP}$, while the N-array works with $LO_{\rm IN}$ and $LO_{\rm QN}$. There are 63 MSB and 9 LSB identical unit cells in each P- and N-array. The unit cells are categorized into three sub-arrays, indicated by green, blue and red colors in Fig. 9, of which the 1st sub-array contains 11 MSB unit cells plus 3 LSB unit cells; the 2nd one contains 17 MSB unit cells plus 3 LSB unit cells; and the 3rd one contains 35 MSB unit cells plus 3 LSB unit cells. Note that the 3 LSB unit cells in each sub-array are identical. The three sub-arrays connect to P1+/P1-, P2+/P2- and P3+/P3- of the matching network, respectively. Furthermore, the way of grouping the MSB unit cells is closely affected by the physical transformer design, which is explained in a later section. The MSB unit cells are controlled by signal $M_{I/Q}$ with a weight of four, making the PA devices in the MSB unit cells four times the width of those in the LSB unit cells.

As another key block, the matching network comprises a multi-port transformer and several capacitors. The multi-port transformer has three coils at the primary stage and one coil at the secondary stage. Displayed in Fig. 9, the three primary coils are marked in green, blue and red, in order to illustrate the way how the DRAC connects with the matching network. Similar to the schematic in Fig. 7(a),  $C_{s1}$ ,  $C_{s2}$  and  $C_{s3}$  are the shunt capacitors, while  $C_{tn}$  and the secondary coil comprise the LC tank resonating at the fundamental frequency.

The operation of the DRAC and matching network can be described as follows. As the amplitude control word (ACW) gradually increases, the MSB thermometer-coded $M_{I/Q}\langle 62:0\rangle$ increments its weight, enabling the corresponding MSB unit cells. At first, only the first sub-array (green) operates and the current flows in/out of P1+/P1-. The RF power is coupled to the secondary coil and transferred to the external load. When $M_{I/Q}\langle 12\rangle$ becomes '1', the second sub-array (blue) starts to operate together with the filled-up first sub-array, causing the current to also flow in/out of P2+/P2-. The superimposed RF power is coupled and then transferred to the load. Similarly, there will be current flowing in/out of



Fig. 10. Control logic circuits for LSB unit cells.

P3+/P3- when $M_{I/Q}\langle 29\rangle$ becomes '1' and enables the third sub-array (red). In this situation, all three primary coils can couple their RF power to the secondary coil, resulting in more power delivered to the load.

According to the operation described, the three groups of LSB unit cells should be controlled not only by $L_{I/Q}\langle 2:0\rangle$, but also by $M_{I/Q}\langle 12\rangle$ and $M_{I/Q}\langle 29\rangle$, which are the control signals of the 12-th and 29-th MSB unit cells. Fig. 10 discloses the control logic. We define the *effective* sub-array as the one that has both enabled and disabled MSB unit cells at the same time. Based on the thermometer code control, when $M_{I/Q}\langle 12\rangle$ is '0', the 1st MSB sub-array is the effective one and only the LSBs in red in Fig. 9 are enabled. When $M_{I/Q}\langle 12\rangle$ is '1' and $M_{I/Q}\langle 29\rangle$ is '0', the 2nd MSB sub-array is the effective one and only the LSBs in blue should be enabled. Similarly, when $M_{I/Q}\langle 29\rangle$ is '1', the LSBs in green should be enabled.
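The selection of the effective sub-array can be expressed as a small decision function. This is a sketch under an explicit assumption: MSB cells are counted from one, so that $M_{I/Q}\langle 12\rangle$ is '1' once at least 12 MSB cells are enabled and $M_{I/Q}\langle 29\rangle$ once at least 29 are enabled. The name `effective_subarray` is illustrative and the gate-level logic of Fig. 10 is not reproduced.

```python
SUBARRAY_MSB_CELLS = (11, 17, 35)  # 1st, 2nd, 3rd sub-array (63 in total)

def effective_subarray(n_msb_on):
    """Return 1, 2 or 3 for the sub-array whose LSB group is enabled.

    Assumption: MSB cells are counted from 1, so M<12> is '1' once at
    least 12 MSB cells are on, and M<29> once at least 29 are on.
    """
    if not 0 <= n_msb_on <= sum(SUBARRAY_MSB_CELLS):
        raise ValueError("expected 0..63 enabled MSB cells")
    m12 = n_msb_on >= 12
    m29 = n_msb_on >= 29
    if not m12:
        return 1
    return 3 if m29 else 2
```

The thresholds follow from the grouping: the 12-th cell is the first cell of the 2nd sub-array (11 + 1) and the 29-th cell is the first cell of the 3rd sub-array (11 + 17 + 1).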

#### B. Implementation of Analog Linear Interpolation

Introduced in Section II-B, the $8 \times$ ALI functionality is incorporated in all MSB/LSB unit cells of the DRAC.<sup>1</sup> Let us start with the conventional method shown in Fig. 11(a) and assume that the input data samples *D* are at 150 MHz and the carrier clock *CK* is at 2.4 GHz. The DFF synchronizes *D* with *CK*, producing data $D_0$. Then, $D_0$ modulates *CK* to generate an amplitude-modulated carrier signal by means of turning on/off the switches. When the switches are turned on with the on-resistance *R*, the current $I_{no-inp}$ is generated. It is worth noting that the DFF which produces $D_0$ is triggered by the next *CK* rising edge after the data transition. As a result, $D_0$ is a delayed version of *D*. Now, if a second DFF is added after

<sup>1</sup>In this implementation, the carrier frequency  $f_c$  should be  $8 \times$  of  $f_s$ , but it could be fairly easily redesigned for other integer or even fractional numbers.



Fig. 11. Circuits of (a) the conventional way without ALI; (b) the proposed way with  $8 \times$  ALI.



Fig. 12. Time-domain waveform of the group of delayed signal and the produced current signal w/ and w/o 8× ALI.

the main one and triggered by the same CK, its output will be further delayed by one CK period. Similarly, if a number of DFFs are added and triggered by CK, a group of signals delayed by the staggered CK periods can be obtained.

Based on the above, a simplified schematic of the proposed ALI is shown in Fig. 11(b), where the DFF chain consisting of 15 DFFs is cascaded with the main DFF of Fig. 11(a). The eight CK-synchronized outputs of the chain,  $D_0$ ,  $D_1$ , ...,  $D_7$ , are tapped off from the odd-numbered DFFs. Their waveforms are shown in Fig. 12. The current signal  $I_{inp}$  up/down steps occur at the 1.2 GHz rate.

As can be observed at the right side of Fig. 11(b), $D_0$, $D_1, \ldots, D_7$ modulate CK in parallel but with staggered delays. Correspondingly, there are eight identical switches to be controlled by $D_0$, $D_1, \ldots, D_7$. To maintain the same total current, the on-resistance of these switches is 8R. When D changes from '0' to '1', $D_0$, $D_1, \ldots, D_7$ also change from '0' to '1' in eight steps as indicated by the waveform in Fig. 12. The eight switches are turned on one by one, resulting in the current $I_{\text{inp}}$ experiencing a ramp envelope of eight units, each delayed by two RF clock cycles. Similarly, the current *envelope* decreases over the eight unit steps when the D bit transitions from '1' to '0'. Comparing $I_{\text{inp}}$ with the dashed line $I_{\text{no-inp}}$, which represents the current *envelope* of Fig. 11(a), it can be concluded that the $8\times$ ALI is realized by the proposed method in Fig. 11(b). Additionally, since

$$\int_{t_1}^{t_2} I_{\text{no-inp}}(t)\, dt = \int_{t_1}^{t_2} I_{\text{inp}}(t)\, dt,$$

the ALI circuit still transfers the same quantity of current to the matching network as the conventional one.

Fig. 13. Circuits of the modulator and DPA.
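The staggered switching described in this subsection can be mimicked with a few lines of Python. This is only a minimal behavioral sketch, not the authors' circuit or Verilog-A model: it works on the current *envelope* in units of one CK cycle, normalizes the full-scale current to 1, and the helper names `delay` and `ali_envelopes` as well as the pulse length are illustrative assumptions.

```python
def delay(bits, n):
    """Delay a CK-rate bit sequence by n CK cycles (the line is assumed to start at 0)."""
    return [0] * n + bits[:len(bits) - n]


def ali_envelopes(d0, taps=8, tap_spacing=2):
    """Return (envelope without ALI, envelope with ALI) for the CK-rate data d0.

    Without ALI one switch of on-resistance R carries the full current.
    With ALI, `taps` switches of on-resistance taps*R each carry 1/taps of it
    and are driven by copies of d0 delayed by 0, 2, 4, ... CK cycles.
    """
    i_no_inp = [float(b) for b in d0]
    tap_signals = [delay(d0, k * tap_spacing) for k in range(taps)]
    i_inp = [sum(column) / taps for column in zip(*tap_signals)]
    return i_no_inp, i_inp


# Hypothetical data pulse: 16 CK cycles low, 32 high, 32 low
# (one 150 MHz sample lasts 16 cycles of the 2.4 GHz CK).
d0 = [0] * 16 + [1] * 32 + [0] * 32
i_no_inp, i_inp = ali_envelopes(d0)

print(i_inp[16:32:2])             # rising ramp: one extra unit every two CK cycles
print(sum(i_no_inp), sum(i_inp))  # same total "charge" over the window
```

Under these assumptions the ramp rises in eight equal steps spaced two CK cycles apart, and the two envelopes sum to the same value once the window covers the whole rise and fall, which is the discrete counterpart of the integral identity above.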

#### C. Circuits for Modulator and DPA

Taking one MSB unit cell of DRAC's P-array as an example, Fig. 13 illustrates the key circuits in the unit cells in more detail. In the dashed box of Fig. 13 there are the two blocks that pre-process the signals as well as the carrier. $M_{I/Q}$, being the bus of control codes for this unit cell, becomes $D_{I/Q}$ at the output of the input buffers. $D_{I/Q}$ experiences the 8× linear interpolation in the ALIs and produces $D_{Ix}$ and $D_{Qx}$, $x = 0, 1, \ldots, 7$, as explained in Section II-B. $CK_{IP}/CK_{QP}$ undergo the sign-bit selection before triggering the DFFs in the ALIs and getting modulated by $D_{I/Q}$. The selection in P- and N-array can be described as

$$CK_{\rm IP} = LO_{\rm IP} \cdot I_{\rm sign} + LO_{\rm IN} \cdot \overline{I_{\rm sign}}$$
$$CK_{\rm QP} = LO_{\rm QP} \cdot Q_{\rm sign} + LO_{\rm QN} \cdot \overline{Q_{\rm sign}}$$

and

$$CK_{\rm IN} = LO_{\rm IP} \cdot \overline{I_{\rm sign}} + LO_{\rm IN} \cdot I_{\rm sign}$$
$$CK_{\rm QN} = LO_{\rm QP} \cdot \overline{Q_{\rm sign}} + LO_{\rm QN} \cdot Q_{\rm sign}$$

respectively, in order to cover all four quadrants of the I/Q constellation map [23].
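The sign-bit selection amounts to a 2:1 multiplexer that swaps the differential LO phases. The sketch below evaluates the Boolean expressions above for one path (I or Q) with single-bit values; the function name `select_clocks` is a hypothetical helper, and the LO phases are treated as ideal logic levels.

```python
def select_clocks(lo_p, lo_n, sign):
    """Evaluate the sign-bit selection for one path (I or Q).

    lo_p, lo_n: instantaneous logic levels (0/1) of the differential LO phases.
    sign:       sign bit (0/1) of the corresponding baseband component.
    Returns (ck_p, ck_n), the clocks routed to the P- and N-array.
    """
    not_sign = 1 - sign
    ck_p = (lo_p & sign) | (lo_n & not_sign)
    ck_n = (lo_p & not_sign) | (lo_n & sign)
    return ck_p, ck_n


# With sign = 1 the LO phases pass straight through; with sign = 0 they swap.
for lo_p in (0, 1):
    lo_n = 1 - lo_p
    print(lo_p, lo_n, select_clocks(lo_p, lo_n, 1), select_clocks(lo_p, lo_n, 0))
```

Swapping the two phases is equivalent to a 180° carrier phase shift on that path, so the two sign bits together reach all four quadrants.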

The RFDAC concept in Fig. 2 is implemented to realize the modulator in order to reduce the idle time for PA units. However, rather than the *AND-AND-OR* logic, the *NAND-NAND* logic is adopted, because it maintains the same functionality while using fewer transistors.
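The functional equivalence of the two logic styles follows from De Morgan's law and can be checked exhaustively. In the sketch below the four inputs `a`, `b`, `c`, `d` are generic placeholders (for instance an I-path clock and data bit and a Q-path clock and data bit); the exact signal assignment in the modulator is an assumption here.

```python
from itertools import product


def and_and_or(a, b, c, d):
    """Two AND gates followed by an OR gate."""
    return (a and b) or (c and d)


def nand(x, y):
    return not (x and y)


def nand_nand(a, b, c, d):
    """Two NAND gates followed by a third NAND gate."""
    return nand(nand(a, b), nand(c, d))


# Compare both structures over all 16 input combinations.
equivalent = all(
    bool(and_and_or(*bits)) == nand_nand(*bits)
    for bits in product([False, True], repeat=4)
)
print(equivalent)
```

In static CMOS a NAND gate needs no output inverter, whereas AND and OR gates are typically built as NAND/NOR plus an inverter, which is where the transistor saving comes from.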

The cascode structure is chosen to construct the DPA, as shown in Fig. 13. The DPA is composed of a 1.0-V core NMOS transistor and a 1.8-V thick-oxide cascode NMOS transistor. This DPA is different from the one utilized in [23], where a single device is used for its better noise performance and smaller on-resistance. However, the reliability of a single device is challenged when the gate voltage reaches 0 while the drain voltage swings above $V_{DD}$. An excessive voltage drop between the drain and gate terminals cannot be avoided in the class-E mode, especially in high output-power applications. Consequently, we adopt the cascode structure to ease the reliability issues [24]. Meanwhile, the minimum allowable length is selected for both types of transistors for the purpose of handling the 2.4 GHz operational frequency. The width of the transistors is determined by balancing the on-resistance and the targeted output power.

Fig. 14. 2.5-D view of the entire active-under-coil integration: (a) overall view, (b) layered view.

## IV. ACTIVE-UNDER-COIL INTEGRATION

To save manufacturing cost by significantly reducing the occupied silicon area, the presented RFDAC has been engineered such that the entire active circuitry is located underneath the matching transformer. The RF switching arrays and all other supporting circuits use active CMOS layers and lower metal layers, while the passive components (the matching-network transformer) use the thicker metal layers directly above. This is the first such demonstration of active-under-coil integration in moderate-to-high output power RF transmitter systems. The pseudo 2.5-dimensional (D) view of the implemented active-under-coil integration is shown in Fig. 14, of which subfigure (a) shows the comprehensive rendering and (b) shows the layered perspective.



Fig. 15. Imitation 2-D view of the transformer layout.

Active-under-coil integration takes full advantage of all of the available layers in a CMOS process technology, especially in advanced (fine-line) nodes. To understand this, it is helpful to examine the 28-nm CMOS node utilized in this design. The technology contains one aluminum re-distribution layer (AP) and nine copper routing layers (M1~M9), of which M8, M9 and AP are thick and hence have relatively low sheet resistance, while M1~M7 have the same but much higher sheet resistance. The sheet resistance of the thin lower metal layers is around $41\times$, $88\times$ and $20\times$ the value of the sheet resistance of M8, M9 and AP, respectively. Generally, as indicated in Fig. 14, the transformer, as well as its shielding, is mainly constructed in the upper metal layers, while the DRAC is limited to the bottom metal layers. The layout details follow in the subsections.

Before discussing the details, mainly in Sec. IV-B, it is important to point out that extensive (i.e., taking days or even weeks on a cluster of computing servers) EM model extraction and simulation in Cadence's EMX are needed at each of many iterations between the layout and simulations for verifying, for example, the inductors' value and the self-resonant frequency. The final transformer contains 150 ports to connect with DRAC, RF power supply and RF ground. A useful model should include as many metal layers as possible, yet including all of them is impractical. Thus, the most complex transformer model used in the simulations considers the layers (M8, M9, AP) constructing the coils, the shielding (M6, M7) and other layers occupying a significantly larger area (M1, M2).

#### A. Transformer Layout

The imitation 2-D view of the multi-port transformer is rendered in Fig. 15, with the terminal names and the colors matching those marked for the transformer symbol in Fig. 9.

In the presented transformer, the primary winding structures are placed on M9, while the crossover routing is placed on M8 or AP. This is done to reduce the power losses caused by the metal resistance. As mentioned earlier, there are three coils in the primary stage and one coil in the secondary stage. According to (8) and Fig. 7, the theoretical value of the secondary inductance should be 5.91 nH for the class-E mode matching-network operating with 25% duty-cycle at 2.4 GHz. Thus, this value is a hard design constraint while searching for the optimal geometry of the secondary coil. As (5) and (6) are under-constrained for establishing the primary inductor value and turns-ratio  $n_{\rm tr}$ , finding the proper turns-ratio requires a number of iterations between the layout and circuit simulations. In other words, instead of being merely a constraint for the layout, the inductor values of the primary coils rely on the layout. Eventually, the multi-port transformer is designed with the inner diameter of 275  $\mu$ m, the width of the coils of 9  $\mu$ m, and the space between the two turns of the secondary coil (gray) is 48  $\mu$ m. These design parameters and topology are determined by considering the trade-off among the self-resonant frequency, the occupied area as well as the floor plan of the DRAC underneath.

The yellow octagonal plate in Figs. 14 and 15 represents the patterned ground shield (PGS) and the floating shield, which are adopted to reduce the substrate losses [13], [26] and to shield the underlying circuits from the electromagnetic fields, so that the area below can be used for a compact design.

As shown in Fig. 14(b), M1~M5 are taken up by the DRAC arrays; thus, M6 and M7 are the layers left for shielding. The PGS, which has the better shielding efficacy, is built from grounded strips positioned perpendicular to the transformer wires. Connecting the shield to ground introduces considerable capacitance and degrades the Q-factor due to the significant coupling from M9-M8 to M7 or M6. Thus, M6 is selected for the PGS because it is farther from M9-M8. The floating shield is implemented with floating M7 metal pieces extending over the whole area of the transformer to decrease the capacitive load, even though the floating shield has reduced efficacy due to the charge induced on it.

Considering that M6 and M7 are relatively close to M9, in order to further minimize parasitics, both the grounded strips and the floating pieces are of the minimum width that still satisfies the design rule check (DRC).

Below we state the principle for grouping the MSB unit cells mentioned in Section III-A. Having established the transformer structure, the multi-port model can then be extracted by an EM simulator to instantiate the curves illustrated in Fig. 8. The steps are: 1) attach all the MSB unit cells with terminals to the first primary coil in the EM model and sweep ACW ($1 \sim 64$) to get the drain efficiency versus ACW curve; 2) repeat this for the second and third primary coils; 3) find the two intersections of the three curves. The abscissas of the two intersections indicate how the MSB unit cells should be grouped. Note that it might require a few iterations between the transformer layout and simulation phases until the intersections are located not far away from the peak points of the drain efficiency curves.
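Step 3 of this procedure, locating the two intersections, can be sketched as a simple search over the swept curves. The code below is only an illustration: the function name `crossover_acws` is hypothetical, and the three parabolic curves are made-up placeholders standing in for the EM-simulated drain-efficiency curves, not data from this design.

```python
def crossover_acws(eff_by_coil, acw_values):
    """For each pair of adjacent coils, return the first ACW at which the
    next coil becomes at least as efficient as the current one."""
    crossings = []
    for current, following in zip(eff_by_coil, eff_by_coil[1:]):
        for acw, e_cur, e_next in zip(acw_values, current, following):
            if e_next >= e_cur:
                crossings.append(acw)
                break
    return crossings


acw_values = list(range(1, 65))  # ACW sweep 1..64, as in step 1

# Placeholder curves (NOT simulation data): each coil gets a parabola peaking
# at an arbitrary ACW, only to exercise the search.
placeholder_peaks = [12, 32, 56]
eff_by_coil = [
    [1 - ((acw - peak) / 64) ** 2 for acw in acw_values]
    for peak in placeholder_peaks
]

crossings = crossover_acws(eff_by_coil, acw_values)
print(crossings)
```

With real curves, the two returned ACW values would mark where the MSB unit cells are split among the three primary coils; if a crossing falls far from the peaks of the curves, the transformer layout is revisited, as noted above.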

#### B. DRAC Layout

The 2-D view of the DRAC is presented in Fig. 16. The symmetric P- and N-arrays of DRAC are located respectively in the top and bottom halves of the layout. Also shown are the square and rectangular boxes, of which the squares contain four unit cells and the rectangles, two. The boxes are further marked with green, blue or red stripes for the purpose of matching them with the three sub-arrays in Fig. 9. The reason for physically locating the sub-arrays this way is for the convenience of short interconnects with the transformer, which will be elaborated upon later. The unit cells are designed with layers M1–M4. The motivation of assigning four or two cells per unit is to share one local clock booster, which is responsible for recovering the carrier clocks' transition edges after a fairly long journey through an H-type clock tree from the LO generator. The H-type clock tree that mainly utilizes M5 with some small amount of M4, indicated as the orange route in Fig. 16, is one of the measures to retain precision of the duty cycle. Another measure to maintain the accuracy of carrier clocks is to employ the dummy cells shown as the golden dashed squares and rectangles in Fig. 16. They are needed because not all the tails of the H-type clock tree are loaded with a clock booster. There are three types of dummy cells marked with A, B and C. All types of the dummy cells have a loading buffer that is the same as in the clock booster. In particular, type A dummy cells contain additional de-coupling MOS capacitors, in order to occupy the same area as the squares composed of four unit cells. The strategy for type B dummy cells is analogous. The above methods have helped to maintain a fully symmetric layout in DRAC, as shown in Fig. 16.

Fig. 16. Floor plan and key details of DRAC layout design.

Special care has been taken to properly arrange the DRAC unit and dummy cells in order to ease wiring complexity when connecting the drain nodes with the primary coils. To detail the physical arrangement, the connection between the first sub-array and the first primary coil is taken as an example. Denote the connecting points in the coils as *terminal points*. T1, T2, ..., T5 in Figs. 14(b) and 16 represent the terminal points in one half of the first primary coil. Since the transformer is laid out on the top, the unit cells of the first sub-array are arranged either directly below or close to the terminal points. For the unit cells directly below the terminal points, such as T1, T2, T3 and T5, only vias (green dashed lines in Fig. 14(b)) are needed for the connection. As for the unit cells near the terminal points, such as T4, an additional short wire (green solid line) is needed to bridge the gap. The short wire is perpendicular to the target side of the coil, hence the minimum length guarantees minimum parasitics. The same approach is applied for the connections between the second primary coil and the second sub-array. Note that M8 is used for these short wires.

Fig. 17. Turns-ratio change trend versus the number of active unit cells.

The above method is adjusted for the connection between the third primary coil and the third sub-array. This is because most of these unit cells are located in the center area without the coils immediately above, as illustrated in Fig. 14. Therefore, for these cells as well as those below the crossing areas of the coils, rather long wires must be used to attach the drain nodes to the vias underneath the terminal points of the third primary coil. The combinations of short wires and vias are used for the rest of unit cells in the third sub-array. The red lines in Fig. 16 describe the way of routing these long and short wires. Even though not shown, it is necessary to mention that M9 constructs the routing in the center area while the rest are still in M8.

Deterioration of the self-resonant frequency caused by the drain connections is mitigated by the connection lines being perpendicularly routed and as short as possible. Additionally, the utilization of M8 is carefully planned, making it possible to be re-used for ground in this active-under-coil integration which will be explained shortly.

Besides changing the number of effective primary coils in response to the ACW data, the coil's inductance changes slightly within each winding selection zone when different terminal points are selected. Calculated from the S-parameter obtained by the EM simulations, the maximum inductance values of the first/second/third primary coils are 4.59/2.31/1.47 nH, while the minima are 4.07/1.89/1.14 nH. The simulations in Fig. 17 show that the turns-ratio changes depending on the selected number of active unit cells. The three zones can be clearly identified. The zoomed-in area shows the slight turns-ratio variation for the first selected coil. It can be discerned that the two large jumps of the curve occur when the second and third primary coils start to join the network, while the more gradual slopes, shown within the dashed green/blue/red ovals, are introduced by the terminal points. Irrespective of whether examined globally or locally, the curve rises monotonically, which means the connection method for DRAC and matching network appears reasonable. However, the two large jumps cannot be avoided if the minimum inductance value of the first/second coils is not equal to the maximum inductance value of the second/third coils.

There are three groups of  $C_s$  connecting with the beginning and ending terminals of the three primary coils. The values of these three  $C_s$  are calculated by (7) and adjusted in the simulation for higher output power and drain efficiency. The final values of these  $C_s$  are 80%~90% of their original values. All these  $C_s$  are implemented with M1~M5 and located in the gaps between the unit cells in DRAC.

The RF ground, i.e., for the RF switch devices, is separated from the ground for digital circuitry, resulting in two ground domains in this DRAC. Low ground impedance is necessary for the performance and efficiency and so thick metals would naturally be preferred. In this active-under-coil integration, the thickest copper layer, such as M9, is largely used in the transformer while the AP is above M9, thus needing additional vias to connect. M8 is the only suitable layer with a low sheet resistance that can be selected for the PA ground. As such, M8 is only sparingly used for other purposes. On the other hand, the usage of large pieces of M8 for minimizing the ground impedance can decrease the self-resonant frequency of the transformer. Therefore, the utilization of M8 must be carefully planned. The white lines in Fig. 16 reveal the solution. The digital ground for the logic circuits in DRAC is built with M1, while the power supply is with M2, both of which are shown as the purple plate in Fig. 16. By adopting these approaches, the resulting DRAC layout features the plates of digital ground/power supply that lie in the unit and dummy cells.

As described above, the proposed integration method has changed the way of connecting the drain terminals to the transformer, making it different from the conventional 'snake' arrangement [4] designed to avoid glitches. The ground floor plan is also different due to the inability to use the top metal layer freely. Besides, the digital ground and power supply are limited to use only the lower metal which should be used as much as possible for a lower IR drop. All these result in sacrificing the performance for a smaller area, which will be shown in Section V-C.

## V. MEASUREMENT RESULTS

The proposed RFDAC ideas have been verified in TSMC 28-nm LP flavor of CMOS. Shown on the right-hand side of Fig. 18 is the microphotograph of this chip, which occupies an area of 1.31 mm × 1.22 mm in total and 0.59 mm × 0.59 mm for the core octagon containing the *entire* DRAC. Only the auxiliary blocks, which are used for setup, test and characterization, i.e., the LO generator, B2T ENC and S2P, are outside of the core octagon and are marked as white rectangles. The left-hand side of Fig. 18 shows the layout of the DRAC underneath, where the blue boxes are the unit cells, while the purple ones are the dummy cells.

The measurement setup is depicted in Fig. 19. The 2 × 9-bit input data, which is up-sampled to $f_s$, is generated in MATLAB, then written into block memories of a Field Programmable Gate Array (FPGA) board through Universal Asynchronous Receiver-Transmitter (UART). The chip can read the data from the memories through FPGA Mezzanine Card (FMC) connectors. The writing and reading process can be controlled manually by pressing buttons on the FPGA evaluation board, since the buttons can be programmed as desired. Two signal generators are used during the measurements, of which one provides the single-ended LO to the on-chip LO generator, and the other supplies the user clock for the FPGA evaluation board. The chip is directly powered by the DC supplies. There are two DC voltage domains: one is 1.8 V used in LVDS and the switches in unit cells, the other is 1 V used for the rest of circuits. The output of the RFDAC is fed to a spectrum analyzer as well as a vector signal analyzer (VSA), depending on the characteristics to be measured.

Fig. 18. Microphotograph of the designed RFDAC (right), and the layout of the DRAC underneath (left).

Fig. 19. Measurement setup used to characterize performance of the designed RFDAC.

The measurement results are provided *without* any digital pre-distortion (DPD). Employing a DPD could substantially improve the linearity. For example, in [4], the I/Q RFDAC's image suppression was improved by more than 25 dB by applying a fourth-order memoryless polynomial approximation, and by 19 dB by applying a look-up table. Additionally, in [2] the RFDAC linearity was increased by finely tuning the size of the switch devices, achieving EVM = -36 dB for 20 MHz 64-QAM, which was better than the EVM = -28 dB (also for 20 MHz 64-QAM) obtained by adopting the DPD in [4].

#### A. Replica Suppression

In these measurements, the baseband I- and Q-path signals are sinusoidal signals of which the frequency is 1.1719 MHz. The baseband I- and Q-signal are further up-sampled to  $f_s = 150$  MHz. The produced 150 MHz I/Q signal is sent to the chip via the FPGA evaluation board and experiences the on-chip 8× ALI before modulating the  $f_c = 2.4$  GHz carrier.

To verify the efficacy of the suppression of replicas, the measured wideband spectrum in the range between 2.2 GHz and 5.0 GHz is shown in Fig. 20. The measured replica amplitudes have been annotated in Fig. 5(d) in Section II-B to compare with the theoretical and simulated values using Verilog-A system behavioral modeling. Table I further lists the measured and simulated numbers.

Fig. 20. Measured output spectrum with $8\times$ ALI from 2.4 GHz to 4.8 GHz ($f_c = 2.4$ GHz, $f_s = 150$ MHz).

#### TABLE I

| Replicas [dBc] | $1^{\rm st}$ | $2^{\rm nd}$ | $3^{\rm rd}$ | $4^{\rm th}$ | $5^{\rm th}$ | $6^{\rm th}$ | $7^{\rm th}$ | $8^{\rm th}$ |
|----------------|--------------|--------------|--------------|--------------|--------------|--------------|--------------|--------------|
| Measured       | -73          | -84          | -83          | -77          | -82          | -84          | -89          | -64          |
| Simulated      | -82          | -89          | -93          | -95          | -97          | -97          | -98          | -73          |

It can be verified that the implemented  $8 \times$  ALI has suppressed the first seven replicas, resulting in their amplitudes well below that of the eighth replica.
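Why the eighth replica stands out can be seen from an idealized envelope model. The sketch below is not the authors' Verilog-A model and is not expected to reproduce the numbers in Table I: it assumes that the interpolated envelope equals a zero-order hold at $f_s$ followed by an 8-tap moving average whose taps are spaced by $1/(8 f_s)$, and it ignores the matching network, mismatch and any coupling. The function name and the use of the upper sideband at $k f_s + f_0$ are illustrative choices.

```python
import math


def replica_suppression_db(k, f0=1.1719e6, fs=150e6, n_taps=8):
    """Idealized level of the k-th replica (at k*fs + f0) relative to the
    wanted tone at f0, in dB, for a zero-order hold at fs followed by an
    n_taps-point moving average with taps spaced 1/(n_taps*fs) apart."""
    f_step = n_taps * fs  # 1.2 GHz step rate of the interpolated ramp

    def gain(f):
        x = math.pi * f / fs
        zero_order_hold = abs(math.sin(x) / x)
        moving_average = abs(math.sin(x) / (n_taps * math.sin(math.pi * f / f_step)))
        return zero_order_hold * moving_average

    return 20 * math.log10(gain(k * fs + f0) / gain(f0))


levels = [replica_suppression_db(k) for k in range(1, 9)]
for k, level in enumerate(levels, start=1):
    print(f"replica {k}: {level:.1f} dB")
```

In this model the moving average places nulls at every multiple of $f_s$ except multiples of $8 f_s$, so replicas one to seven are attenuated twice (hold plus averaging) while the eighth is attenuated only by the hold, which agrees qualitatively with the trend in Table I.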

It should be noted that the Verilog-A model in Section II-B is a behavioral model built around the ideal class-E matching network. It does not consider, for example, the imperfect synchronization among the $2 \times 9$-bit data, or the nonlinearity introduced by the multi-port transformer based matching network. These can account for the gap between the simulation and measurement. The measured suppression for the fourth replica is much lower than simulated. The key reason is a parasitic coupling from the input data being sent to the chip in the serial data streams at the frequency of $4 f_s = 600$ MHz, as described in Section III-A.

#### B. Static Measurements

Two kinds of sweeps were done for the static measurements. The first one is a frequency sweep in which the maximum output power and the peak efficiency are recorded across the carrier frequency in order to examine the operating range. The second one is the ACW sweep in which the chip works across all the ACW levels at the best operating frequency (i.e.,  $\sim f_{center}$ ) evaluated from the frequency sweep measurements.

In the frequency sweep, the input data is at the maximum ACW and the carrier frequency is swept from low to high. During the measurement, the output power,  $P_{\text{RF}}$ , the power consumption of the RF,  $P_{\text{dc,RF}}$  and digital circuitry,  $P_{\text{dc,dig}}$ , are recorded to calculate the drain efficiency as well as the system efficiency. Curves in Fig. 21(a) and (b) display the results.

The 3-dB bandwidth is from 1.8 GHz to 2.8 GHz, so the center lies at $f_{center} \approx 2.3$ GHz. The maximum output power and peak drain efficiency happen at 2.4 GHz, which aligns with the design target. Even though the peak system efficiency happens at 2.3 GHz, it drops only slightly at 2.4 GHz.

Fig. 21. Measured (a) maximum output power versus frequency, and (b) drain and system efficiency versus frequency.

Fig. 22. Measured (a) output power and drain efficiency versus ACW, and (b) drain efficiency versus output power. Carrier = 2.4 GHz.

The ACW sweep is carried out at 2.4 GHz. The RFDAC is programmed to sequentially go through all 256 ACW levels, while $P_{\rm RF}$, $P_{\rm dc,RF}$ and $P_{\rm dc,dig}$ are recorded. The curves describing the output power and drain efficiency versus ACW are plotted in Fig. 22(a). The measured maximum output power and peak drain efficiency are 17.47 dBm and 19.03%, respectively. Fig. 22(b) plots the drain efficiency versus output power, from which the 3-dB back-off efficiency is read as 15.90%, i.e., 84% of the peak efficiency. The 6-dB back-off efficiency is 12.48%, which is 66% of the peak efficiency.
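The quoted percentages can be reproduced directly from the measured numbers. The short sketch below also prints, purely as a textbook reference point that is not taken from this paper, the relative efficiency of an ideal class-B stage, whose efficiency scales with the output voltage amplitude; the function names are illustrative.

```python
def dbm_to_mw(p_dbm):
    """Convert a power level from dBm to milliwatts."""
    return 10 ** (p_dbm / 10)


def relative_efficiency(eta_backoff, eta_peak):
    """Back-off efficiency as a fraction of the peak efficiency."""
    return eta_backoff / eta_peak


def ideal_class_b_relative_efficiency(backoff_db):
    """Textbook reference only: ideal class-B efficiency is proportional to
    the output amplitude, i.e. it drops by 10**(-backoff_db / 20)."""
    return 10 ** (-backoff_db / 20)


peak_power_mw = dbm_to_mw(17.47)                # measured peak output power
ratio_3db = relative_efficiency(15.90, 19.03)   # measured values from Fig. 22(b)
ratio_6db = relative_efficiency(12.48, 19.03)

print(f"peak output power: {peak_power_mw:.1f} mW")
print(f"3-dB back-off: {ratio_3db:.0%} of peak (ideal class-B: "
      f"{ideal_class_b_relative_efficiency(3):.0%})")
print(f"6-dB back-off: {ratio_6db:.0%} of peak (ideal class-B: "
      f"{ideal_class_b_relative_efficiency(6):.0%})")
```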

Three distinct zones can be discerned in both the output power and the drain efficiency curves. This phenomenon can be traced back to Fig. 17, where the curve of turns-ratio also contains three segments.

To evaluate the proposed method, two important aspects need to be considered. First, the model resulting in Fig. 8 has omitted the loss caused by the transformer. Second, it is impossible in practice to get the full EM model of the transformer, including all the active circuits underneath. Even though these two factors lead to the measured data deviating from the theory, the drain efficiency curve in Fig. 22(a) follows a trend similar to that shown in Fig. 8, which suggests the improvement of back-off efficiency.

Fig. 23. Output peak-to-peak voltage comparison between the simulation and measurement.

#### C. Comparison Between Simulations and Measurements

Although it is nearly impossible to obtain the full transformer EM model containing all metal layers (M1~M7) due to the limited computational resources, it is enlightening to perform the circuit simulation with an EM model that contains several key metal layers, (e.g., M1, M2, M6 and M7) which occupy large areas as in Fig. 14(b). Additionally, as the RF ground floor-plan has also been adjusted due to the integration method, the ground resistance is larger and has a severe influence on the performance. Fig. 23 compares the output peak-to-peak voltages between different model simulations and the measurement. Among these models, Model-A is the one with only the transformer coils; Model-B is the one with coils and shielding (M6, M7); Model-C is the one with coils and shielding as well as M1 and M2; Model-D is the one that contains all the items in Model-C plus the RF ground resistance (2  $\Omega$ ) and power supply resistance  $(0.03 \ \Omega)$ . Comparing with Fig. 8, the output peak-to-peak voltage decreases with the proposed integration method mainly due to the transformer loss, the floating shielding that has the discounted shielding efficacy, as well as the RF ground impedance. Correspondingly, the efficiency has also decreased, and the worst efficiency observed in the simulation is 28%.

The *raw* linearity characteristic of this RFDAC is severely affected by the discontinuities due to the switching of the transformer coils, as observed both in simulations and measurements. The discontinuities could be mitigated digitally using a mismatch DPD [23], or by exploiting the method proposed in [2] which achieves EVM = -36 dB for a 64-QAM BW = 20 MHz signal. As a comparison, in [27], for the same signal with a similar design but without any calibration, the measured EVM was -28 dB. A similar method could be incorporated here by adjusting the size of the switching transistors at the design stage and adding extra unit cells that are dynamically controlled to smooth out the code-to-AM and code-to-PM transfer characteristics.

The key performance summary and comparison are provided in Table II. As the next logical step in the evolution of migrating the matching network from off-chip to on-chip, the active-under-coil integration of the matching-network passives on top of the RFDAC switching devices has shown the advantage of achieving the smallest core area and very high power density. The proposed multi-port transformer maintains the 6-dB back-off efficiency at two-thirds of the peak efficiency. Moreover, this RFDAC has demonstrated for the first time significant replica suppression in the analog domain.

TABLE II Performance Summary and Comparison

|                                    | TCAS18 [16] Ba | JSSC15 [17] Filho | JSSC17 [2] Hashemi | JSSC13 [3] Ye | TMTT14 [4] Alavi | JSSC16 [8] Yuan | This Work                     |
|------------------------------------|----------------|-------------------|--------------------|---------------|------------------|-----------------|-------------------------------|
| Matching network                   | off-chip       | off-chip          | off-chip           | on-chip       | on-chip          | on-chip         | active-under-coil integration |
| CMOS Process (nm)                  | 40             | 28                | 40                 | 65            | 65               | 65              | 28                            |
| Supply (V)                         | 1.0            | 0.9/1.8           | 0.5                | 1.2           | 2.4              | 1.2/2.4         | 1.0/1.8                       |
| Resolution (bits)                  | 10-Polar       | 10-IQ             | 9-Polar            | 9-Polar       | 13-IQ            | 7-IQ            | 9-IQ                          |
| Center Frequency (GHz)             | 0.93           | 1.0               | 2.2                | 2.2           | 1.8              | 2.0             | 2.4                           |
| Peak Pout (dBm)                    | 8.0            | 8.0               | 14.6               | 16.8          | 22.8             | 20.5            | 17.5                          |
| Peak Drain Efficiency (%)          | 45 (PAE)       | -                 | 43.8               | 24.5          | 42               | 20              | 19                            |
| 6-dB back-off Efficiency (%)       | 20\*           | -                 | 10\*               | 15\*          | 19               | 9\*             | 12.5                          |
| Core Area (mm<sup>2</sup>)         | 0.72           | 0.25              | 0.45               | 1\*\*         | 0.45             | 1.1\*\*         | 0.35                          |
| Power Density (W/mm<sup>2</sup>)   | 0.009          | 0.025             | 0.064              | 0.048         | 0.423            | 0.102           | 0.161                         |
| Replica Suppression                |                |                   | Yes                |               |                  |                 |                               |

\*Estimated from the measurement curves.

\*\*Estimated from the chip photo.
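The power-density row of Table II follows from the peak output power and the core area. The sketch below recomputes it from the table entries; the dictionary layout and function names are illustrative, while the numbers are copied from Table II.

```python
def dbm_to_watts(p_dbm):
    """Convert a power level from dBm to watts."""
    return 10 ** (p_dbm / 10) / 1000


def power_density_w_per_mm2(p_dbm, core_area_mm2):
    """Peak RF output power divided by the core area."""
    return dbm_to_watts(p_dbm) / core_area_mm2


# (peak Pout in dBm, core area in mm^2, power density listed in Table II)
table_ii = {
    "[16] Ba": (8.0, 0.72, 0.009),
    "[17] Filho": (8.0, 0.25, 0.025),
    "[2] Hashemi": (14.6, 0.45, 0.064),
    "[3] Ye": (16.8, 1.0, 0.048),
    "[4] Alavi": (22.8, 0.45, 0.423),
    "[8] Yuan": (20.5, 1.1, 0.102),
    "This Work": (17.5, 0.35, 0.161),
}

computed = {
    name: power_density_w_per_mm2(p_dbm, area)
    for name, (p_dbm, area, _listed) in table_ii.items()
}
for name, value in computed.items():
    print(f"{name}: {value:.3f} W/mm^2 (listed {table_ii[name][2]})")
```

The 0.35 mm<sup>2</sup> core area used here is itself consistent with the 0.59 mm × 0.59 mm core octagon reported in Section V.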

## VI. CONCLUSION

This paper has demonstrated a 2.4 GHz $2 \times 9$-bit Cartesian RFDAC with $8 \times$ analog linear interpolation. The analog linear interpolation, which is analytically described and verified with model simulation, can significantly suppress the replicas close to the main signal tone while avoiding the need for high-order and high-speed digital filters in DSP. The multi-port transformer is adopted in the matching network to keep the 6-dB back-off efficiency at approximately two-thirds of that at the peak power. Active-under-coil integration is explored in the physical implementation, achieving a high RF power density among state-of-the-art RFDACs.

## ACKNOWLEDGMENT

The authors would like to thank Paul Hyland for administrative support, Cagri Cetintepe for technical discussions, Yiyu Shen for discussions on measurements, Erik Staszewski and Hao Zheng for help with lab measurements, TSMC University Shuttle for chip fabrication, Xilinx for donation of the Virtex FPGA evaluation board, and Cadence for EMX license.

## REFERENCES

- [1] R. B. Staszewski *et al.*, "All-digital PLL and transmitter for mobile phones," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2469–2482, Dec. 2005.
- [2] M. Hashemi, Y. Shen, M. Mehrpoo, M. S. Alavi, and L. C. N. de Vreede, "An intrinsically linear wideband polar digital power amplifier," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, pp. 3312–3328, Dec. 2017.
- [3] L. Ye, J. Chen, L. Kong, E. Alon, and A. M. Niknejad, "Design considerations for a direct digitally modulated WLAN transmitter with integrated phase path and dynamic impedance modulation," *IEEE J. Solid-State Circuits*, vol. 48, no. 12, pp. 3160–3177, Dec. 2013.
- [4] M. S. Alavi, R. B. Staszewski, L. C. N. de Vreede, and J. R. Long, "A wideband 2 × 13-bit all-digital I/Q RF-DAC," *IEEE Trans. Microw. Theory Techn.*, vol. 62, no. 4, pp. 732–752, Apr. 2014.
- [5] H. Wang *et al.*, "A highly-efficient multi-band multi-mode all-digital quadrature transmitter," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 61, no. 5, pp. 1321–1330, May 2014.

- [6] Z. Hu, L. C. N. de Vreede, M. S. Alavi, D. A. Calvillo-Cortes, R. B. Staszewski, and S. He, "A 5.9 GHz RFDAC-based outphasing power amplifier in 40-nm CMOS with 49.2% efficiency and 22.2 dBm power," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, May 2016, pp. 206–209.
- [7] C. E. Lokin, R. A. R. van der Zee, D. Schinkel, and B. Nauta, "EMI reduction in class-D amplifiers by actively reducing PWM ripple," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 67, no. 3, pp. 765–773, Mar. 2020.
- [8] W. Yuan, V. Aparin, J. Dunworth, L. Seward, and J. S. Walling, "A quadrature switched capacitor power amplifier," *IEEE J. Solid-State Circuits*, vol. 51, no. 5, pp. 1200–1209, May 2016.
- [9] F. Raab, "Idealized operation of the class E tuned power amplifier," *IEEE Trans. Circuits Syst.*, vol. CAS-24, no. 12, pp. 725–735, Dec. 1977.
- [10] E. Roverato, M. Kosunen, J. Lemberg, K. Stadius, and J. Ryynanen, "RX-band noise reduction in all-digital transmitters with configurable spectral shaping of quantization and mismatch errors," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 61, no. 11, pp. 3256–3265, Nov. 2014.
- [11] S. Spiridon *et al.*, "A 375 mW multimode DAC-based transmitter with 2.2 GHz signal bandwidth and in-band IM3 <-58 dBc in 40 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 48, no. 7, pp. 1595–1604, Jul. 2013.
- [12] D. Jung, H. Zhao, and H. Wang, "A CMOS highly linear Doherty power amplifier with multigated transistors," *IEEE Trans. Microw. Theory Techn.*, vol. 67, no. 5, pp. 1883–1891, May 2019.
- [13] C. P. Yue and S. S. Wong, "On-chip spiral inductors with patterned ground shields for Si-based RF IC's," in *Proc. Symp. VLSI Circuits*, Kyoto, Japan, 1997, pp. 85–86.
- [14] C. H. Lee *et al.*, "A 2.7 GHz to 7 GHz fractional-N LC-PLL utilizing multi-metal layer SoC technology in 28 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 50, no. 4, pp. 856–866, Apr. 2015.
- [15] F. Kuo *et al.*, "A Bluetooth low-energy transceiver with 3.7-mW all-digital transmitter, 2.75-mW high-IF discrete-time receiver, and TX/RX switchable on-chip matching network," *IEEE J. Solid-State Circuits*, vol. 52, no. 4, pp. 1144–1162, Apr. 2017.
- [16] A. Ba *et al.*, "A 1.3 nJ/b IEEE 802.11ah fully-digital polar transmitter for IoT applications," *IEEE J. Solid-State Circuits*, vol. 51, no. 12, pp. 3103–3113, Dec. 2016.
- [17] P. E. P. Filho, M. Ingels, P. Wambacq, and J. Craninckx, "An incremental-charge-based digital transmitter with built-in filtering," *IEEE J. Solid-State Circuits*, vol. 50, no. 12, pp. 3065–3076, Dec. 2015.
- [18] J.-W. Lai *et al.*, "A 0.27 mm<sup>2</sup> 13.5 dBm 2.4 GHz all-digital polar transmitter using 34%-efficiency class-D DPA in 40 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2013, sec. 19.8, pp. 342–343.
- [19] M. Mehrpoo, M. Hashemi, Y. Shen, L. C. N. de Vreede, and M. S. Alavi, "A wideband linear *I/Q*-interleaving DDRM," *IEEE J. Solid-State Circuits*, vol. 53, no. 5, pp. 1361–1373, May 2018.
- [20] K.-F. Un, F. Zhang, P.-I. Mak, R. P. Martins, A. Zhu, and R. B. Staszewski, "Design considerations of the interpolative digital transmitter for quantization noise and replicas rejection," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 67, no. 1, pp. 37–41, Jan. 2020.
- [21] Y. Zhou and J. Yuan, "A 10-bit wide-band CMOS direct digital RF amplitude modulator," *IEEE J. Solid-State Circuits*, vol. 38, no. 7, pp. 1182–1188, Jul. 2003.

- [22] P. T. M. van Zeijl and M. Collados, "On the attenuation of DAC aliases through multiphase clocking," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 56, no. 3, pp. 190–194, Mar. 2009.
- [23] M. S. Alavi, J. Mehta, and R. B. Staszewski, *Radio-Frequency Digital-to-Analog Converters: Implementation in Nanoscale CMOS*, 1st ed. Amsterdam, The Netherlands: Elsevier, 2016.
- [24] Q. Zhu *et al.*, "A digital polar transmitter with DC–DC converter supporting 256-QAM WLAN and 40-MHz LTE-A carrier aggregation," *IEEE J. Solid-State Circuits*, vol. 52, no. 5, pp. 1196–1209, May 2017.
- [25] S. D. Kee, I. Aoki, A. Hajimiri, and D. Rutledge, "The class-E/F family of ZVS switching amplifiers," *IEEE Trans. Microw. Theory Techn.*, vol. 51, no. 6, pp. 1677–1690, Jun. 2003.
- [26] T. S. D. Cheung and J. R. Long, "Shielded passive devices for silicon-based monolithic microwave and millimeter-wave integrated circuits," *IEEE J. Solid-State Circuits*, vol. 41, no. 5, pp. 1183–1200, May 2006.
- [27] S. Zheng and H. C. Luong, "A WCDMA/WLAN digital polar transmitter with low-noise ADPLL, wideband PM/AM modulator, and linearized PA," *IEEE J. Solid-State Circuits*, vol. 50, no. 7, pp. 1645–1656, Jul. 2015.



Feifei Zhang (Student Member, IEEE) received the B.Sc. degree in navigation guidance and control from the Beijing University of Aeronautics and Astronautics and the M.Sc. degree in microelectronics from the Beijing Embedded System Key Laboratory, Beijing University of Technology, in 2014. She is currently pursuing the Ph.D. degree with University College Dublin, Dublin, Ireland.

Her current research interests include radio-frequency digital-to-analog converters.



Anding Zhu (Senior Member, IEEE) received the Ph.D. degree in electronic engineering from University College Dublin (UCD), Dublin, Ireland, in 2004.

He is currently a Professor with the School of Electrical and Electronic Engineering, UCD. He has published more than 130 peer-reviewed journal and conference articles. His research interests include high-frequency nonlinear system modeling and device characterization techniques, high-efficiency power amplifier design, wireless transmitter architectures, digital signal processing, and nonlinear system identification algorithms.

Prof. Zhu is also an Elected Member of Microwave Theory and Techniques Society (MTT-S) Administrative Committee (AdCom), the Chair of the Electronic Information Committee, and the Vice Chair of the Publications Committee. He is also the Chair of the MTT-S Microwave High-Power Techniques Committee. He served as the Secretary for MTT-S AdCom in 2018. He was the General Chair of the 2018 IEEE MTT-S International Microwave Workshop Series on 5G Hardware and System Technologies (IMWS-5G) and a Guest Editor of the IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES on 5G Hardware and System Technologies. He is also a Track Editor of the IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES and an Associate Editor of the *IEEE Microwave Magazine*.



**Peng Chen** (Member, IEEE) received the B.Sc. degree in electronics from the Huazhong University of Science and Technology, Wuhan, China, in 2012, the M.Sc. degree in microelectronics from TU Delft, Delft, The Netherlands, in 2014 (M.Sc. thesis done at the IMEC Holst Center in Eindhoven, The Netherlands), and the Ph.D. degree from University College Dublin, Ireland, in 2019. From 2017 to 2018, he was a visiting Research Assistant with the University of Macau. Since 2019, he has been a Post-Doctoral Researcher with Lund University. His research interests include time-domain data converters, frequency synthesizers, and wideband receivers.



**Robert Bogdan Staszewski** (Fellow, IEEE) was born in Bialystok, Poland. He received the B.Sc. (*summa cum laude*), M.Sc., and Ph.D. degrees in electrical engineering from The University of Texas at Dallas, Richardson, TX, USA, in 1991, 1992, and 2002, respectively.

From 1991 to 1995, he was with Alcatel Network Systems, Richardson, involved in SONET crossconnect systems for fiber optics communications. He joined Texas Instruments Incorporated, Dallas, TX, USA, in 1995, where he was an elected Distinguished Member of Technical Staff (limited to 2% of technical staff). From 1995 to 1999, he was involved in advanced CMOS read channel development for hard disk drives. In 1999, he co-started the Digital RF Processor (DRP) group within Texas Instruments with a mission to invent new digitally intensive approaches to traditional RF functions for integrated radios in deeply scaled CMOS technology. He was appointed as a CTO of the DRP group from 2007 to 2009. In 2009, he joined the Delft University of Technology, Delft, The Netherlands, where he currently holds a guest appointment of Full Professor (Antoni van Leeuwenhoek Hoogleraar). Since 2014, he has been a Full Professor with the University College Dublin (UCD), Dublin, Ireland. He is also a Co-Founder of a startup company, Equal1 Labs, with design centers located in Silicon Valley and Dublin, Ireland, aiming to produce single-chip CMOS quantum computers. He has authored or coauthored five books, seven book chapters, 130 journal and 200 conference publications, and holds 200 issued U.S. patents. His research interests include nanoscale CMOS architectures and circuits for frequency synthesizers, transmitters and receivers, and quantum computers.

Prof. Staszewski was a recipient of the 2012 IEEE Circuits and Systems Industrial Pioneer Award. In May 2019, he received the title of Professor from the President of the Republic of Poland. He was also the TPC Chair of the 2019 European Solid-State Circuits Conference (ESSCIRC), Krakow, Poland.



Jeffrey S. Walling (Senior Member, IEEE) received the B.S. degree from the University of South Florida, Tampa, in 2000, and the M.S. and Ph.D. degrees from the University of Washington, Seattle, in 2005 and 2008, respectively.

Prior to starting his graduate education, he was employed with Motorola Solutions, Plantation, FL, USA, working in cellular handset development. He interned for Intel, Hillsboro, from 2006 to 2007, and was a Post-Doctoral Research Associate with the University of Washington from 2008 to 2010.

He was an Assistant Professor with the ECE Department, Rutgers University, New Brunswick, NJ, USA, from 2011 to 2012, and then an Assistant Professor and later an Associate Professor with the ECE Department, The University of Utah, from 2012 to 2018. He was the Head of Group (RF Transceivers) in the Microelectronic Circuits Centre Ireland, Tyndall National Institute, in 2019. He is currently with Skyworks. His research interests include high-efficiency digital transmitter architectures and power amplifier design. He has authored more than 70 articles in peer-reviewed journals and refereed conferences. He received the award for Outstanding Teaching at The University of Utah in 2015, the Excellence in Teaching award from HKN at Rutgers University in 2012, the Best Paper Award from Mobicom in 2012, the Yang Award for outstanding graduate research from the EE Department, University of Washington, in 2008, an Intel Predoctoral Fellowship from 2007 to 2008, and the Analog Devices Outstanding Student Designer Award in 2006.