

Received 17 March 2023, accepted 5 April 2023, date of publication 10 April 2023, date of current version 13 April 2023. Digital Object Identifier 10.1109/ACCESS.2023.3265814

## **RESEARCH ARTICLE**

## An Integrated Real-Time FMCW Radar Baseband Processor in 40-nm CMOS

# MOHAN GUO<sup>1</sup>, DIXIAN ZHAO<sup>1</sup>, (Senior Member, IEEE), QISONG WU<sup>1</sup>, (Member, IEEE), JIARUI WU<sup>2</sup>, DIWEI LI<sup>1</sup>, AND PENG ZHANG<sup>1</sup>

<sup>1</sup>School of Information Science and Engineering, Southeast University, Nanjing 211189, China
<sup>2</sup>Purple Mountain Laboratories, Nanjing 211111, China

Corresponding author: Dixian Zhao (dixian.zhao@seu.edu.cn)

This work was supported in part by the National Key Research and Development Program of China under Grant 2019YFB1803000, and in part by the Major Key Project of Peng Cheng Laboratory (PCL) under Grant PCL2021A01-2.

**ABSTRACT** In this paper, a pipelined frequency-modulated continuous-wave (FMCW) radar baseband processor for real-time applications is proposed and implemented in 40-nm CMOS technology. The FMCW radar signal processing time is analyzed according to the system specifications. On the basis of the theoretical analysis and systematic considerations, a pipelined baseband architecture with internal single-port static random access memory (SRAM) is employed. The baseband processor is mainly composed of two-dimensional fast Fourier transform (2D-FFT), two-dimensional constant false alarm rate (2D-CFAR), digital beam-forming (DBF), and memory control modules. The 2D-FFT module is structured with a pipelined scheme and avoids wasting time on data transfer between modules. The 2D-CFAR module is programmable for different applications, and a dedicated address control scheme is proposed to handle the edge cells. The processor occupies a core chip area of $3.353 \text{ mm} \times 3.353 \text{ mm}$ and has been tested on a personal computer (PC) and field programmable gate array (FPGA) platform. The power consumption and processing time are also analyzed and compared with other works. The processor consumes 55.65 mW, including SRAMs. The processing time is 12.67 ms with the maximum window size and 256 targets when operating at 125 MHz. This time is estimated on the assumption that each chirp lasts for 0.04096 ms and data input takes 10.48 ms, within which the range FFT is completed. The Doppler FFT, 2D-CFAR with the maximum window size, and DBF with 256 targets require 0.80 ms, 1.16 ms, and 0.23 ms, respectively.

**INDEX TERMS** Baseband processor, CFAR, DBF, FFT, FMCW radar.

## I. INTRODUCTION

Autonomous driving technology has attracted considerable attention from both academic and industrial research communities. It is critical for autonomous vehicles to perceive the surrounding environment and other moving entities to protect vulnerable road users. Autonomous vehicles acquire a semantic understanding of the scenes through various sensors to avoid unnecessary evasive/emergency brake maneuvers for harmless objects. The radar sensor is regarded as one of the key components in autonomous driving, owing to its robustness to low-light conditions and severe weather. The millimeter-wave (MMW) radar with linear frequency modulated continuous wave (LFMCW) has become a prevailing trend in autonomous driving applications since it can distinguish multiple targets owing to its high range resolution and can extract the object's motion characteristics for classification.

The associate editor coordinating the review of this manuscript and approving it for publication was Harikrishnan Ramiah.

Most radar algorithms are implemented on software or FPGA platforms, whose processing time is constrained by general-purpose processor resources and whose power consumption is high [1], [2], [3]. In the FMCW radar system, an FMCW waveform is transmitted from the transceiver, and the echo waveform, which is reflected from the illuminated targets, is converted into an intermediate frequency (IF) signal by mixing it with the transmitted template waveform [4], [5]. By applying digital signal processing to the IF signal, the location and velocity of the object can be detected [6]. The proper architecture and data flow need to be explored for real-time application. The FFT module, which is a widely used signal processing tool, is used to perform the signal analysis [7]. At the same time, maintaining a low false alarm probability is an essential goal in the FMCW radar system. The CFAR processor utilizes an adaptive threshold to detect the target within the clutter noise. Cell-averaging CFAR (CA-CFAR) is the most commonly used algorithm in a homogeneous environment, and many modified CFAR algorithms have been proposed to adapt to different environments. However, most of them increase the hardware complexity and processing time [8].

The radar signal processor implemented on FPGA in [9] brings forward a prototype that focuses on real-time processing. A miniaturized FMCW radar system is developed to implement short-range detection based on a high-performance DSP chip [10]. A miniature FMCW SAR system is implemented, which can realize real-time processing in 2.18 s [11]. The work in [12] realizes a processing architecture that fully utilizes the hardware resources and decreases the processing time by over 60 ms. However, the processing time of FPGA- and DSP-based solutions still limits their applications. Thus, it is necessary to develop a novel high-speed real-time processor.

The 2D target detection scheme in [13] utilizes two single-channel FMCW radars to calculate the 2D target profile. This solution can realize multi-track detection, but has high requirements for device and space adjustments. To realize multi-target 3D detection, the FMCW digital signal processor contains range-Doppler FFT, a peak detection module, and a direction detection module [14]. However, the partition of different functions and the connection with the required specifications have not been clearly stated.

FFT for multi-channel FMCW radar systems must be efficient and have high throughput [15]. A flexible computing unit optimized for FFT has been proposed in [16]. Conventional FFT processors mostly focus on the bit reversal circuit and the algorithm [17], [18], [19]. In the FMCW radar processor, the whole processing time should be less than the chirp interval in the front end, which is largely dependent on the speed of FFT data transmission and processing. The FFT processor also occupies the largest area in the whole system, which places an extra demand on area saving for this module.

A design of one-dimensional CFAR circuit is introduced in [20]. However, it is not suitable for two-dimensional matrix detection. 2D-CFAR has been proposed in [21] and it has a fixed window to realize fast processing. In order to adapt to different detection environments, flexible CFAR detection is needed.

ASIC implementation of the range, speed, and direction detection in one processor can consume less power and achieve higher speed, which is a potential trend for future radar systems. Taking the whole system's parameters and conventional computation capability into consideration, the entire FMCW baseband processor composed of 2D-FFT, 2D-CFAR, DBF, and internal memory is proposed and implemented on an integrated chip in this article. The FFT data flow is designed according to the proposed architecture. In order to fit different situations, the CFAR detection unit in this paper has a programmable window size and can realize pipelined processing. The detection of direction of arrival (DOA) by processing the FFT phase is limited by the number of antennas [22]. It is assumed that one transmitting antenna and four receiving antennas are utilized in this paper. In order to avoid multiplier units, digital beam forming based on the CORDIC algorithm is adopted.

FIGURE 1. IF signal after front-end processing.

This paper describes the design and implementation of a CMOS-based FMCW radar processor, with a focus on system considerations and chip development. Section II discusses the algorithms and the system considerations, and proposes the system architecture. Section III describes the detailed circuit implementation on the chip. Section IV presents the chip measurement results.

## II. BASEBAND PROCESSOR CONSIDERATIONS AND ARCHITECTURE

Fig. 1 shows the FMCW front-end processing scheme. In the FMCW radar system, the transceiver generates a chirp waveform with a wide bandwidth to acquire high range resolution. The echoed frequency-modulated wave is converted to an IF signal by the FMCW front-end [23]. The IF signals contain the range, velocity, and angle information and are processed by the baseband processor after ADC sampling. After passing through the mixer and the filter, the IF signal can be represented as:

$$\begin{aligned} S_{IF}(t) &= \frac{A_{RF} \times A_{LO}}{2} \times \left\{ \cos \left[ 2\pi \left( f_{RF} - f_{LO} \right) t + \left( \varphi_{RF} - \varphi_{LO} \right) \right] \right\} \\ &= \frac{A_{RF} \times A_{LO}}{2} \left[ \cos \left( 2\pi f_{IF} t + \Delta \varphi \right) \right] \end{aligned} \tag{1}$$

where $A_{RF}$, $f_{RF}$ and $\varphi_{RF}$ denote the amplitude, frequency, and phase of the received signal, respectively. $A_{LO}$, $f_{LO}$ and $\varphi_{LO}$ are the amplitude, frequency and phase of the transmitted waveform, $f_{IF}$ is the beat frequency related to the range of the observed targets, and $\Delta \varphi$ is the phase difference. The range of the target can be calculated according to the IF signal frequency as:

$$R = \frac{f_{IF} \cdot c}{2S} \tag{2}$$

where *R* denotes the range of the object, *S* the slope of the chirp, and *c* the speed of the electromagnetic wave. Thus, FFT can be utilized to extract the range from the beat frequency.

The movement of the object changes the round-trip delay between two consecutive received chirps, and this change appears in both the frequency and the phase of the IF signals. Although the frequency of the IF signal includes the velocity information, velocity estimation based on the IF spectrum is not feasible because of its low precision. The frequency difference is $\Delta f = 2S \Delta d/c$ and the phase difference is $\Delta \Phi = 4\pi \Delta d/\lambda$, where $\Delta d$ is the object displacement between two chirps and $\lambda$ is the wavelength. Taking a 76-77 GHz FMCW radar for example, if the object moves 1 mm between two chirps and a chirp lasts for 50 $\mu s$, there would be a 267 Hz frequency difference, which is difficult to recognize because it corresponds to only 0.013 cycle in a 50 $\mu s$ sweep time. However, the phase difference would be about $\pi$. Thus, the velocity can be calculated as:

$$V = \frac{\lambda \Delta \Phi}{4\pi T_c} \tag{3}$$

where  $T_c$  denotes the time between two chirps. The phase difference can be obtained by distinguishing the peak of Doppler FFT.
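As a rough numerical check of the argument above, the following sketch evaluates the frequency and phase differences for a 1 mm displacement between two chirps. The chirp slope of $4 \times 10^{13}$ Hz/s is an assumption (it is not stated in the text) chosen because it reproduces the quoted 267 Hz, and the wavelength is taken at an assumed mid-band carrier of 76.5 GHz.

```python
import math

C = 3.0e8            # speed of the electromagnetic wave (m/s)
F_CARRIER = 76.5e9   # assumed mid-band carrier frequency (Hz)
SLOPE = 4.0e13       # assumed chirp slope S (Hz/s); reproduces the quoted 267 Hz
T_C = 50e-6          # time between two chirps (s)
DELTA_D = 1e-3       # object displacement between two chirps (m)

wavelength = C / F_CARRIER

# Frequency difference and the fraction of a cycle it spans in one sweep
delta_f = 2 * SLOPE * DELTA_D / C
cycles_per_sweep = delta_f * T_C

# Phase difference and the velocity recovered from it, as in (3)
delta_phi = 4 * math.pi * DELTA_D / wavelength
velocity = wavelength * delta_phi / (4 * math.pi * T_C)

print(f"delta_f = {delta_f:.1f} Hz ({cycles_per_sweep:.4f} cycle per sweep)")
print(f"delta_phi = {delta_phi / math.pi:.2f} * pi rad, velocity = {velocity:.1f} m/s")
```

Under these assumptions the frequency shift covers only about one hundredth of a cycle within a sweep, whereas the phase shift is close to $\pi$, which is why the phase is the more sensitive observable.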

There are some important specifications for FMCW radar, which are decided by the front end and should be taken into consideration when designing the processor. For the sake of description, a 76-77 GHz FMCW radar with $T_c = 50 \ \mu s$ is specified. The maximum detectable range is limited by the ADC and the chirp. According to [24], the maximum range decided by the waveform can be represented as $c \cdot T_c/2 = 7.5$ km. However, the ADC sampling rate also limits the detectable range to $F_s c/2S = F_s cT_c/2B = 18.75$ m, taking $F_s = 5$ MHz for example. Thus, the ADC sampling rate and pulse width should be specified according to the maximum detectable range.

The maximum detectable velocity is determined by the interval between the chirps $(T_c)$, and the velocity resolution is determined by the frame length $(T_f)$, as

$$v_{\max} = \frac{\lambda}{4T_c} \tag{4}$$

$$v_{\rm res} = \frac{\lambda}{2T_f} \tag{5}$$
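The range and velocity limits above can be evaluated with a few lines of code. This is a minimal sketch: the function names are hypothetical, the slope of $4 \times 10^{13}$ Hz/s is an assumed value that reproduces the quoted 18.75 m, the carrier is assumed to be 76.5 GHz, and the 50 ms frame length is the value used later in this section.

```python
C = 3.0e8  # speed of the electromagnetic wave (m/s)


def max_range_waveform(t_c):
    """Maximum range set by the chirp duration: c * T_c / 2."""
    return C * t_c / 2


def max_range_adc(f_s, slope):
    """Maximum range set by the ADC sampling rate: F_s * c / (2 * S)."""
    return f_s * C / (2 * slope)


def max_velocity(wavelength, t_c):
    """Maximum detectable velocity, as in (4)."""
    return wavelength / (4 * t_c)


def velocity_resolution(wavelength, t_f):
    """Velocity resolution, as in (5)."""
    return wavelength / (2 * t_f)


wavelength = C / 76.5e9                      # assumed mid-band carrier
r_wave = max_range_waveform(50e-6)           # 7500 m
r_adc = max_range_adc(5e6, 4.0e13)           # 18.75 m with the assumed slope
v_max = max_velocity(wavelength, 50e-6)      # about 19.6 m/s
v_res = velocity_resolution(wavelength, 50e-3)  # about 0.04 m/s
```

The ADC-limited range is far smaller than the waveform-limited range in this example, so the sampling rate is the binding constraint.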

In the FMCW system, one key challenge to realizing real-time processing is arranging the soft-core and hard-core processing within the limitations of the front-end transceiver. The object range information can be extracted in one chirp. However, the Doppler FFT must be implemented on multiple chirps to realize velocity estimation. Thus, the dataset with multiple chirps is required to be stored before the 2D FFT operation. Six single-port SRAMs are used in this design. The whole transmission time is $T_f$, which means the signal processing time must be less than $T_f$ to refresh the data. $T_f$ is decided by the velocity resolution and bandwidth as mentioned above. Taking the 76-77 GHz FMCW radar system for example, the transmission time is 50 ms with 0.04 m/s velocity resolution. Thus, the whole processing must be completed within 50 ms.

FIGURE 2. CFAR detection window.

FIGURE 3. Receiving signals from an angle.

To detect the power peak after FFT, the CFAR algorithm is used, as shown in Fig. 2. In this work, 2D-CFAR is utilized to process the 2D FFT matrices. 2D-CFAR aims to maintain a constant false alarm rate while recognizing the peaks in the 2D-FFT matrices. The cell under test (CUT) is compared with the window sum to evaluate the amplitude. Guard units are designed to avoid the influence of the target shape. CA-CFAR is applied in the proposed system to adapt to general applications. It sums the window units for comparison with the CUT and is suitable for a homogeneous environment.

In this article, one antenna is used for transmitting, while four antennas are used for receiving, as shown in Fig. 3. The spacing *d* between the receiving antennas is 0.002 m in the figure (half of the wavelength in a 76-77 GHz radar). In this work, DBF is utilized to estimate the angle by extracting the corresponding phases of the detected target across the four receivers.

The data processing flow is shown in Fig. 4. To perform high-speed processing and satisfy the refresh time, a specific pipelined FMCW radar baseband processor architecture is proposed. The baseband processor performs FFT on the IF signals to obtain the range-chirp spectrum. The spectrum is then Fourier transformed again to obtain the Doppler spectrum. The 2D-FFT matrices are stored in the SRAMs. 2D-CFAR is implemented to acquire the distance and velocity of the target in the spectrum matrix. The data of the four channels streaming into the baseband are sampled, processed, and stored separately in four single-port SRAMs. The targets are recognized in the four SRAMs and the addresses are read into the DBF module to measure the angles.

FIGURE 4. Baseband processor data flow.

FIGURE 5. Baseband processor architecture.
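The range-Doppler data flow described above can be illustrated in floating point with NumPy. This is only a sketch under stated assumptions, not the fixed-point hardware: `synth_frame`, `range_doppler_map`, and `peak_bin` are hypothetical helper names, the ADC samples are assumed to be real-valued, and keeping 128 of the 256 range bins is an assumption made so that one channel fits a $128 \times 128$ matrix.

```python
import numpy as np

N_SAMPLES = 256  # samples per chirp (range FFT length quoted for this design)
N_CHIRPS = 128   # chirps per frame (Doppler FFT length quoted for this design)


def synth_frame(range_bin, doppler_bin):
    """Hypothetical noise-free IF frame of one point target for one channel."""
    n = np.arange(N_SAMPLES)           # fast-time sample index
    m = np.arange(N_CHIRPS)[:, None]   # slow-time chirp index
    phase = 2 * np.pi * (range_bin * n / N_SAMPLES + doppler_bin * m / N_CHIRPS)
    return np.cos(phase)               # real samples, shape (N_CHIRPS, N_SAMPLES)


def range_doppler_map(frame):
    """Windowed range FFT per chirp, then Doppler FFT per range bin."""
    windowed = frame * np.hamming(N_SAMPLES)
    # Assumed: only the positive-frequency half of the real-input spectrum is kept.
    range_fft = np.fft.fft(windowed, axis=1)[:, : N_SAMPLES // 2]
    doppler_fft = np.fft.fft(range_fft, axis=0)
    return np.abs(doppler_fft) ** 2    # squared magnitude, shape (128, 128)


def peak_bin(power):
    """Return (doppler_bin, range_bin) of the strongest cell as plain ints."""
    d, r = np.unravel_index(np.argmax(power), power.shape)
    return int(d), int(r)


rd_map = range_doppler_map(synth_frame(range_bin=40, doppler_bin=10))
print(peak_bin(rd_map))
```

For a single noise-free target the strongest cell sits at the target's Doppler and range bins; in the processor this peak search is performed by the 2D-CFAR module rather than by a global maximum.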

The baseband architecture is shown in Fig. 5. Because single-port SRAMs are used, the stored data cannot be read and written simultaneously, and the data processing speed is limited by the four SRAMs. To realize a pipelined data flow, the proposed architecture is memory-based. The range FFT data of the four channels are stored in four SRAMs and are overwritten by the Doppler FFT data once the 2D-FFT is completed, which saves half of the storage. A valid signal is then sent to the 2D-CFAR module to start the processing. The 2D-CFAR module is composed of seven 1D-CFAR modules. It reads data from the four FFT SRAMs and stores the addresses of objects in a separate CFAR SRAM. Then DBF can access the CFAR SRAM to get the location of the targets. The angle can be calculated by processing the data from the FFT SRAMs at the targeted location. This architecture optimizes memory usage and allows for efficient real-time processing of the data stream.

With the internal SRAMs, the baseband processor integrates the entire data processing part and can complete the processing within the RF refreshing time of 50 ms to realize real-time processing.

## III. CHIP IMPLEMENTATION

### A. TWO DIMENSIONAL FAST FOURIER TRANSFORM

A concise overview of the 2D-FFT circuit is shown in Fig. 6. The 2D-FFT module can be divided into two parts: the FFT controller and the FFT calculator. The FFT controller communicates with other modules and implements the address mapping to realize the radix-2 FFT. The 10-bit data from the ADC are windowed by a Hamming window (ham0 to ham255) and multiplied according to the pipelined data flow. The Hamming window coefficients are stored in registers for quick and frequent access. The ADDR\_TRA block in the figure represents the address transform part and realizes the FFT address mapping only by connecting wires, which consumes a smaller area and has less latency. The 32-bit values avg\_ram0 to avg\_ram3 are the average values of the four 2D-FFT matrices, obtained by summing up all the data and truncating. They are transmitted to the 2D-CFAR module to compensate for the edge cells. Meanwhile, the clock-gating technique is utilized in this design to reduce power consumption. For instance, the operation registers in the multipliers for windowing are clock-gated by the input enable signal from the ADC in the range FFT control module.

Fig. 7 shows the memory access schedule. As four SRAMs are utilized for storing the FFT data, the FFT operation needs to be carried out on four matrices. Each SRAM can be read or written by only one module at a time, so the read and write operations of the different SRAMs are staggered and proceed in parallel. The pipelined schedule in the 2D-FFT controller saves a lot of time in memory access operations.

The FFT calculator is shown in Fig. 8. The twiddle factors are stored in the registers. The FFT calculator is composed of 128 butterfly units in the proposed system. In this article, 256 points are sampled for the range FFT and 128 chirps are used for the Doppler FFT. To calculate a 256-point FFT, 8 stages with 128 butterfly calculator units are needed. The proposed butterfly unit is implemented with 4 real multipliers utilizing the Booth algorithm. Since the FFT block operates on fixed-point numbers with a word length of 48 bits (24-bit real part and 24-bit imaginary part), the multiplier input widths are 24 bits and 12 bits, and the output width is 36 bits. The output of every butterfly unit has a length of 37 bits and is truncated to 24 bits by rounding toward zero. At the last stage, the 24-bit output data are truncated to 16 bits for the real part and the imaginary part, respectively. The address map module is reused and fits different stages. The register file stores the original data from the ADC and is refreshed after every stage. Thus, 7/8 fewer registers are used than in traditional module reuse.

FIGURE 6. FFT Implementation circuit.

FIGURE 7. FFT timing schedule (signals: CLK, RAM0 to RAM3 with READ and WRITE phases, and PROCESS).

FIGURE 8. FFT calculator and register file.

TABLE 1. CFAR window calculating.

| Back Window | Detection Unit | Front Window |
|---|---|---|
| $a_0 = x_0 + x_1 + \dots + x_{l-1}$ | $x_{l+g}$ | $a_{l+2g+1} = x_{l+2g+1} + x_{l+2g+2} + \dots + x_{2l+2g}$ |
| $a_1 = a_0 - x_0 + x_l$ | $x_{l+g+1}$ | $a_{l+2g+2} = a_{l+2g+1} - x_{l+2g+1} + x_{2l+2g+1}$ |
| $\vdots$ | $\vdots$ | $\vdots$ |
| $a_{i-l-g} = a_{i-l-g-1} - x_{i-l-g-1} + x_{i-g-1}$ | $x_i$ | $a_{i+g+1} = a_{i+g} - x_{i+g} + x_{i+l+g}$ |
| $\vdots$ | $\vdots$ | $\vdots$ |
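The stage and butterfly counts of the radix-2 FFT calculator can be checked with a small floating-point model. This is a sketch, assuming a decimation-in-time ordering with bit-reversed input addressing; the text does not state which radix-2 variant the chip uses, and the fixed-point truncation is not modelled. The function names are hypothetical.

```python
import numpy as np


def bit_reverse_indices(n):
    """Bit-reversed address for every index 0..n-1 (n must be a power of two)."""
    bits = n.bit_length() - 1
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]


def radix2_fft(x):
    """Iterative radix-2 FFT; returns (spectrum, stages, butterflies per stage)."""
    n = len(x)
    if n < 2 or (n & (n - 1)) != 0:
        raise ValueError("length must be a power of two")
    data = np.asarray(x, dtype=complex)[bit_reverse_indices(n)]
    stages = n.bit_length() - 1
    butterflies = 0
    half = 1
    for _ in range(stages):
        twiddles = np.exp(-2j * np.pi * np.arange(half) / (2 * half))
        butterflies = 0
        for start in range(0, n, 2 * half):
            for k in range(half):
                a = data[start + k]
                b = data[start + k + half] * twiddles[k]  # one complex multiply
                data[start + k] = a + b
                data[start + k + half] = a - b
                butterflies += 1
        half *= 2
    return data, stages, butterflies


rng = np.random.default_rng(0)
samples = rng.standard_normal(256) + 1j * rng.standard_normal(256)
spectrum, n_stages, n_butterflies = radix2_fft(samples)
```

For 256 points the model runs 8 stages of 128 butterflies each, matching the counts given for the FFT calculator. The bit-reversed reordering is a fixed permutation of address bits, which is the kind of mapping that can be realized by wiring alone.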

### B. TWO DIMENSIONAL CONSTANT FALSE ALARM RATE

The 2D-CFAR module realizes peak recognition by summing up the magnitude of each cell and making comparisons. It reads 32 bits of data from each of the four SRAMs, comprising a 16-bit imaginary part and a 16-bit real part. Two $16 \times 16$ multipliers are utilized to calculate the squared magnitude for every channel. The squared magnitudes of the four channels are then summed up and fed into the corresponding processing flow. Accordingly, the cell under test is also squared and scaled according to the window size before comparison. To be suitable for different shapes of objects and to have high reliability, the 2D-CFAR module in the proposed processor is composed of seven 1D-CFAR calculators.

VOLUME 11, 2023

TABLE 2. Programmable 2D-CFAR setting.

| CFAR Mode | Window | Guard Units |
|-----------|--------|-------------|
| 1         | 3×3    | 0×0         |
| 2         | 5×5    | 0×0         |
| 3         | 5×5    | 1×1         |
| 4         | 7×7    | 0×0         |
| 5         | 7×7    | 1×1         |
| 6         | 7×7    | 2×2         |

The 1D-CFAR processor is shown in Fig. 9. The 1D-CFAR architecture is arranged on the basis of memory accessing and realizes the pipelined processing. As shown in Table 1, the sum of the back window and front window in a row can be expressed by the preceding results and the specific units. l is the length of a row in the window and g is the guard length. Thus the pipelined data flow can be realized by utilizing the delayed output.
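The recurrences in Table 1 amount to a running window sum: each new sum is the previous one minus the cell that leaves the window plus the cell that enters it. A minimal software sketch follows; the function names are hypothetical and the pipelining itself is not modelled.

```python
def window_sums(x, l):
    """a[i] = x[i] + ... + x[i+l-1], built with a[i] = a[i-1] - x[i-1] + x[i+l-1]."""
    sums = [sum(x[:l])]
    for i in range(1, len(x) - l + 1):
        sums.append(sums[-1] - x[i - 1] + x[i + l - 1])
    return sums


def cfar_1d_row(x, l, g):
    """For every cell under test x[i] with full windows on both sides,
    return (i, back window sum + front window sum)."""
    a = window_sums(x, l)
    results = []
    for i in range(l + g, len(x) - l - g):
        back = a[i - l - g]     # x[i-l-g] ... x[i-g-1]
        front = a[i + g + 1]    # x[i+g+1] ... x[i+g+l]
        results.append((i, back + front))
    return results


row = list(range(1, 13))
print(cfar_1d_row(row, l=3, g=1))
```

Each output needs only one subtraction and one addition per window regardless of the window length, which is what allows one result per cycle once the delayed sums are available.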

There are many application scenarios for FMCW radar. To be applied in different environments, the 2D-CFAR processing must take into account the size of the object and the noise level. The 2D-CFAR processor proposed in this system has several settings for different situations, which are shown in Table 2. The threshold coefficient *k* can be adjusted from 2 to 256, with 16 being the default value for normal situations.
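A software model of a mode-selectable 2D CA-CFAR can make the settings concrete. The sketch below makes several assumptions that the text does not confirm: a guard setting of g×g is read as g guard cells on each side of the cell under test, the comparison is taken as CUT × (number of window cells) > k × (window sum), and cells outside the matrix are replaced by the matrix mean. The name `ca_cfar_2d` is hypothetical, and the summation over four channels is omitted.

```python
import numpy as np

# (window size, guard size) per mode, copied from the programmable settings table
CFAR_MODES = {1: (3, 0), 2: (5, 0), 3: (5, 1), 4: (7, 0), 5: (7, 1), 6: (7, 2)}


def ca_cfar_2d(power, mode=3, k=16):
    """Return the (row, col) cells whose power exceeds the CA-CFAR threshold."""
    win, guard = CFAR_MODES[mode]
    half = win // 2
    # Assumed edge handling: missing cells take the average level of the matrix.
    padded = np.pad(power, half, mode="constant", constant_values=power.mean())
    n_train = win * win - (2 * guard + 1) ** 2   # window cells actually summed
    detections = []
    rows, cols = power.shape
    for r in range(rows):
        for c in range(cols):
            block = padded[r:r + win, c:c + win]
            inner = block[half - guard:half + guard + 1,
                          half - guard:half + guard + 1]  # guard cells plus CUT
            train_sum = block.sum() - inner.sum()
            # Scale the CUT by the window size instead of dividing the sum.
            if power[r, c] * n_train > k * train_sum:
                detections.append((r, c))
    return detections


noise_floor = np.ones((16, 16))
noise_floor[8, 8] = 1000.0   # one strong hypothetical target
print(ca_cfar_2d(noise_floor, mode=3, k=16))
```

Scaling the cell under test rather than averaging the window avoids a divider, which is consistent with the statement that the cell under test is scaled according to the window size before comparison.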

The programmable CFAR circuit is shown in Fig. 10. To realize the flexible 2D-CFAR, seven 1D-CFAR modules are designed for the maximum window size. Only half of the control signals are represented in the figure due to the symmetry of the window. Among them, $sel_0$ to $sel_3$ are the enable signals for different 1D-CFAR units, $l_0$ to $l_3$ are the lengths of the rows, $g_0$ to $g_3$ are the numbers of guard units, and $en_0$ to $en_3$ control the calculation of the center unit. Since the background length *l* and guard length *g* can be selected, the seven 1D-CFAR modules can form the different detection settings listed above. The 2D-CFAR window size can be set by enabling different 1D-CFAR modules.

The reading control circuit is shown in Fig. 11. In the figure, *cnt\_read\_0* to *cnt\_read\_2* are the 1D-CFAR reading control signals, and *cnt\_read* the reading enable signals for the 1D-CFAR units. The different bits in the *cnt\_read* enable the address counter and reading schedule in different 1D-CFAR units. Thus, the data can be accessed in sequence in different modes.

When it comes to the reading address regulation, the start addresses for the starting rows in the window and the void addresses are managed, as shown in Fig. 12. Among the signals, *cnt\_begin\_0* to *cnt\_begin\_3* are the starting addresses for the first row to the fourth row of the window, and *zero\_num\_4* to *zero\_num\_6* are the disabled counting numbers for the fifth row to the seventh row of the window. The first few rows of the window are empty when the process starts. The reading address is enabled when the counting ends and starts from the appointed number for different rows. Thus, the RAM access schedule is managed. The data are transferred to the specific 1D-CFAR modules in sequence according to the designed counter to fit the window size, and thus pipelined processing is realized as the window moves. To improve accuracy, each empty unit is filled with the average value in the SRAM as the average noise level.

FIGURE 10. The circuit implements different modes.

The result of the 2D-CFAR module is the location of the targets in the 2D-FFT matrices, which has a word length of 16 bits. The target locations are written into the $1024 \times 16$ SRAM, where at most 1024 targets can be stored.

### C. DIGITAL BEAM FORMING

The DBF module is designed to obtain the angle of the object. It reads the 16-bit target address from the CFAR SRAM and fetches the corresponding data from the FFT SRAMs. With the four receiving antennas in the system, the angle can be calculated by processing the phase difference $\omega$ between adjacent antennas.

$$\omega = \frac{2\pi d \sin \theta}{\lambda} \tag{6}$$

$$\theta \in \left[ -\frac{\pi}{2}, \frac{\pi}{2} \right] \Rightarrow \omega \in [-\pi, \pi] \tag{7}$$

With compensations in phase corresponding to the angle, the sum of the four signals S reaches its maximum.

$$\omega_{0} = \{\omega \mid \max\{|S|\}\} = \left\{\omega \mid \max\{|S_{1}+S_{2} \cdot e^{j\omega}+S_{3} \cdot e^{j2\omega}+S_{4} \cdot e^{j3\omega}|\}\right\} \tag{8}$$
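The search in (8) can be sketched as a scan over candidate angles. The following is a floating-point illustration only: the 180-step angle grid, the helper name `dbf_angle`, and the sign convention of the test signal (each antenna lagging the previous one by $\omega_0$) are assumptions, and the complex exponentials are computed directly rather than by CORDIC.

```python
import numpy as np


def dbf_angle(s, n_steps=180, d_over_lambda=0.5):
    """Scan candidate angles and return (theta, omega) that maximize |S| in (8).

    s: complex samples S1..S4 of one detected cell, one per receiving antenna.
    """
    s = np.asarray(s, dtype=complex)
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_steps, endpoint=False)
    omegas = 2 * np.pi * d_over_lambda * np.sin(thetas)   # phase step per antenna
    k = np.arange(len(s))                                 # antenna index 0..3
    steering = np.exp(1j * np.outer(omegas, k))           # e^{j k omega} per candidate
    amplitude = np.abs(steering @ s)                      # |S1 + S2 e^{jw} + ...|
    best = int(np.argmax(amplitude))
    return thetas[best], omegas[best]


# Hypothetical target at 30 degrees with half-wavelength antenna spacing
omega_true = np.pi * np.sin(np.deg2rad(30.0))
signals = np.exp(-1j * omega_true * np.arange(4))
theta_hat, omega_hat = dbf_angle(signals)
print(np.rad2deg(theta_hat))
```

With four antennas the amplitude peak is broad, so the angular grid mainly sets the output quantization rather than the achievable accuracy.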



FIGURE 11. The memory access circuit.

The multiplier is a basic unit with a large area and power consumption in a digital chip, and it can be avoided by adopting the CORDIC algorithm in DBF. As shown in Fig. 13, the phase compensation from $P_1$ to $P_n$ is realized by iterations. The angle moved in each iteration corresponds to a binary shift, as illustrated in the following equations. Thus, it is easy to implement in circuits without multipliers.

$$\begin{cases} I_2 = I_1 \cos \theta_1 - Q_1 \sin \theta_1 = \cos \theta_1 (I_1 - Q_1 \tan \theta_1) \\ Q_2 = Q_1 \cos \theta_1 + I_1 \sin \theta_1 = \cos \theta_1 (Q_1 + I_1 \tan \theta_1) \end{cases} \tag{9}$$

$$\tan \theta_i = 2^{-i} \tag{10}$$

As shown in the equations, the multiplication in DBF can be implemented by shifting and adding.

$$\begin{cases} I_n = I_1 \cos \theta - Q_1 \sin \theta = \prod_{\theta_i} \cos \theta_i \left( I_{n-1} - Q_{n-1} \times 2^{-i} \right) \\ Q_n = Q_1 \cos \theta + I_1 \sin \theta = \prod_{\theta_i} \cos \theta_i \left( Q_{n-1} + I_{n-1} \times 2^{-i} \right) \end{cases} \tag{11}$$
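The shift-and-add rotation can be sketched in software as follows. This is a floating-point model under stated assumptions: 12 iterations are chosen arbitrarily, the helper name `cordic_rotate` is hypothetical, and multiplying by `2.0 ** -n` stands in for the right shift by n bits that fixed-point hardware would use.

```python
import math

N_ITER = 12  # assumed iteration count; the chip's value is not given in the text
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]  # tan(theta_i) = 2^-i
GAIN = math.prod(math.cos(a) for a in ANGLES)           # product of cos(theta_i)


def cordic_rotate(i_in, q_in, angle):
    """Rotate (i_in, q_in) by angle in radians (|angle| below about 1.74)."""
    i_val, q_val, residual = float(i_in), float(q_in), angle
    for n in range(N_ITER):
        sign = 1.0 if residual >= 0 else -1.0   # rotate toward the remaining angle
        shift = 2.0 ** -n                       # a right shift by n bits in hardware
        i_val, q_val = (i_val - sign * q_val * shift,
                        q_val + sign * i_val * shift)
        residual -= sign * ANGLES[n]
    # The accumulated cosine product is a constant and is applied once at the end.
    return i_val * GAIN, q_val * GAIN


i_out, q_out = cordic_rotate(1.0, 0.0, 0.5)
print(i_out, q_out)  # close to (cos 0.5, sin 0.5)
```

Because every rotation uses the same number of iterations, the cosine product is the same constant for all candidate angles, so it does not change which candidate gives the maximum amplitude.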

After receiving the valid signal from the 2D-CFAR module, the DBF module reads the 16-bit location data in the CFAR SRAM. By reading the 32 bits of data (16-bit real part and 16-bit imaginary part) at these addresses from the four FFT SRAMs, the angle can be calculated. The DBF module is composed of a DBF controller and DBF calculators. The DBF controller manages the data stream and directs it to different calculators to realize pipelined processing. Each calculator evaluates the phase-compensated sum and obtains its amplitude, both by CORDIC rotation. The 8-bit angle corresponding to the maximum amplitude is then selected and stored in the DBF SRAM with a size of $1024 \times 8$.

FIGURE 12. The reading address controller.

FIGURE 13. Cordic algorithm.

The circuit avoids the usage of multiplier units and thus needs a smaller area and less power, as shown in Fig. 14. For a traditional DBF design with the same resolution, $540\ (180 \times 3)$ multiplier units would be needed to meet the same processing speed, which would occupy a much larger area.

FIGURE 14. DBF implementation circuit.

FIGURE 15. Area of different parts.

FIGURE 16. Testing platform.

## IV. MEASUREMENT RESULTS

The proposed FMCW radar processor is implemented on 40-nm CMOS with a 1.1 V and a 3.3 V voltage supply. The die size is  $3.353 \text{ mm} \times 3.353 \text{ mm}$ . The area distribution of this processor is shown in Fig. 15. 2D-FFT block occupies 65% of the area, which is the biggest module in this design. The six SRAMs account for 24% of the processor's area, followed by 2D-CFAR occupying 7% and DBF occupying 4%. The top-level and connections are excluded from the analysis due to their relatively small area.

There are 208 IO pads in total, including 151 signal pads, 16 pairs of 1.1 V power and ground pads, 12 pairs of 3.3 V power and ground pads, and a power on control (POC) pad. There are more signal pads than necessary for testing purposes. The power consumption when operating at 125 MHz is 55.65 mW, including four $16384 \times 32$ SRAMs for 2D-FFT, a $1024 \times 16$ SRAM for 2D-CFAR, and a $1024 \times 8$ SRAM for DBF.

The testing platform is shown in Fig. 16, together with the layout of the chip. While we were not able to perform online testing with a real front-end, we tested the processor with extensive simulations. A 76-77 GHz FMCW radar front-end is simulated in MATLAB to mimic the real world. By utilizing the tools in MATLAB, the data are validated at critical points according to the theory. Meanwhile, the chirp duration is also taken into consideration, as discussed in Section II. It should be noted that the results obtained from these simulations may have some limitations compared to testing with a real front-end. However, these simulations provide valuable insights into the processor's performance and serve as a solid foundation for future research and development. To verify the design and test the chip, the Xilinx VCU118 board is utilized to manage the data transmission. Several targets with specified ranges and velocities are simulated in a homogeneous environment with Gaussian noise. After the front-end simulation in MATLAB, the processed data are serially transmitted to the FPGA through the UART interface at 115200 bps and stored in the FPGA. By pressing the button on the FPGA board, the stored data are transmitted to the chip through an FMC connector. The FPGA also provides the 125 MHz system clock through the FMC connector. We reserved several pads in the chip for notification and control purposes. Hence, an indicator light turns on after each processing run and stays on until the button on the PCB is pressed. By pressing the sending button on the PCB after processing, the chip transfers the output data to the FIFO in the FPGA, and the FPGA sends the data to the PC through UART. The input pads for the modes and CFAR threshold coefficients are also reserved in this chip, and the switches on the PCB are connected to these pads to control the modes and coefficients.

**FIGURE 17.** (a) 13 targets were detected with mode 3, CFAR coefficient of 16. (b) 13 targets were detected with mode 5, CFAR coefficient of 16. (c) 13 targets were detected with mode 6, CFAR coefficient of 16.

**FIGURE 18.** Detection performance in different backgrounds.

**TABLE 3.** Performance comparison.

| Reference            | This work | [25]  | [2]                | [26]    |
|----------------------|-----------|-------|--------------------|---------|
| Technology           | ASIC      | MCU   | FPGA               | GPU     |
| Processing Time (ms) | 12.67     | ≥11.3 | 1180 per 64 images | 6133.49 |
| Power (mW)           | 55.65     | /     | 3000               | /       |
| Frequency (MHz)      | 125       | <120  | 120                | 1020    |
| Range and Velocity   | yes       | yes   | yes                | yes     |
| DOA                  | yes       | no    | no                 | no      |

The data are tested with different modes on the platform. A set of results is shown in Fig. 17, where the chip outputs for targets simulated in the homogeneous background are plotted. All targets are detected in these modes. Mode 5 and mode 6 produce more detection results per target because of their larger window sizes. With a larger window size and more guard units, the outer shape of a target that occupies more than one cell is also detected as a target. This calls for further processing such as clustering.
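The clustering step is not specified in this work. As one possible illustration, the sketch below groups detected cells that touch each other in the range-Doppler map (8-connectivity), so that several detections from one extended target collapse into a single cluster. The function name and the cell coordinates are hypothetical.

```python
def cluster_detections(cells):
    """Group (range_bin, doppler_bin) detections that are adjacent (8-connectivity)."""
    remaining = set(cells)
    clusters = []
    while remaining:
        seed = remaining.pop()
        stack = [seed]
        cluster = [seed]
        while stack:
            r, d = stack.pop()
            # Visit the eight neighbours of the current cell.
            for dr in (-1, 0, 1):
                for dd in (-1, 0, 1):
                    neighbour = (r + dr, d + dd)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        stack.append(neighbour)
                        cluster.append(neighbour)
        clusters.append(sorted(cluster))
    return sorted(clusters)

detections = [(10, 4), (10, 5), (11, 5), (30, 2)]
for cluster in cluster_detections(detections):
    print(cluster)
```

Each cluster can then be reduced to one report, for example its strongest cell, before angle estimation.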

The detection performance is compared in Fig. 18. The signal-to-noise ratio (SNR) is the ratio of target signal power to noise power. The detection performance is defined as the ratio of detected targets to real targets at a false alarm rate below 0.001. Mode 3, mode 5, and mode 6 achieve the best detection performance because of their guard units, especially when detecting large-scale targets. Mode 1 has the lowest detection rate because it misses large-scale targets; however, it retains the advantage of distinguishing adjacent objects. Smaller window sizes also reduce power consumption because fewer 1D-CFAR units are enabled. Thus, with appropriate settings, the processor can be applied to different circumstances.
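The role of guard units for large-scale targets can be illustrated with a generic one-dimensional cell-averaging CFAR. This is a sketch and not the 2D-CFAR implemented on the chip: the window sizes, the treatment of the coefficient as a multiplier on the mean training power, and the simple edge handling are all assumptions. The detection rate follows the definition above (detected targets over real targets).

```python
def ca_cfar_1d(power, n_train, n_guard, coeff):
    """Return indices whose power exceeds coeff times the mean training-cell power.

    n_train training cells and n_guard guard cells are taken on each side
    of the cell under test; cells outside the array are skipped.
    """
    hits = []
    n = len(power)
    for i in range(n):
        train = []
        for k in range(n_guard + 1, n_guard + n_train + 1):
            if i - k >= 0:
                train.append(power[i - k])
            if i + k < n:
                train.append(power[i + k])
        if train and power[i] > coeff * sum(train) / len(train):
            hits.append(i)
    return hits

def detection_rate(hits, true_cells):
    """Ratio of detected target cells to real target cells."""
    return len(set(hits) & set(true_cells)) / len(true_cells)

# A flat noise floor with one extended target spanning three cells.
power = [1.0] * 24
target_cells = [9, 10, 11]
for cell in target_cells:
    power[cell] = 40.0

no_guard = ca_cfar_1d(power, n_train=4, n_guard=0, coeff=16)
with_guard = ca_cfar_1d(power, n_train=4, n_guard=2, coeff=16)
print(detection_rate(no_guard, target_cells), detection_rate(with_guard, target_cells))
```

Without guard cells, the target's own energy leaks into the training window and raises the threshold above the target power. With two guard cells per side, the training window sees only noise and all three cells are detected.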

Compared with FPGA or software implementations, the processor presented in this article realizes a low-power real-time design, as shown in Table 3. The processing time of this work is 12.67 ms from the first data input to the last DBF output. This time is calculated on the assumption that every chirp lasts 0.04096 ms and that inputting the data takes 10.48 ms. The range FFT is completed within this period. The Doppler FFT, 2D-CFAR with the maximum window size, and DBF with 256 targets take 0.80 ms, 1.16 ms, and 0.23 ms, respectively. Because the input streaming time is constrained by the front end and the range FFT processing is fully hidden within the sampling time, the time required for pure digital processing is 2.19 ms. With this fast digital processor, the whole FMCW system can achieve real-time processing and a high refresh rate. The ASIC design in this work also consumes less power and occupies a smaller size than the other works, which could be an advantage in portable and lightweight radar systems.
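The timing figures quoted above can be checked with a few lines of arithmetic. The sketch uses only the numbers stated in the text; the function name and dictionary keys are illustrative, and the chirp count derived from the input time is an inference rather than a reported value.

```python
def timing_budget():
    """Reproduce the processing-time breakdown from the stated figures."""
    chirp_ms = 0.04096       # duration of one chirp
    input_ms = 10.48         # data input time (range FFT completes within it)
    doppler_fft_ms = 0.80
    cfar_ms = 1.16           # 2D-CFAR with the maximum window size
    dbf_ms = 0.23            # DBF with 256 targets
    clock_mhz = 125

    digital_ms = doppler_fft_ms + cfar_ms + dbf_ms
    return {
        "digital_ms": round(digital_ms, 2),
        "total_ms": round(input_ms + digital_ms, 2),
        # Clock cycles available per chirp at the system clock.
        "cycles_per_chirp": round(chirp_ms * 1e-3 * clock_mhz * 1e6),
        # Inferred, not stated in the text: chirps that fit in the input time.
        "approx_chirps": round(input_ms / chirp_ms),
    }

print(timing_budget())
```

The sum confirms that the 2.19 ms of pure digital processing plus the 10.48 ms of input streaming gives the reported 12.67 ms, so the front end, not the processor, dominates the frame time.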

## **V. CONCLUSION**

A pipelined FMCW baseband processor architecture is proposed and implemented in 40-nm CMOS for real-time low-power applications. The processor architecture with internal SRAM realizes a pipelined data flow. The 2D-FFT implementation achieves pipelined processing through scheduled access to on-chip memory and consumes less area by reusing the register files. The pipelined 2D-CFAR is designed to fit different situations: with the designed control circuit and memory access schedule, the window size can be adjusted to different circumstances. The DBF module calculates the angle from the data transmitted from the CFAR SRAM and FFT SRAMs. The power and processing time are substantially lower than those of former processors, which are mostly implemented on FPGAs. This work contributes a radar ASIC that addresses increasingly demanding high-speed, low-power requirements.

### REFERENCES

- [1] Y.-T. Im, J.-H. Lee, and S.-O. Park, "A pulse-Doppler and FMCW radar signal processor for surveillance," in *Proc. 3rd Int. Asia–Pacific Conf. Synth. Aperture Radar (APSAR)*, Sep. 2011, pp. 1–4.
- [2] C. J. Cochrane, K. B. Cooper, S. L. Durden, R. Rodriguez Monje, and R. J. Dengler, "An FPGA-based signal processor for FMCW Doppler radar and spectroscopy," *IEEE Trans. Geosci. Remote Sens.*, vol. 58, no. 8, pp. 5552–5563, Aug. 2020.
- [3] M. Ciesielski, K. Stasiak, M. Khyzhniak, K. Jedrzejewski, M. Zywek, and S. Brawata, "Simultaneous signal processing with multiple coherent processing intervals in FMCW radar for drone detection," in *Proc. 21st Int. Radar Symp. (IRS)*, Jun. 2021, pp. 1–8.
- [4] T. Ma, W. Deng, Z. Chen, J. Wu, W. Zheng, S. Wang, N. Qi, Y. Liu, and B. Chi, "A CMOS 76–81-GHz 2-TX 3-RX FMCW radar transceiver based on mixed-mode PLL chirp generator," *IEEE J. Solid-State Circuits*, vol. 55, no. 2, pp. 233–248, Feb. 2020.
- [5] W. Deng, R. Wu, Z. Chen, M. Ding, H. Jia, and B. Chi, "A 35-GHz TX and RX front end with high TX output power for Ka-band FMCW phased-array radar transceivers in CMOS technology," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 28, no. 10, pp. 2089–2098, Oct. 2020.
- [6] R. R. Monje, K. B. Cooper, R. J. Dengler, C. J. Cochrane, S. L. Durden, A. Tang, and M. Choukroun, "Long range-Doppler demonstration of a 95 GHz FMCW radar," in *Proc. 15th Eur. Radar Conf. (EuRAD)*, Sep. 2018, pp. 1429–1432.
- [7] M. Lehne and S. Raman, "A discrete-time FFT processor for ultrawideband OFDM wireless transceivers: Architecture and behavioral modeling," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 57, no. 11, pp. 3011–3022, Nov. 2010.
- [8] J. Abdullah and M. S. Kamal, "Multi-targets detection in a nonhomogeneous radar environment using modified CA-CFAR," in *Proc. IEEE Asia–Pacific Conf. Appl. Electromagn. (APACE)*, Nov. 2019, pp. 1–5.
- [9] J. Saad, A. Baghdadi, and F. Bodereau, "FPGA-based radar signal processing for automotive driver assistance system," in *Proc. IEEE/IFIP Int. Symp. Rapid Syst. Prototyping*, Jun. 2009, pp. 196–199.
- [10] Y.-B. Li, F. Su, H.-L. Peng, M.-M. Li, and W.-H. Li, "A miniaturized 60 GHz FMCW radar for short range and high precision detection," in *Proc. Int. Conf. Microw. Millim. Wave Technol. (ICMMT)*, Sep. 2020, pp. 1–3.

- [11] W. Chang and X. Li, "Miniature high resolution FMCW SAR system," in *Proc. CIE Int. Conf. Radar (RADAR)*, Oct. 2016, pp. 1–4.
- [12] H. P. Z. Tan, W. Tian, and C. Hu, "Rapid preprocessing of FMCW GB-SAR echo based on FPGA," in *Proc. IEEE Int. Conf. Imag. Syst. Techn. (IST)*, Oct. 2017, pp. 1–5.
- [13] S. Hamidi and S. Safavi-Naeini, "Single channel mmWave FMCW radar for 2D target localization," in *Proc. IEEE 19th Int. Symp. Antenna Technol. Appl. Electromagn. (ANTEM)*, Aug. 2021, pp. 1–2.
- [14] S. Kumar and A. A. B. Raj, "Design of X-band FMCW radar using digital Doppler processor," in *Proc. Int. Conf. Syst., Comput., Autom. Netw. (ICSCAN)*, Jul. 2021, pp. 1–5.
- [15] S.-J. Huang and S.-G. Chen, "A high-parallelism memory-based FFT processor with high SQNR and novel addressing scheme," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, May 2016, pp. 2671–2674.
- [16] T. Styles and L. Wildman, "An optimised processor for FMCW radar," in *Proc. 11th Eur. Radar Conf.*, Oct. 2014, pp. 497–500.
- [17] S.-G. Chen, S.-J. Huang, M. Garrido, and S.-J. Jou, "Continuous-flow parallel bit-reversal circuit for MDF and MDC FFT architectures," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 61, no. 10, pp. 2869–2877, Oct. 2014.
- [18] N. Le Ba and T. T.-H. Kim, "An area efficient 1024-point low power radix-2<sup>2</sup> FFT processor with feed-forward multiple delay commutators," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 65, no. 10, pp. 3291–3299, Oct. 2018.
- [19] B. G. Jo and M. H. Sunwoo, "New continuous-flow mixed-radix (CFMR) FFT processor using novel in-place strategy," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 52, no. 5, pp. 911–919, May 2005.
- [20] H. E. Benseddik, B. Cherki, M. Hamadouche, and A. Khouas, "FPGA-based real-time implementation of distributed system CA-CFAR and clutter MAP-CFAR with noncoherent integration for radar detection," in *Proc. Int. Conf. Multimedia Comput. Syst.*, May 2012, pp. 1093–1098.
- [21] J. Yan, X. Li, and Z. Shao, "Intelligent and fast two-dimensional CFAR procedure," in *Proc. IEEE Int. Conf. Commun. Problem-Solving (ICCP)*, Oct. 2015, pp. 461–463.
- [22] C.-W. Liu, J.-Y. Wu, and K.-C. Huang, "A low latency NN-based cyclic Jacobi EVD processor for DOA estimation in radar system," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, Oct. 2020, pp. 1–5.
- [23] T. Mitomo, N. Ono, H. Hoshino, Y. Yoshihara, O. Watanabe, and I. Seto, "A 77 GHz 90 nm CMOS transceiver for FMCW radar applications," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 928–937, Apr. 2010.
- [24] J. Lee, Y.-A. Li, M.-H. Hung, and S.-J. Huang, "A fully-integrated 77-GHz FMCW radar transceiver in 65-nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 45, no. 12, pp. 2746–2756, Dec. 2010.
- [25] D. Deiana, E. M. Suijker, R. J. Bolt, A. P. M. Maas, W. J. Vlothuizen, and A. S. Kossen, "Real time indoor presence detection with a novel radar on a chip," in *Proc. Int. Radar Conf.*, Oct. 2014, pp. 1–4.
- [26] R. S. Perdana, B. Sitohang, and A. B. Suksmono, "Radar signal processing in parallel on GPU: Case study dual polarization FMCW weather radar," in *Proc. Int. Conf. Electr. Eng. Informat. (ICEEI)*, Jul. 2019, pp. 657–661.



**MOHAN GUO** received the B.S. degree in communication engineering from the Nanjing University of Science and Technology, Nanjing, China, in 2020. She is currently pursuing the master's degree in electronic engineering with Southeast University, Nanjing. Her current research interests include ASIC designs and signal processing.




**DIXIAN ZHAO** (Senior Member, IEEE) received the B.Sc. degree in microelectronics from Fudan University, Shanghai, China, in 2006, the M.Sc. degree in microelectronics from the Delft University of Technology (TU Delft), Delft, The Netherlands, in 2009, and the Ph.D. degree in electrical engineering from the University of Leuven (KU Leuven), Leuven, Belgium, in 2015.

From 2005 to 2007, he was with the Auto-ID Laboratory, Shanghai, where he developed the nonvolatile memory for passive radio frequency identification (RFID) tags. From 2008 to 2009, he was with Philips Research, Eindhoven, The Netherlands, where he designed the 60-GHz beamforming transmitter for presence detection radar. From 2009 to 2010, he was a Research Assistant with TU Delft, working on the 94-GHz wideband receiver for imaging radar. From 2010 to 2015, he was a Research Associate with KU Leuven, where he developed several world-class 60-GHz and E-band transmitters and power amplifiers. In April 2015, he joined Southeast University, Nanjing, where he is currently a Full Professor. He has authored or coauthored more than 50 peer-reviewed journal articles and conference papers, one book, and two book chapters. His current research interests include millimeter-wave integrated circuits, transceivers and phased-array systems for 5G, satellite, radar, and wireless power transfer applications. He serves as a Technical Program Committee (TPC) Member or the Sub-Committee Chair for several conferences, including the IEEE European Solid-State Circuits Conference (ESSCIRC), IEEE Asian Solid-State Circuits Conference (A-SSCC), and IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA). He was a recipient of the Innovative and Entrepreneurial Talent of Jiangsu Province, in 2016, the IEEE Solid-State Circuits Society Predoctoral Achievement Award, in 2014, the Chinese Government Award for Outstanding Students Abroad, in 2013, and the Top-Talent Scholarships from TU Delft, in 2007 and 2008. He also serves as an Associate Editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I: REGULAR PAPERS.



**JIARUI WU** received the B.S. and M.S. degrees in electronic science and technology from the Dalian University of Technology, Dalian, Liaoning, China. Her research interests include IC designs on communication protocol and signal processing.



**DIWEI LI** received the B.E. degree in information engineering and the M.E. degree in communication and information systems from Southeast University, Nanjing, China, in 2018 and 2021, respectively. His research interests include FPGA algorithm optimization for 5G communication systems and high-speed energy-efficient baseband ASIC implementation for millimeter-wave integrated transceivers.



**QISONG WU** (Member, IEEE) received the Ph.D. degree from Xidian University, Xi'an, China, in 2010. He was a Postdoctoral Associate with Duke University, Durham, NC, USA, from 2010 to 2013, and with Villanova University, Villanova, PA, USA, from 2013 to 2015. He is currently an Associate Professor with the Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing, China. His research interests include radar signal processing, sonar signal processing, array signal processing, sparse Bayesian learning, through-the-wall radar imaging, and synthetic aperture radar imaging.



**PENG ZHANG** received the B.S. degree in automation from the Nanjing Institute of Technology, Nanjing, China, in 2016, and the M.S. degree in instrument engineering from Southeast University, Nanjing, in 2020, where he is currently pursuing the Ph.D. degree. His current research interests include millimeter-wave radar signal processing, digital circuit designs, verifications, and embedded development.

...