

Received March 26, 2018, accepted April 30, 2018, date of publication May 7, 2018, date of current version June 26, 2018. Digital Object Identifier 10.1109/ACCESS.2018.2833623

# **VLSI Architecture for Novel Hopping Discrete Fourier Transform Computation**

WEN-HO JUANG<sup>[D]</sup>, (Student Member, IEEE), SHIN-CHI LAI<sup>[D]</sup>, (Member, IEEE), CHING-HSING LUO<sup>1</sup>, (Member, IEEE), AND SHUENN-YUH LEE<sup>[D]</sup>, (Senior Member, IEEE) <sup>1</sup>Department of Electrical Engineering, National Cheng Kung University, Tainan City 70101, Taiwan

<sup>2</sup>Department of Computer Science and Information Engineering, Nanhua University, Chiayi County 62249, Taiwan

Corresponding author: Shin-Chi Lai (chingivan2008@gmail.com)

This work was supported in part by the Ministry of Science and Technology of the Republic of China under Grant 106-2221-E-343-002, Grant 106-2511-S-262-001, and Grant 104-2221-E-343-003.

**ABSTRACT** The hopping discrete Fourier transform (HDFT) is a new method applied for time-frequency spectral analysis of time-varying signals. In the implementation of HDFT algorithms, the updating vector transform (UVT) plays a key role, and therefore a novel recursive DFT-based UVT formula is introduced in the proposed design for a HDFT algorithm and its architecture. The perceived advantages can be summarized as: 1) the proposed algorithm reduces the number of multiplications, additions, and coefficients by 42.3%, 33.3%, and 50%, respectively, compared with Park's method under the settings of an M-sample complex input sequence (M = 256), and an N-point recursive DFT computation scheme (N = 64) for time hop L (L = 4); 2) by adopting the hardware-sharing scheme and the register-shifting concept, the proposed design only takes nine multipliers and 12 adders for realization. The proposed hardware accelerator can be implemented using a field-programmable gate array, which can operate at 48.33 MHz clock rate. The resource utilization of combinational logic lookup tables (LUTs) and digital signal processing (DSP) blocks reduced by 11.7% and 42.5% compared with Juang *et al.*'s work. For very-large-scale integration realizations, the proposed design would be more powerful than other existing algorithms in future applications focusing on DSP, filtering, and communications.

**INDEX TERMS** Hopping discrete Fourier transform (HDFT), sliding discrete Fourier transform (SDFT), recursive discrete Fourier transform (RDFT), recursive DFT-based UVT.

## I. INTRODUCTION

In the field of digital signal processing (DSP), the discrete Fourier transform (DFT) is one of the most important discrete orthogonal transformations. Fourier analysis entered many applications, e.g., wireless communication and spectral analysis [1]. In recent years, several low-complexity mathematical algorithms have been developed for discrete-time signal processing [2]. The so called sliding transforms, like the sliding DFT (SDFT) [3]–[5], are applied on a sample-by-sample basis to analyze the time-frequency spectrum of the desired signals. SDFT has been used in various applications, e.g., the adaptive power-line interference canceler [6], the phaselocked loop (PLL) [7], the variable sampling frequency-based adaptive PLL [8], and the three-phase harmonic component detector [9]. The SDFT algorithm can directly use the previous DFT results to iteratively compute each new output bin. Hence, the SDFT's computational complexity is lower than that for traditional DFT. In [3] and [4], several SDFT algorithms described that the standard SDFT only takes four real additions and four real multiplications for computing each spectral bin in next window. In addition, considering the potential problems about the instability of standard SDFT filter and its heavy computational complexity, the guaranteedstable SDFT and the sliding Goertzel DFT were proposed to overcome these critical issues. In [5], Duda presented the so called modulated SDFT (mSDFT), which resembles to the basic DFT. This relies on circular frequency shifting to maintain the precision and the stability of the Fourier sliding transform. However, mSDFT imposes heavy computational cost, due to its hopping style. These algorithms have to perform the DFT for all of the time indices, even though the DFT output needs to be calculated only every L (L > 0)samples. That is, the overlap of time-window (N) is decreased from (N-1) to (N-L) between two subsequent local windows. Compared with SDFT, the successive outputs of DFT have L interval-samples as shown in Fig. 1. In other words, the resolution of the time-frequency representation will be how affected with different samples of L. In practical application,



**FIGURE 1.** The relationship between SDFT and the successive DFT outputs with *L* interval samples, where *N* is 8 and *L* is 4. (a) Previous window. (b) Current window.

Park proposed the idea of time hopping to achieve less computational requirement SDFT algorithms, i.e., the generalized SDFT (gSDFT) [10] and the optimal SDFT (oSDFT) [11]. Based on the same concept, it can be combined with the adaptive DFT for performing synchrophasor measurements [12], [13]. On the other hand, the hopping DFT (HDFT) [14], [15] with the time hop is introduced to reduce the computational complexity. For the fast computation of the HDFT, Park proposed a radix-2 decimationin-time (DIT) updating vector transform (UVT) and a UVT-based HDFT algorithm in [14]. In [15], the DFT modulation property according to mSDFT [5] was adapted to HDFT. This helped reduce the cumulative round-off errors and to ensure the pole placed on the unit circle in the complex plane. According to [14] and [15], the HDFT computation can be combined with the sliding unit, the UVT unit, and the resonators unit. Among various kinds of units, the UVT has most complexity than other units. Additionally, the increase of time hops implies that the computational complexity and hardware cost are greatly increases. This is the major reason that a low-complexity and low-cost novel hopping DFT algorithm is needed to develop.

In this paper, we derive a novel recursive architecture for HDFT integrated with a recursive DFT-based (RDFTbased) UVT algorithm. The proposed UVT algorithm overcomes the shortcomings of Park's UVT filter: (i) Park's design has non-uniform architecture with arbitrary-time hopping computation; (ii) the UVT kernel of the proposed design guarantees a low-complexity and low-cost accelerator after employing a comb and a resonator cascade filter. Note that the resonator filter is implemented according to Lai's method [16]–[18].

The rest of the paper is organized as follows. The proposed algorithm is derived in detail in Section II. Hardware accelerator mapping is displayed in Section III. Section IV compares the performance of the proposed design with that of other existing approaches. Section V presents hardware implementations for the proposed methods. Finally, conclusions and remarks are provided.

## **II. DERIVATION OF THE PROPOSED ALGORITHM**

In this section, a HDFT is derived to be an alternative of the SDFT for the time-frequency analysis. The mathematical introduction and the properties of SDFT are given first, then an efficient and compact RDFT-based UVT design are proposed for low-complexity HDFT computation.

# A. HDFT ALGORITHM

The k-th spectral bin of an N-point DFT formula at the time index m is defined as

$$X_m[k] = \sum_{n=0}^{N-1} x \left[ \hat{m} + n \right] W_N^{-nk},$$
 (1)

where  $x[\hat{m} + n]$  and  $X_m[k]$  are, respectively, the time-domain and frequency-domain signals,  $\hat{m} = m - N + 1, k = 0, 1, \dots, N - 1$ , and the twiddle factor,  $W_N$ , is denoted by

$$W_N = e^{j2\pi/N}.$$
 (2)

As presented shown in [3] and [4], the relationship between  $X_m$  [k] and  $X_{m-1}$  [k] can be derived from

$$X_{m}[k] = \sum_{n=0}^{N-1} x \left[ \widehat{m} + n \right] W_{N}^{-nk}$$
  
=  $W_{N}^{k} \sum_{n=0}^{N-1} x \left[ \widehat{m} - 1 + n \right] W_{N}^{-nk}$   
+  $x \left[ \widehat{m} + N - 1 \right] W_{N}^{k} - x \left[ \widehat{m} - 1 \right] W_{N}^{k}$   
=  $(X_{m-1}[k] + x [m] - x [m - N]) W_{N}^{k}.$  (3)

Furthermore, the solution of this difference equation [see (3)] can be expressed by z-transform. The z-domain transfer function,  $H_{SDFT}(z)$ , for the SDFT filter is

$$H_{SDFT}(z) = \frac{(1 - z^{-N}) W_N^k}{1 - W_N^k z^{-1}}.$$
 (4)

This means that the SDFT algorithm can be implemented by a comb filter  $(1 - z^{-N})$  cascading with a resonator. When the output data rate of the spectral bin is the same as the input data rate, the first and the last samples between the current window and the pre-window in (3) can comply with the SDFT property. This means that SDFT could be treated as DFT with time hopping (L = 1). Similarly, for arbitrary-time hops, the HDFT relationship between  $X_m$  [k] and  $X_{m-L}$  [k] – as shown in [14] and [15] – is derived by recursively substituting  $X_m$  [k] into  $X_{m-L}$  [k] L times in (3). The resultant formula is given by

$$X_{m}[k] = \sum_{n=0}^{N-1} x \left[ \widehat{m} + n \right] W_{N}^{-nk}$$
$$= W_{N}^{Lk} \sum_{n=0}^{N-1} x \left[ \widehat{m} - L + n \right] W_{N}^{-nk}$$

$$+\sum_{n=N}^{N-1+L} x \left[ \widehat{m} + n - L \right] W_N^{-(n-L)k} -\sum_{n=0}^{L-1} x \left[ \widehat{m} + n - L \right] W_N^{-(n-L)k} = W_N^{Lk} \sum_{n=0}^{N-1} x \left[ \widehat{m} - L + n \right] W_N^{-nk} + \sum_{n=0}^{L-1} \left( x \left[ \widehat{m} + n + N - L \right] - x \left[ \widehat{m} + n - L \right] \right) W_N^{-(n-L)k} = \left( X_{m-L} \left[ k \right] + D_m^L \left[ k \right] \right) W_N^{Lk},$$
(5)

where  $D_m^L[k]$  is defined as the *k*-th bin of the *L*-point UVT substitute.

$$D_m^L[k] = \sum_{n=0}^{L-1} d[m-n] W_N^{(n-L+1)k},$$
(6)

where d[m] = x[m] - x[m - N] and k = 0, 1, ..., N - 1. Equation (6) can be therefore rewritten as

$$D_{m}^{L}[k] = W_{N}^{k} \left( \sum_{n=0}^{L} d [m-n] W_{N}^{-Lk} W_{N}^{nk} - d [m-L] \right)$$
$$= W_{N}^{k} \left( \sum_{n=0}^{L} \left( d [m-n] W_{N}^{-Lk} - d [m-n-L] \right) W_{N}^{nk} \right)$$
$$= W_{N}^{k} \left( \sum_{n=0}^{L} \widehat{d} [m-n] W_{N}^{nk} \right),$$
(7)

where

$$\widehat{d}[m] = d[m] W_N^{-Lk} - d[m-L],$$

and

$$d\left[m-n-L\right] = 0|_{n\neq 0}.$$

After expanding the sigma function of the UVT filter in (7), we obtain a recursive formula as

$$D_m^L[k] = W_N^k \left( \sum_{n=0}^L \widehat{d} [m-n] W_N^{nk} \right)$$
$$= \left( \left( \widehat{d} [m-L] W_N^k + \widehat{d} [m-L+1] \right) W_N^k + \dots + \widehat{d} [m] \right) W_N^k.$$
(8)

Hence, the kernel function of the difference equation is obtained as (9).

$$D[m,k] = \left(D[m-1,k] + \widehat{d}[m]\right) W_N^k, \tag{9}$$

where D[-1, k] = 0. Here, the z-domain transfer function,  $H_D(z)$ , is derived from the recursive difference equation (9).

$$H_D(z) = \frac{W_N^k}{1 - W_N^k z^{-1}}.$$
 (10)

VOLUME 6, 2018

30493



FIGURE 2. The novel hopping DFT structure with recursive computation.

According to (5) and (10), the proposed novel HDFT structure with a recursive kernel is depicted in Fig. 2.

#### B. NOVEL RDFT-BASED UVT FORMULA

Since equation (8) is similar to the derivation of RDFT in [16]–[18], the function, D[L, k], can be derived as

$$D_m^L[k] = D[L-1,k].$$
(11)

Via (8) and (9), and taking the result of  $D_m^L[k]$  at time n = L - 1, the coefficient of UVT can be obtained by *L* iterations. From [16]–[18], the transfer function  $H_D(z)$  can be derived as

$$H_D(z) = \frac{W_N^k}{1 - W_N^k z^{-1}}$$
  
=  $\frac{W_N^k \left(1 - W_N^{-k} z^{-1}\right)}{\left(1 - W_N^k z^{-1}\right) \left(1 - W_N^{-k} z^{-1}\right)}$   
=  $\frac{W_N^k - z^{-1}}{1 - 2\cos\theta_k z^{-1} + z^{-2}}$   
=  $\frac{\left(\cos\theta_k - z^{-1}\right) + j\sin\theta_k}{1 - \left(2\cos\theta_k - z^{-1}\right)z^{-1}},$  (12)

where  $\theta_k = 2\pi k/N$ . Now, the proposed architecture design for the RDFT-based UVT algorithm is shown in Fig. 3.

## **III. VLSI ARCHITECTURE DESIGN**

In the proposed design, the key indices in terms of speed, area, and power are all considered before starting VLSI implementation. In this section, we discuss how to use well-known design techniques to improve chip speed, to reduce chip area, and to achieve area efficiency for the proposed architecture.

For the purpose of explaining, the nodes of graph are named, e.g.,  $A_m$ ,  $B_m$ , and  $C_m$ , and are all tagged in Fig. 3. Besides, to obtain the real and the imaginary parts of the UVT filter or for a complex-number output, the functions are also built, as (13) and (14).

$$\mathfrak{R}_{m} = \operatorname{Re}\left\{D_{m}^{L}[k]\right\}$$
$$= \operatorname{Re}\left\{B_{m}\right\} - \operatorname{Re}\left\{C_{m}\right\} - \operatorname{Im}\left\{A_{m}\right\} \times \sin\theta_{k}, \quad (13)$$

Function 1 Compute the Real Part of a Complex NumberInput: The results of the UVT filter,  $D_m^L[k]$ Output: The real part of the UVT filter,  $\mathfrak{R}_m$ 

1.  $\dot{D}_m^L[k] = \text{conjugate} \left( D_m^L[k] \right)$ 

2.  $\Re_m = \left( D_m^L[k] + \dot{D}_m^L[k] \right) / 2$ 

3. return  $\Re_m$ 

Function 2 Compute the Imaginary Part of a Complex Number

**Input:** The results of the UVT filter,  $D_m^L[k]$ **Output:** The imaginary part of the UVT filter,  $\Im_m$ 

- 1.  $\dot{D}_m^L[k] = \text{conjugate}\left(D_m^L[k]\right)$
- 2.  $\Im_m = \left(D_m^L[k] \dot{D}_m^L[k]\right)/2j$
- 3. return  $\mathfrak{I}_m$

TABLE 1. Latency analysis in various units for the proposed architecture.

| Unit           | Path Latency                |                           | Worst                     |  |
|----------------|-----------------------------|---------------------------|---------------------------|--|
| Sliding        | In. to Out. 1T <sub>A</sub> |                           | $1T_A$                    |  |
| RDFT-based UVT | In. to Out. through $A_m$   | $2T_{M} + 4T_{A}$         | $2T_{M} + 5T_{A}$         |  |
|                | In. to Out. through $B_m$   | $2T_{\rm M} + 5T_{\rm A}$ |                           |  |
| Resonator      | In. to Out.                 | $1T_{\rm M} + 2T_{\rm A}$ | $1T_{\rm M} + 2T_{\rm A}$ |  |
| Total          |                             |                           | $3T_{\rm M} + 8T_{\rm A}$ |  |

$$\Im_m = \operatorname{Im} \left\{ D_m^L[k] \right\}$$
  
= Im {B<sub>m</sub>} - Im {C<sub>m</sub>} + Re {A<sub>m</sub>} × sin \(\theta\_k, \)(14)

where the subscript *m* is a time index, and the pseudo-code for  $Re\{\bullet\}$  and  $Im\{\bullet\}$  are shown in Function 1 and 2, respectively.

## A. LATENCY ANALYSIS AND IMPROVEMENT

The latency between the input and the output of a path can be calculated in various units [see Fig. 2 and Fig. 3]. The results are listed in Table 1. The critical path of the proposed



FIGURE 3. Proposed novel RDFT-based UVT design.



FIGURE 4. Proposed low-critical-path HDFT design.

algorithm consists of  $3T_M + 8T_A$ , where  $T_M$  is the time period of the real multiplier and  $T_A$  is the time period of the real adder.

Note that the time period of one complex multiplier is evaluated as  $1T_M + 1T_A$  under x[m] is a complex sequence. After applying basic VLSI techniques and a re-timing scheme [19] to shorten the critical path on Fig. 3, the proposed lowcritical-path HDFT design is displayed in Fig. 4. There, the critical path is greatly shortened from  $8T_A + 3T_M$  into  $3T_A + 1T_M$ .

## **B. HARDWARE RESOURCE SHARING SCHEME**

For the property of HDFT [see (5) and (11), or Fig. 1], the output data rate of spectral bin decreased by a factor of L compared with SDFT. The hardware resource of the proposed architecture can be shared, namely, the real and the imaginary part calculation circuits can be interlaced as shown in the right side of Fig. 4. The control flow and the plan of calculations are described on a flow diagram (FD) in Fig. 5. The FD is divided into five procedures:

• *Proc. 1*: Calculate the real part of the UVT filter, and then accumulate the results from the previous DFT. The values of  $Re\{A_m\}$ ,  $Im\{B_m\}$ , and  $Im\{C_m\}$  need to be stored for use in *Proc. 2*.

- *Proc.* 2: Similar to *Proc.* 1, but calculate the imaginary part of the UVT filter and accumulate the results from the previous DFT.
- *Proc. 3*: Calculate the real part of the new DFT with *L*-samples.
- *Proc.* 4: Calculate the imaginary part of the new DFT with *L*-samples.
- *Proc.* 5: Update temporary variables Var\_III, Var\_VI, and Var\_VII. The variables are realized using a D-type flip-flop for hardware implementation.

Fig. 6 (a) displays an example of the proposed hardware sharing scheme based on the above FD. The number of time hops (L) is set to two. The timing diagram of the overall architecture is displayed in Fig. 6 (b).

As mentioned above, the results show that the proposed method not only shortens the latency, but also uses the hardware sharing scheme to reduce the cost of the area. Moreover, the power consumption can be saved, simultaneously.

#### **IV. DISCUSSION AND COMPARISON**

For the hopping issue of various HDFT algorithms, the performance metrics in terms of numbers of multiplication, addition, and coefficient are evaluated in Table 2. For performance metrics, we set the transformation length (N) of



FIGURE 5. Flow diagram of the control and the plan of calculations.

DFT to 64, the time-domain frame size (M) to 256, the number of time hops (L) to 4, and the input signal to complex numbers for *k*-th spectral bin of the single-bin. Note that one complex multiplication involves four real multipli-

cations and two real additions. The evaluation results show that although Wang's [15] method is more accurate than Park's [14], it has greater computational complexity, i.e., 3,840 multiplications and 4,096 additions. In this paper, the



FIGURE 6. Hardware design for proposed HDFT structure. (a) Hardware architecture design with sharing scheme. (b) Timing diagram of the hardware sharing scheme.

TABLE 2. Computational complexity comparison of N-point HDFT algorithms for hopping applications.

| Metrics                                             | 2014<br>[14]             | 2015<br>[15]                              | Previous work<br>[20] | Proposed      |
|-----------------------------------------------------|--------------------------|-------------------------------------------|-----------------------|---------------|
| Multiplication                                      | 4M * (L - 1) + (4M / L)  | 4M * (L - 1) + (12M / L)                  | 8M + (4M / L)         | 6M + (6M / L) |
| Addition                                            | 4M * (2L - 1) + (4M / L) | 2M * (2L - 1) + (8M / L) $10M + (4M / L)$ |                       | 8M + (8M / L) |
| Coefficient                                         | 2L                       | 4 <i>N</i> +2 <i>L</i> -2                 | 4                     | 4             |
| Cri. Path (a)<br>(T <sub>A</sub> , T <sub>M</sub> ) | (L + 2, 2)               | (L + 3, 3)                                | (3, 1)                | (3, 1)        |
| Cpt. Cycle (b)                                      | М                        | М                                         | M+2                   | <i>M</i> +2   |
| Cpt. Time                                           | (a) * (b)                |                                           |                       |               |

RDFT-based computation with a recursive kernel for the UVT filter, and it not only takes fewer computational complexity than previous works [20] but also reduces the number of multiplications, additions, and coefficients by 42.3%,

33.3%, and 50%, respectively, compared with Park's [14] method.

Table 3 shows the comparison of the hardware resources for HDFT algorithms. Hardware resources include

| Metrics     | 2014<br>[14]   | 2015<br>[15]             | Pre-work [20] | Proposed   |
|-------------|----------------|--------------------------|---------------|------------|
| Multiplier  | 4L             | 4L + 8                   | 12            | 9          |
| Adder       | 4 <i>L</i> + 2 | 4L + 6                   | 14            | 12         |
| Register    | N+2L - 1       | <i>N</i> +3 <i>L</i> - 1 | N + 2L + 1    | N + 2L + 2 |
| Coefficient | 2L             | 2L                       | 4             | 4          |

#### TABLE 3. Hardware cost comparison of N-point HDFT architecture designs for hopping applications.

multipliers, adders, registers, and coefficient-ROM. Here, it should be noticed that one complex multiplier requires four real multipliers and two real adders for the hardware evaluation. The hardware need of the proposed design is lower than that of other designs [14], [15]. There are only nine real multipliers, 12 real adders, 74 registers, and four coefficients in our implementation. Compared with previous works [20], we saved 25% on multipliers and 14.3% on adders.

Overall, the advantages of the proposed design are low complexity and area efficiency, and it would be more suitable for real-time time-frequency spectral applications.

## **V. IMPLEMENTATION AND RESULTS**

#### A. FPGA IMPLEMENTATION

Before mapping HDFT algorithm into FPGA, a detailed and completed analysis is made to the word-length of each node for the proposed architecture through mathematical software, and then the available precision selected according to the simulation reports. We assume the test signal is influenced by complex Gaussian noise. It has zero-mean, and its variance equals one. First we feed the test signal in 16-format, until each node [see Fig. 4] is recorded at each time slot using fixed-point arithmetic in Q-format, as listed in Table 4. The node quality of recursive DFT is evaluated through the signalto-quantization-noise ratio (SQNR) as

$$\varepsilon_m = X_{mDFT}^k - X_{mHDFT}^k \tag{15}$$

$$SQNR = 10 \log \left( X_{mDFT}^{k2} / \varepsilon_m^2 \right), \tag{16}$$

where  $X_{mDFT}^k$  denotes the standard DFT bin output from the batch data of length *N*, and  $X_{mHDFT}^k$  stands for the recursive computations of the proposed algorithm. The simulation results are shown in Fig. 7, and the selected optimization set reports that the proposed hardware has approximate 100 dB SQNR as well as previous work [20] does, since we select the finite-word representation of fractional numbers are 24 bits and 27 bits on the coefficients of twiddle factors and internal each node, respectively.

For hardware implementation, the proposed architecture was coded by the Verilog-2001 hardware description language (HDL), and converted from HDL into an SRAM Object File (SOF) or Programmer Object File (POF) using Quartus II software. Table 5 displays the comparison between the proposed design and previous works [20] after FPGA implementations. We used Altera Cyclone IV, an FPGA device,

#### TABLE 4. Fixed point word-length analysis at each node of the proposed (low-critical-path) HDFT design.

| Node  | Word-length      |                  |  |
|-------|------------------|------------------|--|
|       | Format           | Total bits       |  |
| Ι     | Q1.15            | 16               |  |
| Π     | Q2.15            | 17               |  |
| III   | $Q2.\Delta L_1$  | $2 + \Delta L_1$ |  |
| IV    | Q7. $\Delta L_1$ | $7+\Delta L_1$   |  |
| С     | Q1. $\Delta L_2$ | $1 + \Delta L_2$ |  |
| Other | Q4. $\Delta L_1$ | $4+\Delta L_1$   |  |

#### TABLE 5. FPGA implementation and resource utilization.

| Hardware Cost           |     | Hardware Block<br>(Cyclone IV E EP4CE115F29C7) |          |  |
|-------------------------|-----|------------------------------------------------|----------|--|
|                         |     | Previous work<br>[20]                          | Proposed |  |
| LC Combinations         |     | 1634                                           | 1443     |  |
| LC Registers            |     | 465                                            | 560      |  |
| DSP Elements            |     | 80                                             | 46       |  |
| DSP 18x18               |     | 40                                             | 23       |  |
| RAM                     |     | 64 * 32 bits                                   |          |  |
| Mem.                    | ROM | 4 * 25 bits                                    |          |  |
| Dyn. Power (mW)         |     | Unlist                                         | 23.33    |  |
| Max Speed (MHz)         |     | 47.26                                          | 48.33    |  |
| Norm. Power<br>(mW/MHz) |     | Unlist                                         | 0.48     |  |

#### TABLE 6. Physical chip results.

| Technology                            | TSMC 0.18µm 1P6M CMOS                       |
|---------------------------------------|---------------------------------------------|
| Word Lengths<br>(Max Inter. / Coeff.) | 34 bits / 25 bits                           |
| Supplied Voltage / Clock Rate         | 1.8 V / 25 MHz                              |
| Data Memory (RAM)                     | 64 * 32 bits                                |
| DFT-Length (N) / Time Hop (L)         | 64 / 4                                      |
| Power Consumption<br>(Chip / Core)    | 49.8 mW / 23.2 mW                           |
| Chip Size / Core Size                 | $2.0737 \text{ mm}^2 / 0.7937 \text{ mm}^2$ |
| Gate Count                            | 133,161                                     |

which can be operated at 48.33 MHz. The realization results show that the resource utilization of combinational logic lookup tables (LUTs) and DSP blocks reduced by 11.7% and 42.5%, respectively, compared with previous works [20]. The dynamic power consumption based on the test signal is shown in Fig. 8. The normalized power metrics are given in Table 5. Overall, the advantages of the proposed design has low complexity and area efficiency, although the utilization

| Unit                | Power<br>(mW) | Percentage<br>(%) | Area<br>(um <sup>2</sup> ) | Percentage<br>(%) | Power norm. to area<br>( mW / mm <sup>2</sup> ) |
|---------------------|---------------|-------------------|----------------------------|-------------------|-------------------------------------------------|
| Sliding             | 9.92          | 42.76             | 359,600                    | 45.3              | 27.59                                           |
| UVT<br>(RDFT-based) | 11.2          | 48.28             | 313,225                    | 39.46             | 35.76                                           |
| Resonator           | 1.89          | 8.15              | 110,880                    | 13.97             | 17.05                                           |
| Other               | 0.19          | 0.81              | 10,045                     | 1.27              | 18.91                                           |

TABLE 7. Implementation of cell-based design for the proposed HDFT accelerator.



FIGURE 7. Bit-level SQNR simulation for the proposed (low-critical-path) HDFT design.



**FIGURE 8.** A test signal with complex Gaussian distribution for power analysis.

of registers are increased for saving the derived values that are generated in the calculation.

#### **B. ASIC PHYSICAL RESULTS**

The proposed architecture was implemented using an application specific integrated circuits (ASIC) chip with 0.18-um technology. The Taiwan Semiconductor Manufacturing Company Limited (TSMC) 0.18-um 1P6M CMOS

standard-cell library was used, and then the verilog code synthesized using the Synopsys Design Compiler and placeand-route for layout using Cadence SOC Encounter. The specifications and the chip results of the implementation are listed in Table 6. The total pad count of the design is 130. It is packaged in a ceramic quad flat pack (CQFP) package. The chip size is 1780 um  $\times$  1165 um and the core size is 1250 um  $\times$  635 um. The power consumption is 49.8 mW at 25 MHz as simulated by the Synopsys PrimePower tool. For analysis of the core size and the power consumption, the proposed design can be divided into four sub-blocks, as listed in Table 7. In the sliding unit, the function  $z^{-N}$  is realized by a two-port memory, which dominates the metrics. In the UVT unit, the percentages of the power and the area use are 48.28% and 39.26%, respectively. This confirms that the proposed RDFT-based UVT filter is greatly involved in the reduction of computational and hardware cost. The resonator unit has small portion of the power consumption and the area compared with other units, because it only takes a light workload and employs resource sharing. In Table 7, the power consumption is normalized to the corresponding area. According to the result, the UVT filter plays a crucial role that decides the complexity and the performance of chip.

## **VI. CONCLUSION**

This paper presented a low-complexity, low-cost, RDFTbased architecture design for HDFT computation. The proposed HDFT algorithm employs a unified recursive kernel for arbitrary length time hops. The results show that the proposed method has lower hardware requirements compared with previous works. It would be a more powerful element for a VLSI realization in future applications focusing on digital signal processing, filtering, and communications.

## ACKNOWLEDGMENT

The authors would like to acknowledge chip fabrication support provided by National Chip Implementation Center (CIC), Taiwan, R. O. C.

#### REFERENCES

- D. E. Newland, An Introduction to Random Vibrations, Spectral & Wavelet Analysis, 3rd ed. New York, NY, USA: Dover, 2012.
- [2] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, *Discrete-Time Signal Processing*, 3rd ed. Upper Saddle River, NJ, USA: Prentice-Hall, 2010.
- [3] E. Jacobsen and R. Lyons, "The sliding DFT," *IEEE Signal Process. Mag.*, vol. 20, no. 2, pp. 74–80, Mar. 2003.

- [4] E. Jacobsen and R. Lyons, "An update to the sliding DFT," *IEEE Signal Process. Mag.*, vol. 21, no. 1, pp. 110–111, Jan. 2004.
- [5] K. Duda, "Accurate, guaranteed stable, sliding discrete Fourier transform [DSP tips & tricks]," *IEEE Signal Process. Mag.*, vol. 27, no. 6, pp. 124–127, Nov. 2010.
- [6] S. Mishra, D. Das, R. Kumar, and P. Sumathi, "A power-line interference canceler based on sliding DFT phase locking scheme for ECG signals," *IEEE Trans. Instrum. Meas.*, vol. 64, no. 1, pp. 132–142, Jan. 2015.
- [7] P. Sumathi and P. A. Janakiraman, "Integrated phase-locking scheme for SDFT-based harmonic analysis of periodic signals," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 55, no. 1, pp. 51–55, Jan. 2008.
- [8] C. M. Orallo, I. Carugati, S. Maestri, P. G. Donato, D. Carrica, and M. Benedetti, "Harmonics measurement with a modulated sliding discrete Fourier transform algorithm," *IEEE Trans. Instrum. Meas.*, vol. 63, no. 4, pp. 781–793, Apr. 2014.
- [9] I. Carugati, C. M. Orallo, P. G. Donato, S. Maestri, J. L. Strack, and D. Carrica, "Three-phase harmonic and sequence components measurement method based on mSDFT and variable sampling period technique," *IEEE Trans. Instrum. Meas.*, vol. 65, no. 8, pp. 1761–1772, Aug. 2016.
- [10] C.-S. Park, "Fast, accurate, and guaranteed stable sliding discrete Fourier transform [sp tips&tricks]," *IEEE Signal Process. Mag.*, vol. 32, no. 4, pp. 145–156, Jul. 2015.
- [11] C.-S. Park, "Guaranteed-stable sliding DFT algorithm with minimal computational requirements," *IEEE Trans. Signal Process.*, vol. 65, no. 20, pp. 5281–5288, Oct. 2017.
- [12] S. Liu and T. Guo, "An adaptive DFT algorithm for measuring power system synchrophasors based on rectangular coordinate," in *Proc. IEEE PES Asia–Pacific Power Energy Eng. Conf. (APPEEC)*, Nov. 2015, pp. 1–5, doi: 10.1109/APPEEC.2015.7380900.
- [13] S. Liu, T. Guo, J. Li, and J. Wu, "Adaptive DFT algorithm for measuring synchrophasors based on shifting window symmetrical," in *Proc. Int. Conf. Electr. Utility Deregulation Restruct. Power Technol. (DRPT)*, Nov. 2015, pp. 722–726, doi: 10.1109/DRPT.2015.7432339.
- [14] C.-S. Park and S.-J. Ko, "The hopping discrete Fourier transform [sp tips&tricks]," *IEEE Signal Process. Mag.*, vol. 31, no. 2, pp. 135–139, Mar. 2014.
- [15] Q. Wang, X. Yan, and K. Qin, "High-precision, permanently stable, modulated hopping discrete Fourier transform," *IEEE Signal Process. Lett.*, vol. 22, no. 6, pp. 748–751, Jun. 2015.
- [16] S. C. Lai, S. F. Lei, C. L. Chang, C. C. Lin, and C. H. Luo, "Low computational complexity, low power, and low area design for the implementation of recursive DFT and IDFT algorithms," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 56, no. 12, pp. 921–925, Dec. 2009.
- [17] S.-C. Lai, W.-H. Juang, C.-L. Chang, C.-C. Lin, C.-H. Luo, and S.-F. Lei, "Low-computation-cycle, power-efficient, and reconfigurable design of recursive DFT for portable digital radio mondiale receiver," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 57, no. 8, pp. 647–651, Aug. 2010.
- [18] S.-C. Lai, S.-F. Lei, W.-H. Juang, and C.-H. Luo, "A low-cost, low-complexity, and memory-free architecture of novel recursive DFT and IDFT algorithms for DTMF application," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 57, no. 9, pp. 711–715, Sep. 2010.
- [19] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York, NY, USA: Wiley, 1999, pp. 91–112.
- [20] W.-H. Juang, S.-C. Lai, K.-H. Chen, W.-K. Tsai, and C.-H. Luo, "Lowcomplexity hopping DFT design based on a compact recursive structure," *Electron. Lett.*, vol. 53, no. 1, pp. 25–27, Jan. 2017.



**WEN-HO JUANG** (S'15) was born in Chiayi, Taiwan. He received the M.S. degree from the Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan, in 2010. From 2011 to 2014, he was an Engineer with the Data Center Networking BU, MediaTek Inc., Hsinchu, Taiwan. He is currently pursuing the Ph.D. degree with the Department of Electrical Engineering, National Cheng Kung University. His research interests include digital signal processing and VLSI system design.



**SHIN-CHI LAI** (M'15) was born in Taichung, Taiwan, in 1980. He received the B.S. degree in electronic engineering from ChienKuo Technology University, Changhua, Taiwan, in 2002, the M.S. degree in electronic engineering from the National Yunlin University of Science and Technology, Yunlin, Taiwan, in 2005, and the Ph.D. degree from National Cheng Kung University, Tainan, Taiwan, in 2011. From 2011 to 2013, he was an Assistant Research Fellow with the Depart-

ment of Electrical Engineering, National Cheng Kung University. He is currently an Associate Professor with the Department of Computer Science and Information Engineering, Nanhua University, Chiayi, Taiwan. His main research interests include signal processing and its circuit design, especially for speech, audio, biomedical, and multimedia applications.



**CHING-HSING LUO** (M'05) received the B.S. degree in electrophysics from National Chiao-Tung University, Hsinchu, Taiwan, the M.S. degree in electrical engineering from National Taiwan University, Taipei, in 1982, the M.S. degree in bio-medical engineering from Johns Hopkins University, Baltimore, MD, USA, in 1987, and the Ph.D. degree in biomedical engineering from Case Western Reserve University, Cleveland, OH, USA, in 1991. Since 1995, he has

been a Distinguished Professor with the Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan. His research interests include biomedical instrumentation systems-on-chip, assistive device and agent system, cell modeling, signal processing, RFIC, implantable wireless bio-sensing integrated circuits, genome simulation, and pulse diagnosis platform in traditional Chinese medicine.



**SHUENN-YUH LEE** (M'98–SM'15) was born in Taichung, Taiwan, in 1966. He received the B.S. degree from the National Taiwan Ocean University, Keelung, Taiwan, in 1988, and the M.S. and Ph.D. degrees from National Cheng Kung University, Tainan, Taiwan, in 1994 and 1999, respectively.

He was with the Department of Electrical Engineering, National Chung Cheng University, Chiayi, Taiwan, as an Associate Professor in 2006 and

a Professor in 2011, respectively. He is currently a Professor with the Department of Electrical Engineering, National Cheng Kung University. His current research interest include the design of analog and mixed-signal integrated circuits including filter, high-speed ADC/DAC, and sigma-delta ADC/DAC, biomedical circuits and systems, low-power and low-voltage analog circuits, and RF front-end integrated circuits for wireless communications.

He is currently a member of the Circuits and Systems Society, the Solid-State Circuits Society, and the Medicine and Biology Society of the IEEE. He is also a member of the IEICE. He served as the Chairman of Heterogeneous Integration Consortium under the VLSI Educational Program sponsored by the Ministry of Education, Taiwan, from 2009 to 2011. He served as the Technical Program Chair of the 2014/2015 International Symposium on Bioelectronics and Bioinformatics, the 2015 Taiwan and Japan Conference on Orange Technologies, and the 2013 IEEE International Conference on Orange Technologies, and the Publication Chair for the 2012 IEEE Asia Pacific Conference on Circuits and Systems. From 2013 to 2016, he serves as the Chairman of the IEEE Solid-State Circuits Society Tainan Chapter. From 2016 to 2017, he serves as the Vice Chairman of the IEEE Tianan Section. Since 2016, he has been serving as the Associate Editor for the IEEE TRANSACTION ON BIOMEDICAL CIRCUITS AND SYSTEMS.