

Received January 5, 2018, accepted February 9, 2018, date of publication February 21, 2018, date of current version March 13, 2018. *Digital Object Identifier 10.1109/ACCESS.2018.2808371*

# FPGA-Based Architecture for Medium Access Techniques in Broadband PLC

## PABLO POUDEREUX<sup>1</sup>, ÁLV[ARO](https://orcid.org/0000-0001-6843-5199) HERNÁNDE[Z](https://orcid.org/0000-0001-9308-8133)<sup>1</sup>, (Senior Member, IEEE), FERNANDO CRUZ-ROLDÁN<sup>2</sup>, (Senior Member, IEEE), AND RAÚL MATEOS<sup>1</sup>

<sup>1</sup> Electronics Department, Universidad de Alcalá, E-28805 Alcalá de Henares, Spain

<sup>2</sup>Signal Theory and Communications Department, Universidad de Alcalá, E-28805 Alcalá de Henares, Spain

Corresponding authors: Pablo Poudereux (pablo.poudereux@depeca.uah.es) and Álvaro Hernández (alvaro.hernandez@uah.es)

This work was supported by the Spanish Ministry of Economy and Competitiveness, SOC-PLC Project, under Project TEC2015-64835-C3-2-R and Project TEC2015-64835-C3-1-R.

**ABSTRACT** In this paper, two real-time architectures of medium access techniques useful for future generations of wireline and wireless communication systems are presented. One architecture is based on discrete cosine transform (DCT), while the second approach implements a filter-bank multi-carrier (FBMC) system. A comparative analysis, in terms of resource consumption, performance, and precision, is shown. The comparison considers a floating-point model, a fixed-point model, and experimental tests. These models make it possible to evaluate the effect of the fixed-point precision in the implementation and, in turn, to verify the correctness of the developed architecture. The simulation models and the experimental tests have been carried out in different practical environments in order to achieve a further analysis. The two proposed architectures have been implemented on a field-programmable gate array (FPGA) device. Furthermore, the architectures have been included as advanced peripherals in a system-on-chip, which also integrates a soft microprocessor to monitor the whole system and manage the data transfers. As a communication scenario, the proposed architectures have been particularized to operate in real time while meeting all timing requirements defined by a broadband power line communications standard. For that case, the system has achieved a desired transmission rate of 62.5 Ms/s at the converters, providing mean squared errors, at the output for an ideal channel, below $3 \cdot 10^{-5}$ for both the DCT and FBMC approaches, whereas each transmitter/receiver requires around 50% of the DSP cells available in the Xilinx XC6VLX240T FPGA, the most demanded resource in the device.

**INDEX TERMS** Field-programmable gate arrays, multi-carrier communication (MCM), filter-bank multicarrier (FBMC) systems, broadband power-line communications, discrete cosine transform (DCT).

## **I. INTRODUCTION**

Multicarrier Modulation (MCM), such as discrete multitone (DMT) and Orthogonal Frequency Division Multiplexing (OFDM), is currently the most adopted transmission scheme in both wired and wireless communication standards (e.g., [1]–[6]). When MCM is implemented by means of fast algorithms of the Discrete Fourier Transform (DFT) [7], it has several advantages such as low complexity, effectiveness against multipath, or frequency selective channels [8]. Nevertheless, DFT-based MCM is also questioned because of some weaknesses. One important drawback is the high sensitivity to synchronization errors [9], and particularly to Carrier Frequency Offset (CFO) coming either from differences between the oscillators at the transmitter and receiver or from the Doppler effect. The CFO involves two aspects: the reduction of the signal magnitude, since signals are not sampled at the expected time stamp; and the appearance of inter-carrier interference in the system, since the subcarriers are no longer orthogonal. Other weaknesses of DFT-based systems are the spectral/power inefficiency due to the guard intervals (the Cyclic Prefix CP that is often inserted), and the poor spectral containment of DFT-based subcarriers.

For the above reasons, new alternatives to OFDM have appeared in recent years with the aim of overcoming these drawbacks. In order to deal with the CFO problem, several authors have proposed the use of Discrete Trigonometric Transforms (DTT) for multicarrier modulation, mainly Type-II even and Type-IV even Discrete Cosine Transforms (DCT2e and DCT4e) [8], [10]–[12]. On the other hand, Filter-Bank Multicarrier (FBMC) modulation is a promising technique for future wireline and wireless communication standards [13]–[15]. It avoids the use of redundant samples such as the CP, and with FBMC systems, an increase in spectral efficiency and in robustness to synchronization errors can be achieved. In addition, FBMC allows more robustness in noisy environments, such as the power line channels [16]. For this reason, a type of FBMC is recommended in a physical layer of the IEEE 1901 Standard for broadband Power Line Communications (PLC) [4].

All these approaches imply a significant and complex computational load, which makes a feasible real-time architecture difficult to achieve. They often require high data rates and transfers, with high sampling frequencies and different parallel datapaths to be implemented. All these features imply that the design of an efficient architecture suitable for the implementation of novel medium-access multi-carrier techniques is a relevant topic nowadays. For that purpose, Field-Programmable Gate Array (FPGA) devices have become a feasible and suitable alternative for the implementation and first prototyping of algorithms in communication applications [17]–[19] requiring high clock frequencies, adaptability or flexibility, a certain degree of parallelism, and intermediate data storage. As a result, it is possible to find previous works where FPGAs are the key element in the proposal of architectures for the implementation of different multicarrier modulations [6], [20]–[23]. In [24] a hardware architecture is proposed for the 5G standard, dealing with an FBMC modulation. This work is extended in [25] with a detailed description of the pipelined architecture for the transmitter. A further step is proposed in [26], where an FPGA coprocessor is combined with GPPs (General-Purpose Processors), also for 5G. Furthermore, in this trend, it is also interesting to consider FPGA-based System-on-Chip (SoC) [27], where the die is often shared by a general-purpose processor and the configurable logic resources, merging the advantages of both sides.

With regard to the PLC standard, a transceiver based on OFDM and implemented on a Xilinx Spartan6 FPGA is described in [28], with data rates up to 107 Mbps. In [29], a synchronization algorithm for OFDM is implemented on an FPGA for the same standard. A final example can be found in [30], where an OFDM physical layer for 3G-PLC is designed and implemented on an FPGA, together with an ARM processor in charge of higher-level algorithms.

The main contribution of this work is the proposal of an FPGA-based SoC architecture to deal with the real-time implementation of different MCM techniques, using broadband PLC as case study. The transmultiplexers for the different modulations considered can be inserted as advanced peripherals in the SoC architecture, achieving the required data rates and bandwidths. Two multi-carrier modulation schemes have been taken into account: one based on DCT4e and the other on FBMC. The SoC architecture has been evaluated, and, furthermore, a comparison between the two multi-carrier modulations has been carried out, in terms of resource consumption and fixed-point representation errors.

Compared to [23], which was focused on the single implementation of an FBMC approach, this work proposes and describes this novel SoC architecture, where peripherals for different modulation schemes can be integrated, which is a key aspect to achieve a flexible real-time system. Furthermore, the feasibility of the architecture has been verified by integrating not only the FBMC approach in [23], but also a particular implementation for an approach based on DCT4e presented here. Finally, a comparison between the FBMC and the DCT4e approaches, based on some preliminary experimental tests, is shown. The rest of the manuscript is organized as follows: Section II describes the global SoC structure; Sections III and IV are dedicated to the architectures proposed for the implementation of the different medium-access techniques considered; Section V shows some experimental results; and finally, conclusions are discussed in Section VI.

## **II. GENERAL SYSTEM DESCRIPTION**

The global SoC architecture is based on an FPGA device [31], where it is possible to integrate different multi-carrier transmitters and receivers, as can be observed in Fig. 1. A Microblaze soft microprocessor [32] is in charge of the management and control of the global system. It is worth noting that other processors with higher performance, such as ARM in Zynq architecture by Xilinx Inc., can easily replace Microblaze in the proposal, although the latter can cope with the current computational load.



**FIGURE 1.** General block diagram of the proposed SoC architecture.

The architecture allows the flexible integration of any transmitter and/or receiver, corresponding to any medium-access technique (not only those described later), as an advanced peripheral (hardware accelerator). It is important to remark that these transmitters and/or receivers (peripherals) can be easily packaged and distributed as IP cores for their integration in any system based on the AXI bus.

The data rate demanded by the communication standard is provided by a Direct Memory Access (DMA) controller, thus making it possible to transfer data from an external memory to the transmitter and from the receiver to an external memory [33]. The implemented MCM techniques provide a solution with multiple input/output datapaths. For that reason, the DMA controller is used to generate suitable data rates for the transmitter and the receiver. The DMA is connected to an external DDR3 memory bank through a memory controller [34]. This releases the Microblaze processor from data-moving operations and allows it to perform only system management tasks. The DMA presents independent buses to communicate with the transmitter and the receiver.

With regard to the advanced peripherals, they should involve the communication interfaces with the corresponding DMA bus at the demanded rates. Furthermore, they include some control and status registers to be accessed and configured by Microblaze. All the buses involved in the architecture are based on several specifications of the AXI bus.

For clarity's sake, two medium-access techniques have been considered hereinafter as an example to show the feasibility and performance of the proposed SoC architecture. These techniques are based on the DCT4e and on an FBMC system. Nevertheless, it is worth noting that any other approach could be taken into account and integrated in the SoC architecture.

## **III. TYPE-IV EVEN DISCRETE COSINE TRANSFORM MULTICARRIER MODULATION (DCT4e-MCM)**

Several authors have proposed the use of DCT instead of DFT because the former offers benefits such as (1) excellent spectral compaction and energy concentration, which leads to less inter-carrier interference leakage to adjacent subcarriers; (2) more robustness against CFO; and (3) the use of only real arithmetic [8], [10]–[12].



**FIGURE 2.** Block diagram of the proposed DCT4e-MCM. Transmitter (left) and receiver (right).

Fig. 2 shows the block diagram to implement multicarrier modulation with DCT4e. At the transmitter (left side), data are processed by an *M*-point inverse DCT4e, with *M* being the number of subchannels or subcarriers. At the receiver (right side), a DCT4e of the same size is performed. Note that the orthogonal definition of DCT4e is the same as that of its inverse counterpart [35]. Furthermore, when the DCT4e is used for multicarrier data transmission, the inter-block interference can be eliminated by adding a Symmetric Extension (SE) to each symbol, including a left prefix and a right suffix. We refer the reader to [10] for more information about the system configuration of DCT4e-MCM.
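The remark that the orthogonal DCT4e coincides with its inverse can be checked numerically. The following is a minimal sketch, assuming the textbook orthonormal DCT-IV kernel with scale factor $\sqrt{2/M}$ and half-sample offsets on both indices; the function name `dct4e` and the test vector are illustrative only and are not taken from the proposed architecture.

```python
import math

def dct4e(x):
    """Orthonormal type-IV DCT of a real sequence, computed directly in O(M^2).

    Assumption: textbook kernel cos(pi/M * (k + 1/2) * (n + 1/2)).
    """
    M = len(x)
    scale = math.sqrt(2.0 / M)
    return [
        scale * sum(x[k] * math.cos(math.pi / M * (k + 0.5) * (n + 0.5))
                    for k in range(M))
        for n in range(M)
    ]

# Hypothetical block of M = 8 real symbols.
symbols = [1.0, -1.0, 0.5, -0.5, 0.25, 0.75, -0.25, -0.75]

tx = dct4e(symbols)  # transmitter side ("inverse" DCT4e)
rx = dct4e(tx)       # receiver side (forward DCT4e) over an ideal channel

# Largest deviation between the recovered and the original symbols.
max_dev = max(abs(a - b) for a, b in zip(symbols, rx))
```

Because the same routine serves both directions, a single hardware DCT4e block design can in principle be reused at the transmitter and at the receiver.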

Focusing on Fig. 2, the input data $X_m[k]$ are modulated in a multicarrier scheme by means of the DCT4e [10]. Afterwards, an SE is added to the signal $p_m[n]$, thus obtaining $q_s[n]$. This signal is serialized to be transmitted through the channel. At the receiver, the signal $r[n]$ is deserialized to obtain $q'_{s}[n]$. The corresponding SE is removed and the resulting signal $p'_m[n]$ is inserted in the DCT4e module, obtaining the output signal $X'_m[k]$.

On both sides, transmitter and receiver, the DCT4e is defined by

$$
p_m[n] = \sqrt{\frac{2}{M}} \cdot X_m[k] \cdot \cos\left[\frac{\pi}{M}\left(m + \frac{1}{2}\right)\left(n + \frac{M+1}{2}\right)\right],\tag{1}
$$

where  $p_m[n]$  is the output from the DCT4e, M is the number of points in the transform, and  $X_m[k]$  is the corresponding input. The block can be divided into four stages:

• s1: Generate the sequence $Y_m[k]$ by multiplying the input signal $X_m[k]$ by the constant $e^{-jm\pi/(2M)}$:

$$
Y_m[k] = X_m[k] \cdot e^{-\frac{jm\pi}{2M}}\tag{2}
$$

• s2: Obtain the signal $y_m[n]$ by applying a Fast Fourier Transform (FFT) on $Y_m[k]$ according to

$$
y_m[n] = F\{Y_m[k]\}\tag{3}
$$

where the operator $F\{\}$ performs an $M$-point FFT.

• s3: Rearrange the signal  $y_m[n]$ , obtaining the signal  $z_m[n]$ :

$$
\begin{aligned}
z_{2i+1}[n] &= y_i[n] \\
z_{2i}[n] &= conj \{y_{m-i}[n] \}
\end{aligned}\tag{4}
$$

where  $i = 0, \ldots, M/2 - 1$ ;  $y_i[n]$  is the output from the FFT; and the operator *conj*{} is the conjugate of the sample.

• s4: Generate the sequence $p_m[n]$ from the signal $z_m[n]$ by multiplying it by the constant $e^{-j\pi(2m+1)/4M}$ according to

$$
p_m[n] = 2 \cdot \frac{1}{\sqrt{2M}} \cdot Re \left\{ z_m[n] \cdot e^{-\frac{j\pi (2m+1)}{4M}} \right\}\tag{5}
$$

where $z_m[n]$ is the output from the rearrangement, $p_m[n]$ is the output from the DCT4e, and *Re*{} is the real part.
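As a quick consistency check between the staged computation and the definition, the scale factor applied in (5) reduces to the one that appears in (1):

$$
2 \cdot \frac{1}{\sqrt{2M}} = \frac{2}{\sqrt{2}\,\sqrt{M}} = \frac{\sqrt{2}}{\sqrt{M}} = \sqrt{\frac{2}{M}}
$$

The four stages therefore preserve the orthonormal scaling of the transform, provided that the FFT in stage s2 is taken without any additional normalization (an assumption, since the scaling of $F\{\}$ is not specified here).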



**FIGURE 3.** Block diagram of the architecture proposed for the DCT4e-MCM transmitter.

## A. DCT4e-MCM ARCHITECTURE

The DCT4e-MCM architecture is divided into two parts: the SE and the DCT4e. First, the SE mirrors the beginning and the end of the output array $p_m[n]$, with a sign change at the end, in order to eliminate the inter-block interference. This extension is carried out according to

$$
q_s[n] = p_{\alpha-1-s}[n] \quad \text{if } 0 < s < \alpha
$$
\n
$$
q_s[n] = p_{s-\alpha}[n] \quad \text{if } \alpha \le s < M + \alpha
$$
\n
$$
q_s[n] = -p_{M-1-s+M+\alpha}[n] \quad \text{if } M + \alpha \le s < M + 2 \cdot \alpha \quad (6)
$$

where $q_s[n]$ is the final signal to be serialized and transmitted, $\alpha$ is the length of the extension, $s = 0, 1, \ldots, M + 2\alpha - 1$, and $m = 0, 1, \ldots, M - 1$.
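The symmetric extension in (6) can be illustrated with a short sketch. It assumes that the first case also covers $s = 0$, so that the extended block has $M + 2\alpha$ samples; the function names are hypothetical and the code is a software model, not the hardware SE module.

```python
def add_symmetric_extension(p, alpha):
    """Build q_s from p_m following (6): mirrored prefix, body, negated mirrored suffix."""
    M = len(p)
    if not 0 <= alpha <= M:
        raise ValueError("alpha must lie between 0 and M")
    # Case 1: q_s = p_{alpha-1-s} for 0 <= s < alpha (assumed to include s = 0).
    prefix = [p[alpha - 1 - s] for s in range(alpha)]
    # Case 3: q_s = -p_{M-1-s+M+alpha} for M+alpha <= s < M+2*alpha.
    suffix = [-p[M - 1 - s + M + alpha] for s in range(M + alpha, M + 2 * alpha)]
    # Case 2 is the unmodified block p_{s-alpha}.
    return prefix + list(p) + suffix

def remove_symmetric_extension(q, alpha):
    """Receiver side: drop the alpha-sample prefix and suffix."""
    return list(q[alpha:len(q) - alpha])

# Hypothetical block with M = 4 and alpha = 2.
block = [1.0, 2.0, 3.0, 4.0]
extended = add_symmetric_extension(block, 2)
# extended == [2.0, 1.0, 1.0, 2.0, 3.0, 4.0, -4.0, -3.0]
recovered = remove_symmetric_extension(extended, 2)
```

The prefix is an even mirror of the first samples and the suffix is an odd mirror of the last ones, which matches the symmetries of the DCT4e basis at the two block edges.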

As mentioned above, the DCT4e has been divided into four stages (s1 to s4), each consuming resources according to the parallelism ratio. The parallelism ratio $R_p$ is defined as the number of samples simultaneously processed by the architecture and is related to the number of parallel datapaths (lanes) existing in the proposal. In this way, a high $R_p$ implies more resources but better throughput in the architecture, whereas a low $R_p$ means fewer resources, reused over time, with lower throughput. These four stages have been assembled together with the SE module in order to obtain the architecture for the DCT4e-MCM transmitter, which can be observed in Fig. 3. Note that stage s4 has been placed before s3, so that the latter can use the same memory block as the SE. This transmitter architecture requires 49 multipliers for its implementation. Furthermore, since the whole architecture has been pipelined, it presents a global latency of 1258 clock cycles.

On the other hand, a similar architecture can be proposed for the DCT4e-MCM receiver by arranging the previous blocks in reverse order, as shown in Fig. 4. Note that in this case stage s3 and the SE cannot share the same memory block. Here the architecture requires 48 multipliers and, due to the pipelined structure, presents a latency of 1768 clock cycles. It is worth noting that the receiver requires one fewer multiplier, since the multiplier involved in the SE is merged with the ones implemented in stage s1.



**FIGURE 4.** Block diagram of the architecture proposed for the DCT4e-MCM receiver.

## **IV. FILTER-BANK MULTI-CARRIER (FBMC) APPROACH**

The second MCM technique considered here, the FBMC approach, is based on a type of cosine-modulated filter bank. The proposed system and its implementation have been thoroughly studied in [16] and [23]. This technique also involves the same DCT4e module as in the DCT4e-MCM approach, together with a polyphase filtering bank that allows a better spectral separation between subcarriers than the OFDM solution. Another additional advantage of FBMC in general, and of this approach in particular, is that it does not require the inclusion of redundant samples, such as the SE.

The proposed FBMC transmitter can be observed in Fig. 5 (top). Following the scheme, each input signal $V_m[k]$ is multiplied by a constant $\theta_m$. This constant ranges from −1 to +1, depending on the considered subcarrier $m$. This multiplication generates the signal $X_m[k]$, which is sent to the DCT4e module to obtain $p_m[n]$. The signals $p_m[n]$ go through a module that performs arithmetic computations, represented by matrices **I** and **J**. These matrices provide the signal $q_s[n]$ to be inserted in the polyphase filter bank. The resulting signals $t_s[n]$ are added in pairs to obtain the corresponding $M$ subcarriers. Finally, the signal is serialized and sent through the communication channel.

On the other hand, the processing involved at the receiver is described in Fig. 5 (bottom). The received signal $r[n]$ is deserialized and inserted in the filter bank. This bank is the same as the one in the FBMC transmitter. The obtained signals $q'_{s}[n]$ are processed by the matrices **I** and **J** to obtain $p'_m[n]$ and then inserted in the DCT4e module. Finally, this signal is multiplied by the constant $\theta_m$ to obtain the output signal $V'_m[k]$. We refer the reader to [16] and [23] for more information about the system model and a detailed architecture for the approach.

## **V. EXPERIMENTAL RESULTS**

The proposed SoC architecture has been implemented in a Xilinx Virtex6 xc6vlx240t FPGA [36]. The resource consumption of the global system is presented in Table 1. Note that it consists of the Microblaze microprocessor, the external memory control, the DMA module, the synchronization module, and the converter controllers. Table 1 shows the global resource consumption as well as the resource consumption required by each module. In addition to the resource consumption, a percentage with respect to the total architecture consumption is presented. Note that DCT4e-MCM and FBMC transmitters/receivers can be easily packaged and distributed as IP cores for their integration into any system based on the AXI bus.

**FIGURE 5.** Block diagram of the proposed FBMC transmitter (top) and receiver (bottom).

**TABLE 1.** Resource consumption for DCT4e-MCM and FBMC in the global SoC architecture in a Xilinx xc6vlx240t FPGA.

| Module                 | Flip-flops    | LUTs          | BRAMs       | DSP48E1     |
|------------------------|---------------|---------------|-------------|-------------|
| Global system          | 12993         | 31659         | 107         | 93          |
| Microblaze             | 1445 (11.12%) | 3746 (11.83%) | 19 (17.76%) | 4 (4.30%)   |
| AXI4                   | 3658 (28.15%) | 9230 (29.15%) | 9 (8.41%)   | 0 (0.00%)   |
| DDR3 Control           | 3123 (24.04%) | 6506 (20.55%) | 0 (0.00%)   | 0 (0.00%)   |
| DMA module             | 1170 (9.00%)  | 2608 (8.24%)  | 2 (1.87%)   | 0 (0.00%)   |
| Synchronization module | 57 (0.44%)    | 131 (0.41%)   | 3 (2.80%)   | 0 (0.00%)   |
| DAC Control            | 594 (4.57%)   | 1178 (3.72%)  | 3 (2.80%)   | 0 (0.00%)   |
| ADC Control            | 144 (1.11%)   | 358 (1.13%)   | 0 (0.00%)   | 0 (0.00%)   |
| Interface Block Tx     | 90 (0.69%)    | 181 (0.57%)   | 2 (1.87%)   | 0 (0.00%)   |
| Transmitter DCT4e-MCM  | 1245 (9.58%)  | 3567 (11.27%) | 33 (30.84%) | 45 (48.39%) |
| Transmitter FBMC       | 1397 (9.58%)  | 3724 (11.27%) | 45 (31.91%) | 49 (47.11%) |
| Interface Block Rx     | 172 (1.32%)   | 393 (1.24%)   | 0 (0.00%)   | 0 (0.00%)   |
| Receiver DCT4e-MCM     | 1060 (8.16%)  | 3197 (10.10%) | 36 (33.64%) | 44 (47.31%) |
| Receiver FBMC          | 1096 (8.16%)  | 3232 (10.10%) | 58 (41.13%) | 51 (49.03%) |
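The percentages reported in Table 1 can be reproduced from the absolute counts. The sketch below does so for the Microblaze row, using only figures from the table; the dictionary keys and the helper `share_percent` are illustrative names.

```python
# Absolute resource counts taken from Table 1.
global_system = {"flip_flops": 12993, "luts": 31659, "brams": 107, "dsp48e1": 93}
microblaze = {"flip_flops": 1445, "luts": 3746, "brams": 19, "dsp48e1": 4}

def share_percent(module, total):
    """Share of each resource used by a module, in percent of the global system."""
    return {name: round(100.0 * module[name] / total[name], 2) for name in total}

microblaze_share = share_percent(microblaze, global_system)
# microblaze_share == {"flip_flops": 11.12, "luts": 11.83, "brams": 17.76, "dsp48e1": 4.3}
```

Note that these shares are relative to the resources used by the global system, not to the total resources available in the device.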

In order to obtain an experimental setup suitable for the validation of the proposed SoC architecture, the final design requires a synchronization module, as well as the corresponding analog–digital converters. Fig. 6 shows the final assembly, where a DAC (digital–analog converter) controller and an ADC (analog–digital converter) controller have been added in the transmitter and receiver datapaths, respectively, compared to that shown in Fig. 1. Furthermore, since a synchronization solution is out of the scope of this work, an ideal synchronization module has also been considered. Note that a real synchronization solution will affect the final performance to a certain degree, depending on the considered algorithm.

Fig. 7 shows the global experimental architecture for tests. Both converters, the ADC AD9467 [37] and the DAC FMC204 [38], have a 16-bit data width. This is a limitation that influences the fixed-point representation used for the implementation of the system. The converters have maximum sampling frequencies of 1 GHz for the DAC and 250 MHz for the ADC. In these tests, since the broadband PLC of IEEE 1901 is used as a case study [4], a transmission rate of 62.5 Msps has been considered, so these converters are able to transmit the corresponding signals. Furthermore, a suitable Analog Front-End (AFE) is necessary [3]. Nevertheless, the design of this AFE has not been included in this work and the following experimental results have been obtained in a PLC channel that is not connected to the grid.



**FIGURE 6.** Global SoC architecture developed for experimental tests.



**FIGURE 7.** Global experimental setup.

The Microblaze processor is in charge of generating the data to be transmitted. For that purpose, Pulse Amplitude Modulation (PAM) with 2, 4, 8, 16, and 32 levels has been implemented according to the PLC standard [4]. Thanks to the use of the PAM modulation, the Symbol Error Rate (SER) values for the ideal channel can be estimated. Also, the Signal-to-Noise Ratio (SNR), the Mean Square Error (MSE), the Peak Signal-to-Noise Ratio (PSNR), and the Maximum Error (ME) are calculated.
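The figures of merit listed above are not defined explicitly in this section. The following sketch shows one common way to compute them, assuming a reference signal normalized to [−1, 1] (peak value 1), PAM levels evenly spaced over that range, and nearest-level decisions; all function names are hypothetical, and the exact PAM mapping of the standard is not reproduced.

```python
import math

def mse(ref, out):
    """Mean square error between reference and output samples."""
    return sum((r - o) ** 2 for r, o in zip(ref, out)) / len(ref)

def max_error(ref, out):
    """Maximum absolute error (ME)."""
    return max(abs(r - o) for r, o in zip(ref, out))

def snr_db(ref, out):
    """SNR in dB: mean reference power over error power (undefined if the error is zero)."""
    signal_power = sum(r * r for r in ref) / len(ref)
    return 10.0 * math.log10(signal_power / mse(ref, out))

def psnr_db(ref, out, peak=1.0):
    """PSNR in dB with an assumed peak amplitude."""
    return 10.0 * math.log10(peak ** 2 / mse(ref, out))

def pam_levels(n_levels):
    """Assumed PAM constellation: n_levels evenly spaced amplitudes in [-1, 1]."""
    return [-1.0 + 2.0 * i / (n_levels - 1) for i in range(n_levels)]

def pam_decide(sample, levels):
    """Index of the nearest constellation level."""
    return min(range(len(levels)), key=lambda i: abs(sample - levels[i]))

def symbol_error_rate(tx_symbols, rx_samples, n_levels):
    """Fraction of received samples decided as a symbol other than the one sent."""
    levels = pam_levels(n_levels)
    errors = sum(1 for s, r in zip(tx_symbols, rx_samples)
                 if pam_decide(r, levels) != s)
    return errors / len(tx_symbols)

# Small hypothetical example.
ref = [0.5, -0.5, 1.0, -1.0]
out = [0.5, -0.5, 1.0, -0.9]
ser = symbol_error_rate([0, 1, 2, 3], [-0.95, -0.30, 0.40, 0.20], n_levels=4)
```

With these definitions, the last received sample in the example (0.20) is closer to the level 1/3 than to 1, so one of the four symbols is misdetected.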

To obtain all these parameters, three types of tests have been carried out. Firstly, a floating-point simulation of the techniques is performed, which will be considered as a reference for further comparisons. Then, a fixed-point simulation describing the implemented architecture is used to verify the quality of the proposed fixed-point solution. Finally, the experimental tests are also included in the comparison to analyse the correspondence between simulations and real tests.
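To give an idea of what separates the floating-point reference from a fixed-point model, the sketch below quantizes samples to a signed 16-bit word. The internal word lengths of the architecture are not stated in this section; the Q1.15 format (1 sign bit, 15 fractional bits) is an assumption chosen only because it matches the 16-bit width of the converters.

```python
def quantize_q15(x):
    """Round x to a signed 16-bit word with 15 fractional bits, saturating at the limits."""
    scale = 1 << 15
    code = round(x * scale)                    # nearest integer code
    code = max(-scale, min(scale - 1, code))   # saturate to [-32768, 32767]
    return code / scale

# Hypothetical ramp of in-range samples in [-0.5, 0.5).
samples = [i / 1000.0 - 0.5 for i in range(1000)]
quantized = [quantize_q15(s) for s in samples]

# Mean square quantization error of the ramp.
q_mse = sum((s - q) ** 2 for s, q in zip(samples, quantized)) / len(samples)
```

A single rounding step of this kind introduces an error of at most half a least significant bit per sample; the errors observed in a complete transmitter/receiver chain are larger because rounding is applied at many internal stages.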

For the first real test case, an ideal channel has been considered, so the transmitter and the receiver are connected internally inside the FPGA (digital loopback). This ideal channel is used to verify the quality of the fixed-point representation defined for the global architecture. Fig. 8 shows the SNR for the DCT4e-MCM approach (left) as well as for the FBMC (right). To obtain these parameters, both simulations use the same input signal with a length of 40960 samples. This input signal is randomly generated in a range of [–1, 1]. The value provided for every subchannel $m$ is the average of 80 packets, each with 512 samples. The final figures of merit are obtained as an average from all the subchannels $m$ and are shown in Table 2.

**TABLE 2.** Performances obtained with the different proposals using an ideal channel.

| Test                      | Approach  | SNR   | MSE                  | PSNR  | ME                   |
|---------------------------|-----------|-------|----------------------|-------|----------------------|
| Experimental test         | FBMC      | 45.35 | $2.42 \cdot 10^{-5}$ | 50.04 | 0.0073               |
| Experimental test         | DCT4e-MCM | 45.44 | $1.85 \cdot 10^{-5}$ | 50.13 | 0.0072               |
| Fixed-point simulation    | FBMC      | 43.64 | $2.88 \cdot 10^{-5}$ | 48.33 | 0.0115               |
| Fixed-point simulation    | DCT4e-MCM | 43.83 | $2.65 \cdot 10^{-5}$ | 48.54 | 0.0089               |
| Floating-point simulation | FBMC      | 57.34 | $1.57 \cdot 10^{-6}$ | 62.03 | 0.0080               |
| Floating-point simulation | DCT4e-MCM | 65.84 | $3.13 \cdot 10^{-7}$ | 70.53 | $7.99 \cdot 10^{-4}$ |
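The packet bookkeeping behind the per-subchannel values (40960 samples arranged as 80 packets of 512 samples) can be sketched as follows. It is assumed here that signal and error powers are averaged over the packets before converting to dB; averaging the dB values directly would be another possible reading. The function names are illustrative.

```python
import math

def split_into_packets(samples, packet_len):
    """Split a flat sample list into consecutive packets of packet_len samples."""
    return [samples[i:i + packet_len] for i in range(0, len(samples), packet_len)]

def snr_per_subchannel_db(tx_packets, rx_packets):
    """SNR in dB for each subchannel m, averaging powers over all packets."""
    n_packets = len(tx_packets)
    n_sub = len(tx_packets[0])
    result = []
    for m in range(n_sub):
        sig = sum(tx[m] ** 2 for tx in tx_packets) / n_packets
        err = sum((tx[m] - rx[m]) ** 2
                  for tx, rx in zip(tx_packets, rx_packets)) / n_packets
        result.append(10.0 * math.log10(sig / err))
    return result

# 40960 samples split into packets of 512 samples give 80 packets.
n_packets = len(split_into_packets(list(range(40960)), 512))

# Tiny hypothetical example: 2 packets, 2 subchannels.
tx_packets = [[1.0, 0.5], [-1.0, 0.5]]
rx_packets = [[0.9, 0.55], [-0.9, 0.45]]
snr_m = snr_per_subchannel_db(tx_packets, rx_packets)
mean_snr = sum(snr_m) / len(snr_m)
```

In this toy example both subchannels have an error amplitude equal to one tenth of the signal amplitude, so each per-subchannel SNR is approximately 20 dB.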

As can be observed in Fig. 8, the lowest averaged SNR is obtained for the fixed-point model and the experimental tests due to the effect of the finite-precision representation. Additionally, the difference between the fixed-point model and the experimental tests comes from the fact that the FFT model used is a generic version and does not exactly characterize the internal operation of the implemented FFT module. Finally, the PAM modulation allows the SER values to be estimated for the ideal channel. In this case, all the results are null, since there is no symbol error within the ideal channel, and all the transmitted symbols have been correctly recovered.

In a second example scenario, an SMA cable has been considered for experimental tests, so the transmitter and receiver are connected outside the FPGA using the analog converters. This SMA cable can be considered as an almost ideal channel, so this test case is used to verify the quality of the global architecture, including the effect from the converters. Fig. 9 shows the averaged SNR per subchannel *m* for the DCT4e-MCM and FBMC approaches, respectively. Again, the input signal for simulations is the same as previously. Furthermore, the final global and averaged performance figures are listed in Table 3.



**FIGURE 8.** Averaged SNR per subchannel m for the DCT4e-MCM (left) and FBMC (right) approaches in the floating-point model, in the fixed-point one, and in the experimental tests with an ideal channel.



**FIGURE 9.** Averaged SNR per subchannel m for the DCT4e-MCM (left) and FBMC (right) approaches in the floating-point model, the fixed-point one, and the experimental tests with an SMA cable.





In this case, a reduction in performance can be observed. After analysing it, it can be concluded that it is due to the non-flat frequency response of the coupling transformer included in the DAC output path (transmitter output). In addition, this transformer shows a 3 MHz lower cutoff frequency, so the subcarriers in this range have been discarded from the comparison for clarity's sake. With regard to Fig. 9, the differences between the floating-point model, the fixed-point one, and the experimental tests have been reduced, mainly due to the fact that the real behaviour of the analog components involved is more significant than the finite-precision error. In the same way, there are some differences between the fixed-point model and the experimental tests, since the simulation uses an estimation of the transmission channel. In this case, different PAM modulations have also been applied to obtain the SER values. These results are shown in Table 4, where it is possible to observe that the SER is still null up to four modulation levels.

**TABLE 4.** SER values obtained when using an SMA cable.

## **VI. CONCLUSIONS**

An FPGA-based implementation of an SoC architecture for multi-carrier modulations in broadband power line communications has been presented. This architecture has been proposed for real-time performance, where it is possible to flexibly integrate different multi-carrier transmitters and receivers, corresponding to any medium-access technique, as an advanced peripheral (hardware accelerator). The design of any peripheral can be optimized for its integration as a trade-off among real-time requirements involving the latency and throughput of the system, the effect of the fixed-point representation, and the resource consumption, by maximizing resource reuse. A soft Microblaze microprocessor is in charge of the management and control of the whole system. Two multi-carrier medium access techniques have been analysed here and integrated in the global SoC architecture: a DCT4e-MCM approach and an FBMC one. Simulations and experimental results from an implementation based on a Xilinx XC6VLX240T FPGA have been particularized for the parameters recommended by the IEEE Standard 1901 for broadband PLC. They have successfully validated the proposal, achieving a transmission rate of 62.5 Msps at the converters. The DCT4e-MCM and FBMC architectures provide mean squared errors below $3 \cdot 10^{-5}$ at the output for an ideal channel, whereas each transmitter/receiver requires around 50% of the DSP cells available in the FPGA, the most demanded resource in the device.

## **REFERENCES**

- [1] P. Henry and H. Luo, "WiFi: What's next?" *IEEE Commun. Mag.*, vol. 40, no. 12, pp. 66–72, Dec. 2002.
- [2] L. Nuaymi, *WiMAX: Technology for Broadband Wireless Access*. Hoboken, NJ, USA: Wiley, 2007.
- [3] K. Dostert, *Powerline Communications*. Englewood Cliffs, NJ, USA: Prentice-Hall, 2001.
- [4] *IEEE Standard for Broadband over Power Line Networks: Medium Access Control and Physical Layer Specifications*, IEEE Standard 1901-2010, 2010, pp. 1–1586.
- [5] L. Litwin and M. Pugel, "The principles of OFDM," *RF Signal Process.*, vol. 2, pp. 30–48, Jan. 2001.
- [6] J. Lorandel, J.-C. Prévotet, and M. Hélard, "Fast power and performance evaluation of FPGA-based wireless communication systems," *IEEE Access*, vol. 4, pp. 2005–2018, 2016.
- [7] S. Saponara, M. Rovini, L. Fanucci, A. Karachalios, G. Lentaris, and D. Reisis, "Design and comparison of FFT VLSI architectures for SoC telecom applications with different flexibility, speed and complexity tradeoffs," *Circuits, Syst., Signal Process.*, vol. 31, no. 2, pp. 627–649, 2012.
- [8] N. Al-Dhahir, H. Minn, and S. Satish, "Optimum DCT-based multicarrier transceivers for frequency-selective channels," *IEEE Trans. Commun.*, vol. 54, no. 5, pp. 911–921, May 2006.
- [9] T. H. Pham, I. V. McLoughlin, and S. A. Fahmy, "Robust and efficient OFDM synchronization for FPGA-based radios," *Circuits, Syst., Signal Process.*, vol. 33, no. 8, pp. 2475–2493, Aug. 2014.
- [10] F. Cruz-Roldán, M. E. Domínguez-Jiménez, G. Sansigre-Vidal, J. Piñeiro-Ave, and M. Blanco-Velasco, "Single-carrier and multicarrier transceivers based on discrete cosine transform type-IV," *IEEE Trans. Wireless Commun.*, vol. 12, no. 12, pp. 6454–6463, Dec. 2013.
- [11] F. Cruz-Roldán, M. Dominguez-Jimenez, G. Sansigre, M. Blanco-Velasco, P. Amo-López, and Á. Bravo-Santos, "On the use of discrete cosine transforms for multicarrier communications," *IEEE Trans. Signal Process.*, vol. 60, no. 11, pp. 6085–6091, Nov. 2012.
- [12] P. Tan and N. C. Beaulieu, "A comparison of DCT-based OFDM and DFT-based OFDM in frequency offset and fading channels," *IEEE Trans. Commun.*, vol. 54, no. 11, pp. 2113–2125, Nov. 2006.
- [13] F. Cruz-Roldán and M. Blanco-Velasco, "Joint use of DFT filter banks and modulated transmultiplexers for multicarrier communications," *Signal Process.*, vol. 91, no. 7, pp. 1622–1635, 2011.
- [14] B. Farhang-Boroujeny, "Filter bank multicarrier modulation: A waveform candidate for 5G and beyond," *Adv. Elect. Eng.*, vol. 2014, Dec. 2014, Art. no. 482805.
- [15] B. Farhang-Boroujeny, "OFDM versus filter bank multicarrier," *IEEE Signal Process. Mag.*, vol. 28, no. 3, pp. 92–112, May 2011.
- [16] F. Cruz-Roldán, F. A. Pinto-Benel, J. D. O. del Campo, and M. Blanco-Velasco, "A wavelet OFDM receiver for baseband power line communications," *J. Franklin Inst.*, vol. 353, no. 7, pp. 1654–1671, May 2016.
- [17] A. Li, P. Hailes, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, "1.5 Gbit/s FPGA implementation of a fully-parallel turbo decoder designed for mission-critical machine-type communication applications," *IEEE Access*, vol. 4, pp. 5452–5473, 2016.
- [18] M. F. Brejza, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, "A high-throughput FPGA architecture for joint source and channel decoding," *IEEE Access*, vol. 5, pp. 2921–2944, 2017.
- [19] X. Cai, M. Zhou, and X. Huang, "Model-based design for software defined radio on an FPGA," *IEEE Access*, vol. 5, pp. 8276–8283, 2017.
- [20] B. D. Tensubam, N. L. Chanu, and S. Singh, "Comparative analysis of FBMC and OFDM multicarrier techniques for wireless communication networks," *Int. J. Comput. Appl.*, vol. 100, no. 19, pp. 27–31, 2014.
- [21] M. Mefenza and C. Bobda, "FPGA implementation of subcarrier index modulation OFDM transceiver," in *Proc. Parallel Distrib. Process. Symp. Workshops PhD Forum (IPDPSW)*, May 2013, pp. 268–272.
- [22] M. J. Canet, F. Vicedo, V. Almenar, and J. Valls, "FPGA implementation of an IF transceiver for OFDM-based WLAN," in *Proc. IEEE Workshop Signal Process. Syst. (SIPS)*, Oct. 2004, pp. 227–232.
- [23] P. Poudereux, A. Hernández, R. Mateos, F. A. Pinto-Benel, and F. Cruz-Roldán, "Design of a filter bank multi-carrier system for broadband power line communications," *Signal Process.*, vol. 128, pp. 57–67, Nov. 2016.
- [24] J. Nadal, C. A. Nour, A. Baghdadi, and H. Lin, "Hardware prototyping of FBMC/OQAM baseband for 5G mobile communication," in *Proc. 25th IEEE Int. Symp. Rapid Syst. Prototyp.*, vol. 1617. Oct. 2014, p. 7277.
- [25] J. Nadal, C. A. Nour, and A. Baghdadi, "Low-complexity pipelined architecture for FBMC/OQAM transmitter," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 63, no. 1, pp. 19–23, Jan. 2016.
- [26] X. Yang *et al.*, "RaPro: A novel 5G rapid prototyping system architecture," *IEEE Wireless Commun. Lett.*, vol. 6, no. 3, pp. 362–365, Jun. 2017.
- [27] S. Knapp and D. Tavana, "Field configurable system-on-chip device architecture," in *Proc. 22nd IEEE Custom Integr. Circuits Conf. (CICC)*, May 2000, pp. 155–158.
- [28] E. M. Meza and J. B. Aspiazu, "Design of a power line communications transceiver based on OFDM," *Procedia Technol.*, vol. 17, pp. 107–113, Jan. 2014.
- [29] S. Tsakiris, A. Salis, and N. Uzunoglu, "FPGA implementation of a frame synchronization algorithm for powerline communications," *Radioengineering*, vol. 18, no. 3, pp. 325–329, 2009.
- [30] P. Hu, S. Peng, C. Wang, H. Fan, R. Lin, and Y. Liu, "Application research of the power line carrier communication multiband based on G3 technology," in *Proc. 3rd Int. Conf. Syst. Informat. (ICSAI)*, 2016, pp. 814–818.
- [31] S. Brown, R. Francis, J. Rose, and Z. Vranesic, *Field Programmable Gate Arrays* (Kluwer International Series in Engineering and Computer Science). Norwell, MA, USA: Kluwer, 1992, pp. 12–17.
- [32] *MicroBlaze Processor Reference Guide*, Xilinx, San Jose, CA, USA, 2008.
- [33] *LogiCORE IP AXI DMA*, Xilinx, San Jose, CA, USA, 2011.
- [34] *7 Series FPGAs Memory Interface Solutions User Guide*, Xilinx, San Jose, CA, USA, 2011.
- [35] S. A. Martucci, "Symmetric convolution and the discrete sine and cosine transforms," *IEEE Trans. Signal Process.*, vol. 42, no. 5, pp. 1038–1051, May 1994.
- [36] *Virtex-6 Family Overview*, Xilinx, San Jose, CA, USA, 2012.
- [37] *A 16-Bit, 200 MSPS/250 MSPS Analog-to-Digital Converter*, Analog Devices, Norwood, MA, USA, 2013.
- [38] *FMC204 User Manual*, 4DSP, Austin, TX, USA, 2014.



PABLO POUDEREUX received the B.S. degree in electronics systems, the M.Sc. degree in advanced electronics systems, and the Ph.D. degree from the Universidad de Alcalá, Spain, in 2010, 2012, and 2017, respectively. His main research interests include embedded systems, signal processing, and PLC and IR systems.



ÁLVARO HERNÁNDEZ (M'06–SM'15) received the Ph.D. degree from the Universidad de Alcalá (UAH), Spain, and Blaise Pascal University, France, in 2003. He is currently a Professor of digital systems and electronic design with the Electronics Department, UAH. His research interests include multisensor integration, electronic systems for mobile robots, and digital and embedded systems.



FERNANDO CRUZ-ROLDÁN (M'98–SM'06) received the degree in technical telecommunication engineering from the Universidad de Alcalá (UAH), Spain, in 1990, the degree in telecommunication engineering from the Universidad Politécnica de Madrid (UPM), Spain, in 1996, and the Ph.D. degree in electrical engineering from UAH in 2000. He received the UAH Prize for the most outstanding doctoral dissertation in the engineering discipline.

He joined the Department of Ingeniería de Circuitos y Sistemas, UPM, in 1990, where from 1993 to 2003, he was an Assistant Professor. From 1998 to 2003, he was a Visiting Lecturer with UAH. In 2003, he joined UAH, as an Associate Professor, and since 2009, he has been a Professor with the Department of Teoría de la Señal y Comunicaciones, UAH.

His teaching and research interests include digital signal processing, filter design, and multirate systems applied to subband coding and digital communications.



RAÚL MATEOS received the M.Sc. degree in telecommunication engineering from the Technical University of Madrid in 1999, and the Ph.D. degree from the Universidad de Alcalá (UAH), Spain, in 2006. He is currently an Associate Professor with the Electronics Department, UAH. His research interests include high-performance SoC architectures, and EDA tools for system-level performance analysis.
