Introduction
In wireless communications, increasing transmission speed in a limited frequency band has become progressively more important with the spread of broadband access services used by handheld devices such as smartphones, tablets, and laptops [1], [2]. Higher transmission rates have been achieved by various technologies, and studies on methods to improve transmission rates are ongoing. For decades, multiple-input multiple-output (MIMO) communication has been considered a key technology for achieving high data rates [3], [4].
Recently, multiuser (MU)-MIMO antenna technology has been studied and applied to many consumer products. For such products, the representative standard is IEEE 802.11ac, which is a widely used wireless local area network standard in MU-MIMO downlink (DL) systems [5]. Standardization trends, such as IEEE 802.11n, have attracted considerable attention, and IEEE 802.11ac aims at a transmission speed of 1 Gbps or more [6]. For IEEE 802.11ac, DL MU-MIMO transmission technology in the 5 GHz band has been proposed [7]. The maximum transfer rate of the physical layer is approximately 7 Gbps with this technology.
Unlike conventional MIMO systems, in MU-MIMO systems, multiple users can communicate simultaneously using a single access point (AP). In other words, in a DL environment, a signal sent from the AP to a given user also reaches other users [8]. These signals are referred to as interfering signals, and users receive both the desired signal and the interfering signals simultaneously [9]. Consequently, performance is reduced by these interfering signals. Therefore, most DL MU-MIMO systems attempt to solve the interference problem by employing precoding technology in the AP [10], [11], [12], [13]. In precoding, the AP multiplies the transmitted signal by a precoding matrix in advance such that no user is affected by the interference signals [10], [14], [15]. For example, in the representative zero-forcing beamforming technique, the interfering signals can theoretically be forced to zero, so that each user and the AP form a separate MIMO system and the entire MU-MIMO system is divided into independent MIMO links [14]. However, in real-world environments, errors, referred to as practical errors, inevitably occur due to errors in channel estimation and incomplete calculation [16]. These practical errors result in an imperfect precoding matrix; thus, some users still receive interfering signals [17]. Compared to the desired signal, the interference signal is relatively small; however, it is large enough to affect system performance [18]. Therefore, an interference-aware receiver is required.
Various approaches to mitigate inter-user interference for DL MU-MIMO systems have been proposed. These methods can be categorized into two representative approaches: interference whitening (IW) and interference detection (ID) [19]. IW simply treats interference as Gaussian noise [20]; thus, it is less computationally complex than ID, but it performs poorly under strong interference. The IW approach has been studied in many interference rejection combiners and their extensions [22], [23], [24], [25], [26]; however, these still cannot achieve high performance under strong interference. In contrast, ID detects both desired signals and interference signals to mitigate interference [21]. ID can achieve high performance (regarded as optimal); however, it suffers from high computational complexity. Several methods have been proposed to reduce this complexity using existing MIMO decoder algorithms [27], [28], [29], [30], [31]. These methods reduce complexity compared to full ID but remain much more complex than IW. Thus, ID is not considered for practical implementation. As mentioned previously, the performance of IW is not satisfactory under strong interference conditions, which is an issue for current and future DL MU-MIMO systems.
In this paper, we propose a symbol detector architecture with an interference mitigation algorithm for DL MU-MIMO systems that is suitable for hardware implementation. To date, many hardware implementation studies related to MIMO detection algorithms have been published [32], [33], [34], [35], [36], [37]. However, no studies have attempted to develop implementations for DL MU-MIMO systems despite their widespread usage. Therefore, we focus on improving existing MIMO detectors for DL MU-MIMO systems by combining them with an interference mitigation algorithm. In this process, the goal is to minimize cost while maintaining compatibility with existing MIMO detectors, which we expect to simplify the adaptation of MIMO detectors to DL MU-MIMO systems. The primary contributions of this paper are summarized as follows.
An improved interference mitigation algorithm is proposed. The proposed algorithm can achieve better performance than existing interference mitigation algorithms in terms of bit error rate (BER) and computational complexity. Furthermore, the proposed algorithm is suitable for a hardware implementation, particularly with tree-search based MIMO detectors.
An efficient method to combine the proposed interference mitigation algorithm and a fixed-complexity sphere decoder (FSD) algorithm is presented. The proposed interference mitigation algorithm is partially calculated according to the successive architecture of an FSD algorithm [38], [39].
For high throughput, a fully-pipelined hardware architecture design is presented. The implemented design can support 3.6 Gbps at 150 MHz for $8 \times [{4, 4}]$ with 64-QAM modulated signals in DL MU-MIMO systems.
The remainder of this paper is organized as follows. Section II reviews MIMO and MU-MIMO system models and briefly discusses MIMO detection and interference mitigation algorithms. Section III describes the proposed interference mitigation algorithm and its implementation. Performance and computational complexity comparisons with other algorithms are presented in Section IV. Section V describes the hardware design and implementation of the FSD with the proposed algorithm. Implementation results are compared to previously implemented MIMO detectors in Section VI, and conclusions are given in Section VII.
System Model
This section introduces a basic review of MIMO and DL MU-MIMO systems. First, a mathematical description and MIMO detection algorithms for MIMO systems are presented. Then, the mathematical description and interference mitigation algorithms for DL MU-MIMO systems are discussed.
A. MIMO System
1) Mathematical Description
We consider a MIMO system with \begin{equation*} {\mathbf{y}} = {\mathbf{Hx}}+{\mathbf{n}},\tag{1}\end{equation*}
2) MIMO Detection
A maximum likelihood (ML) detector offers the best performance; however, it is unsuitable for hardware implementation due to its exponential complexity. This problem has led to the development of tree-search based detectors. The FSD algorithm provides near-ML performance in a predefined tree architecture with fixed complexity [38]. The FSD can be considered a hybrid model that combines a full extension (FE) stage, which performs ML detection, and a single extension (SE) stage, which performs linear detection, as shown in Fig. 1 (a). The FSD algorithm can be implemented with regular successive stages and, similar to other tree-search algorithms, can be realized as a pipelined architecture. Thus, an FSD can be employed to implement a high-throughput MIMO detector. Note that the assignment of streams to the FE and SE stages and the ordering of the streams in the SE stage are very important. The most popular stream ordering method estimates the stream magnitudes from the channel matrix. In this paper, only the tree-search unit is considered for hardware implementation.
Tree structure for four streams with a 16-QAM modulation scheme. (a) hard-output FSD and (b) list-extended soft-output FSD.
The FSD algorithm begins with QR decomposition such that channel matrix \begin{equation*} {\mathbf{z}}={\mathbf{Rx}}+{\mathbf{v}},\tag{2}\end{equation*}
\begin{equation*} {\mathbf{L}}=\{\tilde {\mathbf{x}}_{1},\cdots,\tilde {\mathbf{x}}_{|\Omega |^{p}}\}\tag{3}\end{equation*}
\begin{align*} {\tilde {x}_{k,i}} = \begin{cases} {\mathrm {one\;\;of\;\;symbols}}\in \Omega, &\quad i \in {\text {FE}} \\ Q\left\{{\left({z_{i}-\sum _{j=i+1}^{M} R_{i,j}\tilde {x}_{k,j}}\right)/R_{i,i}}\right\}, &\quad i \in {\text {SE}} \end{cases}\tag{4}\end{align*}
\begin{equation*} \hat{\boldsymbol{x}}_{\text {FSD}} = \arg \min \limits _{\hat{\boldsymbol{x}} \in {\mathbf{L}}} \|{\mathbf{z}}-{\mathbf{R}}\hat{\boldsymbol{x}}\|^{2}.\tag{5}\end{equation*}
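To make the recursion in (2) to (5) concrete, the following is a minimal Python sketch of a hard-output FSD; it is an illustration under stated assumptions, not the implementation evaluated in this paper. It assumes a QPSK alphabet for brevity (Fig. 1 uses 16-QAM), an upper-triangular $\mathbf{R}$ that has already been obtained and ordered, and that the $p$ FE levels are the rows visited first. The names `fsd_detect` and `slice_symbol` are hypothetical.

```python
from itertools import product

# Hypothetical QPSK alphabet (Omega); chosen only to keep the example small.
OMEGA = [complex(a, b) for a in (-1, 1) for b in (-1, 1)]


def slice_symbol(value, alphabet=OMEGA):
    """Q{.}: map a complex value to the nearest constellation point."""
    return min(alphabet, key=lambda s: abs(value - s))


def fsd_detect(R, z, p=1, alphabet=OMEGA):
    """Hard-output FSD for z = R x + v with an M x M upper-triangular R.

    The last p rows (visited first) are FE levels; the remaining rows are
    SE levels. Returns (best_vector, best_distance).
    """
    M = len(z)
    best, best_ed = None, float("inf")
    # One tree path per combination of FE symbols: |Omega|^p candidates, as in (3).
    for fe_symbols in product(alphabet, repeat=p):
        x = [0j] * M
        # FE levels: take every symbol combination.
        for k, sym in enumerate(fe_symbols):
            x[M - 1 - k] = sym
        # SE levels: cancel already decided symbols, then slice, as in (4).
        for i in range(M - p - 1, -1, -1):
            resid = z[i] - sum(R[i][j] * x[j] for j in range(i + 1, M))
            x[i] = slice_symbol(resid / R[i][i], alphabet)
        # Euclidean distance ||z - R x||^2, as in (5).
        ed = sum(
            abs(z[i] - sum(R[i][j] * x[j] for j in range(i, M))) ** 2
            for i in range(M)
        )
        if ed < best_ed:
            best, best_ed = x, ed
    return best, best_ed


# Noise-free usage example with hypothetical values.
R_demo = [[2 + 0j, 0.5 + 0.1j, -0.3j],
          [0j, 1.5 + 0j, 0.2 + 0j],
          [0j, 0j, 1 + 0j]]
x_demo = [complex(1, 1), complex(-1, 1), complex(1, -1)]
z_demo = [sum(R_demo[i][j] * x_demo[j] for j in range(3)) for i in range(3)]
x_hat, ed_hat = fsd_detect(R_demo, z_demo, p=1)
```

Because the number of paths is fixed at $|\Omega|^{p}$ regardless of the channel or noise, the loop structure maps naturally onto one pipeline stage per tree level.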
To extend the hard-output FSD to soft-output FSD (SFSD), a list-extended SD (LSD) algorithm has been proposed, as shown in Fig. 1 (b), such that additional tree searching is performed for
Although the FSD can achieve near-ML performance and is suitable for a fully-pipelined hardware implementation with high throughput, its computational complexity can be significant depending on the total number of levels, and it is particularly sensitive to the number of FE levels.
B. DL MU-MIMO System
1) Mathematical Description
Here, a DL MU-MIMO system with a single AP with \begin{equation*} {{\mathbf{y}}_{i}} = \underbrace { {{\mathbf{H}}_{i}}{{\widehat {\mathbf{V}}}_{i}}{{\mathbf{x}}_{i}}}_{{\mathrm {Desired\;signal}}} + \underbrace {\sum \limits _{j = 1,j \ne i}^{K} { {{\mathbf{H}}_{i}}{{\widehat {\mathbf{V}}}_{j}}{{\mathbf{x}}_{j}}} }_{{\mathrm {Interference\;signal}}} + {{\mathbf{n}}_{i}},\tag{6}\end{equation*}
In this paper, a block diagonal precoding matrix design algorithm is assumed [14]. This scheme employs a precoding matrix that satisfies \begin{equation*} {\widehat {\mathbf{V}}_{i}} = {{\mathbf{V}}_{i}} + {{\mathbf{E}}_{i}},\tag{7}\end{equation*}
\begin{equation*} {{\mathbf{y}}_{i}} = \underbrace {{{\mathbf{H}}_{D}}{{\mathbf{x}}_{D}}}_{{\mathrm {Desired\;signal}}} + \underbrace {{{\mathbf{H}}_{I}}{{\mathbf{x}}_{I}}}_{{\mathrm {Interference\;signal}}} + {{\mathbf{n}}_{i}},\tag{8}\end{equation*}
2) Interference Mitigation
As mentioned previously, interference mitigation algorithms can be categorized into two representative approaches [19], i.e., IW and ID. The straightforward approach to inter-user interference is to treat interference as Gaussian noise [20]. To whiten the effective noise, which is assumed to be colored Gaussian noise, the whitening filter is multiplied as follows \begin{equation*} {{\mathbf{Wy}}_{i}} = {{\mathbf{WH}}_{D}}{{\mathbf{x}}_{D}} + {{\mathbf{WH}}_{I}}{{\mathbf{x}}_{I}} + {{\mathbf{Wn}}_{i}},\tag{9}\end{equation*}
\begin{equation*} ({\mathbf{W}}^{-1})^{\dagger }{\mathbf{W}}^{-1} = {{\mathbf{H}}_{I}}{\mathbf{H}}_{I}^{\dagger } + {{\mathbf{I}}_{N_{R}}}.\tag{10}\end{equation*}
\begin{equation*} \{ {{{\widehat {\mathbf{x}}}_{D}},{{\widehat {\mathbf{x}}}_{I}}} \} = \arg \min {\left \|{ {{{{\mathbf{y}}}_{i}} - {{{\mathbf{H}}}_{D}}{{\widehat {\mathbf{x}}}_{D}} - {{{\mathbf{H}}}_{I}}{{\widehat {\mathbf{x}}}_{I}}} }\right \|^{2}}.\tag{11}\end{equation*}
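The whitening step in (9) and (10) can be illustrated with a small sketch. The code below is a simplified example, not the paper's implementation: it assumes a real-valued interference channel (so the Hermitian transpose reduces to a plain transpose), unit noise variance, and it picks $\mathbf{W} = \mathbf{L}^{-1}$ from the Cholesky factorization $\mathbf{H}_{I}\mathbf{H}_{I}^{T} + \mathbf{I} = \mathbf{L}\mathbf{L}^{T}$ so that the interference-plus-noise covariance becomes the identity after filtering. The factor ordering is a convention and may differ from the one used in (10). All function names and the channel values are hypothetical.

```python
import math


def cholesky(C):
    """Lower-triangular L with C = L L^T (C symmetric positive definite)."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(C[i][i] - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L


def invert_lower(L):
    """Inverse of a lower-triangular matrix by forward substitution."""
    n = len(L)
    X = [[0.0] * n for _ in range(n)]
    for c in range(n):  # solve L x = e_c, one column at a time
        for i in range(c, n):
            rhs = 1.0 if i == c else 0.0
            s = sum(L[i][k] * X[k][c] for k in range(c, i))
            X[i][c] = (rhs - s) / L[i][i]
    return X


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def whitening_filter(H_I):
    """Return (W, C) with C = H_I H_I^T + I and W C W^T = I (real-valued case)."""
    n = len(H_I)
    C = matmul(H_I, transpose(H_I))
    for i in range(n):
        C[i][i] += 1.0  # noise term I_{N_R}, unit variance assumed
    return invert_lower(cholesky(C)), C


# Hypothetical 3 x 2 interference channel.
H_I = [[0.3, -0.1], [0.2, 0.4], [-0.5, 0.1]]
W, C = whitening_filter(H_I)
check = matmul(matmul(W, C), transpose(W))  # numerically close to the identity
```

Because the Cholesky factor is triangular, its inverse needs only substitution, which is one reason this route is attractive for a pipelined hardware realization.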
Proposed Algorithm
This section introduces the proposed interference mitigation algorithm. First, the proposed algorithm is derived in consideration of the trade-off between computational complexity and performance. Then, a method to implement the proposed interference mitigation algorithm with the FSD algorithm is presented.
A. Algorithm Description
The goal of the proposed algorithm is to provide an efficient interference mitigation technique that ensures high performance under interference conditions without significantly increasing complexity compared to existing MIMO detectors. The proposed algorithm includes both IW and ID methods because the IW method alone does not achieve sufficiently high performance under strong interference conditions. The IW method involves a filtering process to whiten interference signals. In contrast, the ID method detects interference signals directly, in the same way as desired signals.
Therefore, the key idea is to minimize the influence of an interference signal when detecting a desired signal using IW as a preprocessing step and to detect the interference signal for only the candidate desired symbol vectors detected by the FSD. The IW filter makes the effective channel matrix orthogonal, i.e.,
The overall flow of the proposed algorithm is divided into three stages.
IW. Prior to detecting the desired signal with the FSD, an IW filter is applied to minimize the effect of the interference signal. Here, the interference signal is treated as whitened noise, so that it no longer has a colored effect on each stream when the desired signal is detected.
FSD. A tree-search detector used in existing MIMO systems is utilized to detect the desired signal. Note that we assume an FSD algorithm for the hard-output detector and an SFSD algorithm for the soft-output detector.
IC. The interference signal vector is detected based on the desired signal candidate vector obtained by the FSD, and the final output is generated. If a hard-output detector is assumed, the interference signal is calculated for each candidate signal vector, and the final Euclidean distance (ED) is recalculated to select the desired signal candidate vector that minimizes this value. With the soft-output detector, the ED, including the interference signal for each desired signal candidate vector, is recalculated to generate the log-likelihood ratio (LLR) value per bit.
B. Implementation Method
The IW filter minimizes the influence of the interference signal, and the FSD detects the desired signal without considering the interference signal. The IC process then removes the influence of the interference signal based on the candidate vectors. Here, the \begin{equation*} {\mathbf{Q}}^{\text {H}}{{\mathbf{Wy}}_{i}} = {{\mathbf{R}}}{{\mathbf{x}}_{D}} + {\mathbf{Q}}^{\text {H}}{{\mathbf{WH}}_{I}}{{\mathbf{x}}_{I}} + {\mathbf{Q}}^{\text {H}}{{\mathbf{Wn}}_{i}},\tag{12}\end{equation*}
\begin{equation*} {{\mathbf{Q}}^{\text {H}}{\mathbf{WH}}_{I}}\hat{ {\boldsymbol {x}}}_{I,i,ZF} = {\mathbf{Q}}^{\text {H}}{{\mathbf{Wy}}_{i}} - {{\mathbf{R}}}\hat{ {\boldsymbol {x}}}_{D,i}\tag{13}\end{equation*}
\begin{align*} \hat{ {\boldsymbol {x}}}_{I,i,ZF}=&(({\mathbf{Q}}^{\text {H}}{{\mathbf{WH}}_{I}})^{\text {H}}({\mathbf{Q}}^{\text {H}}{{\mathbf{WH}}_{I}}))^{-1} \\&\cdot ({\mathbf{Q}}^{\text {H}}{{\mathbf{WH}}_{I}})^{\text {H}}({\mathbf{Q}}^{\text {H}}{{\mathbf{Wy}}_{i}} - {{\mathbf{R}}}\hat{ {\boldsymbol {x}}}_{D,i}) \\=&({{\mathbf{H}}_{I}}^{\text {H}}{{\mathbf{W}}}^{\text {H}}{{\mathbf{Q}}}{\mathbf{Q}}^{\text {H}}{{\mathbf{WH}}_{I}})^{-1}({{\mathbf{H}}_{I}}^{\text {H}}{{\mathbf{W}}}^{\text {H}}{{\mathbf{Q}}})\cdot \\&({\mathbf{Q}}^{\text {H}}{{\mathbf{Wy}}_{i}} - {{\mathbf{R}}}\hat{ {\boldsymbol {x}}}_{D,i}) \\=&({{\mathbf{H}}_{I}}^{\text {H}}{{\mathbf{W}}}^{\text {H}}{{\mathbf{WH}}_{I}})^{-1}({{\mathbf{H}}_{I}}^{\text {H}}{{\mathbf{W}}}^{\text {H}}{{\mathbf{Q}}})\cdot \\&({{\mathbf{Q}}}^{\text {H}}{{\mathbf{Wy}}_{i}} - {{\mathbf{R}}}\hat{ {\boldsymbol {x}}}_{D,i}) \;\;\;\;\;(\because \; {{\mathbf{Q}}}{\mathbf{Q}}^{\text {H}}={\mathbf{I}}) \\\simeq&({{\mathbf{H}}_{I}}^{\text {H}}{{\mathbf{W}}}^{\text {H}}{{\mathbf{Q}}})({\mathbf{Q}}^{\text {H}}{{\mathbf{Wy}}_{i}} - {{\mathbf{R}}}\hat{ {\boldsymbol {x}}}_{D,i}) \\&(\because \; {{\mathbf{H}}_{I}}^{\text {H}}{{\mathbf{W}}}^{\text {H}}{{\mathbf{WH}}_{I}}\simeq {\mathbf{I}})\tag{14}\end{align*}
\begin{equation*} \hat{ {\boldsymbol {x}}}_{I,i}\triangleq \mathrm {slice}(\hat{ {\boldsymbol {x}}}_{I,i,ZF})\tag{15}\end{equation*}
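The IC step in (12) to (15) can be sketched for the hard-output case as follows. This is a minimal illustration under stated assumptions rather than the implemented design: the inputs are assumed to be already whitened and rotated, so that `z` stands for $\mathbf{Q}^{\text{H}}\mathbf{W}\mathbf{y}_{i}$ and `A` stands for $\mathbf{Q}^{\text{H}}\mathbf{W}\mathbf{H}_{I}$; the approximation $\mathbf{A}^{\text{H}}\mathbf{A} \simeq \mathbf{I}$ from (14) is taken as given; and a QPSK alphabet is used for brevity. The names `ic_select`, `slice_vec`, `matvec`, and `herm` are hypothetical.

```python
import math

# Hypothetical QPSK alphabet used for slicing.
OMEGA = [complex(a, b) for a in (-1, 1) for b in (-1, 1)]


def slice_vec(v, alphabet=OMEGA):
    """Element-wise slicing to the nearest constellation point, as in (15)."""
    return [min(alphabet, key=lambda s: abs(x - s)) for x in v]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def herm(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[a.conjugate() for a in col] for col in zip(*A)]


def ic_select(z, R, A, candidates):
    """Pick the desired-signal candidate with the smallest recalculated ED.

    For each candidate x_d: r = z - R x_d, the interference estimate is
    slice(A^H r) following the approximation in (14), and the ED is
    ||r - A x_i||^2. Returns (ed, x_d, x_i) for the best candidate.
    """
    AH = herm(A)
    best = None
    for x_d in candidates:
        r = [zi - ri for zi, ri in zip(z, matvec(R, x_d))]
        x_i = slice_vec(matvec(AH, r))
        ed = sum(abs(a - b) ** 2 for a, b in zip(r, matvec(A, x_i)))
        if best is None or ed < best[0]:
            best = (ed, x_d, x_i)
    return best


# Noise-free usage example with hypothetical values; A is exactly unitary here.
s = 1 / math.sqrt(2)
A_demo = [[s, s], [s, -s]]
R_demo = [[2 + 0j, 0.5 + 0j], [0j, 1.5 + 0j]]
xd_true = [complex(1, 1), complex(-1, 1)]
xi_true = [complex(1, -1), complex(-1, -1)]
z_demo = [a + b for a, b in zip(matvec(R_demo, xd_true), matvec(A_demo, xi_true))]
cands = [[complex(1, 1), complex(1, 1)], xd_true, [complex(-1, -1), complex(-1, 1)]]
ed_best, xd_hat, xi_hat = ic_select(z_demo, R_demo, A_demo, cands)
```

Replacing the matrix inverse in (14) by the identity means that each candidate costs only one matrix-vector product and a slicing step, which is what keeps the added complexity modest compared to a full joint search.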
Algorithm Performance Analysis
In this section, the BER performance of multiple interference mitigation algorithms is presented for both hard-output and soft-output detection under two scenarios with different antenna configurations. In addition, a comparison of computational complexity, approximated by the number of multiplications in the tree-search process, is presented for the hard-output and soft-output detection cases.
A. BER Comparison
Computer simulations were conducted for hard-output and soft-output scenarios to evaluate multiple interference mitigation algorithms including the proposed algorithm. The simulations considered two scenarios in a DL MU-MIMO system comprising two users with three or four antennas each and an AP with six or eight antennas corresponding to the users, such that
For scenarios 1 and 2, which have different antenna configurations, the uncoded and coded BER were evaluated, and the results are presented in Figs. 3 to 8. Figures 3 and 6 show the BER versus Eb/N0 with the SIR fixed at 15 dB to determine how conventional noise, such as that due to channel estimation and synchronization errors, affects performance. On the other hand, Figs. 4 and 7 show the BER versus SIR with the Eb/N0 fixed at 40 dB to show how interference affects performance. Finally, Figs. 5 and 8 present the coded BER performance obtained with the channel codec.
Comparison of uncoded BER performance versus the Eb/N0 with the SIR fixed at 15 dB for
Comparison of uncoded BER performance versus the SIR with the Eb/N0 fixed at 40 dB for
Comparison of coded BER performance versus the SIR with the Eb/N0 fixed at 40 dB for
Comparison of uncoded BER performance versus the Eb/N0 with the SIR fixed at 15 dB for
Comparison of uncoded BER performance versus the SIR with the Eb/N0 fixed at 40 dB for
Comparison of coded BER performance versus the SIR with the Eb/N0 fixed at 40 dB for
The uncoded BER performance was evaluated using hard-output detection with an FSD. As shown in Table 1, four interference mitigation algorithms were compared. Here, ID (
For scenario 1, with the antenna configuration of
For scenario 2, with the antenna configuration of
B. Complexity Comparison
It is difficult to estimate the design cost of an algorithm directly; thus, the approximated computational complexity was used to compare the algorithms. Because the FSD algorithm is assumed in this paper, we exploited the characteristics of tree-search algorithms. The tree architecture of the FSD comprises multiple nodes, and each node performs similar computations, such that the number of visited nodes is usually considered the total complexity of a tree-search algorithm. Therefore, for the compared interference mitigation algorithms, we first analyzed the number of visited nodes and approximated the number of multiplications per node. Then, the number of multiplications for the ED calculation was added. Note that other processes, such as FSD channel ordering and QR decomposition, are not changed by the interference mitigation algorithms. In addition, for simplicity, pre-processing for IW with Cholesky decomposition was not considered in this analysis because it has much lower complexity than the tree-search unit.
Table 2 shows the approximated complexity for hard-output detection with an FSD with the different interference mitigation algorithms. The preprocessing for IW is not included; thus, none and IW are the same in this analysis. First, ID and the proposed scheme visit more nodes during tree searching. Next, the proposed algorithm requires IC, which can be performed at each node; therefore, the number of multiplications per node is double that of the other algorithms. Finally, the number of ED calculations depends on the number of candidate outputs, and, compared to none and IW, ID and the proposed algorithm require double the number of multiplications for each calculation. In the upper part of the table, the equation for each part is given using the parameter
Table 3 shows the same analysis for soft-output detection using the SFSD. Here, the number of visited nodes was calculated including the additional nodes from the list extension technique. The proposed algorithm can achieve performance close to that of ID with 4 and 15 times lower complexity.
Architecture Design
In this section, the architecture design of the FSD combined with the proposed interference mitigation algorithm for DL MU-MIMO systems is presented. First, the design target and overall architecture are presented. In addition, issues that should be considered when designing the symbol detector and overall architecture efficiency are discussed. Note that the architecture can be divided into two stages, i.e., the IW stage and the symbol detection stage. Design specifics are explained below.
A. Design Overview
The designed symbol detector aims to improve communication performance of a DL MU-MIMO system by adding an interference mitigation algorithm to an existing MIMO detector. The considered environment is a
Block diagram of proposed design combining hard-output FSD and interference mitigation algorithm for
Block diagram of proposed LSD unit for SFSD implementation combined with interference mitigation algorithm.
DL MU-MIMO was introduced to improve system throughput; therefore, the designed symbol detector should support high throughput. Generally, throughput \begin{equation*} { \it {\Phi }}=f_{c}\times \frac {B \times N_{T}}{N_{clk}}\tag{16}\end{equation*}
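As a quick arithmetic check of (16), the sketch below evaluates the throughput for the operating point reported in this paper (150 MHz, 64-QAM, four streams). The interpretation of the symbols is an assumption made for this example: $f_{c}$ is taken as the clock frequency, $B$ as the number of bits per symbol (6 for 64-QAM), $N_{T}$ as the number of detected streams, and $N_{clk}$ as the number of clock cycles per detected symbol vector (1 for a fully-pipelined design). The function name `throughput_bps` is hypothetical.

```python
def throughput_bps(f_clk_hz, bits_per_symbol, n_streams, n_clk):
    """Evaluate (16): Phi = f_c * (B * N_T) / N_clk, in bits per second."""
    return f_clk_hz * bits_per_symbol * n_streams / n_clk


# Assumed mapping of the reported operating point onto (16).
phi = throughput_bps(f_clk_hz=150e6, bits_per_symbol=6, n_streams=4, n_clk=1)
print(f"{phi / 1e9:.1f} Gbps")
```

Under these assumptions the result agrees with the 3.6 Gbps figure quoted for the implemented design, and it shows why any increase in $N_{clk}$ (for example, from resource sharing) would reduce throughput proportionally.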
The pipelining technique increases throughput efficiently; however, it requires many hardware units, which can increase overall complexity. Therefore, an efficient algorithm that minimizes complexity should be used. As shown in Figs. 9 and 10, the proposed design can be divided into two stages, i.e., the IW and symbol detecting stages. In the IW stage, the IW filtering process is implemented according to an optimized technique. In the symbol detecting stage, the proposed interference mitigation algorithm is implemented efficiently using the existing FSD and SFSD algorithms.
B. IW Stage
In the IW stage, finding
The overall functional flow of the IW stage is shown in Fig. 11. First, the value of
First, we perform Cholesky decomposition on the given channel
Note that matrix inversion by Cholesky decomposition is simple due to the triangular matrix. The matrix inversion is also designed to enable full pipelining as shown in Fig. 12. In addition, matrix inversion can be performed with a delay in each column stage, which can make the design efficient with lower latency.
C. Symbol Detecting Stage
The FSD stage can be divided into three key functions, i.e., FSD pre-processing, FSD and LSD. Note that FSD preprocessing is excluded in the hardware implementation in this paper. The FSD and LSD are designed as a fully-pipelined architecture such that each has the same number of sub-stages as the number of tree levels.
A block diagram of the entire FSD stage (consisting of FSD-LSD) is shown in Fig. 9. First, hard-detection is performed in the FSD stage and the LSD stage is initiated to secure soft output. In this process, there is a minimum selector that determines the minimum candidate value based on the FSD output. Finally, there is an LLR calculator based on the SFSD output.
1) Implementation of FSD Unit
For four streams, the number of FE levels is fixed to one, which provides near-optimal performance. Figure 14 shows the hardware architecture of the FSD for \begin{align*} \tilde {z}_{i}=&z_{i} - \sum _{j=i+1}^{M} R_{i,j}\hat {x}_{j}, \tag{17}\\ \hat {x}_{i}=&\mathrm {slice}\left({\frac {\tilde {z}_{i}}{R_{i,i}}}\right),\tag{18}\end{align*}
In the first stage, which does not have to select a symbol because the first level is FE, only the ICU exists. In addition, in the last stage which does not have to update the matrix, only the NSU exists.
2) Implementation of LSD Unit
After passing through the four stages of the FSD, a bit-based tree-extension stage is initiated to generate soft-output for all bits. This tree extension is performed based on the LSD algorithm and generates the soft-output, and the list parameter
3) Implementation of IC Stage
The proposed interference mitigation scheme is implemented in hardware so as to minimize additional hardware requirements by adding an IC unit to the FSD and LSD units, as shown in Fig. 13. The interference signal is detected through the operation of (14), which can proceed sequentially in the tree-level structure. For calculation, after FSD preprocessing is performed,
After four stages of the IC unit, the ZF matrix is completed and then goes through the slicing unit, which detects interference signals. Finally, the detected interference signal is used with the detected desired signal in the ED calculation unit.
Note that the IC unit for the LSD works in the same manner. Therefore, in total,
Implementation Result
The proposed soft-output symbol detector design, including the proposed interference mitigation algorithm, was implemented for a DL MU-MIMO system. Here, a
Table 5 compares the overall hardware implementation results of previous studies and the proposed design. [32] shows the implementation results of the SD algorithm with hard-output. [33], [34], and [35] show the implementation results of the K-best algorithm with hard and soft output, respectively. [36] and [37] show the implementation results of the FSD algorithm with hard and soft output, respectively. Proposed shows the implementation results of the proposed algorithm with only the FSD and with both the FSD and IC.
To compare the results fairly, under the same condition, throughput and power consumption were normalized to the 65-nm technology at a supply voltage (\begin{equation*} {\mathrm {Normalized\;\;Throughput}}=\left({\frac {{\text {Tech.}}}{\mathrm {65\;nm}}}\right)\times {\text {Throughput}},\tag{19}\end{equation*}
\begin{equation*} {\mathrm {Normalized\;\;Power}}=\left({\frac {1.2{{{\;\text {V}}}}}{V_{\text {dd}}}}\right)^{2}\times \left({\frac {{\mathrm {65\;nm}}}{\text {Tech.}}}\right)\times {\text {Power}}.\tag{20}\end{equation*}
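The normalization in (19) and (20) is simple scaling, which the following sketch spells out. The function names and the example figures (a 130-nm design at 1.2 V delivering 100 Mbps at 100 mW) are hypothetical and are not taken from Table 5; they only illustrate how a design in an older technology node is mapped to the 65-nm, 1.2-V reference.

```python
def normalized_throughput(throughput, tech_nm, ref_nm=65.0):
    """Evaluate (19): scale throughput by (Tech. / 65 nm)."""
    return (tech_nm / ref_nm) * throughput


def normalized_power(power, vdd, tech_nm, ref_nm=65.0, ref_vdd=1.2):
    """Evaluate (20): scale power by (1.2 V / Vdd)^2 * (65 nm / Tech.)."""
    return (ref_vdd / vdd) ** 2 * (ref_nm / tech_nm) * power


# Hypothetical design point, used only to illustrate the scaling.
thr_mbps = normalized_throughput(throughput=100.0, tech_nm=130.0)
pwr_mw = normalized_power(power=100.0, vdd=1.2, tech_nm=130.0)
print(thr_mbps, pwr_mw)
```

A design that is already in 65-nm technology at 1.2 V is left unchanged by both functions, so the normalized and raw figures coincide in that case.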
As shown in Table 5, [32] shows a gate count of 212 K and a normalized throughput of 132.9 Mbps at an operating clock rate of 193 MHz. Note that the SD algorithm is not hardware friendly; thus, the implementation is not efficient and cannot provide high throughput. [33] shows a gate count of 1760 K and a normalized throughput of 100 Mbps at an operating clock rate of 198 MHz. Note that this is the first implementation of the K-best algorithm with a sorter in each stage. Therefore, the implementation is not efficient, which results in a high gate count and low throughput. [34] attempts to optimize the existing K-best algorithm in terms of both throughput and gate count, and it shows a gate count of 298 K and a normalized throughput of 2 Gbps; however, it has a very high operating clock rate of 833 MHz, which is too high for use in a practical implementation. More recently, [35] achieved a high normalized throughput of 3.28 Gbps at an operating clock rate of 137 MHz with a gate count of 1753 K.
As mentioned previously, the FSD algorithm was proposed as a hardware-friendly algorithm. [36] shows much better performance with a gate count of 88.2 K and a normalized throughput of 1.98 Gbps at an operating clock rate of 165 MHz. Here, a potential disadvantage of the FSD algorithm is the difficulty of obtaining sufficient candidates for soft-output generation because it typically uses far fewer candidates than the K-best algorithm. [37] shows the soft-output FSD with a gate count of 555 K and a normalized throughput of 3.05 Gbps at an operating clock rate of 370 MHz.
Since there have been no previous implementations of an interference mitigation algorithm for a MIMO detector, the proposed algorithm was first implemented using only the FSD stage for comparison with existing MIMO detectors. In addition, it was implemented with the IC stage to evaluate the cost of extending the detector to a DL MU-MIMO system.
First, when comparing the results of [37] and the FSD component of this study, the FSD component shows a gate count of 887 K and a normalized throughput of 3.6 Gbps at an operating clock rate of 150 MHz. Recall that this study employs a fully-pipelined architecture; therefore, the throughput is much greater at the same operating clock rate although the gate count is greater than that of [37]. Similarly, power consumption can be lower due to the lower operating clock rate. In other words, the proposed implementation is comparable to the existing FSD implementation.
Next, the FSD+IC component is compared to the FSD component to determine the cost of extending the MIMO detector to a DL MU-MIMO detector. As shown in Table 3, the proposed FSD with IC requires approximately double the computational complexity of the existing FSD. This result can be observed in Table 4. If FSD preprocessing were considered, the total gate count ratio would be closer to double, because the implementation includes preprocessing only for the IC; FSD preprocessing is usually excluded due to its variable implementation method. As shown in Table 5, with the interference mitigation algorithm, the proposed design achieves a normalized throughput of 3.6 Gbps with a gate count of 2107 K, which offers significantly better performance in DL MU-MIMO systems than existing MIMO detectors, including the case in which only IW is applied.
Conclusion
In this paper, we have proposed a design and implementation of an FSD combined with an interference mitigation algorithm for a DL MU-MIMO system. The simulation results show performance within 0.5 dB of the optimum, which is 4 dB better than the existing IW method for soft-output decisions. Both hard- and soft-output FSDs are implemented with the proposed interference mitigation algorithm for