
Deep Learning Based One Bit-ADCs Efficient Channel Estimation Using Fewer Pilots Overhead for Massive MIMO System




Abstract:

The massive MIMO approach presents an exciting prospect for the upcoming generation of wireless transmission systems. However, the adoption of practical massive MIMO scenarios is hindered by high hardware expenses and increased energy usage, particularly as the quantity of RF modules expands. To address this issue and make massive MIMO more commercially viable, the design of 1-bit analog-to-digital converters (ADCs) has been considered as a solution. Various deep learning (DL) techniques for channel estimation (CE) with 1-bit ADCs have been developed in the literature. Nonetheless, most of these methods demonstrate limited CE performance with respect to pilot length and noise level. In this paper, an efficient DL model known as bi-directional long short-term memory (BiLSTM) is proposed. This model enhances CE performance with limited pilot signals by training on long input sequences within a bi-directional framework. The bi-directional (forward and backward) processing in the hidden layers of BiLSTM strengthens its training ability, thereby enriching the CE of the proposed system. Moreover, in this paradigm, BiLSTM is used in conjunction with previous channel estimation data to learn the complex mapping from quantized received measurements to channels. Consequently, the proposed model demonstrates superior CE efficiency for the same pilot sequence length, as it deduces the pilot sequence length and configuration necessary to ensure the existence of this mapping function. Therefore, fewer pilot signals are needed when more antennas are available for identical CE capability. Simulation outcomes verify that the proposed model exhibits satisfactory CE accuracy. It is confirmed that increasing the number of antennas improves CE in terms of the achieved signal-to-noise ratio per antenna and the normalized mean squared error.
Published in: IEEE Access (Volume: 12)
Page(s): 64823 - 64836
Date of Publication: 06 May 2024
Electronic ISSN: 2169-3536


SECTION I.

Introduction

Utilizing a substantial number of antennas at the transmitter, massive multiple-input multiple-output (MIMO) can significantly enhance network capacity, spectral efficiency, and overall signal coverage for wireless communication networks [1], [2]. An effective approach to mitigate hardware expenses and reduce power consumption in massive MIMO systems is to employ low-resolution (such as 1-, 2-, or 3-bit) analog-to-digital converters (ADCs). The complexity and energy consumption of an ADC grow rapidly with resolution [3], since the number of comparators in a k-bit flash ADC grows exponentially with k. Consequently, lower-resolution ADCs are significantly less expensive and consume less power than higher-resolution ADCs. When lower-resolution ADCs are employed, the hardware of the associated RF chain elements can also be streamlined or eliminated. For example, 1-bit ADCs do not require automatic gain control, as they only preserve the sign of the real and imaginary components of the received signals in a simple design. The use of low-resolution ADCs in real-world massive MIMO communication networks therefore offers significant advantages on the hardware front [4]. However, the use of purely low-resolution ADCs can hinder overall efficiency, leading to error floors in linear multi-user detection [5], degradation of data rates in the high-SNR region [6], and difficulties in channel estimation (CE) [7]. Therefore, it is crucial to develop effective signal-processing techniques for data detection and CE in these systems to facilitate their transition into commercially viable solutions.

Numerous studies have addressed CE in massive MIMO systems, attracting significant attention, particularly with regard to the use of 1-bit ADCs in various environments [8], [9], [10], [11], [12], [13]. For instance, performance limits for mmWave massive MIMO CE with 1-bit ADCs were revealed in [8]. The study in [9] introduced a 1-bit Bussgang-aided minimum mean-squared error (BMMSE) CE method utilizing the Bussgang decomposition. Furthermore, the authors in [10] investigated angular-domain CE for 1-bit massive MIMO systems. The work in [14] proposed a variational Bayesian sparse Bayesian learning-based CE algorithm for the multi-user massive MIMO system where hybrid analog-digital processing and low-resolution ADCs are utilized at the BS. In [11], researchers studied a multi-cell analysis, considering pilot interference and spatially/temporally correlated channels. CE using maximum a posteriori and maximum likelihood estimation was explored in [12] and [13], respectively, focusing on sparse mmWave MIMO communication systems. For CE in a massive MIMO system, the authors in [15] made a first effort to characterize the target parameter information and channel state information from the standpoint of sensing and communication channels in a single framework; the framework handles time-varying channels, frequency-selective channels, and beam-squint effects. Regarding CE with 1-bit ADCs, the researchers in [16] presented an amplitude retrieval technique that restores missing amplitudes and recovers the direction of arrival, facilitating CE. While the aforementioned ADC structures can reduce hardware expenses and energy consumption, the lower-resolution ADC stage also limits the efficiency of the transceiver, particularly in CE scenarios with small pilot overhead.

Recently, deep learning (DL) has demonstrated effective performance in wireless communication systems in the fields of CE [17], [18], [19], pilot design [4], [20], [21], [22], data detection [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], and channel state information feedback [33], [34], [35], [36]. For CE in 1-bit massive MIMO systems, DL techniques demonstrate superior effectiveness compared to traditional methods. Significant research has been conducted on DL-based CE for 1-bit and mixed-bit ADCs [37], [38], [39], [40], [41], [42], [43], [44], [45], [46]. Specifically, to estimate the channel matrix from 1-bit quantized received data, a conditional generative adversarial network (cGAN) was constructed in [39]. This system outperforms basic neural networks (NNs), such as a naive convolutional neural network (CNN), as well as conventional CE methods. In [40], a deep neural network (DNN)-based CE and a learned pilot signal configuration for low-resolution quantization in MIMO networks were developed; however, the NN performs poorly due to its straightforward fully connected structure. Furthermore, the relationship between more antennas and fewer pilots was investigated in the multilayer perceptron (MLP)-based CE for massive MIMO systems with 1-bit ADCs in [41]. The authors in [42] proposed a two-stage model-driven system called OBMNet for efficient data detection in massive MIMO systems; the model is designed based on the DNN architecture and exhibits excellent performance compared to existing methods. A joint pilot design and CE strategy for mixed-ADC massive MIMO is presented based on a DL algorithm in [43]: the authors construct a pilot layout NN whose weights explicitly reflect optimized pilots and establish a meticulously connected model driven by a Runge-Kutta method for CE. In [44], a DNN-based mixed-ADC massive MIMO system for CE was proposed; to simplify the input signals of all antennas fed into the fully connected layers of the DNN, a directed-input model called DI-DNN was introduced. Additionally, the authors in [45] proposed a two-phase DNN model with mixed- and low-resolution ADCs for CE in the uplink MIMO system: in the first phase, a recovering DNN is used for coarse CE with the few-ADC antennas, while in the second phase, a refining DNN is employed for CE with all antennas. In [46], the authors proposed a modified DNN-based CE with a mixed-ADC architecture, where the majority of the antennas are fitted with low-resolution ADCs and the remainder with high-resolution ADCs. However, the recurrent neural network (RNN) is a DL model better suited to handling periodic and sequential data than the DNN and CNN, as mentioned in [31].

An RNN extends the feed-forward NN with recurrent connections and can process input sequences of varying lengths over time. RNNs utilize past information to predict future outcomes by retaining memories of past events. An RNN variant known as the LSTM is designed with specialized gating mechanisms to control access to memory cells [47]. RNNs such as the LSTM are commonly used to address sequence problems. The LSTM employs a gate architecture instead of the hidden unit in a standard RNN framework, enabling it to select and retain crucial data while disregarding irrelevant data. Using the LSTM mechanism to solve CE with 1-bit ADCs in massive MIMO systems, the authors in [48] designed an integrated model named LSTM-gated recurrent unit (GRU), demonstrating higher CE accuracy than other existing models. In contrast to the LSTM, the BiLSTM network enables bidirectional data flow, making it advantageous for sequence classification. Compared to the LSTM, the BiLSTM delivers improved accuracy by utilizing input from both preceding (backward) and subsequent (forward) events simultaneously. This design captures additional input data features, thereby enhancing learning performance as the flow of data in the BiLSTM network is improved [49]. The study [50], which compared the two models' performance, found that the BiLSTM extracts features better than the LSTM. For MIMO-OFDM communication, the authors in [51] compared the BiLSTM approach with CNNs and DNNs and concluded that the BiLSTM produces more accurate results for CE problems than the other models. To recover the transmitted data, the study in [32] proposed multiuser uplink CE and signal detection in NOMA-OFDM systems based on the BiLSTM model. Yu et al. [52] previously presented a stacked BiLSTM framework for RIS-assisted unmanned aerial vehicle communication systems. However, the BiLSTM faces challenges in complexity, data dependence, and hyperparameter tuning. Despite this, BiLSTM networks offer advantages for CE by capturing sequential context and bidirectional information flow. In addition, automatic feature representation and context awareness make the BiLSTM promising for CE tasks. In this article, effective CE using 1-bit ADCs in massive MIMO is addressed through the implementation of a DL model known as the BiLSTM. The proposed BiLSTM model exhibits improved performance due to the bidirectional operation of two LSTM units, operating in both forward and backward directions. The rationale behind using the BiLSTM lies in its ability to embed certain knowledge into long short-term memory, aiding in retaining crucial information. Moreover, the bidirectional mechanism of the proposed model enables training on more data while preserving more information, thus enhancing overall CE performance.

In the aforementioned research, the CE performance of 1-bit ADC systems is significantly degraded when fewer pilot signals are employed. The BiLSTM can be an effective approach to improving CE performance with 1-bit ADCs in massive MIMO communication strategies, particularly in scenarios with fewer pilots and more available antennas. Motivated by this research gap and by the advantages of the BiLSTM over other DL frameworks in the literature, this paper proposes an effective BiLSTM-based approach for CE using fewer pilots with 1-bit ADCs in massive MIMO systems. The proposed approach first designs the massive MIMO system. Subsequently, the transmission frame, comprising both pilot and data tones, is sent through the uplink channel. The next step involves offline training of the BiLSTM model using the real and imaginary values of the received data. The effectiveness of the proposed model with reduced pilot overhead is observed during the online phase.

The main contributions to the suggested framework can be summed up as follows:

  • The problem of uplink CE using 1-bit ADCs in massive MIMO is addressed through the implementation of an effective DL model known as the BiLSTM. This framework learns to map incoming quantized measurements (QM) to the channels by leveraging DL techniques and past channel estimation data. To establish this mapping, an appropriate pilot sequence (PS) length and structure are determined, ensuring the existence of the mapping. It is worth noting that a reduced number of pilots is required for a specific set of user locations when more antennas are employed to guarantee the existence of this mapping. This might seem counterintuitive, but it indicates that as the number of base station (BS) antennas increases, fewer pilots are necessary for CE. This is supported by the fact that the QM vectors corresponding to different channels become more distinct with an increased number of antennas. Consequently, there is a decreased probability of error when matching them to their respective channels.

  • Unlike previous studies, our proposed model learns long input sequences in both directions of its hidden layers, thereby maximizing overall training performance. Consequently, the proposed model shows improved CE, measured by the normalized mean-squared error (NMSE), in scenarios with more BS antennas and lower pilot overhead.

  • To observe the effectiveness of the proposed model, we conduct simulations involving various analyses and compare its performance against other existing methods. This comparison is based on measuring the NMSE across different numbers of antennas, signal-to-noise ratios (SNR), and pilot lengths. The simulation results reveal that the proposed model consistently outperforms the other methods across all analyses. Furthermore, to maximize the model's efficacy, we fine-tune its hyperparameters by experimenting with different learning rates, three optimization algorithms, and varying minibatch sizes during the training phase. Consequently, we identify the optimal parameter settings that significantly enhance CE performance.

The rest of the paper is organized as follows: the proposed system model is presented in Section II, and the channel model and data transmission are described in Section III. In Section IV, the proposed model for channel estimation is described. The simulation outcomes of the proposed system are presented in Section V. Finally, Section VI concludes the paper.

Notations: Boldface lowercase and uppercase letters denote vectors and matrices, respectively; \mathbf {x}_{i} represents the i-th element of the vector \mathbf{x}; (\cdot)^{H} and (\cdot)^{T} denote the Hermitian (conjugate) transpose and the transpose, respectively; |h|, \Re _{e}\{h\}, and \Im _{m}\{h\} denote the modulus, real part, and imaginary part of a complex number h, respectively; \mathbb {E} stands for the statistical expectation.

SECTION II.

System Model

In the proposed system, we assume the uplink of a massive MIMO system with 1-bit ADCs, where a single-antenna user equipment (UE) is considered (U = 1) and the BS has G antennas. The proposed system model is illustrated in Fig. 1, where the channel is estimated from uplink training and is utilized for downlink data transmission in a time-division duplexing mechanism. Let N denote the pilot sequence length, and let the UE send an uplink PS \mathbf {x}\in {\mathbb {C}}^{N\times U}. After ADC quantization, the received signal \mathbf{y} at the BS can be represented as\begin{equation*} {\mathbf {y}}= \mathrm {sgn}({\mathbf {h}} {\mathbf {x}} ^{T} + {\mathbf {s}}), \tag {1}\end{equation*} where the channel vector between the BS antennas and the UE is denoted by {\mathbf {h}}\in {\mathbb {C}^{G\times {U}}}, and the noise \mathbf{s} has independent and identically distributed (i.i.d.) {{\mathcal {N}}}_{\mathbb {C}}(0,\sigma ^{2}) elements. In addition, the transmitted PS satisfies \mathbb {E}[\mathbf {xx}^{H}]=R_{t}\mathbf {I}, where R_{t} represents the average transmitted power per symbol. The signum function \mathrm{sgn}(\cdot) operates element-wise and is applied to the real (\Re _{e}) and imaginary (\Im _{m}) parts of its argument independently:\begin{align*} \mathrm {sgn}(x)=\begin{cases} \displaystyle 1, & \text {if} \, x \geq 0 \\ \displaystyle -1, & \text {otherwise}. \end{cases} \tag {2}\end{align*} The quantized received pilot signals form \mathbf{y}, the G\times N quantized measurement matrix. Because of the coarse 1-bit quantization, the entries of the \Re _{e} and \Im _{m} parts of \mathbf{y} take values in the set \{-1, +1\}.
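For concreteness, the 1-bit quantization in (1) and (2) can be simulated with a short NumPy sketch (a minimal illustration only; the dimensions, noise variance, and unit-modulus pilots below are assumed example values, not the paper's settings):

```python
import numpy as np

def one_bit_quantize(z):
    """Element-wise 1-bit ADC as in (1)-(2): keep only the signs of Re and Im parts."""
    # np.sign returns 0 at exactly 0, which occurs with probability zero for noisy inputs.
    return np.sign(z.real) + 1j * np.sign(z.imag)

G, U, N = 64, 1, 10        # assumed example values: BS antennas, users, pilot length
sigma2 = 0.1               # assumed noise variance

rng = np.random.default_rng(0)
h = (rng.standard_normal((G, U)) + 1j * rng.standard_normal((G, U))) / np.sqrt(2)  # channel
x = np.exp(1j * 2 * np.pi * rng.random((N, U)))                                    # unit-modulus pilots
s = np.sqrt(sigma2 / 2) * (rng.standard_normal((G, N)) + 1j * rng.standard_normal((G, N)))

y = one_bit_quantize(h @ x.T + s)   # quantized G x N measurement matrix
```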

FIGURE 1. The widespread usage of 1-bit ADCs in base station receivers for massive MIMO systems. The uplink received signal matrix \mathbf{y} is fed into a DL model that estimates the channel vector \hat {\mathbf {h}}.

SECTION III.

Channel Modeling and Data Transmission

For the channel model \mathbf{h}, we utilize a generic geometric channel and assume that there are P possible paths for the signal to travel from the UE to the BS. Each path p has an angle of arrival \phi _{p} and a complex gain \theta _{p}, and the BS array response vector is {\mathbf {a}}(\phi _{p}); the channel \mathbf{h} is then expressed as\begin{equation*} {\mathbf {h}} = \sum \limits _{p =1}^{P}\theta _{p }{\mathbf {a}}(\phi _{p}). \tag {3}\end{equation*}

The BS array response vector {\mathbf {a}}(\phi _{p}) can be expressed as\begin{equation*} \mathbf {a}(\phi _{p})=\frac {1}{\sqrt {P}}\left [{1,e^{-j2\pi \frac {d}{\lambda }\sin (\phi _{p})},\ldots,e^{-j2\pi \frac {d}{\lambda }(G-1)\sin (\phi _{p})}}\right ]^{T}, \tag {4}\end{equation*} where \lambda is the carrier wavelength and d stands for the distance between neighboring antennas at the BS.
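As a sketch, the geometric channel in (3)-(4) can be generated as follows (illustrative only; the path count, gains, and angles are drawn randomly here, and half-wavelength spacing d/\lambda = 0.5 is assumed):

```python
import numpy as np

def array_response(phi, G, d_over_lambda=0.5, P=1):
    """ULA response a(phi) as in (4), with the 1/sqrt(P) normalization used in the paper."""
    g = np.arange(G)
    return np.exp(-1j * 2 * np.pi * d_over_lambda * g * np.sin(phi)) / np.sqrt(P)

def geometric_channel(gains, angles, G):
    """h = sum_p theta_p * a(phi_p), as in (3)."""
    P = len(gains)
    return sum(th * array_response(ph, G, P=P) for th, ph in zip(gains, angles))

# Example with P = 3 assumed paths
rng = np.random.default_rng(1)
P, G = 3, 64
theta = (rng.standard_normal(P) + 1j * rng.standard_normal(P)) / np.sqrt(2)  # complex gains
phi = rng.uniform(0, np.pi, P)                                               # angles of arrival
h = geometric_channel(theta, phi, G)                                         # shape (G,)
```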

The downlink beamformer \mathbf {C_{b}} is designed as conjugate beamforming based on the estimated channel vector \hat {\mathbf {h}}:\begin{equation*} \mathbf {C_{b}}=\frac {\hat {\mathbf {h}}^{*}}{\|\hat {\mathbf {h}}\|}. \tag {5}\end{equation*} The downlink receive SNR per antenna, \textsf {SNR}_{\mathrm {ant}}, can then be expressed as\begin{equation*} \textsf {SNR}_{\mathrm {ant}}=\frac {\gamma }{G} \frac {\left |{{\widehat {\mathbf {h}}^{H} {\mathbf {h}}}}\right |^{2}}{\|\widehat {\mathbf {h}}\|^{2}}, \tag {6}\end{equation*} where \gamma denotes the average received SNR before beamforming.
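The beamformer in (5) and the per-antenna SNR in (6) translate directly into code (a sketch under the same notation; gamma would be supplied by the link budget and is not fixed here):

```python
import numpy as np

def conjugate_beamformer(h_hat):
    """C_b = conj(h_hat) / ||h_hat||, as in (5)."""
    return np.conj(h_hat) / np.linalg.norm(h_hat)

def snr_per_antenna(h_hat, h, gamma, G):
    """Downlink receive SNR per antenna with conjugate beamforming, as in (6)."""
    return (gamma / G) * np.abs(np.vdot(h_hat, h)) ** 2 / np.linalg.norm(h_hat) ** 2

# Sanity check: with perfect CSI (h_hat = h), (6) reduces to gamma * ||h||^2 / G.
```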

SECTION IV.

Proposed Model Based Channel Estimation

Utilizing the coarsely quantized signal \mathbf{y}, we aim to develop an effective CE approach for recovering the channel vector \mathbf{h}. Our objective is to establish a CE technique that minimizes the NMSE between the predicted channel \hat {\mathbf {h}} and the true channel \mathbf{h}, assuming that the BS knows the PS \mathbf{x}. Previous CE techniques have attempted to estimate the channel \hat {\mathbf {h}} by processing the quantized signal \mathbf{y}. However, due to the heavy quantization of \mathbf{y}, long PS \mathbf{x} have been necessary to achieve satisfactory CE performance. To address this challenge, this study proposes the use of BiLSTM models to accurately predict channels from quantized data while employing only short PS \mathbf{x}.

For massive MIMO systems employing lower-resolution ADCs, traditional CE approaches such as [7] attempt to compute the channel from the quantized received signal alone, without leveraging previous measurements. However, the channels effectively encode various environmental factors, such as geometry, resource allocation, and transmitter/receiver placements [53]. This implies that comparable channels will probably be seen more than once by a BS deployed in a given environment. In the proposed study, the mapping from the received quantized measurements to the channels is learned by utilizing a DL model and prior channel estimation data. Therefore, prior experience can be exploited to find the fundamental relationship between the channels and the quantized received signals, which may result in a large reduction in the pilot length. The research presented in [54] showed that there are significant correlations between the channels of adjacent subcarriers. The received signals become highly correlated, and practical performance is degraded, if two adjacent subcarriers are simultaneously dedicated to pilots. Hence, to obtain higher diversity gain and less correlation between subcarriers, the authors used the widely adopted uniform pilot placement. Therefore, this paper proposes utilizing the BiLSTM model to establish the mapping between the quantized measurement matrix \mathbf{y} and \mathbf{h}. In the subsequent section, we first define the conditions under which this mapping exists before highlighting an intriguing finding: increasing the number of antennas reduces the required number of pilots.

A. Mapping from Quantized Measurements to Channels

Consider the indoor or outdoor massive MIMO setup of Section II, where the BS serves a single-antenna UE. Let \{\mathbf {h}\} denote the candidate channel set for the UE, which depends on the placements of the potential users and the surrounding environment. Furthermore, let \{\mathbf{y}\} represent the associated QM matrices for the PS \mathbf{x} and the channel set \{\mathbf{h}\}. The mapping from QM matrices to channels, denoted by \boldsymbol {\psi } (\cdot), is expressed as\begin{equation*} \boldsymbol {\psi }~:~\{ {\mathbf {y}}\} \rightarrow \{ {\mathbf {h}}\}. \tag {7}\end{equation*}

The channel vector h can be predicted from the QM matrices y if this mapping has been identified. Therefore, we aim to establish the existence of the aforementioned mapping in Hypothesis 1 and explain the procedure to enhance our understanding of it.

Hypothesis 1: The system and channel of the proposed study are as follows:\begin{align*} {\mathbf {y}}& = \mathrm {sgn}({\mathbf {h}} {\mathbf {x}} ^{T} + {\mathbf {s}}), \tag {8}\\ {\mathbf {h}} & = \sum \limits _{p =1}^{P}\theta _{p }{\mathbf {a}}(\phi _{p}). \tag {9}\end{align*}

In equation (8), let \mathbf {s}=0 and consider the candidate channel set \{\mathbf {h}\}. The angle \theta is then defined as\begin{align*} {\theta } = \min \limits _{\substack {\forall {\mathbf {h}} _{\mathrm { u}}, {\mathbf {h}}_{\mathrm { v}} \in \{ {\mathbf {h}}\} \\ u \neq v}} ~\max \limits _{\forall g}\left |{{\angle \left [{{ {\mathbf {h}}_{\mathrm { u}}}}\right ]_{\mathrm { g}} - \angle \left [{{ {\mathbf {h}}_{\mathrm { v}}}}\right ]_{\mathrm { g}}}}\right |. \tag {10}\end{align*}

The mapping function \boldsymbol {\psi } (\cdot ) exists if the PS \mathbf{x} is designed to have a length N that satisfies N\ge \left \lceil {\frac {\pi }{2\theta }}\right \rceil, where the angles of the pilot complex symbols evenly sample the range \left [{0,2\pi }\right ]. The proof of Hypothesis 1 and equation (10) is given in [41].

According to Hypothesis 1, once the PS \mathbf{x} is constructed following the specified structure, there exists a one-to-one mapping \boldsymbol {\psi } (\cdot ) capable of mapping the QM matrices \mathbf{y} to the channel \mathbf{h}, effectively allowing \mathbf{h} to be predicted from \mathbf{y}. It is crucial to note that in large MIMO systems, only a small number of pilot symbols (i.e., a small N) is needed for this mapping \boldsymbol {\psi } (\cdot ) to exist with high probability, as demonstrated in the simulation results of Section V. Compared to traditional 1-bit ADC CE methods, this can significantly reduce the cost of channel learning. However, we need to learn this mapping function to exploit its benefits. Due to the non-linear quantization, characterizing this mapping analytically is extremely challenging. Motivated by this, we rely on the BiLSTM model's capacity to learn this mapping and thereby reduce the channel training overhead.
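The quantities in Hypothesis 1 can be evaluated numerically; the sketch below (illustrative, assuming the candidate channels are available as rows of an array) computes \theta from (10) and the resulting pilot-length requirement N \ge \lceil \pi/(2\theta) \rceil:

```python
import numpy as np

def min_max_angle_gap(H):
    """theta in (10): minimum over channel pairs of the maximum per-antenna phase gap.
    H has shape (num_candidate_channels, G)."""
    phases = np.angle(H)
    theta = np.inf
    for u in range(H.shape[0]):
        for v in range(u + 1, H.shape[0]):
            theta = min(theta, np.max(np.abs(phases[u] - phases[v])))
    return theta

def required_pilot_length(H):
    """Smallest N satisfying N >= ceil(pi / (2 * theta)) from Hypothesis 1."""
    return int(np.ceil(np.pi / (2 * min_max_angle_gap(H))))
```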

B. Analysis: Fewer Pilots are Needed as There Are More Antennas

According to Hypothesis 1 and its proof, the ideal PS should be long enough to ensure that every pair of channels in \{\mathbf{h}\} yields two distinct QM matrices. Intuitively, with more antennas installed at the BS and identical uplink PS duration, the probability increases that any two channels produce distinct measurement matrices. Interestingly, additional antennas therefore improve CE, as shown in the simulation results of Section V. This indicates that fewer pilots are required at the BS when more antennas are used to ensure a one-to-one (bijective) mapping from \{\mathbf{h}\} to \{\mathbf{y}\}. Analytical descriptions of this relationship are feasible for various channel models. The following Corollary 1 demonstrates that fewer pilots are required when more antennas are added under the LOS channel model. Corollary 1: Assume a BS with a single-path channel (P =1) and a ULA with half-wavelength antenna separation. The minimum difference between the angles of arrival \phi _{1},\phi _{2}\in \left [{0,\pi }\right ] of any two users is denoted by \tau \phi. If the PS is constructed to satisfy Hypothesis 1, the necessary PS length to ensure the existence of the mapping \boldsymbol {\psi } (\cdot ) is\begin{equation*} N=\left \lceil {{\frac {1}{(G-1)(4 \sin ^{2} (\tau \phi /2))}}}\right \rceil. \tag {11}\end{equation*}

The proof of (11) follows from Hypothesis 1 and is given in [41]. The intriguing advantage of our proposed BiLSTM strategy becomes apparent in Corollary 1, which states that a greater number of antennas requires fewer pilots to ensure the existence of \boldsymbol {\psi }(\cdot) and identical CE efficiency. In Section V, simulation analyses will also confirm this finding.
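A quick numeric check of (11) illustrates the "more antennas, fewer pilots" trend (the 1-degree minimum angle separation below is an assumed example value):

```python
import numpy as np

def pilot_length_los(G, tau_phi):
    """Required pilot length in the single-path (P = 1) case, eq. (11)."""
    return int(np.ceil(1.0 / ((G - 1) * 4 * np.sin(tau_phi / 2) ** 2)))

tau_phi = np.deg2rad(1.0)            # assumed minimum AoA separation of 1 degree
for G in (8, 32, 128):
    print(G, pilot_length_los(G, tau_phi))
# Prints roughly 470, 106, and 26: the required N shrinks as the antenna count G grows.
```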

C. Proposed Deep Learning Model

In this study, a time-series-based model called the BiLSTM is used to effectively estimate the channel matrix. From equation (1), all of the quantities \mathbf {y}, \mathbf {h}, \mathbf {s}, and \mathbf{x} comprise \Re _{e} and \Im _{m} matrices. A PS \mathbf {x}\in \mathbb {C}^{N\times U} of length N is utilized to create the training data used for estimating the channel:\begin{equation*} {\mathbf {y}}= \mathrm {sgn}({\mathbf {h}} {\mathbf {x}} + {\mathbf {s}}), \tag {12}\end{equation*} where \begin{align*} \mathbf {y} & = \left [{{\Re _{e} \lbrace {\mathbf {y}}\rbrace, \Im _{m} \lbrace {\mathbf {y}}\rbrace }}\right ],~\mathbf {h}= \left [{{\Re _{e} \lbrace {\mathbf {h}}\rbrace, \Im _{m} \lbrace {\mathbf {h}}\rbrace }}\right ], \\ \mathbf {s} & = \left [{{\Re _{e} \lbrace {\mathbf {s}}\rbrace, \Im _{m} \lbrace {\mathbf {s}}\rbrace }}\right ],~\mathbf {x} = \begin{bmatrix}\Re _{e} \lbrace {\mathbf {x}}\rbrace & \Im _{m} \lbrace {\mathbf {x}}\rbrace \\ -\Im _{m} \lbrace {\mathbf {x}}\rbrace & \Re _{e} \lbrace {\mathbf {x}}\rbrace \end{bmatrix}.\end{align*}
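The complex quantities above are handled as real-valued arrays in the implementation; a hedged sketch of this decomposition (the exact stacking used by the authors is not fully specified, so the (G, N, 2) layout here is an assumption) is:

```python
import numpy as np

def to_real_features(y):
    """Stack the Re and Im parts of the quantized G x N matrix y into a (G, N, 2) tensor."""
    return np.stack([y.real, y.imag], axis=-1)

def to_real_target(h):
    """Real-valued channel label [Re{h}, Im{h}] of shape (G, U, 2)."""
    return np.stack([h.real, h.imag], axis=-1)
```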

The objective of this study is to estimate the channel \mathbf{h} from the one-bit quantized data \mathbf{y}, analyze the best possible CE, and explore optimal training sequences \mathbf{x}. To achieve this goal, we describe an effective time-series BiLSTM model that enhances CE performance with reduced pilot sequences. Leveraging the capabilities of gated units, particularly the BiLSTM, we aim to map quantized incoming signals to complex-valued channels; BiLSTM models represent the state of the art for such sequence-learning tasks. The core idea is that with more antennas and higher SNR but fewer pilots, the proposed model should exhibit improved performance due to the bidirectional operation of two LSTM units, operating in both forward and backward directions. The rationale behind using the BiLSTM lies in its ability to embed certain knowledge into long short-term memory, aiding in retaining crucial information. Moreover, the bidirectional mechanism of the proposed model enables training on more data while preserving more information, thus enhancing overall CE performance with fewer pilot data. Consequently, it can denoise the pre-processed \mathbf{y}. As mentioned earlier, in the case of long sequential training, conventional back-propagation training suffers from vanishing gradients. The proposed BiLSTM utilizes an effective gradient-based method with a structure designed to ensure continuous error flow through the internal states of specialized units, addressing these issues related to error back-flow.

1) Data Pre-Processing and Preparation

Pre-processing of the network inputs and outputs is necessary before any training occurs. Whether the channels are in the training or testing datasets, the first pre-processing stage uses the maximum absolute channel value from the training set to normalize the channels to the range [-1, 1]. Such normalization has proven beneficial in previous studies [53], [55]. In this simulation, we consider the indoor massive MIMO scenario I1_2p4 provided by the DeepMIMO dataset [56], which is produced using the accurate 3D ray-tracing simulator Wireless InSite [57]. In this scenario, users are arranged across two x-y grids in a 10 m \times 10 m indoor space containing two tables, operating at 2.4 GHz. The DeepMIMO dataset, which encompasses the channels between each potential user location and each antenna at the BS, is created based on this ray-tracing scenario. The DeepMIMO settings are as follows: the scenario name is I1_2p4, there are 32 active BSs, active users from row 1 to 502, 4 BS antennas in (x, y, z): (1, 100, 1), a system bandwidth of 0.01 GHz, 1 OFDM sub-carrier (single carrier), and 10 multipaths. The resulting DeepMIMO dataset is first shuffled and then divided into 70% training and 30% testing datasets. During the generation of the training datasets, we consider SNRs ranging from 0 to 30 dB; the datasets are generated at 0:5:30 dB, i.e., seven SNR values, using Algorithm 1, and training is then performed with these datasets. The data preparation procedure is shown in Algorithm 1. Subsequently, the BiLSTM model is trained using the generated datasets, and the effectiveness of the proposed approach is evaluated. The training parameters are listed in Table 1.
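The normalization step described above can be sketched as follows (a minimal illustration; the scaling uses only training-set statistics, as stated in the text):

```python
import numpy as np

def normalize_channels(train_h, test_h):
    """Scale all channels to [-1, 1] using the maximum absolute value of the training set."""
    scale = np.max(np.abs(train_h))
    return train_h / scale, test_h / scale, scale
```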

TABLE 1. List of Training Simulation Parameters of the Proposed Model

Algorithm 1 Data Preparation Algorithm

1: Begin
2: Initialize the parameters G, M, T, U, and N.
3: Generate channel matrices: {\mathbf {h}}_{i} \in \mathbb {C}^{G\times U} for i=1,2,\ldots,M.
4: Generate uplink pilot symbols: {\mathbf {x}}_{i}\in \mathbb {C}^{N\times U}.
5: Transmit pilot symbols: \forall i \in \{1,2,\ldots,M\},~{\mathbf {y}}_{i} = \mathbf {h}_{i}\mathbf {x}+\mathbf {s}.
6: Simulate the 1-bit ADC operation: \forall i \in \{1,2,\ldots,M\},~\mathbf {B}_{i}=\mathrm {sgn}({\mathbf {y}}_{i}),~\mathbf {B}_{i}\in \{-1, +1\}^{G\times T}.
7: Feature extraction: \forall i \in \{1,2,\ldots,M\},~\mathbf {F}_{i}=\left ({\Re _{e} \lbrace \mathbf {B}_{i}\rbrace, \Im _{m} \lbrace \mathbf {B}_{i}\rbrace }\right),~\mathbf {F}_{i}\in \mathbb {R}^{D}.
8: Construct the training dataset: \{\left ({\Re _{e} \lbrace {\mathbf {F}}_{i}\rbrace, \Im _{m} \lbrace {\mathbf {F}}_{i}\rbrace }\right), \left ({\Re _{e} \lbrace {\mathbf {h}}_{i}\rbrace, \Im _{m} \lbrace {\mathbf {h}}_{i}\rbrace }\right)\}_{i=1}^{M}.
9: Split the dataset into 70% training and 30% testing.
10: End
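A compact NumPy sketch of Algorithm 1 might look as follows (illustrative only: the channels here are drawn randomly instead of from DeepMIMO, and the dataset size, pilot design, and SNR are assumed example values):

```python
import numpy as np

def prepare_dataset(M=10000, G=64, U=1, N=10, snr_db=10, seed=0):
    """Algorithm 1 sketch: 1-bit quantized pilot observations and channel labels."""
    rng = np.random.default_rng(seed)
    sigma2 = 10 ** (-snr_db / 10)
    pilots = np.exp(1j * 2 * np.pi * np.arange(N) / N).reshape(N, U)  # unit-modulus pilots
    X, Y = [], []
    for _ in range(M):
        h = (rng.standard_normal((G, U)) + 1j * rng.standard_normal((G, U))) / np.sqrt(2)
        s = np.sqrt(sigma2 / 2) * (rng.standard_normal((G, N))
                                   + 1j * rng.standard_normal((G, N)))
        r = h @ pilots.T + s                                  # noisy received pilots
        b = np.sign(r.real) + 1j * np.sign(r.imag)            # 1-bit ADC (step 6)
        X.append(np.stack([b.real, b.imag], axis=-1))         # features F_i (step 7)
        Y.append(np.stack([h.real, h.imag], axis=-1))         # labels (step 8)
    X, Y = np.asarray(X), np.asarray(Y)
    split = int(0.7 * M)                                      # 70% / 30% split (step 9)
    return (X[:split], Y[:split]), (X[split:], Y[split:])
```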

2) Proposed Model Architecture

The BiLSTM architecture consists of two LSTM units operating in the forward and backward directions. These LSTM units comprise multiple memory units known as cells. The input gate, output gate, and forget gate are the three gates that regulate each cell. The forget gate discards unnecessary information from the cell state, while the input gate incorporates new information into it. Finally, the output gate selects crucial data from the current cell state and presents it as the output. We train the BiLSTM model to learn the channel mapping from quantized data, leveraging the efficiency of these gated units, which are known for their memory capabilities. The proposed model and its corresponding gate structure are depicted in Fig. 2. The proposed model structure is described as follows:

FIGURE 2. The proposed BiLSTM with its different layers and architecture.

ImageInputLayer: After pre-processing of \mathbf{y} with size G\times N along with its \Re _{e} and \Im _{m} values, the result of size G\times U\times 2 is fed to the image input layer, which processes image-like data. Mathematically, if we denote the input by k, the output of the imageInputLayer is vecInput = k.

FlattenLayer: The flatten layer simply reshapes the input tensor from a multi-dimensional array into a one-dimensional array without any mathematical operations. In this layer, the input of size G\times U\times 2 is reshaped as flatten(k) = reshape(k, [G\times U\times 2, 1]), where flatten(k) represents the flattened vector.

BiLSTMLayer: The BiLSTM layer processes a sequence of input vectors with two directional sub-layers: forward (Frd) and backward (Bkd). The input {\mathbf {F}}_{i} of dimension G\times (U\times 2), treated as a sequence of G steps, is fed to this layer. The BiLSTM layer is formulated as\begin{align*} \overrightarrow {Frd (H_{t})} & = LT_{layer}({\mathbf {F}}_{i}, \overrightarrow {H}_{t-1}) \\ \overleftarrow {Bkd (H_{t})} & = LT_{layer}({\mathbf {F}}_{i}, \overleftarrow {H}_{t+1}) \\ H_{t} & = [ \overrightarrow {Frd (H_{t})};\overleftarrow {Bkd (H_{t})}], \tag {13}\end{align*} where LT_{layer} is the LSTM layer and {\mathbf {F}}_{i} is the network input. \overrightarrow {H}_{t-1} and \overleftarrow {H}_{t+1} are the hidden states of the forward and backward LSTM layers, respectively, and H_{t} is the combined output of the proposed network.

ReluLayer: This layer does not modify the dimensions. It applies the ReLU activation function element-wise, i.e., relu(k) = max(0,k).

DropoutLayer: This layer randomly sets a fraction p of the input units of k to zero during training, i.e.,\begin{align*} \mathrm {drop}(k,p)=\begin{cases} \displaystyle k & \text {with probability} \, 1-p \\ \displaystyle 0 & \text {with probability} \, p. \end{cases} \tag {14}\end{align*}

FullyConnectedLayer: The network uses two fully connected layers; the first has 8272 neurons and the second has the output dimension. Finally, the estimated channel matrix \hat {\mathbf {h}} is produced with size G\times U \times 2.
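For illustration, the layer stack above can be approximated in PyTorch as follows (a sketch, not the authors' MATLAB implementation: the hidden size and dropout probability are assumptions, the quantized measurements of each antenna are treated as one sequence step, and only the 8272-unit first fully connected layer is taken from the text):

```python
import torch
import torch.nn as nn

class BiLSTMChannelEstimator(nn.Module):
    """Flatten -> BiLSTM -> ReLU -> dropout -> two fully connected layers."""
    def __init__(self, G=64, U=1, N=10, hidden=128, p_drop=0.2):   # hidden, p_drop assumed
        super().__init__()
        self.bilstm = nn.LSTM(input_size=N * 2, hidden_size=hidden,
                              batch_first=True, bidirectional=True)
        self.relu = nn.ReLU()
        self.drop = nn.Dropout(p_drop)
        self.fc1 = nn.Linear(2 * hidden, 8272)     # first FC layer size from the text
        self.fc2 = nn.Linear(8272, U * 2)          # per-antenna output [Re, Im]

    def forward(self, y):
        # y: (batch, G, N, 2) -- quantized measurements per antenna, Re/Im stacked
        seq = y.flatten(start_dim=2)               # (batch, G, N*2): sequence of G steps
        out, _ = self.bilstm(seq)                  # (batch, G, 2*hidden), forward+backward
        out = self.drop(self.relu(out))
        return self.fc2(self.fc1(out))             # (batch, G, U*2): estimated channel
```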

D. Objective Function

The objective of the BiLSTM is to reduce the loss and maximize CE performance with less pilot overhead. The proposed BiLSTM model is deployed at the BS side for uplink CE, where the estimation is performed through a non-linear mapping from the acquired pilot signal of length N to the uplink channel, expressed as\begin{equation*} \hat {\mathbf {h}} = { \boldsymbol {\psi }}_{ \boldsymbol {\theta }}({\mathbf {y}}^{N}), \tag {15}\end{equation*} where { \boldsymbol {\psi }}_{ \boldsymbol {\theta }} denotes the non-linear mapping function with weights \boldsymbol{\theta}. Using the training dataset \{\left ({\Re _{e} \lbrace {\mathbf {F}}_{i}\rbrace, \Im _{m} \lbrace {\mathbf {F}}_{i}\rbrace }\right)^{N}, \left ({\Re _{e} \lbrace {\mathbf {h}}_{i}\rbrace, \Im _{m} \lbrace {\mathbf {h}}_{i}\rbrace }\right)^{N}\}_{i=1}^{M}, the proposed BiLSTM is trained and the loss function can be formulated as\begin{equation*} \mathcal {L}(\boldsymbol {\theta }) = \frac {1}{M}\sum \limits _{i=1}^{M}{\parallel \hat {\mathbf {h}}_{i}-{\mathbf {h}}_{i} \parallel }_{2}^{2}. \tag {16}\end{equation*}

Training the BiLSTM amounts to optimizing the weights \boldsymbol{\theta} to minimize the above loss function \mathcal {L}(\boldsymbol {\theta }), which can be represented as\begin{equation*} \mathop {\min }\limits _{ \boldsymbol {\theta }}\mathcal {L}(\boldsymbol {\theta }) = \frac {1}{M}\sum \limits _{i=1}^{M}{\parallel { \boldsymbol {\psi }}_{ \boldsymbol {\theta }}({\mathbf {y}_{i}}^{N}) -{\mathbf {h}}_{i} \parallel }_{2}^{2}. \tag {17}\end{equation*}

Iterative training of the BiLSTM on the training dataset follows from this objective function. The weights \boldsymbol {\theta } are updated by gradient descent in every iteration t:\begin{equation*} { \boldsymbol {\theta }}_{t+1} = { \boldsymbol {\theta }}_{t} - {\eta }_{t} {\mathbf {g}}({ \boldsymbol {\theta }}_{t}), \tag {18}\end{equation*} where \boldsymbol {\theta }_{t} and \boldsymbol {\theta }_{t+1} denote the weights at the t-th and (t+1)-th iterations, respectively, \mathbf{g}(\boldsymbol {\theta }_{t}) is the gradient vector at \boldsymbol {\theta }_{t}, and {\eta }_{t} is the learning rate. After training, the learned BiLSTM can immediately estimate the channel \hat {\mathbf {h}}.
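The objective (16)-(18) maps onto a standard training loop; a hedged sketch (continuing the hypothetical BiLSTMChannelEstimator above, with assumed learning rate and epoch count, and using the mean-squared error, which matches (16) up to a constant scaling) is:

```python
import torch
import torch.nn as nn

def train_bilstm(model, loader, epochs=50, lr=1e-4):
    """Minimize the loss (16) with Adam, i.e., the update (18) with adaptive step sizes."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for y_batch, h_batch in loader:        # quantized inputs and channel labels
            h_hat = model(y_batch)
            loss = mse(h_hat, h_batch)         # proportional to (1/M) sum ||h_hat - h||^2
            opt.zero_grad()
            loss.backward()                    # gradient g(theta_t)
            opt.step()                         # theta_{t+1} = theta_t - eta_t * update
    return model
```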

E. Training and Testing of the Proposed Model

The number of antennas at the BS and the number of pilots used for estimation determine the dimensions of the input and output layers. For instance, if there are 200 antennas and 20 pilot symbols, the input and output sizes will be 4000 and 400, respectively. The training samples are arranged as (\mathbf{y},\mathbf{h}), where \mathbf{y} represents the input and \mathbf{h} represents the target channel. Each sample corresponds to a single user randomly selected from the two grids. To minimize the loss function, we adopt the well-known "Adam" optimizer [58] and add noise within an SNR range of 0 to 30 dB during training. While training the model, we assess performance using various optimizers (such as SGDm and RMSprop) along with different learning rates and minibatch sizes. The training and estimation process of the proposed model is depicted in Algorithm 2.

Algorithm 2 BiLSTM Training and Channel Estimation Algorithm

Initialize the estimation function \Theta, epochs E, and training samples M.

Load the training dataset: \{\left ({\Re _{e} \lbrace {\mathbf {F}}_{i}\rbrace, \Im _{m} \lbrace {\mathbf {F}}_{i}\rbrace }\right), \left ({\Re _{e} \lbrace {\mathbf {h}}_{i}\rbrace, \Im _{m} \lbrace {\mathbf {h}}_{i}\rbrace }\right)\}_{i=1}^{M}

Initialize the forward and backward weights and biases as (W_{frd}, W_{bkd}) and (b_{frd}, b_{bkd})

Calculate the loss function \mathcal {L}(\boldsymbol {\theta })

Select the Adam optimizer with the update \theta _{t+1}= \theta _{t} - \frac {{\eta }_{t}}{\sqrt {\hat {v}_{t}} + \zeta } \cdot \hat {m}_{t}, where m_{t} and v_{t} are the first and second moment estimates of the gradient {\mathbf {g}_{t}}(\boldsymbol {\theta }_{t}) and \zeta is a small value to prevent division by zero.

for e = 1:E do

 for i \in \{1,\ldots,M\} do

  Perform estimation: \hat {h}_{i} = \Theta [\overrightarrow {LT_{layer}}(F_{i}); \overleftarrow {LT_{layer}}(F_{i})]

  Compute the loss function \mathcal {L}(\boldsymbol {\theta }) by (16)

  Compute the gradient: \nabla _{\theta _{t}} \mathcal {L} = \frac {\partial \mathcal {L}}{\partial \theta _{t}}

  Update the parameters: \theta _{t} \leftarrow \theta _{t} - {\eta }_{t} \nabla _{\theta _{t}} \mathcal {L}

 end for

end for

Input: Test dataset \{(\Re _{e} \lbrace {\mathbf {F}}_{\mathrm {test}}\rbrace, \Im _{m} \lbrace {\mathbf {F}}_{\mathrm {test}}\rbrace), (\Re _{e} \lbrace {\mathbf {h}}_{\mathrm {test}}\rbrace, \Im _{m} \lbrace {\mathbf {h}}_{\mathrm {test}}\rbrace)\}_{i=1}^{M}

Output: Estimated channels \{\hat {\mathbf {h}}_{\mathrm {test}}\}_{i=1}^{M}, NMSE

Initialize NMSE = 0

for i \in \{1,\ldots,M\} do

 Perform estimation: \hat {\mathbf {h}}_{\mathrm {test}_{i}} = \Theta [\overrightarrow {LT_{layer}}(F_{\mathrm {test}_{i}}); \overleftarrow {LT_{layer}}(F_{\mathrm {test}_{i}})]

 Compute the NMSE for the i-th sample: {\text {NMSE}}_{i} = \frac {\|\hat {\mathbf {h}}_{\mathrm {test}_{i}} - \mathbf {h}_{\mathrm {test}_{i}}\|^{2}}{\|\mathbf {h}_{\mathrm {test}_{i}}\|^{2}}

end for

We use the NMSE to quantify the deviation between the actual channel \mathbf{h} and the predicted channel matrix \hat {\mathbf {h}}. The NMSE is formulated as\begin{align*} \textsf {NMSE} = \mathbb {E}\left [{{\frac {\|{\hat {\mathbf {h}}}-{\mathbf {h}}\|^{2}}{\|{\mathbf {h}}\|^{2}}}}\right ], \quad \textsf {NMSE}_{\mathrm {dB}}=10 \log _{10} \left \lbrace {{\mathbb {E} \left [{{ \frac {\|\hat {\mathbf {h}} - {\mathbf {h}}\|^{2}}{\|\mathbf {h}\|^{2}} }}\right ] }}\right \rbrace, \tag {19}\end{align*} where \mathbb {E} represents the expectation operator, the term \|\hat {\mathbf {h}} - {\mathbf {h}}\|^{2} is the squared error between the true channel \mathbf{h} and the estimate \hat {\mathbf {h}}, and the base-10 logarithm expresses the quantity in decibels (dB).
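The NMSE in (19) can be computed over a test set as in the sketch below (per-sample ratios are averaged before taking the logarithm, matching (19)):

```python
import numpy as np

def nmse_db(h_hat, h):
    """NMSE in dB, eq. (19): 10*log10( mean_i ||h_hat_i - h_i||^2 / ||h_i||^2 ).
    h_hat, h: arrays with the sample index on axis 0."""
    axes = tuple(range(1, h.ndim))
    num = np.sum(np.abs(h_hat - h) ** 2, axis=axes)
    den = np.sum(np.abs(h) ** 2, axis=axes)
    return 10 * np.log10(np.mean(num / den))
```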

All simulations maintain the same network topology and training settings, except for variations in input and output dimensions, ensuring equitable evaluation. This study conducts simulations in the MATLAB R2022b environment, utilizing a 12th Gen Intel(R) Core(TM) i7-12700 2.10 GHz CPU and an Nvidia RTX 3060 graphics processing unit.

SECTION V.

Simulation Results

In this section, we describe the performance of the proposed DL model for CE in massive MIMO systems employing lower-resolution ADCs, presented under different simulation parameter settings. The performance of the proposed model is compared with other studies, and our solution surpasses the alternative methods due to the enhanced learning capability and sequence prediction offered by the BiLSTM model. We evaluate the simulation outcomes using BiLSTM-based CE and adopt the massive MIMO system of Section IV.

A. Impact of SNRs

To evaluate the model performance across different SNR ranges during training, we conduct simulations of the SNR per antenna against the number of antennas G. Additionally, we consider different pilot lengths N, specifically 2, 5, and 10, to distinguish their impact on performance, as depicted in Fig. 3(a), (b), and (c), respectively. In Fig. 3(a), an SNR range of 0 to 30 dB is analyzed for the received matrices. It illustrates that with N = 2, the SNR per antenna gradually increases across the SNR range of 0 to 30 dB, except for a slight drop at lower values of G when the SNR is 0 dB. The principal cause of the drop (for N = 2) is that the improvement in the overall SNR does not keep up with the growth rate of G. Fig. 3(b) and (c) further demonstrate that for N = 5 and N = 10 pilot signals, the SNR per antenna steadily increases with the number of antennas G, without any drop in accuracy at an SNR of 0 dB. This leads to the conclusion that as more pilots are transmitted or more antennas are employed, the mapping from quantized data to channels becomes more likely to be bijective.

FIGURE 3. Simulation results with different SNRs and pilot signals: SNR per antenna versus the number of antennas.

B. Impact of Pilots and SNRs on NMSE

To observe the impact of N = 2, 5, 10 and different SNRs (from 0 to 30 dB), we evaluate the NMSE versus the number of antennas G for the proposed model, as shown in Fig. 4(a), (b), and (c). This evaluation adds noise samples to the measurement matrices used in the proposed model's training and testing phases. From Fig. 4(a), with the pilot length N = 2, the NMSE gradually improves as the SNR increases, during both training and testing, with respect to the number of antennas G. Similar trends are observed for N = 5 and N = 10 in Fig. 4(b) and (c), where the NMSE performance is improved compared to N = 2, particularly for smaller numbers of antennas G. This yields very accurate channel predictions with very few pilots N, in contrast to classical CE methods such as expectation-maximization Gaussian-mixture generalized approximate message passing (EM-GM-GAMP) [7]. To assess the effectiveness of the proposed model, we compare CE accuracy with EM-GM-GAMP [7], MLP [41], and the standard LSTM model, as shown in Fig. 4(d). It is evident from Fig. 4 that the proposed model clearly outperforms the previous studies with very short pilot sequences N. Specifically, for the standard LSTM with pilot lengths N = 2, 5, 10, the performance remains almost the same up to G = 10 antennas; thereafter, its prediction accuracy falls behind the proposed model. Additionally, as depicted in Fig. 4, the NMSE performance of the proposed approach improves dramatically with the number of antennas G at the BS, supporting our findings from Section IV-B. This intriguing outcome supports Hypothesis 1. Furthermore, we can demonstrate that the proportion of channels requiring long pilot sequences N to be distinguishable is negligible, even though Hypothesis 1 indicates that a large number of pilots N is needed for full bijectiveness, i.e., the ability to distinguish between any two channels. For instance, 98% of the dataset's complex channels can be distinguished with just N = 5 pilots; with N = 10 pilots, this ratio rises to 99.5%. This explains why the proposed method performs well even with a small number of pilots N, demonstrating the promising potential of the model-based strategy, which requires relatively modest pilot sequences x for large MIMO networks.

FIGURE 4. Simulation results with different SNRs and pilot signals: NMSE versus the number of antennas.

C. Performance Comparison of the Proposed Model With Other Studies in Terms of Different SNRs and Pilot Lengths

Fig. 5 shows the comparative performance of the proposed model with the MLP [41], cGAN [39], LSTM-GRU [48], FBM-CENet [4], SVM [59], and CNN [60] models. From Fig. 5, it is evident that the proposed model outperforms the others across different SNR values. For instance, with N = 10 pilots and an SNR of 0 dB, the proposed model achieves an NMSE of −36 dB, while the other models range between −9 and −16 dB. As the SNR increases, the performance of the proposed model gradually improves: at an SNR of 30 dB, the proposed model nearly reaches an NMSE of −46 dB, whereas the others range between −10 and −21 dB. Moreover, the results indicate that beyond an SNR of 10 dB, the performance of the other techniques (MLP, CNN, SVM, FBM-CENet, cGAN) saturates, whereas the performance of the proposed model continues to improve significantly.

FIGURE 5. Simulation results of the proposed model and other studies: NMSE versus SNR.

Additionally, to demonstrate the proposed model's CE capability, we conduct simulations using different numbers of pilots, N = 4, 8, 16, measuring the NMSE in dB. Figure 6 illustrates the performance of the proposed model in comparison to the cGAN [39], MLP [41], LSTM-GRU [48], and CNN [60] models at N = 4, 8, 16. The simulation is performed by generating datasets at an SNR of 0 dB. The results reveal a significant degradation in the efficiency of the MLP model as N decreases. For example, at N = 4, 8, 16 pilots, the MLP achieves NMSE values of −1.88, −6.25, and −7.11 dB, respectively, whereas the proposed model achieves significantly better results of −30.88, −35.41, and −39.36 dB, respectively. The CNN achieves NMSE values of −8.33, −10.21, and −11.49 dB at N = 4, 8, 16, respectively. Notably, the cGAN and LSTM-GRU exhibit superior performance compared to the MLP and CNN. Importantly, the NMSE results of our proposed model show significantly better performance and less degradation as N decreases. Moreover, our proposed model shows improved NMSE performance as N increases, whereas the other comparative methods nearly reach saturation.

FIGURE 6. Performance comparison of the proposed model with others in terms of pilot length versus NMSE.

D. Impact of Optimization Algorithms

Choosing the best optimization method for a particular problem is challenging. To achieve the best performance for the BiLSTM-based CE with reduced pilot overhead, it is crucial to evaluate the effectiveness of various optimizers on the model and dataset at hand. In this section, we present simulated comparisons of three optimization techniques to assist in selecting the most suitable method for CE: Adam, RMSprop, and SGDm. Fig. 7 illustrates the NMSE performance of the proposed BiLSTM model using the three optimization algorithms against different numbers of antennas G, with pilots N set to 2, 5, and 10. The results indicate that, up to G = 20, all three optimizers exhibit similar NMSE performance for the different N values. However, beyond G = 20, the performance of the SGDm optimizer deteriorates. Furthermore, for N = 2, the Adam and RMSprop optimizers exhibit similar performance, whereas for N = 5 and 10, RMSprop's performance is nearly identical to Adam's across varying values of G. Considering the advantages of the Adam optimizer, this study adopts Adam for all simulations due to its superior performance in the given context.

FIGURE 7. Simulation results for the effect of the optimization algorithm on the proposed system.

E. Impact of Learning Rates

Adjusting the training parameters is a crucial aspect of the model-learning process for achieving the best predictive effectiveness. To attain optimal CE performance in this study, we train and evaluate the proposed model using different learning rates (LRs). Fig. 8 displays the NMSE performance against the number of antennas G for the proposed BiLSTM model with three LRs, 0.0001, 0.01, and 0.000001, for pilot lengths N = 2, 5, and 10. The results indicate that with N = 2, 5, and 10 and LR = 0.000001, the NMSE performance of the proposed model deteriorates after G = 10, demonstrating lower efficacy than the other LR settings. With LR = 0.01, the model exhibits better NMSE performance than with LR = 0.000001. Ultimately, with LR = 0.0001, the proposed model demonstrates the best NMSE performance across different values of N and G. Consequently, to achieve the best CE performance, we select an LR of 0.0001 for all simulation analyses.

FIGURE 8. Simulation results for the effect of the learning rate on the proposed system.

F. Impact of Minibatch Size

During model training, the minibatch (MB) size significantly influences the optimal prediction rate of the DL model. In this study, the proposed model is tuned using four distinct MB sizes: 50, 100, 500, and 900. Fig. 9 illustrates the impact of the MB size on the NMSE performance with respect to the number of antennas G. To assess the effect of the MB size on the NMSE, results are gathered using different pilot lengths N = 2, 5, and 10. The outcomes reveal that with an MB size of 50 and N = 2, 5, and 10, the NMSE performance against G shows lower accuracy than the other configurations. Conversely, for MB sizes of 100, 500, and 900 with N = 2, 5, and 10, the NMSE performance against G exhibits nearly similar efficiency with a lower degradation rate. To achieve the best CE performance, we select an MB size of 100 for our simulation results.

FIGURE 9. Performance of the minibatch size effect on the proposed system.

SECTION VI.

Conclusion

In this paper, a DL-based model called the BiLSTM is proposed to estimate the channel matrix from highly quantized received signals using one-bit ADCs in a massive MIMO system. To ensure a mapping from quantized data to channels, we determined the length and structure of the PS. Subsequently, we demonstrated that, as the number of antennas increases, fewer pilots are required. Both simulation and analytical outcomes show that estimating massive MIMO channels efficiently requires only a small number of pilots. The proposed BiLSTM enhances CE performance with limited pilot signals by training on long input sequences using a bi-directional framework. The inclusion of bi-directional (forward and backward) processing in the hidden layers of the BiLSTM enhances training ability and improves the CE of the proposed system. In the simulations, we evaluated the performance of the proposed model based on the NMSE and the SNR per antenna for different numbers of antennas. Additionally, simulations of the NMSE with different pilot lengths and SNRs were conducted. The results demonstrate that the proposed model outperforms the MLP, CNN, cGAN, LSTM-GRU, FBM-CENet, and SVM-based CE methods across various simulation scenarios. To achieve optimal outcomes, we tuned the proposed model by adjusting different learning rates, using three optimization algorithms, and varying minibatch sizes; consequently, we selected the best model settings to achieve maximum CE accuracy with lower pilot overhead. The performance of the proposed system shows promising improvements for large MIMO systems. The system model considered here is designed for discrete angle samples, and the communication happens within these angle spaces; continuous angle spaces were not considered for channel estimation, as the study primarily focused on the performance of BiLSTM-based 1-bit ADC CE with low pilot overhead. Investigating continuous angle spaces is a possible future study scope for the proposed model. In addition, this system could be applied to promising physical-layer technologies, such as reconfigurable intelligent surface-based systems.

