Introduction
Designing capacity-achieving codes that perform well over fading channels has recently been a central focus, and a persistent difficulty, of digital communication. Erdal Arikan introduced polar codes, the first error-correcting codes proven to achieve the capacity of binary-input symmetric discrete memoryless channels [1]. Polar codes are highly valued for their remarkable performance and adaptability in a range of communication contexts, making them essential in the field of error-correction coding. Another key requirement of future wireless communications is ultra-reliable and low-latency communication (URLLC) [2]. Latency can be reduced by using short packets; however, this leads to a significant reduction in channel coding gain. On the other hand, ensuring reliability typically demands more resources, such as robust channel codes with greater redundancy or retransmission techniques, which in turn increase latency. Low-rate channel codes are commonly used for transmitting data in URLLC scenarios, and low-density parity-check (LDPC) codes, turbo codes, and polar codes are often evaluated and considered for these applications. LDPC and polar codes demonstrate comparable performance at large packet sizes, whereas turbo codes outperform both [3]. However, polar codes demonstrate better performance than LDPC and turbo codes at small packet sizes [4]. Moreover, polar codes outperform LDPC and turbo codes mainly because of their simpler encoding and decoding processes [5]. This adaptability makes them suitable for a wide range of applications requiring both high throughput and low latency, including fifth-generation (5G) wireless networks, optical communication, satellite communication, and storage systems [6]. Polar codes have also been incorporated into a number of communication standards, such as 5G New Radio, demonstrating their significance and broad adoption in the telecom sector [7]. Their alignment with contemporary communication standards and feasibility for hardware implementation further highlight their practical utility. In summary, polar codes are important because they can deliver nearly optimal error-correction performance with minimal complexity, making them an attractive choice for a wide range of communication applications in both present-day and future wireless systems.
Polar codes were originally designed to achieve capacity over binary-input symmetric channels, which makes extending them to other channels an attractive goal. As mentioned in [8], one capacity-achieving polar coding method is intended for additive white Gaussian noise (AWGN) channels. In [9], polar coding was designed for the memoryless AWGN channel. A nonlinear large-kernel polar coding technique over the AWGN channel is applied in [10]. Polar codes rely on reliable channel state information (CSI) to adapt their encoding and decoding strategies effectively. These otherwise robust and efficient codes encounter notable challenges when used over fading channels, especially when the channel response varies substantially across the transmitted signal's bandwidth; in frequency-flat fading, by contrast, all frequencies experience comparable fading characteristics [11]. Fading can degrade the error-correction capability of polar codes because their construction assumes a more uniform noise distribution. Moreover, the performance of polar codes is significantly affected when they are deployed over fading channels, owing to the loss of code-rate efficiency and the complexity of adaptive coding [12]. Consequently, the recommended approaches for constructing polar codes often lead to complicated designs. For example, the authors in [13] suggested constructing the code using the estimated error probabilities of subchannels, which are created by polarizing a fading channel and then transforming it into another fading channel with multiplicative and additive Gaussian noise. For short-packet transmission systems, such approximations may not be accurate, which can result in poor packet error rate performance. The Monte Carlo approach recommended in [1] and [14] was used to determine the information set during code construction under the Rayleigh fading channel. Designing polar codes for fading channels is therefore a challenging task, and addressing these challenges often requires a careful balance of system design and adaptive algorithms.
Decoding plays a central role in realizing the strong error-correction capability and efficiency of polar codes. Polar codes rely on sophisticated decoding techniques to extract the original data from noisy channel observations; the decoder must accurately estimate the transmitted bits, which is essential for reliable communication. Important traditional algorithms such as belief propagation (BP), successive cancellation (SC), successive cancellation list (SCL), SC-flip (SCF), and SC-stack (SCS) decoding are significant approaches for polar decoding. SC decoding has low complexity, but for long polar codes it suffers from poor throughput and excessive latency [15]. Various studies have aimed to decrease the decoding latency of SC-based algorithms with minimal complexity overhead. Exploiting the recursive structure of polar codes, the methods outlined in [16] and [17] identified certain subcodes within that structure and proposed fast decoders for these subcodes that can be applied within SC decoding. To further reduce SC-based decoding delay, a generalized method for fast polar code decoding was presented in [18]. By combining Fano sequential decoding with SC decoding, the authors of [19] offered an alternative enhancement to SC decoding. SCL decoding offers better error-correction capability than SC decoding, especially in channels with high noise levels; the list decoding approach maintains a list of candidate codewords, which helps recover from errors more effectively [20], [21], [22], [23]. In [22], a simplified SCL decoder was introduced by eliminating unnecessary computations, and this simplified approach was shown to be equivalent in performance to conventional SCL algorithms. An adaptive SCL decoder was proposed in [20], and the authors demonstrated that it can reduce complexity. The BP algorithm provides significant advantages over SC-based decoding: it supports efficient parallelization, allowing high-throughput and low-latency implementations, and it inherently enables soft-in/soft-out decoding, which aids joint iterative detection and decoding. Many studies have applied BP decoding to polar codes [24], [25]; however, enhancements are required to attain the intended performance with fewer iterations. Deep learning (DL) approaches, which cast decoding as a classification problem, have been used to overcome these issues with polar channels and decoders.
DL holds significant importance across various fields due to its capability to handle complex tasks that are traditionally challenging. DL algorithms can automatically learn intricate hierarchical representations of data, enabling them to extract meaningful features from raw inputs without explicit human intervention. Moreover, DL models often achieve state-of-the-art performance, and they can handle large-scale datasets and learn from massive amounts of data to continuously improve accuracy [26], [27], [28], [29], [30]. DL has been widely employed in wireless communication to leverage these advantages and to enhance various aspects of the technology [31], [32], [33], [34], [35], [36], [37]. Notably, some recent works have focused on unsupervised DL models [38], exploring their potential in this domain.
A decoder for polar codes based on DL utilizes neural networks (NNs) to improve decoding efficiency and performance. These decoders leverage DL techniques to enhance error-correction capability and adapt to different channel conditions, potentially leading to faster and more accurate decoding than conventional methods [39]. Recently, DL methods have been applied in numerous studies on polar decoding [7], [40], [41], [42], [43], [44]. Convolutional neural networks (CNNs) are exceptionally powerful in computer vision and image processing owing to their ability to autonomously learn hierarchical features and to manage extensive datasets effectively through parameter sharing and sparse connectivity [45], [46]. Deep neural networks (DNNs), on the other hand, are versatile across a variety of tasks because of their multilayer learning, global information processing, and adaptability to a wide range of input structures, making them useful in fields where high performance depends on comprehending a larger context and capturing abstract representations [47], [48]. Another type of NN is the recurrent neural network (RNN), which uses the output from the preceding step as the input for the current step. The hidden state of an RNN, which retains information about a sequence, is its most significant feature and is also known as the memory state [49], [50]. Long short-term memory (LSTM) networks are an example of RNNs. LSTMs are adept at managing long-term dependencies in sequential data through their gated design, which lets them choose whether to retain or discard information as needed. They effectively address the vanishing-gradient issue encountered in traditional DNNs and RNNs and have proven highly effective in diverse sequence-modeling applications [51], [52]. Bidirectional LSTMs (Bi-LSTMs) are a key advancement over LSTM models, capable of capturing contextual information from both past and future parts of a sequence. This bidirectional processing enhances their ability to understand input sequences comprehensively, improving prediction and classification accuracy. By modeling intricate relationships across entire sequences, Bi-LSTMs excel in tasks that require comprehension of complex temporal data [53], [54], [55].
A deep NN decoder and a multiple-scaled BP method were proposed in [7] to improve performance, latency, and complexity. The authors of [5] described a DL technique that improves the polar BP decoder using a one-bit quantizer to achieve higher performance and faster training convergence; the proposed DL-based decoder with a one-bit quantizer outperforms traditional BP NN decoders over an AWGN channel in terms of learning efficiency and error performance in a zero-delay system model. In [56], the authors introduced modified log-likelihood ratios (LLRs) for the free-space optical (FSO) channel and applied them in a neural successive cancellation (NSC) decoder; this NSC decoder, trained specifically for high- and medium-turbulence environments, demonstrated robustness across a broad spectrum of turbulence conditions. In [57], the authors devised an RNN-based polar BP decoder that uses codebook-based weight quantization, aiming to overcome the additional memory requirements and computational challenges of DNN-based BP methods. In [58], SCL decoding was formulated as a maze-traversing game solved with deep reinforcement learning (DRL); while the game-based method has lower complexity, its frame error rate (FER) performance is comparable to that of state-of-the-art SCL decoders. A DNN-aided SCL decoder with shifted pruning was designed in [59], eliminating the need for costly transcendental function calculations. A DNN-aided adaptive dynamic SCL flip (D-SCLF) decoder was proposed in [60], where an approximation scheme was introduced to reduce computational complexity; in that work, the bit error rate (BER) performance is improved by up to 0.35 dB and the average complexity is reduced by up to 57.65%. A new artificial-intelligence-based framework utilizing a multilayer perceptron (MLP) for adaptive polar coding under SCL decoding was proposed in [61]. In [62], the authors introduced a DL-aided SCF decoding technique that uses LSTM networks to precisely identify erroneous bits over the binary symmetric channel (BSC); a two-phase training procedure combining supervised and reinforcement learning was suggested for the LSTM network, and the block error rate (BLER) was evaluated for different signal-to-noise ratios (SNRs).
Polar codes, known for their capacity-achieving properties in communication systems, benefit from Bi-LSTM models, which analyze forward and backward bit dependencies. This improves the decoding process, allowing more accurate error correction compared to traditional methods. By leveraging LLRs, Bi-LSTMs make more informed decisions, enhancing noise resilience and accuracy. This makes Bi-LSTM-based decoders particularly effective for real-time communication systems requiring URLLC, ensuring efficient handling of complex data. The structure of a Bi-LSTM model with input, forward, backward, activation, and output layers is shown in Fig. 1. In this paper, we propose a robust polar decoding technique over flat fading channels based on the Bi-LSTM model, leveraging its advantages to overcome the challenges posed by fading channels and the decoding issues mentioned earlier. We evaluate the packet error rate (PER) versus average SNR for CNN and Bi-LSTM models with similar configurations and compare the results with other learning models. The key contributions of this paper are summarized below:
We study packet transmission over frequency-flat quasi-static Rayleigh fading channels, which allow efficient resource allocation and simplify equalization, scheduling, and transmission planning techniques. Consequently, the channel coefficient changes from packet to packet but remains constant within each packet.
In this paper, we propose a Bi-LSTM network for decoding polar codes, taking advantage of its bidirectional processing to manage extended dependencies and fluctuating channel conditions, thus improving the accuracy and decoding performance of the proposed systems.
We calculate the PER for the proposed system across various SNR levels. Simulation results indicate that the proposed model achieves coding gain in fading channels. The evaluation includes different modulation orders and optimizers, where the Adam optimizer shows superior performance compared to the others. Additionally, the results confirm that the proposed Bi-LSTM model outperforms both CNN and DNN models.
Table 1 summarizes the novelty and contributions of this paper. The rest of the paper is organized as follows: Section II introduces the system model. Section III explains the polar code generation methodology. Section IV provides a detailed explanation of the proposed model, including descriptions of the offline training and online testing procedures. Section V presents the simulation results, and Section VI concludes with the findings.
System Model
A simple block diagram of the polar code transmission process is shown in Fig. 2. Using the polar encoder, information fragments are converted into codewords. To transmit the encoded bits over the physical communication channel, the codeword is first mapped onto modulation symbols. This mapping ensures compatibility with the channel's characteristics and facilitates efficient transmission of the encoded information. The modulated symbols are then transmitted through the channel, where they may encounter noise, interference, and other impairments that can introduce errors. At the receiver, the incoming signals undergo processing to retrieve the original information bits: decoding techniques are applied to estimate the transmitted codewords and deduce the information bits. Single-carrier communication is considered in this experiment, where the channel encoder encodes the j binary information bits of each packet. The instantaneous capacity of the fading channel is \begin{equation*} C = \log _{2} \left ({{ 1 + \frac {p_{t}}{\sigma ^{2}} |H|^{2} }}\right ), \tag {1}\end{equation*} where $p_{t}$ is the transmit power, $\sigma ^{2}$ is the noise variance, and $H$ is the channel coefficient.
The received signal at the output of the fading channel is \begin{equation*} \mathbf {y}=\mathbf {H} \cdot \mathbf {x}+\mathbf {s}, \tag {2}\end{equation*} where $\mathbf {x}$ is the vector of transmitted symbols and $\mathbf {s}$ is the additive white Gaussian noise vector.
With the channel coefficient known at the receiver, the received signal is equalized as \begin{equation*}\hat {\mathbf {y}} = \frac {H^{*}}{|H|^{2}} \cdot \mathbf {y}, \tag {3}\end{equation*} where $(\cdot )^{*}$ denotes complex conjugation.
LLRs are utilized in polar codes to incorporate soft information into the decoding process, thereby improving error-correction performance, especially in challenging channel conditions where hard decisions alone may be insufficient. They quantify the reliability of each bit position based on the likelihood of the received signal under both possible bit values, calculated using a combination of a priori bit probabilities and received channel values [64]. Under the Gaussian assumption on the received signal, the LLR of each received value is \begin{equation*} \mathrm {LLR}(\hat {y}) = \ln \frac {P(y=0 \mid \hat {y})}{P(y=1 \mid \hat {y})} = \frac {2}{\sigma ^{2}}\, \mathrm {Re}(\hat {y}), \tag {4}\end{equation*} where $\sigma ^{2}$ denotes the noise variance.
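To make the signal model concrete, the following minimal NumPy sketch simulates one packet through the steps of (2)-(4): BPSK mapping, quasi-static Rayleigh fading, equalization with the known channel coefficient, and LLR computation. The variable names, packet length, and SNR value are illustrative assumptions and are not taken from the paper's implementation.

```python
# Minimal sketch of the system model in (2)-(4) for one packet.
import numpy as np

rng = np.random.default_rng(0)

N = 8                                   # code length (one packet), example value
snr_db = 0.0                            # per-symbol SNR in dB
sigma2 = 10 ** (-snr_db / 10)           # noise variance for unit-energy symbols

c = rng.integers(0, 2, size=N)          # coded bits
x = 1.0 - 2.0 * c                       # BPSK mapping: 0 -> +1, 1 -> -1

# One complex Rayleigh coefficient per packet (quasi-static flat fading), eq. (2).
h = rng.normal(scale=np.sqrt(0.5)) + 1j * rng.normal(scale=np.sqrt(0.5))
s = np.sqrt(sigma2 / 2) * (rng.normal(size=N) + 1j * rng.normal(size=N))
y = h * x + s

# Equalization with the known channel coefficient, eq. (3).
y_hat = np.conj(h) / np.abs(h) ** 2 * y

# LLRs under the Gaussian assumption, eq. (4).
llr = 2.0 / sigma2 * np.real(y_hat)
print(np.round(llr, 2))
```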
Polar Codes
In this experiment, rather than constructing codes specifically for fading channels, we use the straightforward code construction optimized for the AWGN channel [65] and extract the Bhattacharyya parameters [66] of its synthetic channels. To derive the codeword $\mathbf {d}$, the input vector $\mathbf {u}$ is multiplied by the $n$-fold Kronecker power of the kernel matrix $\mathbf {F}$, i.e., by the generator matrix \begin{equation*} \mathbf {G}_{N} = \mathbf {F}^{\otimes n}, \tag {5}\end{equation*}
In this equation, the basic encoding matrix is $\mathbf {F} = \bigl [{\begin{smallmatrix} 1 & 0 \\ 1 & 1 \end{smallmatrix}}\bigr ]$ and $n = \log _{2} N$, so that $\mathbf {d} = \mathbf {u}\,\mathbf {G}_{N}$ over GF(2).
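As a brief illustration of (5), the Python sketch below builds $\mathbf{G}_{N}$ by repeated Kronecker products, ranks the synthetic channels with a common Bhattacharyya-parameter recursion at a 0 dB design SNR (a heuristic stand-in for the AWGN-optimized construction of [65], not the paper's exact procedure), and encodes a short information block. The function names and the initialization of the Bhattacharyya parameter are assumptions of this sketch.

```python
# Sketch of polar construction and encoding with G_N = F^{(x) n}, natural bit order.
import numpy as np

def bhattacharyya_construction(N, design_snr_db=0.0):
    """Rank bit channels by Bhattacharyya parameters (smaller = more reliable)."""
    n = int(np.log2(N))
    z = np.full(1, np.exp(-10 ** (design_snr_db / 10)))  # heuristic Z for BPSK-AWGN
    for _ in range(n):
        z_upper = 2 * z - z ** 2        # degraded ("minus") channels
        z_lower = z ** 2                # upgraded ("plus") channels
        z = np.concatenate([z_upper, z_lower])
    return np.argsort(z)                # indices from most to least reliable

def polar_encode(u_info, N, info_set):
    """Place information bits on reliable positions, then multiply by G_N over GF(2)."""
    u = np.zeros(N, dtype=int)
    u[info_set] = u_info
    F = np.array([[1, 0], [1, 1]])
    G = np.array([[1]])
    for _ in range(int(np.log2(N))):
        G = np.kron(G, F)               # eq. (5)
    return u @ G % 2

N, j = 8, 4                             # example code length and information bits
info_set = np.sort(bhattacharyya_construction(N)[:j])
d = polar_encode(np.array([1, 0, 1, 1]), N, info_set)
print(info_set, d)
```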
Proposed Bi-Directional Learning Framework
The structure of the proposed DL framework is depicted in Fig. 3. Our proposed DL model comprises three Lambda layers without trainable parameters, three Bi-LSTM layers, and an output layer. The first Lambda layer serves as the mapping layer, with both input and output features of dimension N. The second Lambda layer acts as the fading layer, receiving an N-dimensional input from the mapping layer and providing an output of the same dimension. The last Lambda layer functions as the LLR layer, whose operation is given in (4). The final functional unit is the decoder, which comprises three Bi-LSTM layers followed by one dense layer with a sigmoid activation function; in this experiment, the dense layer serves as the output layer. The structure of the proposed Bi-LSTM-based decoder is shown in Fig. 4.
The output of the LLR layer, with shape (N,), serves as the input to the proposed decoder. This input is first reshaped by a reshape layer to match the input requirements of the Bi-LSTM layer, using the following function:\begin{equation*} \hat {\mathbf {x}}= \mathrm {reshape}((1, N))(\hat {\mathbf {y}}). \tag {6}\end{equation*}
The reshaped data is fed to the Bi-LSTM model, which includes two LSTM layers (a forward layer and a backward layer) in its hidden layer, as illustrated in Fig. 1. Each LSTM cell within these layers consists of three gates: the input gate, the output gate, and the forget gate. The internal cell structure of an LSTM layer is depicted in Fig. 5. Additionally, an LSTM has two primary states: the cell state and the hidden state.
Internal Operations of Bi-LSTM Model
Start
The function of the forget gate: $\mathbf {f}_{t} = \sigma (\mathbf {W}_{f}\mathbf {x}_{t} + \mathbf {U}_{f}\mathbf {h}_{t-1} + \mathbf {b}_{f})$.
The function of the input gate: $\mathbf {i}_{t} = \sigma (\mathbf {W}_{i}\mathbf {x}_{t} + \mathbf {U}_{i}\mathbf {h}_{t-1} + \mathbf {b}_{i})$.
The function of the output gate [67]: $\mathbf {o}_{t} = \sigma (\mathbf {W}_{o}\mathbf {x}_{t} + \mathbf {U}_{o}\mathbf {h}_{t-1} + \mathbf {b}_{o})$.
The function of the candidate gate: $\tilde {\mathbf {C}}_{t} = \tanh (\mathbf {W}_{c}\mathbf {x}_{t} + \mathbf {U}_{c}\mathbf {h}_{t-1} + \mathbf {b}_{c})$.
Updating the previous cell state: $\mathbf {C}_{t} = \mathbf {f}_{t} \odot \mathbf {C}_{t-1} + \mathbf {i}_{t} \odot \tilde {\mathbf {C}}_{t}$.
At any time step $t$, the function of the hidden state: $\mathbf {h}_{t} = \mathbf {o}_{t} \odot \tanh (\mathbf {C}_{t})$.
The operation of the Bi-LSTM layer's output hidden state [67]: $\mathbf {H}_{t} = [\overrightarrow {\mathbf {h}}_{t};\, \overleftarrow {\mathbf {h}}_{t}]$.
Stop
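For concreteness, the following NumPy fragment executes one LSTM time step according to the gate equations above and concatenates forward and backward hidden states into the Bi-LSTM output $\mathbf{H}_{t}$. The weight names and dimensions are generic placeholders for illustration; a Keras Bi-LSTM organizes its parameters differently, so this is only a conceptual sketch.

```python
# One LSTM step following the gate equations, plus the Bi-LSTM concatenation.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """Single LSTM time step; W, U, b hold the parameters of the four gates."""
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])      # forget gate
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])      # input gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])      # output gate
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidate state
    c_t = f_t * c_prev + i_t * c_tilde                          # cell-state update
    h_t = o_t * np.tanh(c_t)                                    # hidden state
    return h_t, c_t

rng = np.random.default_rng(1)
d_in, d_hid = 4, 3                       # illustrative dimensions
W = {k: rng.normal(size=(d_hid, d_in)) for k in "fioc"}
U = {k: rng.normal(size=(d_hid, d_hid)) for k in "fioc"}
b = {k: np.zeros(d_hid) for k in "fioc"}

x_t = rng.normal(size=d_in)
h_fwd, _ = lstm_step(x_t, np.zeros(d_hid), np.zeros(d_hid), W, U, b)
h_bwd, _ = lstm_step(x_t, np.zeros(d_hid), np.zeros(d_hid), W, U, b)
H_t = np.concatenate([h_fwd, h_bwd])     # Bi-LSTM output: forward and backward states
print(H_t.shape)
```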
The output hidden state $\mathbf {H}_{t}$ of the final Bi-LSTM layer is passed to the dense output layer, which applies a sigmoid activation to produce the estimated information bits:\begin{equation*}\hat {\mathbf {u}}= \mathbf {f}_{\mathrm {Sigmoid}}({\mathbf {Q}}_{s}\times {\mathbf {H}_{t}}+ \mathbf {b}_{s}), \tag {7}\end{equation*} where $\mathbf {Q}_{s}$ and $\mathbf {b}_{s}$ denote the weight matrix and bias of the output layer.
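A possible Keras realization of this decoder structure is sketched below: a reshape of the N LLRs as in (6), three stacked Bi-LSTM layers, and a sigmoid dense layer producing the j estimated information bits as in (7). The number of LSTM units is an illustrative assumption, since the exact layer sizes are not restated here.

```python
# Keras sketch of the described Bi-LSTM decoder (layer sizes are assumptions).
from tensorflow.keras import layers, models

N, j = 8, 4   # code length and number of information bits (example values)

def build_bilstm_decoder(units=128):
    llr_in = layers.Input(shape=(N,), name="llr")
    x = layers.Reshape((1, N))(llr_in)                            # eq. (6)
    x = layers.Bidirectional(layers.LSTM(units, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(units, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(units))(x)               # final H_t
    u_hat = layers.Dense(j, activation="sigmoid", name="u_hat")(x)  # eq. (7)
    return models.Model(llr_in, u_hat)

decoder = build_bilstm_decoder()
decoder.summary()
```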
We also calculate the PER value for the CNN model. This CNN model is designed with three one-dimensional convolutional (Conv1D) layers, each followed by a batch normalization layer. After these layers, we utilize a one-dimensional global max pooling layer (GlobalMaxPool1D). Finally, we add a dense layer with a sigmoid activation function as the output layer.
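For comparison, a corresponding Keras sketch of this CNN baseline is given below; the filter counts and kernel sizes are assumptions chosen only to illustrate the three Conv1D/batch-normalization blocks, the GlobalMaxPool1D layer, and the sigmoid output layer.

```python
# Keras sketch of the CNN baseline (filter counts and kernel sizes are assumptions).
from tensorflow.keras import layers, models

def build_cnn_decoder(N=8, j=4):
    llr_in = layers.Input(shape=(N, 1))                # LLRs with a channel dimension
    x = llr_in
    for filters in (64, 128, 256):                     # three Conv1D + BatchNorm blocks
        x = layers.Conv1D(filters, kernel_size=3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
    x = layers.GlobalMaxPool1D()(x)
    u_hat = layers.Dense(j, activation="sigmoid")(x)   # output layer
    return models.Model(llr_in, u_hat)

cnn_decoder = build_cnn_decoder()
cnn_decoder.summary()
```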
A. Training and Testing Process
Before deploying the Bi-LSTM decoder, it is necessary to train the model offline using simulated data. In the proposed supervised learning approach, a training set comprising known input-output pairs is essential. The polar encoder transforms the input information bits into codewords. After encoding, the codewords pass through the mapping, fading, and LLR layers described above, and the resulting LLR vectors, paired with the corresponding information bits, form the training examples for the decoder.
We train the proposed Bi-LSTM decoder in epochs, where each epoch represents a complete cycle of training iterations. The Bi-LSTM decoder is trained over multiple epochs across various experimental configurations. During an epoch, the network undergoes backpropagation to compute the gradients of the loss function with respect to its parameters. This iterative process allows the Bi-LSTM decoder to learn gradually how to map inputs to the desired outputs based on the training data.
For optimal weight and bias values, the loss function should be convex; this property ensures that gradient descent can effectively find the global minimum of the loss function, leading to improved convergence and performance during training. We use the Adam optimizer to compute and apply the gradient of the loss function; this optimizer is widely supported on commercial DL platforms such as TensorFlow and Keras [68]. The Bi-LSTM model is trained on the collected data to reduce the difference between the true and predicted bits, and thereby the PER. In this study, we utilize the mean squared error (MSE) function to compute the training loss as follows:\begin{equation*}\mathcal {L}({\hat {\mathbf {y}}},{\hat {\mathbf {u}}};\theta )= \frac {1}{N} ||{\hat {\mathbf {y}}}-{\hat {\mathbf {u}}}||^{2}, \tag {8}\end{equation*} where $\theta$ denotes the trainable parameters of the network.
The parameters are then updated by gradient descent, \begin{equation*} \theta ^{+} \mathrel {\mathrel {\mathop :}\hspace {-0.0672em}=}\theta -\eta \nabla \mathcal {L}\left ({{\hat {\mathbf {y}},\hat {\mathbf {u}};\theta }}\right ), \tag {9}\end{equation*} where $\eta$ is the learning rate.
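The offline training procedure of (8)-(9) can be sketched as follows, assuming the encoder helpers and the decoder model from the earlier sketches are in scope; the data-generation routine, batch size, and epoch count shown here are illustrative and do not reproduce the exact settings of Table 3.

```python
# Sketch of offline training with Adam and an MSE loss, cf. (8)-(9).
# Assumes polar_encode, bhattacharyya_construction, and decoder from the earlier sketches.
import numpy as np

def generate_batch(batch_size, N, j, info_set, snr_db=0.0, rng=np.random.default_rng(2)):
    """Simulate (LLR, information-bit) training pairs over the flat fading channel."""
    sigma2 = 10 ** (-snr_db / 10)
    U = rng.integers(0, 2, size=(batch_size, j))
    X, Y = [], U.astype("float32")
    for u_info in U:
        d = polar_encode(u_info, N, info_set)                    # codeword
        x = 1.0 - 2.0 * d                                        # BPSK mapping
        h = rng.normal(scale=np.sqrt(0.5)) + 1j * rng.normal(scale=np.sqrt(0.5))
        s = np.sqrt(sigma2 / 2) * (rng.normal(size=N) + 1j * rng.normal(size=N))
        y_hat = np.conj(h) / np.abs(h) ** 2 * (h * x + s)        # equalization, eq. (3)
        X.append(2.0 / sigma2 * np.real(y_hat))                  # LLRs, eq. (4)
    return np.array(X, dtype="float32"), Y

info_set = np.sort(bhattacharyya_construction(8)[:4])
decoder.compile(optimizer="adam", loss="mse")                    # eqs. (8)-(9)
X_train, Y_train = generate_batch(4096, 8, 4, info_set, snr_db=0.0)
decoder.fit(X_train, Y_train, epochs=10, batch_size=256, validation_split=0.1)
```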
Table 3 outlines the specific parameters used for training and testing the proposed model. The model is trained with varying numbers of information bits j and code lengths N. The training and validation losses of the proposed model are shown in Fig. 6.
Training and validation loss for the proposed model.
Simulation Results
Reliability is characterized by the probability of successfully transmitting n packets within the specified user-plane latency under specific channel conditions. In this study, we employ the PER as the metric to evaluate and compare the effectiveness of various parameter adjustments in terms of reliability. The construction of polar codes follows the method described in [1], using an AWGN design with the design SNR set to 0 dB.
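The PER metric used in the remainder of this section can be computed as in the short sketch below, where a packet is counted as erroneous if any of its decoded bits differs from the transmitted information bits; the function name and the 0.5 threshold on the sigmoid outputs are illustrative assumptions.

```python
# Sketch of the PER metric: fraction of packets with at least one bit error.
import numpy as np

def packet_error_rate(u_true, u_pred):
    """u_true, u_pred: arrays of shape (num_packets, j); u_pred holds sigmoid outputs."""
    hard = (np.asarray(u_pred) > 0.5).astype(int)          # hard decisions on soft outputs
    packet_errors = np.any(hard != np.asarray(u_true), axis=1)
    return packet_errors.mean()

# Example: 3 packets of 4 bits, one packet contains a bit error -> PER = 1/3.
print(packet_error_rate([[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]],
                        [[0.9, 0.1, 0.8, 0.7], [0.2, 0.1, 0.9, 0.6], [0.8, 0.9, 0.1, 0.2]]))
```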
In DL, selecting the appropriate batch size is essential to strike a balance between computational efficiency and model convergence. Larger batch sizes exploit parallelism to speed up training, but generalization can suffer as a result. Smaller batches yield more gradient updates per pass over the data, which may improve generalization at the expense of longer training times. Determining the ideal batch size therefore involves trade-offs that significantly impact both the model's overall performance and its training speed [69]. In this experiment, we calculate the PER for different batch sizes and compare them, as shown in Fig. 7. This result is obtained with $2^{16}$ training epochs, and a batch size of 256 yields the best PER performance.
Comparative PER performance of the proposed Bi-LSTM decoder for various batch sizes.
The learning rate is an important hyperparameter for a DL model, since it determines the gradient-descent step size and thus affects the convergence and performance of the model. A carefully selected learning rate strikes a balance between training speed and model accuracy, enabling quicker convergence to optimal solutions [69]. The PER performance for different learning rates is depicted in Fig. 8. We evaluated the PER for learning rates of 0.005, 0.001, 0.0005, and 0.0001. The model exhibits the best PER performance when trained with a learning rate of 0.0005; deviating from this rate, either higher or lower, degrades performance. The evaluation indicates nearly 1 dB better performance at a learning rate of 0.0005 compared to 0.0001 and 0.001, and approximately 1.8 dB better performance compared to the 0.005 learning rate.
Comparative PER performance of the proposed Bi-LSTM decoder for different learning rates.
Another critical factor for signal-processing tasks is the training SNR, which has a direct impact on the way models learn and generalize from data. Models trained at suitable SNR values can discriminate between signal and noise, which improves training convergence and performance on unseen data. The PER performance of the proposed Bi-LSTM decoder for various training SNRs is illustrated in Fig. 9. This analysis considers $2^{16}$ epochs, with j = 4 and N = 8. As the training SNR increases from 0 dB to 10 dB, the performance decreases; performance also drops when training with SNRs lower than 0 dB. At a training SNR of -2 dB, the Bi-LSTM model shows about a 0.4 dB loss compared to 0 dB, and more than a 2 dB loss at a training SNR of 20 dB. Therefore, Fig. 9 shows that the model performs best at a 0 dB training SNR, and hence we use a 0 dB training SNR in all our experimental setups.
Comparison of PER performance for the proposed model across different training SNRs.
The DL optimizer plays a key role in effectively tuning model parameters during training, which affects generalization across different tasks and datasets as well as convergence speed and stability. Selecting the right optimizer is crucial for maximizing model performance and achieving the learning objectives [68]. Fig. 10 illustrates the PER results for different optimization algorithms: Adam, RMSProp, Adamax, and SGD. Among them, the SGD optimizer exhibits poor performance, while Adam, Adamax, and RMSProp perform well; the Adam optimizer performs best overall. Therefore, we use the Adam optimizer for all PER results reported in this paper.
Evaluation of the PER performance of the Bi-LSTM-based decoder using various optimizers.
The PER performance of the proposed model is illustrated in Fig. 11 for different numbers of epochs. We began evaluating the model with $2^{4}$ epochs, where performance was notably poor. As we increased the number of epochs, performance improved, and the figure shows that the proposed model performs very well at $2^{16}$ epochs. However, when we further increased the number of epochs to $2^{18}$, the model's performance decreased. Therefore, Fig. 11 demonstrates that the Bi-LSTM decoder performs best at $2^{16}$ epochs.
PER performance comparison of the Bi-LSTM decoder across varying numbers of epochs.
In Fig. 12, we compare the PER performance of the proposed Bi-LSTM model with the CNN and DNN [63] models. The results for all models are obtained using the same setup: $2^{16}$ epochs, a learning rate of 0.0005, a training SNR of 0 dB, and a batch size of 256. The results show that the Bi-LSTM decoder outperforms both the DNN [63] and CNN models. Initially, the DNN model significantly outperforms the CNN model; however, beyond 10 dB SNR, the PER performance of CNN and DNN becomes almost identical. The comparative results demonstrate that the Bi-LSTM model outperforms the CNN and DNN models by approximately 1 dB and 0.8 dB, respectively, at 20 dB SNR.
PER performance evaluation of the Bi-LSTM-based decoder compared with other decoders.
In Fig. 13, we conduct a comparative analysis of the Bi-LSTM and DNN [63] models across various numbers of epochs, keeping all other parameters unchanged.
Comparison of PER performance between the Bi-LSTM model and the DNN model, along with theoretical results.
Fig. 14 compares the PER performance of the Bi-LSTM, DNN [63], and CNN models.
Evaluation of the PER performance of the Bi-LSTM-based decoder against other decoders.
With the exception of Fig. 15, all results presented in this paper are obtained from simulations using BPSK modulation. In Fig. 15, we extend the evaluation of the proposed model to a higher-order modulation, namely 4QAM, to assess its performance. In 4QAM, two bits are mapped to each symbol, while higher-order QAM schemes such as 16QAM and 64QAM map four and six bits to each symbol, respectively; this increases spectral efficiency but also increases noise sensitivity [71], [72]. For this experiment, we adopt the same configuration as in the preceding evaluations.
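For reference, the short sketch below shows a Gray-mapped 4QAM (QPSK) constellation in which two coded bits are mapped to one unit-energy complex symbol; the exact bit-to-symbol mapping used in the paper is not restated here, so this mapping is only an illustrative assumption.

```python
# Sketch of Gray-mapped 4QAM (QPSK): two coded bits per unit-energy symbol.
import numpy as np

def qam4_modulate(bits):
    """Map pairs of bits to unit-energy QPSK symbols (Gray mapping, assumed)."""
    bits = np.asarray(bits).reshape(-1, 2)
    i = 1.0 - 2.0 * bits[:, 0]            # in-phase component from the first bit
    q = 1.0 - 2.0 * bits[:, 1]            # quadrature component from the second bit
    return (i + 1j * q) / np.sqrt(2.0)

print(qam4_modulate([0, 0, 0, 1, 1, 0, 1, 1]))
```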
Performance of the proposed model with 4QAM modulation.
Conclusion
This study proposes a Bi-LSTM decoder for short-packet transmission over a flat fading channel. We evaluated the PER of the system across different SNR levels, and the simulation results demonstrate that the proposed technique achieves coding gains in fading channels, even when using a basic codebook designed for AWGN channels without fading. The best performance was observed with a learning rate of 0.0005, a training SNR of 0 dB, and a batch size of 256. In general, improvements were achieved by increasing the number of training epochs; in particular, the Bi-LSTM decoder already showed significant coding gains when trained for $2^{8}$ epochs. We also compared several optimizers and found that the Adam optimizer provided the best performance. Although the proposed system performed well with both BPSK and 4QAM modulation, 4QAM lost its consistency at higher SNR values; hence, BPSK was chosen as the focus for comparison. The results further demonstrate that the Bi-LSTM decoder outperforms CNN and DNN models under the same conditions.