
Bidirectional Deep Learning Decoder for Polar Codes in Flat Fading Channels




Abstract:

One of the main challenges facing future wireless communications is ultra-reliable and low-latency communication. Polar codes are well-suited for such applications, and recent advancements in deep learning have shown promising results in enhancing polar code decoding performance. We propose a robust decoder based on a bidirectional long short-term memory (Bi-LSTM) network, which processes sequences in both forward and backward directions simultaneously. This approach leverages the strengths of bidirectional recurrent neural networks to improve the decoding of polar-coded short packets. Our study focuses on packet transmission over frequency-flat quasi-static Rayleigh fading channels, using a simple codebook originally designed for additive white Gaussian noise channels. We evaluate the packet error rate for various signal-to-noise ratio levels using different modulation schemes. The simulation results demonstrate that the proposed Bi-LSTM-based decoder closely approaches the theoretical outage performance and achieves significant coding gains in fading channels. Furthermore, the proposed decoder outperforms convolutional neural network and deep neural network-based decoders, validating its superiority in decoding polar codes for short packet transmission in challenging wireless environments.
Published in: IEEE Access ( Volume: 12)
Page(s): 149580 - 149592
Date of Publication: 09 October 2024
Electronic ISSN: 2169-3536

SECTION I.

Introduction

Recently, designing capacity-achieving codes that provide high performance over fading channels has been a central focus and challenge of digital communication. To achieve channel capacity for discrete memoryless channels with symmetric binary input, Erdal Arikan presented polar codes, the first error-correcting codes proven to do so [1]. Polar codes are highly valued for their remarkable performance and adaptability in a range of communication contexts, making them essential in the field of error-correction coding. Another important requirement of future wireless communications is ultra-reliable and low-latency communication (URLLC) [2]. Latency can be minimized by using short packets; however, this leads to a significant reduction in channel coding gain. On the other hand, ensuring reliability typically demands more resources, such as employing robust channel codes with greater redundancy or incorporating retransmission techniques, which in turn increase latency. Low-rate channel codes are commonly used for transmitting data in URLLC scenarios. Low-density parity-check (LDPC) codes, turbo codes, and polar codes are often evaluated and considered for these applications. LDPC and polar codes demonstrate comparable performance at large packet sizes, whereas turbo codes outperform both [3]. However, polar codes demonstrate better performance than LDPC and turbo codes at small packet sizes [4]. Moreover, polar codes outperform LDPC and turbo codes mainly due to their simpler encoding and decoding processes [5]. This adaptability makes them suitable for a wide range of applications requiring both high throughput and low latency, including fifth-generation (5G) wireless networks, optical communication, satellite communication, and storage systems [6]. Polar codes have also been incorporated into a number of communication protocols, such as 5G New Radio, demonstrating their significance and broad adoption in the telecom sector [7]. Their alignment with contemporary communication standards and feasibility for hardware implementation further highlight their practical utility. In summary, polar codes are important because they can deliver nearly optimal error-correction performance with minimal complexity, making them an attractive choice for a wide range of communication applications in both present-day and future wireless systems.

Polar codes were originally designed to achieve capacity over binary-input symmetric channels, which makes them attractive for coding over a variety of channels. As mentioned in [8], one capacity-achieving polar coding method is intended for additive white Gaussian noise (AWGN) channels. In [9], the author designed polar coding for the memoryless AWGN channel. A nonlinear large-kernel polar coding technique over the AWGN channel is applied in [10]. Polar codes rely on reliable channel state information (CSI) to adapt their encoding and decoding strategies effectively. These robust and efficient codes encounter notable challenges when used over fading channels, especially when the channel response varies substantially across the transmitted signal’s bandwidth; in flat fading, by contrast, all frequencies experience comparable fading characteristics [11]. Fading can degrade the error-correction capabilities of polar codes since they assume a more uniform noise distribution. Moreover, the performance of polar codes is significantly affected when deployed over fading channels due to the loss of code-rate efficiency and the complexity of adaptive coding [12]. Consequently, the recommended approaches for constructing polar codes often lead to complicated designs. For example, the author in [13] suggested constructing the code using the estimated error probabilities of subchannels, which are created by polarizing a fading channel and then transforming it into another fading channel with multiplicative and additive Gaussian noise. For short-packet transmission systems, such approximations may not be accurate, which can result in poor packet error rate performance. The Monte Carlo approach recommended in [1] and [14] was used to determine the information set during code construction under the Rayleigh fading channel. Designing polar codes for a fading channel is therefore a challenging task, and addressing these challenges often requires a careful balance of system design and adaptive algorithms.

Decoding plays an important role in realizing the strong error-correction capability and efficiency of polar codes. Polar codes use sophisticated decoding techniques to extract the original data from noisy channels. Decoding involves complex calculations to accurately estimate the transmitted bits, which is essential for ensuring reliable communication. Important traditional algorithms such as belief propagation (BP), successive cancellation (SC), successive cancellation list (SCL), SC-Flip (SCF), and SC-stack (SCS) are significant approaches for polar decoding. SC decoding has low complexity, but when dealing with long polar codes, it suffers from poor throughput and excessive latency [15]. Various studies have aimed to decrease the decoding latency of SC-based algorithms with minimal complexity overhead. Exploiting the recursive construction of polar codes, the methods outlined in [16] and [17] identified certain subcodes within their structure and proposed fast decoders for these subcodes that can be applied to SC decoding. To further minimize SC-based decoding delay, the author presented a generalized method for fast polar code decoding in [18]. By combining Fano sequential decoding with SC decoding, the author of [19] offered an alternative enhancement to SC decoding. SCL decoding offers better error-correction capability compared to SC decoding, especially in channels with high noise levels. The list decoding approach maintains a list of candidate codewords, which helps in recovering from errors more effectively [20], [21], [22], [23]. In [22], a simplified SCL decoder was introduced by eliminating unnecessary computations, and this simplified approach was shown to be equivalent in performance to conventional SCL algorithms. An adaptive SCL decoder was proposed in [20], and the author demonstrated that it can reduce complexity. The BP algorithm provides significant advantages over SC-based decoding: it supports efficient parallelization, allowing high-throughput and low-latency implementations, and it inherently enables soft-in/soft-out decoding, which aids joint iterative detection and decoding. Many studies have applied BP decoding to polar codes [24], [25]. However, enhancements are required to attain the intended performance with fewer iterations. Deep learning (DL) approaches have been used to overcome the remaining issues of polar channels and decoders, since decoding can be framed as a classification problem.

DL holds significant importance across various fields due to its capability to handle complex tasks that are traditionally challenging. DL algorithms can automatically learn intricate hierarchical representations of data, enabling them to extract meaningful features from raw inputs without explicit human intervention. Moreover, DL models often achieve state-of-the-art performance in many tasks, and they can handle large-scale datasets and learn from massive amounts of data to continuously improve accuracy [26], [27], [28], [29], [30]. DL has been widely employed in wireless communication to leverage its advantages and to enhance various aspects of the technology [31], [32], [33], [34], [35], [36], [37]. Notably, some recent works have focused on unsupervised DL models [38], exploring their potential in this domain.

A decoder for polar codes based on DL utilizes neural networks (NNs) to improve decoding efficiency and performance. These decoders leverage DL techniques to enhance error-correction capabilities and adapt to different channel conditions, potentially leading to faster and more accurate decoding compared to conventional methods [39]. Recently, DL methods have been applied in numerous studies for polar decoding [7], [40], [41], [42], [43], [44]. Convolutional neural networks (CNNs) are exceptionally powerful in computer vision and image processing due to their ability to autonomously acquire hierarchical features and to manage extensive datasets effectively through parameter sharing and sparse connectivity [45], [46]. On the other hand, deep neural networks (DNNs) are versatile for a variety of tasks because of their multilayer learning capabilities, global information processing, and adaptability to a wide range of input structures. This makes them useful in fields where obtaining high-performance results depends on comprehending a larger context and capturing abstract representations [47], [48]. Another type of NN is the recurrent neural network (RNN), which uses the output from the preceding step as the input for the current step. The hidden state of an RNN, which retains some information about a sequence, is its primary and most significant feature and is also known as the memory state [49], [50]. Long short-term memory (LSTM) is an example of an RNN. LSTMs are adept at managing long-term relationships in sequential data through their gated design, enabling them to choose whether to retain or discard information as needed. They effectively address the vanishing-gradient issue encountered in traditional DNNs and RNNs and have proven highly effective in diverse sequence-modeling applications, showcasing superior performance [51], [52]. Bidirectional LSTMs (Bi-LSTMs) are a key advancement over LSTM models, capable of capturing contextual information from both past and future parts of a sequence. This bidirectional processing enhances their ability to understand input sequences comprehensively, improving prediction and classification accuracy. By modeling intricate relationships across entire sequences, Bi-LSTMs excel in tasks that require comprehension of complex temporal data [53], [54], [55].

A deep NN decoder and a multiple-scaled BP method were proposed in [7] to improve performance while reducing latency and complexity. The authors of [5] described a DL technique that improves the performance of the polar BP decoder using a one-bit quantizer, achieving higher performance and faster training convergence. The suggested DL-based decoder with a one-bit quantizer outperforms traditional BP NN decoders operating over an AWGN channel in terms of learning efficiency and error performance in a zero-delay system model. In [56], the authors introduced modified log-likelihood ratios (LLRs) for the free-space optical (FSO) channel, illustrating them as an instance of the neural successive cancellation (NSC) decoder. This NSC decoder demonstrated robustness across a broad spectrum of turbulence conditions, having been trained specifically for high and medium turbulence environments. In [57], the authors devised an RNN-based polar BP decoder that utilizes weight quantization via a codebook. This approach aims to overcome the additional memory requirements and computational challenges associated with DNN-based BP methods. In [58], the author formulated SCL decoding as a maze-traversing game, which is solved using deep reinforcement learning (DRL). While the game-based method demonstrates lower complexity, its frame error rate (FER) performance is comparable to that of state-of-the-art SCL decoding processes. A DNN-aided SCL decoder with shifted pruning was designed in [59], eliminating the need for costly transcendental function calculations. A DNN-aided adaptive dynamic SCL flip (D-SCLF) decoder was proposed in [60], where the author introduced an approximation scheme to reduce computational complexity; in that work, the bit error rate (BER) performance is improved by up to 0.35 dB and the average complexity is reduced by up to 57.65%. A new artificial intelligence-based framework utilizing a multilayer perceptron (MLP) for adaptive polar coding under the SCL decoder was proposed in [61]. In [62], the author introduced a DL-aided SCF decoding technique that uses LSTM networks to precisely identify the erroneous bits under the binary symmetric channel (BSC). Accordingly, a two-phase training procedure that integrates supervised and reinforcement learning was suggested for the LSTM network, and the block error rate (BLER) was evaluated for different signal-to-noise ratios (SNRs).

Polar codes, known for their capacity-achieving properties in communication systems, benefit from Bi-LSTM models, which analyze forward and backward bit dependencies. This improves the decoding process, allowing for more accurate error correction compared to traditional methods. By leveraging LLRs, Bi-LSTMs make more informed decisions, enhancing noise resilience and accuracy. This approach makes Bi-LSTM-based decoders particularly effective for real-time communication systems requiring URLLC, ensuring efficient handling of complex data. The structure of a Bi-LSTM model with input, forward, backward, activation, and output layers is shown in Fig. 1. In this paper, we propose a robust polar decoding technique over flat fading channels based on the Bi-LSTM model, leveraging its advantages to overcome the challenges posed by fading channels and the decoding issues mentioned earlier. In this study, we evaluate the packet error rate (PER) across average SNR for CNN and Bi-LSTM models with similar configurations and compare the results with other learning models. The key contributions of this paper are summarized below:

  • We study packet transmission over frequency-flat quasi-static Rayleigh fading channels, which allow efficient resource allocation and simplify equalization, scheduling, and transmission planning techniques. Consequently, the channel coefficient changes from packet to packet but remains constant within each packet.

  • In this paper, we propose a Bi-LSTM network for decoding polar codes, taking advantage of its bidirectional processing to manage extended dependencies and fluctuating channel conditions, thus improving the accuracy and decoding performance of the proposed systems.

  • We calculate the PER for the proposed system across various SNR levels. Simulation results indicate that the proposed model achieves coding gain in fading channels. The evaluation includes different modulation orders and optimizers, where the Adam optimizer shows superior performance compared to the others. Additionally, the results confirm that the proposed Bi-LSTM model outperforms both CNN and DNN models.

FIGURE 1. The Bi-LSTM model structure with forward and backward layers.

Table 1 summarizes the novelty and contributions of this paper relative to recent work. The rest of the paper follows this structure: Section II introduces the system model. Section III explains the polar code generation methodology. Section IV provides a detailed explanation of the proposed model, including descriptions of the offline training and online testing procedures. Section V presents the simulation results, and Section VI concludes with the findings.

TABLE 1. Summary of Some Recent Works on DL-Based Polar Coding

SECTION II.

System Model

A simple block diagram of the polar code transmission process is shown in Fig. 2. Using the polar encoder, information bits are converted into codewords. To transmit the encoded bits over the physical communication channel, the codeword is first mapped onto modulation symbols. This mapping ensures compatibility with the channel’s characteristics and facilitates efficient transmission of the encoded information. The modulated symbols are then transmitted through the channel, where they may encounter noise, interference, and other impairments that can introduce errors. At the receiver, the incoming signals are processed to retrieve the original information bits. Decoding techniques are applied to estimate the transmitted codewords and deduce the information bits. A single-carrier system is considered in this experiment, where the channel encoder encodes the $j$ information bits $u_{i}$, $i \in \{1,2,\ldots,j\}$, to generate a codeword of length $N$. The codeword $d_{k}$, $k \in \{1,2,\ldots,N\}$, is then mapped to binary phase shift keying (BPSK) symbols $\mathbf{x} \in \Re^{N}$. The BPSK symbols are transmitted over a frequency-flat quasi-static Rayleigh fading channel, whose complex channel coefficient $H$ remains constant throughout the transmission of each packet but varies from one packet to another. This coefficient is complex-valued, encompassing both amplitude and phase components, and changes slowly packet by packet over time due to the quasi-static nature of the fading channel. Perfect channel state information (CSI) at the receiver is assumed in this experiment. The capacity of a flat fading channel with complete channel knowledge at the receiver can be expressed as follows:\begin{equation*} C = \log_{2}\left({ 1 + \frac{P_{t}}{\sigma^{2}} |H|^{2} }\right), \tag{1}\end{equation*}where $P_{t}$ denotes the average input power at the transmit antenna and $\sigma^{2}$ the channel noise power. The received signal $\mathbf{y} \in \mathbb{C}^{N}$ at the receiving terminal is given by\begin{equation*} \mathbf{y} = H \cdot \mathbf{x} + \mathbf{s}, \tag{2}\end{equation*}where $\mathbf{s}$ is a zero-mean complex Gaussian noise vector with variance $\sigma^{2}/2$ per dimension. At the receiver, signal detection is performed by de-rotating the received signal as follows:\begin{equation*} \hat{\mathbf{y}} = \frac{H^{*}}{|H|^{2}} \cdot \mathbf{y}, \tag{3}\end{equation*}where $H^{*}$ is the complex conjugate of $H$ and $(\cdot)^{*}$ denotes the complex conjugation operator.
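For concreteness, the quasi-static fading channel in (2) and the de-rotation in (3) can be simulated as below. This is a minimal NumPy sketch under the stated assumptions (BPSK mapping, one complex coefficient per packet, unit symbol energy, perfect CSI); the function name and SNR parameterization are illustrative and not the authors' simulation code.

```python
import numpy as np

rng = np.random.default_rng(0)

def transmit_packet(codeword, snr_db):
    """Send one BPSK-modulated packet over a frequency-flat quasi-static
    Rayleigh fading channel and de-rotate at the receiver (Eqs. (2)-(3))."""
    x = 1.0 - 2.0 * codeword                         # BPSK mapping: 0 -> +1, 1 -> -1
    # One complex coefficient per packet (quasi-static, unit average power)
    H = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
    sigma2 = 10.0 ** (-snr_db / 10.0)                # noise power for unit symbol energy
    s = np.sqrt(sigma2 / 2.0) * (rng.standard_normal(x.shape)
                                 + 1j * rng.standard_normal(x.shape))
    y = H * x + s                                    # Eq. (2)
    y_hat = np.conj(H) / np.abs(H) ** 2 * y          # de-rotation, Eq. (3)
    return y_hat, sigma2
```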

FIGURE 2. Block diagram for the polar code transmission system.

LLRs are utilized in polar codes to incorporate soft information into the decoding process, thereby improving error-correction performance, especially in challenging channel conditions where hard decisions alone may be insufficient. They quantify the reliability of each bit position based on the likelihood of the received signal under both possible bit values, calculated using a combination of a priori bit probabilities and received channel values [64]. Under a Gaussian assumption on the received signal, the LLR value of each element is determined by the following equation:\begin{equation*} \mathrm{LLR}(\hat{\mathbf{y}}) = \ln\frac{P(y = 0 \mid \hat{y})}{P(y = 1 \mid \hat{y})} = \frac{2}{\sigma^{2}}\,\mathrm{Re}(\hat{y}), \tag{4}\end{equation*}where $P(y = 0 \mid \hat{y})$ and $P(y = 1 \mid \hat{y})$ are the probabilities that the transmitted bit is 0 or 1, respectively, given $\hat{y}$, and $\mathrm{Re}(\cdot)$ returns the real part of a complex number. The LLR values are fed as input to the decoder and are finally decoded by the Bi-LSTM network to obtain $\hat{\mathbf{u}}$.
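The LLR computation in (4) then reduces to scaling the real part of the de-rotated signal; a minimal sketch, assuming the de-rotated packet and noise power returned by the channel sketch above:

```python
import numpy as np

def llr(y_hat, sigma2):
    """Per-bit LLRs of Eq. (4) under the Gaussian assumption."""
    return (2.0 / sigma2) * np.real(y_hat)

# Example: feed the de-rotated packet from the channel sketch above.
# llr_values = llr(y_hat, sigma2)   # shape (N,), input to the Bi-LSTM decoder
```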

SECTION III.

Polar Codes

In this experiment, rather than constructing codes for fading channels, we use a straightforward code optimized for the AWGN channel [65] and extract the Bhattacharyya parameters [66] of its synthetic channels. To derive the codeword $\mathbf{d}$, the input vector $\mathbf{u}$ is multiplied by the $n$-fold Kronecker product of the matrix $\mathbf{F}$, i.e., by $\mathbf{F}^{\otimes n}$. Therefore, the generator matrix $\mathbf{G}_{N}$ that generates a polar code of packet size $N = 2^{n}$ is given by [65]\begin{equation*} \mathbf{G}_{N} = \mathbf{F}^{\otimes n}, \tag{5}\end{equation*}

where the basic encoding matrix is $\mathbf{F} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$. The vector $\mathbf{u}$ comprises $j$ information bits, and the remaining $N-j$ positions are frozen. The information bits are positioned according to the set $\mathcal{F}$, which is determined using the channel’s Bhattacharyya parameter as shown in [1].
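As an illustration of (5), the encoding step can be sketched in a few lines of NumPy. The information positions used below are placeholders, since the paper selects them from the Bhattacharyya parameters of the AWGN synthetic channels:

```python
import numpy as np

def polar_generator(n):
    """Generator matrix G_N = F Kronecker-powered n times, for N = 2^n (Eq. (5))."""
    F = np.array([[1, 0], [1, 1]], dtype=np.uint8)
    G = F
    for _ in range(n - 1):
        G = np.kron(G, F)
    return G

def polar_encode(info_bits, info_positions, n):
    """Place the j information bits at the chosen positions, freeze the
    remaining N - j positions to zero, and encode over GF(2)."""
    u = np.zeros(2 ** n, dtype=np.uint8)
    u[info_positions] = info_bits
    return (u @ polar_generator(n)) % 2

# Example with N = 8, j = 4; the information positions are illustrative,
# not the Bhattacharyya-based selection used in the paper.
d = polar_encode(np.array([1, 0, 1, 1], dtype=np.uint8), [3, 5, 6, 7], n=3)
```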

SECTION IV.

Proposed Bi-Directional Learning Framework

The structure of the proposed DL framework is depicted in Fig. 3. Our proposed DL model comprises three Lambda layers without trainable parameters, three Bi-LSTM layers, and an output layer. The first Lambda layer serves as the mapping layer, with both input and output features having dimension N. The second Lambda layer acts as the fading layer, receiving N-dimensional input from the mapping layer and providing an output of the same dimension. The last Lambda layer functions as the LLR layer, whose operation is given in (4). The final functional unit is the decoder, which comprises three Bi-LSTM layers followed by one dense layer with a sigmoid activation function. In this experiment, the dense layer serves as the output layer. The structure of the proposed Bi-LSTM based decoder is shown in Fig. 4.

FIGURE 3. Proposed deep learning framework for polar decoding.

FIGURE 4. The structure of the Bi-LSTM based decoder.

The output of the LLR layer, with shape $(N,)$, serves as the input to the proposed decoder. This input is first reshaped by a reshape layer to match the input requirements of the Bi-LSTM layer, using the following function:\begin{equation*} \hat{\mathbf{x}} = \mathrm{reshape}((1, N))(\hat{\mathbf{y}}). \tag{6}\end{equation*}

The reshaped data is fed as input to the Bi-LSTM model, which includes two LSTM layers (forward and backward layers) in its hidden layer, as illustrated in Fig. 1. Each LSTM cell within these layers consists of three gates: the input gate, the output gate, and the forget gate. The internal cell structure of an LSTM layer is depicted in Fig. 5. Additionally, an LSTM has two primary states: the cell state $c_{t-1}$ and the hidden state $h_{t-1}$ . The cell state $c_{t-1}$ acts as a memory unit where information from previous inputs is retained. The hidden state $h_{t-1}$ is utilized in computing the output. Here, t denotes the current time step, and $\hat {x_{t}}$ represents the current input. During each time step, the Bi-LSTM cell can modify the contents of the cell state based on the activity of its gates. Specifically, the forget gate regulates the amount of information in the cell state that should be discarded, while the input gate determines how much of the updated information should be stored in the cell state. Algorithm 1 provides a concise overview of the internal operations of a Bi-LSTM model [67]. Table 2 displays the notation for each parameter utilized in Algorithm 1. In our proposed model, the first Bi-LSTM layer operates with 512 hidden units, the second layer with 256 hidden units, and the third layer with 128 hidden units, respectively.
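A minimal Keras sketch of the decoder described above, with three bidirectional LSTM layers of 512, 256, and 128 hidden units followed by a sigmoid dense output layer; details beyond those stated in the text (e.g., default activations) are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_bilstm_decoder(N, j):
    """Bi-LSTM decoder sketch: LLR vector of length N in, j soft bit estimates out."""
    llr_in = layers.Input(shape=(N,))
    x = layers.Reshape((1, N))(llr_in)                          # reshape of Eq. (6)
    x = layers.Bidirectional(layers.LSTM(512, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(128))(x)
    u_hat = layers.Dense(j, activation="sigmoid")(x)            # output layer, Eq. (7)
    return models.Model(llr_in, u_hat)
```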

TABLE 2. The Notation of Algorithm 1 Parameters

FIGURE 5. The internal cell structure of an LSTM.

SECTION Algorithm 1

Internal Operations of the Bi-LSTM Model

1: Start
2: The function of the forget gate: $Fg_{t} = \sigma_{\psi}(Q_{Fg}\hat{x}_{t} + T_{Fg}h_{t-1} + b_{Fg})$
3: The function of the input gate: $Ig_{t} = \sigma_{\psi}(Q_{Ig}\hat{x}_{t} + T_{Ig}h_{t-1} + b_{Ig})$
4: The function of the output gate [67]: $Og_{t} = \sigma_{\psi}(Q_{Og}\hat{x}_{t} + T_{Og}h_{t-1} + b_{Og})$
5: The function of the candidate gate: $cn_{t} = \sigma_{tanh}(Q_{cn}\hat{x}_{t} + T_{cn}h_{t-1} + b_{cn})$
6: Updating the previous cell state $c_{t-1}$ to the current cell state: $ud_{t} = (c_{t-1} \odot Fg_{t}) + (Ig_{t} \odot cn_{t})$
7: At any time step $t$, the function of the hidden state: $h_{t} = Og_{t} \odot \sigma_{tanh}(ud_{t})$
8: The operation of the Bi-LSTM layer’s output hidden state [67]: $H_{t} = \sigma_{f}(Q_{H\overrightarrow{T}}\overrightarrow{T}_{t} + Q_{H\overleftarrow{T}}\overleftarrow{T}_{t} + b_{z})$
9: Stop
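For readers who prefer code to pseudocode, steps 2 to 7 of Algorithm 1 can be written as a single NumPy time step; the dictionary-based weight layout below is purely illustrative, and in a Bi-LSTM this step runs once in each direction before the outputs are combined as in step 8:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM cell update following steps 2-7 of Algorithm 1.
    W, U, b are dicts of weights/biases keyed by gate: 'Fg', 'Ig', 'Og', 'cn'."""
    Fg = sigmoid(W["Fg"] @ x_t + U["Fg"] @ h_prev + b["Fg"])   # forget gate
    Ig = sigmoid(W["Ig"] @ x_t + U["Ig"] @ h_prev + b["Ig"])   # input gate
    Og = sigmoid(W["Og"] @ x_t + U["Og"] @ h_prev + b["Og"])   # output gate
    cn = np.tanh(W["cn"] @ x_t + U["cn"] @ h_prev + b["cn"])   # candidate state
    c_t = c_prev * Fg + Ig * cn                                # cell-state update
    h_t = Og * np.tanh(c_t)                                    # hidden state
    return h_t, c_t
```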

The output of the final Bi-LSTM layer is fed as input to the dense layer, which produces an output vector of shape $(j,)$. The sigmoid activation function $f_{Sigmoid}(x) = 1/(1+e^{-x})$ is utilized in this layer to map each element of the output vector into the interval (0, 1). The output vector $\hat{\mathbf{u}}$ of the output layer can be expressed as follows:\begin{equation*} \hat{\mathbf{u}} = \mathbf{f}_{Sigmoid}(\mathbf{Q}_{s}\times\mathbf{H}_{t} + \mathbf{b}_{s}), \tag{7}\end{equation*}where $\mathbf{Q}_{s}$ denotes the weight matrix and $\mathbf{b}_{s}$ the bias vector of the output layer.

We also calculate the PER value for the CNN model. This CNN model is designed with three one-dimensional convolutional (Conv1D) layers, each followed by a batch normalization layer. After these layers, we utilize a one-dimensional global max pooling layer (GlobalMaxPool1D). Finally, we add a dense layer with a sigmoid activation function as the output layer.
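For reference, the CNN baseline can be sketched as follows; the filter counts and kernel sizes are assumptions, since the text only specifies the layer types:

```python
from tensorflow.keras import layers, models

def build_cnn_decoder(N, j):
    """CNN baseline sketch: three Conv1D + BatchNorm blocks, global max pooling,
    and a sigmoid output layer, as described in the text."""
    llr_in = layers.Input(shape=(N, 1))
    x = llr_in
    for filters in (128, 64, 32):               # assumed filter counts
        x = layers.Conv1D(filters, kernel_size=3, padding="same",
                          activation="relu")(x)
        x = layers.BatchNormalization()(x)
    x = layers.GlobalMaxPool1D()(x)
    u_hat = layers.Dense(j, activation="sigmoid")(x)
    return models.Model(llr_in, u_hat)
```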

A. Training and Testing Process

Before deploying the Bi-LSTM decoder, the model must be trained offline using simulation data. In the proposed supervised learning approach, a training set comprising known input-output pairs is essential. The polar code encoder transforms the input information bits into codewords. After encoding, $2^{j}$ codewords, each of size N bits, are generated, corresponding to the $2^{j}$ possible packets of j bits. These encoded codewords are then modulated using a chosen modulation scheme and transmitted over a flat Rayleigh fading channel. The objective of the Bi-LSTM model is to decode the noisy, faded signal and recover the original information bits from the received signal. During training, the model’s input is the received signal (processed through LLR calculation), while the labels are the original information bits, which serve as the ground truth for the model to predict.

We train the proposed Bi-LSTM decoder in epochs, where each epoch represents a complete pass over the training data. The Bi-LSTM decoder is trained over multiple epochs across various experimental configurations. During an epoch, the network undergoes backpropagation to compute the gradients of the loss function with respect to its parameters. This iterative process allows the Bi-LSTM decoder to gradually learn how to map inputs to the desired outputs based on the training data.

For optimal weight and bias values, the loss function should be convex. This property ensures that gradient descent can effectively find the global minimum of the loss function, leading to improved convergence and performance during training. We use the Adam optimizer to compute the gradient-based updates of the loss function. This optimizer is widely supported on commercial DL platforms such as TensorFlow and Keras [68]. The Bi-LSTM model is trained on the collected data to reduce the difference between the true and predicted bits, as well as the PER. In this study, we utilize the mean squared error (MSE) function to compute the training loss as follows:\begin{equation*} \mathcal{L}(\hat{\mathbf{y}},\hat{\mathbf{u}};\theta) = \frac{1}{N}\,\|\hat{\mathbf{y}}-\hat{\mathbf{u}}\|^{2}, \tag{8}\end{equation*}where $\theta$ denotes the weights and biases of the model. The gradients are used to update the parameters $\theta$ in the direction that minimizes the loss, typically employing an optimization algorithm such as gradient descent on randomly selected batches from the data samples as follows:\begin{equation*} \theta^{+} := \theta - \eta\nabla\mathcal{L}\left({\hat{\mathbf{y}},\hat{\mathbf{u}};\theta}\right), \tag{9}\end{equation*}where $\eta$ represents the learning rate.
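A minimal sketch of this training configuration (Adam optimizer, MSE loss, and the learning rate and batch size adopted later in Section V), assuming the `build_bilstm_decoder` helper from the decoder sketch above and placeholder training pairs in place of the full encode-fade-LLR pipeline:

```python
import numpy as np
import tensorflow as tf

N, j = 16, 8
# Placeholder training pairs with the shapes used in the paper; in practice
# X_train holds LLR vectors from the encode -> fade -> LLR pipeline and
# U_train the corresponding information bits (labels).
X_train = np.random.randn(4096, N).astype("float32")
U_train = np.random.randint(0, 2, size=(4096, j)).astype("float32")

model = build_bilstm_decoder(N, j)   # decoder sketched in the previous section
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-4),
              loss="mse")            # MSE loss of Eq. (8)
model.fit(X_train, U_train, batch_size=256, epochs=16,
          validation_split=0.2)      # the paper trains for far more epochs
```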

Table 3 outlines the specific parameters used for training and testing the proposed model. The model is trained with varying numbers of information bits $j$ and code lengths $N$. The training and validation losses of the proposed model are shown in Fig. 6, with $j = 8$, $N = 16$, and a training SNR of 0 dB. Fig. 6 (a) illustrates the loss over $2^{8}$ epochs, Fig. 6 (b) over $2^{12}$ epochs, and Fig. 6 (c) over $2^{16}$ epochs. From the figure, it is evident that the training and validation losses follow a similar trend. Additionally, the figure demonstrates that both training and validation losses decrease as the number of epochs increases, reaching a stable condition after approximately 10,000 epochs. The choice of training SNR significantly influences both the convergence behavior and block error rate (BLER) performance of the proposed model. In this study, the convergence results were obtained with a training SNR of 0 dB, under which the model demonstrated robust convergence characteristics. While our proposed model exhibits strong performance at positive SNR levels, its performance degrades notably when trained under negative SNR conditions. The learning rate is another crucial parameter for optimizing the performance of the DL model. In this experiment, we carefully select the learning rate to perform well across different ranges of SNR. Our training includes various SNR levels and learning rates to comprehensively evaluate model performance. When training deep learning models, the batch size is crucial for balancing efficiency and performance. We experiment with different batch sizes to find the setting that speeds up training while improving overall model accuracy. The CNN model is also trained and tested using the same parameters as the Bi-LSTM model.

TABLE 3. The Parameters for Simulating the Proposed System

FIGURE 6. Training and validation loss for the proposed model with $j = 8$ and $N = 16$ setup.

SECTION V.

Simulation Results

Reliability is characterized by the probability of successfully transmitting n packets within the specified user plane latency under specific channel conditions. In this study, we employ PER as a metric to evaluate and compare the effectiveness of various parameter adjustments in terms of reliability. The construction of polar codes follows the method described in [1], employing AWGN with a design-SNR set to 0 dB.
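Since a packet is counted as erroneous when any of its j bits is decoded incorrectly, the PER metric can be computed from the decoder's soft outputs as in the short sketch below (the 0.5 threshold corresponds to the hard decision on the sigmoid outputs):

```python
import numpy as np

def packet_error_rate(u_true, u_soft, threshold=0.5):
    """A packet is in error if any of its j bits is decoded incorrectly."""
    u_hat = (u_soft >= threshold).astype(int)
    return np.mean(np.any(u_hat != u_true, axis=1))
```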

In DL, selecting the appropriate batch size is essential to strike a balance between computing efficiency and model convergence. Larger batch sizes exploit parallelism to speed up training, but generalization may suffer as a result. Smaller batches produce more accurate gradient updates per example, which may improve generalization at the expense of longer training times. Determining the ideal batch size involves trade-offs that significantly impact both the model’s overall performance and its training speed [69]. In this experiment, we calculate the PER for different batch sizes and compare them, as shown in Fig. 7. This result is obtained for $2^{16}$ epochs with $N=8$ and $j=4$. From the evaluation, we observe that the proposed model performs well with smaller batch sizes. Specifically, at a batch size of 256, it shows better PER performance. However, as the batch size increases or decreases beyond this value, the performance of the Bi-LSTM model degrades. Batch sizes 128 and 512 exhibit almost identical performance, but at 1024 the performance is very poor. According to Fig. 7, the proposed Bi-LSTM model demonstrates approximately 0.8 dB better performance at a batch size of 256 compared to batch sizes 128 and 512. Therefore, all experimental setups in this paper use a batch size of 256.

FIGURE 7. The comparative PER performance of the proposed Bi-LSTM decoder for various batch sizes, with $j = 4$ and $N = 8$.

The learning rate is an important hyperparameter for the DL model since it determines the gradient-descent step size, which affects the convergence and performance of the model. A carefully selected learning rate strikes a balance between training speed and model accuracy, enabling quicker convergence to optimal solutions [69]. The PER performance for different learning rates is depicted in Fig. 8. We evaluated the PER for learning rates of 0.005, 0.001, 0.0005, and 0.0001. The model exhibits superior PER performance when trained with a learning rate of 0.0005. Deviating from this rate, either higher or lower, results in decreased performance. The evaluation indicates nearly 1 dB better performance at a learning rate of 0.0005 compared to 0.0001 and 0.001. Furthermore, the model demonstrates approximately 1.8 dB better performance compared to the 0.005 learning rate.

FIGURE 8. The comparative result of the proposed Bi-LSTM decoder for different learning rates, with $j = 4$ and $N = 8$.

Another critical component for signal processing tasks is the training SNR, which has a direct impact on the way models learn and generalize from data. Models trained at suitable SNR values can discriminate between signal and noise, which improves training convergence and performance on unseen data. The PER performance of the proposed Bi-LSTM decoder for various training SNRs is illustrated in Fig. 9. This analysis considers $2^{16}$ epochs, with $j = 4$ and $N = 8$. As the training SNR increases from 0 dB to 10 dB, the performance decreases. Similarly, performance also drops when the model is trained with SNRs lower than 0 dB. At -2 dB training SNR, the Bi-LSTM model shows about a 0.4 dB degradation compared to 0 dB, and more than a 2 dB degradation at 20 dB training SNR. Therefore, from Fig. 9, it is clear that the model performs well at 0 dB training SNR, and hence we designed all our experimental setups with 0 dB training SNR.

FIGURE 9. The comparison of PER performance for the proposed model across different training SNRs, with $j = 4$ and $N = 8$.

The DL optimizer plays a key role in effectively tuning model parameters during training, which affects generalization across different tasks and datasets as well as convergence speed and stability. Selecting the right optimizer is crucial for maximizing model performance and achieving the learning objectives [68]. Fig. 10 illustrates the PER results for different optimizer algorithms: Adam, RMSProp, Adamax, and SGD. Among them, the SGD optimizer exhibits poor performance, while Adam, Adamax, and RMSProp demonstrate good performance. However, the Adam optimizer performs better than all the others. Therefore, we have used the Adam optimizer to calculate all PER values in this paper.

FIGURE 10. The evaluation of the PER performance of the Bi-LSTM based decoder using various optimizers, with parameters $j = 4$ and $N = 8$ at $2^{16}$ epochs.

The PER performance of the proposed model is illustrated in Fig. 11 for different numbers of epochs. We began evaluating the model with $2^{4}$ epochs, where performance was notably poor. As we continuously increased the number of epochs, performance also improved. From the figure, it is evident that the proposed model exhibits very good performance at $2^{16}$ epochs. However, when we further increased the number of epochs to $2^{18}$, the model's performance decreased. Therefore, Fig. 11 demonstrates that the Bi-LSTM decoder performs best at $2^{16}$ epochs.

FIGURE 11. The PER performance comparison of the Bi-LSTM decoder across varying epochs, with $j = 4$ and $N = 8$.

In Fig. 12, we compare the PER performance of the proposed Bi-LSTM model with the CNN and DNN [63] models. The results for all these models are obtained using the same setup: $2^{16}$ epochs, a learning rate of 0.0005, a training SNR of 0 dB, and a batch size of 256. From the results, it is clear that the Bi-LSTM decoder outperforms both the DNN [63] and CNN models. Initially, the DNN model significantly outperforms the CNN model; however, beyond 10 dB SNR, the PER performance of CNN and DNN becomes almost the same. The comparative results demonstrate that the Bi-LSTM model outperforms the CNN and DNN models by approximately 1 dB and 0.8 dB, respectively, at 20 dB SNR.

FIGURE 12. The PER performance evaluation of the Bi-LSTM-based decoder compared to other decoders at $2^{16}$ epochs, with $j = 4$ and $N = 8$.

In Fig. 13, we conduct a comparative analysis of the Bi-LSTM and DNN [63] models across various numbers of epochs, maintaining $j = 8$ and $N = 16$. The results indicate that the Bi-LSTM model exhibits superior performance compared to the DNN model at $2^{8}$ and $2^{16}$ epochs. At $2^{12}$ epochs, while both models perform closely, the Bi-LSTM model shows a slight edge over the DNN model. Fig. 13 also demonstrates that the Bi-LSTM model achieves coding gain with $2^{8}$ epochs, whereas the DNN model fails to achieve coding gain at $2^{8}$ epochs. It should be noted that coding gain typically indicates the decoder's ability to effectively mitigate errors and enhance signal reliability [70].

FIGURE 13. The comparison of PER performance between the Bi-LSTM model and DNN model, along with theoretical results, for different numbers of epochs with $j = 8$ and $N = 16$.

Fig. 14 compares the PER performance of the Bi-LSTM, DNN [63], and CNN models with $j = 8$ and $N = 16$, alongside theoretical results [70]. The Bi-LSTM and CNN models are evaluated using a learning rate of 0.0005, $2^{16}$ epochs, a training SNR of 0 dB, and a batch size of 256, while the DNN model also uses $2^{16}$ epochs but a -2 dB training SNR. From Fig. 14, it is observed that the CNN model initially exhibits improving PER performance with SNR, demonstrating a coding gain, but loses this advantage beyond 15 dB SNR. In contrast, the Bi-LSTM consistently achieves a coding gain and outperforms both the CNN and DNN models. Specifically, the proposed Bi-LSTM model surpasses the uncoded performance by almost 4 dB.

FIGURE 14. Evaluation of PER performance of the Bi-LSTM-based decoder with other decoders at $2^{16}$ epochs, under the condition where $j = 8$ and $N = 16$.

With the exception of Fig. 15, all results presented in this paper are derived from simulations using BPSK modulation. In Fig. 15, however, we extend the evaluation of the proposed model to higher-order modulation, specifically 4QAM, to assess its performance. In the case of 4QAM, two bits are mapped to each symbol, while higher-order QAM schemes such as 16QAM and 64QAM map four and six bits to each symbol, respectively. This increases the spectral efficiency but also increases the noise sensitivity [71], [72]. For this experiment, we adopt a configuration of $j = 2$ and $N = 4$, while keeping all other parameters consistent with the BPSK setup. The performance results indicate that the model achieves favorable outcomes when trained for $2^{8}$ epochs. However, when the number of epochs increases to $2^{12}$, the model exhibits satisfactory performance in the low-SNR regime but fails to maintain consistent performance at SNR values beyond 10 dB. A similar degradation is observed when the model is trained for $2^{4}$ epochs. Additionally, we explored the impact of using longer polar codes, such as $j = 4$ or $j = 8$ with $N = 8$ or $N = 16$; in these configurations, the Bi-LSTM-based model fails to decode the polar codes effectively. This observation suggests that the current model architecture may not generalize well to higher-order modulations in these configurations. While the neural network architecture remains largely the same, the LLR computation must account for the more intricate constellation points in higher-order QAM, and increased model complexity may also be required to ensure effective decoding of these higher-order schemes. Given these results, we chose to focus on BPSK modulation, consistent with prior studies in this area, as summarized in Table 1. This approach ensures a more reliable comparison of the model's performance.

FIGURE 15. The performance of the proposed model for 4QAM modulation, under the condition of $j = 2$ and $N = 4$.

SECTION VI.

Conclusion

This study proposes a Bi-LSTM decoder for short packet transmission over a flat fading channel. We evaluated the PER of the system across different SNR levels, and the simulation results demonstrate that the proposed technique achieves coding gains in fading channels, even when using a basic codebook designed for AWGN channels without fading. The best performance was observed with a learning rate of 0.0005, a training SNR of 0 dB, and a batch size of 256. In general, improvements were achieved by increasing the number of training epochs; specifically, the Bi-LSTM decoder showed significant coding gains when trained for $2^{8}$ epochs. We also compared several optimizers and found that the Adam optimizer provided the best performance. Although the proposed system performed well with both BPSK and 4QAM modulation schemes, 4QAM lost its consistency at higher SNR values; hence, BPSK was chosen as the focus for comparison. The results further demonstrate that the Bi-LSTM decoder outperforms the CNN and DNN models under the same conditions.
