

Received April 20, 2018, accepted May 20, 2018, date of publication May 31, 2018, date of current version June 20, 2018. *Digital Object Identifier 10.1109/ACCESS.2018.2842231* 

# **Channel Coding for High Performance Wireless Control in Critical Applications: Survey and Analysis**

MING ZHAN<sup>1</sup>, (Member, IEEE), ZHIBO PANG<sup>2</sup>, (Senior Member, IEEE), DACFEY DZUNG<sup>3</sup>, (Member, IEEE), AND MING XIAO<sup>4</sup>, (Senior Member, IEEE)

<sup>1</sup>Key Laboratory of Networks and Cloud Computing Security of University, College of Electronic and Information Engineering, Southwest University, Chongqing 400715, China

<sup>2</sup>ABB Corporate Research, SE-721 78 Västerås, Sweden

<sup>3</sup>ABB Corporate Research, 5405 Baden, Switzerland
<sup>4</sup>School of Electrical Engineering, KTH Royal Institute of Technology, 10044 Stockholm, Sweden

Corresponding author: Zhibo Pang (pang.zhibo@se.abb.com)

This work was supported in part by the National Natural Science Foundation of China under Grant 61671390 and in part by Vinnova (Swedish Innovation Agency) under Grant 2015-06548 and Grant 2017-02822.

**ABSTRACT** High-performance wireless (WirelessHP) networks for industrial control applications require channel coding to be used and optimized. However, the coding schemes adopted in modern wireless communication standards are not sufficient for WirelessHP applications, in terms of both low latency and high reliability. Starting from the essential characteristics of WirelessHP regarding channel coding, this paper gives a detailed analysis of the short packet coding schemes currently used in industrial wireless sensor networks, including seven coding schemes and their possible variants. The metrics employed for evaluation are bit-error rate, packet error rate, and throughput. To find suitable coding schemes from a large number of options, we propose four principles to filter the most promising coding schemes. Based on an overall comparison from the perspective of practical implementation, the challenges of the available coding schemes are analyzed, and directions are recommended for future research. Some reflections on how to construct specially designed coding schemes for short packets to meet the high reliability and low-latency constraints of WirelessHP are also provided.

**INDEX TERMS** Industrial wireless control, WirelessHP, channel coding, ultra-low latency, ultra-high reliability, short packet.

#### I. INTRODUCTION

During the past decades, wireless control in industry has been extensively investigated by the research community, fueled also by the fourth industrial revolution, i.e., Industry 4.0 [1]. These efforts are directed at deploying wireless sensors, actuators and controllers at different levels of industrial automation to achieve higher efficiency. Motivated by advantages such as low cost, flexibility, and suitability for harsh environments or mobile scenarios, the first industrial wireless sensor networks (IWSNs) have been implemented in real-time control applications [2]. Research and standardization work has been performed on, e.g., WirelessHART [3] and WIA-PA [4], which are based on the IEEE 802.15.4 standard, and WIA-FA [5], which is based on the IEEE 802.11g/n standard. Most recently, high-performance wireless (WirelessHP) was proposed [6], aiming to meet the 10 µs cycle time and multi-Gbps data rate requirements of mission-critical applications. To facilitate comprehension of this survey, frequently used abbreviations are summarized in Table 1.

The typical WirelessHP configuration is a centralized wireless control network, where periodic messages are sent from the controller to the actuators with extremely low latency. One central challenge to the successful implementation of WirelessHP is to achieve high reliability, on the order of $10^{-9}$ PER over short intervals [7]. Propagation is affected by obstacles, reflections from mobile or stationary objects, and interference caused by other wireless equipment co-existing in the same frequency bands. To meet the required reliability, channel codes are employed in most modern wireless communication standards to combat interference and noise (Table 2). As shown in Fig. 1, the message packets are input to an encoder where error correction bits are added, while a corresponding decoding algorithm is performed in the decoder, by which the corrupted bits can be corrected.

#### TABLE 1. Frequently used abbreviations in this paper.

| Abbrev | Explanation                                    | Abbrev  | Explanation                                  | Abbrev | Explanation                               |
|--------|------------------------------------------------|---------|----------------------------------------------|--------|-------------------------------------------|
| ASIC   | Application-specific integrated circuit        | IWSNs   | Industrial wireless sensor networks          | PER    | Packet error rate                         |
| ASIP   | Application-specific instruction set processor | LDPC    | Low density parity check                     | PSA    | Power systems automation                  |
| BCH    | Bose-Chaudhuri-Hocquenghem                     | LDPC-CC | LDPC convolutional code                      | RS     | Reed-Solomon                              |
| BER    | Bit error rate                                 | LLR     | Log-likelihood ratio                         | SC     | Successive-cancellation                   |
| BP     | Belief propagation                             | Log-MAP | Maximum a posteriori algorithm in log domain | SNR    | Signal to noise ratio                     |
| CC     | Convolutional code                             | ML      | Maximum likelihood                           | SPA    | Sum-product algorithm                     |
| CN(U)  | Check node (unit)                              | NB      | Non-binary                                   | TPC    | Turbo product code                        |
| CRC    | Cyclic redundancy check                        | OSD     | Ordered statistics decoding                  | U-NBPB | Unconstrained non-binary protograph-based |
| GJE    | Gauss-Jordan elimination                       | PEC     | Power electronics control                    | VNU    | Variable node unit                        |

#### TABLE 2. A brief review of channel codes adopted in different wireless communication standards.

| Year | Standard       | Channel codes          | Year | Standard      | Channel codes              |
|------|----------------|------------------------|------|---------------|----------------------------|
| 1992 | GSM            | CC                     | 2008 | WirelessHART  | No                         |
| 1999 | IEEE 802.11a/b | CC                     | 2009 | IEEE 802.11n  | CC, LDPC                   |
| 2000 | CDMA2000       | CC, Turbo codes        | 2009 | PNO/WSAN      | Repeated codes, CRC        |
| 2000 | TD-SCDMA       | CC, Turbo codes        | 2011 | 3GPP V10.1.0  | CC, Turbo codes            |
| 2000 | WCDMA          | CC, RS+CC, Turbo codes | 2012 | IEEE 802.16m  | CC, Duo-binary Turbo codes |
| 2003 | IEEE 802.11g   | CC                     | 2012 | LTE-Advanced  | CC, Turbo codes            |
| 2003 | IEEE 802.15.4  | No                     | 2012 | IEEE 802.11ac | CC, LDPC                   |
| 2005 | IEEE 802.15.1  | No                     | 2012 | IEEE 802.11ad | RS, LDPC                   |
| 2006 | WISA           | No                     | 2014 | ISA100.11a    | No                         |
| 2006 | IEEE 802.16e   | Turbo codes, LDPC      | 2015 | WIA-FA        | CC, LDPC                   |
| 2008 | WIA-PA         | No                     | 2016 | 5G draft      | LDPC, Polar codes          |

**FIGURE 1.** Block diagram of an encoding/decoding communication system.

Based on today's industrial wireless standards, such as WirelessHART and IEEE 802.11, researchers have embedded BCH and RS codes in the medium access control layer to improve reliability [8], [9]. These coding schemes are executed in CPU software and thus introduce considerable delays. Channel coding performed by physical layer hardware is a powerful tool in latency-constrained applications [10], [11]. For instance, in [12], a rate-1/2 CC is adopted for a packet length of 100 bits, and it is shown to achieve 100 µs latency with a PER of $10^{-9}$ in factory deployment. In [13], the application of IEEE 802.11n in industrial communications is investigated, where LDPC codes are used to improve reliability, and a number of parameters are optimized to reduce service latency. Theoretical principles regarding code design for short packet lengths are proposed in [14]. These studies provide improved performance for conventional IWSN applications, but gaps remain with respect to the performance required by WirelessHP for critical applications. The work in [12] combines the Automatic Repeat reQuest (ARQ) technique with CC to guarantee high reliability, reaching 1 Mbps throughput and 100 µs latency, but this is far from the multi-Gbps throughput and 10 µs latency required by WirelessHP. As for [13], conventional LDPC codes suffer degraded reliability when used for short packet communications. Although [14] gives an overview from the perspective of information theory, it does not provide any specific channel coding schemes. Channel codes for industrial wireless communications must meet some specific requirements and, to the best of our knowledge, no existing research has addressed this topic. We are therefore motivated to review recent progress in short packet coding schemes. Moreover, according to the essential characteristics of channel coding in WirelessHP, we give a comprehensive investigation of the coding schemes frequently employed in IWSNs. However, this does not mean that these coding schemes can be directly applied to WirelessHP applications. In this regard, the constraints to be met are given first, then principles for selecting promising channel coding schemes are derived and, finally, directions for future research on the promising coding schemes are outlined. These are the contributions of this paper.

The remainder of this paper is organized as follows. To highlight the challenges imposed by WirelessHP, Section II discusses the WirelessHP application scenarios, the fundamental characteristics of short packet communications in WirelessHP, some properties that differentiate this scenario from conventional IWSNs, and the background of channel coding. According to these criteria, we look at seven kinds of coding schemes in IWSNs. Section III reviews the coding schemes that can approach the Shannon limit, including Turbo codes, LDPC codes, fountain codes and polar codes. Section IV reviews the classic error correction codes, including BCH codes, RS codes and CC. For comparison, the concatenated and derived coding schemes are also investigated in that section. Based on these studies, Section V proposes four principles to select candidates for WirelessHP, and outlines challenges and directions for future research on these coding schemes. Section VI concludes the paper.

#### **II. WIRELESSHP AND CHANNEL CODING OVERVIEW**

In what follows, subsection A gives a brief overview of some typical industrial applications and explains the specific scenarios that WirelessHP addresses. Subsequently, subsection B discusses the three most important characteristics of WirelessHP, i.e., high reliability, low latency and short packet length. To highlight the difference between WirelessHP and conventional IWSNs, energy efficiency, low cost and low decoding complexity are also discussed in this subsection. Finally, subsection C provides a background on channel coding, which includes a number of different coding schemes. For convenience of comparison and discussion, they are classified as classic coding schemes and modern channel coding schemes.

#### A. WIRELESSHP APPLICATION SCENARIOS AND PHYSICAL LAYER STRUCTURE

To improve production efficiency, the application of wireless communications to different industrial scenarios is a crucial technology topic. These scenarios have distinct features and hence present different requirements. One example is process automation (PA) for the management of mining, oil, paper and chemical processes. Other examples include factory automation (FA), which guarantees steady operation of the production line, and safety monitoring of operators and equipment. In this context, some wireless standards have already been presented and proved feasible for these scenarios. This is the case, for example, for WirelessHART and WIA-PA, which have been used in some PA and FA scenarios to provide real-time control. However, critical applications in PSA and PEC scenarios require enhanced performance, i.e., a higher data refresh rate, a lower PER and a shorter cycle time, and existing standards cannot be applied effectively because of their limited capability. A primary limitation is that these standards are built on top of general purpose wireless standards, such as IEEE 802.15.4 or IEEE 802.11. Although customizing the upper layers of general purpose standards can improve overall performance to a certain level, the physical layer is not optimized, which poses a fundamental bottleneck in meeting the requirements of PSA and PEC scenarios. For the sake of comparison, the requirements of the above-mentioned industrial scenarios are listed in Table 3.

To fill the significant gap between the ultimate performance of currently available wireless standards and the performance required by PSA and PEC applications, the customization of general purpose wireless standards is not a feasible strategy; hence, WirelessHP is proposed and developed for these specific industrial scenarios [15]. Unlike those standards, WirelessHP is based on a completely new protocol stack that is designed to pursue 10 µs level latency and fiber-level reliability. Most importantly, in the physical layer of WirelessHP, shown in Fig. 2, the OFDM data frame parameters and preamble length are specially designed to achieve low latency. The designed physical layer has been shown to be a promising solution in a typical network configuration of WirelessHP, characterized by centralized network control, logical star topology, tightly scheduled transmitting/receiving pattern and quasi-static wireless channel [15].

#### TABLE 3. Requirements of representative industrial scenarios.

| Scenario | Number of nodes | Data refresh rate | PER       | System range    |
|----------|-----------------|-------------------|-----------|-----------------|
| PA       | $10^2$-$10^3$   | $10^1$ Hz         | $10^{-6}$ | $10^1$-$10^2$ m |
| FA       | $10^2$-$10^3$   | $10^3$ Hz         | $10^{-6}$ | $10^1$-$10^2$ m |
| PSA      | $10^1$-$10^2$   | $10^4$ Hz         | $10^{-9}$ | $10^2$-$10^3$ m |
| PEC      | $10^2$-$10^3$   | $10^5$ Hz         | $10^{-9}$ | $10^1$-$10^2$ m |

**FIGURE 2.** Block diagrams of WirelessHP physical layer. (a) Transmitter. (b) Receiver.

As can be seen from Fig. 2, channel coding is a key technology employed in the WirelessHP physical layer. Although high reliability is one of the most important requirements in the scenarios targeted by WirelessHP, the selection of a qualified channel coding scheme is also constrained by other aspects, such as low latency and short packet length, which are detailed in the following subsection.

#### B. CHARACTERISTICS OF WIRELESSHP

This subsection summarizes the unique characteristics of WirelessHP, providing precise definitions of high reliability, low latency and short packet length.

- *High reliability* For a WirelessHP system operating in a factory scenario, the actuators execute actions according to the received messages, the sensors perceive and collect information during the production process, and the controller sends out messages based on the information collected from the sensors. If there are errors in a received message, the actuator actions may cause unacceptable damage to the equipment. For mission-critical messages exchanged in WirelessHP, this paper defines high reliability as a PER no higher than $10^{-9}$, which is necessary to guarantee stable operation of wirelessly controlled systems [7]. However, channel codes may suffer from an error floor, in which the PER curve flattens as the SNR increases. To achieve high reliability, the error floor of a qualified coding scheme must be pushed down to a very low level. For example, the error floor should not appear until the PER has fallen to $10^{-10}$.
- *Low latency* In the PSA and PEC applications targeted by WirelessHP, the maximum data refresh rate can reach $10^5$ Hz. Therefore, this paper defines low latency as a minimum cycle time $T_{MCT}$, i.e., the minimum time for a controller to communicate with all nodes, on the order of 10 µs. Any message received too late must be considered invalid [16]. In general, $T_{MCT}$ is mainly determined by the scheduling unit $T_{SU}$:

$$T_{SU} = T_{proc} + T_{access} + T_{TX} + T_{ACK}, \tag{1}$$

where $T_{proc}$ is the hardware processing delay, $T_{access}$ is the time to access the channel, $T_{TX}$ is the time for packet transmission, and $T_{ACK}$ is the time to receive the acknowledgment [6]. Since decoding latency is included in $T_{proc}$, a key metric of the decoder is the achievable throughput, which is defined as the number of bits that can be successfully transmitted per second. For the most extreme PEC application, with $10^2$-$10^3$ nodes, a $10^5$ Hz data refresh rate and a packet length of 100 bits, we can derive a throughput of $10^9$-$10^{10}$ bits per second. Therefore, multi-Gbps processing capability is necessary for decoders in WirelessHP applications [6].

- *Short packet* In WirelessHP applications, a large number of sensor nodes are deployed to perceive variations of the parameters of interest, such as temperature, velocity, distance, three-dimensional coordinates, current and voltage, which are important metrics of the instantaneous state of the sensed scenario. For a typical sensor node at the actuator side, a few information bits are sufficient to represent even a large variation of a sensed quantity. Therefore, in each cycle time slot, the sensor nodes collect only a small number of information bits for transmission. At the controller side, successive messages sent to a specific actuator are closely correlated, which means that a small number of bits is enough for the controller to update the actuator's state in each cycle period. Consequently, the packets exchanged between sensor nodes and the controller are short. In WirelessHP, the packet length is defined to range from several dozen to a few hundred bits.
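The throughput figure derived in the low latency item can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming that every node exchanges exactly one packet per refresh cycle; the function names are illustrative and do not come from the cited works.

```python
def required_throughput_bps(num_nodes: int, refresh_rate_hz: float, packet_bits: int) -> float:
    """Bits per second if every node exchanges one packet per refresh cycle."""
    return num_nodes * refresh_rate_hz * packet_bits


def cycle_time_s(refresh_rate_hz: float) -> float:
    """Duration of one refresh cycle, i.e. the budget that T_MCT has to fit in."""
    return 1.0 / refresh_rate_hz


REFRESH_RATE_HZ = 1e5  # PEC data refresh rate quoted in the text
PACKET_BITS = 100      # packet length quoted in the text

for nodes in (100, 1000):  # lower and upper PEC node counts
    rate = required_throughput_bps(nodes, REFRESH_RATE_HZ, PACKET_BITS)
    print(f"{nodes} nodes: {rate / 1e9:.0f} Gbps")

print(f"cycle budget: {cycle_time_s(REFRESH_RATE_HZ) * 1e6:.0f} us")
```

The two node counts give the $10^9$ and $10^{10}$ bits per second bounds stated above, and the reciprocal of the refresh rate gives the 10 µs cycle budget.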

In the physical layer design of conventional IWSNs, some other characteristics, e.g., energy efficiency, low cost and low decoding complexity, are highly emphasized. Typically, the application scenarios are severely constrained by energy and computation resources. In the scenarios targeted by WirelessHP, however, high reliability, low latency and short packet length are the most critical requirements. In terms of hardware implementation based on currently available microelectronics technology, all of these requirements cannot be met simultaneously. In the following, energy efficiency, low cost and low decoding complexity are analyzed to show that their importance in WirelessHP channel coding design is secondary. Therefore, these characteristics are not investigated further in this paper.

- *Energy efficiency* Currently, many research works in IWSNs focus on reducing the power consumption of the encoding/decoding procedures. Examples are channel coding strategies for wireless systems deployed in remote field observation and mobile medical monitoring scenarios, where long battery lifetime is a priority and the energy consumed by the nodes is of critical importance [17]. However, these constraints do not hold in WirelessHP. As defined in this subsection, the PER and latency constraints are much more stringent than those required in conventional IWSNs. High reliability and low latency can only be satisfied by using sophisticated decoding algorithms and highly parallel decoding structures, which consume considerable power. Moreover, in the targeted PSA and PEC applications, sufficient power supply is available, and the energy consumed for communications is negligible compared to the energy required by the primary equipment.
- *Low cost* In conventional IWSN applications, current standards, such as ISA100.11a and WirelessHART, are based on the IEEE 802.15.4 standard, where the ARQ technique is adopted to provide low cost communications. However, retransmission of corrupted packets introduces longer latency. Although some research works have used channel coding to reduce latency, they are still designed to pursue low hardware overhead, for instance, by using smart chips with limited signal processing capability and small-capacity memory. With such a low cost hardware configuration, it is impractical for the encoding/decoding system to meet the high reliability and low latency constraints, which are the priorities in WirelessHP.
- *Low decoding complexity* For a given channel coding scheme, PER, latency, and power consumption are the most important metrics. However, these metrics constrain one another. For example, reducing the decoding complexity reduces latency and power consumption, but at the penalty of PER loss. Although this strategy is effective in conventional IWSNs to pursue long lifetime, it is not suitable for WirelessHP scenarios. To meet the high reliability and low latency constraints, only highly parallel decoding structures and sophisticated decoding algorithms can be applied, and high decoding complexity is the price paid.

#### C. BRIEF INTRODUCTION OF CHANNEL CODING

To characterize the maximum rate of reliable information transmission over noisy channels, Shannon proposed the concept of channel capacity in 1948, as shown in Eq. (2), where $C$ is the channel capacity in bits per second, $B$ is the bandwidth in hertz, and $S$ and $N$ are the signal and noise powers in watts, respectively.

$$C = B \log_2(1 + S/N) \tag{2}$$
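Eq. (2) is straightforward to evaluate numerically. The sketch below is only an illustration: the 20 MHz bandwidth and the SNR values are hypothetical choices that are not taken from the paper, and the helper names are our own.

```python
import math


def shannon_capacity_bps(bandwidth_hz: float, snr_linear: float) -> float:
    """Eq. (2): C = B * log2(1 + S/N), with S/N given as a linear power ratio."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)


def db_to_linear(snr_db: float) -> float:
    """Convert an SNR in decibels to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


BANDWIDTH_HZ = 20e6  # hypothetical channel bandwidth, chosen only for illustration

for snr_db in (0.0, 10.0, 20.0):
    capacity = shannon_capacity_bps(BANDWIDTH_HZ, db_to_linear(snr_db))
    print(f"SNR {snr_db:4.1f} dB -> C = {capacity / 1e6:.1f} Mbps")
```

At an SNR of 0 dB the signal and noise powers are equal, so the capacity in bits per second equals the bandwidth in hertz.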

It has been shown that, if the information transmission rate is no more than the channel capacity $C$, there exists a coding scheme by which the information bits can be transmitted with arbitrarily small error probability. In the 1950s, the first error correction code, the Hamming code, was proposed, followed by Golay codes, Reed-Muller codes, cyclic codes, BCH codes, RS codes, etc. These codes differ in coding structure and error correction capability, but they are all block codes. That is, at the transmitter side, the information bits are split into small blocks of equal size, and each block is fed into the encoder separately. At the receiver side, the decoder starts only when all bits in one block have been received. To reduce encoding/decoding latency, convolutional codes were proposed by Elias in 1955, in which the encoder/decoder can work continuously while it is receiving bits. Later, block codes and convolutional codes were concatenated to achieve enhanced error correction capability. Since these codes have simple encoding/decoding structures and were proposed during the early period of information theory, they are normally called classic coding schemes.
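The block-code principle described above (split the information bits into blocks, encode each block separately, decode once a whole block has arrived) can be illustrated with the Hamming(7,4) code. This is a minimal sketch using the textbook bit layout with parity bits at positions 1, 2 and 4; it is meant as an illustration of block coding in general, not as a scheme evaluated in this survey.

```python
from typing import List


def hamming74_encode(data: List[int]) -> List[int]:
    """Encode 4 information bits into a 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(word: List[int]) -> List[int]:
    """Correct up to one flipped bit in a received 7-bit word and return the 4 data bits."""
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # 0 means all parity checks passed
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def encode_stream(bits: List[int]) -> List[int]:
    """Split the bit stream into 4-bit blocks and encode each block separately."""
    if len(bits) % 4:
        raise ValueError("stream length must be a multiple of 4")
    out: List[int] = []
    for i in range(0, len(bits), 4):
        out.extend(hamming74_encode(bits[i:i + 4]))
    return out


message = [1, 0, 1, 1]
received = hamming74_encode(message)
received[4] ^= 1  # the channel flips one bit
print(hamming74_decode(received) == message)
```

The decoder cannot start until all seven bits of a block are available, which is the latency drawback of block codes mentioned in the text.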

Although classic coding schemes have been widely applied, their best coding gain is still far from the Shannon limit. Therefore, the scientific community has been driven to find more powerful coding schemes during the past decades. The most relevant achievement of this endeavor was the introduction of Turbo codes in 1993, followed by the rediscovery of LDPC codes in 1996 (first proposed by Gallager in 1963), the invention of fountain codes in 1998, and the latest breakthrough of polar codes in 2009. Taking advantage of random encoding and iterative decoding strategies, these codes achieve near Shannon limit performance, and are classified as modern channel coding schemes in this paper. To obtain excellent coding gain, modern channel coding schemes require long packet lengths. To achieve higher throughput or improve PER performance, various derived coding schemes have been proposed, for example, LDPC codes with a highly parallel coding structure, RS and LDPC concatenated codes, and CRC-aided polar codes.

There also exists another type of error correction code, i.e., anytime codes, which are specially designed for delay-sensitive control applications where information bits are transmitted through noisy channels. The fundamental feature of anytime codes is that the decoding error probability decays exponentially with the decoding delay. After their introduction by Sahai and Mitter [18], a number of anytime coding schemes have been proposed to pursue zero or finite decoding delay [19]-[22]. Although these endeavors have developed the theoretical framework for constructing linear tree codes with anytime reliability, the encoding and decoding are still prohibitively complex for practical scenarios [22], [23]. Fortunately, some variants of modern channel coding schemes, such as Luby Transform codes (which belong to the family of fountain codes) [24], protograph-based LDPC-CC codes [25]–[27] and spatially-coupled LDPC codes [28], have been shown to achieve the anytime property, and can potentially be used in practice owing to their fast encoding/decoding architectures. For this reason, anytime codes are not surveyed in an individual section, while fountain codes and LDPC-CC codes are investigated in Section III-C and Section III-B, respectively.

It should be noted that, since some codes are constructed by merging the advantages of both classic codes and modern channel codes, this paper first reviews modern channel codes in Section III, and then reviews classic codes and derived/concatenated codes in Section IV.

#### **III. MODERN CHANNEL CODING SCHEMES**

Guided by the challenging constraints listed in subsection B of Section II, this section gives an overview of modern channel coding schemes. To evaluate their applicability to WirelessHP, these codes are considered for short packet communications. In this survey, the high reliability requirement is measured by BER/PER. It is worth noting that BER is different from PER. However, since the two metrics are tightly correlated, some studies only provide BER simulation results. Therefore, we use BER to reflect the trend of PER if no PER results are available in the cited studies. For the low latency requirement, current research in channel coding usually uses throughput to measure how fast a decoder can process the received packets. Considering that multi-Gbps throughput is necessary in the scenarios targeted by WirelessHP, the 10 µs level latency constraint is replaced by the multi-Gbps throughput requirement in this review.
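To give a feeling for how BER and PER relate, the sketch below uses the simplifying assumption, which is not made in the text, that bit errors are independent, so that $\mathrm{PER} = 1 - (1 - \mathrm{BER})^{n}$ for an $n$-bit packet. Residual errors after decoding are usually correlated, so this should be read as a rough guide only. The function names are illustrative.

```python
import math


def per_from_ber(ber: float, packet_bits: int) -> float:
    """PER = 1 - (1 - BER)^n, assuming independent bit errors (an assumption).

    log1p/expm1 keep the result accurate when BER is far below machine epsilon.
    """
    return -math.expm1(packet_bits * math.log1p(-ber))


def ber_for_target_per(per: float, packet_bits: int) -> float:
    """Invert the relation above: the BER needed to reach a target PER."""
    return -math.expm1(math.log1p(-per) / packet_bits)


PACKET_BITS = 100  # packet length used as an example in Section II
TARGET_PER = 1e-9  # high reliability target defined in Section II

needed_ber = ber_for_target_per(TARGET_PER, PACKET_BITS)
print(f"BER needed for PER {TARGET_PER:.0e} with {PACKET_BITS}-bit packets: {needed_ber:.2e}")
```

Under this assumption, the required BER is roughly the target PER divided by the packet length, which is why BER curves can indicate the trend of PER for short packets.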

#### A. TURBO CODES

The Turbo code family was first introduced by Berrou *et al.* [31], and has been adopted as the channel coding scheme by several wireless communication standards because of its excellent error correction capability (Table 2). However, for short packet binary Turbo codes, the small minimum Hamming distance results in an obvious error floor in the medium-to-high SNR region. Moreover, when a high code rate is considered, the puncturing patterns not only consume a high number of clock cycles on the encoder side, but also deteriorate the error floor performance, as compared with the unpunctured mother codes.

**FIGURE 3.** Block diagram of Turbo decoder using Log-MAP algorithm (LIFO means last in, first out; BMU is the branch metric unit; $\alpha$ and $\beta$ are the forward and backward state metrics; $\Lambda_{apo}$ and $\Lambda_{ex}$ are the *a posteriori* probability and extrinsic information in the form of LLR, respectively) [29], [30].
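The Log-MAP algorithm named in Fig. 3 computes its forward, backward and output metrics in the log domain, where sums of probabilities become the Jacobian logarithm, often written as max*. The sketch below shows that operation next to the Max-Log-MAP simplification, which drops the correction term. It is a generic illustration of the well-known operator, not the specific decoder architecture of [29], [30].

```python
import math


def max_star(a: float, b: float) -> float:
    """Jacobian logarithm: ln(e^a + e^b) = max(a, b) + ln(1 + e^{-|a - b|})."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a: float, b: float) -> float:
    """Max-Log-MAP approximation: keep only the max term, drop the correction."""
    return max(a, b)


for a, b in [(0.0, 0.0), (1.5, -2.0), (-3.0, 4.0)]:
    exact = max_star(a, b)
    approx = max_log(a, b)
    print(f"a={a:5.1f} b={b:5.1f} max*={exact:.4f} max={approx:.4f} gap={exact - approx:.4f}")
```

The correction term never exceeds $\ln 2$ and shrinks quickly as the two metrics move apart, which is why hardware decoders often replace it with a small lookup table or omit it entirely.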

In the implementation of Turbo code decoders, the Log-MAP algorithm and its derivatives are performed iteratively (Fig. 3). To improve the decoding speed, parallel window decoding structures, CRC-based early stopping criteria, and stochastic computing techniques have been proposed [29], [30], but it has been shown that they still cannot meet the multi-Gbps throughput requirement [17]. However, three recently proposed ideas have shown promising results regarding error floor and throughput.

1) Short $F_{256}$ Turbo codes designed for BP decoding [32]: Derived from a protograph sub-ensemble of the regular LDPC code ensemble over a high order Galois field, the $F_{256}$ Turbo codes are constructed by parallel concatenation of two time-variant accumulators, and have shown a 1 dB gain over binary Turbo codes at a PER of $10^{-4}$ (Fig. 6). Moreover, with the coding structure inherited from LDPC codes, $F_{256}$ Turbo codes can be decoded using a highly parallel BP decoding algorithm.

2) Fully parallel Turbo code decoder for mission-critical machine-type communications [33]: By removing the dependence between neighboring bits, a fully parallel decoding structure is developed. An FPGA implementation has shown decoding throughputs of 1.53 Gbps and 0.44 Gbps for LTE-Advanced Turbo codes with packet lengths of 720 and 40 bits, respectively.

3) Turbo product codes [34], [35]: Owing to their high coding gain and low encoding/decoding complexity, TPCs have been considered for 10 Gbps optical networks. For short packet communications, TPCs can be highly parallelized to reduce decoding latency and are much less affected by the error floor. Compared to binary Turbo codes and LDPC codes, TPCs with the same decoding latency can offer better error correction capability (Fig. 6). Moreover, the inherent error detection characteristic gives TPCs higher efficiency without using a CRC, and can be further explored in the design of high-speed early stopping decoding structures.
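The product-code structure behind item 3) can be sketched with the simplest possible component code. The example below arranges the information bits in a two-dimensional array and appends a single even-parity bit to every row and column. Using single parity checks as component codes and a hard-decision single-error fix is an assumption made for brevity; the TPCs in [34], [35] may use stronger component codes and iterative soft-decision decoding.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def spc_product_encode(info: Matrix) -> Matrix:
    """Append an even-parity bit to each row, then an even-parity row over all columns."""
    rows = [row + [sum(row) % 2] for row in info]
    parity_row = [sum(col) % 2 for col in zip(*rows)]
    return rows + [parity_row]


def failed_checks(word: Matrix) -> Tuple[List[int], List[int]]:
    """Return the indices of rows and columns whose parity check fails."""
    bad_rows = [i for i, row in enumerate(word) if sum(row) % 2]
    bad_cols = [j for j, col in enumerate(zip(*word)) if sum(col) % 2]
    return bad_rows, bad_cols


def correct_single_error(word: Matrix) -> Matrix:
    """Flip the bit at the intersection of the one failing row and one failing column."""
    fixed = [list(row) for row in word]
    bad_rows, bad_cols = failed_checks(fixed)
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        fixed[bad_rows[0]][bad_cols[0]] ^= 1
    return fixed


codeword = spc_product_encode([[1, 0, 1], [0, 1, 1]])
received = [list(row) for row in codeword]
received[1][2] ^= 1  # the channel flips one bit
print(failed_checks(received))
print(correct_single_error(received) == codeword)
```

The row and column checks double as a built-in error detector: an empty list of failed checks can serve as a stopping signal without a separate CRC, which is the property the text refers to.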


#### **B. LDPC CODES**

LDPC codes were first proposed by Gallager in 1963, but their powerful error correction capability was only realized after rediscovery by MacKay and Neal [37]. Similar to Turbo codes, LDPC codes are iteratively decoded, and also suffer from BER/PER degradation when applied to short packet communications. To reduce the error floor, a first strategy is to construct new binary LDPC codes with a special coding structure. Building on this, extension from binary to high order Galois field GF(q) represents another effective countermeasure, at the cost of increased decoding complexity and latency.

1) Specially designed finite-length binary protograph LDPC codes: To get better error floor performance while preserving the advantages of conventional LDPC codes, cyclic lifting is used to construct binary protograph LDPC codes with quasi-cyclic coding structure [38]. Afterwards, variants of this subclass, i.e., rate-compatible root-protograph LDPC codes and distributed protograph LDPC codes, have been proposed for relay communication networks with improved error correction capability [39], [40]. With the features of linear encoding complexity, fast parallel decoding architecture and lower error floor [41], binary protograph LDPC codes can be promising candidates for industrial control scenarios.

2) Finite-length non-binary protograph-based LDPC codes [42]: The U-NBPB codes are constructed by choosing the edge weights before and after the copy-and-permute operation to the original LDPC codes protograph. For a code rate of 1/2 and packet length of 256 bits, U-NBPB can offer 1 dB coding gain over the binary counterpart at PER of  $10^{-4}$  (Fig. 6).

3) Syndrome-based CN algorithm with parallel computation for GF(256) LDPC codes [43]: Built on the syndrome-based CN algorithm, a parallel processing structure of the CN function is designed. Compared to the state-of-the-art benchmark, the proposed decoding structure has reduced the latency by a factor of 5.5, with negligible PER loss.

The most effective decoding algorithms for LDPC codes are the BP decoding algorithms, such as SPA (as shown in Fig. 4) and its modified versions. Compared with Turbo codes, the most prominent advantage of LDPC codes is the inherent nature of parallel decoding. Although the BP decoding algorithms are iteratively performed, the throughput can be up to 16.2 Gbps as shown by an FPGA implementation in 2011 [36], much greater than the 1.5 Gbps Turbo decoding in 2016 [33]. In practice, the throughput will decrease as the packet length becomes shorter. With the efficient decoding structures proposed below, short LDPC codes are attractive for high throughput applications.

1) New dimension of parallelism in high throughput LDPC decoder [44]: By unrolling the iterative decoding loop and reducing the routing congestion, the decoding process of one packet can be finished in one clock cycle, reaching a throughput of 160 Gbps with a clock frequency of 257 MHz.



**FIGURE 4.** Block diagram of the decoder using SPA algorithm for the LDPC codes with a  $M \times N$  check matrix [36].



FIGURE 5. Block diagram of polar decoder using SC listing algorithm (*N* is the packet length, *L* is the listing length) [46].

2) Stochastic number generated approach for LDPC decoding [45]: The received messages from the channel and the reliable messages from the check nodes are collected for stochastic stream generation, which reduces the decoding latency by more than 20%.

3) ASIP based LDPC decoder [11]: Fully parallel layered decoding structure has the advantage of high convergence speed, but is constrained by the pipeline between layers of the row-based structure. The ASIP is adopted to bridge the layered schedule and the row-based decoding, which can achieve 7 Gbps throughput with three iterations.
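The relation between clock frequency and throughput quoted for these decoders can be checked with simple arithmetic. The following is a minimal sketch, not taken from the cited works: the helper names `bits_per_cycle` and `throughput_bps` are hypothetical, and whether a reported throughput counts coded or information bits is not stated here, so the result is only an order-of-magnitude check.

```python
def bits_per_cycle(throughput_bps_value, clock_hz):
    """Average number of bits delivered per clock cycle."""
    return throughput_bps_value / clock_hz


def throughput_bps(packet_bits, clock_hz, cycles_per_packet):
    """Throughput of a decoder that needs `cycles_per_packet` clock
    cycles to finish one packet of `packet_bits` bits."""
    return packet_bits * clock_hz / cycles_per_packet


# Unrolled LDPC decoder figures quoted in the text: 160 Gbps at 257 MHz.
# One packet per clock cycle then implies roughly this many bits per packet.
implied_bits = bits_per_cycle(160e9, 257e6)
print(round(implied_bits))
```

The second helper shows why iteration count and packet length matter: for a fixed clock, throughput scales linearly with the packet length and inversely with the number of cycles spent per packet, which is consistent with the remark that throughput decreases for shorter packets.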

#### **C. FOUNTAIN CODES**

With similar performance as Turbo codes and LDPC codes, fountain codes are a class of rateless coding schemes introduced by Byers *et al.* [47]. For a given set of source symbols, the transmitter potentially generates an infinite supply of randomly encoded symbols, and then transmits these symbols through the channel. At the receiver side, if enough symbols are captured for successful decoding, the receiver returns an acknowledgment message to the transmitter, so as to start transmission of the next symbol. Due to the features of self-adaptivity to channel state and parallel decoding structure, fountain codes are used in the 3G standard for



**FIGURE 6.** Performance comparison of the (N, R) modern channel coding schemes with similar packet length and the same code rate, where N is the packet length, R is the code rate.

multimedia communications, and are a powerful tool in cooperative relayed wireless sensor networks [48], [49]. However, the theoretical derivations of fountain codes are based on the assumption of ergodic behavior, an assumption which is often invalid for short packet communications. Considering the mechanism of fountain codes, the transmitter needs an acknowledgment message to confirm successful delivery of the transmitted symbol, which will result in additional clock cycles. Therefore, application of fountain codes in low latency communications is an open topic [50], [51]. The latest progress is described in the following.

1) Structure modified fountain codes for short packet length [52]: To eliminate the structural phenomena of the Tanner graph that contributes to the error floor, specially encoded bits are generated for short Raptor codes. At a packet length of 208 bits and PER of  $10^{-4}$ , simulation results show a coding gain of 1.5 dB as compared to the systematic Raptor codes. However, the PER performance is still inferior to other coding schemes, as illustrated in Fig. 6.

2) GJE and BP integrated decoding algorithm [53]: For BP decoding of fountain codes, the failure of providing nonzero LLR to the input symbols is the main reason for early termination of the decoding process. By applying the GJE technique, the input symbols are provided with non-zero LLR to continue the decoding, thus mitigating the phenomenon of error floor.

#### **D. POLAR CODES**

Proposed by Arikan in 2009 with a rigorous mathematical proof showing that they can achieve the channel capacity [54], polar codes have been regarded as a great breakthrough in coding theory. The fundamental decoding algorithm for polar codes is the SC algorithm. However, in terms of BER/PER performance, polar codes cannot beat Turbo codes and LDPC codes for short-to-medium packet lengths, even at the price of increased complexity by using the ML decoding. Fortunately, this was greatly improved by using a CRC-aided strategy, which has been demonstrated to achieve a gain of about 0.5-1 dB over Turbo codes and LDPC codes, for the rate-1/2 polar code with a packet length of 1024 bits at PER of  $10^{-4}$  [55], [56]. Specially designed for short packet polar codes (Fig. 6), the SC listing algorithm (as shown in Fig. 5), the OSD algorithm, and the SC stacking algorithm are proposed [46], [57], [58]. These algorithms have greatly reduced the decoding complexity. As a result, the CRC-aided polar codes have been proposed in the 5G standard for short packet communications.
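The CRC-aided strategy can be pictured as a selection step applied to the candidate list produced by a list decoder. The sketch below is illustrative only and makes several assumptions: the list decoder itself is not implemented, candidates are assumed to arrive ordered from most to least likely, the function names are hypothetical, and CRC-32 from Python's `zlib` stands in for the much shorter CRCs that practical polar coding systems typically use.

```python
import zlib

CRC_BYTES = 4  # CRC-32 is an assumption made for this illustration


def attach_crc(payload: bytes) -> bytes:
    """Append a CRC to the payload before (hypothetical) polar encoding."""
    return payload + zlib.crc32(payload).to_bytes(CRC_BYTES, "big")


def crc_ok(frame: bytes) -> bool:
    """Check whether the trailing CRC matches the payload."""
    payload, tail = frame[:-CRC_BYTES], frame[-CRC_BYTES:]
    return zlib.crc32(payload) == int.from_bytes(tail, "big")


def select_candidate(candidates):
    """Return the first (most likely) candidate that passes the CRC.

    `candidates` is the output list of an assumed list decoder.
    Returns None when no candidate passes, i.e. a detected failure.
    """
    for frame in candidates:
        if crc_ok(frame):
            return frame
    return None


# Usage: the most likely candidate has one bit error, the second is correct.
good = attach_crc(b"ctrl")
bad = bytes([good[0] ^ 0x01]) + good[1:]
chosen = select_candidate([bad, good])
```

The design point is that the CRC lets the decoder reject a candidate that has the best path metric but is not the transmitted word, which is where the gain over plain list decoding comes from.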

In hardware implementation, the SC decoding algorithm can be operated in parallel by exploiting combinational logic, and thus shortening the latency [59]. For a polar code with packet length of 1024 bits, this parallel decoding structure can reach a throughput of 2.5 Gbps. To get higher throughput while avoiding BER/PER loss, a better alternative is the highly paralleled BP decoding algorithm. Based on the subfactor graph freezing technique proposed in [60], the average number of iterations is reduced to obtain a throughput of 13.9 Gbps, for a rate-1/2 polar code with packet length of 1024 bits.

#### E. COMPARISON OF DECODING PERFORMANCE AND THROUGHPUT

Under the assumption of additive white Gaussian noise channel and binary phase-shift keying modulation, a comparison of the four families of reviewed channel coding schemes is summarized in Fig. 6 (the same simulation conditions hold for other figures if not specially declared). In terms of BER, binary Turbo codes are superior to LDPC codes, but about 0.5 dB inferior to TPC. When PER is considered, $F_{256}$ Turbo codes and U-NBPB have similar error correction capability. They show clear improvement over binary LDPC codes, of about 1 dB at PER of  $10^{-5}$ . For this reason, $F_{256}$ Turbo codes and U-NBPB have lower error floors in medium-to-high SNR regions. The throughput is mainly decided by the adopted decoding algorithms and decoding structures. In hardware implementation of Turbo decoders, the iteratively and serially performed Log-MAP algorithm needs a large number of clock cycles for the decoding procedures. The parallel window, early stopping, and the fully parallel decoding structure can improve the throughput, but these improvements are still not enough for WirelessHP. Therefore, highly paralleled BP decoding of Turbo codes is a good strategy. Since $F_{256}$ Turbo codes are specially designed for the BP algorithm, the efficient parallel decoding structure makes $F_{256}$ Turbo codes equally promising candidates as U-NBPB codes in WirelessHP. Although CRC-aided polar codes that use the OSD and SC listing algorithms are inferior to $F_{256}$ Turbo codes and U-NBPB in PER performance, the CRC-aided strategy and BP decoding algorithm provide polar codes with low error floor and multi-Gbps throughput characteristics. With these advantages, CRC-aided polar codes remain promising coding schemes. As for systematic Raptor codes and improved non-systematic Raptor codes, the inferior PER performance and required acknowledgment signals make them less suitable for WirelessHP.

#### **IV. CLASSIC CODING SCHEMES**

In general, classic coding schemes cannot approach the Shannon limit. However, due to their simple coding structures, classic codes can achieve very low decoding latency. Therefore, application of classic coding schemes should be revisited in WirelessHP to satisfy the multi-Gbps constraint. Since BCH and RS codes have powerful multi-bit error correction capability, and CC can use the efficient Viterbi decoding algorithm, most research in IWSNs has used them as the coding schemes. These codes are representatives of classic coding schemes, and are mainly reviewed in this paper. Furthermore, some new codes are constructed by the combination of classic codes and modern channel codes, and have been shown to provide improved BER/PER, error floor or throughput performance. Based on the construction method, these coding schemes are classified into two types. The first type is the concatenation of two coding schemes, one as the inner code, and the other as the outer code. The second type is the coding structure derived versions, combining the advantages of one code into the generation of another code. These coding schemes are also investigated in this Section.

#### A. BCH AND RS CODES

Taking advantage of their algebraic structure that facilitates hardware implementation, cyclic codes have been widely applied. BCH codes are the most important subclass of cyclic codes. With low processing latency, flexible options in code rate and packet length, BCH codes are used in flash memory and optical communication systems [66]. In general, the decoding complexity is simpler for a coding scheme defined on a smaller symbol size, but also has less error correction capability for burst errors. RS codes, a subclass of BCH codes defined over high order  $GF(2^m)$  with a symbol of *m*-bits, have excellent burst error correction capability, and have been applied in IWSNs for different scenarios [17]. As can be seen in Fig. 7, under the assumption of similar code rate, RS codes have better BER/PER performance than BCH codes, and the improvement becomes larger with increased SNR. In medium-to-high SNR regions, although the RS(63, 55) code has a higher code rate of about 0.87, it still outperforms the BCH(63, 45) code with a smaller code rate of about 0.71.
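The code rates quoted above follow directly from the (N, K) parameters, and the burst error capability of an RS code follows from its symbol size. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, RS codes are assumed to have primitive length N = 2^m - 1 over GF(2^m), and the standard relation t = floor((N - K)/2) is used for the number of correctable symbols.

```python
def code_rate(n, k):
    """Code rate K/N of an (N, K) block code."""
    return k / n


def rs_parameters(n, k):
    """Basic parameters of a primitive-length RS(N, K) code over GF(2^m).

    Assumes N = 2^m - 1, so every symbol carries m bits.
    """
    m = (n + 1).bit_length() - 1
    if (1 << m) - 1 != n:
        raise ValueError("N must equal 2^m - 1 for this sketch")
    t = (n - k) // 2  # correctable symbol errors
    return {
        "bits_per_symbol": m,
        "rate": code_rate(n, k),
        "t_symbols": t,
        # A burst aligned to symbol boundaries can span t whole symbols.
        "aligned_burst_bits": t * m,
        # A burst starting anywhere is only guaranteed up to (t-1)*m + 1 bits.
        "any_alignment_burst_bits": (t - 1) * m + 1 if t > 0 else 0,
    }


print(round(code_rate(63, 55), 2))   # RS(63, 55)
print(round(code_rate(63, 45), 2))   # BCH(63, 45)
print(rs_parameters(31, 27))
```

The distinction between the two burst figures matters for short packets: an RS(31, 27) code corrects two 5-bit symbols, so a 10-bit burst is correctable only when it falls exactly on two symbols.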

The conventional Chase decoding algorithm for BCH and RS codes includes four steps: syndrome calculation, error location polynomial calculation, Chien search, and error value evaluation. Recent progress on high throughput cyclic codes makes them relevant to WirelessHP.

1) Efficient decoding of short length linear cyclic codes [63]: By exploring the automorphism property of cyclic



**FIGURE 7.** Performance comparison of (N, K) BCH and RS codes [61]–[65] (*N* is the packet length, *K* is the information bits length).

codes, the permutation is combined with BP parallel decoding to achieve faster convergence speed, while the loss of BER/PER performance is negligible.

2) Chase algorithm based low complexity cyclic codes decoder [67]: Instead of generating multiple candidate codewords, only one codeword with confined degree of error location polynomial is generated. Thus, the simplified decision-making mechanism results in a great throughput increase.

3) Complexity-reduced multiplicity assignment algorithm for RS decoder [68]: The bit-level received voltages are used to represent the probabilities for multiplicity assignment decoding, and hence improve the throughput without deteriorating the BER performance.

4) Turbo decoding of short RS and CC concatenated codes [69]: When the hard decision decoding mechanism is used, potential of RS and CC concatenated codes is not fully exploited. By applying the Turbo decoding strategy, soft decision information is exchanged between inner and outer decoders. This brings about 2.0 dB coding gain at BER of  $10^{-5}$  (Fig. 10), as compared with the hard decision Viterbi decoding.
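The four decoding steps named before this list (syndrome calculation, error location polynomial, Chien search, error value evaluation) can be illustrated on the smallest possible case. The sketch below is an assumption-laden toy, not a decoder from the cited works: it uses the (7, 4) single-error-correcting binary BCH code (the cyclic Hamming code) with generator x^3 + x + 1 over GF(8), for which the locator is simply 1 + S1·x and the error value is always 1 because the code is binary. All function names are hypothetical.

```python
N = 7            # code length
PRIM = 0b1011    # x^3 + x + 1: field polynomial and generator polynomial

# Powers of the primitive element alpha in GF(8): EXP[i] = alpha^i.
EXP = []
value = 1
for _ in range(N):
    EXP.append(value)
    value <<= 1
    if value & 0b1000:
        value ^= PRIM
LOG = {v: i for i, v in enumerate(EXP)}


def gf_mul(a, b):
    """Multiply two GF(8) elements using log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % N]


def poly_mod(word, gen=PRIM):
    """Remainder of binary polynomial division of `word` by `gen`."""
    glen = gen.bit_length()
    for shift in range(word.bit_length() - glen, -1, -1):
        if word & (1 << (shift + glen - 1)):
            word ^= gen << shift
    return word


def encode(msg4):
    """Systematic cyclic encoding: 4 message bits, then 3 parity bits."""
    shifted = msg4 << 3
    return shifted | poly_mod(shifted)


def syndrome(word):
    """Step 1: S1 = r(alpha), the received polynomial evaluated at alpha."""
    s1 = 0
    for i in range(N):
        if (word >> i) & 1:
            s1 ^= EXP[i]
    return s1


def chien_search(s1):
    """Steps 2 and 3: locator is 1 + S1*x; test x = alpha^(-i) for each i."""
    for i in range(N):
        if 1 ^ gf_mul(s1, EXP[(-i) % N]) == 0:
            return i
    return None


def decode(word):
    """Correct at most one bit error and return the 4 message bits."""
    s1 = syndrome(word)
    if s1 != 0:
        pos = chien_search(s1)
        word ^= 1 << pos  # step 4: binary code, so the error value is 1
    return word >> 3
```

For codes that correct more than one error, the locator polynomial has higher degree and must be found by an algorithm such as Berlekamp-Massey, and RS codes additionally need a non-trivial error value computation. Those parts are what dominate the latency of the conventional decoder.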

#### **B. CONVOLUTIONAL CODES**

In contrast to block codes, the encoder/decoder of CC can start processing while bits are still being received, and thus has an inherent advantage over block codes in terms of latency. For short packet communications, e.g., less than 100 information bits, CC can approach the lower bounds of block codes, and show better BER/PER performance than binary block codes [70], [71]. The number of states *S*, constraint length *m*, and decoding window length *W* are important parameters



**FIGURE 8.** Performance comparison of rate-1/2  $CC(g_1, g_2)_8(S, m)$ , where  $(g_1, g_2)_8$  is the generator polynomial in octal, S is the number of states, m is the constraint length, W is the length of decoding window.

of CC. As can be seen in Fig. 8, with the increase of S, m or W, better BER/PER performance can be achieved [72], [73]. However, as these parameters increase, the decoding latency will increase correspondingly. Therefore, a tradeoff should be found between BER/PER performance and decoding latency, according to the requirements of the application scenarios.

Latency comparisons between CC and block codes have been investigated in [70] and [74]. Although iterative and ML decoding can offer optimal BER/PER performance, considering the tight latency requirement, Viterbi and stack sequential decoding are still the best choices in practice [75], [76]. For CC with constraint length *m* no more than 10, the Viterbi decoder is preferred, but because its complexity grows exponentially with *m*, the stack sequential decoder performs better for larger constraint lengths (m > 10).
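To make the role of the state count and constraint length concrete, the following is a minimal sketch of a rate-1/2 encoder and hard-decision Viterbi decoder for the generator pair (23, 35) in octal with 16 states. Several choices are assumptions made for illustration: the newest input bit is placed in the most significant register position, the trellis is terminated with zero tail bits, and survivor paths are stored as Python lists rather than in a traceback memory as a hardware decoder would do.

```python
G = (0o23, 0o35)          # generator polynomials in octal
K = 5                     # constraint length
NSTATES = 1 << (K - 1)    # 16 trellis states


def step(state, bit):
    """One encoder step: returns (next_state, (out1, out2))."""
    reg = (bit << (K - 1)) | state  # newest bit in the MSB (assumed ordering)
    out = tuple(bin(reg & g).count("1") & 1 for g in G)
    return reg >> 1, out


def encode(bits):
    """Encode and append K-1 zero tail bits to return to state 0."""
    state, coded = 0, []
    for b in list(bits) + [0] * (K - 1):
        state, out = step(state, b)
        coded.extend(out)
    return coded


def viterbi_decode(rx):
    """Hard-decision Viterbi decoding with Hamming branch metrics."""
    inf = float("inf")
    metric = [0] + [inf] * (NSTATES - 1)  # start in the all-zero state
    paths = [[] for _ in range(NSTATES)]
    for t in range(0, len(rx), 2):
        r0, r1 = rx[t], rx[t + 1]
        new_metric = [inf] * NSTATES
        new_paths = [None] * NSTATES
        for s in range(NSTATES):
            if metric[s] == inf:
                continue
            for b in (0, 1):
                ns, (o0, o1) = step(s, b)
                m = metric[s] + (o0 != r0) + (o1 != r1)
                if m < new_metric[ns]:  # keep the survivor into state ns
                    new_metric[ns] = m
                    new_paths[ns] = paths[s] + [b]
        metric, paths = new_metric, new_paths
    return paths[0][:-(K - 1)]  # drop the tail bits


# Usage: two channel bit errors are corrected.
msg = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1]
rx = encode(msg)
rx[3] ^= 1
rx[17] ^= 1
decoded = viterbi_decode(rx)
```

The inner loops visit every state and both input values for each received pair, so the work per decoded bit is proportional to the number of states. That is the exponential growth in the constraint length mentioned above.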

#### C. DERIVED CODES

As can be seen from Fig. 6, Fig. 7 and Fig. 8, when high reliability is considered, classic coding schemes cannot beat modern channel coding schemes. For example, at a code rate of 0.5 and SNR of 3 dB, the BER performance of TPC can reach  $6 \times 10^{-5}$ , while the BER of RS(31,15) and  $CC(23, 35)_8(16, 5)$  is about  $10^{-3}$  and  $2 \times 10^{-3}$ , respectively. However, classic coding schemes do have special advantages when applied in WirelessHP. On the one hand, classic coding schemes can easily meet multi-Gbps throughput because of their simple decoding structures. On the other hand, in typical factory scenarios, pulse interference is the main reason for function failure [77], where RS codes can be applied. To achieve high reliability and retain low decoding latency, researchers have constructed new coding schemes by merging the advantages of both modern channel codes and classic codes together.



FIGURE 9. BER comparison of BCH/RS concatenated/derived codes (results from [78], [79], [84], [72], [81], and [80]).

1) Concatenated codes: Concatenation is an effective strategy to improve performance. Normally, Turbo codes, LDPC codes and CC are used as the inner codes, while BCH/RS codes are employed as the outer codes. Fig. 9 gives a comparison of three concatenated coding schemes: BCH and LDPC concatenated codes [78], [79], RS and NB Turbo concatenated codes [80], and RS and CC concatenated codes [81]. These coding schemes achieve better BER performance and reduce the error floor significantly, as compared with the un-concatenated counterparts. However, these concatenated coding schemes also introduce longer decoding latency because of the concatenated coding structure. Beyond serial concatenation, researchers have developed more complicated rules to construct concatenated codes. For example, in [82], two parallel generated CC sequences are split and interleaved, and then are linearly combined to construct flexible short packet codes.

2) Coding structure derived codes: To explore the low latency decoding advantage of RS and CC for short packet communications, some coding structure derived codes have been proposed. Representatives of these codes are RS based LDPC codes and LDPC-CC, whose improvements are illustrated in Fig. 9 and Fig. 10. RS based LDPC codes are a special subclass of LDPC codes derived from RS codes [83], [84]. With the quasi-cyclic coding structure inherited from RS codes, the decoder architecture can be simplified to achieve higher throughput. Similar to RS based LDPC codes, LDPC-CC is a kind of LDPC code derived from CC, which was first proposed in [85], and further improved in [86]-[88]. Taking advantage of the convolutional coding structure, LDPC-CC also inherits the continuous encoding/decoding advantage from CC. As a result, both highly paralleled BP and Viterbi algorithms can be applied to reduce the decoding latency [89], [90]. Recently, a variant of



FIGURE 10. Performance comparison of RS concatenated/derived codes (code rate of CC in [69] is 1/2, QAM means quadrature amplitude modulation).

LDPC-CC codes, i.e., spatially-coupled LDPC codes have attracted much attention due to their excellent asymptotic properties [91], [92]. Since this type of codes has lower error floor and decoding complexity, they are promising candidates to be applied in WirelessHP applications.

#### D. PERFORMANCE ANALYSIS OF THE DERIVED CODES

BCH and LDPC concatenated codes can use highly paralleled BP decoding as the inner decoder. This advantage makes this solution superior to RS and NB Turbo concatenated codes, because the throughput of the NB Turbo decoder is limited by the iterative nature of the Log-MAP decoding algorithm, even if fully paralleled decoding is used. For RS and CC concatenated codes, at the cost of reduced code rate, the BER and error floor performance can be improved. For example, in Fig. 9, when RS(31,27) with maximum 10 bits burst error correction capability is concatenated with  $(CC(23, 35)_8, R = 0.5)$ , the coding gain is 2 dB higher at a BER of about  $10^{-5}$ . Besides Viterbi and stack sequential decoding, RS and CC concatenated codes can use the Turbo decoding algorithm. As shown in Fig. 10, for  $RS(15, 13) + (CC(21, 37)_8, 0.5)$  at a BER of  $3 \times 10^{-6}$ , the coding gain of Turbo decoding is about 2 dB superior to Viterbi decoding. However, since Turbo decoding is an iteratively performed algorithm, it is impractical for WirelessHP targeted scenarios.

For RS based LDPC with code rate of 0.84, the BER and error floor performance are slightly inferior to BCH and LDPC concatenated codes. Although BCH and LDPC concatenated codes can achieve a good tradeoff between conflicting metrics (error floor and decoding latency), RS based LDPC codes are superior in terms of latency. The reason is that RS based LDPC codes are derived from RS codes, and thus have retained the cyclic coding structure of RS codes, which is suitable for highly paralleled BP decoding. Synthesis results have shown that the throughput can reach 41 Gbps at a clock frequency of 450 MHz [84]. For an LDPC-CC code with packet length of 422 bits and code rate of 5/6, the FPGA implemented decoder can provide 2-Gbps throughput with a clock frequency of 100 MHz, and has been demonstrated without error floor at BER of  $10^{-13}$  [89]. BER comparison in Fig. 10 also shows that, with the same code rate and packet length, LDPC-CC performs better than quasi-cyclic LDPC codes. Hence, RS based LDPC codes and LDPC-CC are very promising coding schemes in WirelessHP.

#### V. DIRECTIONS OF PROMISING CHANNEL CODING FOR WIRELESSHP

As reviewed in Sections III and IV, the research community has put much effort into developing coding schemes for short packet communications. For industrial wireless control in critical applications, channel coding is an indispensable solution, but still needs substantial progress to meet the stringent requirements in latency and reliability [15], [93]. Maintaining the same focus on high reliability and low latency, the following subsection first states four principles to find feasible coding candidates for WirelessHP. Aiming to propel future research in this area, the remaining subsections then propose directions and challenges towards the construction of short packet coding schemes for WirelessHP.

#### A. PRINCIPLES FOR CODING SCHEMES IN WIRELESSHP

1) The error floor should be reduced to a reasonable level. Binary Turbo and LDPC codes are modern channel coding schemes, but show error floor at medium-to-high SNR regions when used for short packet communications. To address this defect, two strategies have been proved effective. The first is the extension from binary to high order Galois field, such as  $F_{256}$  Turbo codes and U-NBPB that are defined over GF(256). The minimum Hamming distance is enlarged, which leads to lower error floor than their binary counterparts (Fig. 6). The second is the use of concatenated or derived coding schemes, such as RS and CC concatenated codes, RS based LDPC codes, and LDPC-CC. With lower code rate, special coding schemes show better BER/PER and lower error floor than the individual component schemes.

2) The decoding algorithms should be practical in hardware implementation. Optimal BER/PER and error floor performance can be achieved with optimized decoding algorithms, but in the latency-constrained WirelessHP applications, the implementation of the adopted algorithm is a key factor. For example, when Turbo codes are considered, Log-MAP algorithm and its derivatives are usually applied. However, when this algorithm is performed by using fully paralleled decoding structure, it provides insufficient throughput. For CC concatenated coding schemes, ML decoding of CC can get optimal BER/PER performance. But in terms of decoding latency, Viterbi or stack

sequential decoding algorithms are more practical. To summarize, the non-iterative decoding algorithm, i.e., the highly paralleled BP decoding algorithms for LDPC and  $F_{256}$  Turbo codes, the OSD algorithm for polar codes, and the hard decision Viterbi or stack sequential algorithms for CC are preferred.

3) The decoding architectures should provide high throughput. As mentioned in 1) of this Section, by extension of the Galois field, or concatenation of two coding schemes, the BER/FER and error floor performance can be improved. However, there are also some penalties. For example, RS and non-binary Turbo concatenated codes have better BER/FER and error floor performance, but the long latency of Turbo decoders may not be acceptable for real-time control applications. Another example is the BCH and LDPC concatenated codes, a coding scheme that provides similar BER performance to that of the RS based LDPC codes. Considering the concatenated coding structure, it is inferior to RS based LDPC codes in terms of decoding latency. The third example is the F<sub>256</sub> Turbo codes. Although it can be decoded using the Log-MAP algorithm, the limited throughput makes the parallel performed BP decoding preferable.

4) The coding schemes should have powerful burst error correction capability. As mentioned in Section II, WirelessHP is mainly used for industrial applications. In such scenarios, layout and movement of the equipment, coexistence of different wireless devices, and short-term electromagnetic radiation from switching on/off of high power appliances may exert pulse interference on the transmitted packets. To counteract burst errors, interleaving/de-interleaving is regarded as an effective measure. At the transmitter side, the order of the bits in one encoded packet is scattered as randomly as possible. At the receiver side, when the interleaved packet is de-interleaved to restore its previous order, the burst errors that occurred in the wireless channel will be scattered. However, when the interleaving/de-interleaving technique is applied to a short packet, the ratio of burst error length to packet length will increase. As a result, the corrupted bits cannot be scattered far enough from each other, hence there may still be burst errors in the de-interleaved packets. To guarantee high reliability, burst error correction capability is necessary for channel codes to be used in WirelessHP scenarios.
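The limitation of interleaving for short packets described in the fourth principle can be reproduced with a small experiment. The sketch below assumes a simple row/column block interleaver (written row by row, read column by column), which is only one of many possible interleaver designs and is not claimed to be the one used in any cited system. The helper names are hypothetical.

```python
def interleave(bits, rows, cols):
    """Write row by row, read column by column."""
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(bits, rows, cols):
    """Inverse of interleave(): restore the original bit order."""
    assert len(bits) == rows * cols
    out = [None] * (rows * cols)
    k = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = bits[k]
            k += 1
    return out


def errors_after_burst(rows, cols, start, length):
    """Positions of bit errors in the de-interleaved packet after a
    channel burst of `length` bits beginning at channel index `start`."""
    tx = interleave([0] * (rows * cols), rows, cols)
    rx = [b ^ 1 if start <= j < start + length else b
          for j, b in enumerate(tx)]
    return [i for i, b in enumerate(deinterleave(rx, rows, cols)) if b]


def min_gap(positions):
    """Smallest distance between neighbouring error positions."""
    return min(b - a for a, b in zip(positions, positions[1:]))


# Same 4-bit burst, two packet sizes.
longer = errors_after_burst(rows=4, cols=8, start=10, length=4)   # 32 bits
shorter = errors_after_burst(rows=2, cols=4, start=2, length=4)   # 8 bits
print(longer, min_gap(longer))
print(shorter, min_gap(shorter))
```

In the 32-bit packet the four errors end up at least seven positions apart, whereas in the 8-bit packet the same burst leaves adjacent errors after de-interleaving. This is the effect that motivates codes with built-in burst error correction.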

#### B. DIRECTIONS AND CHALLENGES OF THE PROMISING CODING SCHEMES

Based on the above criteria, Table 4 groups the reviewed coding schemes into five families: Turbo codes, LDPC codes, polar codes, CC, and cyclic codes. The following subsections give comments on practical directions and challenges for each of these coding families. With clear guidance for future research, Table 5 summarizes the challenges of the feasible coding schemes when they are applied to WirelessHP, because these codes are originally proposed for general purpose wireless standards. On the contrary, Table 6 presents unique insights on constructing new coding schemes, which

TABLE 4. Directions for short packet coding schemes in WirelessHP.

| Coding family                               | Subclasses                                                                                                                         | Pros                                                                                                                                                                                                      | Cons                                                                                                                                                                                      | Directions                                                                                     |
|---------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------|
| Turbo code family<br>and concatenated codes | Binary Turbo codes<br>TPC<br>F <sub>256</sub> Turbo codes<br>RS+(NB Turbo)                                                         | None<br>Parallel decoding, better error floor<br>Low error floor, BP parallel decoding<br>Low error floor                                                                                                 | High decoding complexity, high error floor<br>Prolonged encoding latency<br>High decoding complexity<br>Increased complexity, prolonged latency                                           | F <sub>256</sub> Turbo codes<br>TPC                                                            |
| LDPC code family<br>and concatenated codes  | Binary LDPC codes<br>Binary protograph LDPC<br>BCH+LDPC codes<br>U-NBPB<br>RS based LDPC codes<br>LDPC-CC (spatially coupled LDPC) | Parallel decoding, high error floor<br>Fast decoding, improved error floor<br>Low error floor<br>Low error floor, parallel decoding<br>Low error floor, parallel decoding<br>Low error floor, low latency | High decoding complexity<br>high decoding complexity<br>Increased complexity, prolonged latency<br>High decoding complexity<br>High decoding complexity<br>simplified decoding complexity | Binary protograph LDPC<br>U-NBPB<br>RS based LDPC codes<br>LDPC-CC<br>(spatially coupled LDPC) |
| Fountain code family                        | Systematic Raptor codes<br>Improved non-systematic<br>Raptor codes                                                                 | Parallel decoding, moderate complexity Parallel decoding                                                                                                                                                  | Poor PER performance<br>Moderate PER performance                                                                                                                                          | No recommendation                                                                              |
| Polar code family                           | Polar codes<br>CRC-aided polar codes                                                                                               | Parallel decoding, moderate complexity<br>Low error floor, parallel decoding                                                                                                                              | High error floor<br>Increased complexity and latency                                                                                                                                      | CRC-aided polar codes                                                                          |
| Cyclic code family                          | BCH codes<br>RS codes                                                                                                              | Low latency, simple decoding<br>Low latency, simple decoding,<br>burst error correction                                                                                                                   | Ordinary PER performance<br>Moderate PER performance                                                                                                                                      | No recommendation                                                                              |
| CC family                                   | CC                                                                                                                                 | Simple decoding, low latency                                                                                                                                                                              | Moderate PER performance                                                                                                                                                                  |                                                                                                |
| and concatenated codes                      | RS+CC (Viterbi decoding)                                                                                                           | Simple decoding, low latency,<br>better PER performance                                                                                                                                                   | Increased complexity and latency                                                                                                                                                          | RS+CC (Viterbi decoding)                                                                       |
|                                             | RS+CC (Turbo decoding)                                                                                                             | Low error floor                                                                                                                                                                                           | High complexity, increased latency                                                                                                                                                        |                                                                                                |

TABLE 5. Challenges of feasible coding schemes for future research.

| Coding schemes                       | Packet<br>length | Future research<br>on high reliability                                             | Future research<br>on throughput            |
|--------------------------------------|------------------|------------------------------------------------------------------------------------|---------------------------------------------|
| F<sub>256</sub> Turbo<br>codes [32]  | 256 bits         | Error floor<br>Burst error correction                                              | ASIC or FPGA implementation                 |
| TPC [34]                             | 256 bits         | Error floor<br>Burst error correction                                              | ASIC or FPGA implementation                 |
| U-NBPB [42]                          | 256 bits         | Error floor<br>Burst error correction                                              | ASIC or FPGA implementation                 |
| RS based LDPC<br>codes [84]          | 2048 bits        | Error floor<br>Burst error correction                                              | 41 Gbps, ASIC, clock<br>frequency 450 MHz   |
| CRC-aided polar<br>codes [60]        | 1024 bits        | Error floor<br>Burst error correction                                              | 13.9 Gbps, ASIC                             |
| LDPC-CC [89]                         | 422 bits         | Burst error correction                                                             | 2 Gbps, FPGA, clock<br>frequency 100 MHz    |
| RS+CC [67], [81]                     | 310 bits         | BER/PER, error floor<br>for different inner<br>and outer code rate<br>combinations | 2.56 Gbps, ASIC                             |

are specially designed to meet the most important requirements of WirelessHP, i.e., high reliability, low latency (high throughput), and short packet length.
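To relate the throughput figures in Table 5 to latency, the following minimal sketch estimates the time a decoder needs to process one packet at a sustained throughput. It assumes that the packet lengths and throughputs listed in Table 5 can be paired directly, and it ignores pipeline fill, iteration scheduling and I/O latency, so the values are rough lower bounds rather than measured decoding latencies. The names `packet_decode_time_ns` and `TABLE5_ENTRIES` are hypothetical.

```python
def packet_decode_time_ns(packet_bits: int, throughput_gbps: float) -> float:
    """Time in nanoseconds to process one packet at a sustained decoder throughput."""
    if packet_bits <= 0 or throughput_gbps <= 0:
        raise ValueError("packet_bits and throughput_gbps must be positive")
    # bits / (Gbit/s) = bits / (bits per ns) = ns
    return packet_bits / throughput_gbps


# (scheme, packet length in bits, reported throughput in Gbps), copied from Table 5.
# Pairing the two columns directly is an assumption of this sketch.
TABLE5_ENTRIES = [
    ("RS based LDPC codes [84]", 2048, 41.0),
    ("CRC-aided polar codes [60]", 1024, 13.9),
    ("LDPC-CC [89]", 422, 2.0),
    ("RS+CC [67], [81]", 310, 2.56),
]

if __name__ == "__main__":
    for name, bits, gbps in TABLE5_ENTRIES:
        print(f"{name:28s} {packet_decode_time_ns(bits, gbps):7.1f} ns per packet")
```

Under these assumptions, every listed entry needs well below one microsecond per packet, which is why throughput is used in this paper as a proxy for low decoding latency.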

1) Turbo code family: In the Turbo code family, $F_{256}$ Turbo codes are a promising solution. Defined over the high order Galois field GF(256), $F_{256}$ Turbo codes have better BER/PER performance than binary Turbo and LDPC codes. Moreover, $F_{256}$ Turbo codes are derived from LDPC codes, so parallel BP decoding can also be used to design multi-Gbps $F_{256}$ Turbo decoders. Another important coding scheme is the TPC. TPC using parallel BP decoding has the same latency constraints as LDPC codes, but outperforms LDPC codes in terms of BER/FER. As can be seen from Fig. 6, although these codes show better BER/FER and error floor performance at the order of $10^{-5}$, it is not clear whether they retain these characteristics at the much more stringent order of $10^{-9}$. Additionally, hardware implementations of the BP decoding algorithm can reach multi-Gbps throughput, but none of the present works are designed for short packet $F_{256}$ Turbo codes and TPC. The third challenge is to investigate the burst error correction capability. As a whole, future research on $F_{256}$ Turbo codes and TPC should focus on these three topics. For conventional Turbo codes, there are two challenges: reducing the error floor and improving throughput. The first can be addressed by using concatenated Turbo coding schemes, for example, RS and binary Turbo concatenated codes. As for the throughput constraint, the Log-MAP algorithm has been shown to provide insufficient throughput [33]. To be applied in WirelessHP, highly parallel BP decoding of Turbo codes and the corresponding hardware realization are the key technologies.
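The difficulty of confirming error floor behavior at $10^{-9}$ can be illustrated with a simple sample-size estimate. The sketch below assumes independent packet errors and a test run in which no packet error is observed. The function names and the example decoder speeds are illustrative assumptions and are not values from the cited works.

```python
import math


def packets_needed(target_per: float, confidence: float = 0.95) -> int:
    """Smallest n such that zero errors in n packets supports PER < target_per.

    Uses (1 - p)^n <= 1 - confidence, assuming independent packet errors.
    """
    if not (0.0 < target_per < 1.0) or not (0.0 < confidence < 1.0):
        raise ValueError("target_per and confidence must lie in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-target_per))


def run_time_s(num_packets: int, packet_bits: int, throughput_bps: float) -> float:
    """Time to push num_packets packets through a decoder at a given throughput."""
    return num_packets * packet_bits / throughput_bps


if __name__ == "__main__":
    n = packets_needed(1e-9)  # about 3e9 packets at 95% confidence
    # Hypothetical decoder speeds: a 1 Mbps software model and a 1 Gbps hardware decoder.
    for label, bps in (("1 Mbps software model", 1e6), ("1 Gbps hardware decoder", 1e9)):
        print(f"{label}: {run_time_s(n, 256, bps):.0f} s for {n} packets of 256 bits")
```

Under these assumptions, roughly $3 \times 10^{9}$ error-free packets are required. For 256-bit packets this corresponds to several days at 1 Mbps but only minutes at 1 Gbps, which is one reason why multi-Gbps hardware decoders matter for verifying the error floor.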

2) LDPC code family: Thanks to the inherent parallel decoding, the throughput of FPGA based LDPC decoders is up to 160 Gbps [44], and can be increased to 400 Gbps by using ASIC [94]. Compared with the Viterbi decoded CC, binary LDPC codes are reported to have better BER/PER performance at high code rates with moderate latency [75], [95]. But for short packet communications, the error floor and the moderate latency make them unsuitable for WirelessHP. Fortunately, this defect is significantly alleviated by using the specially designed binary protograph LDPC codes [41]. As shown in Fig. 9, the BCH(255, 239)+LDPC(576, 0.83) concatenated code shows no error floor at a BER of $10^{-9}$, a performance similar to that of the RS based LDPC code (2048, 0.83). But RS based LDPC codes are superior to RS and LDPC concatenated codes in that they have lower decoding latency. Specially designed for short packet coding, U-NBPB is another promising candidate. At the cost of increased decoding complexity, this subclass of the LDPC code family provides the best PER performance in the medium-to-high SNR region. RS based LDPC codes and U-NBPB have their advantages, but these codes still need further demonstration. For RS based LDPC codes, the low error floor is shown for a long packet length of 2048 bits. In future research, this kind of code should be evaluated for short packet communications, in order to verify whether the low error floor performance is retained. For U-NBPB, the design of high throughput U-NBPB decoders is of interest, because most works on multi-Gbps LDPC decoders are designed for the binary subclasses. Similar to $F_{256}$ Turbo codes and TPC, binary protograph LDPC codes, RS based LDPC, LDPC-CC and U-NBPB also lack concrete studies that prove their burst error correction capability.

TABLE 6. Future research on specially constructed short packet coding schemes for WirelessHP.

| Construction strategy  | Coding schemes | Future research on high reliability | Future research on throughput |
|------------------------|----------------|-------------------------------------|-------------------------------|
| Extended to non-binary | NB CRC-aided polar codes | BER/PER, error floor and burst error correction performance | ASIC or FPGA implementation |
| RS concatenated codes  | RS+(F<sub>256</sub> Turbo codes)<br>RS+TPC<br>RS+(Binary protograph LDPC)<br>RS+(U-NBPB)<br>RS+(CRC-aided polar codes)<br>RS+(NB CRC-aided polar codes)<br>RS+(RS based LDPC codes)<br>RS+(LDPC-CC)<br>RS+(spatially coupled LDPC) | BER/PER, error floor performance and burst error correction performance, on condition of different outer and inner code rates | ASIC or FPGA implementation |

3) Polar code family: After a few years of research, the BER/PER performance of polar codes has been greatly improved by using CRC-aided decoding. Furthermore, high speed decoders using SC or BP algorithms can achieve multi-Gbps throughput. With these improvements, CRC-aided polar codes can be qualified candidates for WirelessHP, provided that future research demonstrates that these codes satisfy the error floor and burst error correction constraints. With the same packet length and code rate, Fig. 6 shows that the CRC-aided polar code is about 1 dB inferior to the $F_{256}$ Turbo and U-NBPB codes at a PER of $10^{-4}$. To reduce this gap, concatenation of RS and CRC-aided polar codes is a good choice. However, few papers have addressed this issue [96], [97], and these works only use conventional polar codes as the inner coding scheme. As a next step, applying concatenated RS and CRC-aided polar codes to short packet communications is a valuable topic.
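The selection step behind CRC-aided list decoding can be sketched as follows. The sketch assumes that a list decoder (not shown) has already produced candidate bit sequences with path metrics, where a smaller metric means a more likely candidate. The CRC-8 polynomial and all function names are illustrative assumptions and are not taken from [55].

```python
from typing import List, Sequence, Tuple

CRC8_POLY = 0x07  # x^8 + x^2 + x + 1, chosen only for illustration


def crc8_bits(bits: Sequence[int]) -> List[int]:
    """Bitwise CRC-8 of a bit sequence (MSB first, zero initial register)."""
    reg = 0
    for b in bits:
        reg ^= (b & 1) << 7
        if reg & 0x80:
            reg = ((reg << 1) ^ CRC8_POLY) & 0xFF
        else:
            reg = (reg << 1) & 0xFF
    return [(reg >> i) & 1 for i in range(7, -1, -1)]


def attach_crc(info_bits: Sequence[int]) -> List[int]:
    """Append the 8 CRC bits to the information bits before encoding."""
    return list(info_bits) + crc8_bits(info_bits)


def crc_ok(word: Sequence[int]) -> bool:
    """Check whether the last 8 bits match the CRC of the preceding bits."""
    return len(word) > 8 and crc8_bits(word[:-8]) == list(word[-8:])


def select_candidate(
    candidates: Sequence[Tuple[float, Sequence[int]]]
) -> Tuple[List[int], bool]:
    """Return the most likely candidate that passes the CRC.

    Falls back to the best-metric candidate (flagged False) if none passes.
    """
    ranked = sorted(candidates, key=lambda c: c[0])
    for _, bits in ranked:
        if crc_ok(bits):
            return list(bits), True
    return list(ranked[0][1]), False
```

In this sketch the CRC lets the decoder reject a candidate with a better path metric when it is inconsistent with the checksum, which is the mechanism by which CRC-aided decoding improves PER at the cost of a small rate loss.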

4) Cyclic code family and convolutional code family: With simple encoding and decoding, BCH, RS and CC codes are preferred in latency-constrained scenarios. Unfortunately, when these codes are employed individually, their moderate BER/PER performance limits their applications. In WirelessHP, these codes are better used as component codes to construct new coding schemes, for example, RS and CC concatenated codes, RS based LDPC codes and LDPC-CC. For concatenated coding schemes with the same overall code rate, different combinations of inner and outer code rates show different performance. Therefore, the parameters of the component codes should be carefully designed. As for the derived coding schemes, present simulation results are obtained under the assumption of an additive white Gaussian noise channel model. Since pulse interference is common in factory scenarios, future works should concentrate on testing their burst error correction capability. Other research topics include the error floor and hardware implementation of multi-Gbps decoders, which have been mentioned in the previous paragraphs.
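The design space created by a fixed overall code rate can be made concrete with a short enumeration. The sketch below assumes a serial concatenation whose overall rate is the product of the outer and inner code rates, an outer RS(255, k) code, and a small set of common inner code rates. The candidate rates, the tolerance and the function name `rate_combinations` are illustrative assumptions.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

DEFAULT_INNER_RATES = (Fraction(1, 2), Fraction(2, 3), Fraction(3, 4), Fraction(5, 6))


def rate_combinations(
    target: Fraction,
    n_outer: int = 255,
    inner_rates: Sequence[Fraction] = DEFAULT_INNER_RATES,
    tol: Fraction = Fraction(1, 100),
) -> List[Tuple[int, int, Fraction, Fraction]]:
    """List (k, t, inner_rate, overall_rate) for RS(n_outer, k) plus an inner code.

    For each inner rate, the k with even n_outer - k that comes closest to the
    target overall rate is kept, provided it lies within the tolerance.
    """
    combos = []
    for r_in in inner_rates:
        best_k = min(
            range(n_outer - 2, 0, -2),
            key=lambda k: abs(Fraction(k, n_outer) * r_in - target),
        )
        overall = Fraction(best_k, n_outer) * r_in
        if abs(overall - target) <= tol:
            t = (n_outer - best_k) // 2  # correctable symbol errors of the outer RS code
            combos.append((best_k, t, r_in, overall))
    return combos


if __name__ == "__main__":
    for k, t, r_in, overall in rate_combinations(Fraction(1, 2)):
        print(f"RS(255,{k}) t={t:2d}  inner rate {r_in}  overall {float(overall):.4f}")
```

For an overall rate near 1/2, the enumeration ranges from a weak outer code with a strong inner code to a strong outer code (for example RS(255, 153) with an inner rate of 5/6), and each of these points would need its own BER/PER evaluation.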

## C. INSPIRATION ON CONSTRUCTING NEW CODES FOR WIRELESSHP

The codes recommended in Table 4 are promising, but they were not originally designed for high performance wireless control scenarios. Motivated by the construction strategies of these codes, and targeting the unique characteristics described in subsections B and C of Section II, it is possible to construct new coding schemes suitable for WirelessHP. The proposed coding schemes, summarized in Table 6, are important candidates to meet the stringent requirements of WirelessHP.

1) Non-binary coding schemes with low decoding latency. Codes defined over high order Galois fields, such as $F_{256}$ Turbo and RS codes, can achieve a larger minimum Hamming distance, which is effective in improving BER/PER and error floor performance. The cost of these codes is increased decoding complexity. An attractive candidate is represented by non-binary polar codes [98], [99]. At present, no work has applied them to short packet communications. To meet the low latency constraint, three decoding algorithms are preferred: non-iterative OSD decoding [46], sphere decoding [58] and highly parallel BP decoding [60]. With fast decoding architectures or simplified decoding complexity, these algorithms have shown the ability to provide multi-Gbps throughput. In future work, researchers are encouraged to test the BER/PER and error floor performance, or to construct coding schemes concatenated with or derived from non-binary polar codes for WirelessHP applications.

2) RS based concatenated coding schemes. For short packet communications in WirelessHP, burst errors are a common phenomenon that should be eliminated as much as possible. With predefined symbol length and error correction parameters, RS codes can flexibly correct burst errors of different lengths. Therefore, RS concatenated coding schemes are of special importance in this area. As the outer code, RS decoders that use non-iterative decoding have a high throughput. For this reason, the challenge is to select qualified inner codes. In general, $F_{256}$ Turbo codes, TPC, U-NBPB, binary protograph LDPC, CRC-aided polar codes, RS based LDPC codes, and LDPC-CC are all suitable candidates. With improved BER/PER, error floor and high throughput decoding structures, these RS concatenated codes are superior to those recommended in Table 4. However, for different code rates, these inner codes may have different BER/PER performance, and their burst error correction capability also needs detailed demonstration. With the same overall code rate, different combinations of inner and outer code rates have different overall performance. To decide which kinds of RS concatenated codes are the preferred options, these coding schemes should be carefully investigated in future research.
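How the symbol length and the error correction parameter of an RS code translate into burst tolerance can be illustrated with a short calculation. The sketch assumes an RS(n, k) code over GF($2^m$) with bounded-distance decoding of up to $t = \lfloor (n-k)/2 \rfloor$ symbol errors, consecutive bits mapped to consecutive symbols, and no interleaving. The function names are hypothetical.

```python
def rs_correctable_symbols(n: int, k: int) -> int:
    """Number of symbol errors t that an RS(n, k) code is guaranteed to correct."""
    if not (0 < k < n):
        raise ValueError("require 0 < k < n")
    return (n - k) // 2


def worst_case_symbols_hit(burst_bits: int, m: int) -> int:
    """Largest number of m-bit symbols that a burst of burst_bits bits can touch.

    The worst case starts on the last bit of a symbol, so the remaining
    burst_bits - 1 bits spill into ceil((burst_bits - 1) / m) further symbols.
    """
    if burst_bits <= 0:
        return 0
    return 1 + -(-(burst_bits - 1) // m)  # integer ceiling division


def max_guaranteed_burst_bits(n: int, k: int, m: int) -> int:
    """Longest bit burst that is corrected for every alignment: (t - 1) * m + 1."""
    t = rs_correctable_symbols(n, k)
    return (t - 1) * m + 1 if t >= 1 else 0


if __name__ == "__main__":
    # RS(255, 239) over GF(256): t = 8 symbols.
    print(max_guaranteed_burst_bits(255, 239, 8))
```

For RS(255, 239) over GF(256), this bound gives a guaranteed burst length of 57 bits in a single codeword. Changing k or m moves this limit, which is the flexibility referred to above.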

#### **VI. CONCLUSIONS**

In the industrial applications targeted by WirelessHP, the inherent deterministic behavior requires high reliability and stringent latency. For short packet communications, this paper has given a comprehensive survey of channel coding schemes in current wireless communication standards and IWSNs. Moreover, we have proposed four principles to identify promising candidates. It has been shown that improved performance can be achieved by coding schemes defined over high order Galois fields, constructed by concatenation of different coding schemes, or generated with special coding structures. For each code family, we have given a detailed comparison and analysis of the candidate short packet coding schemes, in terms of BER/PER, error floor and throughput. Then, challenges and directions have been suggested for future research. Finally, we have presented some possible strategies to construct new codes specially designed for WirelessHP.

Presently, we are studying by experiment the impact of channel coding on control and tracking, which is the main use case of WirelessHP. We have developed a hardware platform for further demonstration, which is composed of two Windows PCs and two Universal Software Radio Peripherals (USRP) model X310. One PC and one USRP are connected through a 10 Gigabit interface to act as a transmitter, and the other PC and USRP are similarly coupled to act as a receiver. With our MATLAB programs running on the transmitter and receiver PCs, the impact of channel coding on WirelessHP is tested and analyzed in an industrial automation environment, under the same transmission power but with different coding schemes (as recommended in Tables 4 and 6 of this paper), code rates and modulation orders. We finally note that this paper has focused on error correction to reduce BER/PER and on fast decoding to reduce latency. Some critical applications may instead need to focus on error detection to minimize the probability of undetected errors. This is also left for further research.

#### ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their valuable comments to improve the presentation of this paper.

#### REFERENCES

- [1] R. Drath and A. Horch, "Industrie 4.0: Hit or hype? [Industry forum]," *IEEE Ind. Electron. Mag.*, vol. 8, no. 2, pp. 56–58, Jun. 2014.
- [2] A. Willig, K. Matheus, and A. Wolisz, "Wireless technology in industrial networks," *Proc. IEEE*, vol. 93, no. 6, pp. 1130–1150, Jun. 2005.
- [3] HART Communication Foundation. (2007). HART Field Communication Protocol Specification, Revision 7.0. [Online]. Available: http://www.hartcomm.org/
- [4] Industrial Networks—Wireless Communication Network and Communication Profiles—WIA-PA, document IEC 62601, International Electrotechnical Commission, 2015.
- [5] Industrial Networks—Wireless Communication Network and Communication Profiles—WIA-FA, document IEC 62948, International Electrotechnical Commission, 2015.
- [6] M. Luvisotto, Z. Pang, and D. Dzung, "Ultra high performance wireless control for critical applications: Challenges and directions," *IEEE Trans. Ind. Informat.*, vol. 13, no. 3, pp. 1448–1459, Jun. 2017.
- [7] H. Gerlach-Erhardt, "Real time requirements in industrial automation," ETSI Wireless Factory Starter Group, Tech. Rep. PNO TC2/WG12, Oct. 2009.
- [8] O. Eriksson, E. Björnemo, A. Ahlén, and M. Gidlund, "On hybrid ARQ adaptive forward error correction in wireless sensor networks," in *Proc. 37th Annu. Conf. IEEE Ind. Electron. Soc.*, Nov. 2011, pp. 3004–3010.
- [9] J.-S. Ahn, J.-H. Yoon, and K.-W. Lee, "Performance and energy consumption analysis of 802.11 with FEC codes over wireless sensor networks," *J. Commun. Netw.*, vol. 9, no. 3, pp. 265–273, Sep. 2007.
- [10] H. D. Tiwari, H. N. Bao, and Y. B. Cho, "Flexible LDPC decoder using stream data processing for 802.11n and 802.16e," *IEEE Trans. Consum. Electron.*, vol. 57, no. 4, pp. 1505–1512, Nov. 2011.
- [11] M. Li, Y. Lee, Y. Huang, and L. Van der Perre, "Area and energy efficient 802.11ad LDPC decoding processor," *Electron. Lett.*, vol. 51, no. 4, pp. 339–341, Feb. 2015.
- [12] N. A. Johansson, Y. P. E. Wang, E. Eriksson, and M. Hessler, "Radio access for ultra-reliable and low-latency 5G communications," in *Proc. IEEE Int. Conf. Commun. Workshop (ICCW)*, Jun. 2015, pp. 1184–1189.
- [13] F. Tramarin, S. Vitturi, M. Luvisotto, and A. Zanella, "On the use of IEEE 802.11n for industrial communications," *IEEE Trans. Ind. Informat.*, vol. 12, no. 5, pp. 1877–1886, Oct. 2016.
- [14] G. Durisi, T. Koch, and P. Popovski, "Toward massive, ultrareliable, and low-latency wireless communication with short packets," *Proc. IEEE*, vol. 104, no. 9, pp. 1711–1726, Sep. 2016.
- [15] M. Luvisotto, Z. Pang, D. Dzung, M. Zhan, and X. Jiang, "Physical layer design of high performance wireless transmission for critical control applications," *IEEE Trans. Ind. Informat.*, vol. 13, no. 6, pp. 2844–2854, Dec. 2017.
- [16] M. Weiner, M. Jorgovanovic, A. Sahai, and B. Nikolić, "Design of a low-latency, high-reliability wireless communication system for control applications," in *Proc. IEEE Int. Conf. Commun. (ICC)*, Jun. 2014, pp. 3829–3835.
- [17] Y. H. Yitbarek, K. Yu, J. Åkerberg, M. Gidlund, and M. Björkman, "Implementation and evaluation of error control schemes in industrial wireless sensor networks," in *Proc. IEEE Int. Conf. Ind. Technol. (ICIT)*, Mar. 2014, pp. 730–735.
- [18] A. Sahai and S. Mitter, "The necessity and sufficiency of anytime capacity for stabilization of a linear system over a noisy communication link— Part I: Scalar systems," *IEEE Trans. Inf. Theory*, vol. 52, no. 8, pp. 3369–3395, Aug. 2006.
- [19] S. Matloub and T. Weissman, "Universal zero-delay joint source-channel coding," *IEEE Trans. Inf. Theory*, vol. 52, no. 12, pp. 5240–5250, Dec. 2006.
- [20] S. C. Draper and A. Sahai, "Universal anytime coding," in *Proc. 5th Int. Symp. Modeling Optim. Mobile, Ad Hoc Wireless Netw. Workshops*, Apr. 2007, pp. 1–5.
- [21] R. T. Sukhavasi and B. Hassibi, "Linear error correcting codes with anytime reliability," in *Proc. IEEE Int. Symp. Inf. Theory (ISIT)*, Jul. 2011, pp. 1748–1752.
- [22] R. T. Sukhavasi and B. Hassibi, "Linear time-invariant anytime codes for control over noisy channels," *IEEE Trans. Autom. Control*, vol. 61, no. 12, pp. 3826–3841, Dec. 2016.
- [23] M. Noor-A-Rahim, K. D. Nguyen, G. Lechner, and Y. L. Guan, "Design and analysis of anytime codes for relay channels," *IEEE Trans. Commun.*, vol. 66, no. 4, pp. 1349–1362, Apr. 2018.

- [24] A. Shirazinia, L. Bao, and M. Skoglund, "Sufficient conditions for stabilization in feedback control over noisy channels using anytime rateless codes," in *Proc. Amer. Control Conf. (ACC)*, Jun. 2012, pp. 1254–1259.
- [25] A. Tarable, A. Nordio, F. Dabbene, and R. Tempo, "Anytime reliable LDPC convolutional codes for networked control over wireless channel," in *Proc. IEEE Int. Symp. Inf. Theory (ISIT)*, Jul. 2013, pp. 2064–2068.
- [26] L. Dössel, L. K. Rasmussen, R. Thobaben, and M. Skoglund, "Anytime reliability of systematic LDPC convolutional codes," in *Proc. IEEE Int. Conf. Commun. (ICC)*, Jun. 2012, pp. 2171–2175.
- [27] N. Zhang, M. Noor-A-Rahim, B. N. Vellambi, and K. D. Nguyen, "Anytime characteristics of protograph-based LDPC convolutional codes," *IEEE Trans. Commun.*, vol. 64, no. 10, pp. 4057–4069, Oct. 2016.
- [28] M. Noor-A-Rahim, K. D. Nguyen, and G. Lechner, "Anytime reliability of spatially coupled codes," *IEEE Trans. Commun.*, vol. 63, no. 4, pp. 1069–1080, Apr. 2015.
- [29] S. Weithoffer and N. Wehn, "Latency reduction for LTE/LTE-A turbo-code decoders by on-the-fly calculation of CRC," in *Proc. IEEE 26th Annu. Int. Symp. Pers., Indoor, Mobile Radio Commun. (PIMRC)*, Sep. 2015, pp. 1409–1414.
- [30] I. Perez-Andrade, S. Zhong, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, "Stochastic computing improves the timing-error tolerance and latency of turbo decoders: Design guidelines and tradeoffs," *IEEE Access*, vol. 4, pp. 1008–1038, 2016.
- [31] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes," in *Proc. IEEE Int. Conf. Commun. (ICC)*, May 1993, pp. 1064–1070.
- [32] G. Liva, E. Paolini, B. Matuz, S. Scalise, and M. Chiani, "Short turbo codes over high order fields," *IEEE Trans. Commun.*, vol. 61, no. 6, pp. 2201–2211, Jun. 2013.
- [33] A. Li, P. Hailes, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, "1.5 Gbit/s FPGA implementation of a fully-parallel turbo decoder designed for mission-critical machine-type communication applications," *IEEE Access*, vol. 4, pp. 5452–5473, 2016.
- [34] H. Mukhtar, A. Al-Dweik, and A. Shami, "Turbo product codes: Applications, challenges, and future directions," *IEEE Commun. Surveys Tuts.*, vol. 18, no. 4, pp. 3052–3069, 4th Quart., 2016.
- [35] Y. Hu, J. P. Fonseka, Y. Bo, E. M. Dowling, and M. Torlak, "Constrained turbo product and block-convolutional codes in wireless applications," *IEEE Trans. Veh. Technol.*, vol. 66, no. 5, pp. 4491–4495, May 2017.
- [36] P. Hailes, L. Xu, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, "A survey of FPGA-based LDPC decoders," *IEEE Commun. Surveys Tuts.*, vol. 18, no. 2, pp. 1098–1122, 2nd Quart., 2015.
- [37] D. J. C. Mackay and R. M. Neal, "Near Shannon limit performance of low density parity check codes," *Electron. Lett.*, vol. 32, no. 18, p. 1645, Aug. 1996.
- [38] R. Asvadi, A. H. Banihashemi, and M. A. Attari, "Design of finite-length irregular protograph codes with low error floors over the binary-input AWGN channel using cyclic liftings," *IEEE Trans. Commun.*, vol. 60, no. 4, pp. 902–907, Apr. 2012.
- [39] Y. Fang, Y. L. Guan, G. Bi, L. Wang, and F. C. M. Lau, "Rate-compatible root-protograph LDPC codes for quasi-static fading relay channels," *IEEE Trans. Veh. Technol.*, vol. 65, no. 4, pp. 2741–2747, Apr. 2016.
- [40] Y. Fang, S. C. Liew, and T. Wang, "Design of distributed protograph LDPC codes for multi-relay coded-cooperative networks," *IEEE Trans. Wireless Commun.*, vol. 16, no. 11, pp. 7235–7251, Nov. 2017.
- [41] Y. Fang, G. Bi, Y. L. Guan, and F. C. M. Lau, "A survey on protograph LDPC codes and their applications," *IEEE Commun. Surveys Tuts.*, vol. 17, no. 4, pp. 1989–2016, 4th Quart., 2015.
- [42] B. Y. Chang, D. Divsalar, and L. Dolecek, "Non-binary protograph-based LDPC codes for short block-lengths," in *Proc. IEEE Inf. Theory Workshop*, Sep. 2012, pp. 282–286.
- [43] V. Rybalkin, P. Schlafer, and N. Wehn, "A new architecture for high speed, low latency NB-LDPC check node processing for GF(256)," in *Proc. IEEE 83rd Veh. Technol. Conf. (VTC Spring)*, May 2016, pp. 1–5.
- [44] P. Schläfer, N. Wehn, M. Alles, and T. Lehnigk-Emden, "A new dimension of parallelism in ultra high throughput LDPC decoding," in *Proc. IEEE Workshop Signal Process. Syst. (SiPS)*, Oct. 2013, pp. 153–158.
- [45] K.-L. Huang, V. C. Gaudet, and M. Salehi, "A low-latency algorithm for stochastic decoding of LDPC codes," in *Proc. 53rd Annu. Allerton Conf. Commun., Control, Comput. (Allerton)*, Oct. 2015, pp. 1510–1515.
- [46] D. Wu, Y. Li, X. Guo, and Y. Sun, "Ordered statistic decoding for short polar codes," *IEEE Commun. Lett.*, vol. 20, no. 6, pp. 1064–1067, Jun. 2016.

- [47] J. W. Byers, M. Luby, M. Mitzenmacher, and A. Rege, "A digital fountain approach to reliable distribution of bulk data," in *Proc. ACM SIGCOMM*, Sep. 1998, pp. 56–67.
- [48] R. Cao and L. Yang, "Decomposed LT codes for cooperative relay communications," *IEEE J. Sel. Areas Commun.*, vol. 30, no. 2, pp. 407–414, Feb. 2012.
- [49] L. Sun, P. Ren, Q. Du, and Y. Wang, "Fountain-coding aided strategy for secure cooperative transmission in industrial wireless sensor networks," *IEEE Trans. Ind. Informat.*, vol. 12, no. 1, pp. 291–300, Feb. 2016.
- [50] K. Pang, Z. Lin, B. F. Uchoa-Filho, and B. Vucetic, "Distributed network coding for wireless sensor networks based on rateless LT codes," *IEEE Wireless Commun. Lett.*, vol. 1, no. 6, pp. 561–564, Dec. 2012.
- [51] J. K. Zao, M. Hornansky, and P.-L. Diao, "Design of optimal short-length LT codes using evolution strategies," in *Proc. IEEE Congr. Evol. Comput.*, Jun. 2012, pp. 1–9.
- [52] H. Chen, R. G. Maunder, Y. Ma, R. Tafazolli, and L. Hanzo, "Hybrid-ARQ-aided short fountain codes designed for block-fading channels," *IEEE Trans. Veh. Technol.*, vol. 64, no. 12, pp. 5701–5712, Dec. 2015.
- [53] A. Kharel and L. Cao, "Improved decoding for raptor codes with short block-lengths over BIAWGN channel," in *Proc. Int. Conf. Comput., Inf. Telecommun. Syst. (CITS)*, Jul. 2016, pp. 1–5.
- [54] E. Arıkan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," *IEEE Trans. Inf. Theory*, vol. 55, no. 7, pp. 3051–3073, Jul. 2009.
- [55] K. Niu and K. Chen, "CRC-aided decoding of polar codes," *IEEE Commun. Lett.*, vol. 16, no. 10, pp. 1668–1671, Oct. 2012.
- [56] K. Niu, K. Chen, J. Lin, and Q. T. Zhang, "Polar codes: Primary concepts and practical decoding algorithms," *IEEE Commun. Mag.*, vol. 52, no. 7, pp. 192–203, Jul. 2014.
- [57] K. Niu, K. Chen, and J.-R. Lin, "Beyond turbo codes: Rate-compatible punctured polar codes," in *Proc. IEEE Int. Conf. Commun. (ICC)*, Jun. 2013, pp. 3423–3427.
- [58] K. Niu, K. Chen, and J. Lin, "Low-complexity sphere decoding of polar codes based on optimum path metric," *IEEE Commun. Lett.*, vol. 18, no. 2, pp. 332–335, Feb. 2014.
- [59] O. Dizdar and E. Arikan, "A high-throughput energy-efficient implementation of successive cancellation decoder for polar codes using combinational logic," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 63, no. 3, pp. 436–447, Mar. 2016.
- [60] S. M. Abbas, Y. Fan, J. Chen, and C.-Y. Tsui, "High-throughput and energy-efficient belief propagation polar code decoder," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 25, no. 3, pp. 1098–1111, Mar. 2017.
- [61] G. Balakrishnan, M. Yang, Y. Jiang, and Y. Kim, "Performance analysis of error control codes for wireless sensor networks," in *Proc. 4th Int. Conf. Inf. Technol.*, Apr. 2007, pp. 876–879.
- [62] S. Tong, D. Lin, A. Kavcic, B. Bai, and L. Ping, "On short forward error-correcting codes for wireless communication systems," in *Proc. 16th Int. Conf. Comput. Commun. Netw.*, Aug. 2007, pp. 391–396.
- [63] M. Ismail, S. Denic, and J. Coon, "Efficient decoding of short length linear cyclic codes," *IEEE Commun. Lett.*, vol. 19, no. 4, pp. 505–508, Apr. 2015.
- [64] S. Scholl and N. Wehn, "Advanced hardware architecture for soft decoding Reed–Solomon codes," in *Proc. 8th Int. Symp. Turbo Codes Iterative Inf. Process. (ISTC)*, Aug. 2014, pp. 22–26.
- [65] Y. M. Lin, H. C. Chang, and C. Y. Lee, "Improved high code-rate soft BCH decoder architectures with one extra error compensation," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 21, no. 11, pp. 2160–2164, Nov. 2013.
- [66] Y. Lee, H. Yoo, I. Yoo, and I.-C. Park, "High-throughput and low-complexity BCH decoding architecture for solid-state drives," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 22, no. 5, pp. 1183–1187, May 2014.
- [67] Y.-M. Lin, C.-H. Hsu, H.-C. Chang, and C.-Y. Lee, "A 2.56 Gb/s soft RS (255, 239) decoder chip for optical communication systems," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 61, no. 7, pp. 2110–2118, Jul. 2014.
- [68] X. Peng, W. Zhang, W. Ji, Z. Liang, and Y. Liu, "Reduced-complexity multiplicity assignment algorithm and architecture for low-complexity chase decoder of Reed–Solomon codes," *IEEE Commun. Lett.*, vol. 19, no. 11, pp. 1865–1868, Nov. 2015.
- [69] L. Chen, "Turbo decoding performance of spectrally efficient RS convolutional concatenated codes," in *Proc. Int. Workshop High Mobility Wireless Commun.*, Nov. 2014, pp. 57–62.
- [70] C. Rachinger, J. B. Huber, and R. R. Müller, "Comparison of convolutional and block codes for low structural delay," *IEEE Trans. Commun.*, vol. 63, no. 12, pp. 4629–4638, Dec. 2015.

- [71] D.-S. Yoo, W. E. Stark, K.-P. Yar, and S.-J. Oh, "Coding and modulation for short packet transmission," *IEEE Trans. Veh. Technol.*, vol. 59, no. 4, pp. 2104–2109, May 2010.
- [72] C. Studer, S. Fateh, C. Benkeser, and Q. Huang, "Implementation tradeoffs of soft-input soft-output MAP decoders for convolutional codes," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 59, no. 11, pp. 2774–2783, Nov. 2012.
- [73] T. Hehn and J. B. Huber, "LDPC codes and convolutional codes with equal structural delay: A comparison," *IEEE Trans. Commun.*, vol. 57, no. 6, pp. 1683–1692, Jun. 2009.
- [74] N. ul Hassan, M. Lentmaier, and G. P. Fettweis, "Comparison of LDPC block and LDPC convolutional codes based on their decoding latency," in *Proc. 7th Int. Symp. Turbo Codes Iterative Inf. Process. (ISTC)*, Aug. 2012, pp. 225–229.
- [75] S. V. Maiya, D. J. Costello, T. E. Fuja, and W. Fong, "Coding with a latency constraint: The benefits of sequential decoding," in *Proc. 48th Annu. Allerton Conf. Commun., Control, Comput. (Allerton)*, Oct. 2010, pp. 201–207.
- [76] M. M. Kermani, V. Singh, and R. Azarderakhsh, "Reliable low-latency Viterbi algorithm architectures benchmarked on ASIC and FPGA," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 64, no. 1, pp. 208–216, Jan. 2017.
- [77] J. T. Tengdin, K. Fodero, and R. Schwartz, "Ensuring error-free performance of communications equipment," *J. Reliable Power-SEL*, vol. 3, no. 2, pp. 4–11, Aug. 2012.
- [78] W. Chen and T. Dong, "Low complexity product codes with LDPC codes achieving ultra low BER," in *Proc. IEEE 14th Int. Conf. Commun. Technol.*, Nov. 2012, pp. 1312–1316.
- [79] P.-H. Chen, J.-J. Weng, C.-H. Wang, and P.-N. Chen, "BCH code selection and iterative decoding for BCH and LDPC concatenated coding system," *IEEE Commun. Lett.*, vol. 17, no. 5, pp. 980–983, May 2013.
- [80] P. Shi, M.-R. Hou, and R. Lv, "Performance of non-binary RS-turbo concatenated codes," in *Proc. 4th IEEE Int. Conf. Circuits Syst. Commun.*, May 2008, pp. 773–776.
- [81] Y. Li and M. Salehi, "An efficient decoding algorithm for concatenated RS-convolutional codes," in *Proc. 43rd Annu. Conf. Inf. Sci. Syst.*, Mar. 2009, pp. 411–413.
- [82] M. Hernaez, P. M. Crespo, and J. D. Ser, "Flexible channel coding approach for short-length codewords," *IEEE Commun. Lett.*, vol. 16, no. 9, pp. 1508–1511, Sep. 2012.
- [83] I. Djurdjevic, J. Xu, K. Abdel-Ghaffar, and S. Lin, "A class of low-density parity-check codes constructed based on Reed–Solomon codes with two information symbols," *IEEE Commun. Lett.*, vol. 7, no. 7, pp. 317–319, Jul. 2003.
- [84] S.-I. Hwang and H. Lee, "Block-circulant RS-LDPC code: Code construction and efficient decoder design," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 21, no. 7, pp. 1337–1341, Jul. 2013.
- [85] A. J. Felstrom and K. S. Zigangirov, "Time-varying periodic convolutional codes with low-density parity-check matrix," *IEEE Trans. Inf. Theory*, vol. 45, no. 6, pp. 2181–2191, Sep. 1999.
- [86] M. P. C. Fossorier, "Quasicyclic low-density parity-check codes from circulant permutation matrices," *IEEE Trans. Inf. Theory*, vol. 50, no. 8, pp. 1788–1793, Aug. 2004.
- [87] A. E. Pusane, R. Smarandache, P. O. Vontobel, and D. J. Costello, "Deriving good LDPC convolutional codes from LDPC block codes," *IEEE Trans. Inf. Theory*, vol. 57, no. 2, pp. 835–857, Feb. 2011.
- [88] L. Mu, Z. Liu, and Y. Fang, "Construction of time-invariant rate-compatible-low-density parity-check convolutional codes," *IET Commun.*, vol. 10, no. 9, pp. 1021–1026, Jun. 2015.
- [89] C.-W. Sham, X. Chen, F. C. M. Lau, Y. Zhao, and W. M. Tam, "A 2.0 Gb/s throughput decoder for QC-LDPC convolutional codes," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 60, no. 7, pp. 1857–1869, Jul. 2013.
- [90] Y. Chen, Q. Zhang, D. Wu, C. Zhou, and X. Zeng, "An efficient multirate LDPC-CC decoder with a layered decoding algorithm for the IEEE 1901 standard," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 61, no. 12, pp. 992–996, Dec. 2014.
- [91] P. M. Olmos and R. L. Urbanke, "A scaling law to predict the finite-length performance of spatially-coupled LDPC codes," *IEEE Trans. Inf. Theory*, vol. 61, no. 6, pp. 3164–3184, Jun. 2015.
- [92] K. Liu, M. El-Khamy, and J. Lee, "Finite-length algebraic spatially-coupled quasi-cyclic LDPC codes," *IEEE J. Sel. Areas Commun.*, vol. 34, no. 2, pp. 329–344, Feb. 2016.

- [93] Z. Pang, M. Luvisotto, and D. Dzung, "Wireless high-performance communications: The challenges and opportunities of a new target," *IEEE Ind. Electron. Mag.*, vol. 11, no. 3, pp. 20–25, Sep. 2017.
- [94] K. Cushon, P. Larsson-Edefors, and P. Andrekson, "Low-power 400-Gbps soft-decision LDPC FEC for optical transport networks," *J. Lightw. Technol.*, vol. 34, no. 18, pp. 4304–4311, Sep. 15, 2016.
- [95] S. V. Maiya, D. J. Costello, and T. E. Fuja, "Low latency coding: Convolutional codes vs. LDPC codes," *IEEE Trans. Commun.*, vol. 60, no. 5, pp. 1215–1225, May 2012.
- [96] P. Trifonov, V. Miloslavskaya, C. Chen, and Y. Wang, "Fast encoding of polar codes with Reed–Solomon kernel," *IEEE Trans. Commun.*, vol. 64, no. 7, pp. 2746–2753, Jul. 2016.
- [97] Y. Wang, W. Zhang, Y. Liu, L. Wang, and Y. Liang, "An improved concatenation scheme of polar codes with Reed–Solomon codes," *IEEE Commun. Lett.*, vol. 21, no. 3, pp. 468–471, Mar. 2017.
- [98] D. Wu, Y. Li, and Y. Sun, "Rate assignment for multi-level polarised non-binary polar codes," *IET Commun.*, vol. 10, no. 10, pp. 1151–1155, Jul. 2016.
- [99] M.-C. Chiu, "Non-binary polar codes with channel symbol permutations," in *Proc. Int. Symp. Inf. Theory Appl.*, Oct. 2014, pp. 433–437.



**MING ZHAN** received the Ph.D. degree from the National Key Laboratory of Science and Technology on Communications, University of Electronic Science and Technology of China, Chengdu, China, in 2013. From 2016 to 2017, he was a Visiting Scholar with the KTH Royal Institute of Technology, Stockholm, and the ABB Corporate Research Center, Västerås, Sweden. He is currently an Associate Professor with the College of Electronic and Information Engineering, Southwest University, Chongqing, China. His research interests include low-complexity and energy-efficient error correction decoders, wireless sensor networks, and high-performance communications in industrial automation.



**ZHIBO PANG** received the B.Eng. degree in electronic engineering from Zhejiang University, Hangzhou, China, in 2002, the M.B.A. degree in innovation and growth from the University of Turku, Turku, Finland, in 2012, and the Ph.D. degree in electronic and computer systems from the KTH Royal Institute of Technology, Stockholm, Sweden, in 2013. He is currently a Principal Scientist and the Project Manager of industrial Internet of Things with ABB Corporate Research, Västerås, Sweden, leading research projects on digitalization solutions for smart homes and buildings, factory and manufacturing, and power systems. He also serves as an Adjunct Professor or in similar roles with universities, such as the KTH Royal Institute of Technology, Tsinghua University, China, and the Beijing University of Posts and Telecommunications, China. His current research interests include Industry 4.0, real-time cyber physical systems, Internet of Things, wireless control network, industrial communication, real-time embedded system, high-accuracy localization and navigation, enterprise systems, automation and robotics, multicore system-on-chip, network-on-chip, and business technology joint research, such as strategy, business model, value chain, and entrepreneurship and intrapreneurship. He serves as the Chair of Sub-Technical Committee in the Technical Committee on Industrial Informatics and the Industrial Electronics Society of the IEEE. He is an Associate Editor of the IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, the IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, and the IEEE REVIEWS IN BIOMEDICAL ENGINEERING, and a Guest Editor of the IEEE ACCESS. He serves on the Editorial Board of the Journal of Management Analytics (Taylor & Francis), the Journal of Industrial Information Integration (Elsevier), and the International Journal of Modeling, Simulation, and Scientific Computing (World Scientific).



**DACFEY DZUNG** received the M.Sc. and Ph.D. degrees in electrical engineering from the Swiss Federal Institute of Technology in 1975 and 1981, respectively. He has been with Brown, Boveri & Cie, Alcatel Mobile, Ascom, and Bosch Telecom Inc. Since 1997, he has been with the ABB Corporate Research Center, Baden, Switzerland, where he is currently an ABB Corporate Research Fellow with Industrial and Utility Communication. He was involved in a variety of communication systems, including satellite and cellular mobile radio, industrial wireless sensors, and powerline communications. His main technical contributions are in the design of communication protocols and of modem signal processing algorithms. He has also studied cyber security issues in industrial and utility communication systems. His current technical interests are in communication networks for factory automation, process automation, and the smart grid, with a focus on networks using heterogeneous technologies, such as wireless and powerline communications. He was a member of the working groups of the European Telecommunication Standards Institute (ETSI) specifying the digital cellular standard GSM, digital microwave links (RES), and the trunked mobile radio system TETRA. He is a member of the IEEE P1901.2 Working Group on Narrowband Powerline Communication, the Swiss National Delegate to the IEC Working Group SC65c/WG16 on industrial wireless networks, and ETSI and CENELEC Working Groups on standardization and regulation of industrial wireless applications. He also serves on the program committees of the conference series on Emerging Technologies and Factory Automation and Workshops on Factory Communication Systems. He served as an Associate Editor for the IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS.



**MING XIAO** received the bachelor's and master's degrees in engineering from the University of Electronic Science and Technology of China, Chengdu, China, in 1997 and 2002, respectively, and the Ph.D. degree from the Chalmers University of Technology, Gothenburg, Sweden, in 2007. From 1997 to 1999, he was a Network and Software Engineer with China Telecom. From 2000 to 2002, he also held a position with Sichuan Communications Administration. Since 2007, he has been with the School of Electrical Engineering, KTH Royal Institute of Technology, Sweden, where he is currently an Associate Professor of communications theory. He received best paper awards from the International Conference on Wireless Communications and Signal Processing in 2010 and the IEEE International Conference on Computer Communication Networks in 2011. He received the Hans Werthen Grant from the Royal Swedish Academy of Engineering Science in 2006, the Chinese Government Award for Outstanding Self-Financed Students Studying Abroad in 2007, and Ericsson Research Funding from Ericsson in 2010. Since 2012, he has been an Associate Editor of the IEEE TRANSACTIONS ON COMMUNICATIONS, the IEEE COMMUNICATIONS LETTERS (a Senior Editor since 2015), and the IEEE WIRELESS COMMUNICATIONS LETTERS.

...