Multi-Set Space-Time Shift Keying Assisted Adaptive Inter-Layer FEC for Wireless Video Streaming

Abstract:

The scalable video coding extension of H.265/high efficiency video coding is capable of supporting diverse video resolutions. Due to the fact that the enhancement layer (EL) is encoded and decoded with the aid of the base layer (BL), the decodability of ELs is conditioned on that of the BL. Hence, the EL frames occasionally have to be discarded as a result of corrupted BL frames. Therefore, potent unequal error protection schemes have been conceived for providing a stronger protection to the BL by invoking a lower coding rate based forward error correction (FEC) or a higher transmit power for a specific multiple-input multiple-output sub-channel. This stronger protection is essential for the BL for the sake of both avoiding the waste of resources caused by the dropped undecodable EL bits and for improving the reconstructed video quality. In this treatise, we propose an adaptive system for transmitting the layered video bit stream over the wireless channel. The proposed system is capable of selecting an appropriate interlayer (IL) operation-aided FEC (IL-FEC) scheme in order to maintain the robustness of the system by comparing the near-instantaneous signal-to-noise ratio (SNR) to pre-recorded reconfiguration thresholds. Our simulation results show that the proposed adaptive system is capable of providing better video quality over a large proportion of the channel SNR range than its corresponding fixed-mode counterparts.
Published in: IEEE Access ( Volume: 7)
Page(s): 3592 - 3609
Date of Publication: 20 December 2018
Electronic ISSN: 2169-3536

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.

Nomenclature

Abbreviation   Expansion
AC             Antenna Combination
ACS            Adaptive Channel Selection
AQAM           Adaptive Quadrature Amplitude Modulation
AS             Antenna Set
ASU            Antenna Selection Unit
AVC            Advanced Video Coding
BbB            Burst-by-Burst
BER            Bit Error Ratio
BL             Base Layer
BLAST          Bell-Labs Layered Space-Time
BPCU           Bit Per Channel Use
CC             Convolutional Code
CIF            Common Intermediate Format
CND            Check Node Decoder
CRA            Clean Random Access
CRC            Cyclic Redundancy Check
EEP            Equal Error Protection
EL             Enhancement Layer
e-PLR          Equivalent PLR
EWF            Expanding Window Fountain
FEC            Forward Error Correction
FPS            Frame Per Second
GOP            Group Of Pictures
HEVC           High Efficiency Video Coding
IDR            Instantaneous Decoding Refresh
IL             Interlayer
IL-FEC         IL operation-aided FEC
JCT-VC         Joint Collaborative Team on Video Coding
LDPC           Low Density Parity Check
LLR            Log-Likelihood Ratio
Log-MAP        Logarithmic Maximum A Posteriori
LSSTC          Layered Steered Space-Time Coding
LT             Luby Transform
MIMO           Multiple-Input Multiple-Output
MPEG           Moving Picture Experts Group
MS-STSK        Multi-Set STSK
OFDM           Orthogonal Frequency Division Multiplexing
PLR            Packet Loss Ratio
PSNR           Peak SNR
QCIF           Quarter CIF
RA             Receive Antenna
RF             Radio Frequency
RSC            Recursive Systematic Convolutional
SBC            Short Block Code
SHVC           Scalability extension of HEVC
SM             Spatial Modulation
SNR            Signal-to-Noise Ratio
SSK            Space-Shift Keying
STBC           Space-Time Block Code
STSK           Space-Time Shift Keying
SVC            Scalable Video Coding
TA             Transmit Antenna
TC             Turbo Code
UEP            Unequal Error Protection
V-BLAST        Vertical BLAST
VND            Variable Node Decoder

SECTION I.

Introduction

Driven by the emergence of immersive mobile telecommunication devices, the associated data rate demands have been soaring [1]. Nevertheless, flawless lip-synchronized video transmission in error-prone wireless environments remains a challenge [2]. Explicitly, the unpredictable wireless channel results in high packet loss ratios in wireless video transmission. To satisfy the challenging bit rate requirement of video transmissions, a number of video compression standards have been ratified, such as the H.261 [3], H.262 [4], H.263 [5], Moving Picture Experts Group (MPEG)-4 [6], and H.264/Advanced Video Coding (AVC) [7] standards, as well as the newest H.265/High Efficiency Video Coding (HEVC) [8] standard, which was developed by the Joint Collaborative Team on Video Coding (JCT-VC). The H.265 standard is gradually replacing the H.264 standard as the mainstream coding standard.

A. Scalable Video Coding

However, the afore-mentioned traditional video compression techniques fail to operate robustly under unreliable network conditions in the era of the Internet and of existing radio networks. More explicitly, Internet-based communications are devised to provide best-effort delivery of data, which cannot guarantee the reliability of the link, hence leading to occasional packet loss due to jitter or network congestion [9]. These unpredictable network fluctuations may severely degrade the experience of the clients by requiring more buffering time or by directly corrupting the reconstructed image quality. Scalable Video Coding (SVC) has fascinated researchers for more than 20 years as a potential solution for enhancing the system’s spectral efficiency, ever since the H.262 standard was conceived [10]. However, the construction of scalable profiles remained an open challenge owing to their limited coding efficiency and considerable decoder complexity, until the H.264 scheme was developed, which significantly improved the video compression capability attained [7]. Furthermore, scalable video techniques are also employed by the HEVC standard, where they are known as the Scalability extension of HEVC (SHVC) [11]. The scalability in SVC can be spatial, temporal and quality based, where the sub-layers of the bit stream in the spatial and temporal domains exhibit different spatial resolutions and frame rates, respectively. On the other hand, quality scalability usually relies on the same spatial and temporal configurations, while differing in terms of the associated image quality. The above-mentioned interlayer correlations become favourable when the same source content is required by different clients at different resolutions or frame rates, for example.

A layered video scheme is shown in Figure 1, where the captured video sequences are encoded by a scalable video encoder into four layers using one or more scalability functionalities. The four layers seen in the example of Figure 1 include the Base Layer (BL) and three Enhancement Layers (EL). In Figure 1, each upper layer is encoded progressively with reference to the lower layers. For example, $L_{3}$ shown at the input of the scalable video decoder of Figure 1 is encoded with reference to all three lower layers. Figure 1 reveals the basic structure of SVC, which is capable of supporting quite flexible structures, such as $L_{3}$ being encoded with reference to $L_{0}$ and $L_{1}$ only, instead of the progressive reliance shown in Figure 1. It is also worth noting that the BL of the encoded bit stream can be extracted and decoded independently by a single-layer video decoder [10]. In this paper, quality scalability is the only factor considered that affects the video quality of the various layers, but the novel solutions proposed in this treatise can be further extended to any of the above-mentioned scalability functionalities.

FIGURE 1. Architecture of a layered video scheme.
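The dependency rule described above can be illustrated with a minimal Python sketch. It assumes that a layer is decodable only if it was received intact and all of its reference layers are decodable; the function name `decodable_layers` and the two dependency maps are hypothetical and merely mirror the progressive and the flexible structures mentioned in the text.

```python
def decodable_layers(received_ok, depends_on):
    """Return the set of layers that can actually be decoded.

    received_ok: dict mapping layer index -> True if the layer arrived uncorrupted
    depends_on:  dict mapping layer index -> list of reference layer indices
    Assumption: every reference layer has a lower index than the layer using it.
    """
    decodable = set()
    # Visit layers in ascending order so that references are resolved first.
    for layer in sorted(received_ok):
        refs_ok = all(ref in decodable for ref in depends_on[layer])
        if received_ok[layer] and refs_ok:
            decodable.add(layer)
    return decodable


# Progressive reliance of Figure 1: each layer references all lower layers.
progressive = {0: [], 1: [0], 2: [0, 1], 3: [0, 1, 2]}
# Flexible alternative mentioned in the text: L3 references L0 and L1 only.
flexible = {0: [], 1: [0], 2: [0, 1], 3: [0, 1]}

received = {0: True, 1: True, 2: False, 3: True}
print(decodable_layers(received, progressive))  # L3 is dropped together with L2
print(decodable_layers(received, flexible))     # L3 survives the loss of L2
```

If the BL itself is corrupted, the returned set is empty in both structures, which is why the BL warrants the strongest protection.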

B. Unequal Error Protection for Video Streaming

The afore-mentioned multilayer correlation may benefit from Unequal Error Protection (UEP), which offers a stronger protection for the BL than for the ELs [10]. UEP can be realized by carefully allocating resources, such as the coding rate of Forward Error Correction (FEC), the transmit power, the number of bits/symbol of the modulation scheme etc. [12]. Furthermore, the first EL can be allocated a higher-quality coding technique than the second EL, assuming that the latter is coded depending on the first EL. Since the concept of UEP was proposed by Masnick and Wolf [13], numerous FEC based UEP schemes have been investigated, relying for example on Convolutional Codes (CC) [14] and Low Density Parity Check (LDPC) codes [15], just to name a few. Furthermore, UEP Turbo Codes (TC) [16], concatenated Short Block Codes (SBC) and Recursive Systematic Convolutional (RSC) Codes [17] as well as Fountain Codes [18] were also studied.

Conventionally, the process of communications is accomplished by the cooperation of all OSI layers [19], where the data sources, such as video and audio, are processed commencing from the Application Layer, progressing to the lower layers. Hence, the UEP scheme employed for transmitting the video stream can be implemented using a single layer or multiple layers, relying on cross-layer operation, which may require explicit signaling from the upper OSI layers passing down information, such as the number of layers used for layered video streaming, in order to accomplish UEP.

Numerous UEP designs have been conceived for video transmissions in the higher layer with the aid of the FEC codes, such as Raptor codes [20]. Vukobratovic et al. [21] proposed a novel scalable multicast system using the Expanding Window Fountain (EWF) codes for transmitting scalable video streaming over hostile channels exhibiting packet loss events, where the ELs conveyed parity information embedded into them for protecting the more important BL. Ahmad et al. [22] advocated a Luby Transform (LT) coded UEP scheme for scalable video communications, which exhibited a lower Bit Error Ratio (BER) and a low overhead at the expense of an increased coding complexity. Hellge et al. [23] designed a Raptor coded cross-layer operation aided scheme for layered video transmission, which implants the bits of the BL into the ELs in order to recover the lost bits of the BL from the ELs. The enhanced Pro-MPEG COP3 codes were developed by Diaz et al. [24] for double-layer video transmission, where the BL packets are interspersed with the repair packets of the EL for improving the recovery capability of the protection scheme.

Additionally, a number of potent contributions employing UEP for improving the video quality have been conceived for the physical layer, including the channel coding and modulation schemes. Ha and Yim [25] proposed a metric for quantifying the layered video distortion in order to adaptively assign UEP and hence to minimize the error propagation effects imposed by the packet loss events. Moreover, by using different punctured TCs, Marx and Farah minimized the mean video distortion in order to enhance the quality of video transmission over wireless networks [26], where the redundancy imposed on the compressed stream is non-uniformly distributed between the consecutive video frames. The specific importance of the macroblocks and of the video frame type was taken into account in [27] to allocate different-rate CCs to the video stream. Nasruminallah and Hanzo [17] used RSC codes to achieve UEP for supporting data-partitioned aided AVC video streaming. Huo et al. [28] proposed an RSC coded interlayer (IL) operation-aided FEC (IL-FEC) technique that implants the bits of the BL into the ELs by taking their modulo-2 addition, where iterative decoding is invoked for exchanging extrinsic information between two layers when the BL is not successfully decoded. In contrast to channel coding providing UEP, an adaptive hierarchical Quadrature Amplitude Modulation (QAM) based mapping algorithm was proposed in [29] to provide UEP for video streaming by Chang et al. [27], where UEP was achieved by varying the Euclidean distance of 16QAM constellation points according to the encoded video frame type. Xiao et al. [30] designed a cooperative Multiple-Input Multiple-Output (MIMO) scheme relying on sophisticated power control aided SVC transmissions, where variable-rate LDPC codes and Space-Time Block Codes (STBCs) were employed for providing UEP. Li et al. proposed an Orthogonal Frequency Division Multiplexing (OFDM) assisted hierarchical QAM based UEP arrangement for transmitting appropriately partitioned AVC streams, which avoided mapping important information onto those OFDM sub-carriers that experienced deep fading [31].

Furthermore, numerous cross-layer aided UEP schemes were also proposed for video streaming, which optimized the video quality with the aid of exchanging signaling across multiple OSI reference layers, including the source compression, channel coding and retransmission technique etc. For example, the data link layer provides both error control and flow control, while the network layer mainly deals with the routing issues. Hence, the collaboration of different layers may be expected to attain higher video quality improvements than a single layer does. Van Der Schaar et al. [32] designed an application/transport/MAC/physical layer based UEP scheme for transmitting the video streams, which dynamically adapts the parameters of the application-layer FEC scheme, the maximum MAC retransmission limit and the size of the packets in order to strike a tradeoff between the throughput, reliability and delay. Huusko et al. [33] proposed a cross-layer operation aided method for transmitting the control information and for optimizing the overall multimedia quality over both wireless and wired IP networks. An LDPC code assisted joint source and channel coding scheme was conceived in [34] for layered video streaming, where the source rate and the channel coding rate were optimally allocated according to the available bandwidth and to the average Packet Loss Ratio (PLR). Furthermore, an application/MAC/physical cross-layer structure was proposed by Khalek et al. [35], which optimized the perceptual quality of layered video streaming. Tseng and Chen [36] devised an optimized application/physical cross-layer allocation scheme for multi-user uplink transmission by designing an objective function to maximize the average Peak Signal-to-Noise Ratio (SNR) (PSNR).

One of the major contributions to FEC based SVC streaming techniques is the IL-FEC technique of [28]. The IL-FEC technique significantly improves the performance of a single IL-FEC protected layer, which however restricts the flexibility of its employment, since the BL always enjoys priority when invoking unequal protection. In other words, this technique cannot be readily applied to the ELs. One of our contributions is that we improve the BL performance and maximize the overall benefits of IL-FEC by exploiting the systematic bits of the reference layer.

Apart from FEC schemes providing UEP, transceiver-based UEP has also been explored in the literature [30], [37]–[39]. Song and Chen [37] proposed an Adaptive Channel Selection (ACS) based MIMO system that transmits SVC signals, in which each bit stream is periodically switched between multiple antennas and the higher-priority video layer’s bit stream is mapped to higher-SNR channels. Additionally, a joint UEP scheduling scheme that considers both the FEC redundancy and the diversity gain of MIMO systems was proposed in [39]. Transceiver-based UEP techniques are usually realized by controlling the modulation mode or by selecting the best sub-channel for conveying the high-priority video bits in order to improve the attainable overall performance.

C. Multi-Set Space-Time Shift Keying

The popularity of transceiver-based UEP can also be attributed to the evolution of advanced wireless communications services, which fuelled the development of MIMO techniques for creating reliable high-rate links [40]–[42]. More explicitly, MIMO techniques are capable of enhancing the multiplexing gain by invoking the Bell Laboratories Layered Space-Time (BLAST) architecture [43], or the diversity gain by STBCs [44]. Alternatively, a combination of both gains can be attained by Layered Steered Space-Time Coding (LSSTC) [45], which combines the benefits of Vertical BLAST (V-BLAST) and STBC techniques. Spatial Modulation (SM) advocated in [46] is capable of providing a high normalized throughput at a low complexity. Since only a single antenna is activated, which is selected from multiple antennas, only a single Radio Frequency (RF) chain is required. A concept referred to as STSK was proposed in [47], where instead of activating the indexed antennas, one out of $Q$ space-time dispersion matrices is activated during each STSK symbol for attaining both diversity and multiplexing gains. This dispersion matrix based scheme offers a high design flexibility, since we can optimize both the dispersion matrix employed as well as the number of transmit and receive antennas, hence striking a beneficial design trade-off between the attainable multiplexing and diversity gains. Recently, the novel concept of MS-STSK was proposed [48]–[51], which is depicted in Figure 2. Explicitly, in MS-STSK, the information is conveyed over two components, namely over the afore-mentioned STSK component as well as over the Antenna Selection Unit (ASU) of Figure 2, which selects a single antenna combination for conveying extra bits, hence leading to enhanced multiplexing gains, while simultaneously attaining the STSK scheme’s diversity gains.
More specifically, in the MS-STSK system of Figure 2, the information bits are partitioned into two streams, one for the STSK encoder and the other for the ASU. The STSK encoded symbols are then transmitted over a specific combination of the $N_{t}$ Transmit Antennas (TA) determined by the ASU using $N_{RF}$ RF chains. The MS-STSK arrangement constitutes a scalable generalized scheme that contains STSK, SM and Space-Shift Keying (SSK) as special cases.

FIGURE 2. Structure of MS-STSK [48].

In this treatise, we exploit the fact that the three components of MS-STSK exhibit different BER performances in order to provide UEP for video streaming.

D. Adaptive System for Video Streaming

It has been widely acknowledged that the radio channel imposes a severe challenge on reliable high-speed communication owing to its susceptibility to noise, interference, as well as dispersive fading environments [52]. For a fixed configuration wireless system, although the transmitters are expected to provide a sufficiently high signal level for the far-end receivers, due to fading the instantaneous received signal power fluctuates and routinely falls below the sensitivity of receivers. Hence the information cannot be decoded correctly [53].

In real-time video transmission, this discontinuity is even more obvious. Fortunately, numerous near-instantaneously adaptive techniques have been conceived for improving the robustness of the wireless system [54] by providing users with the best possible compromise amongst a number of contradicting design factors, such as the power consumption of the mobile station, robustness against transmission errors and so forth [55]–​[57]. In order to allow the transceiver to cope with the time-variant channel quality of narrowband fading channels, the concept of Adaptive QAM (AQAM) modem was proposed by Steele and Webb, which provides the flexibility to vary both the BER and the bit rate to suit a particular application [58]. AQAM-aided wireless video transmission was conceived for example in [55] and [58], where the Burst-by-Burst (BbB) AQAM assisted system provides a smoother PSNR degradation when channel SNR is degraded. As a benefit, the subjective image quality erosion imposed by the video artefacts is eliminated and hence the dramatic PSNR degradation of the conventional fixed modulation modes is avoided by the adaptive system.

Furthermore, numerous adaptive systems have been devised for SVC streaming with the aid of UEP, which aim to allocate resources judiciously in order to improve the reconstructed video quality. Song and Chen [59] proposed a sophisticated power allocation scheme for attaining the maximum throughput. Li et al. [38] designed a scalable resource allocation framework for SVC streaming over MIMO OFDM wireless networks in a multi-user scenario, where the time-frequency resources, the modulation order and the power were adaptively allocated to the users in order to grant them at least a basic viewing experience. Additionally, a Historical Information Aware UEP scheme was conceived in [60] for SHVC video streaming, where the objective function of the current video frame was optimized based on the historical information of its dependent frames in order to adaptively adjust the coding rates of the RSC codes and hence to improve the reconstructed video quality.

As a further advance, Xu et al. [61] conceived an adaptive application/MAC layer based cross-layer protection strategy to convey the layered video stream by dynamically selecting the optimal combination of application-layer FEC and MAC retransmissions according to the near-instantaneous channel states, which resulted in a more graceful video quality degradation across a wide range of channel conditions. Zhang et al. [62] designed an application/MAC/physical layer based adaptive cross-layer video streaming scheme, where the resources were judiciously shared between the source and the channel coders with the aid of the minimum-distortion or minimum-power criterion according to the prevalent channel states. Xu et al. [63] designed link/physical cross-layer based adaptive rate allocation schemes for layered video transmission over wireless Rayleigh channels, which substantially improved the video throughput. A multimedia home gateway was put forward for three-screen television in [64], which dynamically controlled both the Raptor FEC overhead and the layer-switching aided SVC streaming.

E. Our Contributions

Against this background, we conceive a powerful physical layer based UEP scheme for transmitting SHVC streams over narrowband Rayleigh fading channels, where a novel MS-STSK transceiver is employed for the sake of attaining both multiplexing and diversity gains. Furthermore, the adaptive IL-FEC is designed for alleviating the image quality degradations imposed by the fluctuating wireless channel.

Inspired both by the fact that the EL becomes useless without successfully decoding all the layers it depends on and by the fact that the MS-STSK components provide different BER performances, we specifically design the MS-STSK transceiver of Figure 2 to conceive a UEP scheme for mitigating the BL corruption probability. Furthermore, an improved cross-layer design is conceived for adaptively adjusting the level of protection according to the near-instantaneous channel SNR for ensuring that every bit is likely to be error free. The novelty of this treatise can be summarized as follows:

  1. We propose an adaptive IL-FEC aided MS-STSK system, which exploits for the first time the UEP MS-STSK sub-channels.

  2. Furthermore, we provide design guidelines for beneficially configuring MS-STSK, which is achieved by carefully adjusting the number of bits input to the three different-sensitivity MS-STSK components, namely to the ASU, to the $L$-QAM/PSK modulator and to the dispersion matrix generator. Explicitly, we show that at a given throughput the BER of the bits fed into the ASU is better than that of the bits fed into the classic modulator in the STSK block, while the bits conveyed by the choice of the dispersion matrices show the worst BER performance.

  3. We enhance the IL-FEC technique. Explicitly, instead of protecting the BL by implanting its systematic bits into the ELs as in the IL-FEC technique recently proposed in [28], we further improve it.

  4. Finally, we demonstrate for a triple-layer scalable video scenario that our adaptive system exhibits a more graceful video quality degradation as the wireless channel degrades in comparison to the three fixed-mode constituent schemes. This prototype system may be readily extended to other IL-FEC MS-STSK scenarios.
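To make the adaptive principle concrete, the following Python sketch selects one of three fixed modes by comparing the near-instantaneous SNR against pre-recorded reconfiguration thresholds, as outlined in the abstract. The threshold values, the mode labels and the function name `select_mode` are placeholders chosen purely for illustration and are not taken from the paper.

```python
from bisect import bisect_right


def select_mode(snr_db, thresholds_db, modes):
    """Pick the transmission mode for the current SNR estimate.

    thresholds_db: ascending reconfiguration thresholds (hypothetical values)
    modes:         one more entry than thresholds, ordered from the most
                   robust mode to the least protected one
    """
    if len(modes) != len(thresholds_db) + 1:
        raise ValueError("need exactly len(thresholds_db) + 1 modes")
    # bisect_right counts how many thresholds the SNR has reached or exceeded.
    return modes[bisect_right(thresholds_db, snr_db)]


# Placeholder thresholds and labels; the real values would be recorded offline.
THRESHOLDS_DB = [5.0, 10.0]
MODES = ["mode_0_strongest_protection", "mode_1", "mode_2_weakest_protection"]

for snr in (3.0, 5.0, 12.5):
    print(snr, "->", select_mode(snr, THRESHOLDS_DB, MODES))
```

A table lookup of this kind keeps the run-time cost negligible, since the expensive characterization of each fixed mode is carried out once in advance.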

In Section II, we present the details of our proposed adaptive system model and of the related adaptive protection techniques. In Section III, our simulation results are provided as quantitative evidence, while this treatise is concluded in Section IV.

SECTION II.

System Architecture

In this section, we introduce our proposed MS-STSK-assisted adaptive system, which is amalgamated with our IL-FEC technique conceived for wireless layered video transmission. In Table 1 we define the notations used in the remainder of the paper. We continue by introducing the MS-STSK and IL-FEC principles followed by our proposed transmitter and receiver architecture.

TABLE 1. Symbol Definition

A. MS-STSK

Again, the MS-STSK transmitter of Figure 2 consists of two basic components, namely the ASU and the STSK block, where the latter contains a dispersion matrix generator and a classic $L$-PSK/QAM modulator. The input bit sequence of MS-STSK can be partitioned into two processes, referred to as STSK codeword generation and Antenna Set (AS) selection. Therefore, the MS-STSK input bit stream $b_{\mathit {MS-STSK}}$ can be considered as a combination of the $b_{\textit {ASU}}$ ASU bits and $b_{\textit {STSK}}$ STSK bits, where the latter can be further split into $b_{Q}$ bits mapped to the dispersion matrices and $b_{M}$ bits conveyed by the $L$-QAM/PSK modulator. Figure 3 illustrates the mapping of the bits, when feeding them into the MS-STSK transceiver. In the example of Figure 3, the input bit stream is partitioned into blocks of $b_{\mathit {MS-STSK}}=6$ Bit Per Channel Use (BPCU), each containing $b_{Q}=2$ bits, $b_{M}=2$ bits and $b_{\textit {ASU}}=2$ bits, where $b_{Q}$ and $b_{M}$ together form $b_{\textit {STSK}}$.

FIGURE 3. Bit structure of MS-STSK.
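The bit partitioning described above can be sketched in Python as follows. The sketch assumes that choosing one out of $N$ equiprobable alternatives conveys $\log_2 N$ bits, that all alphabet sizes are powers of two, and that the ASU bits come first within each block; this ordering, as well as the helper names `component_bits` and `split_symbol_bits`, are illustrative assumptions rather than the mapping of Figure 3.

```python
def _log2_exact(value, name):
    """Return log2(value) for a positive power of two, else raise."""
    if value < 1 or value & (value - 1):
        raise ValueError(f"{name} must be a positive power of two")
    return value.bit_length() - 1


def component_bits(n_combinations, q, l):
    """Bits per MS-STSK symbol carried by the ASU, dispersion matrix and modulator.

    n_combinations: number of selectable antenna combinations (assumed given)
    q:              number of dispersion matrices Q
    l:              constellation size L of the QAM/PSK modulator
    """
    return (_log2_exact(n_combinations, "n_combinations"),
            _log2_exact(q, "q"),
            _log2_exact(l, "l"))


def split_symbol_bits(bits, b_asu, b_q, b_m):
    """Split one block of b_asu + b_q + b_m bits (assumed order: ASU, Q, M)."""
    if len(bits) != b_asu + b_q + b_m:
        raise ValueError("block length does not match the component bit counts")
    return bits[:b_asu], bits[b_asu:b_asu + b_q], bits[b_asu + b_q:]


# Example matching the 6-BPCU case of the text: 2 + 2 + 2 bits.
b_asu, b_q, b_m = component_bits(n_combinations=4, q=4, l=4)
print(b_asu, b_q, b_m)
print(split_symbol_bits([1, 0, 0, 1, 1, 1], b_asu, b_q, b_m))
```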

Figure 4 shows the MS-STSK BER performance for various configurations, when transmitting over narrowband Rayleigh fading channels. The notation MS-STSK$(N_{t},N_{r},M,T,Q,L)|_{\textit {PSK/QAM}}$ used in the figure indicates that there are $N_{t}$ TAs, $N_{r}$ Receive Antennas (RA), $M$ RF chains, $T$ time slots, $Q$ dispersion matrices and finally, an $L$-QAM/PSK modulator. Observe in Figure 4 that although all configurations exhibit the same normalized throughput of $b_{\mathit {MS-STSK}}=6$-BPCU, the BER performance of the MS-STSK system depends on the specific configuration of the parameters. It can also be observed from Figure 4 that for a larger number of TAs, the system is capable of providing an improved BER. For example, the group of curves labelled with $N_{t}$-8 outperforms the group labelled with $N_{t}$-4, since more antennas are capable of increasing the diversity gains.

FIGURE 4. BER performance of MS-STSK under various configurations at a fixed throughput of $b_{\mathit {MS-STSK}}=6$-BPCU.

In Figure 5, we show the BER performance difference of the MS-STSK transceiver components. Explicitly, Figure 5 depicts the BER performance of the three MS-STSK components using three different sets of configurations. It can be seen in Figure 5 that the ASU is capable of attaining a lower BER than the $L$-QAM/PSK modulator, while both outperform the dispersion matrix component. It is worth noting that in Figure 5 the curves of the MS-STSK(8, 2, 2, 2, 4, 4)|QAM and MS-STSK(16, 2, 2, 2, 8, 8)|PSK systems constitute specific examples, where we have $b_{\textit {ASU}}=b_{Q}=b_{M}$. In the MS-STSK(8, 2, 2, 2, 4, 4)|QAM system, we have $b_{\textit {ASU}}=b_{Q}=b_{M}=2$ bits, while the MS-STSK(16, 2, 2, 2, 8, 8)|PSK system relies on $b_{\textit {ASU}}=b_{Q}=b_{M}=3$ bits. Although each component conveys the same number of bits, the three components exhibit different BER performances, as shown in Figure 5. Therefore, we conclude that the MS-STSK transceiver has the potential of providing UEP by feeding the video source bits having different importance into the corresponding MS-STSK modules of Figure 2.

FIGURE 5. BER performances of different blocks of MS-STSK.
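One way of exploiting this BER ordering for a triple-layer stream is sketched below in Python. It assumes, purely for illustration, that the most important layer feeds the ASU, the next layer feeds the modulator and the least important layer feeds the dispersion matrix selection; the function `next_symbol_bits` and the queue-based interface are hypothetical and do not reproduce the exact mapping used in the proposed system.

```python
from collections import deque


def next_symbol_bits(bl, el1, el2, b_asu, b_m, b_q):
    """Pop the bits of one MS-STSK symbol from three per-layer bit queues.

    Assumed priority mapping (most to least reliable component):
    BL -> ASU, first EL -> L-QAM/PSK modulator, second EL -> dispersion matrix.
    Raises IndexError if a queue does not hold enough bits.
    """
    def take(queue, count):
        return [queue.popleft() for _ in range(count)]

    return {
        "asu": take(bl, b_asu),
        "modulator": take(el1, b_m),
        "dispersion": take(el2, b_q),
    }


bl_bits = deque([1, 0, 1, 1])
el1_bits = deque([0, 0, 1, 0])
el2_bits = deque([1, 1, 0, 0])
print(next_symbol_bits(bl_bits, el1_bits, el2_bits, b_asu=2, b_m=2, b_q=2))
```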

B. Conventional IL-FEC

Figure 6(a) shows the encoder of the IL-FEC technique of [28], where three bit streams $l_{0}$, $l_{1}$ and $l_{2}$ are input into three FEC Encoders, yielding the corresponding systematic bit streams $s_{0}$, $s_{1}$ and $s_{2}$ as well as their associated parity streams $p_{0}$, $p_{1}$ and $p_{2}$. To improve the performance of $l_{0}$, the systematic bit sequence $s_{0}$ is scrambled by the interleavers $\pi _{0}$ and $\pi _{1}$, respectively, whose outputs are then implanted into $s_{1}$ and $s_{2}$ by an XOR operation, hence resulting in two mixed bit streams, namely $s_{01}$ and $s_{02}$, respectively. The parity streams remain unchanged and they are transmitted together with the processed systematic streams.

FIGURE 6. The (a) Encoder and (b) Decoder of the conventional IL-FEC technique.
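The implanting step of the encoder can be sketched in a few lines of Python. The sketch assumes that the three systematic streams have equal length and that the interleavers are pseudo-random permutations; the helper names `make_interleaver` and `implant` as well as the seeds are hypothetical. It also shows that once $s_{0}$ is known, XOR-ing again removes the implanted bits.

```python
import random


def make_interleaver(length, seed):
    """Pseudo-random permutation standing in for an interleaver (assumption)."""
    perm = list(range(length))
    random.Random(seed).shuffle(perm)
    return perm


def implant(s_ref, s_target, perm):
    """XOR the interleaved reference bits into the target systematic bits."""
    return [t ^ s_ref[p] for t, p in zip(s_target, perm)]


s0 = [0, 1, 1, 0, 1, 0, 0, 1]   # systematic bits of the BL
s1 = [1, 1, 0, 0, 1, 0, 1, 0]   # systematic bits of the first EL
s2 = [0, 0, 1, 1, 0, 1, 1, 1]   # systematic bits of the second EL

pi0 = make_interleaver(len(s0), seed=0)
pi1 = make_interleaver(len(s0), seed=1)

s01 = implant(s0, s1, pi0)      # transmitted instead of s1
s02 = implant(s0, s2, pi1)      # transmitted instead of s2

# XOR is its own inverse, so a receiver that knows s0 recovers s1 and s2.
print(implant(s0, s01, pi0) == s1, implant(s0, s02, pi1) == s2)
```

Note that the implanting adds no extra bits: $s_{01}$ and $s_{02}$ have the same length as $s_{1}$ and $s_{2}$.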

At the decoder shown in Figure 6(b), there are six inputs, $y_{s_{0}}$, $y_{p_{0}}$, $y_{s_{01}}$, $y_{p_{1}}$, $y_{s_{02}}$ and $y_{p_{2}}$, which represent the received versions of $s_{0}$, $p_{0}$, $s_{01}$, $p_{1}$, $s_{02}$ and $p_{2}$, respectively. To iteratively exploit the IL dependencies amongst all FEC coded layers, the classic Variable Node Decoder (VND) and Check Node Decoder (CND) concepts are invoked for exchanging extrinsic information [28], as illustrated in Figure 7. Both the VND and the CND accept and generate soft information by iteratively exploiting all IL dependencies amongst the FEC coded layers. Explicitly, assuming that $u_{1}$, $u_{2}$ are random binary variables and that we have $u_{3}=u_{1}\oplus u_{2}$, the VND sums two Log-Likelihood Ratio (LLR) inputs for generating a more reliable LLR output, which may be formulated as $L_{o3}(u_{1})=L_{i1}(u_{1})+L_{i2}(u_{1})$. The boxplus operation [65] of $L(u_{3}=u_{1}\oplus u_{2})=L(u_{1})\boxplus L(u_{2})$ contributes to improving the reliability of the bit, given that the reliability of the bits $u_{1}$ and $u_{2}$ is known. The boxplus operation $\boxplus $ was defined by Hagenauer as follows [66]:
\begin{align*} L(u_{1}\oplus u_{2}) &= \log \frac {1+e^{L(u_{1})}e^{L(u_{2})}}{e^{L(u_{1})}+e^{L(u_{2})}} \\ &\approx \operatorname{sign}(L(u_{1}))\cdot \operatorname{sign}(L(u_{2}))\cdot \min (|L(u_{1})|,|L(u_{2})|) \tag{1}\end{align*}

under the additional rules of:
\begin{equation*} L(u)\boxplus \pm \infty =\pm L(u) \qquad \quad L(u)\boxplus 0=0.\tag{2}\end{equation*}
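For readers who prefer code, Eqs. (1) and (2) can be checked numerically with the Python sketch below. The function names are illustrative. The exact form is evaluated through the algebraically equivalent expression "min-sum term plus two correction terms", which avoids overflow for large LLR magnitudes and reproduces the limiting rules of Eq. (2) when one argument is infinite or zero.

```python
import math


def boxplus_minsum(a, b):
    """Approximation in Eq. (1): product of the signs times the smaller magnitude."""
    return math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))


def boxplus_exact(a, b):
    """Exact boxplus of Eq. (1), rewritten as min-sum plus correction terms.

    log((1 + e^(a+b)) / (e^a + e^b))
        = sign(a) sign(b) min(|a|, |b|)
          + log(1 + e^-|a+b|) - log(1 + e^-|a-b|)
    """
    correction = (math.log1p(math.exp(-abs(a + b)))
                  - math.log1p(math.exp(-abs(a - b))))
    return boxplus_minsum(a, b) + correction


def vnd_sum(*llrs):
    """VND action: independent LLRs about the same bit simply add up."""
    return sum(llrs)


a, b = 1.2, -0.7
direct = math.log((1 + math.exp(a) * math.exp(b)) / (math.exp(a) + math.exp(b)))
print(boxplus_exact(a, b), direct)        # both evaluate the exact form of Eq. (1)
print(boxplus_minsum(a, b))               # approximation keeps the sign, overstates reliability
print(boxplus_exact(2.5, math.inf), boxplus_exact(2.5, -math.inf))  # rules of Eq. (2)
print(boxplus_exact(2.5, 0.0))            # a fully unreliable input erases the output
```

The output magnitude never exceeds the smaller input magnitude, which reflects that the XOR of two bits is at most as reliable as its least reliable constituent.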

FIGURE 7. Structures of (a) VND and (b) CND.

Thus, in order to generate the soft information representing $u_{3}$, the CND’s action can be formulated as $L_{o}(u_{3})=L_{i}(u_{1})\boxplus L_{i}(u_{2})$, assuming that the soft LLRs of $u_{1}$ and $u_{2}$ are known.

Since the systematic bit stream $s_{0}$ is implanted into $s_{1}$ and $s_{2}$, the latter two cannot be independently decoded by their FEC Decoders, as shown in Figure 6(b). Thus, to decode all bit streams successfully, the decoding process has to obey a specific decoding order. Explicitly, it has to decode $s_{0}$ before decoding $s_{1}$ and $s_{2}$, since decoding $s_{1}$ and $s_{2}$ requires the extrinsic information gleaned from $s_{0}$. As shown in Figure 6(b), the decoding process is described as follows:

  1. The soft information $y_{s_{0}}$ is fed into VND 0 of Figure 6(b). Since at this stage no extrinsic information is provided by the CNDs, $y_{s_{0}}$ is simply forwarded to VND 1 as $L_{a}(s_{0})$ . Then FEC Decoder 0, which also takes into account the received soft parity $y_{p_{0}}$ , generates the extrinsic information $L_{e}(s_{0})$ , which is passed back to VND 0 via VND 1.

  2. At this stage, VND 0 of Figure 6(b) updates $y_{s_{0}}$ with the aid of extrinsic information and feeds $L_{e}(s_{0})$ to CND 0 and CND 1, hence enabling them to generate the a priori information $L_{a}(s_{1})$ and $L_{a}(s_{2})$ by additionally taking into account $y_{s_{01}}$ and $y_{s_{02}}$ , respectively. FEC Decoder 1 and Decoder 2 of Figure 6(b) receive soft information of $L_{a}(s_{1})$ and $L_{a}(s_{2})$ as well as their associated parity streams $y_{p_{1}}$ and $y_{p_{2}}$ in order to generate the extrinsic information $L_{e}(s_{1})$ and $L_{e}(s_{2})$ , respectively, which is then returned to the CND 0 and CND 1 for updating and enhancing the a priori information of $L_{a}(s_{0})$ .

  3. Assisted by the extrinsic information passed to it by CNDs, VND 0 of Figure 6(b) updates the a priori information furnished for $s_{0}$ and therefore the confidence of the systematic bits $s_{0}$ is enhanced, resulting in an improved BER performance, when the first iteration is completed. The iterative decoding process continues until the maximum number of iterations is reached.

  4. When the affordable number of iterations is exhausted, the decoded bit streams $\hat {l_{0}}$ , $\hat {l_{1}}$ and $\hat {l_{2}}$ are generated by VND 1, VND 2 and VND 3 of Figure 6(b), respectively. To obtain $\hat {l_{0}}$ , VND 1 of Figure 6(b) adds up $L_{a}(s_{0})$ gleaned from VND 0 and $L_{e}(s_{0})$ arriving from FEC Decoder 0, while $\hat {l_{1}}$ and $\hat {l_{2}}$ are generated by VND 2 and VND 3 of Figure 6(b), respectively.

Huo et al. [28], [67] have shown that by iteratively repeating the above decoding phases, the BER performance can be significantly improved, which in turn substantially improves the decoded image quality in terms of PSNR as well. The iterations are terminated as soon as the BL is successfully decoded or the maximum number of iterations is reached.

C. Proposed Transmitter Model

At the transmitter shown in Figure 8, the captured video source $U$ is compressed by the SHVC encoder, generating a bit stream that contains multiple layers. Its output bit stream is then demultiplexed into three bit streams, one for each layer, namely $l_{0}$ , $l_{1}$ and $l_{2}$ corresponding to the BL, first EL and second EL, which are then separately encoded by three identical RSC encoders, as shown in Figure 8. The RSC encoders output six bit streams, namely the three systematic streams $s_{0}$ , $s_{1}$ and $s_{2}$ as well as the three parity streams $p_{0}$ , $p_{1}$ and $p_{2}$ . The adaptive IL-FEC selection unit and decoder of Figure 8 adaptively configure the IL-FEC scheme to judiciously assign protection to the layers. There are three fixed-mode candidates provided for the adaptive IL-FEC selection unit, namely $Mode 0$ , $Mode 1$ and $Mode 2$ , which aim for protecting the BL, first EL and second EL, respectively. However, due to the inability of the conventional IL-FEC to protect the ELs, we propose an enhanced IL-FEC solution to be detailed later in this section. The modified $Mode 0$ , $Mode 1$ and $Mode 2$ are depicted in Figures 9, 10 and 11, respectively, where the most appropriate mode is activated according to the instantaneous channel SNR, which simply implies switching the implantation mode. The bit streams generated by the adaptive IL-FEC selection unit shown in Figure 8 are then fed into the MS-STSK transceiver of Figure 2, where the BL, first EL and second EL bits are forwarded to the ASU block, to the classic modulator and to the dispersion matrix generator, respectively, bearing in mind their different BER performances shown in Figure 5. Again, a frequency-flat Rayleigh fading plus shadow fading channel is considered.

FIGURE 8. - Architecture of proposed MS-STSK aided adaptive system for scalable video streaming.

FIGURE 9. - The (a) Encoder and (b) Decoder of the IL-FEC technique for $Mode 0$ , which achieves the identical function as that of Figure 6.

FIGURE 10. - The (a) Encoder and (b) Decoder of the IL-FEC technique for enhanced $Mode 1$ .

FIGURE 11. - The (a) Encoder and (b) Decoder of the IL-FEC technique for enhanced $Mode 2$ .

The protection modes conceived for the adaptive IL-FEC selection unit are described as follows:

  1. $Mode 0$ : In order to protect the BL, the IL based protection applied to $L_{0}$ in our system is identical to that proposed in [67] and [68], as shown in Figure 9(a), where the dotted line indicates that this implantation function is disabled, since the BL is independent of any other layers. It can be seen in Figure 9 that two copies of the systematic bit stream of the BL $s_{0}$ are interleaved and implanted into $s_{1}$ and $s_{2}$ using the conventional XOR operation according to $s_{01}^{k}=s_{0}^{k}\oplus s_{1}^{k}$ and $s_{02}^{k}=s_{0}^{k}\oplus s_{2}^{k}$ , respectively. This results in the mixed bit-streams of $s_{01}^{k}$ and $s_{02}^{k}$ seen in Figure 9(a). The outputs become $s_{0}$ , $s_{01}$ and $s_{02}$ , complemented by the three corresponding parity bit streams.

  2. $Mode 1$ : Figure 10 illustrates the enhanced $Mode 1$ , where $L_{1}$ becomes the IL-FEC protected layer. Considering the dependency between $L_{1}$ and $L_{0}$ , apart from assigning IL-FEC protection to $L_{1}$ , the robustness of $L_{0}$ is also taken into consideration. Thus, first the systematic bits of $L_{0}$ are interleaved by $\pi _{0}$ and then implanted into $s_{2}$ for the sake of guaranteeing the performance of the BL, yielding the mixed sequence of $s_{02}=s_{0}\oplus s_{2}$ . Then, two copies of the systematic sequence of the protected layer $s_{1}$ are interleaved by $\pi _{1}$ and $\pi _{2}$ , as shown in Figure 10, and then implanted into $s_{0}$ and $s_{02}$ , respectively. This operation results in generating two new sequences, namely $s_{10}=s_{1}\oplus s_{0}$ and $s_{102}=s_{1}\oplus s_{02}$ , while the other copy of bit stream $s_{1}$ is output directly. As shown in Figure 10, the IL-FEC-processed systematic bit streams become $s_{10}$ , $s_{1}$ and $s_{102}$ .

  3. $Mode 2$ : The process of assigning IL-FEC protection to $L_{2}$ is quite similar to that of the enhanced $Mode 1$ , with the IL-FEC protected layer becoming $L_{2}$ instead of $L_{1}$ , as shown in Figure 11. Note that instead of guaranteeing the BER performance of $L_{0}$ as in $Mode 1$ , the system provides extra protection for $L_{1}$ by implanting the bit stream of $s_{1}$ into that of $s_{0}$ . Therefore, the system first interleaves $s_{1}$ and then implants it into $s_{0}$ , hence resulting in a new bit sequence of $s_{10}=s_{1}\oplus s_{0}$ . Then, as observed in Figure 11, two copies of the systematic bit stream $s_{2}$ are interleaved and implanted into $s_{10}$ and $s_{1}$ , hence resulting in the new mixed streams of $s_{210}=s_{2}\oplus s_{10}$ as well as $s_{21}=s_{2}\oplus s_{1}$ , while the other copy remains unchanged. Finally, the new outputs representing the systematic bits become $s_{210}$ , $s_{21}$ and $s_{2}$ .
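The three implantation modes listed above can be summarised by the following minimal Python sketch. It is only an illustration under simplifying assumptions that are not part of the paper: all three systematic streams have the same length, the interleavers are arbitrary seeded random permutations, and the helper names (`make_perm`, `interleave`, `implant`) are hypothetical.

```python
import random


def xor(a, b):
    """Bit-wise XOR of two equal-length bit lists."""
    return [x ^ y for x, y in zip(a, b)]


def make_perm(n, seed):
    """Hypothetical interleaver: a seeded random permutation of n positions."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm


def interleave(bits, perm):
    """Output position k carries input bit perm[k]."""
    return [bits[p] for p in perm]


def implant(mode, s0, s1, s2, perms):
    """Return the three transmitted systematic streams of the given mode."""
    pa, pb, pc = perms
    if mode == 0:
        # Two interleaved copies of s0 are implanted into s1 and s2.
        return s0, xor(interleave(s0, pa), s1), xor(interleave(s0, pb), s2)
    if mode == 1:
        s02 = xor(interleave(s0, pa), s2)     # protect the BL first
        s10 = xor(interleave(s1, pb), s0)     # then implant s1 into s0 ...
        s102 = xor(interleave(s1, pc), s02)   # ... and into s02
        return s10, s1, s102
    if mode == 2:
        s10 = xor(interleave(s1, pa), s0)     # extra protection for L1
        s210 = xor(interleave(s2, pb), s10)   # implant s2 into s10 ...
        s21 = xor(interleave(s2, pc), s1)     # ... and into s1
        return s210, s21, s2
    raise ValueError("mode must be 0, 1 or 2")
```

In this sketch the protected layer is always the stream that is output unchanged, which is why the receiver has to decode it first and then peel the other layers off in order.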

We emphasize that all three proposed modes can be realized using the same circuit, as shown in Figures 9–11, where the IL-FEC protected layer is adaptively selected from these three modes according to the instantaneous channel SNR. Thus, the complexity order of our enhanced adaptive IL-FEC system is identical to that of its conventional counterpart in [28]. Since for $Mode 0$ the bit stream of the BL is independent of the other layers, the additional implantation is no longer required and hence it is deactivated, as shown in Figure 9(a). Additionally, the three RSC Decoders shown in Figures 8–11 are essentially identical and capable of achieving the same function.

The adaptive IL-FEC selection unit of Figure 8 selects the appropriate IL scheme to assign the most appropriate protection based on one of the above three modes by taking into account the estimated channel SNR $\gamma $ , as follows:\begin{equation*} \text{IL-FEC} = \begin{cases} \text{Mode 0} & \gamma \leq f_{0}, \\ \text{Mode 1} & f_{0}< \gamma \leq f_{1}, \\ \text{Mode 2} & f_{1}< \gamma, \end{cases} \tag{3}\end{equation*} where the threshold $f_{i}$ is defined in terms of the Equivalent PLR (e-PLR) $P(L_{i})_{e}$ of the corresponding layer $L_{i}$ . Owing to the dependency between the BL and ELs, the PLR of the EL also has to take into account that of its reference counterparts. For example, the value of $f_{0}$ is decided by $P(L_{0})_{e}$ , which is equivalent to $P(L_{0})$ , while $P(L_{1})_{e}$ requires both $P(L_{0})$ as well as $P(L_{1})$ and determines the threshold value of $f_{1}$ . Thus, the e-PLR reduces to the conventional PLR when the reference layers are error free. For a given layer $L_{i}$ , $P(L_{i})_{e}$ can be expressed as:\begin{equation*} P(L_{i})_{e}= \begin{cases} P(L_{0}) & i = 0, \\ \sum _{m=1}^{i}P(L_{m})\prod _{n=0}^{m-1} [1-P(L_{n})] +P(L_{0}) & i > 0. \end{cases} \tag{4}\end{equation*} The values of the threshold $f_{i}$ are set to $\gamma $ dB, namely to the specific SNR at which the e-PLR remains ‘just’ below a certain threshold value, i.e. $P(L_{i})_{e}\leq P_{t}$ . Explicitly, the value of $f_{0}$ is the SNR where we have $P(L_{0})_{e}\leq P_{t}$ , while that of $f_{1}$ can be found when $P(L_{1})_{e}\leq P_{t}$ .
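As a minimal sketch of Equations (3) and (4), the Python fragment below computes the e-PLR of a layer and selects the mode for a given SNR estimate. The function names are hypothetical, and `find_threshold` rests on an assumption of ours: the threshold is read off a measured e-PLR curve as the lowest SNR grid point at which the e-PLR does not exceed $P_{t}$ .

```python
from typing import Optional, Sequence


def equivalent_plr(plr: Sequence[float], i: int) -> float:
    """e-PLR of layer i following Eq. (4); plr[n] is P(L_n)."""
    total = plr[0]
    for m in range(1, i + 1):
        # Probability that all reference layers 0..m-1 were received correctly
        refs_ok = 1.0
        for n in range(m):
            refs_ok *= 1.0 - plr[n]
        total += plr[m] * refs_ok
    return total


def select_mode(gamma_db: float, f0_db: float, f1_db: float) -> int:
    """Mode index following Eq. (3) for the estimated SNR gamma_db."""
    if gamma_db <= f0_db:
        return 0
    if gamma_db <= f1_db:
        return 1
    return 2


def find_threshold(snr_db: Sequence[float], eplr: Sequence[float],
                   p_t: float) -> Optional[float]:
    """Assumed rule: lowest sampled SNR at which the e-PLR is at most p_t."""
    for g, p in sorted(zip(snr_db, eplr)):
        if p <= p_t:
            return g
    return None  # the target is never met on the sampled grid
```

With purely illustrative numbers, `equivalent_plr([0.1, 0.2], 1)` evaluates to 0.1 + 0.9 · 0.2 = 0.28, which is the two-layer special case of Equation (4).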

We emphasize that $s_{i,j}$ is different from $s_{j,i}$ , when implanting systematic bits of $L_{i}$ into those of $L_{j}$ , unless $L_{i}$ and $L_{j}$ have the same number of bits. When the length of $l_{i}$ is higher than that of $l_{j}$ , a feasible solution is to merge several bits of $l_{i}$ by an XOR operation before implanting it into $l_{j}$ for the sake of bit-length matching. By contrast, if $l_{j}$ has more bits, some of the bits in $l_{i}$ may have to be used more than once. More details about unequal-length cross-layer interleaving techniques can be found in [28].
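The length-matching idea can be sketched as follows. This is only one possible realization and not the scheme of [28]: the modulo index mapping and the function name `match_length` are assumptions made purely for illustration.

```python
def match_length(src, target_len):
    """Fit the bit list src to target_len bits before implantation.

    Longer sources are folded by XOR-ing several bits into one position;
    shorter sources are extended by reusing bits cyclically.
    The modulo mapping is an illustrative assumption.
    """
    n = len(src)
    if n >= target_len:
        out = [0] * target_len
        for k, bit in enumerate(src):
            out[k % target_len] ^= bit   # merge several bits by XOR
        return out
    return [src[k % n] for k in range(target_len)]  # reuse bits
```

For example, folding the five bits `[1, 0, 1, 1, 1]` onto two positions gives `[1, 1]`, while stretching `[1, 0]` to five positions gives `[1, 0, 1, 0, 1]`.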

The remaining parity bits of all the three layers are then multiplexed with the newly generated systematic codes, leading to the three new bit streams of $x_{0}$ , $x_{1}$ and $x_{2}$ , which are then fed into the MS-STSK transceiver of Figure 8.

Again, Figure 5 has demonstrated that in general the BER performance of the ASU of MS-STSK is better than that of the classic QAM/PSK sub-channel as well as that of the dispersion matrix-based sub-channel, hence resulting in the lowest BER among these components at a given SNR. Therefore, we feed $x_{0}$ to the ASU sub-channel for guaranteeing the best protection for the BL. We then feed $x_{1}$ into the $L$ -QAM/PSK modulator and $x_{2}$ into the dispersion matrix sub-channel, hence providing inherent UEP for the three video layers. More details about the MS-STSK scheme can be found in [48]–​[51]. The encoded MS-STSK codewords are then transmitted over narrowband Rayleigh fading channels.

D. Proposed Receiver Model

In this section, we detail the decoding process of our proposed adaptive wireless video system. As illustrated in Figure 8, the MS-STSK transceiver first receives the signal symbols and translates them into the LLR representation of the MS-STSK codewords by the Logarithmic Maximum A Posteriori (Log-MAP) algorithm. Then, the soft MS-STSK codewords are forwarded to a demultiplexer to generate the three bit streams, namely $y_{0}$ , $y_{1}$ and $y_{2}$ of Figure 8, respectively, which are forwarded to the adaptive IL-FEC decoder.

Figures 10 and 11 depict the enhanced IL-FEC $Mode 1$ and $Mode 2$ , while no modification is performed for $Mode 0$ . Hence the encoder and decoder devised for $Mode 0$ shown in Figure 9 are identical to those in [28] and [67].

Here we present $Mode 1$ in detail, as shown in Figure 10, which implants $s_{0}$ into $s_{2}$ for guaranteeing the best performance for the BL, and then implants $s_{1}$ into $s_{0}$ and $s_{2}$ , respectively, where the IL-FEC protected layer is $L_{1}$ . Then, the received MS-STSK codewords are decoded with the aid of the Log-MAP algorithm. The Log-MAP detector outputs soft bits that are then demultiplexed by the DEMUX block of Figure 8 to generate the systematic information, $y_{s_{10}}$ , $y_{s_{1}}$ and $y_{s_{102}}$ as well as their corresponding parity bits $y_{p_{0}}$ , $y_{p_{1}}$ and $y_{p_{2}}$ for the layers $L_{0}$ , $L_{1}$ and $L_{2}$ , respectively. Since in $Mode 1$ the systematic bits $s_{0}$ and $s_{1}$ are implanted into $s_{2}$ , while $s_{1}$ is implanted into $s_{0}$ , the received version of $s_{1}$ , namely $y_{s_{1}}$ , can be decoded independently, while decoding $s_{0}$ requires information about $s_{1}$ and decoding $s_{2}$ requires the assistance of both $s_{0}$ and $s_{1}$ . Thus, the decoding process of $Mode 1$ must follow the sequential order: $y_{s_{1}}$ , $y_{s_{10}}$ and $y_{s_{102}}$ . Therefore, as depicted in Figure 10(b), the decoding process obeys the following steps:

  1. The systematic bits of the IL-FEC protected layer, namely $y_{s_{1}}^{k}$ in the example of Figure 10, are fed into VND 0 to generate the a priori information $L_{a}(s_{1}^{k})$ with the aid of the extrinsic information gleaned from both CND 0 and CND 1. At the first iteration, however, $L_{a}(s_{1}^{k})$ is generated by simply duplicating the soft value of $y_{s_{1}}^{k}$ , since no extrinsic information is provided by the CNDs as yet. Then, $L_{a}(s_{1}^{k})$ is input to the RSC Decoder 1 of Figure 10(b) along with its corresponding parity $y_{p_{1}}^{k}$ , hence generating the extrinsic information $L_{e}(s_{1}^{k})$ . This extrinsic LLR is then fed back to VND 0 via VND 1 to produce extrinsic information for CND 0 and CND 1 with the aid of $y_{s_{1}}^{k}$ , as seen in Figure 10(b).

  2. CND 0 of Figure 10(b) receives the extrinsic information of $L_{e}(s_{1})$ from VND 0 and then extracts $y_{s_{0}}^{k}$ from $y_{s_{10}}^{k}$ . The extrinsic bits obtained are then fed into VND 2 to generate $L_{a}(s_{0}^{k})$ , which is equal to $y_{s_{0}}^{k}$ , because the bits in $L_{2}$ have not as yet been processed, hence no extra information is provided by CND 1 for VND 2. The extrinsic information $L_{e}({s_{0}^{k}})$ generated by RSC Decoder 0 is sent back to VND 2 and CND 0 for providing extra information both for CND 1 and VND 0.

  3. Assisted by the outputs of VND 0 and VND 2, CND 1 becomes able to extract $L_{a}(s_{2}^{k})$ from $y_{s_{102}}^{k}$ and then feeds it to the RSC Decoder 2 of Figure 10(b) via VND 4 in order to generate the extrinsic information $L_{e}(s_{2}^{k})$ . As seen in Figure 10(b), CND 1 uses this extrinsic information together with $y_{s_{102}}^{k}$ and either $L_{e}(s_{1}^{k})$ or $L_{e}(s_{2}^{k})$ to generate feedback information for VND 0 and VND 2, respectively, in order to prepare for the next iteration.

  4. Then, the decoding process of Figure 10(b) starts again from VND 0. However, in contrast to the procedure in Step 1), the extrinsic information gleaned from CND 0 and CND 1 is no longer zero, since the related soft information has been exchanged among the three RSC Decoders of Figure 10(b), hence improving the soft information $L_{a}(s_{1}^{k})$ . Similarly, the a priori information $L_{a}(s_{0}^{k})$ is enhanced by exploiting the extrinsic information of VND 2, hence resulting in an enhanced BER performance for $L_{0}$ . After two iterations, VND 1, 3 and 4 output the final LLR generated by considering both $L_{a}(s_{i}^{k})$ and $L_{e}(s_{i}^{k})$ , which is then hard-decoded to $\hat {l_{1}}$ , $\hat {l_{0}}$ and $\hat {l_{2}}$ .
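The order in which the a priori LLRs are extracted in the steps above can be illustrated for a single bit position by the following Python sketch. It is a simplification and not the decoder of Figure 10(b): the interleavers are omitted, the RSC decoders are not modelled (their extrinsic outputs are passed in as the hypothetical arguments `le_s1` and `le_s0`), and the min-sum approximation of Equation (1) is used for the boxplus operation.

```python
from functools import reduce


def boxplus(l1: float, l2: float) -> float:
    """Min-sum approximation of the boxplus operation of Eq. (1)."""
    sign = 1.0 if (l1 >= 0) == (l2 >= 0) else -1.0
    return sign * min(abs(l1), abs(l2))


def cnd(*llrs: float) -> float:
    """Check-node action over any number of inputs (chained boxplus)."""
    return reduce(boxplus, llrs)


def mode1_a_priori(y_s1, y_s10, y_s102, le_s1=0.0, le_s0=0.0):
    """First-pass a priori LLRs (La_s1, La_s0, La_s2) for one bit position.

    le_s1 and le_s0 stand in for the extrinsic outputs of RSC Decoder 1
    and RSC Decoder 0, which are not modelled in this sketch.
    """
    la_s1 = y_s1                      # Step 1: no CND feedback as yet
    l_s1 = y_s1 + le_s1               # VND: channel LLR plus extrinsic LLR
    la_s0 = cnd(y_s10, l_s1)          # Step 2: CND 0 strips s1 from s10
    l_s0 = la_s0 + le_s0
    la_s2 = cnd(y_s102, l_s1, l_s0)   # Step 3: CND 1 strips s1 and s0 from s102
    return la_s1, la_s0, la_s2
```

As a usage example with a positive LLR favouring the bit 0, take $s_{1}=1$ , $s_{0}=0$ and $s_{2}=1$ , so that $s_{10}=1$ and $s_{102}=0$ . The call `mode1_a_priori(-4.0, -5.0, 3.0)` returns `(-4.0, 4.0, -3.0)`, whose signs correspond to the three transmitted bits.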

In our simulations, the number of iterations between the RSC decoders is fixed to 2, based on our observation that the improvements gleaned by more iterations of the IL-FEC technique become negligible, despite the decoding complexity being increased linearly.

The above process specifically details the philosophy of enhanced $Mode 1$ of Figure 10, when $L_{1}$ is the IL-FEC protected layer. The decoding process of $Mode 0$ is slightly different, where the systematic bits of the BL $s_{0}$ are implanted into $s_{1}$ and $s_{2}$ , respectively, as shown in Figure 9. Thus, the received stream $y_{s_{0}}$ can be decoded independently, while $y_{s_{01}}$ and $y_{s_{02}}$ can be decoded in parallel with the aid of $L_{e}({s_{0}})$ . The decoding process of enhanced $Mode 2$ is fairly similar to that of enhanced $Mode 1$ . Since in enhanced $Mode 2$ the systematic bit stream $s_{2}$ is implanted into $s_{1}$ , and $s_{1}$ and $s_{2}$ are implanted into $s_{0}$ , as shown in Figure 11(a), they have to obey a certain decoding order similar to that of $Mode 1$ at the receiver: namely $s_{2}$ , $s_{1}$ and $s_{0}$ , as depicted in Figure 11(b). The multiplexer MUX of Figure 8 reorganizes the three bit streams, namely $\hat {l_{0}}$ , $\hat {l_{1}}$ and $\hat {l_{2}}$ , and the SHVC decoder reconstructs the video $\hat {U}$ . By iteratively exchanging soft extrinsic information with the RSC decoders of the other layers, as seen in Figures 9, 10 and 11, the IL-FEC protected layer benefits from an improved BER and PSNR performance.

SECTION III.

System Performance

In this section, we present our simulation results for characterizing the proposed MS-STSK assisted adaptive IL-FEC aided system. Again, the SHVC reference software SHM is utilized for encoding the Foreman video clip. The Group Of Pictures (GOP) is set to 4 for all video simulations, which means that the Instantaneous Decoding Refresh (IDR)/Clean Random Access (CRA) frames are inserted every 4 frames. No bidirectionally predicted B frames are used in our simulations, since they are prone to propagating inter-frame video distortions and they increase the latency, hence preventing flawless lip-synchronization. As a consequence, the video sequence in our simulations simply consists of I frames and P frames. Furthermore, we disable the spatial and temporal scalability functionalities, when encoding the video sequence into three different-quality layers, where the quality of the layers is controlled by setting the bit rate for each layer. The bit stream of each video frame is mapped to an MS-STSK packet, whose length is defined in Table 3. The receiver checks if the received packet has any bit errors using the associated Cyclic Redundancy Check (CRC). If the CRC check fails, the corrupted frames are dropped and replaced by “frame-copy” based error concealment.
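The CRC check and the frame-copy concealment described above can be sketched in a few lines of Python. The sketch makes assumptions that the paper does not state: a 32-bit CRC from the standard `zlib` module is appended to each packet, each packet carries exactly one frame, and `decode` is a hypothetical stand-in for the video decoder.

```python
import zlib


def make_packet(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 tag to the payload (CRC choice is an assumption)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def crc_ok(packet: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the received tag."""
    payload, tag = packet[:-4], packet[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == tag


def receive_frames(packets, decode):
    """Decode one frame per packet; on CRC failure repeat the previous frame."""
    frames, last = [], None
    for pkt in packets:
        if crc_ok(pkt):
            last = decode(pkt[:-4])
        # Otherwise the corrupted frame is dropped and `last` is copied.
        frames.append(last)
    return frames
```

If the very first packet fails its check, this sketch outputs `None` for that frame, since there is nothing to copy as yet; a practical receiver would need a separate policy for that case.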

TABLE 2 Bit Allocations of the MS-STSK Configurations
TABLE 3 Parameters for Transmitting the Employed $Foreman$ and $Football$ Sequences

Apart from the above source configuration, the FEC-aided MS-STSK transceivers are configured as follows. The three RSC codecs are configured by the binary generator polynomials of [1101 1111]. Additionally, the MS-STSK(4, 2, 2, 2, 8, 4)|QAM and MS-STSK(8, 2, 2, 2, 16, 8)|PSK configurations are used by the MS-STSK transceiver. The bit allocations of the two MS-STSK configurations are listed in Table 2. The MS-STSK transceiver configured as MS-STSK(4, 2, 2, 2, 8, 4)|QAM has 4 TAs and 2 RAs as well as 2 RF chains, hence resulting in a 6-bit $b_{\mathit {MS-STSK}}$ sequence that consists of 1 bit for the ASU, 2 bits for the $L$ -QAM/PSK modulator and 3 bits for the dispersion matrices. As for the configuration of MS-STSK(8, 2, 2, 2, 16, 8)|PSK, there are 8 TAs, 2 RAs and 2 RF chains, yielding a 9-bit $b_{\mathit {MS-STSK}}$ sequence associated with 2 bits for the ASU, 3 bits for the modulator and 4 bits for the dispersion matrix index.
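The bit allocations quoted above can be reproduced by a short Python sketch. It relies on two assumptions of ours rather than on statements in the text: the number of dispersion matrices and the modulation order are taken to be the last two entries of the MS-STSK(·) tuple, and the ASU is assumed to select one of $N_{t}/N_{RF}$ non-overlapping antenna sets. Both assumptions are chosen only because they reproduce the 6-bit and 9-bit totals given above; the function name is hypothetical.

```python
from math import log2


def ms_stsk_bits(n_tx: int, n_rf: int, n_disp: int, mod_order: int) -> dict:
    """Bits per MS-STSK codeword carried by each component (illustrative)."""
    asu = int(log2(n_tx // n_rf))      # assumed: non-overlapping antenna sets
    modulator = int(log2(mod_order))   # L-QAM/PSK symbol index
    dispersion = int(log2(n_disp))     # dispersion matrix index
    return {
        "asu": asu,
        "modulator": modulator,
        "dispersion": dispersion,
        "total": asu + modulator + dispersion,
    }
```

Under these assumptions, `ms_stsk_bits(4, 2, 8, 4)` gives 1 + 2 + 3 = 6 bits and `ms_stsk_bits(8, 2, 16, 8)` gives 2 + 3 + 4 = 9 bits, matching the two configurations in the text.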

We first show the PLR and the PSNR versus channel SNR performance improvement attained by using the proposed UEP MS-STSK scheme in Figure 8. The associated configuration parameters can be found in Table 3, except that in order to highlight the improvement achieved by MS-STSK only, the RSC codecs were deactivated in this investigation and so was the channel’s shadow fading.

Figure 12 shows the e-PLR versus channel SNR of both the Equal Error Protection (EEP) and of the UEP schemes. A part of a video sequence, namely the Foreman sequence, which has 30 frames and is scanned at 30 Frames Per Second (FPS), is encoded into three layers, having bit rates of 126.7, 259.7 and 385.3 kbps, respectively, and using the MS-STSK configuration of MS-STSK(4, 2, 2, 2, 8, 4)|QAM, as shown in Table 3. In the EEP scheme, the bits of each layer are evenly split into three streams, which are then fed into the three modules of MS-STSK seen in Figure 2, while for the UEP the bits of different layers are fed into the three corresponding MS-STSK modules. Therefore, compared to EEP, the ASU of MS-STSK in UEP only contains the bits of the BL of the scalable video stream, namely $L_{0}$ , while the $L$ -QAM/PSK modulator only has the bits of $L_{1}$ . Finally, the dispersion matrix index only has the bits of $L_{2}$ . The system considered here extracts the sub-layers and feeds them into different blocks, when using the MS-STSK(4, 2, 2, 2, 8, 4)|QAM configuration of MS-STSK. Figure 12 compares the e-PLR of UEP to that of EEP, when applying the MS-STSK(4, 2, 2, 2, 8, 4)|QAM configuration. It can be seen from Figure 12 that when the PLR of $L_{0}$ reaches 5%, the SNR required by $L_{0}$ in the UEP mode is reduced by about 2 dB compared to that of EEP, albeit at the expense of degrading the higher layers’ e-PLR. However, this degradation imposed on $L_{1}$ and $L_{2}$ does not explicitly affect their e-PLR performance.

FIGURE 12. - e-PLR versus channel SNR for the Foreman test sequence associated with MS-STSK(4, 2, 2, 2, 8, 4)|QAM.

Figure 13 provides evidence that for the Foreman test sequence the image quality (PSNR) can be improved by feeding the BL bits to the ASU of Figure 2, where the first EL is fed into the modulator, while the second EL is used for dispersion matrix selection, in order to construct our basic UEP scheme for the layered video stream. Observe in Figure 13 that although both the UEP and EEP schemes yield an identical overall image quality (PSNR) at the channel SNR of 17 dB, it can be seen from Figure 12 that at this channel SNR the $P(L_{0})$ of UEP is below 1%, while that of EEP is above 5%. In other words, the UEP technique is capable of providing a less impaired service and hence a higher subjective quality for the clients than its EEP counterpart, even if they achieve the same PSNR performance. Both simulation results show that at low channel SNRs, the UEP scheme attains a higher reconstructed image quality, since the ASU block conveying the $L_{0}$ bits provides a better protection than the EEP scheme, hence resulting in a more robust channel for transmitting the BL stream. It can be seen from Figure 13 that in the high channel SNR region, the PSNR of the EEP is similar to that of its UEP counterparts in both configurations, where the channel SNR is sufficiently high for ensuring that the PLR of the BL in the EEP scheme also remains negligible, even though it is inferior to that of the UEP scheme.

FIGURE 13. - PSNR versus channel SNR for the Foreman test sequence associated with MS-STSK(4, 2, 2, 2, 8, 4)|QAM.

In order to characterize our system, we compare the performance of our adaptive system that invokes the enhanced IL-FEC technique to that of the conventional adaptive system as well as to that of the UEP scheme and to that of the EEP scheme. Explicitly, the difference between the two adaptive schemes lies in the adaptive IL-FEC selection unit, as shown in Figure 8, while the other parameters set for the video sequences, the RSC codecs and the MS-STSK transceiver are identical. For the enhanced adaptive system, $Mode 0$ , $Mode 1$ and $Mode 2$ are illustrated in Figures 9, 10 and 11, respectively, while those of the conventional adaptive counterpart only use the schematic of Figure 6, associated with implanting $s_{1}$ into $s_{0}$ and $s_{2}$ for conventional $Mode 1$ and implanting $s_{2}$ into $s_{0}$ and $s_{1}$ for conventional $Mode 2$ . It is worth noting that $Mode 0$ is identical for both adaptive schemes, while the difference between the two adaptive schemes occurs in $Mode 1$ and $Mode 2$ . Additionally, both UEP and EEP are realized by the MS-STSK transceiver, both of which use only RSC codecs instead of the IL-FEC techniques, hence no adaptivity is activated in these two schemes. We consider a $P_{t}$ value of 5% as recommended in [56] in order to adaptively determine the mode-switching thresholds, since the perceptual image degradation imposed on the reconstructed video at the receiver by a PLR value of less than or equal to 5% becomes fairly minor. By recalling Equation (4), the e-PLR can be expressed as:\begin{align*} P(L_{0})_{e}=&P(L_{0}), \tag{5}\\ P(L_{1})_{e}=&[1-P(L_{0})] \cdot P(L_{1})+ P(L_{0}).\tag{6}\end{align*}

The thresholds set for selecting modes are given in Table 4, where the terms enhanced and conventional represent the specific type of the IL-FEC technique applied for the adaptive system. Again, all other parameters specifying the systems can be found in Table 3. Note that no threshold value is set for the EEP and UEP schemes, since the IL-FEC mode is deactivated for both schemes.

TABLE 4 Thresholds for the Systems

Figure 14 depicts the probability density function of the three enhanced modes versus channel SNR in the Foreman test scenario, where at a channel SNR of $\gamma < 4.9$ dB, $Mode 0$ is the most frequently used mode, while at $\gamma >9.2$ dB, the probability of using $Mode 2$ is higher than that of $Mode 0$ and $Mode 1$ .

FIGURE 14. - PDF of three enhanced modes versus channel SNR for the Foreman test sequence.

Figure 15 compares the e-PLR and the image quality (PSNR) performances of our enhanced scheme to other counterparts for both the Foreman and Football clips.

FIGURE 15. - The comparison between the enhanced and the conventional adaptive system for the Foreman (left column) and the Football (right column) test sequences, where the first, second and third rows present $P(L_{0})_{e}$ , $P(L_{1})_{e}$ and the image quality (PSNR) versus channel SNR, respectively.

Figures 15(a) and 15(b) depict the $P(L_{0})$ value versus the channel SNR, where we can observe that the $P(L_{0})$ of our enhanced $Mode 1$ and $Mode 2$ outperforms that of the other counterparts. Additionally, the enhanced adaptive scheme is capable of protecting the BL almost as well as $Mode 0$ . This is because the enhanced $Mode 1$ shown in Figure 10 also takes into account the BL by implanting the $s_{0}$ bit stream into its $s_{2}$ counterpart and hence the mode-switching between $Mode 0$ and $Mode 1$ only imposes a modest degradation on the BL compared to the conventional adaptive scheme. However, the difference between the two adaptive schemes becomes more distinguishable in Figures 15(c) and 15(d). The channel SNRs required to achieve $P(L_{1})_{e}\leq 5\%$ and $\leq 1\%$ for the Foreman test sequence can be found in Table 5, where the enhanced $Mode 1$ substantially outperforms the other modes and results in a 1.4 dB SNR gain compared to its conventional counterpart, hence improving the image quality (PSNR) performance of the corresponding adaptive system. Furthermore, since our enhanced $Mode 2$ also takes into account $L_{1}$ , as shown in Figure 11, it also yields a better $P(L_{1})_{e}$ value than its conventional counterpart, hence imposing a low $P(L_{1})_{e}$ degradation during mode-switching of the adaptive system. Therefore, it can be observed from Figures 15(c) and 15(d) that the $P(L_{1})_{e}$ value of our enhanced adaptive system is closer to that of the enhanced $Mode 1$ , while an obvious video degradation is observed for its conventional adaptive counterpart.

TABLE 5 The Channel SNR Required for $P(L_{1})_{e}$ of the $Foreman$ Test Sequence

Figures 15(e) and 15(f) show the image quality (PSNR) performance versus the channel SNR for the two video sequences. Observe that the enhanced $Mode 1$ and $Mode 2$ are capable of yielding a better PSNR performance than their conventional fixed counterparts, namely the conventional $Mode 1$ and $Mode 2$ . Observe in Figure 15(e) that to reach a PSNR of 38 dB the enhanced $Mode 1$ outperforms its conventional counterpart by about 0.8 dB, while a 2.2 dB power reduction is achieved by the enhanced $Mode 2$ compared to the conventional $Mode 2$ . It can be seen from Figures 15(e) and 15(f) that our enhanced adaptive scheme is capable of yielding an improved PSNR over a large proportion of the channel SNR range compared to its corresponding fixed-mode counterparts. A subjective video-quality comparison of the benchmarkers recorded for the Football test sequence is presented in Figure 16, where both the adaptive schemes ensure an unimpaired subjective video quality, but the proposed adaptive system is capable of providing a better video quality, as shown in Figure 15(f).

FIGURE 16. - Comparison of frames at the channel SNR of 11 dB for the Football sequence. The five columns (from left to right) represent the original video, the enhanced adaptive scheme, the conventional adaptive scheme, the UEP scheme and the EEP scheme, respectively.

SECTION IV.

Conclusions

In this paper, we proposed an MS-STSK assisted adaptive system for wireless video streaming, which adaptively selects the most appropriate enhanced IL-FEC scheme with the aid of the pre-recorded thresholds, as shown in Figure 8. Observe in Figures 4 and 5 that our MS-STSK transceiver is capable of providing UEP by mapping the video bit streams to different MS-STSK components, namely to the ASU, to the classic modulator and to the dispersion matrix generator, according to their importance.

Additionally, the enhanced IL-FEC technique of Figures 10 and 11 was conceived for protecting the ELs, which extends the philosophy of the conventional IL-FEC technique to multiple ELs, hence improving the PLR performance, as shown in Figure 15. Our simulation results shown in Figure 15 illustrate that with the aid of the enhanced IL-FEC technique, our proposed adaptive system is capable of providing the best video quality.

1. Here we introduce the concept of the IL-FEC protected layer as the layer that has to be improved by embedding its bits into some of the other layers.
2. The STSK component in MS-STSK can be further split into the classic L-QAM/PSK modulator and the dispersion matrix generator, as shown in Figure 2.
