Journals & Magazines >IEEE Access >Volume: 4

Multi-Armed Bandit Channel Access Scheme With Cognitive Radio Technology in Wireless Sensor Networks for the Internet of Things

Multi-Armed Bandit Channel Access for Cognitive Radio Based Wireless Sensor Networks.


Topic: Future Networks: Architectures, Protocols, and Applications


Abstract:

The wireless sensor network (WSN) is one of the key enablers for the Internet of Things (IoT), where WSNs will play an important role in the future Internet in application scenarios such as healthcare, agriculture, environment monitoring, and smart metering. However, today's radio spectrum is very crowded due to the rapidly increasing popularity of various wireless applications. Hence, a WSN that exploits cognitive radio technology, namely a cognitive radio-based WSN (CR-WSN), is a promising solution to the spectrum scarcity problem of IoT applications. A major challenge in CR-WSNs is utilizing the spectrum more efficiently. Therefore, a novel channel access scheme is proposed to address how cognitive users should access multiple channels without knowledge of the environment, so as to maximize system throughput. The problem is modeled as an i.i.d. multi-armed bandit with M cognitive users and N arms (M < N). To handle competition and fairness among the cognitive users of WSNs, a fair channel-grouping scheme is proposed. The proposed scheme divides the channels into M groups, each containing at least one channel, according to the water-filling principle based on the UCB-K index of the learning algorithm, and then fairly allocates a channel group to each cognitive user by means of a distributed learning algorithm. Finally, the experimental results demonstrate that the proposed scheme not only effectively solves the problem of collisions between cognitive users and improves the utilization of idle spectrum, but also ensures fairness in channel selection among cognitive users.


Published in: IEEE Access ( Volume: 4)

Page(s): 4609 - 4617

Date of Publication: 17 August 2016

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2016.2600633


SECTION I.

Introduction

In the future, WSNs are expected to be integrated into the Internet of Things [1], where reconfigurable, flexible, and intelligent sensors join the Internet dynamically and use it to collaborate and accomplish their tasks in a wide range of application domains [2]–[16], such as big data, the Internet of Things, e-commerce, multimedia medical devices, virtual and augmented reality, and environment monitoring. Multimedia applications of WSNs require very low power consumption, higher bandwidth, and more available spectrum [17]–[23]. However, today's radio spectrum is very crowded due to the rapidly increasing popularity of various wireless applications. Cognitive radio technology therefore stands out as a key spectrum-efficient communication approach for resource-constrained wireless sensor networks and future wireless networks [24]–[26]. When cognitive users, namely sensor nodes in wireless sensor networks, access the spectrum competitively, effective mechanisms are required to coordinate their behaviors (transmission power control, spectrum access, etc.) so as to use the spectrum efficiently and meet the throughput demands of multimedia applications [27]–[35]. At the same time, the wireless environment is becoming increasingly complex, which makes it more difficult for cognitive users to obtain complete information about the network environment parameters. In addition, upcoming 5G network architectures, with very high spectrum utilization and ultra-low power consumption, are expected to support massive numbers of devices with limited bandwidth and minimal content delay, which poses a major challenge given the already crowded spectrum. Hence, distributed channel access under unknown environment information has become a hot research topic in cognitive radio based wireless sensor networks.

Many existing works (see [36]–[41]) have studied distributed spectrum access when the statistical information of the environment is unknown. Among them, [36] and [37] studied the Upper Confidence Bound (UCB) index algorithm for a single cognitive user based on the multi-armed bandit model, while [38]–[41] focused on the multi-user case, transforming the unknown system model into a non-Bayesian multi-armed bandit model. Because the single-user case is overly simple, the current trend is to study multi-armed bandit models with multiple users and multiple channels, in which no cognitive user knows the statistical information of the channel environment. Collisions occur when more than one cognitive user accesses the same channel simultaneously. To address this problem, [42] proposes an adaptive random access scheme that reduces collisions among cognitive users. Priority access and fair access schemes are also adopted in [43] and [44], and they can effectively avoid collisions among cognitive users. Based on the UCB1 index algorithm, [45] puts forward the d-UCB4 algorithm, which is suitable for multi-user scenarios. Reference [46] proposes a distributed online learning access scheme called DSLA, which can be widely applied to distributed learning systems whose channels have different availability probabilities.

Most current works, such as [47] and [48], are limited in that a cognitive user can select only one channel at a time. In this situation, when the selected channel is sensed to be busy, no data can be transmitted and the user has to wait for the next slot, even though other channels may currently be unoccupied, which wastes spectrum resources. To address this problem, we propose a scheme based on channel grouping. In each time slot, a cognitive user senses the channels in its group one by one until it finds an idle channel or has sensed every channel in the group once. The channels are divided into groups according to the water-filling principle based on the modified UCB-tuned index. In addition, many works do not take fairness among cognitive users into consideration, so in this paper we introduce a fair access scheme based on the channel-grouping method. Finally, the experimental results show that the proposed scheme achieves regret that grows logarithmically with the number of time slots and increases the number of times that channels with small idle probabilities are selected, which ensures fairness among the cognitive users.

This paper is organized as follows. Section II introduces the system model. Section III presents the proposed scheme, namely the distributed learning algorithm, the principle of channel grouping, and the access principle. Section IV presents the simulation and analysis of the scheme. Finally, Section V summarizes the paper and points out future research directions.

SECTION II.

System Model

As shown in Fig. 1, a CR-WSN coexisting with a licensed system is considered. Time is slotted, and the $N$ channels are independent of each other.

FIGURE 1.

The structure of the system.


We denote the time slot by $n$ and the reward (i.e., throughput) of an arbitrary channel $i$ ($1\leq i\leq N$) by the random variable $X_{i}(n)$, normalized so that $X_{i}\in[0,1]$. The mean of $X_{i}(n)$ is $\mu_{i}=E[X_{i}]$. Different channels have different means $\mu_{i}$, which are unknown to the cognitive users, and there is no information exchange or communication between cognitive users.

In the distributed CR-WSN, each cognitive user can select one opportunistic channel from the $N$ channels in the channel set $\mathcal{N}\triangleq\{1,\ldots,N\}$. Each channel is divided into consecutive time slots. As shown in Fig. 2, each channel has two states, $s_{i}(n)=1$ and $s_{i}(n)=0$: $s_{i}(n)=1$ means that in slot $n$ the channel is not occupied by the licensed user (i.e., the channel is available), and $s_{i}(n)=0$ means that it is occupied (i.e., the channel is not available). The channel state space is $S=\{s_{1}(n),s_{2}(n),\ldots,s_{i}(n),\ldots,s_{N}(n)\}$. For channel $i$, the normalized instantaneous reward in slot $n$ is $X_{i}(n)=s_{i}(n)\times B$, $X_{i}\in[0,1]$, where we let $B=1$. The state of each channel is i.i.d. over time and follows a Bernoulli distribution whose parameter differs from channel to channel. The objective is to maximize the total throughput or, equivalently, to minimize the regret, so we formulate the problem as an i.i.d. MAB model. In this model, cognitive users do not know the channel state information and must estimate and predict channel availability through exploration and learning. Let $\pi$ denote the scheme applied in practice. Its performance is evaluated by its regret, i.e., the difference between the reward obtainable in an ideal environment (in which the channel statistics are known) and the reward actually obtained under the scheme:
\begin{equation} \Re^{\pi}(\Theta;n)=n\sum_{i\in O_{M}^{\ast}}\mu_{i}-E^{\pi}\left[\sum_{t=1}^{n}S_{\pi(t)}(t)\right] \end{equation}

FIGURE 2.

Channel model in the system.


Here, $\pi$ is the adopted strategy, $\Theta=\{\mu_{i},1\leq i\leq N\}$ is the set of channel availabilities, $O_{M}^{\ast}$ is the set of the $M$ channels with the largest mean rewards, $E$ denotes expectation, and $S_{\pi(t)}(t)$ is the total throughput obtained by the scheme in slot $t$, given by
\begin{equation} S_{\pi(t)}(t)=\sum_{m=1}^{M}\sum_{i=1}^{N}X_{i}(t)\times\Omega_{m,i}(t) \end{equation}

In (2), $\Omega_{m,i}(t)$ captures collisions between cognitive users: $\Omega_{m,i}(t)=1$ if cognitive user $m$ is the only user accessing channel $i$ in slot $t$, and $\Omega_{m,i}(t)=0$ otherwise. For convenience, let $V_{m,i}^{\pi}(n)=\sum_{t=1}^{n}\Omega_{m,i}(t)$. Since $E[X_{i}(t)]=\mu_{i}$, the total regret can be written as
\begin{equation} \Re^{\pi}(\Theta;n)=n\sum_{i\in O_{M}^{\ast}}\mu_{i}-\sum_{m=1}^{M}\sum_{i=1}^{N}\mu_{i}\times E\left[V_{m,i}^{\pi}(n)\right] \end{equation}
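The quantities in (1)–(3) can be made concrete with a minimal Python sketch. It assumes Bernoulli channels with $B=1$, as in the text; the function names and the `policy` callback are hypothetical illustrations rather than part of the original scheme, and a single seeded run only estimates the expectation in (1).

```python
import random
from collections import Counter

B = 1  # normalized bandwidth; the text sets B = 1, so X_i(n) = s_i(n)


def sample_states(theta, rng):
    """One slot of channel states: s_i = 1 (idle) with probability theta[i], else 0."""
    return [1 if rng.random() < p else 0 for p in theta]


def omega(choices, N):
    """Collision indicator of (2): omega[m][i] = 1 iff user m is alone on channel i."""
    load = Counter(choices)  # number of users on each channel in this slot
    out = [[0] * N for _ in choices]
    for m, i in enumerate(choices):
        if load[i] == 1:
            out[m][i] = 1
    return out


def slot_throughput(x, choices):
    """S(t) of (2): sum over users m and channels i of X_i(t) * Omega_{m,i}(t)."""
    om = omega(choices, len(x))
    return sum(x[i] * om[m][i] for m in range(len(choices)) for i in range(len(x)))


def oracle_reward(theta, M, n):
    """First term of (1) and (3): n times the sum of the M largest means."""
    return n * sum(sorted(theta, reverse=True)[:M])


def regret_from_counts(theta, M, n, V):
    """Regret (3), where V[m][i] stands for E[V_{m,i}(n)]."""
    earned = sum(theta[i] * V[m][i] for m in range(len(V)) for i in range(len(theta)))
    return oracle_reward(theta, M, n) - earned


def empirical_regret(theta, M, n, policy, seed=0):
    """Single-run estimate of (1). `policy(t)` is a hypothetical callback that
    returns the channel chosen by each of the M users in slot t."""
    rng = random.Random(seed)
    total = 0
    for t in range(1, n + 1):
        x = [s * B for s in sample_states(theta, rng)]
        total += slot_throughput(x, policy(t))
    return oracle_reward(theta, M, n) - total
```

For example, `empirical_regret([1.0, 1.0, 0.0], 2, 20, lambda t: [0, 0])` returns `40.0`: both users always collide on channel 0, so $\Omega_{m,i}(t)=0$ in every slot and the whole oracle reward is lost.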

SECTION III.

The Principle of the Proposed Scheme

A. Distributed Learning Algorithm Based on Modified UCB-Tuned Index

Because the statistical information of the channel environment is completely unknown to the cognitive users, the channel information must be predicted by a learning algorithm. The distributed learning algorithm proposed in this paper, called UCBT-K for short, modifies the tuned upper confidence bound (UCBT) index by introducing a variance factor into the index, and it is a generalized form of the modified UCBT. UCBT-K can select the channel with the K-th largest index value [49] for an arbitrary K. For each of the $M$ cognitive users, UCBT-K proceeds as follows.

Initialization: Cognitive user $m$ senses all the channels one by one. When channel $i$ is sensed in time slot $t$, its estimated reward is set to the instantaneous throughput, $\hat{\mu}_{m,i}(t)=X_{i}(t)$, and its sensing count is set to $T_{i}(t)=1$.
Channel Estimation: Calculate the modified UCB-tuned index of each channel according to (4):
\begin{equation} \hat{\mu}_{m,i}(t-1)+\sqrt{\frac{\log(t)}{T_{i}(t-1)}}\times\sqrt{\min\left(\frac{1}{4},\,\delta_{i}^{2}(t-1)+\sqrt{\frac{2\log(t)}{T_{i}(t-1)}}\right)} \end{equation}
where $\delta_{i}^{2}(t)$ is the reward variance of channel $i$ after $t$ time slots. Then, a channel set $o_{K}$ is constructed that contains the channels with the $K$ largest index values.
Channel Selection: Select a channel $k$ from $o_{K}$ according to (5):
\begin{equation} k=\arg\min_{i\in o_{K}}\left[\hat{\mu}_{m,i}(t-1)+\sqrt{\frac{\log(t)}{T_{i}(t-1)}}\times\sqrt{\min\left(\frac{1}{4},\,\delta_{i}^{2}(t-1)+\sqrt{\frac{2\log(t)}{T_{i}(t-1)}}\right)}\right] \end{equation}
Because $o_{K}$ holds the channels with the $K$ largest index values, the minimizer in (5) is the channel with the $K$-th largest index.
Update Information: If the selected channel is idle, cognitive user $m$ updates its average reward $\hat{\mu}_{m,k}(t)$ as
\begin{equation} \hat{\mu}_{m,k}(t)=\frac{\hat{\mu}_{m,k}(t-1)\times T_{k}(t-1)+X_{k}(t)}{T_{k}(t-1)+1} \end{equation}
and sets $T_{k}(t)=T_{k}(t-1)+1$; otherwise, $\hat{\mu}_{m,k}(t)=\hat{\mu}_{m,k}(t-1)$ and $T_{k}(t)=T_{k}(t-1)$.
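The four UCBT-K steps above can be sketched for one cognitive user in Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: `log` is taken as the natural logarithm, both logarithms use the current slot $t$ as in (5), and the variance $\delta_i^2$ is estimated from a running mean of squared rewards (the text does not say how it is computed). `observe` applies (6) whenever it is called; the caller decides when to call it (the text updates only when the selected channel is idle).

```python
import math


def ucbt_index(mean, var, pulls, t):
    """Modified UCB-tuned index of (4) for one channel at slot t (pulls >= 1)."""
    v = min(0.25, var + math.sqrt(2 * math.log(t) / pulls))
    return mean + math.sqrt(math.log(t) / pulls) * math.sqrt(v)


class UCBTKUser:
    """One cognitive user running UCBT-K over N channels (sketch)."""

    def __init__(self, N, K):
        self.K = K
        self.mean = [0.0] * N     # hat{mu}_{m,i}
        self.sq_mean = [0.0] * N  # running mean of X^2, used for delta_i^2 (assumption)
        self.pulls = [0] * N      # T_i

    def initialize(self, states):
        """Initialization: sense every channel once, so T_i = 1."""
        for i, x in enumerate(states):
            self.mean[i], self.sq_mean[i], self.pulls[i] = float(x), float(x * x), 1

    def indices(self, t):
        """Channel Estimation: index (4) for every channel."""
        out = []
        for i in range(len(self.mean)):
            var = max(0.0, self.sq_mean[i] - self.mean[i] ** 2)
            out.append(ucbt_index(self.mean[i], var, self.pulls[i], t))
        return out

    def select(self, t):
        """Channel Selection (5): argmin over o_K, i.e. the K-th largest index."""
        idx = self.indices(t)
        o_K = sorted(range(len(idx)), key=lambda i: idx[i], reverse=True)[: self.K]
        return min(o_K, key=lambda i: idx[i])

    def observe(self, k, x):
        """Update Information (6): incremental mean, then T_k <- T_k + 1."""
        T = self.pulls[k]
        self.mean[k] = (self.mean[k] * T + x) / (T + 1)
        self.sq_mean[k] = (self.sq_mean[k] * T + x * x) / (T + 1)
        self.pulls[k] = T + 1
```

Taking the minimum over the top-$K$ set, rather than sorting and indexing, mirrors the $\arg\min_{i\in o_K}$ form of (5) directly.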

B. The Principle of Channels Grouping

There are $M$ distributed cognitive users and $N$ channels in the system ($M<N$). The principle of channel grouping is illustrated in Fig. 3.

FIGURE 3.

The channel sensing based on channel group.


First, the channel with the largest index value in the set $o_{M}$ (the $M$ channels with the largest index values) is taken as channel $A$ of the first channel group ($G_{1}=\{A\}$). The channel with the second-largest index value becomes channel $A$ of the second group ($G_{2}=\{A\}$), and the remaining channels in $o_{M}$ are grouped in the same way, which yields $M$ channel groups $G_{1},G_{2},\ldots,G_{M}$. Next, the remaining $N-M$ channels are distributed among the $M$ channel groups according to the iterative water-filling principle: the channel with the $(M+1)$-th largest index value is added, as channel $B$, to the group whose index values currently have the smallest sum, and the next channel added to that group is called channel $C$. The same rule is applied to the channels with the $(M+2)$-th through $N$-th largest index values, which forms a new channel-group set $\boldsymbol{o}_{M}^{\ast}=\{G_{1}=\{A,B,\ldots\},G_{2}=\{A,B,\ldots\},\ldots,G_{M}=\{A,B,C,\ldots\}\}$. The set $\boldsymbol{o}_{M}^{\ast}$ still includes $M$ channel groups, so each of the $M$ cognitive users has a corresponding channel group. Because each group must contain at least one channel, we assume that the number of channels $N$ is larger than the number of cognitive users $M$. Each time slot consists of three parts: detection, access, and decision; we assume that the first and third parts are short enough to be neglected. In this article, a cognitive user is allowed to sense more than one channel in a time slot.
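The grouping rule and the one-by-one sensing within a group can be sketched in Python as follows. This is a minimal sketch: `index` stands for any list of UCB-tuned index values (for example, those produced by (4)), channels are numbered from 0, and ties, both between equal indices and between groups with equal sums, are broken by the lower number, which is an assumption the text does not specify.

```python
def group_channels(index, M):
    """Water-filling grouping of Section III-B (sketch).

    index[i] is the UCB-tuned index of channel i (0-based). Returns M lists;
    each list holds its channels in the order they were added (A, B, C, ...).
    """
    N = len(index)
    if N < M:
        raise ValueError("each of the M groups needs at least one channel")
    order = sorted(range(N), key=lambda i: index[i], reverse=True)
    groups = [[order[g]] for g in range(M)]  # channel A of each group
    sums = [index[order[g]] for g in range(M)]
    for i in order[M:]:  # channels with the (M+1)-th to N-th largest index
        g = min(range(M), key=lambda j: sums[j])  # group with the smallest index sum
        groups[g].append(i)
        sums[g] += index[i]
    return groups


def sense_group(group, states):
    """Sense the group's channels one by one; return the first idle one, or None."""
    for i in group:
        if states[i] == 1:
            return i
    return None
```

For example, `group_channels([0.9, 0.8, 0.7, 0.6, 0.5], 3)` returns `[[0], [1, 4], [2, 3]]`: the fourth channel joins the group with sum 0.7, and the fifth then joins the group with sum 0.8.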

C. The Access of Channel-Groups With Fairness

Since there are multiple cognitive users in the system, a reasonable channel access scheme is needed to avoid collisions among them. In [42], a priority access scheme is adopted to avoid collisions. First, each cognitive user is assigned a priority rank. Then, each cognitive user chooses a channel according to its rank. For example, cognitive user 1 has the first priority, so it always chooses the channel with the largest index value; cognitive user 2 has the second priority, so it always selects the channel with the second-largest index value; and the other cognitive users select channels in the same way. As a result, each cognitive user has its own channel. Although this scheme effectively avoids collisions among cognitive users, it is not fair, because higher-priority users always obtain the channels with larger index values. Therefore, to solve the unfairness problem, we propose an access scheme based on channel grouping, as shown in Fig. 4.

FIGURE 4.

Channel-group access scheme with fairness.


As mentioned above, each cognitive user has a corresponding channel group. For an arbitrary cognitive user $m$ in time slot $t$, the index $j$ of the selected channel group $G_{j}$ satisfies
\begin{equation} j=\left((m+t)\bmod M\right)+1 \end{equation}
Hence, in every slot the $M$ cognitive users occupy $M$ distinct groups, and over any $M$ consecutive slots each user visits every group exactly once.
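The contrast between the priority rule of [42] and the rotation in (7) can be shown with two small Python functions. The names `priority_channel` and `fair_group` are hypothetical; cognitive users are numbered $1,\ldots,M$ as in the text.

```python
def priority_channel(rank, index):
    """Priority rule (for contrast): the user with priority `rank` (1-based)
    always takes the channel with the rank-th largest index."""
    order = sorted(range(len(index)), key=lambda i: index[i], reverse=True)
    return order[rank - 1]


def fair_group(m, t, M):
    """Group number j of (7) for cognitive user m (1..M) in slot t."""
    return ((m + t) % M) + 1


M = 3
for t in range(1, 4):
    # Each row is a permutation of 1..M, so no two users share a group in a slot.
    print(t, [fair_group(m, t, M) for m in range(1, M + 1)])
```

Under `priority_channel`, user 1 receives the best channel in every slot, whereas under `fair_group` each user cycles through all $M$ groups every $M$ slots, so the better groups are shared evenly while collisions are still avoided.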

The implementation of the proposed scheme is shown in Fig. 5; it consists of three parts: distributed learning, channel grouping, and fair access.

FIGURE 5.

The flow chart of the proposed scheme.


Lemma 1:

To bound the regret, let $l$ be a positive integer, $n$ the total number of time slots, and $t$ ($1\leq t\leq n$) an arbitrary time slot. $K_{m}^{\ast}(t)$ is the optimal channel with the $K$-th largest index for cognitive user $m$, and $Q_{i}^{m}(n)$ is the number of times that cognitive user $m$ chooses a non-optimal channel $i$ (i.e., $i\neq K_{m}^{\ast}(t)$). Let $C_{t,n_{i}}=\sqrt{\frac{(L+1)\ln(t)}{n_{i}}}$, and let $\hat{\mu}_{i,n_{i}}$ be the average reward of channel $i$ after $n_{i}$ observations. When $\mu_{i}<\mu_{K_{m}^{\ast}}$,
\begin{align}
\Pr\left\{\hat{\mu}_{j(t),n_{j(t)}}\leq\mu_{j(t)}-C_{t,n_{j(t)}}\right\}&\leq t^{-4} \\
\Pr\left\{\hat{\mu}_{i,n_{i}}\geq\mu_{i}+C_{t,n_{i}}\right\}&\leq t^{-4}
\end{align}

Proof:

Since $\mu_{i}<\mu_{K_{m}^{\ast}}$, there is at least one channel $j(t)\in\boldsymbol{o}_{K_{m}^{\ast}}^{\ast}$. If the cognitive user chooses channel $i$ in slot $t$, then
\begin{equation} \hat{\mu}_{j(t),Q_{j(t)}^{m}(t-1)}+C_{t-1,Q_{j(t)}^{m}(t-1)}\leq\hat{\mu}_{i,Q_{i}^{m}(t-1)}+C_{t-1,Q_{i}^{m}(t-1)} \end{equation}

$Q_{i}^{m}(n)$ can be expressed as
\begin{align}
Q_{i}^{m}(n)&=1+\sum_{t=N+1}^{n}\parallel\{I_{i}(t)\} \notag\\
&\leq l+\sum_{t=N+1}^{n}\parallel\{I_{i}(t),\,Q_{i}^{m}(t-1)\geq l\} \notag\\
&\leq l+\sum_{t=N+1}^{n}\Big(\parallel\{I_{i}(t),\,\mu_{i}<\mu_{K_{m}^{\ast}},\,Q_{i}^{m}(t-1)\geq l\}+\parallel\{I_{i}(t),\,\mu_{i}>\mu_{K_{m}^{\ast}},\,Q_{i}^{m}(t-1)\geq l\}\Big)
\end{align}

where $I_{i}(t)=1$ if $Q_{i}^{m}$ increases by one in slot $t$ and $I_{i}(t)=0$ otherwise, and $\parallel\{x\}=1$ if $x$ is true and $\parallel\{x\}=0$ otherwise.

According to (10), we have
\begin{align}
\sum_{t=N+1}^{n}\parallel\{I_{i}(t),\,\mu_{i}<\mu_{K_{m}^{\ast}},\,Q_{i}^{m}(t-1)\geq l\}
&\leq\sum_{t=N+1}^{n}\parallel\left\{\hat{\mu}_{j(t),Q_{j(t)}^{m}(t-1)}+C_{t-1,Q_{j(t)}^{m}(t-1)}\leq\hat{\mu}_{i,Q_{i}^{m}(t-1)}+C_{t-1,Q_{i}^{m}(t-1)},\ Q_{i}^{m}(t-1)\geq l\right\} \notag\\
&\leq\sum_{t=N+1}^{n}\parallel\left\{\min_{0<n_{j(t)}<t}\hat{\mu}_{j(t),n_{j(t)}}+C_{t-1,n_{j(t)}}\leq\max_{l\leq n_{i}<t}\hat{\mu}_{i,n_{i}}+C_{t-1,n_{i}}\right\} \notag\\
&\leq\sum_{t=1}^{\infty}\sum_{n_{j(t)}=1}^{t-1}\sum_{n_{i}=l}^{t-1}\parallel\left\{\hat{\mu}_{j(t),n_{j(t)}}+C_{t,n_{j(t)}}\leq\hat{\mu}_{i,n_{i}}+C_{t,n_{i}}\right\}
\end{align}

From (12), whenever $\hat{\mu}_{j(t),n_{j(t)}}+C_{t,n_{j(t)}}\leq\hat{\mu}_{i,n_{i}}+C_{t,n_{i}}$, at least one of the following inequalities is true:
\begin{align}
\mu_{j(t)}&<\mu_{i}+2C_{t,n_{i}} \\
\hat{\mu}_{j(t),n_{j(t)}}&\leq\mu_{j(t)}-C_{t,n_{j(t)}} \\
\hat{\mu}_{i,n_{i}}&\geq\mu_{i}+C_{t,n_{i}}
\end{align}

Let $\Delta_{K_{m}^{\ast},i}=\mu_{K_{m}^{\ast}}-\mu_{i}$. For $l\geq\frac{8\ln n}{\Delta_{K_{m}^{\ast},i}^{2}}$, using $n_{i}\geq l$, $t\leq n$, $\mu_{j(t)}\geq\mu_{K_{m}^{\ast}}$, and $C_{t,n_{i}}=\sqrt{2\ln t/n_{i}}$ (i.e., $L=1$), we have
\begin{align}
\mu_{j(t)}-\mu_{i}-2C_{t,n_{i}}&\geq\mu_{K_{m}^{\ast}}-\mu_{i}-2\sqrt{\frac{2\Delta_{K_{m}^{\ast},i}^{2}\ln t}{8\ln n}} \notag\\
&\geq\mu_{K_{m}^{\ast}}-\mu_{i}-\Delta_{K_{m}^{\ast},i}=0
\end{align}

Therefore, inequality (13) is false, and at least one of (14) and (15) must hold.
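The threshold argument in (16) can be checked numerically with a short Python sketch. It assumes $L=1$, so that $C_{t,n_i}=\sqrt{2\ln t/n_i}$, and uses an illustrative gap $\Delta=0.1$ and horizon $n=10^4$ that are not taken from the paper; the helper names are hypothetical.

```python
import math


def min_samples(n, delta):
    """Smallest integer l with l >= 8 ln n / delta^2, the condition used in (16)."""
    return math.ceil(8 * math.log(n) / delta ** 2)


def confidence_radius(t, n_i, L=1):
    """C_{t,n_i} = sqrt((L + 1) ln t / n_i); L = 1 matches the factor 2 in (16)."""
    return math.sqrt((L + 1) * math.log(t) / n_i)


n, delta = 10**4, 0.1
l = min_samples(n, delta)
# The radius shrinks as n_i grows and grows with t, so t = n, n_i = l is the worst case;
# there 2 * C_{t,n_i} <= delta, which is exactly why inequality (13) cannot hold.
print(l, 2 * confidence_radius(n, l) <= delta)
```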

Applying the Chernoff-Hoeffding bound [46] to (14) and (15) gives
\begin{align}
\Pr\left\{\hat{\mu}_{j(t),n_{j(t)}}\leq\mu_{j(t)}-C_{t,n_{j(t)}}\right\}&\leq e^{-4\ln t}=t^{-4} \\
\Pr\left\{\hat{\mu}_{i,n_{i}}\geq\mu_{i}+C_{t,n_{i}}\right\}&\leq e^{-4\ln t}=t^{-4}
\end{align}
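For reference, the step from (14) and (15) to the $t^{-4}$ bounds can be written out with Hoeffding's inequality for $n_i$ i.i.d. rewards in $[0,1]$, $\Pr\{\hat{\mu}_{i,n_i}\geq\mu_i+c\}\leq e^{-2n_ic^2}$ (and symmetrically for the lower tail). The derivation below assumes $L=1$ in $C_{t,n_i}$, the value implied by the factor 2 inside the square root in (16).

```latex
\begin{align}
\Pr\left\{\hat{\mu}_{i,n_{i}}\geq\mu_{i}+C_{t,n_{i}}\right\}
  &\leq \exp\left(-2n_{i}C_{t,n_{i}}^{2}\right)
   = \exp\left(-2n_{i}\cdot\frac{(L+1)\ln t}{n_{i}}\right)
   = t^{-2(L+1)} \notag\\
  &= t^{-4} \quad\text{for } L=1 \notag
\end{align}
```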

Lemma 2:

When $\mu_{i}>\mu_{K_{m}^{\ast}}$, the following inequalities hold, similarly to Lemma 1:
\begin{align}
\Pr\left\{\hat{\mu}_{i,n_{i}}\leq\mu_{i}-C_{t,n_{i}}\right\}&\leq t^{-4} \\
\Pr\left\{\hat{\mu}_{h(t),n_{h(t)}}\geq\mu_{h(t)}+C_{t,n_{h(t)}}\right\}&\leq t^{-4}
\end{align}

Proof:

Suppose that $\mu_{i}>\mu_{K_{m}^{\ast}}$ and that cognitive user $m$ selects channel $i$ in slot $t$. There are two possible situations:

If $\boldsymbol{o}_{K_{m}^{\ast}}=\boldsymbol{o}_{K_{m}^{\ast}}^{\ast}$, then
\begin{equation} \hat{\mu}_{i,Q_{i}^{m}(t-1)}-C_{t-1,Q_{i}^{m}(t-1)}\leq\hat{\mu}_{K_{m}^{\ast},Q_{K_{m}^{\ast}}^{m}(t-1)}-C_{t-1,Q_{K_{m}^{\ast}}^{m}(t-1)} \end{equation}
If $\boldsymbol{o}_{K_{m}^{\ast}}\neq\boldsymbol{o}_{K_{m}^{\ast}}^{\ast}$, the set $\boldsymbol{o}_{K_{m}^{\ast}}$ contains at least one channel $h(t)\notin\boldsymbol{o}_{K_{m}^{\ast}}^{\ast}$, and we obtain
\begin{equation} \hat{\mu}_{i,Q_{i}^{m}(t-1)}-C_{t-1,Q_{i}^{m}(t-1)}\leq\hat{\mu}_{h(t),Q_{h(t)}^{m}(t-1)}-C_{t-1,Q_{h(t)}^{m}(t-1)} \end{equation}

In both situations, let $\Phi_{K_{m}^{\ast}}$ be the set of channels with the $K$-th largest index, and let $\boldsymbol{o}_{K_{m}^{\ast}-1}^{\ast}=\boldsymbol{o}_{K_{m}^{\ast}}^{\ast}\setminus\Phi_{K_{m}^{\ast}}$. If cognitive user $m$ chooses channel $i$ in slot $t$, there is a channel $h(t)\notin\boldsymbol{o}_{K_{m}^{\ast}-1}^{\ast}$ that satisfies
\begin{equation} \hat{\mu}_{i,Q_{i}^{m}(t-1)}-C_{t-1,Q_{i}^{m}(t-1)}\leq\hat{\mu}_{K_{m}^{\ast},Q_{K_{m}^{\ast}}^{m}(t-1)}-C_{t-1,Q_{K_{m}^{\ast}}^{m}(t-1)} \end{equation}

In the same way, we obtain
\begin{align}
\hat{\mu}_{i,Q_{i}^{m}(t-1)}-C_{t-1,Q_{i}^{m}(t-1)}&\leq\hat{\mu}_{h(t),Q_{h(t)}^{m}(t-1)}-C_{t-1,Q_{h(t)}^{m}(t-1)} \\
\mu_{i}&<\mu_{h(t)}+2C_{t,n_{i}} \\
\hat{\mu}_{i,n_{i}}&\leq\mu_{i}-C_{t,n_{i}} \\
\hat{\mu}_{h(t),n_{h(t)}}&\geq\mu_{h(t)}+C_{t,n_{h(t)}}
\end{align}

At least one of (25)–(27) is true. For $l\geq\frac{8\ln n}{\Delta_{K_{m}^{\ast},i}^{2}}$, we have
\begin{equation} \mu_{i}-\mu_{h(t)}-2C_{t,n_{i}}\geq\mu_{i}-\mu_{K_{m}^{\ast}}-\Delta_{K_{m}^{\ast},i}\geq0 \end{equation}

By (28), inequality (25) is false, so at least one of (26) and (27) must be true.

Applying the Chernoff-Hoeffding bound to (26) and (27), respectively, gives
\begin{align}
\Pr\left\{\hat{\mu}_{i,n_{i}}\leq\mu_{i}-C_{t,n_{i}}\right\}&\leq e^{-4\ln t}=t^{-4} \\
\Pr\left\{\hat{\mu}_{h(t),n_{h(t)}}\geq\mu_{h(t)}+C_{t,n_{h(t)}}\right\}&\leq e^{-4\ln t}=t^{-4}
\end{align}

Theorem 1:

The expected regret under the proposed scheme is bounded and grows logarithmically with the number of time slots.

Proof:

The number of times that cognitive user $m$ selects a non-optimal channel $i$ can be expressed as
\begin{align}
Q_{i}^{m}(n)&=1+\sum_{t=N+1}^{n}\parallel\{I_{i}(t)\} \notag\\
&\leq l+\sum_{t=N+1}^{n}\parallel\{I_{i}(t),\,Q_{i}^{m}(t-1)\geq l\} \notag\\
&\leq l+\sum_{t=N+1}^{n}\Big(\parallel\{I_{i}(t),\,\mu_{i}<\mu_{K_{m}^{\ast}},\,Q_{i}^{m}(t-1)\geq l\}+\parallel\{I_{i}(t),\,\mu_{i}>\mu_{K_{m}^{\ast}},\,Q_{i}^{m}(t-1)\geq l\}\Big)
\end{align}
where $I_{i}(t)=1$ if $Q_{i}^{m}$ increases by one in slot $t$ and $I_{i}(t)=0$ otherwise, and $\parallel\{x\}=1$ if $x$ is true and $\parallel\{x\}=0$ otherwise.

Two cases must be considered: $\mu_{i}<\mu_{K_{m}^{\ast}}$ and $\mu_{i}>\mu_{K_{m}^{\ast}}$. By Lemma 1 and Lemma 2, the expectation of $Q_{i}^{m}(n)$ satisfies
\begin{align}
E\left[Q_{i}^{m}(n)\right]
&\leq\left\lceil\frac{8\ln n}{\Delta_{\min,i}^{2}}\right\rceil \notag\\
&\quad+\sum_{t=1}^{\infty}\sum_{n_{j(t)}=1}^{t-1}\sum_{n_{i}=l}^{t-1}\Big(\Pr\{\hat{\mu}_{j(t),n_{j(t)}}\leq\mu_{j(t)}-C_{t,n_{j(t)}}\}+\Pr\{\hat{\mu}_{i,n_{i}}\geq\mu_{i}+C_{t,n_{i}}\}\Big) \notag\\
&\quad+\sum_{t=1}^{\infty}\sum_{n_{i}=l}^{t-1}\sum_{n_{h(t)}=1}^{t-1}\Big(\Pr\{\hat{\mu}_{i,n_{i}}\leq\mu_{i}-C_{t,n_{i}}\}+\Pr\{\hat{\mu}_{h(t),n_{h(t)}}\geq\mu_{h(t)}+C_{t,n_{h(t)}}\}\Big) \notag\\
&\leq\frac{8\ln n}{\Delta_{\min,i}^{2}}+1+2\sum_{t=1}^{\infty}\sum_{n_{j(t)}=1}^{t-1}\sum_{n_{i}=1}^{t-1}2t^{-4} \notag\\
&\leq\frac{8\ln n}{\Delta_{\min,i}^{2}}+1+\frac{2\pi^{2}}{3}
\end{align}
The last step uses $\sum_{t\geq1}(t-1)^{2}t^{-4}\leq\sum_{t\geq1}t^{-2}=\pi^{2}/6$.
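The bound (32) and the constant $2\pi^2/3$ can be checked numerically with a short Python sketch. The gap $\Delta=0.1$ is an illustrative value, not one stated in the paper, and the helper names are hypothetical.

```python
import math


def expected_pulls_bound(n, delta):
    """Right-hand side of (32): 8 ln n / delta^2 + 1 + 2 pi^2 / 3."""
    return 8 * math.log(n) / delta ** 2 + 1 + 2 * math.pi ** 2 / 3


def tail_term(T):
    """Truncation at t = T of the triple sum in (32), including the factor 2.

    The two inner sums each have t - 1 terms, so the summand is 2 (t-1)^2 t^-4,
    and 2 * sum_t 2 (t-1)^2 / t^4 <= 4 * sum_t 1/t^2 = 2 pi^2 / 3.
    """
    return 2 * sum(2 * (t - 1) ** 2 / t ** 4 for t in range(1, T + 1))


for n in (10**2, 10**4, 10**6):
    print(n, round(expected_pulls_bound(n, 0.1), 1))
print(tail_term(10**5), 2 * math.pi ** 2 / 3)
```

Multiplying $n$ by 100 adds only $8\ln 100/\Delta^2$ to the bound, which is the logarithmic growth stated in Theorem 1.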

For any cognitive user $m$, the regret can be bounded as follows:
\begin{align}
\Re^{\pi}(\Theta,m,n)&\leq\sum_{i=1}^{N}Q_{i}^{m}(n)\times\mu_{\max}+\sum_{h\neq m}\sum_{i\in o_{m}^{\ast}}Q_{h}^{m}(n)\,\mu_{i} \\
\Re^{\pi}(\Theta,n)&=\sum_{m=1}^{M}\Re^{\pi}(\Theta,m,n) \notag\\
&\leq M\left(N+M\times(M-1)\right)\times\left(\frac{8\ln n}{\Delta_{\min}^{2}}+1+\frac{2\pi^{2}}{3}\right)\times\mu_{\max}
\end{align}

SECTION IV.

Simulation and Analysis

A. The Complexity Analysis of Regret

According to Theorem 1, when $n/\ln n\geq C$ (where $C$ is a constant that depends on the numbers of cognitive users and channels, $C=8(N+M)/\Delta_{\min}^{2}+(1+2\pi^{2}/3)N+M$), we obtain a loose bound on the learning regret as the total number of time slots $n$ becomes very large:
\begin{align}
\boldsymbol{R}^{\pi}(\boldsymbol{\Theta},n)\leq{}&M(N-M)\left(\frac{8\ln n}{\Delta_{\min}^{2}}+1+\frac{2\pi^{2}}{3}\right)\mu_{\max} \notag\\
&+M^{3}\left(1+\frac{2\pi^{2}}{3}\right)\mu_{\max}
\end{align}
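The growth of the bound (35) with $n$ can be evaluated with a short Python sketch. The sketch uses $8\ln n/\Delta_{\min}^2$ for the gap term, matching (32) and the constant $C$ (an assumption on our part), and takes $\Delta_{\min}=0.1$ and $\mu_{\max}=0.9$ for the $M=4$, $N=9$ setting of Section IV, which are illustrative readings rather than values stated with the bound.

```python
import math


def regret_bound(M, N, n, delta_min, mu_max):
    """Bound (35), with the gap term read as 8 ln n / delta_min^2 (assumption)."""
    k = 1 + 2 * math.pi ** 2 / 3
    return (M * (N - M) * (8 * math.log(n) / delta_min ** 2 + k) * mu_max
            + M ** 3 * k * mu_max)


# Illustrative setting: M = 4, N = 9, delta_min = 0.1 (spacing of the means), mu_max = 0.9.
for n in (10**3, 10**4, 10**5):
    b = regret_bound(4, 9, n, 0.1, 0.9)
    print(n, round(b), round(b / math.log(n)))
```

As $n$ grows, the ratio of the bound to $\ln n$ approaches $8M(N-M)\mu_{\max}/\Delta_{\min}^2$, which reflects the $O(M(N-M)\ln n)$ scaling.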

From (35), the regret of the proposed scheme scales as $O(M(N-M)\ln n)$. Based on a survey of the existing literature, Table 1 summarizes how the regret of different schemes grows:

TABLE 1 The Complexity of Regret

As Table 1 shows, the regret of conventional schemes such as $\rho^{rand}$ and time division fair sharing (TDFS) grows as a third-order power of the number of cognitive users $M$ and the number of channels $N$, whereas the schemes adopted in this paper achieve a lower, second-order complexity. We therefore further compare the performance of the proposed scheme with that of the comparison schemes (the DLP scheme and the DLF scheme) through simulation.

B. Simulation Results

In the simulations, the channels are assumed to be mutually independent, and the availability of each channel follows a Bernoulli process whose parameter differs from channel to channel and is unknown to the $M$ cognitive users. Each channel has only two states: busy ($s_{i}(n)=0$) or idle ($s_{i}(n)=1$). When the channel is idle, the throughput is 1; otherwise, it is 0. Each simulation is run 100 times and the results are averaged. There are four cognitive users ($M=4$) and nine channels ($N=9$) with $\Theta=\{0.9,0.8,\ldots,0.2,0.1\}$. Fig. 6 compares the accumulated regret of the proposed scheme with that of the priority access scheme (DLP) and the fair access scheme (DLF). It shows that the regret of all three strategies grows logarithmically with the number of time slots and that the proposed scheme yields lower regret.
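The simulation setup above, with evenly spaced Bernoulli means and results averaged over 100 independent runs, can be sketched as a small Python harness. The helper names are hypothetical, and this harness only reproduces the setup; it does not reproduce the paper's results.

```python
import random
import statistics


def theta_grid(N):
    """Channel means 0.9, 0.8, ... for N channels, as in the simulation settings."""
    return [round(0.9 - 0.1 * i, 1) for i in range(N)]


def idle_fraction(p, n, seed):
    """Fraction of n slots in which a Bernoulli(p) channel is idle, for one run."""
    rng = random.Random(seed)
    return sum(rng.random() < p for _ in range(n)) / n


def averaged(metric, runs=100):
    """Average a per-run metric over `runs` independent seeds."""
    return statistics.mean(metric(seed) for seed in range(runs))


theta = theta_grid(9)  # M = 4, N = 9 setting
best_per_slot = sum(sorted(theta, reverse=True)[:4])  # oracle throughput per slot
print(theta, best_per_slot)
print(averaged(lambda s: idle_fraction(theta[0], 1000, s)))
```

The same `averaged` wrapper can be applied to any per-run metric, such as the accumulated regret of a strategy, to obtain the 100-run averages described in the text.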


FIGURE 6.

The total regret strategies ($n=10^{4}$ ).


In the case of $M=3$ and $N=5$ with $\Theta=\{0.9,\ldots,0.6,0.5\}$, the total number of time slots is $n=10^{4}$. As shown in Fig. 6, the simulation results converge after approximately 4000 slots; that is, with the proposed scheme, cognitive users can estimate the channel information after about 4000 slots. We also verify the fairness of the three strategies. The number of times each channel is selected by each cognitive user is shown in Tables 2, 3, and 4, respectively, which clearly show that the proposed scheme significantly increases the utilization of channels with smaller idle probabilities.

TABLE 2 The Case of Each Channel Selected With DLP

TABLE 3 The Case of Each Channel Selected With DLF

TABLE 4 The Case of Each Channel Selected With Proposed Scheme

Bar charts of the channel selection under the three strategies are shown in Fig. 7, Fig. 8, and Fig. 9, respectively.


FIGURE 7.

The channel selection under DLP ($n=10^{4}$ ).


FIGURE 8.

The channel selection under DLF ($n=10^{4}$ ).


FIGURE 9.

The channel selection under proposed scheme ($n=10^{4}$ ).


Fig. 7 shows severe unfairness among the cognitive users. Fig. 8 shows that fairness among different cognitive users is achieved, but channels with smaller idle probabilities, such as channels $i=4$ and $i=5$, are rarely used. Fig. 9 shows that the proposed scheme not only achieves fairness among the cognitive users but also makes better use of channels with smaller idle probabilities. Because channel selection is fair, every channel is selected more often, which avoids the situation in which only one or a few channels are selected while most of the others remain unused. The more channels are selected, the more opportunities cognitive users have to access channels, which improves the efficiency of channel usage.

SECTION V.

Conclusion

In this paper, a CR-WSN system with multiple cognitive users and multiple channels is considered, in which the statistical information of the channels is completely unknown to the cognitive users. To solve the problem of channel selection in CR-WSNs, a novel fair access scheme with channel grouping is proposed. First, an online learning method called modified UCB-K, based on the well-known UCB algorithm, is used. Then, the channels are divided into several groups according to the principle of channel grouping, which improves the utilization of idle spectrum. In addition, distributed learning with fairness is adopted to avoid collisions between cognitive users while ensuring fairness among them. Finally, the simulation results demonstrate the superiority of the proposed scheme. With the development of the Internet of Things, networks keep growing in scale. How to obtain and process large amounts of channel information in larger-scale wireless sensor networks is a promising research topic, and schemes based on wireless big data may be adopted.
