Loading web-font TeX/Math/Italic
A Grid-Based Clustering Algorithm via Load Analysis for Industrial Internet of Things | IEEE Journals & Magazine | IEEE Xplore

A Grid-Based Clustering Algorithm via Load Analysis for Industrial Internet of Things

Open Access

Prolonging the network lifecycle of the industrial WSNs is concerned in GCA. The simulation result shows that GCA prolongs the network lifetime effectively based on the l...

Abstract:

In the Industrial Internet of Things (IIoT), wireless sensor network (WSN) technology makes devices that communicate with each other. The information integrated from mult...Show More
Topic: Convergence of Sensor Networks, Cloud Computing, and Big Data in Industrial Internet of Thing

Abstract:

In the Industrial Internet of Things (IIoT), wireless sensor network (WSN) technology makes devices that communicate with each other. The information integrated from multiple data sources will be transformed into productivity. However, the clusters close to the base station take a considerable load over multi-hop transmission, and in this case, the lifetime of the industrial WSN is restricted. To solve this problem, a grid-based clustering algorithm via load analysis for IIoT is presented in this paper. First, the network load is quantitatively analyzed and then a load model is constructed. Furthermore, a set of expressions is deduced to indicate the network load distribution. It is concluded that the number of delivered packets in each level is related to the grid length at that level. The optimal grid length is obtained by solving polynomials to achieve the uniform energy consumption of nodes at each level. Finally, the network is partitioned into unequal grids according to the optimal cluster size and all the nodes of a grid are formed into a cluster. Results of the experiments show that compared with ACT, ER-HEED, and RUHEED, our algorithm balances energy depletion effectively and extends the whole network lifetime.
Topic: Convergence of Sensor Networks, Cloud Computing, and Big Data in Industrial Internet of Thing
Prolonging the network lifecycle of the industrial WSNs is concerned in GCA. The simulation result shows that GCA prolongs the network lifetime effectively based on the l...
Published in: IEEE Access ( Volume: 6)
Page(s): 13117 - 13128
Date of Publication: 25 January 2018
Electronic ISSN: 2169-3536

SECTION I.

Introduction

Industrial internet of things (IIoT) brings the 4th industrial revolution and makes the whole world move to Industry 4.0. It will increase the production efficiency by 25%. Industrial wireless sensor network (IWSNs) is the underlying support technique of IIoT. Meanwhile, the development of cloud computing [1], [2] and mobile communicating technology [3] have made wireless sensor networks (WSNs) ubiquitous, numerous fields such as environmental monitoring [4], [5], industrial production [6], smart home [7], [8], traffic monitoring [9], [10] and health care [11] have employed WSNs to collect data periodically from a monitored area. Some sensor nodes with finite energy, computation ability, and storage ability self-organize into a network to accomplish application tasks. However, the limited energy of sensor nodes (their batteries cannot be replaced conveniently) restricts the lifetime of a network. Prolonging the lifetime of a network is an important task.

For prolonging the lifetime of a network, Cluster-based routing protocols are superior to flat routing protocols [12]–​[14]. However, energy consumption imbalance remains as a problem [15], [16]. In equal size clustering routing protocols, the cluster heads (CHs) close to the base station (BS) are burdened with more loads than those far from the BS [17]–​[19]. Thus, the sensor nodes in these clusters exhaust their energy faster than those in other clusters, and finally the network is partitioned into pieces. Consequently, the network is unable to coverage the whole monitored area. Unequal size clustering has advantage to achieve balancing energy consumption of the network and prolonging the network lifetime. Since there are more CHs near the BS to undertake the network load than that under equal size clustering. In general, the network lifetime is defined as the round when the first dead node appears. Ideally, supposing that all nodes in a network deplete their energy at an equal rate, the network lifetime will be the longest. To realize the goal that maximizing the network lifetime, it is crucial to balance energy consumption of nodes in the network. The CH in a cluster receives packets from its cluster members (CMs) and other CHs. Thus, it consumes more energy than CMs and plays a key role in the network. Accordingly, several points need to be considered to achieve the goal:

  1. How to obtain the optimal unequal cluster sizes?

  2. How to select the optimal CHs?

  3. When and how to rotate the CHs?

The information such as location, energy, etc. is usually considered as a parameter to compute the optimal unequal cluster sizes. As for IWSNs, all nodes transmit their packets to the BS and hence the data transmission in IWSNs has characteristics of centripetal. In other words, it is helpful to obtain the optimal unequal cluster sizes by constructing a load analysis model. In the CH selection process, a large number of control packets are exchanged among nodes for selecting an optimal CH. The node with the most energy in a cluster is usually elected to be CH, however, the location of the CH need to be used as a reference when considering the balance of energy consumption. The CH rotation process ensures that the CH has more energy than its CMs. It is also a key step in achieving energy-balanced unequal clustering.

In this study, a grid-based clustering algorithm via load analysis for industrial internet of things (GCA) is proposed for prolonging the network lifetime. The major contribution of this work can be summarized as follows:

  1. A load distribution model is presented for analyzing the network load. Furthermore, on the basis of the analysis of energy consumption, it is concluded that the number of delivered packets in each level is related to the cluster size at that level.

  2. The optimal cluster size can be determined to balance the energy consumption of the network by solving polynomials, and then the network is partitioned into grid clusters according to the obtained size. The load of the network is redistributed with unequal cluster size.

  3. The new CH in each cluster is selected based on the energy and distance information in that cluster. Thus, the energy consumption of the intra transmission is decreased and the overhead of CH selection lessens effectively.

The remainder of this paper is organized as follows: several studies on clustering-based routing algorithms are presented in Section II. The models used in this study are described in Section III. Section IV explains the cluster sizes optimization process. The routing algorithm is described in detail in Section V, and a series of experiments is presented in Section VI. The conclusion is drawn from the research results in Section VII.

SECTION II.

Related Works

Different applications of WSNS have diverse requirements and challenges. However, the energy consumption imbalance is always the key issue in different applications, and not just in IIoT. Existing studies show that cluster-based protocols for energy efficiency optimization can be classified into two categories: CH optimization and cluster size optimization. At present, the latter is receiving more attention than the former. Several cluster size optimization algorithms are described in the following paragraphs.

To solve the problem of uneven energy consumption, numerous unequal clustering algorithms are proposed, which are brought into sharp focus. According to the clustering formation process, these algorithms can be divided into two categories. On the one hand, the network is partitioned into clusters directly. The concept of competition radius is proposed to partition the network into several clusters with unequal sizes, such as in [20]–​[24]. The CHs are selected with localized competition. The competition radius is the same as the cluster size. In [25], FPUC computes the competition radius according to the sensor node’s distance to the sink and the sensor node’s surrounding node density. HUCL [26], which is a hybrid of static and dynamic cluster approach also adopts the competition radius mechanism in the cluster setup phase, and the CH rotating in the intra-cluster effectively reduces the cluster overhead. In the above algorithms, the distribution of CHs is stochastic and the cluster size is usually related to the distance and energy information. On the other hand, the network will be initially partitioned into a series of concentric rings or levels and then the clustering process will be completed. A certain number of CHs are elected in each ring or level. In [27], the network is divided into inner and outer regions, and unequal clustering is implemented in the two different regions. In [18], the area is divided into some virtual tracks around the BS. Nodes located in the same track form clusters with similar sizes. The distance to the BS and the distance among the CHs are two criteria that influence on the cluster sizes at different tracks. But the overlap area should be taken into account when the competition radius is calculated. In [28], the optimal unequal cluster size is obtained at different layers with the symmetrical and unsymmetrical deployment of nodes. In [16], UCRP is evenly dividing the network into multi-layer rings. In [29], COCA is proposed for constructing optimal clustering architecture to minimize the total energy consumption of all sensor nodes. The region is initially divided into rectangles, and then the rectangles are divided into square units with the same width. Hierarchical network, with the uniform distribution of CH, ensures the network coverage ratio, but the overhead will be increased and the scalability of the algorithm is weak.

According to the calculation process of cluster sizes, these algorithms can be divided into three categories to discuss. Many works fall into the same category such as [20]–​[26]. All of these algorithms adopt competition mechanism. The cluster size is usually related to the distance between the CH and the BS or residual energy. That is, CHs close to the BS have smaller cluster sizes than those far from the BS. The optimal cluster sizes are computed on the base of the network performance analysis belongs to the second category. ACT is proposed in [30] for arranging the cluster sizes and transmission ranges of WSNs. The author uses theory analysis to calculate the cluster size based on the relaying load of the CH. The CHs close to the BS avoid the excessive relaying loads. However, the result of calculating the load may be inaccurate. In [31], a mathematical framework for calculating the optimal cluster size by finding the optimal value of k_{\mathrm {opt}} for one-hop communication networks between the CH and the BS is presented. However, the discussion on optimizing cluster size under the multi-hop transmission pattern is not included in this previous study. The methods using fuzzy logic are the third category. In these methods, some parameters are used as input while competing radius as output. A fuzzy logic approach named EAUCF in [32] is adopted to handle uncertainties in estimating the radius of the CHs. Two fuzzy sets of the distance from a sensor node to the BS and the residual energy of a sensor node are regarded as input values. Meanwhile, the competition radius of the tentative CH is the output value. The membership function is the key point, but the author does not provide the selection principle. An improvement of EAUCF is proposed in [33]. The node degree is considered in the competitive radius computation process. Simultaneously, the CH degree is used for cluster formation to overcome the energy consumption imbalance around the BS. In [34], It uses a fuzzy-based CH selection technique for selecting nodes with high residually energy, having more number of neighboring nodes, and high quality of communication link as CHs. In [35], the fuzzy method is used to obtain the unequal cluster size. Distance, energy and load are as input parameters. With fuzzy control mechanism, the accuracy of the optimal cluster size is decided by the input parameters and fuzzy sets, but there is no rule for the fuzzy sets selection. According to the above algorithms, once the network scale is determined, the distribution of network flow is computable when no malicious node attacks the network. Thus, through the analysis of network performance, it is more accurate to calculate the cluster size.

According to the shape of the cluster, these algorithms can be broadly divided into three categories which are circular clusters, grid cluster, and triangle cluster respectively. Most of unequal clustering algorithms adopt circular cluster, such as in [20]–​[24]. In [36], EBCAG partitions the network into clusters with unequal circular cluster sizes, and each sensor node maintains a gradient value. Based on the analysis of the network load, the size of a cluster is decided by the gradient value of its CH. Data gathered from the CMs should follow the direction of the descending gradient to reach the BS. In [15], UMBIC divides the sensor field into number of virtual grids and constructs the candidate sensor set via selecting a certain number of sensor nodes with high energy in each virtual grid. The number of grids depends on the sensor field size and the maximum competition range. In [37], the monitoring area is divided into virtual grids with different sizes. The X-axis and Y-axis in the monitoring area are divided into several parts according to an arithmetic sequence with the equal difference being d . Only a small number of results belong to the third category. ‘Sierpinski Triangle’ is used to partition the network into unequal clusters in [38]. The manner of the cluster formation is static mode. The number of clusters is decided by the number of iteration. From the above discussion, the overlap area should be taken into account when the shape of the cluster is circle, and not when the cluster shape is grid or triangle.

Through the above analysis, the cluster size is usually related to the value of energy, distance, coverage and link quality, etc., regardless of the method used. In this study, a grid-based clustering algorithm via load analysis for industrial internet of things is proposed. The network is partitioned into grid clusters, which increases the accuracy of load analysis. The entire network is covered by grid clusters with the overlap area not being considered. An energy consumption analysis is also conducted. By analyzing the load quantization and the energy balancing constraint, the optimal cluster sizes can be obtained without complex computations. Moreover, we prove that the clusters close to the BS have smaller cluster size than that far from the BS. Then, the network is partitioned into several grid clusters with obtained unequal cluster sizes. Each selected CHs collects the packets from CMs and then forwards them to the BS with the help of other CHs.

SECTION III.

Models Presentation

A. Network Model

It is assumed that the monitored IIoT area is a rectangular. N sensor nodes are uniformly distributed in the monitored IIoT area. The BS is near the monitored area with a fixed location. The parameters and situations of the network are assumed as follows:

  1. All sensor nodes with a unique ID are static and their transmission radii are adjustable.

  2. The network is divided into n-levels clusters based on the load analysis, and named as q_{1} , q_{2}, \ldots, q_{n} in sequence.

  3. The shape of the cluster is square with the length of l_{i} .

  4. Each sensor node generates packets at the speed of k bits in a period of t seconds and sends the packets to its CH.

  5. The packets dealt with by the CHs at the q_{\mathrm {i+1}} th-level are uniformly forwarded to the CHs at the q_{i} th-level.

A demonstration of the network model is shown in Fig. 1. The grid cluster is used in this study.

FIGURE 1. - Demonstration of the network model.
FIGURE 1.

Demonstration of the network model.

Table 1 lists the description of the parameters and the notations used in this section.

TABLE 1 Parameters and Notations
Table 1- 
Parameters and Notations

As all sensor nodes are distributed uniformly in the monitored area, the network node density \rho is given by \begin{equation} \rho =\frac {N}{L\times W} \end{equation}

View SourceRight-click on figure for MathML and additional features.

B. Energy Model

The energy model assumed in this study has been adopted widely in many research works, such as in [29], [36], and [37]. The energy consumption of sensor nodes generally consists of three parts: transmission energy consumption E_{t} , receiving energy consumption E_{r} and sensing energy consumption E_{s} . E_{s} is usually too small and is thus ignored, such as in [26], [29], and [36]. E_{t} and E_{r} comprise the major portion of energy consumption in this study accordingly. E_{t} is defined as follows when sensor node i transmits a packet with b bits to node j :\begin{equation} E_{t} ={\begin{cases} bE_{elec} +b\varepsilon _{fs} d^{2},&\quad if ~d<d_{0} \\ bE_{elec} +b\varepsilon _{amp} d^{4},&\quad if ~d\ge d_{0} \\ \end{cases}} \end{equation}

View SourceRight-click on figure for MathML and additional features. E_{t} is dependent on the transmission distance d between node i and node j . According to the value of d , the propagation loss can be modeled as a free-space model or muti-path attenuation model, where {\mathrm{ E}}_{elec} is the electronic energy and depends on factors such as digital coding, modulation, and filtering of the signal, \varepsilon _{fs} and \varepsilon _{amp} denote the amplifier energy and depend on the required receiver sensitivity and receiver noise figure. Reception energy consumption E_{r} is defined as follows when sensor node i receives a packet with b bits from node j .\begin{equation} E_{r} =bE_{elec} \end{equation}
View SourceRight-click on figure for MathML and additional features.

C. Load Analysis Model

The network in this study is divided into several grid clusters with the length of l_{i} . Based on the preceding description of the parameters, the load can be formulated as follows.

The number of sensor nodes at the q_{i} th-level cluster is N_{i} , \begin{equation} N_{i}=l_{i}^{2}\rho \end{equation}

View SourceRight-click on figure for MathML and additional features. where l_{i} is the length of the q_{i} th level cluster, and \rho \iota \text{s} the node density of the network. Each sensor node in the network generates one packet with k bits within a time period (e.g. each node generates a packet with k bits per 5 sec). Thus, the total number of bits generated by the q_{i} th-level cluster within a time period can be expressed as \begin{equation} P=N_{i}\times k \end{equation}
View SourceRight-click on figure for MathML and additional features.

The number of CHs at the q_{i} th-level is \begin{equation} S_{i} =\frac {W}{l_{i} } \end{equation}

View SourceRight-click on figure for MathML and additional features.

The packets handled by the q_{i+1} th-level cluster are forwarded to the q_{i} th-level cluster. The CH at the q_{n} th level only deals with the generated packets. Thus, the total number of bits P_{n} dealt with by the CH at the q_{n} th-level is \begin{equation} P_{n} =l_{n}^{2} \rho k \end{equation}

View SourceRight-click on figure for MathML and additional features.

The total number of bits dealt with by the CH at the q_{n-1} th-level can be expressed as (8), which consists of the packets generated by the sensor nodes at the q_{n-1} th-level cluster and the packets received from the q_{n} th-level cluster. The packets at the q_{n} th level are distributed equally to the CHs at the q_{n-1} th level, \begin{align} P_{n-1}\approx&\frac {S_{n} \times P_{n} }{S_{n-1} }+l_{n-1}^{2} \rho k=\frac {\frac {W}{l_{n} }\times l_{n}^{2} \rho k}{\frac {W}{l_{n-1} }}+l_{n-1}^{2} \rho k \notag \\=&l_{n} l_{n-1} \rho k+l_{n-1}^{2} \rho k=(l_{n} +l_{n-1})l_{n-1} \rho k \end{align}

View SourceRight-click on figure for MathML and additional features.

Moreover, the total number of bits P_{n-2} dealt with by each CH at the q_{n-2} th-level can be expressed as \begin{align} P_{n-2}=&\frac {S_{n-1} \times P_{n-1} }{S_{n-2} }+l_{n-2}^{2} \rho k \notag \\=&\frac {\frac {W}{l_{n-1} }\times (l_{n} +l_{n-1})l_{n-1} \rho k}{\frac {W}{l_{n-2} }}+l_{n-2}^{2} \rho k \notag \\=&(l_{n} +l_{n-1})l_{n-2} \rho k+l_{n-2}^{2} \rho k \notag \\=&(l_{n} +l_{n-1} +l_{n-2})l_{n-2} \rho k \end{align}

View SourceRight-click on figure for MathML and additional features.

Thus, the total number of bits handled by each CH at the q_{i} th-level is \begin{align} P_{i}=&\frac {S_{i+1} \times P_{i+1} }{S_{i} }+l_{i}^{2} \rho k \notag \\=&(l_{n} +l_{n-1} +\cdots +l_{i})l_{i} \rho k \notag \\=&l_{i} \rho k\sum \limits _{j=i}^{n} {l_{j} } \end{align}

View SourceRight-click on figure for MathML and additional features. According to (10), once the parameters (such as node density and packet-generated speed) of the network are fixed, the number of packets dealt with by the clusters at each level is related to the cluster size.

SECTION IV.

Cluster Size Optimization

The network lifetime will be prolonged if the total energy consumption is minimized. Energy analysis is achieved using the clustering approach proposed in this study. The phenomenon of energy consumption imbalance typically appears among CHs. because their load is difference. The energy consumption of each CH E_{ch} consists of three parts: (1)receiving energy consumption E_{rch } when a CH receives packets from other CHs, (2)E_{rcm } when the CH receives packets from its CMs and (3)transmission energy consumption E_{t} when the CHs forward the packets they receive and those that they generate by themselves. Thus, E_{ch } can be expressed as \begin{equation} E_{ch} =E_{rch} +E_{rcm} +E_{t} \end{equation}

View SourceRight-click on figure for MathML and additional features.

The energy consumption of CH E_{ch}^{i} at the q_{i} th level can be expressed as \begin{align} E_{ch}^{i}=&E_{rch}^{i} +E_{rcm}^{i} +E_{t}^{i} \notag \\\approx&\frac {S_{i+1} \times P_{i+1} }{S_{i} }E_{elec} +l_{i}^{2} \rho kE_{elec} +P_{i} (E_{elec} +\varepsilon _{fs} d_{i}^{n}) \notag \\=&P_{i} (2E_{elec} +\varepsilon _{fs} d_{i}^{n}) \end{align}

View SourceRight-click on figure for MathML and additional features. In (12), the first part represents the receiving energy consumption when the CHs at the q_{\mathrm {i}} th level receive the packets from the q_{\mathrm {i+1}} th level, the second part represents the receiving energy consumption when the CHs at the q_{\mathrm {i}} th level receive the packet from it CMs, and the last part represents the transmission energy consumption when the CHs at the q_{\mathrm {i}} th level transmit all its receiving packets to the q_{\mathrm {i-1}} th level. n is related to the d_{0} , if d_{\mathrm {i}} is less than d_{0} , n is set as 2 for the free space model, otherwise, n is set as 4 for the multipath model.

The objective of balancing energy consumption is achieved if \begin{equation} E_{ch}^{i} \approx E_{ch}^{j},1\le i\le n, 1\le j\le n \end{equation}

View SourceRight-click on figure for MathML and additional features. is satisfied.

Theorem 1:

Equation (13) is satisfied only if the cluster size l_{i} shortens as the CH becomes close to the BS.

Proof:

Equation (12) shows that the energy consumption of CH is related to the number of bits P_{i } it deals with and the transmission distance. According to (10), the number of bits handled by the CH at the q_{i} th-level increases as the CH is close to the BS. Thus, Equation (13) is satisfied only if transmission distance d_{i} shortens as the CH becomes close to the BS. The transmission distance d_{i} is the distance between two CHs during data transmission and it is related to the cluster size l_{\mathrm {i}} of the cluster at each level. Accordingly, Equation (13) is satisfied only if the cluster size l_{i} shortens as the CH becomes close to the BS. Nevertheless, arranging the location of CHs cannot influence the value of d_{i} .

The total energy consumption can be expressed as \begin{align} E_{total}=&E_{ch}^{1} \times S_{1} +E_{ch}^{2} \times S_{2} +\cdots +E_{ch}^{i}\notag \\&~\qquad \qquad \qquad \qquad \quad \times S_{i} +\cdots +E_{ch}^{n} \times S_{n}\qquad \end{align}

View SourceRight-click on figure for MathML and additional features. Moreover, E_{total} can be expressed as \begin{align} E_{total}=&E_{ch}^{1} \times S_{1} +E_{ch}^{1} \times S_{2} +\cdots +E_{ch}^{1} \notag \\&\times S_{i} +\cdots +E_{ch}^{1} \times S_{n} \notag \\=&E_{ch}^{1} \times W\times \left({\frac {1}{l_{1} }+\frac {1}{l_{2} }+\cdots +\frac {1}{l_{n} }}\right) \notag \\=&P_{1} \times (2E_{elec} +\lambda \varepsilon _{fs} d_{1}^{n})\times W\times \sum \limits _{i=1}^{n} {\frac {1}{l_{i} }} \notag \\=&N\times k\times (2E_{elec} +\varepsilon _{fs} d_{1}^{n})\times l_{1} \times \sum \limits _{i=1}^{n} {\frac {1}{l_{i} }} \quad \end{align}
View SourceRight-click on figure for MathML and additional features.
where, d_{1} is the average distance between the CH at the q_{1} th level and the BS. This variable is related to the size of clusters at the q_{1} th level. The length of cluster size l_{\mathrm {i}} is satisfied as follows:\begin{equation} l_{1} +l_{2} +l_{3} +\cdots +l_{n} =L \end{equation}
View SourceRight-click on figure for MathML and additional features.

The value of E_{\mathrm {total}} should be the minimum and the optimal cluster size l_{i} can be obtained with \begin{equation} {\begin{cases} E_{ch}^{1} \approx E_{ch}^{2} \approx E_{ch}^{3} \approx \cdots \approx E_{ch}^{n} \\ l_{1} +l_{2} +l_{3} +\cdots +l_{n} =L \\ \end{cases}} \end{equation}

View SourceRight-click on figure for MathML and additional features.

The optimal solution is defined as an array {l_{1} , l_{2}, \ldots l_{n} , n }. Many approximate optimal solutions can be obtained by solving the Equation (17) with least square method, and then one of approximate optimal solutions who satisfies the equation (18) will be the optimal solution. That is to say, the cluster size l_{i} and the number of level n is determined finally.\begin{equation} Min(E_{total}) \end{equation}

View SourceRight-click on figure for MathML and additional features.

SECTION V.

GCA Algorithm for IIOT

Clusters are formed after the cluster sizes are obtained, and data transmission is then performed. Clustering-based routing protocols generally consist of three parts: cluster formation, data transmission and cluster maintenance. In the cluster formation phase, the network is partitioned into several clusters, and each cluster has a CH. In the data transmission phase, two transmission modes are intra-transmission and inter-transmission respectively. In the cluster maintenance phase, each CH judged whether the CH rotation process should be run.

A. Cluster Formation

At the beginning, the BS broadcasts a short packet that covers the monitored area. Sensor nodes calculate the location information based on the strength of the receiving signal. According to the analysis in Section 3, once the cluster size is determined, the network is partitioned into grid clusters with the same length at the same level and different lengths at different levels. Since the cluster size of cluster at each level is different, and the same as the number of cluster at each level which is equal to \lceil w/l_{\mathrm {k}} \rceil . Obviously, one case that there is one cluster at each level is not a square may be happened. Thus, once the cluster formation is achieved, a cluster merging process is adopted to judge if the cluster which is not square should be merged. The merging condition is related to the number of nodes in that cluster.

In the CH selection process, an optimal CH is selected in each cluster. A detailed analysis on optimizing cluster size in terms of minimizing communication overhead was presented in [39]. The authors investigated the effects of several parameters on optimizing cluster size. These parameters include cluster size, number of BS, position of CH, and node density. The authors concluded that the optimal CH position is at the center of the cluster. Compared with CHs at random locations or those close to the BS, a CH near the cluster center spends approximately 15% less energy. In this study, the selected CH is the sensor node that closes to the center of the cluster.

In the first round, all the nodes have the same energy, thus, the CM with the minimum distance to the cluster center to which it belongs is the CH. The CH broadcasts a notification packet to notify CMs in its cluster and other CHs in other clusters. If some sensor nodes do not receive any notification packet from its CH within a certain period, the sensor node sends a request packet to find a CH with a minimum distance. In this manner, the cluster is formed.

B. Data Transmission Phase

Data transmission is achieved during this phase in two ways: intra-cluster transmission and inter-cluster transmission. In the former, the CMs of a cluster transmit their packets to the CH. For example, a sensor node “a ” transmits its packets to its CH “b ” as shown in Figure 2. The transmission radii of the CMs are equal to the cluster length, which ensures that CMs can communicate with the CH in one hop. That is \begin{equation} R=l_{i} \end{equation}

View SourceRight-click on figure for MathML and additional features.

FIGURE 2. - Data transmission phase.
FIGURE 2.

Data transmission phase.

In inter-cluster transmission, the packets are forwarded to the BS with the help of the CHs at different levels. The CH chooses one of its neighbors with the minimum distance to the BS and the most energy. As shown in Figure 2, the CH “b ” at the q_{2} th level forwards its packets to the CH “c ” at the q_{3} th level with the minimum distance to the BS and the most energy. Simultaneously, CHs forward their packets uniformly to the CHs at the adjacent levels by considering load balance. In this manner, the data packet is forwarded to the BS.

C. Cluster Maintenance

CH rotation is an essential process ensures to balance the energy consumption of nodes. The rotation frequency and rotation strategy are the key points. This process may be extremely complex in some routing protocols. Negotiation among sensor nodes is typically used to select a new CH with higher remaining energy. Exchanging control packets among nodes will cause extra energy depletion. The CH is the manager of a cluster and can obtain the energy information of its CMs. Thus, in this study, the rotation process starts once the remaining energy of the CH becomes less than a threshold f . The value of threshold f can be expressed as \begin{equation} f=T\times E_{c}^{avg} \end{equation}

View SourceRight-click on figure for MathML and additional features. where, E_{c}^{avg} is the average remaining energy of cluster members in the cluster c , T is a factor related to the rotation frequency.

There are two strategies usually used to choose a head in a cluster, which are the CM has most remaining energy or the CM close to its CH. However, both of these strategies are not optimal. The former will cause more intra transmission cost, and the latter will cause the unbalance energy depletion in a cluster. In this study, both the energy and location information are considered to elect a new CH, the current CH specifies a new CH and sends a notification packet to notify the new CH. Once the new CH receives the notification packet, it broadcasts the notification packet to notify other CMs and other CHs at other levels. This process is similar to the cluster formation phase. In this manner, the complexity of the GCA is O (1).

D. The Pseudocode of GCA

The pseudocode of GCA algorithm is illustrated in Algorithm 1. Before the CH formation process begins, the network is partitioned into n levels and \lceil w/l_{\mathrm {k}} \rceil clusters. The state of node i is the CM in the beginning. Each node chooses the cluster to which it belongs based on its location information. Once the node is chosen as the CH, then it broadcasts head_msg to notify other CHs and all of CMs in its cluster. The CM updates its state and sends join_msg to the CH, and the other CHs update their neighbor table, which is used in the data transmission phase. The cluster maintenance process begins when the energy of the CH is less than a threshold. A new CH near the cluster center is selected by the old CH in its cluster with the most energy. The state of the old CH is changed to be CM. The new CH receives the notify_msg from the old CH and changes the state into CH, and broadcasts head_msg. The following cluster maintenance process is similar to the cluster formation process.

Algorithm 1 Grid-Based Clustering Algorithm

Begin

1

Initialization

The optimal cluster size l_{\mathrm {k}} of each level is obtained based on (17) and (18).

The network is partitioned into [w/l_{\mathrm {k}} ] clusters at each level.

Node i judges the cluster c_{k\mathrm {j}} to which it belongs based on its location information.

2

Cluster formation

If node i near the center of the cluster, it will be chosen as the CH.

State[i ]\leftarrow cluster head.

node i broadcasts Head_msg.

else

node j receives Head_msg

if node j is CH

Updates neighbor table

else State[j ]\leftarrow cluster member

Sends join_msg

end

end

3

cluster maintenance

If the energy of the CH i is below a threshold f .

State[i ]\leftarrow cluster member.

Chooses a node k near the cluster center with the most energy in its cluster.

Sends notify_msg to node k

End

Node k receives notify_msg.

State[k ]\leftarrow cluster head.

node i broadcasts Head_msg.

end.

SECTION VI.

Simulation Results

The algorithms RUHEED [24], ACT [30], and ER-HEED [40] are chosen to evaluate their performance with GCA. Both the GCA algorithm and the ACT algorithm use the energy analysis to form unequal clustering. As to ACT algorithm, there is a slightly error in the energy analysis model. For example, the packets generated by the k th-level clusters are initially forwarded to the k -1th-level. Then, the received packets and those packets generated by the k -1th-level clusters are forwarded to the k -2th level together. However, the number of packets is calculated in a different manner in ACT. In this previous study, the packets transmitted from the k th level to the k -1th level and those transmitted from the k -1th-level to the k -2th-level are accumulated when the receiving load in the k -2th-level is calculated. In this manner, the number of packets calculated is greater than that obtained in the usual manner. That is, the calculation of the received load by the CH is not exact, and this error directly affects the accuracy of the cluster size arrangement. As for RUHEED, unequal cluster mechanism is achieved based on competition radius as in [14], that is, the number of cluster is more than that far away from the BS, multi-hop transmission is adopted between CHs. As for ER-HEED, it employs equal cluster mechanism to achieve cluster process. The CH rotation process is used in both RUHEED and ER-HEED. The CH chooses the CM in its cluster with most remaining energy to be the new CH. The four algorithms are simulated and the comparison among them is shown as follows.

A. Simulation Setup

A total of 96 sensor nodes are distributed uniformly in a 120m\times 80\text{m} rectangle area. The BS is located at the coordinate of (0, 40). The initial energy of each node is 0.5J, except for that of the BS, which has unlimited energy. The initial transmission radius of each node is set to 30m, and the BS has a maximum transmission radius that is equal to the length of the network. The radius or the length of the cluster in each level is presented in Table 2. As for GCA and ACT, the size of cluster is different in each level. The cluster size in RUHEED is range from 21 to 30. The cluster size is equal to 30 in ER-HEED. The number of levels is 3 for ACT and 4 for GCA based on calculations. R_{0} is set as 30m in both RUHEED and ER-HEED. In this study, the lifetime indicates the round when the first dead node appears, that is, the node exhausts its energy. For cluster-based protocols, the lifetime is defined as the number of rounds. The definition of round in this study is a completed data transmission process that begins with data generation and ends when data are received by the BS. The goal in this study is to balance the energy consumption and prolong the network lifetime, if the round when the first dead node appears is delayed effectively, then our goal is achieved.

TABLE 2 Radius or Length of the Cluster
Table 2- 
Radius or Length of the Cluster

The optimal value of T is obtained based on Figure 3, it shows the round when the first dead node appears with the value of T set from 0.1 to 0.9. It is obviously that the round when the first dead node appears is latter than others when T is 0.5. The value of T is related to the CH rotation frequency. The smaller the value of T , the slowest the CH rotation frequency, it will cause the phenomenon of unbalance energy consumption and it will cause more extra control packets generated and more energy depletion on the contrary. Thus, the optimal value of T is set as 0.5 in the whole simulation.

FIGURE 3. - The value of 
$T$
.
FIGURE 3.

The value of T .

B. Performance Evaluation

The average remaining energy is compared among the four algorithms in Figure 4. The average remaining energy of sensor nodes in GCA is superior to the other three algorithms. The difference among ACT, ER-HEED and RUHEED is small, but the performance of ER-HEED is always the worst. The result presents that unequal cluster mechanism outperforms equal mechanism with the simulation situation in this study. In GCA, the network is partitioned based on energy analysis which redistributes the network load, and the CH rotation process pays attention to both the energy information and location information, thus, the energy consumption balancing is achieved better in GCA than that in the other algorithms. GCA is better than ACT, because load analysis is more precise in the former than that in the latter. There is an intersection of the ACT and RUHEED. Since the CH rotation process in RUHEED cost more energy than that in ACT with the operation of the algorithms.

FIGURE 4. - Average remaining energy of nodes.
FIGURE 4.

Average remaining energy of nodes.

We use the number of dead nodes shown in Figure 5 to evaluate the characteristic of energy consumption balancing. The dead node appears the first in ACT, and it appears later in ER-HEED and RUHEED than that in ACT. In GCA, it appears the latest among the four algorithms. The result shows that GCA prolongs the network lifetime effectively. When the round is 500, there are only 18 dead nodes, which less than the other three algorithms, As shown in Figure 4, when the round is 500, the average remaining energy is close to 0.1J, that is, when 10 percent energy is remained, most of nodes are alive, it is verified the effectiveness of GCA proposed in this study. Meanwhile, the balance characteristic of energy consumption is the best in GCA, as verified in Figure 4, which indicates the energy analysis in GCA is more accurate than that in ACT.

FIGURE 5. - The number of dead nodes.
FIGURE 5.

The number of dead nodes.

The CH rotation process is different in four algorithms. In GCA, when the remaining energy of CH is less than half of the average remaining in its cluster the rotation process begins. As for ACT, the rotation process begins when the CH of energy is less than 15% of initial energy. It is different in ER-HEED that the rotation process works in each round, the same as RUHEED. Thus, the energy consumption of CH election in ER-HEED and RUHEED is more than that in GCA and ACT as shown in Figure 6. When the first dead node appears, the CH election process is running among all of nodes. The more the number of clusters, the more energy the CH election costs, vice versa. Accordingly, it appears fluctuations when first dead node appears in ER-HEED and RUHEED. On the contrary, the cost of CH election is less in GCA and ACT, the rotation process begins only when the condition is satisfied.

FIGURE 6. - The cost of CH election.
FIGURE 6.

The cost of CH election.

As shown in Figure 7, the average remaining energy of the CHs in ER-HEED and RUHEED is greater than that of GCA and ACT. Since the node with the most energy is chosen as the new CH in a cluster in ER-HEED and RUHEED. In GCA, the location information is considered and it pays no attention to energy information in ACT. The fluctuations appear in four algorithms due to the rotation process is adopted.

FIGURE 7. - Average remaining energy of CHs.
FIGURE 7.

Average remaining energy of CHs.

The number of CHs is investigated among the four algorithms in Figure 8. According to the clustering mechanism in each algorithm, the number of clusters is decided once the optimal radius or length of the cluster is obtained in ACT and GCA. Thus, the number of clusters is always the same in the whole lifetime. As to ER-HEED and RUHEED, the number of clusters is changed once the first dead node appears, the new CHs are elected among all of nodes, and the number of CH will reduce with the remaining of energy or the number of nodes is decreased.

FIGURE 8. - The number of CHs.
FIGURE 8.

The number of CHs.

An ideal cluster head is considered as close to the center of a cluster, and the average distance between the CH and it CMs will be the minimum. It will spend approximately 15% less energy which is verified in [39]. But the energy information should be used as a reference when rotating the CH just as in GCA. However, in ER-HEED and RUHEED, it only pays attention to the energy information, and in ACT, the location information is the only parameter considered for CHs selection. Based on the above analysis, Figure 9 shows that the distance between the CH and CM is bigger in ER-HEED than the other three algorithms. The number of CH in RUHEED is more than that in ER-HEED as shown in Figure 8, thus, the distance in RUHEED is smaller than that in ER-HEED. The same result can be found between GCA and ACT. The distance between the CH and its CM is the minimum in GCA and hence the energy consumption of intra transmission is the least among the four algorithms.

FIGURE 9. - The average distance between CM and CH.
FIGURE 9.

The average distance between CM and CH.

In IIoT, the requirement of node density is inconsistent according to different production modes. In the following experiments the node density is observed with the size of the monitored area fixed. The number of nodes is added to 100, 200 and 300 gradually. As Figure 10 shown, with the increasing of the number of nodes, the network lifetime shortens. Because the number of packets increases with the number of nodes added, the energy consumption of nodes will speed up. It is obviously that the network lifetime of GCA is superior to the other three algorithms. But the difference among the other three algorithms is not clearly. But when the node increases, the network lifetime of ER-HEED and RUHEED drops fast, because the cost of the CH election will increase with the number of nodes grows.

FIGURE 10. - The round when the first dead node appears with changing the number of nodes.
FIGURE 10.

The round when the first dead node appears with changing the number of nodes.

Figure 11 shows the number of dead nodes in each algorithm when the number of nodes is changed. In each case, the number of dead nodes in GCA is much less than that in the other three algorithms. And the round when the dead node appears is ahead when the number of node increases as the CH undertakes more packets. Similarly, when the number of nodes increases, the performance of ACT is superior to that of ER-HEED and RUHEED as shown in Figure 11. The remaining energy of CH is shown in Figure 12, the performance of ER-HEED and RUHEED outperform GCA and ACT regardless of the number of node. In ER-HEED and RUHEED, they choose a new CH with most energy among nodes at higher frequency. With the algorithms running, the difference among GCA, ER-HEED and RUHEED is small, because the energy consumption of nodes in GCA is more uniform.

FIGURE 11. - Number of dead nodes when the number of nodes is changed.
FIGURE 11.

Number of dead nodes when the number of nodes is changed.

FIGURE 12. - The average remaining energy of the CH when the number of nodes is changed.
FIGURE 12.

The average remaining energy of the CH when the number of nodes is changed.

For showing the variation tendency obviously, we present the cost of the CH election from the 200 rounds to 300 rounds in Figure 13. The variety of energy consumption fluctuates obviously. For each algorithm, the cost of cluster election increases with the number of node grows. Clearly, the CH rotation frequency in ER-HEED and RUHEED is higher than that in GCA and ACT. As for considering the energy information and distance information, the elected new CH in GCA is more ideal than that in other algorithms, thus, the rotation frequency in GCA is the lowest among the four algorithms.

FIGURE 13. - The cost of CH election when the number of nodes is changed.
FIGURE 13.

The cost of CH election when the number of nodes is changed.

In the IIoT, the scale of network is changing with the industrial production scale. Thus, the size of the monitored area is extended gradually with node density fixed in the following experiments. The network lifetime is shown in Figure 14. The initial energy of node is set as 2J. There is no doubt that the network lifetime shortens when the size of the monitored area extends. The network lifetime of GCA is always superior to the other three algorithms, but the difference lessens gradually with the size of the area extends. Thus, as to large scale of network, the performance of GCA needs to be improved further.

FIGURE 14. - The round when the first dead node appears with changing the size of the area.
FIGURE 14.

The round when the first dead node appears with changing the size of the area.

Figure 15 illustrates the remaining energy of nodes when the size of the area extends. As the node density is fixed, the number of nodes tends to grow at an exponential rate. Thus, the speed of energy consumption is fast when the size of area increases gradually. However, the remaining energy of GCA keeps advantage among the four algorithms. At the same, the performance of ER-HEED and RUHEED drops fast as the size of area extends. The result indicates that the CH rotation process influence the performance of algorithm distinctly with the number of nodes increases.

FIGURE 15. - The average remaining energy of nodes when the area is changed.
FIGURE 15.

The average remaining energy of nodes when the area is changed.

SECTION VII.

Conclusion

IIoT is seen as the next industrial revolution that will improve the industrial productivity greatly. In this study, we focus on the problem of energy usage imbalance that occurs in cluster-based routing protocols in IIoT and present a new solution to this problem. Numerous works based on optimizing the number of CHs, fuzzy sets, or completive radii have been proposed to achieve balancing energy consumption. However, load distribution fundamentally affects energy consumption. We present a precise load analysis model to guide cluster size adjustment in this study and prove that the cluster near the BS should be smaller than that far from the BS. We also analyze the total energy consumption of a network. The comparison among GCA, ACT, ER-HEED and RUHEED is shown in Section 6. The death time of the first sensor node is delayed effectively in GCA. The performance of GCA is superior to the other three algorithms when the node density increases and the size of the monitored area extends. When it comes to large scale network, there is still much room for improvement in GCA. Our strategy is not only suitable for IIoT but also for numerous applications that require sensor nodes to collect data periodically, such as in environmental monitoring. Our future work will focus on the type of event trigger for IIoT.

ACKNOWLEDGMENT

The authors express their sincere appreciation to the editors and the anonymous reviewers for their helpful comments.

References

References is not available for this document.