
Learning, Computing, and Trustworthiness in Intelligent IoT Environments: Performance-Energy Tradeoffs



Abstract:

An Intelligent IoT Environment (iIoTe) is comprised of heterogeneous devices that can collaboratively execute semi-autonomous IoT applications, examples of which include highly automated manufacturing cells or autonomously interacting harvesting machines. Energy efficiency is key in such edge environments, since they are often based on an infrastructure that consists of wireless and battery-run devices, e.g., e-tractors, drones, Automated Guided Vehicle (AGV)s and robots. The total energy consumption draws contributions from multiple iIoTe technologies that enable edge computing and communication, distributed learning, as well as distributed ledgers and smart contracts. This paper provides a state-of-the-art overview of these technologies and illustrates their functionality and performance, with special attention to the tradeoff among resources, latency, privacy and energy consumption. Finally, the paper provides a vision for integrating these enabling technologies in energy-efficient iIoTe and a roadmap to address the open research challenges.
Page(s): 629 - 644
Date of Publication: 28 December 2021
Electronic ISSN: 2473-2400

SECTION I.

Introduction

A. Towards the Edge

During the last decade, the need for connecting billions of Internet of Things (IoT) devices has driven a significant part of the design of computing and communication networks. The number of use cases is countless, ranging from smart home to smart city, industrial automation or smart farming. Many of the applications involve huge amounts of data, and the fast, trustworthy and reliable processing of this data is oftentimes infeasible with a cloud-centric paradigm [1], [2]. Moreover, typical hierarchical setups of IoT cloud platforms hinder use cases with dynamically changing context, since the individual subsystems and the overall system they form lack self-awareness. As an alternative, architectures are evolving towards edge solutions that place compute, networking, and storage in close proximity to the devices. At the same time, the introduction of machine-driven intelligence has led to the term edge intelligence, referring to the design of distributed IoT systems with latency-sensitive learning capabilities [3].

Although the edge-centric approach solves the fundamental limitations in terms of latency and dynamism, it also induces new challenges to the edge system: 1) the system has to deal with complex IoT applications which include functions for sensing, acting, reasoning and control, to be collaboratively run in heterogeneous devices, such as edge computers and resource-constrained devices, and generating data from a huge number of data sources; 2) trustworthiness is a big concern for edge and IoT systems where devices communicate with other devices belonging to potentially many different parties, without any pre-established trust relationship among them; and 3) all those functionalities are increasingly based on a resource-limited wireless infrastructure that introduces latency and packet losses in dynamically changing channels.

Another huge concern for the exponential growth of IoT is its scalability and contribution to the carbon footprint. On the one hand, IoT is key in deploying a huge number of applications that will reduce the emissions of numerous sectors and industries (e.g., smart farming or energy) [4]. On the other hand, although many of these devices are low-power, the total energy consumption of the infrastructure that supports such systems does contribute to the digital carbon footprint and cannot be overlooked [5], [6].

B. Intelligent IoT Environments

We coin the term Intelligent IoT Environment (iIoTe) to refer to autonomous IoT applications endowed with intelligence, based on an efficient and reliable IoT/edge (computation) and network (communication) infrastructure that dynamically adapts to changes in the environment and comes with built-in and assured trust. Besides the wireless (and wired) networking that interconnects all IoT devices and infrastructure, there are three other key (and power-hungry) technologies that enable iIoTe. The first one is Machine Learning (ML) and Artificial Intelligence (AI), and therefore we talk about intelligent IoT environments, comprising heterogeneous devices that can collaboratively execute autonomous IoT applications. Given the distributed nature of the system, distributed ML/AI solutions are better suited for multi-node (multi-agent) learning. Edge computing is another defining technology that provides the computation side of the infrastructure and allocates computing resources for complex IoT applications that need to be distributed over multiple, connected IoT devices (e.g., machines and Automated Guided Vehicles (AGVs)). The third pillar is Distributed Ledger Technology (DLT): rather than traditional security mechanisms, DLT has been identified as the most flexible solution for trustworthiness in a fully decentralized and heterogeneous scenario. Combined with smart contracts, it allows the system to autonomously control the transactions between parties without the need for human intervention. All these ingredients are necessary for a fully functional iIoTe, but they inevitably make a significant contribution to the total energy footprint. Our goal is to understand the role of each technology in the performance and energy consumption of an iIoTe.

C. Example: A Manufacturing Plant

A representative use case for iIoTe is a manufacturing plant like the one shown in Fig. 1, with autonomous collaboration between industrial robot arms, machinery and AGVs. This relies on real-time data analysis, adaptability and intelligence in the manufacturing process, which is only feasible with the edge paradigm. The wireless infrastructure interconnects all the machines and robots to the edge network and enables reliable and safe operation. In the figure, the following scene is depicted: a customer (the end-user) of a shared manufacturing plant orders a product by specifying a manufacturing goal (step 1). In step (2), the needed machine orchestration and associated process plan is determined to manufacture the desired product, taking into account the available computation and communication resources. The event-based process planner at the edge node is responsible for observing the manufacturing process and reacting when the health state of a concerned machine changes, for example, by re-scheduling a given task from a non-responding machine. In step (3), the manufacturing process data is sent to the involved machines, which can include, e.g., mobile robots or an AGV to transport the work-pieces between production points, robotic arms, laser engravers, assembly stations, etc. Let us assume that the task requires a robot to pick up a work-piece and place it in different machines for its processing. As these machines may be operated by the plant owner or a third-party operator, contractual arrangements need to be set up, for which a distributed ledger is used. The ledger registers the details of each task for future accountability. In step (4), the local AI on board the different end devices comes into play. For example, in the case of the robot as an end device, its AI decides how to pick up a work-piece and place it in the next machine. In case the local AI of the robot cannot complete its task (e.g., because it has not been trained for a similar situation yet), a human takes over remote control (this can be, e.g., a plant operator). After the human intervention, the local AI can be re-trained based on the data captured from the human input. This scene captures the role and interaction of the three technologies mentioned above: edge computing, ML/AI and DLTs/smart contracts. Similar examples can be defined in other domains, such as agriculture (e.g., autonomously interacting harvesting machines), healthcare (e.g., remote patient monitoring and interventions) and energy (e.g., wind plant monitoring and maintenance).

Fig. 1. iIoTe in a manufacturing plant.

D. Contributions and Outline

In this paper, we analyze the key technologies for the next generation of IoT systems, and the tradeoffs between performance and energy consumption. We notice that characterizing the energy efficiency of these complex systems is a daunting task. The conventional approach has been to characterize every single device or link. Nevertheless, the energy expenditure of an IoT device will strongly depend on the context in which it is put, in terms of, e.g., goal of the communication or traffic behavior. Therefore, we go beyond the conventional single-device approach and use the iIoTe as the basic building block in the energy budget. Contrary to the single device, the iIoTe is able to capture the complex interactions among devices for each of the technologies. The total energy footprint is not just a simple sum of an average per-link or per-transaction consumption of an isolated device, and scaling the number of iIoTe to a large number of instances will give a more accurate picture of the overall energy consumption.

The rest of the paper is organized as follows. In Section II we provide the state-of-the-art of the enabling technologies. Section III analyzes the performance and energy consumption of each enabling technology, and Section IV provides the vision for integrating the enabling technologies in energy-efficient iIoTe. Concluding remarks and a roadmap to address the open research challenges are given in Section V.

SECTION II.

Background and Related Work

A. Edge Wireless Communications

Edge computing enables the processing of the received data closer to the sensors that generated them. This means a full re-design of the communication infrastructure, which must implement additional functionality at the cellular base stations or other edge nodes. The design and performance of communication networks for edge computing have been widely studied in recent years, and an overview can be found in [7] and [8]. One example is the term Mobile Edge Computing (MEC), adopted in 5G to refer to the deployment of cloud servers in the base stations to enable low latency, proximity, high bandwidth, real-time radio network information and location awareness. Specifically, the concept was defined in late 2014 by the European Telecommunications Standards Institute (ETSI): as a complement to the C-RAN architecture, MEC aims to unite the telecommunication and IT cloud services to provide cloud-computing capabilities within radio access networks in the close vicinity of mobile users [9]. One of the most active research areas has been network virtualization and slicing within the MEC paradigm [10]. In the Radio Access Network, several authors have looked at the potential of edge computing to support Ultra-Reliable Low-Latency Communication (URLLC) [11]–[13]. Another research area is the use of machine learning, particularly deep learning techniques, to unleash the full potential of IoT edge computing and enable a wider range of application scenarios [14], [15]. However, most previous works address the communication aspects separately. Even though several papers address joint communication and computation resource management [16], they represent only the first step towards a holistic design of iIoTe and its defining technologies, as well as the integration with the communication infrastructure.

To optimize the energy efficiency of iIoTe, it is important to choose a communication technology that ensures low power consumption and supports massive device connectivity. In this regard, 3GPP introduced Narrowband Internet of Things (NB-IoT), a cellular technology that utilizes a limited amount of licensed spectrum of existing mobile networks to handle a limited amount of bi-directional IoT traffic. Although it uses LTE bands or guard-bands, it is usually classified as a 5G technology. It can achieve up to 250 kbps peak data rate over a 180 kHz bandwidth on an LTE band or guard-band [17], [18].

Compared to other low-power technologies, NB-IoT is interesting for IoT applications with more frequent communications. This is the case for the ones considered in iIoTe, where the intelligent end devices share the updated models frequently and must record new transactions in the ledger. At the same time, NB-IoT keeps the advantages of Low-Power Wide Area (LPWA) technologies: low power consumption and simplicity. Throughout the rest of the paper, we use NB-IoT as a representative wireless technology for our analyses of iIoTe. Other wireless technologies will follow similar access procedures and energy-performance trade-offs.

For an analysis of the energy consumption and battery lifetime of NB-IoT under different configurations we refer the reader to [19]. A key point for this analysis is the study of the communication exchange during the access procedure: the devices that attempt to communicate through a base station must first complete a Random Access (RA) procedure to transit from Radio Resource Control (RRC) idle mode to RRC connected mode. Only in RRC connected mode can data be transmitted in the uplink through the Physical Uplink Shared Channel (PUSCH) or in the downlink through the Physical Downlink Shared Channel (PDSCH). The standard 3GPP RA procedure consists of four message exchanges: preamble (Msg1), uplink grant (Msg2), connection request (Msg3), and contention resolution (Msg4) (see Figure 2, where the example of recording some data, e.g., a DLT transaction, is depicted). Out of these, Msg3 and Msg4 are scheduled transmissions where no contention takes place.

Fig. 2. Random Access procedure in NB-IoT.

The NB-IoT preambles are orthogonal resources transmitted in the Narrowband Physical Random Access Channel (NPRACH) and used to perform the RA request (Msg1). A preamble is defined by a unique single-tone and pseudo-random hopping sequence. The NPRACH is scheduled to occur periodically in specific subframes; these are reserved for the RA requests and are commonly known as Random Access Opportunities (RAOs). To initiate the RA procedure, the devices select the initial subcarrier randomly, generate the hopping sequence, and transmit it at the next available RAO. The orthogonality of preambles implies that multiple devices can access the base station in the same RAO if they select different preambles. Next, the grants are transmitted to the devices through the Narrowband Physical Downlink Control Channel (NPDCCH) within a predefined period known as the RA response window. However, the number of preambles is finite and collisions can happen. In case of collision, each collided device may retransmit a preamble after a randomly selected backoff time.

The specification provides sufficient flexibility in the configuration of the RA process, which makes it feasible to adjust the protocol and find the right balance between reliability, latency, and energy consumption for a given application. Specifically, the network configures the preamble format and the maximum number of preamble transmissions depending on the cell size, and this has an impact on the preamble duration and on the total duration of the procedure [20]. Increasing the number of preamble transmissions reduces the erasure probability, but at the cost of higher energy consumption and larger latency. The same energy-reliability-latency tradeoff applies to other messages, including the RA response. Moreover, scheduling the NPRACH and NPDCCH consumes resources that would otherwise be used for data transmission. Therefore, each implementation must find an adequate balance between the amount of resources dedicated to NPRACH, NPDCCH, PUSCH, and PDSCH.
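To make this tradeoff concrete, the following minimal sketch computes the expected latency and energy of the preamble phase as a function of the maximum number of preamble transmissions. The per-attempt erasure probability, the timing values and the power values are illustrative assumptions, not figures from the 3GPP specification or from this paper.

```python
# Illustrative sketch of the energy-reliability-latency tradeoff of NB-IoT
# preamble retransmissions. All numeric values (durations, powers, erasure
# probability) are assumptions, not taken from 3GPP or from this paper.

def ra_tradeoff(p_erasure, max_attempts, t_preamble=0.0064, t_backoff=0.04,
                p_tx=0.2, p_idle=0.01):
    """Expected latency [s] and energy [J] (conditioned on eventual success),
    plus the overall success probability of the preamble phase."""
    p_success_total = 1.0 - p_erasure ** max_attempts
    exp_latency, exp_energy = 0.0, 0.0
    for k in range(1, max_attempts + 1):
        # Probability that the k-th attempt is the first successful one.
        p_k = (p_erasure ** (k - 1)) * (1.0 - p_erasure)
        exp_latency += p_k * (k * t_preamble + (k - 1) * t_backoff)
        exp_energy += p_k * (k * t_preamble * p_tx + (k - 1) * t_backoff * p_idle)
    return exp_latency / p_success_total, exp_energy / p_success_total, p_success_total

for n_max in (1, 2, 4, 8):
    lat, en, ps = ra_tradeoff(p_erasure=0.3, max_attempts=n_max)
    print(f"max attempts={n_max}: P(success)={ps:.3f}, "
          f"E[latency]={lat * 1e3:.1f} ms, E[energy]={en * 1e3:.2f} mJ")
```

As the maximum number of attempts grows, the success probability approaches one while the expected latency and energy keep increasing, which is exactly the tradeoff discussed above.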

B. Distributed Learning Over Wireless Networks

Implementing intelligent IoT systems with distributed ML/AI over wireless networks (e.g., NB-IoT) needs to consider the impact of the communication network (latency and reliability under communication overhead and channel dynamics) and on-device constraints (access to data, energy, memory, compute, privacy, etc.). Obtaining high-quality trained models without sharing raw data is of utmost importance, and redounds to the trustworthiness of the system. In this view, Federated Learning (FL) has received a groundswell of interest in both academia and industry. Its underlying principle is to train an ML model by exchanging model parameters (e.g., Neural Network (NN) weights and/or gradients) among edge devices under the orchestration of a federation server and without revealing raw data [21]. Therein, devices periodically upload their model parameters after local training to a parameter server, which in return performs model averaging and broadcasts the resultant global model to all devices. FL was proposed by Google for its predictive keyboards [22] and later on adopted in different use cases in the areas of intelligent transportation, healthcare, industrial automation, and many others [23], [24]. While FL is designed for training over homogeneous agents with a common objective, recent studies have extended the focus towards personalization (i.e., multi-task learning) [25], training over dynamic topologies [26] and robustness guarantees [27], [28]. In terms of improving data privacy against malicious attackers, various privacy-preserving methods, including injecting fine-tuned noise into model parameters via a differential privacy mechanism [29]–[32] and mixing model parameters over the air via analog transmissions [33], [34], have recently been investigated. Despite these advancements, one main drawback in the design of FL is that its communication overhead is proportional to the number of model parameters, calling for the design of communication-efficient FL. In an edge setup with limited communication and computation resources, this introduces training stragglers that degrade the overall training performance. In this view, client scheduling [35]–[37] and computation offloading [12], [38], [39], with the focus on guaranteeing a target training/inference accuracy, have been identified as a promising research direction.
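As a concrete illustration of this training loop, the sketch below runs a few rounds of federated averaging on a toy quadratic problem; the objective, the local update rule and all parameter values are illustrative assumptions rather than the setup of the cited works.

```python
import numpy as np

# One round of federated averaging (FedAvg-style) over a toy quadratic objective.
# Each device n holds private data (A_n, b_n) and locally minimizes ||A_n w - b_n||^2.
rng = np.random.default_rng(0)
num_devices, dim, local_steps, lr = 5, 10, 5, 0.05

local_data = [(rng.normal(size=(20, dim)), rng.normal(size=20)) for _ in range(num_devices)]
global_model = np.zeros(dim)

def local_update(w, A, b):
    """A few steps of gradient descent on the device's private loss."""
    for _ in range(local_steps):
        grad = 2 * A.T @ (A @ w - b) / len(b)
        w = w - lr * grad
    return w

for rnd in range(20):
    # Devices train locally on private data and upload only model parameters.
    local_models = [local_update(global_model.copy(), A, b) for A, b in local_data]
    # The parameter server averages the models and broadcasts the result.
    global_model = np.mean(local_models, axis=0)

print("global model after 20 rounds:", np.round(global_model, 3))
```

Note that only model parameters cross the network: the per-round payload scales with the model dimension, which is the communication overhead discussed above.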

With client scheduling, the number of communication links is reduced (known as link sparsification) and thus the communication bandwidth and energy consumption of distributed learning can be significantly decreased. Additional temporal link sparsity can be introduced by enforcing model sharing policies that account for model changes and/or importance within consecutive training iterations, such as the Lazy Aggregated Gradient Descent (LAG) method [40]. Sparsity can be further exploited by adopting sparse network topologies, which rely on communications within a limited neighborhood in the absence of a central coordinator/helper. While such sparsification improves energy and communication efficiency, it can slow down convergence and lower training and inference accuracy, so the degree of sparsity needs to be optimized in terms of the trade-off between communication cost and convergence speed. In this view, several sparse-topology-based distributed learning methods, including decentralized Gradient Descent (GD), dual averaging [41], learning over graphs [42], [43] and GADMM algorithms [44], [45], have been investigated.

C. Optimizing IoT Application Deployments in IoT Environments

IoT applications typically consist of multiple components. For instance, an IoT application could comprise components for secure data acquisition (e.g., based on Blockchain), data pre-processing, and feeding the data into a neural network (or even through multiple ones) before it acts upon the outcome of the ML inference. In many cases, such composed IoT applications need to be distributed over multiple, connected intelligent IoT devices. An important aspect is then to optimize this allocation of application components to devices. The result of the allocation is an assignment of components to devices that fulfills the constraints and optimizes the performance of the system with respect to some metric. This metric could, for example, be the responsiveness of the application to be maximized, or the overall energy consumption to be minimized, where the latter is reasonable in battery-run wireless systems. An overview of existing allocation approaches is given in [46].

Previous work [47] used Constraint Programming to describe an approach for the efficient distribution of actors to IoT devices. The approach resembles the Quadratic Assignment Problem (QAP) and is NP-hard, resulting in long computation times when scaling up. Samie et al. [48] present another Constraint Programming-based approach that takes bandwidth limitations into account and minimizes the energy consumption of IoT nodes. The system optimizes computation offloading from an IoT node to a gateway; however, it does not consider composed computations that can be distributed to multiple devices.

A Game Theory-based approach is presented in [49] that aims at the joint optimization of radio and computational resources of mobile devices. However, the local optimum computed for multiple users only decides whether to fully offload a computation or to process it entirely on the device.

Based on Non-linear Integer Programming, Sahni et al. [50] present their Edge Mesh algorithm for task allocation that optimizes overall energy consumption and considers data distribution, task dependency, embedded device constraints, and device heterogeneity. However, only basic evaluation and experimentation are done, without performance comparison. Based on Integer Linear Programming (ILP), Mohan and Kangasharju [51] propose a task assignment solver that first minimizes the processing cost and secondly optimizes the network cost, which stems from the assumption that Edge resources may not be highly processing-capable. An intermediary step reduces the sub-problem space by combining tasks and jobs with the same associated costs. This reduces the overall processing costs.

Cardellini et al. [52] describe a comprehensive ILP-based framework for optimally placing operators of distributed stream processing applications, while being flexible enough to be adjusted to other application contexts. Different optimization goals are considered, e.g., application response time and availability. They propose their solution as a unified general formulation of the optimal placement problem and provide an appropriate theoretical foundation. The framework is flexible so that it can be extended by adding further constraints or shifted to other optimization targets. Finally, our previous work [53] has leveraged Cardellini’s framework and has extended it by incorporating further constraints for the optimization goal, namely the overall energy usage of the application.

D. Distributed Ledger Technologies Over Wireless Networks

In recent years, DLT has been the focus of large research efforts spanning several application domains. Starting with the adoption of Bitcoin and Blockchain, DLT has received a lot of attention in the realm of IoT, as the technology promises to help address some of the IoT security and scalability challenges [54]. For instance, in IoT deployments, the recorded data are either centralized or spread out across different heterogeneous parties. These data can be either public or private, which makes it difficult to validate their origin and consistency. In addition, querying and performing operations on the data becomes a challenge due to the incompatibility between different Application Programming Interfaces (APIs). For instance, Non-Governmental Organizations (NGOs), public and private sectors, and industrial companies may use different data types and databases, which leads to difficulties when sharing the data [55]. A DLT system offers a tamper-proof ledger that is distributed on a collection of communicating nodes, all sharing the same initial block of information, the genesis block [56]. In order to publish data to the ledger, a node includes data formatted as transactions in a block with a pointer to its previous block, which creates a chain of blocks, the so-called Blockchain.

A smart contract [57] is a distributed app that lives in the Blockchain. Such an app is, in essence, a programming language class with fields and methods, and its methods are executed in a transparent manner on all nodes participating in a Blockchain [58]. Smart contracts are the main blockchain-powered mechanism that is likely to gain wide acceptance in IoT, where they can encode transaction logic and policies, including the requirements and obligations of parties requesting access, the IoT resource/service provider, as well as data trading over wireless IoT networks [59]. With the aforementioned characteristics, the advantages of integrating DLTs into wireless IoT networks are: 1) guarantee of immutability and transparency for recorded IoT data; 2) removal of the need for third parties; and 3) development of a transparent system for heterogeneous IoT networks that prevents tampering and injection of fake data by the stakeholders.

DLTs have been applied in various IoT areas such as healthcare [60], [61], supply chain [62], smart manufacturing [63], and vehicular networks [64]. In the smart manufacturing area, the work described in [63] investigates DLT-based security and trust mechanisms and elaborates a particular application of DLTs for quality assurance, which is one of the strategic priorities of smart manufacturing. Data generated in a smart manufacturing process can be leveraged to retrieve material provenance, facilitate equipment management, increase transaction efficiency, and create a flexible pricing mechanism.

One of the challenges of implementing DLT in IoT and edge computing is the limited computation and communication capabilities of some of the nodes. In this regard, the authors in [59], [65] worked on the communication aspects of integrating DLTs with IoT systems. The authors studied the trade-off between the wireless communication and the trustworthiness with two wireless technologies, LoRa and NB-IoT.

SECTION III.

Enabling Technologies for iIoTe

This section elaborates on the three enabling technologies for iIoTe: 1) distributed learning; 2) distributed computing; and 3) distributed ledgers.

A. Energy-Efficient Distributed Learning Over Wireless Networks

As shown in Figure 1, each end device in the iIoTe has local AI and the whole system relies on FL. We present learning frameworks that are suitable for iIoTe leveraging two techniques: 1) spatial and temporal sparsity and 2) quantization.

1) Dynamic GADMM:

Standard FL requires a central entity, which plays the role of a parameter server (PS). At every iteration, all nodes need to communicate with the PS, which may not be an energy-efficient solution, especially for a large distributed network of agents/workers, as in the manufacturing use case. Furthermore, a PS-based approach is vulnerable to a single point of attack or failure. To overcome this problem and ensure a more energy-efficient solution, we propose a variant of the standard Alternating Direction Method of Multipliers (ADMM) [66] that decomposes the problem into a set of subproblems that are solved in parallel, referred to as Group ADMM (GADMM) [44]. GADMM extends the standard ADMM to a decentralized topology and enables communication- and energy-efficient distributed learning by leveraging spatial sparsity, i.e., enforcing each worker to communicate with at most two neighboring workers. In GADMM, the standard learning problem (P1) is re-formulated as the following learning problem (P2):
\begin{align*} \left({\mathbf{P1}}\right)~\min_{\{\boldsymbol{\theta}_{n}\}_{n=1}^{N}}&\sum_{n=1}^{N} f_{n}\left(\boldsymbol{\theta}_{n}\right)\tag{1}\\ \left({\mathbf{P2}}\right)~\min_{\{\boldsymbol{\theta}_{n}\}_{n=1}^{N}}&\sum_{n=1}^{N} f_{n}\left(\boldsymbol{\theta}_{n}\right) \\ \mathrm{s.t.}&~\boldsymbol{\theta}_{n} = \boldsymbol{\theta}_{n+1}, \quad n=1,\ldots, N-1.\tag{2}\end{align*}

To this end, GADMM divides the set of workers into two groups, head and tail. Thanks to the equality constraint of (P2), each worker from the head/tail group exchanges its model with only two workers from the tail/head group, forming a chain topology. At iteration k+1, given the models of the tail workers and the dual variables at iteration k, all head workers update their models in parallel since they have no joint constraints. Once the head workers have updated their models, they transmit their updated models to their neighbors from the tail group. Then, in the same way, every tail worker updates its model. Finally, the dual variables are updated locally at each worker. Following this alternation, GADMM allows at most N/2 workers to compete over the available bandwidth, compared to N workers for the PS-based approach. With that, GADMM can significantly increase the bandwidth available to each worker, which reduces the energy wasted in competition for communication resources. The energy expenditure for communication is further reduced by including only two neighboring workers. The detailed algorithm is described in [44].
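The head/tail alternation can be illustrated with the following sketch, which runs a GADMM-style chain on toy quadratic local objectives so that each primal update has a closed form. The objectives, penalty parameter and dimensions are illustrative assumptions; this is not the implementation of [44].

```python
import numpy as np

# Sketch of GADMM over a chain of N workers with quadratic local objectives
# f_n(theta) = 0.5 * ||theta - a_n||^2 (closed-form primal updates).
rng = np.random.default_rng(1)
N, dim, rho, iters = 8, 3, 1.0, 300

a = rng.normal(size=(N, dim))            # private data of each worker
theta = np.zeros((N, dim))               # local models
lam = np.zeros((N - 1, dim))             # dual variable for constraint theta_n = theta_{n+1}

def primal_update(n):
    """Closed-form update for worker n given its chain neighbors' latest models."""
    num, denom = a[n].copy(), 1.0
    if n > 0:                            # left neighbor, constraint index n-1
        num += lam[n - 1] + rho * theta[n - 1]
        denom += rho
    if n < N - 1:                        # right neighbor, constraint index n
        num += -lam[n] + rho * theta[n + 1]
        denom += rho
    return num / denom

for _ in range(iters):
    for n in range(0, N, 2):             # head workers update in parallel
        theta[n] = primal_update(n)
    for n in range(1, N, 2):             # then tail workers update in parallel
        theta[n] = primal_update(n)
    for n in range(N - 1):               # dual variables are updated locally
        lam[n] += rho * (theta[n] - theta[n + 1])

print("consensus model:", np.round(theta.mean(axis=0), 3))
print("optimum (mean of a_n):", np.round(a.mean(axis=0), 3))
```

In each iteration a worker only exchanges its model with its two chain neighbors, which is the spatial sparsity that reduces the communication energy.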

One drawback of GADMM is its slow convergence compared to standard ADMM. In other words, due to the sparsification of the graph, workers require more iterations to converge. To alleviate this issue and combine the fast convergence of standard ADMM with the communication efficiency of GADMM, we have proposed Dynamic GADMM (D-GADMM) [44]. Not only does D-GADMM improve the convergence speed of GADMM, but it also copes with dynamic (time-variant) networks, in which the workers are moving (e.g., the AGVs in the manufacturing plant or the tractors in the agriculture use case), while inheriting the theoretical convergence guarantees of GADMM. In a nutshell, every couple of iterations in D-GADMM, i.e., every system coherence time, two things change: i) the assignment of workers to the head/tail groups, which follows a predefined assignment mechanism, and ii) the neighbors of each worker from the other group. The high-level idea is as follows: the workers are given fixed IDs, and they share a pseudo-random code that is used every \tau seconds, where \tau is the system coherence time, to generate a set of random integers with cardinality N/2 - 2. If n belongs to the set, then worker n is a head worker for this period. The assumption is that workers 1 and N do not change their assignment, i.e., worker 1 is always a head and worker N is always a tail. Head workers broadcast their IDs alongside a pilot signal, then tail workers compute their communication cost to all head workers and share the cost vector with the neighboring heads. If a tail does not receive a signal from a certain head, the cost to that head is \infty; the same applies to heads. Subsequently, every head locally computes the communication-efficient chain using a predefined heuristic and shares it with its neighboring tails. This approach requires two communication rounds and guarantees that every head computes the same chain. Once the chain information is calculated, each worker shares its right dual variable with its right neighbor to be used by both workers, and GADMM continues for \tau seconds. It is worth mentioning that we could, e.g., start with a chain 1-2-3-\cdots-N and move to 1-5-7-4-\cdots-N, so that only nodes 1 and N preserve their assignments. For further details, the reader is referred to [44], where a comprehensive explanation of the steps of D-GADMM can be found.

In Fig. 3, we plot the objective error in terms of the number of iterations (left) and in terms of sum energy (right) for D-GADMM as well as GADMM and standard ADMM. As we can see from Fig. 3, D-GADMM greatly increases the convergence speed of GADMM and thus decreases the overall communication cost for fixed topology. As a consequence, D-GADMM achieves convergence speed comparable to the PS-based ADMM while maintaining GADMM’s low communication cost per iteration.

Fig. 3. D-GADMM: loss as a function of (a) number of iterations and (b) total energy consumption.

2) Censored Quantized Generalized GADMM:

As pointed out earlier, in the GADMM framework each worker exchanges its model with up to two neighboring workers only, which slows down convergence. To reduce the communication overhead while generalizing to more generic network topologies, we propose the Generalized GADMM (GGADMM) [45]. Under this generalized framework, the workers are still divided into two groups, 1) head and 2) tail, with possibly different sizes. In other words, the topology is generalized from a chain to any bipartite graph, where the number of neighbors each worker can communicate with can be any arbitrary number and is not necessarily limited to two. By leveraging the censoring idea, i.e., temporal sparsity, we introduce the Censored GGADMM (C-GGADMM), where each worker exchanges its model only if the difference between its current and previous models is greater than a certain threshold. To make the algorithm more communication-efficient, censoring is applied to the quantized version of the worker's model instead of the model itself, yielding the Censored Quantized GGADMM (CQ-GGADMM) [45], [67]. CQ-GGADMM can significantly reduce the communication overhead, particularly for a large model size d, since its payload size is (bd + 32) bits compared to the payload size of 32d bits for the full-precision GGADMM. Since, according to Shannon's capacity theorem, more bits consume more transmission energy for the same bandwidth, transmission duration, and noise spectral density, the communication energy of CQ-GGADMM is significantly reduced compared to the original GADMM. Theoretically, CQ-GGADMM inherits the same performance and convergence guarantees of vanilla GGADMM, provided that the censoring threshold sequence is non-increasing and non-negative.
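The transmission rule can be sketched as follows: a worker quantizes its model and sends it only when it has moved far enough from the last transmitted copy, with a non-increasing threshold. The uniform quantizer, the exponentially decaying threshold and the stand-in local update below are illustrative assumptions, not the exact construction of [45], [67].

```python
import numpy as np

# Sketch of a censoring-plus-quantization transmission rule in the spirit of
# CQ-GGADMM: quantize with b bits per entry and transmit only when the
# quantized model differs enough from the previously transmitted one.

def quantize(x, ref, radius, bits=4):
    """Uniform quantization of x around a reference vector within +/- radius."""
    levels = 2 ** bits - 1
    step = 2 * radius / levels
    q = np.round((x - ref + radius) / step)
    return ref - radius + np.clip(q, 0, levels) * step

def maybe_transmit(model, last_sent, k, bits=4, c0=1.0, decay=0.8, radius=1.0):
    """Return (payload_or_None, new_last_sent) for iteration k."""
    threshold = c0 * decay ** k                      # non-increasing censoring threshold
    q_model = quantize(model, last_sent, radius, bits)
    if np.linalg.norm(q_model - last_sent) >= threshold:
        return q_model, q_model                      # transmit ~ b*d + 32 bits payload
    return None, last_sent                           # censored: skip this round

rng = np.random.default_rng(2)
model, last_sent, sent = rng.normal(size=20), np.zeros(20), 0
for k in range(50):
    model += 0.1 * rng.normal(size=20) / (k + 1)     # stand-in for a local GGADMM update
    payload, last_sent = maybe_transmit(model, last_sent, k)
    sent += payload is not None
print(f"transmitted in {sent} of 50 iterations")
```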

Fig. 4 compares CQ-GGADMM with Censored ADMM (C-ADMM), GGADMM, as well as C-GGADMM in terms of the loss versus the number of iterations (left) and versus the total sum energy (right) for a system of 18 workers on a linear regression task using the Body Fat dataset [68]. We can observe, from Fig. 4, that CQ-GGADMM exhibits the lowest total communication energy, followed by C-GGADMM, then GGADMM and finally C-ADMM, while having similar convergence speed to GGADMM. This observation validates the benefits of censoring the quantized version of the models before sharing, which makes the proposed algorithm (CQ-GGADMM) more communication and energy efficient.

Fig. 4. CQ-GGADMM: loss as a function of (a) number of iterations and (b) total energy consumption.

Finally, it is worth mentioning that, motivated by the fact that in FL the parameter server is interested in the aggregated output of all workers rather than the individual output of each worker, analog over-the-air aggregation schemes such as [34], [69]–[71] have been proposed. Such schemes were shown to achieve high scalability and significant savings in energy consumption owing to their ability to allow non-orthogonal access to the bandwidth.

B. Optimizing Energy Consumption of Wireless IoT Environments

The next pillar in iIoTe is edge computing. Specifically, we consider the problem of allocating the application components to the available end devices. As first presented in [53], we extend the Integer Linear Programming (ILP) based framework defined by Cardellini et al. [52] (Section II-C). In [53], the goal was to minimize the overall energy consumption needed for executing an IoT application. The formulated ILP model is described below; the optimal allocation can be determined by feeding it into a solver, such as IBM CPLEX.

We define optimality of the allocation by total energy use over one execution of an IoT application. Energy during the application’s execution is consumed in two phases: 1) device energy, consumed by a device when executing a component and 2) edge network energy, consumed by the device when sending the result of the calculation over the network. Note that “optimal” in this case only describes optimality in the integer model. Given the constraints and the model, we find the optimal assignment, i.e., the one with minimal energy usage.

The optimal network configuration is the assignment of application components to devices that result in the lowest total consumption of energy and satisfies the constraints. The constraints concern the requirements that an assignment must satisfy: Each component should only be allocated once and resource requirements for assigned components should not exceed the resources of the node. This problem is a form of the quadratic assignment problem, and thus is NP-hard.

1) System Model:

The application consists of a set of components and edges that interconnect them, modeled as a weighted undirected graph \mathcal{G}_{\mathrm {app}}=(\mathcal{V}_{\mathrm {app}}, \mathcal{E}_{\mathrm {app}}) . Graph \mathcal{G}_{\mathrm {app}} is multi-partite, with vertex set \mathcal{V}_{\mathrm {app}} containing the application components, | \mathcal{V}_{\mathrm {app}}|=N , and edge set \mathcal{E}_{\mathrm {app}}\subset \{t_{1} t_{2}\,\,:\,\,t_{i}\in \mathcal{V}_{\mathrm {app}}, i=1,2\} representing the logical connections between components t_{i} .

Analogously, the network infrastructure where the components can be executed is modeled with the multi-partite graph \mathcal{G}_{\mathrm{net}}=(\mathcal{V}_{\mathrm{net}}, \mathcal{E}_{\mathrm{net}}), with vertex set \mathcal{V}_{\mathrm{net}} containing the communicating nodes, with cardinality |\mathcal{V}_{\mathrm{net}}|=M, and edge set \mathcal{E}_{\mathrm{net}}\subset \{n_{1} n_{2} : n_{i}\in \mathcal{V}_{\mathrm{net}}, i=1,2\} representing the wireless and wired links among nodes n_{i}. The result of the allocation is a binary matrix X over \mathcal{V}_{\mathrm{app}} \times \mathcal{V}_{\mathrm{net}}, where X[t, n] = 1 if and only if component t is allocated to node n. We also define E_{d} to be the device energy and E_{n} the network energy. E_{t} is then the total energy, and we impose the constraint E_{d} + E_{n} \leq E_{t}. Components, nodes and links have properties that are relevant for the energy consumption of the application once allocated. These parameters are described in Table I. S_{t}, P_{n}, R_{n} and C_{n} are defined as multiples of some reference node. The resources of a node are expressed as a single scalar, but additional resource requirements can easily be introduced into the model.

TABLE I. Parameters of Energy-Aware Allocation Algorithm.

2) Problem Formulation:

For calculating the network energy, we need to know whether a link between two components is assigned to a link between two nodes. For this, we introduce a matrix Y = \mathcal{V}_{\mathrm{app}} \times \mathcal{V}_{\mathrm{app}} \times \mathcal{V}_{\mathrm{net}} \times \mathcal{V}_{\mathrm{net}}, where Y[t_{1}, t_{2}, n_{1}, n_{2}] = 1 if and only if the communication between component t_{1} and component t_{2} is allocated on the network link between nodes n_{1} and n_{2}. This corresponds to X[t_{1}, n_{1}] = 1 \wedge X[t_{2}, n_{2}] = 1. Unfortunately, this is not a linear constraint, and thus we need to linearize the formulation. For this, we follow the formulation presented in [52] and define an ILP model as:
\begin{align*}&\forall t_{1}, t_{2} \in \mathcal{V}_{\mathrm{app}}~:~\forall n_{1}, n_{2} \in \mathcal{V}_{\mathrm{net}}~:~Y\left[{t_{1}, t_{2}, n_{1}, n_{2}}\right] \le X\left[{t_{1}, n_{1}}\right] \tag{3}\\&\forall t_{1}, t_{2} \in \mathcal{V}_{\mathrm{app}}~:~\forall n_{1}, n_{2} \in \mathcal{V}_{\mathrm{net}}~:~Y\left[{t_{1}, t_{2}, n_{1}, n_{2}}\right] \le X\left[{t_{2}, n_{2}}\right] \tag{4}\\&\forall t_{1}, t_{2} \in \mathcal{V}_{\mathrm{app}}~:~\forall n_{1}, n_{2} \in \mathcal{V}_{\mathrm{net}}~:~Y\left[{t_{1}, t_{2}, n_{1}, n_{2}}\right] \ge X\left[{t_{1}, n_{1}}\right] + X\left[{t_{2}, n_{2}}\right] - 1 \tag{5}\\&\forall t \in \mathcal{V}_{\mathrm{app}}~: \sum_{n \in \mathcal{V}_{\mathrm{net}}} X\left[{t,n}\right] = 1 \tag{6}\\&\forall n \in \mathcal{V}_{\mathrm{net}}~: \sum_{t \in \mathcal{V}_{\mathrm{app}}} X\left[{t,n}\right] \cdot R_{t} \le R_{n} \tag{7}\\&\sum_{t \in \mathcal{V}_{\mathrm{app}}}\sum_{n \in \mathcal{V}_{\mathrm{net}}} C_{n} \cdot \left({S_{t} / P_{n}}\right) \cdot X\left[{t,n}\right] \le E_{d} \tag{8}\\&\sum_{\left({t_{1}, t_{2}}\right) \in \mathcal{E}_{\mathrm{app}}} \sum_{n_{1}, n_{2} \in \mathcal{V}_{\mathrm{net}}} O_{n_{1}} \cdot P_{n_{1},n_{2}} \cdot Y\left[{t_{1}, t_{2}, n_{1}, n_{2}}\right] \le E_{n} \tag{9}\\&E_{n} + E_{d} \le E_{t} \tag{10}\end{align*}
where equations (3) to (5) describe the linearization of the network matrix Y. Equations (6) and (7) ensure that components are allocated only once and that node resources are not exceeded, respectively. Equations (8) and (9) calculate the device and network energy as described above. Finally, we calculate the total energy use of the assignment by adding both energies in (10). The objective of the optimization is the minimization of the total used energy.
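For illustration, the following sketch builds constraints (3)-(10) with the PuLP library on a tiny made-up instance. The component and node names, all parameter values, and the simplified per-link transmission cost (used here in place of the O_{n_1} \cdot P_{n_1,n_2} term of (9)) are assumptions for illustration only, not the instances or code used in the evaluation.

```python
import itertools
import pulp

# Compact PuLP sketch of the energy-aware allocation ILP (3)-(10) on a toy instance.
components = ["t1", "t2", "t3"]
nodes = ["n1", "n2"]
app_edges = [("t1", "t2"), ("t2", "t3")]

R_t = {"t1": 2, "t2": 3, "t3": 1}                    # component resource demand
S_t = {"t1": 1, "t2": 2, "t3": 1}                    # component computation size
O_t = {"t1": 1.0, "t2": 0.5, "t3": 0.2}              # component output size factor
R_n = {"n1": 4, "n2": 4}                             # node resources
P_n = {"n1": 1.0, "n2": 2.0}                         # node processing speed
C_n = {"n1": 1.0, "n2": 0.6}                         # node energy per computation unit
T_link = {("n1", "n1"): 0.0, ("n2", "n2"): 0.0,      # per-unit link transmission energy
          ("n1", "n2"): 0.8, ("n2", "n1"): 0.8}

prob = pulp.LpProblem("energy_aware_allocation", pulp.LpMinimize)
X = pulp.LpVariable.dicts("X", (components, nodes), cat=pulp.LpBinary)
Y = pulp.LpVariable.dicts("Y", (components, components, nodes, nodes), cat=pulp.LpBinary)

# (3)-(5): linearization of Y[t1,t2,n1,n2] = X[t1,n1] AND X[t2,n2]
for t1, t2 in app_edges:
    for n1, n2 in itertools.product(nodes, nodes):
        prob += Y[t1][t2][n1][n2] <= X[t1][n1]
        prob += Y[t1][t2][n1][n2] <= X[t2][n2]
        prob += Y[t1][t2][n1][n2] >= X[t1][n1] + X[t2][n2] - 1

# (6): each component allocated exactly once; (7): node resources not exceeded
for t in components:
    prob += pulp.lpSum(X[t][n] for n in nodes) == 1
for n in nodes:
    prob += pulp.lpSum(X[t][n] * R_t[t] for t in components) <= R_n[n]

# (8)-(10): device energy plus network energy as the minimization objective
device_energy = pulp.lpSum(C_n[n] * (S_t[t] / P_n[n]) * X[t][n]
                           for t in components for n in nodes)
network_energy = pulp.lpSum(O_t[t1] * T_link[(n1, n2)] * Y[t1][t2][n1][n2]
                            for t1, t2 in app_edges
                            for n1, n2 in itertools.product(nodes, nodes))
prob += device_energy + network_energy

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for t in components:
    for n in nodes:
        if pulp.value(X[t][n]) > 0.5:
            print(f"{t} -> {n}")
print("total energy:", pulp.value(prob.objective))
```

The Y variables and their linearization dominate the model size, which is what makes the exact formulation expensive as the application and network grow.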

3) A Linear Heuristic for Energy-Optimized Allocation:

The presented QAP is NP-hard and thus compute-intensive. The culprit is the network cost calculation and the linearization of Y, resulting in a large number of constraints. By removing the Y matrix and the associated constraints, we create a linear problem that can be solved effectively with the simplex method [72]. The approach approximates the energy required for sending a packet of data by taking the average of a node's links. We introduce the parameter \hat{T}_{n} = \frac{1}{|\mathrm{outgoing}(n)|} \sum_{e \in \mathrm{outgoing}(n)} T_{e}, which describes the average transmission cost of a node's links.
\begin{align*} \sum_{t \in \mathcal{V}_{\mathrm{app}}}\sum_{n \in \mathcal{V}_{\mathrm{net}}} C_{n} \cdot \left({S_{t} / P_{n}}\right) \cdot X\left[{t,n}\right] + O_{t} \cdot \hat{T}_{n} \cdot X\left[{t,n}\right] \le E_{t} \tag{11}\end{align*}

The complete heuristic model reuses the constraints in equations (6) and (7) together with constraint (11). By transforming the QAP into a linear problem, we greatly increase the speed of finding a solution and make the optimization feasible for on-line usage. The drawback is that, by approximating the network energy, the solution is no longer optimal, as will be shown in the results.
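Continuing the toy instance from the previous sketch, the heuristic drops the Y variables and folds each node's average outgoing link cost into the per-node term of (11); again an illustrative sketch with made-up values, not the paper's implementation.

```python
# Linear heuristic (11) on the same toy instance: replace the quadratic network
# term by each node's average outgoing link cost T_hat[n].
T_hat = {n: sum(T_link[(n, m)] for m in nodes if m != n) / max(1, len(nodes) - 1)
         for n in nodes}

heur = pulp.LpProblem("energy_heuristic", pulp.LpMinimize)
Xh = pulp.LpVariable.dicts("Xh", (components, nodes), cat=pulp.LpBinary)
for t in components:
    heur += pulp.lpSum(Xh[t][n] for n in nodes) == 1          # (6)
for n in nodes:
    heur += pulp.lpSum(Xh[t][n] * R_t[t] for t in components) <= R_n[n]   # (7)
heur += pulp.lpSum((C_n[n] * (S_t[t] / P_n[n]) + O_t[t] * T_hat[n]) * Xh[t][n]
                   for t in components for n in nodes)          # objective from (11)
heur.solve(pulp.PULP_CBC_CMD(msg=False))
print("heuristic energy estimate:", pulp.value(heur.objective))
```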

4) Evaluation of Allocation Algorithm:

We implemented the model using the PuLP linear programming library. The evaluation was done by generating a random network and a random application, and letting the solver find the optimal allocation.

The network configuration is generated with a variety of node configurations and capabilities, reflecting a heterogeneous computation and communication infrastructure that one could find in an industrial manufacturing plant (e.g., using the Siemens range of industrial computers [73]). In the evaluated configuration, 60% of the nodes were generated as wired nodes, and the remaining 40% are wireless nodes. Nodes are connected to each other with a certain probability: 0.8 for wired-wired connections, 0.5 for wireless-wireless connections and 0.4 for wireless-wired connections. Wired connections use 0.2 units of energy, while wireless connections use 0.8 units of energy, which is similar to the power consumption of an Ethernet module [74] as compared to a WiFi module [75]. Nodes have a varying amount of memory resources uniformly distributed between a lower bound of 1 and an upper bound of 8 resource units. Nodes also have a varying processing speed between 1 and 3 speedup, roughly corresponding to the Intel processor families i3, i5, and i7. Finally, nodes can use from 0.5 to 1.5 units of energy for a single unit of computation.

For the application, two classes with a certain number of components are generated: a "wide" and a "long" application. In a "wide" application, two components are designated the "start" and "end" components, and every other component needs input from the start component and sends output to the end component. In a "long" application, components are linked serially. Figure 5 shows two example applications. This method for generating applications is similar to [52]. Each application component has resource requirements randomly distributed between 1 and 8, an output factor randomly distributed between 0.5 and 1.5, and a computation size of 1 or 2.

Fig. 5. "Long" (left) and "wide" (right) composed IoT applications.

As expected, the optimal allocation algorithm scales very badly (non-polynomially). Figure 6 shows the runtime of the algorithm for varying problem sizes. The shaded area shows the variance with the non-shown parameter (different application sizes for the network node graph, differing network sizes for the application node graph). The time needed for finding the optimal allocation grows unwieldy very quickly.

Fig. 6. Runtime for optimal allocation.

In comparison, the heuristic presented in equation (11) finds a solution much more quickly. Figure 7 shows the runtime of the heuristic for different network and application sizes. For the slowest case for the full allocation, the heuristic takes 8 seconds of CPU time, while the solver consumes 864104 seconds (about 10 days) of CPU time for finding the optimal allocation. The allocation evaluation was executed on an Amazon EC2 m4.10xlarge machine with 40 virtual cores and 160 GiB of memory. Peak memory use was 51 GiB. However, the heuristic loses about 30% of energy efficiency over the optimal algorithm. Specifically, 50% of the solutions achieve between 60% and 80% of the energy efficiency of the optimal case.

Fig. 7. Heuristic runtime.

C. Energy-Efficient Blockchain Over Wireless Networks

The last enabling technology is DLT, which provides a tamper-proof ledger distributed across the nodes of the iIoTe. The energy and latency cost of implementing DLT over wireless links and with constrained IoT devices is oftentimes overlooked. In general, the latency and energy budgets are highly impacted by the wireless access protocol.

1) System Model:

As introduced in [65], there are two architectural choices for IoT DLT. The conventional one is to have IoT devices that receive complete blocks from the Blockchain to which they are connected, and locally verify the validity of the Proof-of-Work (PoW) solution and the contained transactions. This configuration provides the maximum possible level of security. However, it requires high storage, energy and computation resources, since the node needs to store the complete Blockchain and to check all transactions. This makes it infeasible for many IoT applications. Instead, we consider the second option, where the IoT device is a light node that receives only the block headers from the Blockchain nodes. These headers contain sufficient information for the Proof-of-Inclusion (PoI), i.e., to prove the inclusion of a transaction in a block without the need to download the entire block body. Furthermore, the device defines a list of (few) events of interest, such as modifications to the state of a smart contract or transactions from/to a particular address.

The communication model for this lightweight version is as follows. The IoT devices transmit data to the Blockchain using the edge infrastructure. Specifically, a NB-IoT cell with the base station located in its center is considered, with {N} devices uniformly distributed within the area. The base station, which is designated as a full DLT node connected to the Blockchain, is the DLT-anchor for the IoT devices. For the radio resource management, we adapt the queueing model of [19] to our scenario, where the uplink and downlink radio resources are modeled as two servers that visit and serve their respective inter-dependent traffic queues.

2) End-2-End (E2E) Latency:

NB-IoT provides three coverage classes, namely the normal, extreme, and robust class, to serve resource-limited devices experiencing different path loss levels [76]. Minimum latency and throughput requirements need to be maintained in the extreme coverage class, whereas enhanced performance is ensured in the normal coverage class. Without loss of generality, we consider only the normal and extreme coverage classes, i.e., the number of classes is C = 2. A class is assigned to a device based on the estimated path loss, with the base station informing the device of the dedicated path between them. Each class j is supported by a number of replicas c_{j}, which are transmitted for both the data and the control packets [19]. In particular, the reserved period of class j has length c_{j}\tau, where \tau is the unit length corresponding to the coverage class with c_{j} = 1. Furthermore, t_{j} is the average time interval between two consecutive scheduling occasions of class j, whereas d denotes the average time duration between two consecutive NPDCCH occurrences.

The total E2E latency includes two parts: 1) the latency L_{UeD} of the uplink and downlink transmissions between the IoT devices and the base station (the wireless communication latency) and 2) the latency L_{DLT} due to the DLT verification process, i.e., L = L_{UeD} + L_{DLT}.

The wireless communication latency of NB-IoT uplink and downlink can be formulated as:
\begin{align*} L_{UeD}=&L^{u} + L^{d} = L^{u}_{sync} + L^{u}_{rr} + L^{u}_{tx} + L^{d}_{sync} +L^{d}_{rr} + L^{d}_{rx},\tag{12}\end{align*}
where L^{u}_{sync}, L^{u}_{rr}, L^{u}_{tx}, L^{d}_{sync}, L^{d}_{rr}, and L^{d}_{rx} are the latencies of synchronization, resource reservation, and data transmission (reception) in the uplink (downlink), respectively. L^{u}_{sync} has been characterized in [77], with a value of 0.33 s. L_{rr} is given as:
\begin{equation*} L_{rr} = \sum_{l=1}^{N_{r_{max}}} \left({1-P_{rr}}\right)^{l-1} P_{rr}\,l\left({L_{ra} + L_{rar}}\right),\tag{13}\end{equation*}
in which N_{r_{max}} is the maximum number of attempts, P_{rr} is the probability of successful resource reservation in an attempt, L_{ra} = 0.5t + \tau is the expected latency in sending an RA control message, where \tau is the unit length, equal to the period of coverage class 1, which varies from 40 ms to 2.56 s [77], and L_{rar}=0.5d + 0.5 \mathcal{Q} fu + u is the expected latency in receiving the RAR message, where \mathcal{Q} is the number of requests waiting to be served.

In the following, we provide a simple technique based on drift approximation [78] to calculate P_{rr} recursively. Therefore, we treat the mean of the random variables involved in the process as constants. Besides, we assume that sufficient downlink resources are available, so that failures only occur due to collisions in the NPRACH or due to link outages.

Let \lambda^{a}=\lambda^{u}+\lambda^{d} be the arrival rate of access requests per period and \lambda^{a}(l) be the mean number of devices participating in the contention with their l-th attempt. Note that in the steady state \lambda^{a}(l) remains constant across periods. Next, let \lambda^{a}_{tot}=\sum_{l=1}^{N_{r_{max}}} \lambda^{a}(l). The collision probability in the NPRACH can be calculated using the drift approximation for a given value of \lambda^{a}_{tot} and for a given number of available preambles K as:
\begin{equation*} P_{\mathrm{collision}}\left({\lambda^{a}_{tot}}\right)=1-\left({1-\frac{1}{K}}\right)^{\lambda^{a}_{tot}-1}\approx 1-e^{-\frac{\lambda^{a}_{tot}}{K}}.\tag{14}\end{equation*}
From there, we approximate the probability of resource reservation as a function of \lambda^{a}_{tot} as P_{rr}(\lambda^{a}_{tot})\approx p_{d}\, e^{-\frac{\lambda^{a}_{tot}}{K}}. This allows us to define \lambda^{a}_{tot} as:
\begin{equation*} \lambda^{a}_{tot}= \lambda^{a} +\left({1-P_{rr}\left({\lambda^{a}_{tot}}\right)}\right)\sum_{l=2}^{N_{r_{max}}} \lambda^{a}\left({l}\right), \tag{15}\end{equation*}
since \lambda^{a}(l)=(1-P_{rr}(\lambda^{a}_{tot}))\,\lambda^{a}(l-1) for l \geq 2 and \lambda^{a}(1)=\lambda^{a}. Finally, starting from the initial conditions \lambda^{a}(l)=0 for l \geq 2, the values of \lambda^{a}(l) and \lambda^{a}_{tot} can be calculated recursively by: 1) applying (15); 2) calculating P_{rr}(\lambda^{a}_{tot}) for the new value of \lambda^{a}_{tot}; and 3) updating the values of \lambda^{a}(l). This process is repeated until the values of the variables converge. The final value of P_{rr}(\lambda^{a}_{tot}) is simply denoted as P_{rr} and used throughout the rest of the paper.
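This recursion is easy to implement; the sketch below iterates it to a fixed point using the per-attempt relation \lambda^{a}(l)=(1-P_{rr})\lambda^{a}(l-1). The arrival rate, number of preambles, p_{d} and N_{r_{max}} are illustrative assumptions, not the values used in the paper's evaluation.

```python
import math

# Fixed-point (drift approximation) computation of the resource reservation
# success probability P_rr from (14)-(15). All parameter values are illustrative.
def solve_p_rr(lam_a=10.0, K=48, p_d=0.9, n_max=5, iters=100):
    lam = [lam_a] + [0.0] * (n_max - 1)              # lambda^a(l), l = 1..N_r_max
    p_rr = p_d
    for _ in range(iters):
        lam_tot = sum(lam)                           # total contending arrivals per RAO
        p_rr = p_d * math.exp(-lam_tot / K)          # P_rr(lambda^a_tot)
        for l in range(1, n_max):                    # lambda^a(l) = (1 - P_rr) * lambda^a(l-1)
            lam[l] = (1.0 - p_rr) * lam[l - 1]
    return p_rr, sum(lam)

p_rr, lam_tot = solve_p_rr()
print(f"P_rr = {p_rr:.3f}, total contending arrivals per RAO = {lam_tot:.2f}")
```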

Assuming that the transmission time of the uplink transactions follows a general distribution with first two moments l_{1}, l_{2}, the first two moments of the distribution of the packet transmission time are s_{1} = (f_{1}l_{1})/(\mathcal{R}w) and s_{2} = (f_{1}l_{2})/(\mathcal{R}^{2}w^{2}). Applying the results from [79], and considering L_{tx} as a function of the scheduling of the NPUSCH, we have:
\begin{equation*} L_{tx} = \frac{f\lambda^{u}s_{1}s_{2}}{2s_{1}\left({1-fGs_{1}}\right)} + \frac{f\lambda^{u}s_{1}^{2}}{2\left({1-f\lambda^{u}s_{1}}\right)} + \frac{l_{1}}{\mathcal{R}^{u}w},\tag{16}\end{equation*}
where \mathcal{R}^{u} is the average uplink transmission rate, \lambda^{u} = \lambda_{s} +\lambda_{b}, and f(\lambda_{s} + \lambda_{b})s_{1} is the mean batch size. The latency of data reception is defined as:
\begin{equation*} L_{rx} = \frac{0.5Fh_{1}t^{-1}}{h_{1}\left({1-Fht^{-1}}\right)} + \frac{Fh_{1}}{1-Fht^{-1}} + \frac{m_{2}}{\mathcal{R}^{d}y},\tag{17}\end{equation*}
in which h_{1}=fm_{1}(\mathcal{R}^{d}y)^{-1} and h_{2} = fm_{2}((\mathcal{R}^{d})^{2}y^{2})^{-1} are the first two moments of the distribution of the packet transmission time in the downlink, assuming that the packet length follows a general distribution with moments m_{1}, m_{2}, F=f\lambda^{d} t, and \mathcal{R}^{d} is the downlink data transmission rate.

Next, we calculate the second latency component, corresponding to the DLT verification process. Consider a DLT network that includes M miners. These miners start their Proof-of-Work (PoW) computation at the same time and keep executing the PoW process until one of the miners completes the computational task by finding the desired hash value [56]. When a miner executes the computational task for the PoW of the current block, the time required to complete this PoW can be modeled as an exponential random variable W with density f_{W}(w) = \lambda_{c} e^{-\lambda_{c} w}, in which \lambda_{c}= \lambda_{0} P_{c} represents the computing speed of a miner, P_{c} is the power consumption for computation of a miner, and \lambda_{0} is a constant scaling factor. Once a miner completes its PoW, it broadcasts messages to the other miners, so that they can stop their PoW and synchronize the new block. The latency of propagating the new block among the miners is:
\begin{equation*} L_{tM} = L_{newB} + L_{getB} + L_{transB}\tag{18}\end{equation*}

In (18), L_{newB}, L_{getB}, and L_{transB} are the latencies of sending the hash of the newly mined block, requesting the new block from neighboring nodes, and transmitting the new block, respectively. L_{newB} and L_{transB} are computed using the uplink transmission model, while L_{getB} is computed based on the downlink transmission model described in the previous section.

For the PoW computation, let i^{*}= \arg\min_{i \in M} w_{i} denote the miner that first finds the desired PoW hash value. The fastest PoW completion time among the miners is then W_{i^{*}}, and its complementary cumulative distribution function is Pr(W_{i^{*}} > x)= Pr(\min_{i \in M} W_{i} > x) = \prod_{i=1}^{M} Pr(W_{i} > x) = (1 - Pr(W \leq x))^{M}. Hence, the average computational latency of miner i^{*} is:\begin{align*} L_{W_{i^{*}}}=&\int_{0}^{\infty} (1 - Pr(W \leq x))^{M}\, dx \\=&\int_{0}^{\infty} e^{-\lambda_{c}Mx}\, dx = \frac{1}{\lambda_{c}M}\tag{19}\end{align*}

The total latency of the DLT verification process is L_{DLT} = L_{tM} + L_{W_{i^{*}}}.
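As a quick sanity check on (19), the following sketch draws the PoW completion times of M miners, each exponential with rate \lambda_{c}=\lambda_{0}P_{c}, and compares the empirical mean of the fastest completion with the closed form 1/(\lambda_{c}M). The numeric values of \lambda_{0}, P_{c} and M below are assumed purely for illustration.

```python
import random

def mean_fastest_pow(lam_c, M, n_trials=100_000):
    """Empirical mean of the fastest PoW completion among M miners,
    each with completion time ~ Exp(lam_c)."""
    total = 0.0
    for _ in range(n_trials):
        total += min(random.expovariate(lam_c) for _ in range(M))
    return total / n_trials

# Illustrative (assumed) values: lambda_0 = 0.1, P_c = 5 W, M = 10 miners
lam_0, P_c, M = 0.1, 5.0, 10
lam_c = lam_0 * P_c
print(mean_fastest_pow(lam_c, M))   # empirical mean of the fastest completion time
print(1.0 / (lam_c * M))            # closed-form L_{W_i*} from (19)
```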

3) Energy Consumption:

Analogously to the latency, the energy consumption is divided into the wireless communication (uplink/downlink) and the DLT verification.

The total energy consumption of the wireless communication is written as follows:\begin{align*} E_{UD}=&E^{u} + E^{d} \\=&E^{u}_{sync} + E^{u}_{rr} + E^{u}_{tx} +E^{u}_{s} + E^{d}_{sync} + E^{d}_{rr} \\&+\,\,E^{d}_{rx} + E^{d}_{s},\tag{20}\end{align*} where E^{u}_{sync}, E^{u}_{rr}, E^{u}_{tx}, E^{d}_{sync}, E^{d}_{rr}, and E^{d}_{rx} are the energy consumption of synchronization, resource reservation, and data transmission/reception in the uplink and downlink, respectively. Each of these is formally defined as follows:\begin{align*} E_{sync}=&P_{l} \cdot L_{sync} \tag{21}\\ E_{rar}=&P_{l} \cdot L_{rar} \tag{22}\\ E_{rr}=&\sum_{l=1}^{N_{r_{max}}} \left({1-P_{rr}}\right)^{l-1} \cdot P_{rr} \cdot \left({E_{ra} + E_{rar}}\right) \tag{23}\\ E_{ra}=&\left({L_{ra} - \tau}\right) \cdot P_{I} + \tau \cdot \left({P_{c} + P_{e} P_{t}}\right) \tag{24}\\ E_{tx}=&\left({L_{tx} - \frac{l_{a}}{\mathcal{R}^{u}w}}\right) \cdot P_{I} + \left({P_{c} + P_{e} P_{t}}\right)\frac{l_{a}}{\mathcal{R}^{u}w} \tag{25}\\ E_{rx}=&\left({L_{rx} - \frac{m_{1}}{\mathcal{R}^{d}y}}\right) \cdot P_{I} + P_{l} \frac{m_{1}}{\mathcal{R}^{d}y}\tag{26}\end{align*} where P_{e}, P_{I}, P_{c}, P_{l}, and P_{t} are the power amplifier efficiency, idle power consumption, circuit power consumption of transmission, listening power consumption, and transmit power consumption, respectively.
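To show how (20)-(26) combine for the uplink, the sketch below evaluates E^{u}_{sync}+E^{u}_{rr}+E^{u}_{tx} from the power levels and latencies defined above. The sleep term E^{u}_{s} is not evaluated since it is not defined in this excerpt, and all values passed to the function would be deployment-specific assumptions rather than values from the paper.

```python
def uplink_energy(P_I, P_c, P_e, P_t, P_l,
                  L_sync, L_ra, L_rar, L_tx, tau,
                  l_a, R_u, w, P_rr, N_r_max):
    """Uplink part of (20), E^u_sync + E^u_rr + E^u_tx, evaluated from (21)-(25)."""
    E_sync = P_l * L_sync                                    # (21): listening during synchronization
    E_rar  = P_l * L_rar                                     # (22): listening for the random-access response
    E_ra   = (L_ra - tau) * P_I + tau * (P_c + P_e * P_t)    # (24): idle time plus preamble transmission
    # (23): expectation over up to N_r_max random-access attempts
    E_rr = sum((1.0 - P_rr) ** (l - 1) * P_rr * (E_ra + E_rar)
               for l in range(1, N_r_max + 1))
    # (25): idle during the scheduling delay, active during the payload airtime l_a / (R^u * w)
    t_air = l_a / (R_u * w)
    E_tx = (L_tx - t_air) * P_I + (P_c + P_e * P_t) * t_air
    return E_sync + E_rr + E_tx
```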

Following the PoW process described above, the average energy consumption of the DLT to finish a single PoW round is:\begin{equation*} E_{DLT} = P_{c} L_{W_{i^{*}}} + P_{t} L_{tM}.\tag{27}\end{equation*}

4) Results:

The performance of the DLT-based NB-IoT system is shown in Fig. 8, which reports the total latency (Fig. 8(a)) and the energy consumption (Fig. 8(b)). In Fig. 8(a), the E2E latency is defined as the time elapsed from the generation of a transaction at the NB-IoT device until its verification. This includes the latency at the NB-IoT radio link and at the DLT, which comprises the execution time of the smart contract and the transaction verification. We observe that increasing the t and d values initially increases the lifetime and decreases the latency, since more resources become available for NPUSCH and NPDSCH; beyond a certain point, however, further increasing t and d decreases the lifetime by increasing the expected time for resource reservation. Compared with the standard NB-IoT system in [55], [76], the DLT-based system introduces additional latency due to the extra time of the consensus process and transaction verification. This reflects the latency-security trade-off between the standard NB-IoT and DLT-based systems.

Fig. 8. Latency and Energy Consumption.

SECTION IV.

Towards Energy-Efficient Intelligent IoT Environments

Having the energy-performance characterization of each of the enabling technologies (Section III), we next describe how they interact with each other in an iIoTe. For this, we consider the scenario in Figure 9, where a given learning application is deployed. We split the application into sub-tasks, such as data processing, model training and model aggregation, and distribute them in a decentralized way. Each of these sub-tasks (Section III-A) constitutes an application component (C_{1}, C_{2},\ldots) that can be run at the available edge nodes. The optimal allocation of sub-tasks to edge nodes is determined using the ILP-based algorithms presented in Section III-B. The required trustworthiness (i.e., assuring security, privacy, immutability and transparency) between sub-tasks is provided through DLT (Section III-C). The heterogeneity of devices, capabilities and tasks is exploited accordingly: the edge servers with high computation capability are selected to operate the DLT activities, e.g., block mining, and to aggregate the ML models (the head workers if the learning paradigms in Section III-A are applied), while more constrained edge or mobile devices are set up as DLT light clients that participate in local training (the tail workers) and consensus. The involved network components communicate via long-range wireless NB-IoT channels.

Fig. 9. Integration of enabling technologies.

In detail, the communication workflow of the proposed scheme can be summarized as follows:

  • Step 1: The data processing can be completed on different edge devices with limited resources. The selected data from the data provider is pre-processed and structured. This process includes both a data engineering and a feature engineering sub-process, in which data engineering converts the raw data into prepared data and feature engineering tunes the prepared data to create the features expected by the ML models.

  • Step 2: Then, the edge nodes or IoT devices responsible for training compute their local models based on their own private data and publish them to their associated edge server via, e.g., NB-IoT. They register with active smart contracts to upload their results securely until the results are incorporated in the final aggregation and in the generation of DLT transactions.

  • Step 3: Next, the edge servers with ML aggregation responsibility gather the transactions and arrange them in blocks following a Merkle tree. Each block contains the hash of the previous block, a timestamp, a nonce and the root of the hash tree. The edge servers with high computational capacity join the DLT mining process to verify the created blocks and run consensus in the edge network. After the mining process is completed, the verified blocks are added to the ledger and synchronized among the nodes, so that the local models are published in the distributed ledger. The powerful edge servers can then compute the global model directly, based on the aggregation rules defined in the smart contracts (a minimal sketch of this block structure and aggregation follows this list).
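As a rough illustration of Steps 2-3, the following sketch packs local model vectors into transactions, arranges them in a block carrying the previous-block hash, a timestamp, a nonce and the Merkle root, and aggregates them by plain averaging. The block fields, helper names and the averaging rule are illustrative assumptions standing in for the actual smart-contract logic and consensus process.

```python
import hashlib, json, time

def merkle_root(tx_hashes):
    """Reduce transaction hashes pairwise until a single root remains."""
    level = tx_hashes or [hashlib.sha256(b"").hexdigest()]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])            # duplicate the last hash on odd-sized levels
        level = [hashlib.sha256((a + b).encode()).hexdigest()
                 for a, b in zip(level[0::2], level[1::2])]
    return level[0]

def build_block(prev_hash, local_models, nonce=0):
    """Step 3: arrange the local-model transactions in a block (illustrative structure)."""
    txs = [json.dumps({"worker": i, "model": m}) for i, m in enumerate(local_models)]
    tx_hashes = [hashlib.sha256(t.encode()).hexdigest() for t in txs]
    return {"prev_hash": prev_hash,
            "timestamp": time.time(),
            "nonce": nonce,
            "merkle_root": merkle_root(tx_hashes),
            "transactions": txs}

def aggregate(local_models):
    """Assumed aggregation rule (plain averaging) standing in for the smart contract."""
    n = len(local_models)
    return [sum(w[k] for w in local_models) / n for k in range(len(local_models[0]))]

# Example: three tail workers publish 2-parameter local models
local_models = [[0.1, 0.9], [0.2, 1.1], [0.15, 1.0]]
block = build_block(prev_hash="0" * 64, local_models=local_models)
print(block["merkle_root"])
print(aggregate(local_models))   # global model computed from the verified transactions
```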

The advantages of this integration are twofold. First, by distributing the tasks to edge nodes with different computing capacities, the IoT devices or edge nodes with limited resources can save a significant amount of the energy required for training or mining and can achieve lower latency. Second, by leveraging the DLT, the updates of the ML models are securely formed as encrypted transactions and hashed blocks, which significantly enhances the security and privacy of distributed learning in the edge network. The DLT provides a baseline of trust, transparency and immutability for distributed learning, guaranteeing the security and privacy of data and ML models, and it naturally addresses the single-point-of-failure problem of the standard FL approach, which relies on a centralized server to aggregate the models. Although the integration of the enabling technologies brings these advantages, it also has drawbacks; for example, the time required for DLT mining increases the total latency of the system. This is a trade-off between trust and communication latency, which we discussed in [55], [65].

SECTION V.

Conclusion and Future Work

In this paper, we have addressed the evolution of next-generation IoT networks towards the edge, driven by the introduction of intelligent IoT environments. We use the iIoTe as the basic building block to characterize the energy-performance tradeoff of three key enabling technologies: learning, edge computing and distributed ledgers. Edge intelligence must rely on distributed paradigms such as FL, and we have shown how exploiting spatial and temporal sparsity and quantization can significantly improve the performance and reduce the energy consumption. Moreover, we have discussed the distribution of the FL model aggregator and the remaining sub-tasks to make the framework more robust against failures. For edge computing, the optimal allocation of the application components to network resources is important to use the available infrastructure efficiently and to optimize its energy consumption. DLT is a flexible solution for trustworthiness in these environments, but the energy and latency cost of implementing DLT over wireless and constrained devices is oftentimes overlooked. We have analyzed these parameters using NB-IoT as the baseline wireless technology.

In the integration of these technologies in the iIoTe, we have shown the interactions among them, which provides the basis for an energy model and evaluation that encompasses the contribution of each element. For instance, the learning and computation models can easily be broadened to consider the allocation of the different sub-tasks of the learning application in a representative topology, with each learning action and resource allocation playing the role of an action to be recorded in the DLT. Future work also includes extending the proposed solutions to dynamic environments where agents move and edge nodes are not always available. This is already supported by the presented dynamic head/tail learning paradigms, but the integration of a dynamic resource allocation and DLT framework is still pending. Another necessary direction is to investigate the joint optimization of the computing and communication resources from the energy perspective.
