Introduction
A. Towards the Edge
During the last decade, the need to connect billions of Internet of Things (IoT) devices has driven a significant part of the design of computing and communication networks. The use cases are countless, ranging from smart homes and smart cities to industrial automation and smart farming. Many of these applications involve huge amounts of data, and the fast, trustworthy, and reliable processing they require is often infeasible with a cloud-centric paradigm [1], [2]. Moreover, the typical hierarchical setup of IoT cloud platforms hinders use cases with dynamically changing context, because the individual subsystems and the overall system they form lack self-awareness. Instead, architectures are evolving towards edge solutions that place compute, networking, and storage in close proximity to the devices. At the same time, the introduction of machine-driven intelligence has led to the term edge intelligence, referring to the design of distributed IoT systems with latency-sensitive learning capabilities [3].
Although the edge-centric approach solves the fundamental limitations in terms of latency and dynamism, it also introduces new challenges: 1) the system has to deal with complex IoT applications, which include functions for sensing, acting, reasoning, and control, run collaboratively on heterogeneous devices such as edge computers and resource-constrained devices, and generate data from a huge number of sources; 2) trustworthiness is a major concern for edge and IoT systems in which devices communicate with devices belonging to potentially many different parties, without any pre-established trust relationship among them; and 3) all these functionalities increasingly rely on a resource-limited wireless infrastructure that introduces latency and packet losses over dynamically changing channels.
Another major concern raised by the exponential growth of IoT is its scalability and its contribution to the carbon footprint. On the one hand, IoT is key to deploying a huge number of applications that will reduce the emissions of numerous sectors and industries (e.g., smart farming or energy) [4]. On the other hand, although many of these devices are low power, the total energy consumption of the infrastructure that supports such systems contributes to the digital carbon footprint and cannot be overlooked [5], [6].
B. Intelligent IoT Environments
We coin the term Intelligent IoT Environment (iIoTe) to refer to autonomous IoT applications endowed with intelligence, built on an efficient and reliable IoT/edge (computation) and network (communication) infrastructure that dynamically adapts to changes in the environment and provides built-in, assured trust. Besides the wireless (and wired) networking that interconnects all IoT devices and infrastructure, three other key (and power-hungry) technologies enable iIoTe. The first is Machine Learning (ML) and Artificial Intelligence (AI), hence the term intelligent IoT environments, comprising heterogeneous devices that can collaboratively execute autonomous IoT applications. Given the distributed nature of the system, distributed ML/AI solutions are better suited for multi-node (multi-agent) learning. Edge computing is another defining technology: it provides the computation side of the infrastructure and allocates computing resources for complex IoT applications that need to be distributed over multiple, connected IoT devices (e.g., machines and Automated Guided Vehicles (AGVs)). The third pillar is Distributed Ledger Technology (DLT): rather than traditional security mechanisms, DLT has been identified as the most flexible solution for trustworthiness in a fully decentralized and heterogeneous scenario. Combined with smart contracts, it allows the system to autonomously control transactions between parties without human intervention. All these ingredients are necessary for a fully functional iIoTe, but they inevitably make a significant contribution to the total energy footprint. Our goal is to understand the role of each technology in the performance and energy consumption of an iIoTe.
C. Example: A Manufacturing Plant
A representative use case for iIoTe is a manufacturing plant as shown in Fig. 1, with autonomous collaboration between industrial robot arms, machinery, and AGVs. This relies on real-time data analysis, adaptability, and intelligence in the manufacturing process, which is only feasible with the edge paradigm. The wireless infrastructure interconnects all the machines and robots to the edge network and enables reliable and safe operation. The figure depicts the following scene: a customer (the end-user) of a shared manufacturing plant orders a product by specifying a manufacturing goal (step 1). In step (2), the required machine orchestration and associated process plan are determined to manufacture the desired product, taking into account the available computation and communication resources. The event-based process planner at the edge node is responsible for observing the manufacturing process and reacting when the health state of a concerned machine changes, for example by re-scheduling a given task from a non-responding machine. In step (3), the manufacturing process data is sent to the involved machines, which can include, e.g., mobile robots or an AGV to transport the work-pieces between production points, robotic arms, laser engravers, assembly stations, etc. Let us assume that the task requires a robot to pick up a work-piece and place it in different machines for processing. As these machines may be operated by the plant owner or a third-party operator, contractual arrangements need to be set up, for which a distributed ledger is used. The ledger registers the details of each task for future accountability. In step (4), the local AI on board the different end devices comes into play. For example, in the case of the robot as an end device, its AI decides how to pick up a work-piece and place it in the next machine. In case the local AI of the robot cannot complete its task (e.g., because it has not yet been trained for a similar situation), a human takes over remote control (this can be, e.g., a plant operator). After the human intervention, the local AI can be re-trained based on the data captured from the human input. This scene captures the role and interaction of the three technologies mentioned above: edge computing, ML/AI, and DLTs/smart contracts. Similar examples can be defined in other domains, such as agriculture (e.g., autonomously interacting harvesting machines), healthcare (e.g., remote patient monitoring and interventions), and energy (e.g., wind plant monitoring and maintenance).
D. Contributions and Outline
In this paper, we analyze the key technologies for the next generation of IoT systems and the tradeoffs between performance and energy consumption. Characterizing the energy efficiency of these complex systems is a daunting task. The conventional approach has been to characterize every single device or link. Nevertheless, the energy expenditure of an IoT device strongly depends on the context in which it operates, e.g., the goal of the communication or the traffic behavior. Therefore, we go beyond the conventional single-device approach and use the iIoTe as the basic building block in the energy budget. Contrary to a single device, the iIoTe captures the complex interactions among devices for each of the technologies. The total energy footprint is not a simple sum of the average per-link or per-transaction consumption of isolated devices, and scaling the iIoTe to a large number of instances gives a more accurate picture of the overall energy consumption.
The rest of the paper is organized as follows. In Section II we provide the state-of-the-art of the enabling technologies. Section III analyzes the performance and energy consumption of each enabling technology, and Section IV provides the vision for integrating the enabling technologies in energy-efficient iIoTe. Concluding remarks and a roadmap to address the open research challenges are given in Section V.
Background and Related Work
A. Edge Wireless Communications
Edge computing enables the processing of the received data closer to the sensors that generated them. This requires a substantial re-design of the communication infrastructure, which must implement additional functionality at the cellular base stations or other edge nodes. The design and performance of communication networks for edge computing have been widely studied in recent years; an overview can be found in [7] and [8]. One example is Mobile Edge Computing (MEC), adopted in 5G to refer to the deployment of cloud servers in the base stations to enable low latency, proximity, high bandwidth, real-time radio network information, and location awareness. The concept was defined in late 2014 by the European Telecommunications Standards Institute (ETSI): as a complement to the C-RAN architecture, MEC aims to unite telecommunication and IT cloud services to provide cloud-computing capabilities within radio access networks in the close vicinity of mobile users [9]. One of the most active research areas has been network virtualization and slicing within the MEC paradigm [10]. In the Radio Access Network, several authors have studied the potential of edge computing to support Ultra-Reliable Low-Latency Communication (URLLC) [11]–[13]. Another research area is the use of machine learning, particularly deep learning techniques, to unleash the full potential of IoT edge computing and enable a wider range of application scenarios [14], [15]. However, most previous works address the communications separately. Even though several papers address joint communication and computation resource management [16], they represent only a first step towards a holistic design of iIoTe and its defining technologies, as well as their integration with the communication infrastructure.
To optimize the energy efficiency of iIoTe, it is desirable to choose a communication technology that ensures low power consumption and massive connectivity. In this regard, 3GPP introduced Narrowband Internet of Things (NB-IoT), a cellular technology that uses a limited portion of the licensed spectrum of existing mobile networks to handle a limited amount of bi-directional IoT traffic. Although it uses LTE bands or guard-bands, it is usually classified as a 5G technology. It can achieve a peak data rate of up to 250 kbps over a 180 kHz bandwidth on an LTE band or guard-band [17], [18].
Compared to other low-power technologies, NB-IoT is attractive for IoT applications with more frequent communications. This is the case for those considered in iIoTe, where the intelligent end devices share updated models frequently and must record new transactions in the ledger. At the same time, NB-IoT keeps the advantages of Low-Power Wide Area (LPWA) technologies: low power consumption and simplicity. Throughout the rest of the paper, we use NB-IoT as a representative wireless technology for our analyses of iIoTe. Other wireless technologies follow similar access procedures and energy-performance trade-offs.
For an analysis of the energy consumption and battery lifetime of NB-IoT under different configurations we refer the reader to [19]. A key point for this analysis is the study of the message exchange during the access procedure: devices that attempt to communicate through a base station must first complete a Random Access (RA) procedure to transition from Radio Resource Control (RRC) idle mode to RRC connected mode. Only in RRC connected mode can data be transmitted, in the uplink through the Physical Uplink Shared Channel (PUSCH) or in the downlink through the Physical Downlink Shared Channel (PDSCH). The standard 3GPP RA procedure consists of four message exchanges: preamble (Msg1), uplink grant (Msg2), connection request (Msg3), and contention resolution (Msg4) (see Figure 2, which depicts the example of recording some data, e.g., a DLT transaction). Of these, Msg3 and Msg4 are scheduled transmissions where no contention takes place.
The NB-IoT preambles are orthogonal resources transmitted in the Narrowband Physical Random Access Channel (NPRACH) and used to perform the RA request (Msg1). A preamble is defined by a unique single-tone, pseudo-random hopping sequence. The NPRACH is scheduled to occur periodically in specific subframes; these are reserved for the RA requests and are commonly known as Random Access Opportunities (RAOs). To initiate the RA procedure, the devices select the initial subcarrier randomly, generate the hopping sequence, and transmit it at the next available RAO. The orthogonality of preambles implies that multiple devices can access the base station in the same RAO if they select different preambles. Next, the grants are transmitted to the devices through the Narrowband Physical Downlink Control Channel (NPDCCH) within a predefined period known as the RA response window. However, the number of preambles is finite and collisions can happen. In case of collision, each collided device may retransmit a preamble after a randomly selected backoff time.
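To make the contention behavior concrete, the following Python snippet gives a back-of-the-envelope simulation of a single RAO: each active device picks one of K orthogonal preambles uniformly at random, and a preamble chosen by exactly one device succeeds. The number of preambles and the device counts are illustrative assumptions, not 3GPP-mandated values.

import numpy as np

rng = np.random.default_rng(42)
K = 48                      # assumed number of orthogonal preambles per RAO
trials = 10_000

for n_devices in (8, 32, 64, 128):
    successes = 0
    for _ in range(trials):
        picks = rng.integers(0, K, size=n_devices)   # each device picks a preamble
        counts = np.bincount(picks, minlength=K)
        successes += np.sum(counts == 1)             # singleton preambles succeed
    print(f"{n_devices:4d} devices: per-device success prob ~ {successes / (trials * n_devices):.3f}")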
The specification provides sufficient flexibility in the configuration of the RA process, which makes it feasible to adjust the protocol and find the right balance between reliability, latency, and energy consumption for a given application. Specifically, the network configures the preamble format and the maximum number of preamble transmissions depending on the cell size, which impacts the preamble duration and the total duration [20]. Increasing the number of preamble transmissions reduces the erasure probability, but at the cost of higher energy consumption and larger latency. The same energy-reliability-latency tradeoff applies to other messages, including the RA response. Moreover, scheduling the NPRACH and NPDCCH consumes resources that would otherwise be used for data transmission. Therefore, each implementation must find an adequate balance between the amount of resources dedicated to NPRACH, NPDCCH, PUSCH, and PDSCH.1
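The following short calculation sketches this tradeoff, assuming independent erasures per preamble transmission; the erasure probability, per-transmission energy, and preamble duration are purely illustrative assumptions.

# Illustrative energy-reliability-latency trade-off of preamble repetitions.
p_erasure = 0.1      # assumed single-transmission erasure probability
E_tx_mJ = 0.8        # assumed energy per preamble transmission (mJ)
T_tx_ms = 5.6        # assumed preamble duration (ms)

for reps in (1, 2, 4, 8):
    residual = p_erasure ** reps                     # all repetitions erased
    print(f"{reps} repetitions: erasure={residual:.1e}, "
          f"energy={reps * E_tx_mJ:.1f} mJ, added latency={reps * T_tx_ms:.1f} ms")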
B. Distributed Learning Over Wireless Networks
Implementing intelligent IoT systems with distributed ML/AI over wireless networks (e.g., NB-IoT) needs to consider the impact of the communication network (latency and reliability under communication overhead and channel dynamics) and on-device constraints (access to data, energy, memory, compute, privacy, etc.). Obtaining high-quality trained models without sharing raw data is of utmost importance and contributes to the trustworthiness of the system. In this view, Federated Learning (FL) has received a groundswell of interest in both academia and industry. Its underlying principle is to train an ML model by exchanging model parameters (e.g., Neural Network (NN) weights and/or gradients) among edge devices under the orchestration of a federation server and without revealing raw data [21]. Devices periodically upload their model parameters after local training to a parameter server, which in return averages the models and broadcasts the resultant global model to all devices. FL was proposed by Google for its predictive keyboards [22] and later adopted in use cases in intelligent transportation, healthcare, industrial automation, and many other areas [23], [24]. While FL is designed for training over homogeneous agents with a common objective, recent studies have extended the focus towards personalization (i.e., multi-task learning) [25], training over dynamic topologies [26], and robustness guarantees [27], [28]. In terms of improving data privacy against malicious attackers, various privacy-preserving methods, including injecting fine-tuned noise into model parameters via a differential privacy mechanism [29]–[32] and mixing model parameters over the air via analog transmissions [33], [34], have recently been investigated. Despite these advancements, one main drawback of FL is that its communication overhead is proportional to the number of model parameters, calling for communication-efficient FL designs. In an edge setup with limited communication and computation resources, this overhead introduces training stragglers that degrade the overall training performance. In this view, client scheduling [35]–[37] and computation offloading [12], [38], [39], with the focus on guaranteeing a target training/inference accuracy, have been identified as promising research directions.
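A minimal sketch of this train-upload-average loop is given below for a toy linear regression task; the data, model, learning rate, and number of rounds are assumptions for illustration and do not correspond to any particular deployment.

import numpy as np

rng = np.random.default_rng(0)
n_devices, d, n_local = 5, 10, 50
w_true = rng.normal(size=d)
devices = []
for _ in range(n_devices):                         # each device holds private data
    A = rng.normal(size=(n_local, d))
    y = A @ w_true + 0.1 * rng.normal(size=n_local)
    devices.append((A, y))

w_global = np.zeros(d)
for rnd in range(20):                              # communication rounds
    local_models = []
    for A, y in devices:                           # on-device local training
        w = w_global.copy()
        for _ in range(10):                        # local gradient steps
            w -= 0.05 * A.T @ (A @ w - y) / len(y)
        local_models.append(w)                     # only parameters leave the device
    w_global = np.mean(local_models, axis=0)       # server-side model averaging

print("distance to ground truth:", np.linalg.norm(w_global - w_true))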
With client scheduling, the number of communication links is reduced (known as link sparsification) and thus the communication bandwidth and energy consumption of distributed learning can be significantly decreased. Additional temporal link sparsity can be introduced by enforcing model-sharing policies that account for model changes and/or importance across consecutive training iterations, such as the Lazy Aggregated Gradient Descent (LAG) method [40]. Sparsity can be further exploited by adopting sparse network topologies, which rely on communications within a limited neighborhood in the absence of a central coordinator/helper. While such sparsification improves energy and communication efficiency, it can slow down learning convergence and lower training and inference accuracy, so sparsity needs to be optimized in terms of the trade-off between communication cost and convergence speed. In this view, several sparse-topology-based distributed learning methods, including decentralized Gradient Descent (GD), dual averaging [41], learning over graphs [42], [43], and GADMM algorithms [44], [45], have been investigated.
C. Optimizing IoT Application Deployments in IoT Environments
IoT applications typically consist of multiple components. For instance, an IoT application could comprise components for secure data acquisition (e.g., based on Blockchain), data pre-processing, feeding the data into a neural network (or even through multiple ones) before acting upon the outcome of the ML inference, etc. In many cases, such composed IoT applications need to be distributed over multiple, connected intelligent IoT devices. An important aspect is then to optimize the allocation of application components to devices. The result of the allocation is an assignment of components to devices that fulfills the constraints and optimizes the performance of the system with respect to some metric. This metric could, for example, maximize the responsiveness of the application or minimize the overall energy consumption, where the latter is reasonable in battery-powered wireless systems. An overview of existing allocation approaches is given in [46].
Previous work [47] used Constraint Programming to describe an approach for the efficient distribution of actors to IoT devices. The approach resembles the Quadratic Assignment Problem (QAP) and is NP-hard, resulting in long computation times when scaling up. Samie et al. [48] present another Constraint Programming-based approach that takes the bandwidth limitations into account and minimizes the energy consumption of IoT nodes. The system optimizes computation offloading from an IoT node to a gateway; however, it does not consider composed computations that can be distributed to multiple devices.
A Game Theory-based approach is presented in [49] that aims at the joint optimization of radio and computational resources of mobile devices. However, the local optimum computed for multiple users only decides whether to fully offload a computation or to process it entirely on the device.
Based on Non-linear Integer Programming, Sahni et al. [50] present their Edge Mesh algorithm for task allocation, which optimizes overall energy consumption and considers data distribution, task dependency, embedded device constraints, and device heterogeneity. However, only a basic evaluation and experimentation are performed, without a performance comparison. Based on Integer Linear Programming (ILP), Mohan and Kangasharju [51] propose a task assignment solver that first minimizes the processing cost and then optimizes the network cost, which stems from the assumption that edge resources may not be highly processing-capable. An intermediary step reduces the sub-problem space by combining tasks and jobs with the same associated costs, which reduces the overall processing costs.
Cardellini et al. [52] describe a comprehensive ILP-based framework for optimally placing operators of distributed stream processing applications, flexible enough to be adjusted to other application contexts. Different optimization goals are considered, e.g., application response time and availability. They propose their solution as a unified general formulation of the optimal placement problem and provide an appropriate theoretical foundation. The framework can be extended with further constraints or shifted to other optimization targets. Finally, our previous work [53] has leveraged and extended Cardellini's framework by incorporating further constraints and a new optimization goal, namely the overall energy usage of the application.
D. Distributed Ledger Technologies Over Wireless Networks
In recent years, DLT has been the focus of large research efforts spanning several application domains. Starting with the adoption of Bitcoin and Blockchain, DLT has received a lot of attention in the realm of IoT, as the technology promises to help address some of the IoT security and scalability challenges [54]. For instance, in IoT deployments, the recorded data are either centralized or spread out across different heterogeneous parties. These data can be either public or private, which makes it difficult to validate their origin and consistency. In addition, querying and performing operations on the data becomes a challenge due to the incompatibility between different Application Programming Interfaces (APIs). For instance, Non-Governmental Organizations (NGOs), public and private sectors, and industrial companies may use different data types and databases, which leads to difficulties when sharing the data [55]. A DLT system offers a tamper-proof ledger that is distributed over a collection of communicating nodes, all sharing the same initial block of information, the genesis block [56]. To publish data to the ledger, a node includes data formatted as transactions in a block with a pointer to its previous block, which creates a chain of blocks, the so-called Blockchain.
A smart contract [57] is a distributed app that lives in the Blockchain. This app is, in essence, a programming-language class with fields and methods, and it is executed in a transparent manner on all nodes participating in a Blockchain [58]. Smart contracts are the main blockchain-powered mechanism likely to gain wide acceptance in IoT, where they can encode transaction logic and policies, including the requirements and obligations of parties requesting access, the IoT resource/service provider, and data trading over wireless IoT networks [59]. With the aforementioned characteristics, the advantages of integrating DLTs into wireless IoT networks are: 1) guaranteed immutability and transparency of recorded IoT data; 2) removal of the need for third parties; and 3) a transparent system for heterogeneous IoT networks that prevents tampering and the injection of fake data by the stakeholders.
DLTs have been applied in various IoT areas such as healthcare [60], [61], supply chain [62], smart manufacturing [63], and vehicular networks [64]. In the smart manufacturing area, the work described in [63] investigates DLT-based security and trust mechanisms and elaborates a particular application of DLTs for quality assurance, which is one of the strategic priorities of smart manufacturing. Data generated in a smart manufacturing process can be leveraged to retrieve material provenance, facilitate equipment management, increase transaction efficiency, and create a flexible pricing mechanism.
One of the challenges of implementing DLT in IoT and edge computing is the limited computation and communication capabilities of some of the nodes. In this regard, the authors in [59], [65] worked on the communication aspects of integrating DLTs with IoT systems, studying the trade-off between wireless communication and trustworthiness with two wireless technologies, LoRa and NB-IoT.
Enabling Technologies for iIoTe
This section elaborates on the three enabling technologies for iIoTe: 1) distributed learning; 2) distributed computing; and 3) distributed ledgers.
A. Energy-Efficient Distributed Learning Over Wireless Networks
As shown in Figure 1, each end device in the iIoTe has local AI and the whole system relies on FL. We present learning frameworks that are suitable for iIoTe leveraging two techniques: 1) spatial and temporal sparsity and 2) quantization.
1) Dynamic GADMM:
Standard FL requires a central entity, which plays the role of a parameter server (PS). At every iteration, all nodes need to communicate with the PS, which may not be an energy-efficient solution, especially for a large distributed network of agents/workers, as in the manufacturing use case. Furthermore, a PS-based approach is vulnerable to a single point of attack or failure. To overcome this problem and ensure a more energy-efficient solution, we propose a variant of the standard Alternating Direction Method of Multipliers (ADMM) [66] that decomposes the problem into a set of subproblems solved in parallel, referred to as Group ADMM (GADMM) [44]. GADMM extends the standard ADMM to a decentralized topology and enables communication- and energy-efficient distributed learning by leveraging spatial sparsity, i.e., enforcing each worker to communicate with at most two neighboring workers. In GADMM, the standard learning problem (P1) is re-formulated as the learning problem (P2):\begin{align*} \left(\mathbf{P1}\right)~\min_{\{\boldsymbol{\theta}_{n}\}_{n=1}^{N}}&\sum_{n=1}^{N} f_{n}\left(\boldsymbol{\theta}_{n}\right)\tag{1}\\ \left(\mathbf{P2}\right)~\min_{\{\boldsymbol{\theta}_{n}\}_{n=1}^{N}}&\sum_{n=1}^{N} f_{n}\left(\boldsymbol{\theta}_{n}\right) \\ \mathrm{s.t.}&~\boldsymbol{\theta}_{n} = \boldsymbol{\theta}_{n+1}, \quad \mathrm{for~} n=1,\ldots, N-1.\tag{2}\end{align*}
To this end, GADMM divides the set of workers into two groups, head and tail. Thanks to the equality constraint of (P2), each worker from the head/tail group exchanges its model with only two workers from the tail/head group, forming a chain topology. At iteration
One drawback of GADMM is its slow convergence compared to standard ADMM: due to the sparsification of the graph, workers require more iterations to converge. To alleviate this issue and combine the fast convergence of standard ADMM with the communication efficiency of GADMM, we have proposed Dynamic GADMM (D-GADMM) [44]. Not only does D-GADMM improve the convergence speed of GADMM, but it also copes with dynamic (time-variant) networks in which the workers are moving (e.g., the AGVs in the manufacturing plant or the tractors in the agriculture use case), while inheriting the theoretical convergence guarantees of GADMM. In a nutshell, every few iterations in D-GADMM, i.e., every system coherence time, two things change: i) the assignment of workers to the head/tail group, which follows a predefined assignment mechanism, and ii) the neighbors of each worker from the other group. The idea at a high level is as follows: the workers are given fixed IDs, and they share a pseudo-random code that is used every
In Fig. 3, we plot the objective error in terms of the number of iterations (left) and in terms of sum energy (right) for D-GADMM as well as GADMM and standard ADMM. As we can see from Fig. 3, D-GADMM greatly increases the convergence speed of GADMM and thus decreases the overall communication cost for fixed topology. As a consequence, D-GADMM achieves convergence speed comparable to the PS-based ADMM while maintaining GADMM’s low communication cost per iteration.
D-GADMM: loss as a function of (a) number of iterations and (b) total energy consumption.
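The following minimal sketch illustrates the GADMM mechanics described above for a decentralized least-squares problem: head workers (even indices) and tail workers (odd indices) alternate closed-form updates over the chain constraints of (P2), and each worker only uses its neighbors' models. The data, penalty parameter, and centralized bookkeeping of the dual variables are simplifying assumptions of this illustration, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)
N, d, m, rho = 6, 5, 20, 1.0
x_true = rng.normal(size=d)
A = [rng.normal(size=(m, d)) for _ in range(N)]             # private data per worker
b = [A[n] @ x_true + 0.01 * rng.normal(size=m) for n in range(N)]

theta = [np.zeros(d) for _ in range(N)]                      # local models
lam = [np.zeros(d) for _ in range(N - 1)]                    # duals for theta_n = theta_{n+1}

def local_update(n):
    """Closed-form primal update of worker n given its chain neighbors' models."""
    rhs = A[n].T @ b[n]
    lhs = A[n].T @ A[n]
    if n > 0:                                                # left neighbor / constraint n-1
        rhs += lam[n - 1] + rho * theta[n - 1]
        lhs = lhs + rho * np.eye(d)
    if n < N - 1:                                            # right neighbor / constraint n
        rhs += -lam[n] + rho * theta[n + 1]
        lhs = lhs + rho * np.eye(d)
    return np.linalg.solve(lhs, rhs)

for k in range(200):
    for n in range(0, N, 2):                                 # head group updates in parallel
        theta[n] = local_update(n)
    for n in range(1, N, 2):                                 # tail group uses fresh head models
        theta[n] = local_update(n)
    for n in range(N - 1):                                   # dual ascent on each chain constraint
        lam[n] += rho * (theta[n] - theta[n + 1])

print("consensus error:", max(np.linalg.norm(theta[n] - theta[0]) for n in range(N)))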
2) Censored Quantized Generalized GADMM:
As pointed out earlier, in the GADMM framework each worker exchanges its model with at most two neighboring workers, which slows down convergence. To reduce the communication overhead while generalizing to more generic network topologies, we propose the Generalized GADMM (GGADMM) [45]. Under this generalized framework, the workers are still divided into two groups, 1) head and 2) tail, possibly of different sizes. In other words, the topology is generalized from a chain to any bipartite graph, where the number of neighbors each worker communicates with can be any arbitrary number and is not necessarily limited to two. By leveraging the censoring idea, i.e., temporal sparsity, we introduce the Censored GGADMM (C-GGADMM), where each worker exchanges its model only if the difference between its current and previous models is greater than a certain threshold. To make the algorithm even more communication-efficient, censoring is applied to the quantized version of the worker's model instead of the model itself, yielding the Censored Quantized GGADMM (CQ-GGADMM) [45], [67]. CQ-GGADMM can significantly reduce the communication overhead, particularly for large model size
Fig. 4 compares CQ-GGADMM with Censored ADMM (C-ADMM), GGADMM, as well as C-GGADMM in terms of the loss versus the number of iterations (left) and versus the total sum energy (right) for a system of 18 workers on a linear regression task using the Body Fat dataset [68]. We can observe, from Fig. 4, that CQ-GGADMM exhibits the lowest total communication energy, followed by C-GGADMM, then GGADMM and finally C-ADMM, while having similar convergence speed to GGADMM. This observation validates the benefits of censoring the quantized version of the models before sharing, which makes the proposed algorithm (CQ-GGADMM) more communication and energy efficient.
CQ-GGADMM: loss as a function of (a) number of iterations and (b) total energy consumption.
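A minimal sketch of the censoring-plus-quantization transmission rule underlying C-GGADMM and CQ-GGADMM is shown below: a worker quantizes its model and transmits it only when it differs sufficiently from the last transmitted copy. The threshold schedule, quantizer, and variable names are assumptions for illustration, not the exact rule analyzed in [45], [67].

import numpy as np

def quantize(theta, bits=4):
    """Uniform quantizer over the model's dynamic range (assumed scheme)."""
    lo, hi = theta.min(), theta.max()
    if hi == lo:
        return theta.copy()
    levels = 2 ** bits - 1
    q = np.round((theta - lo) / (hi - lo) * levels)
    return lo + q * (hi - lo) / levels

def maybe_transmit(theta_new, theta_last_sent, k, tau0=1e-2, decay=0.95, bits=4):
    """Censoring applied to the quantized model: return (payload, transmitted?)."""
    q_new = quantize(theta_new, bits)
    if np.linalg.norm(q_new - theta_last_sent) > tau0 * (decay ** k):
        return q_new, True                  # significant change: transmit quantized update
    return theta_last_sent, False           # censored: neighbors keep the stale copy

# toy usage
rng = np.random.default_rng(1)
theta_sent = np.zeros(8)
for k in range(5):
    theta_local = theta_sent + 0.01 * rng.normal(size=8)    # pretend local update
    theta_sent, sent = maybe_transmit(theta_local, theta_sent, k)
    print(k, "transmitted" if sent else "censored")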
Finally, it is worth mentioning that, motivated by the fact that in FL the parameter server is interested in the aggregated output of all workers rather than the individual output of each worker, analog over-the-air aggregation schemes such as [34], [69]–[71] have been proposed. Such schemes achieve high scalability and significant savings in energy consumption owing to their ability to allow non-orthogonal access to the bandwidth.
B. Optimizing Energy Consumption of Wireless IoT Environments
The next pillar in iIoTe is edge computing. Specifically, we consider the problem of allocating the application components to the available end devices. As first presented in [53], we extend the Integer Linear Programming (ILP)-based framework defined by Cardellini et al. [52] (Section II-C). In [53] the goal was to minimize the overall energy consumption needed for executing an IoT application. The formulated ILP model is described below. The optimal allocation can be determined by feeding the model into a solver, such as IBM CPLEX.2
We define optimality of the allocation by total energy use over one execution of an IoT application. Energy during the application’s execution is consumed in two phases: 1) device energy, consumed by a device when executing a component and 2) edge network energy, consumed by the device when sending the result of the calculation over the network. Note that “optimal” in this case only describes optimality in the integer model. Given the constraints and the model, we find the optimal assignment, i.e., the one with minimal energy usage.
The optimal network configuration is the assignment of application components to devices that results in the lowest total energy consumption and satisfies the constraints. The constraints capture the requirements that an assignment must satisfy: each component must be allocated exactly once, and the resource requirements of the assigned components must not exceed the resources of the node. This problem is a form of the quadratic assignment problem and thus NP-hard.
1) System Model:
The application consists of a set of components and edges that interconnect them, modeled as a weighted undirected graph
Analogously, the network infrastructure where the components can be evaluated is modeled with the multi-partite graph
2) Problem Formulation:
For calculating the network energy, we need to know whether a link between two components is assigned to a link between two nodes. For this, we introduce a matrix \begin{align*}&\forall t_{1}, t_{2} \in \mathcal{V}_{\mathrm{app}}, \forall n_{1}, n_{2} \in \mathcal{V}_{\mathrm{net}}: Y\left[t_{1}, t_{2}, n_{1}, n_{2}\right] \le X\left[t_{1}, n_{1}\right] \tag{3}\\&\forall t_{1}, t_{2} \in \mathcal{V}_{\mathrm{app}}, \forall n_{1}, n_{2} \in \mathcal{V}_{\mathrm{net}}: Y\left[t_{1}, t_{2}, n_{1}, n_{2}\right] \le X\left[t_{2}, n_{2}\right] \tag{4}\\&\forall t_{1}, t_{2} \in \mathcal{V}_{\mathrm{app}}, \forall n_{1}, n_{2} \in \mathcal{V}_{\mathrm{net}}: Y\left[t_{1}, t_{2}, n_{1}, n_{2}\right] \ge X\left[t_{1}, n_{1}\right] + X\left[t_{2}, n_{2}\right] - 1 \tag{5}\\&\forall t \in \mathcal{V}_{\mathrm{app}}: \sum_{n \in \mathcal{V}_{\mathrm{net}}} X\left[t,n\right] = 1 \tag{6}\\&\forall n \in \mathcal{V}_{\mathrm{net}}: \sum_{t \in \mathcal{V}_{\mathrm{app}}} X\left[t,n\right] \cdot R_{t} \le R_{n} \tag{7}\\&\sum_{t \in \mathcal{V}_{\mathrm{app}}}\sum_{n \in \mathcal{V}_{\mathrm{net}}} C_{n} \cdot \left(S_{t} / P_{n}\right) \cdot X\left[t,n\right] \le E_{d} \tag{8}\\&\sum_{\left(t_{1}, t_{2}\right) \in \mathcal{E}_{\mathrm{app}}} \sum_{n_{1}, n_{2} \in \mathcal{V}_{\mathrm{net}}} O_{n_{1}} \cdot P_{n_{1},n_{2}} \cdot Y\left[t_{1}, t_{2}, n_{1}, n_{2}\right] \le E_{n} \tag{9}\\&E_{n} + E_{d} \le E_{t} \tag{10}\end{align*}
3) A Linear Heuristic for Energy-Optimized Allocation:
The presented QAP is NP-hard and thus compute-intensive. The culprit is the network cost calculation and the linearization of \begin{align*} \sum_{t \in \mathcal{V}_{\mathrm{app}}}\sum_{n \in \mathcal{V}_{\mathrm{net}}} C_{n} \cdot \left(S_{t} / P_{n}\right) \cdot X\left[t,n\right] + O_{t} \cdot \hat{T}_{n} \cdot X\left[t,n\right] \le E_{t} \tag{11}\end{align*}
The complete model reuses the constraints in equations (6) and (7) together with constraint (11). By transforming the QAP into a linear problem, we greatly increase the speed of finding a solution and make the optimization feasible for online usage. The drawback is that, by approximating the network energy, the solution is no longer optimal, as will be shown in the results.
4) Evaluation of Allocation Algorithm:
We implemented the model using the PuLP3 linear programming library. The evaluation was done by generating a random network and a random application, and letting the solver find the optimal allocation.
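A minimal PuLP sketch of such a model is shown below; it encodes the assignment constraint (6), the resource constraint (7), and the device-energy term that also appears in the heuristic objective (11). The data structures and numerical values are assumptions for illustration, not the implementation evaluated here.

import pulp

comps = {"c1": {"R": 2, "S": 1}, "c2": {"R": 3, "S": 2}}                       # R: resources, S: computation size
nodes = {"n1": {"R": 4, "C": 0.5, "P": 2}, "n2": {"R": 8, "C": 1.0, "P": 1}}   # C: energy/unit, P: speed

prob = pulp.LpProblem("component_allocation", pulp.LpMinimize)
X = pulp.LpVariable.dicts("X", [(t, n) for t in comps for n in nodes], cat="Binary")

# device energy: C_n * (S_t / P_n) for each assignment
prob += pulp.lpSum(nodes[n]["C"] * comps[t]["S"] / nodes[n]["P"] * X[(t, n)]
                   for t in comps for n in nodes)

# (6): each component is placed exactly once
for t in comps:
    prob += pulp.lpSum(X[(t, n)] for n in nodes) == 1

# (7): assigned components must fit into each node's resources
for n in nodes:
    prob += pulp.lpSum(comps[t]["R"] * X[(t, n)] for t in comps) <= nodes[n]["R"]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for (t, n), var in X.items():
    if var.value() > 0.5:
        print(f"{t} -> {n}")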
The network configuration is generated with a variety of node configurations and capabilities, reflecting the heterogeneous computation and communication infrastructure one could find in an industrial manufacturing plant (e.g., using Siemens' range of industrial computers [73]). In the evaluated configuration, 60% of the nodes are generated as wired nodes, and the remaining 40% are wireless nodes. Nodes are connected to each other with a certain probability: 0.8 for wired-wired connections, 0.5 for wireless-wireless connections, and 0.4 for wireless-wired connections. Wired connections use 0.2 units of energy, while wireless connections use 0.8 units of energy, which is similar to the power consumption of an Ethernet module [74] compared to a WiFi module [75]. Nodes have a varying amount of memory resources uniformly distributed between a lower bound of 1 and an upper bound of 8 resource units. Nodes also have a varying processing speed between 1 and 3 speedup, roughly comparable to the Intel processor families i3, i5, and i7. Finally, nodes can use from 0.5 to 1.5 units of energy for a single unit of computation.
For the application, two classes with a certain number of components are generated: a "wide" and a "long" application. In a "wide" application, two components are designated as the "start" and "end" components, and every other component needs input from the start node and sends output to the end node. In a "long" application, components are linked serially. Figure 5 shows two example applications. This method for generating applications is similar to [52]. Each application component has resource requirements randomly distributed between 1 and 8, an output factor randomly distributed between 0.5 and 1.5, and a computation size of 1 or 2. A sketch of this random generation is shown below.
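The following sketch mirrors that random generation procedure; the concrete data structures, the seed, and the rule assigning any link involving a wireless node the wireless energy cost are assumptions of this illustration.

import random

random.seed(7)
n_nodes, n_comps = 10, 6
nodes = []
for i in range(n_nodes):
    wired = i < int(0.6 * n_nodes)                     # 60% wired, 40% wireless
    nodes.append({
        "id": i,
        "wired": wired,
        "mem": random.randint(1, 8),                   # memory resource units
        "speed": random.choice([1, 2, 3]),             # processing speedup
        "energy_per_unit": random.uniform(0.5, 1.5),   # compute energy per unit
    })

links = {}
for i in range(n_nodes):
    for j in range(i + 1, n_nodes):
        both_wired = nodes[i]["wired"] and nodes[j]["wired"]
        both_wireless = not nodes[i]["wired"] and not nodes[j]["wired"]
        p = 0.8 if both_wired else 0.5 if both_wireless else 0.4
        if random.random() < p:
            links[(i, j)] = 0.2 if both_wired else 0.8  # energy per transferred unit

comps = [{"id": c,
          "req": random.randint(1, 8),                  # resource requirement
          "out": random.uniform(0.5, 1.5),              # output factor
          "size": random.choice([1, 2])}                # computation size
         for c in range(n_comps)]

print(f"{len(links)} links among {n_nodes} nodes, {len(comps)} components")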
As expected, the optimal allocation algorithm scales very badly (non-polynomially). Figure 6 shows the runtime of the algorithm for varying problem sizes. The shaded area shows the variance over the parameter not shown (different application sizes for the network-size axis, different network sizes for the application-size axis). The time needed to find the optimal allocation quickly becomes unwieldy.
In comparison, the heuristic presented in equation (11) finds a solution much more quickly. Figure 7 shows the runtime of the heuristic for different network and application sizes. For the slowest case of the full allocation, the heuristic takes 8 seconds of CPU time, while the solver consumes 864104 seconds (about 10 days) of CPU time to find the optimal allocation. The allocation evaluation was executed on an Amazon EC2
C. Energy-Efficient Blockchain Over Wireless Networks
The last enabling technology is DLT, which provides a tamper-proof ledger distributed across the nodes of the iIoTe. The energy and latency cost of implementing DLT over wireless links and with constrained IoT devices is oftentimes overlooked. In general, the latency and energy budgets are highly impacted by the wireless access protocol.
1) System Model:
As introduced in [65], there are two architectural choices for IoT DLT. The conventional one is to have IoT devices that receive complete blocks from the Blockchain to which they are connected and locally verify the validity of the Proof-of-Work (PoW) solution and the contained transactions. This configuration provides the maximum possible level of security. However, it requires high storage, energy, and computation resources, since the node needs to store the complete Blockchain and check all transactions, which makes it infeasible for many IoT applications. Instead, we consider the second option, where the IoT device is a light node that receives only the headers from the Blockchain nodes. These headers contain sufficient information for the Proof-of-Inclusion (PoI), i.e., to prove the inclusion of a transaction in a block without downloading the entire block body. Furthermore, the device defines a list of (few) events of interest, such as modifications to the state of a smart contract or transactions from/to a particular address.
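The following sketch shows the kind of Merkle proof-of-inclusion check a light node can run using only the root stored in the block header; the hash construction is simplified (plain SHA-256, no domain separation) and the transaction contents are illustrative.

import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:                           # duplicate last hash on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Collect sibling hashes (and their side) along the path to the root."""
    level = [h(x) for x in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sib = index ^ 1
        proof.append((level[sib], "left" if sib < index else "right"))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify_inclusion(tx, proof, root):
    node = h(tx)
    for sibling, side in proof:
        node = h(sibling + node) if side == "left" else h(node + sibling)
    return node == root

txs = [b"tx-a", b"tx-b", b"tx-c", b"tx-d"]
root = merkle_root(txs)                              # this value sits in the block header
proof = merkle_proof(txs, 2)                         # proof for tx-c, sent by a full node
print(verify_inclusion(b"tx-c", proof, root))        # True: light node accepts the tx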
The communication model for this lightweight version is as follows. The IoT devices transmit data to the Blockchain using the edge infrastructure. Specifically, a NB-IoT cell with the base station located in its center is considered, with
2) End-2-End (E2E) Latency:
NB-IoT provides three coverage classes, namely normal, extreme, and robust, to serve resource-limited devices suffering various levels of path loss [76]. Minimum latency and throughput requirements need to be maintained in the extreme coverage class, whereas enhanced performance is ensured in the extended or normal coverage class. Without loss of generality, we consider only the normal and extreme coverage classes, i.e., the number of classes
The total E2E latency includes two parts: 1) the latency
The wireless communication latency of NB-IoT uplink and downlink can be formulated as:\begin{equation*} L_{UeD} = L^{u} + L^{d} = L^{u}_{sync} + L^{u}_{rr} + L^{u}_{tx} + L^{d}_{sync} + L^{d}_{rr} + L^{d}_{rx},\tag{12}\end{equation*}
\begin{equation*} L_{rr} = \sum _{l=1}^{N_{r_{max}}} \left ({1-P_{rr}}\right)^{l-1} P_{rr}l\left ({L_{ra} + L_{rar}}\right),\tag{13}\end{equation*}
In the following, we provide a simple technique based on drift approximation [78] to calculate
Let \begin{equation*} P_{\mathrm {collision}}\left ({\lambda ^{a}_{tot}}\right)=1-\left ({1-\frac {1}{K}}\right)^{\lambda ^{a}_{tot}-1}\approx 1-e^{-\frac {\lambda ^{a}_{tot}}{K}}.\tag{14}\end{equation*}
\begin{equation*} \lambda ^{a}_{tot}= \lambda ^{a} +\left ({1-P_{rr}\left ({\lambda ^{a}_{tot}}\right)}\right)\sum _{l=2}^{N_{r_{max}}} \lambda ^{a}\left ({l}\right), \tag{15}\end{equation*}
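The following numeric sketch solves a simplified fixed point in the spirit of (14) and (15): the total NPRACH arrival rate is the new arrivals plus the retransmissions generated by collisions, assuming independent collisions across attempts. The parameter values and the simplified retransmission model are assumptions of this illustration, not the exact expression above.

import math

K = 48               # assumed preambles per RAO
lam_new = 10.0       # assumed new RA requests per RAO
N_r_max = 5          # assumed maximum preamble transmissions

lam_tot = lam_new
for _ in range(100):                                         # fixed-point iteration
    p_coll = 1.0 - math.exp(-lam_tot / K)
    # a request is (re)transmitted l times with probability p_coll**(l-1)
    lam_tot = lam_new * sum(p_coll ** (l - 1) for l in range(1, N_r_max + 1))
p_coll = 1.0 - math.exp(-lam_tot / K)
print(f"collision prob ~ {p_coll:.3f}, total arrivals ~ {lam_tot:.2f} per RAO")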
Assuming that the transmission time for the uplink transactions follows a general distribution with the first two moments \begin{equation*} L_{tx} = \frac {f\lambda ^{u}s_{1}s_{2}}{2s_{1}\left ({1-fGs_{1}}\right)} + \frac {f\lambda ^{u}s_{1}^{2}}{2\left ({1-f\lambda ^{u}s_{1}}\right)} + \frac {l_{1}}{ \mathcal{R}^{u}w},\tag{16}\end{equation*}
\begin{equation*} L_{rx} = \frac {0.5Fh_{1}t^{-1}}{h_{1}\left ({1-Fht^{-1}}\right)} + \frac {Fh_{1}}{1-Fht^{-1}} + \frac {m_{2}}{ \mathcal{R}^{d}y},\tag{17}\end{equation*}
Next, we calculate the second latency component, corresponding to the DLT verification process. Consider a DLT network that includes \begin{equation*} L_{tM} = L_{newB} + L_{getB} + L_{transB}\tag{18}\end{equation*}
In (18),
For the PoW computation, a miner \begin{align*} L_{W_{i*}}=&\int _{0}^{\infty } \left(1 - \Pr \left(W \leq x\right)\right)^{M} \,\mathrm{d}x \\=&\int _{0}^{\infty } e^{-\lambda _{c}Mx} \,\mathrm{d}x = \frac {1}{\lambda _{c}M}\tag{19}\end{align*}
The total latency required by the DLT verification process is
3) Energy Consumption:
Analogously to the latency, the energy consumption is divided into the wireless communication (uplink/downlink) and the DLT verification.
The total energy consumption in the wireless communication is written as follows:\begin{align*} E_{UD}=&E^{u} + E^{d} \\=&E^{u}_{sync} + E^{u}_{rr} + E^{u}_{tx} + E^{u}_{s} + E^{d}_{sync} + E^{d}_{rr} + E^{d}_{rx} + E^{d}_{s},\tag{20}\end{align*}
\begin{align*} E_{sync}=&P_{l} \cdot L_{sync} \tag{21}\\ E_{rar}=&P_{l} \cdot L_{rar} \tag{22}\\ E_{rr}=&\sum _{l=1}^{N_{max}} \left ({1-P_{rr}}\right)^{l-1} \cdot P_{rr} \cdot \left ({E_{ra} + E_{rar}}\right) \tag{23}\\ E_{ra}=&\left ({L_{ra} - \tau }\right) \cdot P_{I} + \tau \cdot \left ({P_{c} + P_{e} P_{t}}\right) \tag{24}\\ E_{tx}=&\left ({L_{tx} - \frac {l_{a}}{ \mathcal{R}^{u}w}}\right) \cdot P_{I} + \left ({P_{c} + P_{e} P_{t}}\right)\frac {l_{a}}{ \mathcal{R}^{u}w} \quad \tag{25}\\ E_{rx}=&\left ({L_{rx} - \frac {m_{1}}{ \mathcal{R}^{d}y}}\right) \cdot P_{I} + P_{l} \frac {m_{1}}{ \mathcal{R}^{d}y}\tag{26}\end{align*}
Following the PoW described above, the average energy consumption of the DLT to finish a single PoW round is:\begin{equation*} E_{DLT} = P_{c} L_{W_{i*}} + P_{t} L_{tM}.\tag{27}\end{equation*}
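As a back-of-the-envelope illustration of (19) and (27), the snippet below evaluates the expected PoW time 1/(lambda_c * M) and the resulting DLT energy per round; all numerical values are assumptions chosen only to give a sense of scale.

lambda_c = 0.05      # assumed per-miner block "solution rate" (1/s)
M = 10               # assumed number of miners
P_c = 2.0            # assumed compute power while mining (W)
P_t = 0.5            # assumed transmit power for block propagation (W)
L_tM = 1.2           # assumed block propagation + verification latency (s)

L_W = 1.0 / (lambda_c * M)          # expected time until the first miner solves PoW, eq. (19)
E_DLT = P_c * L_W + P_t * L_tM      # energy for one PoW round, eq. (27)
print(f"L_W = {L_W:.1f} s, E_DLT = {E_DLT:.2f} J")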
4) Results:
The performance of the DLT-based NB-IoT system is shown in Fig. 8. The experiments report the total latency in Fig. 8(a) and the energy efficiency in Fig. 8(b). In Fig. 8(a), the E2E latency is defined as the time elapsed from the generation of a transaction at the NB-IoT device until its verification. This includes the latency of the NB-IoT radio link and of the DLT, which comprises the execution time of the smart contract and transaction verification. We observe that increasing
Towards Energy-Efficient Intelligent IoT Environments
Having the energy-performance characterization for each of the enabling technologies (Section III), we next describe how they interact with each other in iIoTe. For this, we consider the scenario in Figure 9 with a given learning application. We split the task into sub-tasks, such as data processing, data training, and model aggregation, and distribute them in a decentralized way. Each of these sub-tasks (Section III-A) constitutes one of the application components (
In detail, the communication workflow of the proposed scheme can be summarized as follows:
Step 1: The data processing can be completed in different edge devices with limited resources. The selected data from the data provider is pre-processed and structured. This process includes both data engineering and feature engineering sub-processes: data engineering converts the raw data into prepared data, and feature engineering tunes the prepared data to create the features expected by the ML models.
Step 2: Then, the edge nodes or IoT devices responsible for training compute their local models based on their own private data and publish them to the associated edge server via, e.g., NB-IoT, registering with active smart contracts to upload their results securely until they are incorporated in the final aggregation and the generation of DLT transactions.
Step 3: Next, the edge servers responsible for ML aggregation gather transactions and arrange them in blocks following the Merkle tree. The block structure of a DLT involves the hash of the previous block, a timestamp, a nonce, and the hash-tree structure. These edge servers, with high computational capacity, join the DLT mining process to verify the created blocks and run consensus in the edge network. After completing the mining process, the verified blocks are added to the ledger and synchronized among the nodes. The local models are thereby published in the distributed ledger. Hence, the powerful edge servers can compute the global model directly based on the aggregation rules defined in smart contracts.
The advantages of this integration are twofold. First, by distributing the tasks to different edge nodes with different computing capacities, the IoT devices or edge nodes with limited resources can save a significant amount of energy required for training or mining and can achieve lower latency. Second, by leveraging the DLT, the updates of ML models are securely formed in encrypted transactions and hashed blocks, which significantly enhances the security and privacy of distributed learning in the edge networks. The DLT provides a baseline of trust, transparency, and immutability for distributed learning to guarantee the security and privacy of data and ML models, and it naturally addresses the single-point-of-failure problem of the standard FL approach, which relies on a centralized server to aggregate the models. Although the integration of the enabling technologies brings these advantages, it also has some drawbacks; for example, the time required for DLT mining increases the total latency of the system. This is a trade-off between trust and communication latency, which we discussed in [55], [65].
Conclusion and Future Work
In this paper, we have addressed the evolution of next-generation IoT networks towards the edge, driven by the introduced intelligent IoT environments. We use the iIoTe as the basic building block to characterize the energy-performance tradeoff of the three key enabling technologies: learning, edge computing, and distributed ledgers. Edge intelligence must rely on distributed paradigms such as FL, and we have shown how exploiting spatial and temporal sparsity and quantization can significantly improve the performance and reduce the energy consumption. Moreover, we have discussed distributing the FL model aggregator and the remaining sub-tasks to make the framework more robust against failures. For edge computing, the optimal allocation of application components to network resources is important to efficiently use the available infrastructure and optimize its energy consumption. DLT is a flexible solution for trustworthiness in these environments, but the energy and latency cost of implementing DLT over wireless links and constrained devices is oftentimes overlooked. We have analyzed these parameters using NB-IoT as the baseline wireless technology.
In the integration of these technologies in iIoTe, we have shown the interactions among them, which provides the basis for an energy model and evaluation that encompasses the contribution of each element. For instance, the learning and computation models can easily be broadened to consider the allocation of the different sub-tasks of the learning application in a representative topology, with each learning action and resource allocation playing the role of an action to be recorded in the DLT. Future work also includes extending the proposed solutions to dynamic environments where agents move and edge nodes are not always available. This is already supported by the presented dynamic head/tail learning paradigms, but the integration of a dynamic resource allocation and DLT framework is pending. Another necessary direction is to investigate the joint optimization of the computing and communication resources from the energy perspective.