
RT-TEE: Real-time System Availability for Cyber-physical Systems using ARM TrustZone

Abstract:

Embedded devices are becoming increasingly pervasive in safety-critical systems of the emerging cyber-physical world. While trusted execution environments (TEEs), such as ARM TrustZone, have been widely deployed in mobile platforms, little attention has been given to deployment on real-time cyber-physical systems, which present a different set of challenges compared to mobile applications. For safety-critical cyber-physical systems, such as autonomous drones or automobiles, the current TEE deployment paradigm, which focuses only on confidentiality and integrity, is insufficient. Computation in these systems also needs to be completed in a timely manner (e.g., before the car hits a pedestrian), putting a much stronger emphasis on availability.

To bridge this gap, we present RT-TEE, a real-time trusted execution environment. There are three key research challenges. First, RT-TEE bootstraps the ability to ensure availability using a minimal set of hardware primitives on commodity embedded platforms. Second, to balance real-time performance and scheduler complexity, we designed a policy-based event-driven hierarchical scheduler. Third, to mitigate the risks of having device drivers in the secure environment, we designed an I/O reference monitor that leverages software sandboxing and driver debloating to provide fine-grained access control on peripherals while minimizing the trusted computing base (TCB).

We implemented prototypes on both ARMv8-A and ARMv8-M platforms. The system is tested on both synthetic tasks and real-life CPS applications. We evaluated a rover and a plane in simulation, and a quadcopter both in simulation and on a real drone.

Published in: 2022 IEEE Symposium on Security and Privacy (SP)

Date of Conference: 22-26 May 2022

Date Added to IEEE Xplore: 27 July 2022

DOI: 10.1109/SP46214.2022.9833604

Conference Location: San Francisco, CA, USA

SECTION I.

Introduction

The software of modern cyber-physical systems (CPSs) is often highly complex. For example, the code in a modern automobile such as the Chevy Volt is as complex as the total flight software of the Boeing 787 airplane [1]. Under the pressure to include more features and to save on production cost, weight, and testing, CPS system designers are consolidating more and more functionalities on a single system-on-chip (SoC) [2], [3]. Numerous software vulnerabilities have been discovered on modern cyber-physical systems such as drones [4], [5] and automobiles [6]. While some of these vulnerabilities are only nuisances [7], others allow attackers to escalate into system privilege [7], [8], [4], [9] and can have life or death implications [6].

Lack of availability protection in existing defenses: Recognizing the importance of embedded system security, there has been significant interest in hardening the software using security mechanisms such as control-flow integrity, privilege minimization, and specialized reference monitors [10], [11], [12], [13]. Common to all software approaches is the reliance on a trusted OS. However, many existing embedded systems, microcontrollers in particular, run a large amount of code in privileged mode [14] for convenience of development or for performance.

Trusted Execution Environment [15], [16], [17], [18] is a complementary approach that provides a powerful abstraction of a trusted machine even if the system software is compromised. TEE technologies, such as TrustZone, are now a de facto solution for mobile device security [19], [20], [21], [22], [23], [24], [25], [26]. However, similar to all existing software solutions, when the attackers can escalate their privilege into the OS, current TEE software stacks offer little assurance for system availability. Since the current TEE design only protects computation confidentiality [19] and integrity [20], management of resources, including process scheduling, is left to the non-secure OS. Recently, there has been increasing interest in enabling availability protection using new hardware designs [27], [28], [29], [30]. However, the application of such hardware primitives in real-time cyber-physical systems, such as autonomous drones, remains an open question.

Importance of availability in CPS: A defining characteristic of real-time CPSs is their continuous interaction with the physical world. Therefore, it is crucial that system resources are made available to safety-critical tasks in a timely manner. For example, the pedestrian detection algorithm on a self-driving car is a real-time task with a direct connection to a physical-world process. If an attacker delays the execution of this workload, the result can be rendered utterly useless, since a catastrophic accident may have already happened, as recently demonstrated in [31], [32]. To further motivate the problem, we have also developed a new attack, called the time warping attack, which exploits access to Dynamic Voltage and Frequency Scaling (DVFS) to tamper with the timing characteristics of critical control components protected by the TEE, leading to control destabilization that crashes the robot.

Real-Time Trusted Execution Environment: In this paper, we assume a strong adversary that can exploit vulnerabilities in CPS firmware [5], [4], [6] to take control of the OS, and we address the research question of how to use security primitives on commodity embedded hardware to provide system availability assurance for real-world CPSs.

Our main contribution is the design, implementation, and evaluation of RT-TEE, a real-time trusted execution environment that protects system availability using hardware-assisted system resource partitioning on embedded platforms, such as ARM TrustZone. Availability entails the guarantee of timely access to system resources, including both computation (control) and I/O (sensing and actuation). However, moving the critical processes and resource management into the TEE not only significantly increases the system trusted computing base (TCB), but also degrades the performance. There are three key research challenges:

Challenge 1) Minimal hardware abstraction for availability guarantee in CPS: To provide availability guarantee, the TCB has to assert complete mediation over resources needed by the safety/security critical tasks for availability. However, resource management is commonly implemented by the untrusted OS in existing TEE designs. Building on the concept of control loops, we formulate the requirements on the minimal set of capabilities the hardware has to provide and show how they can be met using primitives from the TEE. This allows us to construct the rest of the design using a minimized hardware abstraction. From a high level, to ensure availability for CPS, RT-TEE relies on the secure timer to trap execution back to TCB to provide computational availability for the control. It also relies on a secure I/O reference monitor to enforce isolation and protection for sensing and actuation.

Challenge 2) Real-time computation availability: Due to the strong temporal affinities of CPS, computation resources not only have to be available, but also have to be available in real time. Contrary to the popular belief that real-time processes have to finish in a very short time, the key requirement from the perspective of real-time computing is meeting the deadlines [33], [34]. This is typically accomplished using a trusted real-time scheduler. A naive approach is to directly implement such a scheduler inside the TCB for all secure and non-secure processes, but this significantly increases the TCB complexity. Another approach is to always complete the secure tasks first, also known as idle scheduling. However, this design can lead to unnecessary starvation of non-secure tasks, which hurts overall system performance, since critical/secure tasks may not need to be executed immediately; they just need to be completed before their deadlines. For example, the battery checking task is secure and safety-critical but does not have to be executed immediately, while the video streaming application on the drone is not safety-critical but should be accommodated to the extent that secure tasks do not miss their deadlines.

To minimize the penalty on performance without significantly increasing the complexity of the secure scheduler, we propose a policy-based event-driven hierarchical scheduler. Our hierarchical scheduler has two layers. Only the top-level scheduler has to be added to the TCB to guarantee secure processes have the resources to meet the deadlines. This is because the theoretical guarantee on the completion of secure tasks by compositional schedulability analysis makes no assumption on the behavior of the non-secure environment.
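The two-layer idea can be illustrated with a deliberately simplified, tick-based sketch. It is not the policy-based event-driven scheduler described in this paper: it assumes a fixed secure budget at the start of every period, and the names `run_top_level`, `secure_pick`, and `nonsecure_pick` are hypothetical. The point it shows is that only the top-level loop needs to be trusted, because the time granted to the secure world does not depend on what the non-secure picker does.

```python
from typing import Callable, List, Tuple

Picker = Callable[[int], str]  # maps a tick to the name of the task to run


def run_top_level(horizon: int, period: int, secure_budget: int,
                  secure_pick: Picker,
                  nonsecure_pick: Picker) -> List[Tuple[str, str]]:
    """Trusted top-level loop (assumed model): in every period, the first
    `secure_budget` ticks go to the secure world and the remainder goes to the
    non-secure world, whose own (untrusted) scheduler picks the task."""
    if not 0 <= secure_budget <= period:
        raise ValueError("secure_budget must lie within one period")
    timeline = []
    for tick in range(horizon):
        if tick % period < secure_budget:
            timeline.append(("secure", secure_pick(tick)))
        else:
            timeline.append(("non-secure", nonsecure_pick(tick)))
    return timeline


def secure_ticks(timeline: List[Tuple[str, str]]) -> int:
    """Count how many ticks the secure world received."""
    return sum(1 for world, _ in timeline if world == "secure")


# The secure share is identical whether the non-secure side behaves or not.
well_behaved = run_top_level(20, 10, 3, lambda t: "control", lambda t: "video")
hostile = run_top_level(20, 10, 3, lambda t: "control", lambda t: "spin-forever")
```

In this model, the non-secure world still receives 7 of every 10 ticks, in contrast to idle scheduling, where it would run only when no secure task is pending. On real hardware the boundary between the two worlds would have to be enforced by a trap such as the secure timer rather than by a cooperative loop.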

Challenge 3) Fine-grained peripheral availability: Naive use of TEE to protect I/O resources is neither sufficient nor effective for two reasons. It is insufficient because device level protection may not be universally available on all peripherals. Using SPI bus as an example, the access control is coarse grained, only specifying if a security domain has access to the bus or not. It is also not effective because migrating device drivers into the TCB will significantly increase its complexity.

To enable fine-grained access control on the peripherals, we designed and implemented an I/O reference monitor on top of TEE to remove the assumption on trusted drivers. To minimize the impact on the TCB, we leveraged the unique characteristics of cyber-physical systems, where each control loop performs the same set of I/O actions, to allow for significant driver debloating, where only a subset of the driver functionality is maintained for sensing and actuation. To enable feature-rich drivers without increasing the TCB, we proposed to sandbox the driver in conjunction with the I/O reference monitor to prevent compromised drivers from harming the system.
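As a rough illustration of fine-grained mediation, the sketch below models a reference monitor as an allowlist check over individual I/O requests. The request fields, the register numbers, and the class names are hypothetical and are not taken from the RT-TEE implementation; the allowlist stands in for the fixed set of I/O actions that a control loop performs.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class IoRequest:
    bus: str          # e.g. "spi0" (hypothetical bus name)
    chip_select: int  # which device on the bus
    register: int     # device register being accessed
    write: bool       # True for a write, False for a read


class IoReferenceMonitor:
    """Allows only the exact transactions listed at construction time."""

    def __init__(self, allowed: FrozenSet[IoRequest]) -> None:
        self._allowed = allowed

    def check(self, request: IoRequest) -> bool:
        # Complete mediation: anything not explicitly allowed is denied.
        return request in self._allowed


# Hypothetical policy: the control loop may only read one sensor register.
read_imu = IoRequest("spi0", chip_select=0, register=0x3B, write=False)
monitor = IoReferenceMonitor(frozenset({read_imu}))

# A sandboxed driver that tries to reconfigure the same device is refused,
# even though bus-level access control alone would have let it through.
reconfigure = IoRequest("spi0", chip_select=0, register=0x6B, write=True)
```

Here `monitor.check(read_imu)` is accepted and `monitor.check(reconfigure)` is rejected, which is a finer distinction than deciding per bus whether a security domain has access.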

Prototype and Contribution: We have implemented prototypes on both the ARMv8-A and ARMv8-M architectures, using a Raspberry Pi and an NXP LPC55S69 development board, respectively. Using the Raspberry Pi as the controller running ArduPilot, we assembled a quadcopter to test the impact of security protection on both real-time properties and control variation. To show how the environment can be used, we present two concrete use cases on autonomous drones: protecting the entire flight controller, or protecting just the fail-safe controller for emergency recovery of the autonomous aerial vehicle. To evaluate the performance and understand the limitations of our proposed approach, we conducted a series of experiments with both synthetic workloads and real-life applications, in simulation and on real-world systems. We found that RT-TEE introduces a small overhead in task execution time on real-world drone applications.

In summary, we make the following contributions:

We designed and implemented a real-time trusted execution environment, RT-TEE, capable of ensuring real-time availability on both CPU and I/O for commodity embedded processors in the presence of a compromised OS, addressing a key requirement for safety-critical CPS/IoT.
To balance real-time responsiveness and TCB minimization, we designed and developed a policy-based event-driven hierarchical scheduler. To minimize the attack surface of device drivers in the TCB, we developed an I/O reference monitor on top of driver debloating and sandboxing to ensure the real-time I/O availability.
We implemented a prototype on both ARMv8-A and ARMv8-M processors ¹. We tested our system on both synthetic tasks and real-world applications, covering three CPS platforms (quadcopter, plane, and rover) in simulation. We also deployed RT-TEE on a real-life quadcopter to validate its feasibility.

SECTION II.

Background and Motivation

Lack of Availability in Existing TEE Deployment Model: ARM processor families, which power more than 60% of embedded devices, have a long history of building a trusted execution environment called ARM TrustZone into both low-end Cortex-M and high-end Cortex-A series. Similar to ARM, many commodity [16], [35], [15] and customized processor [18], [36], [17] offerings enable hardware-enforced resource isolation between the secure and non-secure environment, which are also referred to as the secure world and non-secure/normal world in ARM. Using such isolation, TEE offers a secure environment for secure processing even if the non-secure OS is compromised. However, based on the design principle of TCB minimization, most existing deployment models of powerful TEE hardware rely on the non-secure OS for resource orchestration. Using the current most widely deployed embedded TEE, TrustZone, as a case study, we surveyed all existing TEE software stacks, including nVidia TLK, Linaro OP-TEE, Trustonic TEE, Huawei iTrustee, Android Trusty, and Qualcomm TEE. All of them rely on the non-secure OS for resource management, including process scheduling. The detailed survey can be found in Appendix J.

Real-Time System Background: Contrary to the popular belief that real-time systems need to complete individual tasks quickly, the expectation in the real-time system community is that a task shall finish before its deadline [37]. A task is usually implemented as a thread in an OS. Real-time (RT) tasks are tasks with certain timing constraints. Periodic tasks are the most common model in real-time scheduling, because they map well to cyber-physical processes, where a task releases jobs periodically. The interval between two consecutive job releases is referred to as the period. Each job needs to be executed and completed before its deadline. A deadline can be explicit (specified) or implicit (at the end of a period). A hard real-time job must be completed before the deadline; completion past the deadline does not provide any utility and may lead to serious consequences. To facilitate scheduling, a priority is assigned to a task. The priority can be fixed (i.e., determined before run-time) or dynamic (i.e., changing based on the current tasks running in the system).
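The periodic task model above can be made concrete with a short sketch. The task names and parameters are illustrative assumptions, and the schedulability check uses the classic response-time analysis for fixed-priority preemptive scheduling, which is a standard textbook technique and not necessarily the analysis used in this paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PeriodicTask:
    name: str
    period: int                     # interval between consecutive job releases (ms)
    wcet: int                       # worst-case execution time of one job (ms)
    deadline: Optional[int] = None  # None means implicit: the end of the period

    @property
    def relative_deadline(self) -> int:
        return self.period if self.deadline is None else self.deadline


def release_times(task: PeriodicTask, horizon: int) -> List[int]:
    """Release instants of the task's jobs in [0, horizon)."""
    return list(range(0, horizon, task.period))


def response_time(task: PeriodicTask,
                  higher_priority: List[PeriodicTask]) -> Optional[int]:
    """Worst-case response time via fixed-point iteration, or None if the
    iteration exceeds the deadline (the task is then deemed unschedulable)."""
    r = task.wcet
    while r <= task.relative_deadline:
        # Each higher-priority task preempts ceil(r / period) times within r.
        interference = sum(-(-r // hp.period) * hp.wcet for hp in higher_priority)
        nxt = task.wcet + interference
        if nxt == r:
            return r
        r = nxt
    return None


# Hypothetical task set, listed from highest to lowest fixed priority.
control = PeriodicTask("control", period=5, wcet=1)
navigation = PeriodicTask("navigation", period=10, wcet=3)
battery = PeriodicTask("battery_check", period=20, wcet=4)
```

With these assumed parameters, the battery check has a worst-case response time of 9 ms against a 20 ms implicit deadline, so it meets its deadline without having to run first.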

Security Implication of Real-time Property: The timing-critical nature of CPS changes the landscape of attack vectors when the non-secure OS is compromised. Resources not only need to be made available, but also have to be available in a timely manner so that the computation can finish on time. To motivate the necessity of real-time scheduling for security, we developed a concrete attack, called the time warping attack, that exploits DVFS and can destabilize the system even when the controller for the CPS is bug-free and protected by the TEE.

Time Warping Attack: Dynamic Voltage and Frequency Scaling is a ubiquitous energy management technique that enables a trade-off between processor speed and energy consumption. During the schedulability test, the worst-case execution time is calculated under the assumption of a specific processor frequency. When the frequency is lowered, the originally allocated budget for secure/critical tasks no longer suffices. Since the frequency scaling attack can be launched at any time during the execution of the secure environment, from a different core occupied by the untrusted non-secure OS, the secure environment also faces a time-of-check vs. time-of-use (ToCToU) challenge. The frequency reduction leads to a misconception of elapsed time in the secure environment and results in control destabilization.
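The effect on a time budget can be seen with a small arithmetic sketch. It assumes, as a simplification, that execution time scales inversely with clock frequency, which ignores memory-bound behavior. The cycle count, frequencies, and budget are made-up numbers, not measurements from this paper.

```python
def execution_time_us(cycles: int, freq_mhz: int) -> float:
    """Time to run `cycles` CPU cycles at `freq_mhz` (MHz = cycles per microsecond)."""
    return cycles / freq_mhz


def meets_budget(cycles: int, freq_mhz: int, budget_us: float) -> bool:
    """True if the job finishes within the budget allotted at analysis time."""
    return execution_time_us(cycles, freq_mhz) <= budget_us


# Hypothetical secure control job: its budget was sized for the nominal clock.
WCET_CYCLES = 1_200_000
BUDGET_US = 1_500

nominal_us = execution_time_us(WCET_CYCLES, 1200)  # analysis-time frequency
attacked_us = execution_time_us(WCET_CYCLES, 600)  # after the clock is halved
```

Under these assumptions the job takes 1000 µs at the nominal 1200 MHz and fits its 1500 µs budget, but takes 2000 µs once the clock is halved, so the same code overruns a budget that was valid when it was checked.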

Fig. 1: Trajectory under Frequency Scaling Attack in Open-loop Testing.
