
Share-Aware Joint Model Deployment and Task Offloading for Multi-Task Inference


Abstract:

In vehicular edge computing, efficient strategies for model deployment and task offloading offer tremendous potential to reduce the response time of machine learning inference. However, existing works pay little attention to the shared structures among different types of inference tasks, which limits further reductions in response time. This paper aims to fill this gap by investigating a share-aware joint model deployment and task offloading problem for multi-task inference in vehicular edge computing. We formulate the problem with the objective of minimizing the total response time of all inference requests, under constraints on per-task response time, per-roadside-unit (RSU) storage capacity, etc. We prove that the formulated problem is NP-hard. To solve it, we propose a time-period-aware algorithm, called TPA, with a guaranteed approximation ratio. In TPA, an iterative approach solves the problem of maximizing system throughput within a given time period; this time period is then refined to approximate the minimum period needed to complete all requests. The algorithms are evaluated in an environment comprising two CPUs, two GPUs, state-of-the-art multi-task learning models, and the Google cluster-usage trace dataset. Simulation results show that the proposed TPA outperforms state-of-the-art methods in all cases in terms of the total response time of all requests. For example, TPA reduces the total response time by at least 73.72% across the different numbers of RSUs considered, compared with state-of-the-art methods.
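
The abstract describes TPA's structure only at a high level. As one plausible reading, the Python sketch below treats the inner throughput-maximization subproblem as a feasibility oracle and searches over candidate time periods for the smallest one in which all requests can be completed. Every name here (tpa_outer_loop, max_throughput, the binary-search update rule, the numeric bounds) is a hypothetical illustration, not the paper's actual algorithm.

def tpa_outer_loop(num_requests, max_throughput, t_low, t_high, eps=1e-3):
    """Approximate the minimum time period T in which all requests complete.

    max_throughput is assumed to be an oracle that, given a candidate
    period T, returns the maximum number of requests the system can serve
    within T (the throughput-maximization subproblem mentioned in the
    abstract). The binary-search refinement of T is an assumption made
    for illustration; the paper's own update rule may differ.
    """
    while t_high - t_low > eps:
        t_mid = (t_low + t_high) / 2.0
        if max_throughput(t_mid) >= num_requests:
            t_high = t_mid  # all requests fit: try a shorter period
        else:
            t_low = t_mid   # infeasible: the period must grow
    return t_high

# Toy usage with a stub oracle (purely illustrative): throughput grows
# linearly with the period at a fixed rate of 10 requests per time unit.
min_period = tpa_outer_loop(num_requests=100,
                            max_throughput=lambda t: int(10.0 * t),
                            t_low=0.0, t_high=1000.0)
print(f"approximate minimum completion period: {min_period:.3f}")

The sketch highlights the design choice implied by the abstract: rather than minimizing total response time directly, the hard problem is decomposed into repeated throughput-maximization instances whose time budget converges to the minimum completion period.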
Published in: IEEE Transactions on Intelligent Transportation Systems (Volume: 25, Issue: 6, June 2024)
Page(s): 5674-5687
Date of Publication: 16 January 2024


I. Introduction

Recent years have witnessed significant efforts to improve vehicle safety and efficiency through emerging driving assistance applications in intelligent transportation systems, such as collision avoidance, augmented reality navigation, and lane following [1], [2], [3]. Generally, these applications rely on machine learning techniques to achieve intelligent and high-quality performance. Specifically, they perform inference on real-time input data using pre-trained machine learning models. As reported by Gartner, autonomous vehicles will be among the top five fields of artificial intelligence software spending, with a growth rate of 20.1% in 2022 [4]. However, these applications generally suffer from poor quality of service (QoS), because the massive computation, communication, and storage resources required for machine learning inference are limited in vehicles. The Internet of Vehicles (IoV) has attracted considerable attention as a means of providing high-quality computing services [5], [6], [7]. In particular, vehicular edge computing (VEC) has emerged as an appealing paradigm to support delay-sensitive and computationally intensive services by exploiting computation, storage, and communication resources at the edge of the vehicular network [8], [9]. Specifically, in VEC, edge servers are deployed on roadside units (RSUs) to enable on-board driving tasks to meet real-time requirements [10], [11], [12].
