I. Introduction
Large-scale accelerators are increasingly deployed in shared multi-DNN environments, such as cloud data centers [1]–[4], to meet the demands of compute-intensive deep neural network (DNN) workloads. Typically, inference-as-a-service (INFaaS) requests from different DNN applications are served by partitioning the large accelerator into multiple smaller accelerators, distributing the workloads across these partitions, and allocating resources to each inference request [5]–[8]. As INFaaS demand grows under stringent quality-of-service (QoS) guarantees for DNN applications, DNN accelerators will need to allocate resources incrementally while supporting seamless communication for data movement. Most prior single-task DNN accelerators [1], [9]–[19] cannot be directly applied to multi-DNN workloads, since the underlying hardware was not designed to guarantee fairness or other service-level agreements (SLAs). Further, naively applying single-task DNN accelerators to multi-DNN workloads can leave hardware resources underutilized, reducing throughput and increasing latency.
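To make the partitioning setting concrete, the following minimal sketch models the accelerator as an abstract pool of processing elements (PEs) that is carved into per-request partitions; the PE-pool abstraction, request parameters, and the simple first-fit allocation policy are illustrative assumptions rather than the mechanism of any cited accelerator.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class InferenceRequest:
    """An INFaaS request with a hypothetical PE demand and latency SLA (ms)."""
    req_id: str
    pes_needed: int
    sla_ms: float


@dataclass
class AcceleratorPool:
    """Abstract model of a large accelerator as a shared pool of PEs."""
    total_pes: int
    allocations: Dict[str, int] = field(default_factory=dict)

    def free_pes(self) -> int:
        return self.total_pes - sum(self.allocations.values())

    def allocate(self, req: InferenceRequest) -> Optional[int]:
        """Greedy first-fit: grant a partition if enough PEs remain, else reject."""
        if req.pes_needed <= self.free_pes():
            self.allocations[req.req_id] = req.pes_needed
            return req.pes_needed
        return None  # caller may queue the request or resize existing partitions

    def release(self, req_id: str) -> None:
        """Return a finished request's partition to the shared pool."""
        self.allocations.pop(req_id, None)


if __name__ == "__main__":
    pool = AcceleratorPool(total_pes=1024)
    requests = [
        InferenceRequest("resnet50", pes_needed=256, sla_ms=10.0),
        InferenceRequest("bert-base", pes_needed=512, sla_ms=20.0),
        InferenceRequest("gpt2", pes_needed=512, sla_ms=50.0),
    ]
    for r in requests:
        granted = pool.allocate(r)
        print(r.req_id, "granted" if granted else "rejected",
              "| free PEs:", pool.free_pes())
```

Even this toy policy shows the tension described above: the third request is rejected for lack of free PEs, motivating accelerators that can allocate resources incrementally and move data between partitions without stranding capacity.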