
A Silicon Photonic Multi-DNN Accelerator



Abstract:

In shared environments like cloud-based datacenters, hardware accelerators are deployed to meet the scale-out computation demands of deep neural network (DNN) inference tasks. As conventional hardware accelerators optimized for single-DNN execution cannot effectively resolve the dynamic interaction of these inference-as-a-service (INFaaS) tasks, several multi-DNN hardware accelerators have been developed to improve overall system performance while adhering to the constraints of individual tasks. Some of these multi-DNN accelerators temporally schedule tasks using preemption- or load-balancing-based algorithms but suffer from resource underutilization because of the unmanaged mismatch between resource demand and provision. Others enable spatial colocation of tasks to improve resource utilization and system flexibility, but the irregular communication patterns between the fragmented resource partitions cannot be adequately supported by metallic interconnects due to their rigidity and other inherent scaling limitations. In this paper, we introduce a photonic multi-DNN accelerator named Aspire. The fundamental novelty of Aspire lies in its ability to adaptively create sub-accelerators for different tasks by assembling fine-grained resource partitions within the same architecture. Seamless communication between the fragmented resource partitions of a sub-accelerator is realized by exploiting photonic interconnects. Specifically, Aspire includes three novel designs: (1) a photonic network that can be adaptively partitioned into several sub-networks, each seamlessly connecting the fragmented resource partitions to construct a sub-accelerator; (2) a dataflow that simultaneously leverages temporal and spatial data reuse opportunities within each resource partition and across several resource partitions, respectively; (3) an algorithm that allocates resource partitions at task granularity and derives optimal tile s...
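To make the resource-assembly idea concrete, the sketch below shows one simple way a runtime might hand out fine-grained resource partitions to concurrent inference tasks and group them into sub-accelerators. It is a minimal, hypothetical illustration: the Task fields, the greedy first-fit policy, and the partition counts are assumptions made for exposition, not Aspire's actual allocation algorithm (which also derives tile sizes and relies on the photonic sub-networks described above).

```python
# Hypothetical sketch (not Aspire's algorithm): assigning fine-grained
# resource partitions to concurrent DNN inference tasks at task granularity.
# Each task requests some number of partitions; the partitions assigned to a
# task form its sub-accelerator, and the photonic network is assumed to
# connect them even when they are not contiguous.

from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    partitions_needed: int              # assumed per-task demand
    assigned: list = field(default_factory=list)

def allocate(tasks, total_partitions):
    """Greedy first-fit allocation of partition IDs to tasks."""
    free = list(range(total_partitions))
    placed, deferred = [], []
    # Serve the largest requests first so big tasks are not starved.
    for task in sorted(tasks, key=lambda t: t.partitions_needed, reverse=True):
        if task.partitions_needed <= len(free):
            task.assigned = [free.pop(0) for _ in range(task.partitions_needed)]
            placed.append(task)
        else:
            deferred.append(task)       # wait until partitions are released
    return placed, deferred, free

if __name__ == "__main__":
    tasks = [Task("resnet50", 6), Task("bert-base", 9), Task("mobilenet", 2)]
    placed, deferred, free = allocate(tasks, total_partitions=16)
    for t in placed:
        print(f"{t.name}: partitions {t.assigned}")
    print("deferred:", [t.name for t in deferred], "| free:", free)
```

In this toy view, the list of partition IDs stands in for a sub-accelerator; in Aspire it is the adaptively partitioned photonic network that lets such a group of fragmented partitions communicate seamlessly.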
Date of Conference: 21-25 October 2023
Date Added to IEEE Xplore: 27 December 2023
Conference Location: Vienna, Austria


I. Introduction

Large-scale accelerators are increasingly being deployed in shared multi-DNN environments (such as cloud data centers [1]–[4]) to meet the demands of large-scale, compute-intensive deep neural network (DNN) workloads. Typically, these inference-as-a-service (INFaaS) requests from different DNN applications are satisfied by partitioning the large accelerator into multiple smaller accelerators, distributing the workloads, and allocating resources to each inference request [5]–[8]. As INFaaS demands increase alongside stringent quality-of-service (QoS) guarantees for DNN applications, DNN accelerators will be required to allocate resources incrementally while allowing seamless communication for data movement. Most prior DNN accelerators designed for single-task execution [1], [9]–[19] cannot be directly applied to multi-DNN workloads since the underlying hardware was not designed to guarantee fairness or other service-level agreements (SLAs). Further, naively applying single-task DNN accelerators to multi-DNN workloads can leave hardware resources underutilized, which degrades throughput and increases latency.
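A back-of-the-envelope calculation illustrates the underutilization argument. Assuming, purely for illustration, a 256-PE accelerator and two models that individually keep only part of the array busy, running them one after another on the whole array leaves many PEs idle, while colocating them on disjoint partitions raises utilization. The numbers below are hypothetical and not taken from the paper.

```python
# Illustrative utilization comparison (hypothetical numbers):
# a monolithic 256-PE accelerator running one inference at a time
# versus spatially colocating the same two models on disjoint partitions.

PES = 256
# (model, PEs it can keep busy, runtime in ms when run alone) - assumed values
models = [("small-cnn", 64, 2.0), ("mid-cnn", 128, 3.0)]

# Temporal sharing: run the models back to back on the full array.
busy = sum(used * t for _, used, t in models)      # PE-ms of useful work
total = PES * sum(t for _, _, t in models)         # PE-ms provisioned
print(f"temporal sharing utilization: {busy / total:.0%}")    # ~40%

# Spatial colocation: give each model its own partition and run them together
# (64 + 128 PEs fit within the 256-PE array in this example).
runtime = max(t for _, _, t in models)
print(f"spatial colocation utilization: {busy / (PES * runtime):.0%}")  # ~67%
```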

References

[1] N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, et al., "In-Datacenter Performance Analysis of a Tensor Processing Unit", Proceedings of the ACM/IEEE International Symposium on Computer Architecture (ISCA), pp. 1-12, June 2017.
[2] J. Fowers, K. Ovtcharov, M. Papamichael, T. Massengill, M. Liu, D. Lo, et al., "A Configurable Cloud-Scale DNN Processor for Real-Time AI", Proceedings of the ACM/IEEE International Symposium on Computer Architecture (ISCA), pp. 1-14, June 2018.
[3] U. Gupta, S. Hsia, V. Saraph, X. Wang, B. Reagen, G.-Y. Wei, et al., "DeepRecSys: A System for Optimizing End-to-End At-Scale Neural Recommendation Inference", Proceedings of the ACM/IEEE International Symposium on Computer Architecture (ISCA), pp. 982-995, May 2020.
[4] F. Romero, Q. Li, N. J. Yadwadkar and C. Kozyrakis, "INFaaS: A Model-Less and Managed Inference Serving System", arXiv preprint, pp. 1-16, May 2019.
[5] Y. Choi and M. Rhu, "PREMA: A Predictive Multi-Task Scheduling Algorithm for Preemptible Neural Processing Units", Proceedings of the IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 220-233, February 2020.
[6] E. Baek, D. Kwon and J. Kim, "A Multi-Neural Network Acceleration Architecture", Proceedings of the ACM/IEEE International Symposium on Computer Architecture (ISCA), pp. 940-953, May 2020.
[7] S. Ghodrati, B. H. Ahn, J. K. Kim, S. Kinzer, B. R. Yatham, N. Alla, et al., "Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks", Proceedings of the IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 681-697, October 2020.
[8] J. Lee, J. Choi, J. Kim, J. Lee and Y. Kim, "Dataflow Mirroring: Architectural Support for Highly Efficient Fine-Grained Spatial Multitasking on Systolic-Array NPUs", Proceedings of the ACM/IEEE Design Automation Conference (DAC), pp. 247-252, December 2021.
[9] Y. S. Shao, J. Clemons, R. Venkatesan, B. Zimmer, M. Fojtik, N. Jiang, et al., "Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-based Architecture", Proceedings of the IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 14-27, October 2019.
[10] Y.-H. Chen, J. Emer and V. Sze, "Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks", Proceedings of the ACM/IEEE International Symposium on Computer Architecture (ISCA), pp. 367-379, June 2016.
[11] Z. Du, R. Fasthuber, T. Chen, P. Ienne, L. Li, T. Luo, et al., "ShiDianNao: Shifting Vision Processing Closer to the Sensor", Proceedings of the ACM/IEEE International Symposium on Computer Architecture (ISCA), pp. 92-104, June 2015.
[12] S. Chakradhar, M. Sankaradas, V. Jakkula and S. Cadambi, "A Dynamically Configurable Coprocessor for Convolutional Neural Networks", Proceedings of the ACM/IEEE International Symposium on Computer Architecture (ISCA), pp. 247-257, June 2010.
[13] L. Cavigelli, D. Gschwend, C. Mayer, S. Willi, B. Muheim and L. Benini, "Origami: A Convolutional Network Accelerator", Proceedings of the ACM Great Lakes Symposium on VLSI (GLSVLSI), pp. 199-204, May 2015.
[14] T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, et al., "DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning", Proceedings of the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pp. 269-284, February 2014.
[15] J. Albericio, P. Judd, T. Hetherington, T. Aamodt, N. E. Jerger and A. Moshovos, "Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing", Proceedings of the ACM/IEEE International Symposium on Computer Architecture (ISCA), pp. 1-13, June 2016.
[16] A. Parashar, M. Rhu, A. Mukkara, A. Puglielli, R. Venkatesan, B. Khailany, et al., "SCNN: An Accelerator for Compressed-Sparse Convolutional Neural Networks", Proceedings of the ACM/IEEE International Symposium on Computer Architecture (ISCA), pp. 27-40, June 2017.
[17] Y. Li, A. Louri and A. Karanth, "SPACX: Silicon Photonics-based Scalable Chiplet Accelerator for DNN Inference", Proceedings of the IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 831-845, April 2022.
[18] Y. Li, K. Wang, H. Zheng, A. Louri and A. Karanth, "ASCEND: A Scalable and Energy-Efficient Deep Neural Network Accelerator with Photonic Interconnects", IEEE Transactions on Circuits and Systems I (TCAS-I), vol. 69, no. 7, pp. 2730-2741, July 2022.
[19] Y. Li, A. Louri and A. Karanth, "Scaling Deep-Learning Inference with Chiplet-based Architecture and Photonic Interconnects", Proceedings of the ACM/IEEE Design Automation Conference (DAC), pp. 931-936, December 2021.
[20] D. A. B. Miller, "Device Requirements for Optical Interconnects to Silicon Chips", Proceedings of the IEEE, vol. 97, no. 7, pp. 1166-1185, June 2009.
[21] R. Soref, "The Past, Present, and Future of Silicon Photonics", IEEE Journal of Selected Topics in Quantum Electronics, vol. 12, no. 6, pp. 1678-1687, November 2006.
[22] K. Bergman, L. P. Carloni, A. Biberman, J. Chan and G. Hendry, Photonic Network-on-Chip Design, Springer, 2014.
[23] Y. Demir, Y. Pan, S. Song, N. Hardavellas, J. Kim and G. Memik, "Galaxy: A High-Performance Energy-Efficient Multi-Chip Architecture Using Photonic Interconnects", Proceedings of the ACM International Conference on Supercomputing (ICS), pp. 303-312, June 2014.
[24] N. Kirman, M. Kirman, R. K. Dokania, J. F. Martinez, A. B. Apsel, M. A. Watkins, et al., "Leveraging Optical Technology in Future Bus-based Chip Multiprocessors", Proceedings of the IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 492-503, December 2006.
[25] P. Grani, R. Proietti, V. Akella and S. J. B. Yoo, "Design and Evaluation of AWGR-based Photonic NoC Architectures for 2.5D Integrated High Performance Computing Systems", Proceedings of the IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 289-300, February 2017.
[26] D. Vantrease, R. Schreiber, M. Monchiero, M. McLaren, N. P. Jouppi, M. Fiorentino, et al., "Corona: System Implications of Emerging Nanophotonic Technology", Proceedings of the ACM/IEEE International Symposium on Computer Architecture (ISCA), pp. 153-164, June 2008.
[27] A. K. Ziabari, J. L. Abellan, R. Ubal, C. Chen, A. Joshi and D. Kaeli, "Leveraging Silicon-Photonic NoC for Designing Scalable GPUs", Proceedings of the ACM International Conference on Supercomputing (ICS), pp. 273-282, June 2015.
[28] Y. Pan, P. Kumar, J. Kim, G. Memik, Y. Zhang and A. Choudhary, "Firefly: Illuminating Future Network-on-Chip with Nanophotonics", Proceedings of the ACM/IEEE International Symposium on Computer Architecture (ISCA), pp. 429-440, June 2009.
[29] Y. Thonnart, S. Bernabe, J. Charbonnier, C. Bernard, D. Coriat, C. Fuguet, et al., "POPSTAR: A Robust Modular Optical NoC Architecture for Chiplet-based 3D Integrated Systems", Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1456-1461, March 2020.
[30] Y. Li, A. Louri and A. Karanth, "SPRINT: A High-Performance Energy-Efficient and Scalable Chiplet-based Accelerator with Photonic Interconnects for CNN Inference", IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 33, no. 10, pp. 2332-2345, October 2022.
