I. Introduction
Deep neural networks (DNNs) have shown success in many fields. However, training these models can be extremely memory-consuming. For example, recent language and translation models have hundreds of billions of parameters [1], requiring hundreds of gigabytes of memory for training. Although it has been repeatedly demonstrated that larger models and more data lead to improved model accuracy on many tasks [2]–[4], memory becomes a major bottleneck when training models with more weight parameters or with larger batch sizes. Insufficient memory causes DNN training to crash with out-of-memory errors and limits the model and batch sizes that can be used, degrading training effectiveness and efficiency [5], [6]. Adding more DRAM can mitigate the problem but often comes at a high cost. In this work, we address the memory scaling issue for DNN training by leveraging heterogeneous memory (HM) to achieve larger memory capacity.