I. Introduction
The Transformer model [1] has revolutionized deep learning, particularly in Natural Language Processing [2] and Computer Vision [3], by addressing the limitations of Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). It alleviates the weak parallelism and long-range dependency problems of RNNs and is not constrained by the limited receptive field of CNNs.

Despite its popularity and the emergence of numerous variants, the model still faces challenges in computational efficiency and memory management, stemming from its heavy computation and from variable-length inputs. The computation required by Transformer models is significantly higher than that of traditional models, which can slow research progress where computing resources are limited. In addition, variable-length inputs are typically converted into multiple fixed-length inputs, which introduces redundant computation; this degrades performance and complicates memory management.
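To make the overhead of this conversion concrete, the following minimal Python sketch illustrates one common case: padding every sequence in a batch to the length of the longest one. The sketch is ours, not taken from the paper or any framework; the function name, token IDs, and batch are purely illustrative. Unless the padded positions are masked out, they are still carried through the attention and feed-forward layers, which is the redundant computation referred to above.

```python
# Illustrative sketch (not from the paper): pad variable-length token
# sequences to a single fixed length, as is commonly done when batching
# Transformer inputs, and measure how many positions are padding.

def pad_batch(sequences, pad_id=0):
    """Pad each sequence to the length of the longest one in the batch."""
    max_len = max(len(seq) for seq in sequences)
    padded = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    return padded, max_len

if __name__ == "__main__":
    # Hypothetical batch of tokenized inputs with very different lengths.
    batch = [[11, 42, 7], [5, 9, 23, 88, 14, 2, 61, 30], [17, 3]]
    padded, max_len = pad_batch(batch)

    real = sum(len(seq) for seq in batch)
    total = len(batch) * max_len
    print(f"padded length: {max_len}")
    print(f"wasted positions: {total - real} of {total} "
          f"({100 * (total - real) / total:.0f}%)")
```

For this toy batch, roughly half of the computed positions are padding, which also inflates the memory footprint of the batch by the same proportion.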