
Optimizing Transformer Training Based on Computation and Accessing Memory Features with Deep Learning Processor


Abstract:

The Transformer model, which has significantly advanced natural language processing and computer vision, overcomes the limitations of recurrent neural networks and convolutional neural networks. However, it faces challenges with computational efficiency and memory management due to complex computations and variable-length inputs. Despite research efforts, these issues persist. This paper presents a novel optimization of the Transformer model, following an in-depth analysis of its computational graph structure. Firstly, we utilize the Deep Computing Unit (DCU) as our hardware platform. Secondly, we optimize element-wise and reduction operators through operator fusion and rewriting. Thirdly, we develop a fine-grained memory management algorithm using a greedy strategy. As a result, the training speed of the Transformer model increases by 1.2x to 1.4x without compromising accuracy.
Date of Conference: 17-21 December 2023
Date Added to IEEE Xplore: 26 March 2024
Conference Location: Ocean Flower Island, China


I. Introduction

The Transformer model [1] has revolutionized deep learning, particularly in Natural Language Processing [2] and Computer Vision [3], by addressing the limitations of Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN). It alleviates the weak parallelism and long-term dependency problems of RNN and avoids the limited receptive field of CNN. Despite its popularity and the emergence of many variants, the model still faces challenges with computational efficiency and memory management due to complex computations and variable-length inputs. The computation required for Transformer models is significantly higher than for traditional models, which can slow research progress when computing resources are limited. In addition, the Transformer model handles variable-length inputs by converting them into fixed-length inputs, which introduces additional computation; this both hurts performance and complicates memory management.
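To make the padding overhead concrete, the following is a minimal illustrative sketch (not code from the paper): variable-length sequences are padded to a common fixed length before batching, and the fraction of padded positions is computation spent on values that carry no information. The function names `pad_batch` and `padding_waste` are hypothetical, chosen for this example only.

```python
# Sketch: pad variable-length sequences to one fixed length, as is
# common when batching Transformer inputs, and measure the fraction
# of positions that are padding (i.e. wasted computation).

def pad_batch(seqs, pad_id=0):
    """Pad every sequence to the length of the longest one."""
    max_len = max(len(s) for s in seqs)
    padded = [s + [pad_id] * (max_len - len(s)) for s in seqs]
    return padded, max_len

def padding_waste(seqs):
    """Fraction of batch positions occupied by padding tokens."""
    _, max_len = pad_batch(seqs)
    total = max_len * len(seqs)          # positions actually computed
    real = sum(len(s) for s in seqs)     # positions carrying real tokens
    return (total - real) / total

batch = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
padded, max_len = pad_batch(batch)
print(max_len)               # 4
print(padding_waste(batch))  # 0.25 -> a quarter of the work is padding
```

Since self-attention cost grows quadratically with the padded length, the effective waste in a real Transformer layer is typically even larger than this per-token fraction, which is one motivation for the fine-grained memory management the paper proposes.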
