
Entropy-Based Gradient Compression for Distributed Deep Learning


Abstract:

With the growing volume of data and the increasing scale of network models, distributed deep learning, i.e., training a deep neural network using multiple workers distributed across different computing nodes, is becoming increasingly popular. One of the major challenges of distributed deep learning is the frequent communication of gradients among workers, which can cause severe latency and bandwidth bottlenecks. In this paper, we propose a novel approach named Entropy-based Gradient Compression (EGC) to reduce communication overhead. The major components of EGC are two algorithms: an entropy-based threshold selection algorithm and an automatic learning rate correction algorithm. To improve accuracy, EGC also incorporates two commonly used techniques: gradient residual accumulation and momentum correction. To evaluate the performance of EGC, we conduct experiments on image classification and language modeling tasks using the public CIFAR-10, Tiny ImageNet, and Penn Treebank datasets. The experimental results show that, compared with existing works, EGC achieves a gradient compression ratio of about 1000× while maintaining similar or even higher accuracy.
Date of Conference: 10-12 August 2019
Date Added to IEEE Xplore: 03 October 2019
Conference Location: Zhangjiajie, China

I. Introduction

In the past few decades, deep learning has become an essential machine learning technique and has achieved remarkable success in application fields such as machine translation [1], image processing [2], and speech recognition [3]. As datasets and neural network models grow in scale and complexity, training deep neural networks becomes increasingly difficult. For example, a single network model can have as many as a billion parameters [4]. Training such complex neural networks is costly in both time and computing resources. Therefore, reducing training time while maintaining satisfactory accuracy has become a hot topic in deep learning research.
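The excerpt does not reproduce the paper's full algorithm, but the components named in the abstract can be illustrated. Below is a minimal sketch, assuming the entropy-based threshold is derived from the Shannon entropy of a histogram of gradient magnitudes and that values withheld from transmission are accumulated as a local gradient residual. The function names, the bin count, and the mapping from entropy to the kept fraction are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def entropy_based_threshold(grad, num_bins=256):
    # Hypothetical criterion: use the Shannon entropy of the
    # gradient-magnitude histogram to decide what fraction of
    # entries to transmit (the paper's exact rule is not shown here).
    mags = np.abs(grad).ravel()
    hist, _ = np.histogram(mags, bins=num_bins)
    probs = hist / max(hist.sum(), 1)
    probs = probs[probs > 0]
    entropy = -np.sum(probs * np.log2(probs))  # Shannon entropy, in bits
    max_entropy = np.log2(num_bins)            # uniform-histogram upper bound
    # Assumed mapping: more spread-out magnitudes -> keep a larger (but
    # still tiny) fraction, consistent with roughly 1000x compression.
    keep_fraction = max(1e-4, 1e-3 * entropy / max_entropy)
    return np.quantile(mags, 1.0 - keep_fraction)

def compress_gradient(grad, residual):
    # Gradient residual: add back what was withheld in earlier steps.
    corrected = grad + residual
    tau = entropy_based_threshold(corrected)
    mask = np.abs(corrected) >= tau
    sparse = np.where(mask, corrected, 0.0)        # transmitted part
    new_residual = np.where(mask, 0.0, corrected)  # kept locally
    return sparse, new_residual

# Usage: per worker, per iteration.
rng = np.random.default_rng(0)
grad = rng.normal(size=10_000).astype(np.float32)
residual = np.zeros_like(grad)
sparse, residual = compress_gradient(grad, residual)
print(f"kept {np.count_nonzero(sparse)} of {grad.size} entries")
```

Momentum correction and the automatic learning rate correction mentioned in the abstract are omitted from this sketch, since the excerpt gives no detail on how they are applied.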

