
Entropy-Based Gradient Compression for Distributed Deep Learning


Abstract:

With the growing volume of data and the increasing scale of network models, distributed deep learning, i.e., training a deep neural network using multiple workers distributed across different computing nodes, is becoming increasingly popular. One of the major challenges of distributed deep learning is the frequent communication of gradients among workers, which can cause severe latency and bandwidth bottlenecks. In this paper, we propose a novel approach named Entropy-based Gradient Compression (EGC) to reduce communication overhead. The major components of EGC are two algorithms: an entropy-based threshold selection algorithm and an automatic learning rate correction algorithm. To improve accuracy, EGC also incorporates two commonly used techniques: gradient residual accumulation and momentum correction. To evaluate the performance of EGC, we conduct experiments on image classification and language modeling tasks using the public CIFAR-10, Tiny ImageNet, and Penn Treebank datasets. The experimental results show that, compared with existing works, EGC achieves a gradient compression ratio of about 1000× while maintaining similar or even higher accuracy.
Date of Conference: 10-12 August 2019
Date Added to IEEE Xplore: 03 October 2019
Conference Location: Zhangjiajie, China

I. Introduction

In the past few decades, deep learning has become an essential machine learning technique and has achieved remarkable success in application fields such as machine translation [1], image processing [2], and speech recognition [3]. As datasets and neural network models grow in scale and complexity, training deep neural networks becomes increasingly difficult. For example, a single network model can have as many as a billion parameters [4]. Training such complex neural networks is costly in both time and computing resources. Therefore, reducing training time while maintaining satisfactory accuracy has become a hot topic in deep learning research.
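The excerpt does not reproduce the paper's full algorithm, but the components named in the abstract can be illustrated. Below is a minimal sketch, assuming the entropy-based threshold is derived from the Shannon entropy of a histogram of gradient magnitudes and that values withheld from transmission are accumulated as a local gradient residual. The function names, the bin count, and the mapping from entropy to the kept fraction are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def entropy_based_threshold(grad, num_bins=256):
    # Hypothetical criterion: use the Shannon entropy of the
    # gradient-magnitude histogram to decide what fraction of
    # entries to transmit (the paper's exact rule is not shown here).
    mags = np.abs(grad).ravel()
    hist, _ = np.histogram(mags, bins=num_bins)
    probs = hist / max(hist.sum(), 1)
    probs = probs[probs > 0]
    entropy = -np.sum(probs * np.log2(probs))  # Shannon entropy, in bits
    max_entropy = np.log2(num_bins)            # uniform-histogram upper bound
    # Assumed mapping: more spread-out magnitudes -> keep a larger (but
    # still tiny) fraction, consistent with roughly 1000x compression.
    keep_fraction = max(1e-4, 1e-3 * entropy / max_entropy)
    return np.quantile(mags, 1.0 - keep_fraction)

def compress_gradient(grad, residual):
    # Gradient residual: add back what was withheld in earlier steps.
    corrected = grad + residual
    tau = entropy_based_threshold(corrected)
    mask = np.abs(corrected) >= tau
    sparse = np.where(mask, corrected, 0.0)        # transmitted part
    new_residual = np.where(mask, 0.0, corrected)  # kept locally
    return sparse, new_residual

# Usage: per worker, per iteration.
rng = np.random.default_rng(0)
grad = rng.normal(size=10_000).astype(np.float32)
residual = np.zeros_like(grad)
sparse, residual = compress_gradient(grad, residual)
print(f"kept {np.count_nonzero(sparse)} of {grad.size} entries")
```

Momentum correction and the automatic learning rate correction mentioned in the abstract are omitted from this sketch, since the excerpt gives no detail on how they are applied.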

