I. Introduction
Over the past decade, Deep Learning (DL) has achieved tremendous success in many areas, including image classification, natural language processing, and self-driving cars. Deep Neural Networks (DNNs) are the key technology behind this success: they automatically extract features from multi-modal datasets and build models that capture the complex, non-linear relationships among these features. Training DNNs is a compute-intensive workload that is typically run on parallel systems equipped with Graphics Processing Units (GPUs). DL frameworks such as TensorFlow [1] and PyTorch [2] support efficient DNN training on such systems.