I. Introduction
The parameter server framework [1], [2] has been developed to support distributed training of large-scale machine learning (ML) models (such as deep neural networks [3]–[5]) over very large data sets, such as Microsoft COCO [6], ImageNet 1K [3], and ImageNet 22K [7]. Training a deep model on a large-scale cluster with an efficient distributed paradigm reduces training time from weeks on a single server to days or hours.
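To make the pattern concrete, the sketch below simulates the synchronous data-parallel training that a parameter server enables: workers pull the current model, compute gradients on their local data shards, and push them back for aggregation. This is a minimal illustration, not the system of [1], [2]; the names (ParameterServer, pull, push) and the least-squares example are assumptions for exposition.

```python
import numpy as np

class ParameterServer:
    """Holds the global model and applies averaged worker gradients (illustrative)."""
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr

    def pull(self):
        # Workers fetch a copy of the current global weights.
        return self.w.copy()

    def push(self, grads):
        # Aggregate worker gradients and take one SGD step.
        self.w -= self.lr * np.mean(grads, axis=0)

def worker_grad(w, X, y):
    # Least-squares gradient on this worker's shard: (2/n) * X^T (Xw - y).
    return 2.0 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.arange(1.0, 6.0)                   # ground-truth weights 1..5
shards = np.array_split(np.arange(1000), 4)   # 4 simulated workers

server = ParameterServer(dim=5)
for step in range(200):                       # synchronous training rounds
    w = server.pull()
    grads = [worker_grad(w, X[s], y[s]) for s in shards]
    server.push(grads)

print(server.w)                               # converges toward [1 2 3 4 5]
```

In a real deployment the server state is sharded across machines and pull/push are network operations, which is what lets clusters train models far larger than a single server's memory or compute budget.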