EcoFed: Efficient Communication for DNN Partitioning-Based Federated Learning


Abstract:

Efficiently running federated learning (FL) on resource-constrained devices is challenging because each device must independently train a computationally intensive deep neural network (DNN). DNN partitioning-based FL (DPFL) has been proposed as one mechanism to accelerate training, whereby layers of a DNN (or computation) are offloaded from the device to the server. However, this creates significant communication overhead, since intermediate activations and gradients must be transferred between the device and the server during training. While current research reduces the communication introduced by DNN partitioning using local loss-based methods, we demonstrate that these methods are ineffective in improving the overall efficiency (communication overhead and training speed) of a DPFL system: they suffer from accuracy degradation and ignore the communication cost incurred when transferring activations from the device to the server. This article proposes EcoFed, a communication-efficient framework for DPFL systems. EcoFed eliminates the transmission of gradients by, for the first time, using pre-trained initialization of the device-side DNN. This reduces the accuracy degradation seen in local loss-based methods. In addition, EcoFed proposes a novel replay buffer mechanism and implements a quantization-based compression technique to reduce the transmission of activations. It is experimentally demonstrated that EcoFed reduces communication costs by up to 133× and accelerates training by up to 21× compared to classic FL. Compared to vanilla DPFL, EcoFed achieves a 16× communication reduction and a 2.86× training speed-up.
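
To make the partitioned training pattern concrete, the following is a minimal PyTorch sketch of one client step combining the three ideas the abstract describes: a frozen, pre-trained device-side network (so no gradient is returned to the device), quantized activations, and a server-side replay buffer that avoids re-transmitting activations in later epochs. The split point, model shapes, and the 8-bit linear quantizer are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

# Hypothetical split: a frozen, pre-trained device-side feature
# extractor and a trainable server-side head.
device_side = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                            nn.AdaptiveAvgPool2d(4), nn.Flatten())
for p in device_side.parameters():
    p.requires_grad = False          # frozen => no gradient sent back to the device

server_side = nn.Linear(16 * 4 * 4, 10)
opt = torch.optim.SGD(server_side.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

def quantize(a):
    """8-bit linear quantization of an activation tensor (lossy compression)."""
    lo, hi = a.min(), a.max()
    q = ((a - lo) / (hi - lo + 1e-8) * 255).round().to(torch.uint8)
    return q, lo, hi

def dequantize(q, lo, hi):
    return q.float() / 255 * (hi - lo) + lo

replay_buffer = {}                   # server-side cache of received activations

def train_step(batch_id, x, y):
    if batch_id not in replay_buffer:             # first epoch: uplink transfer
        with torch.no_grad():
            act = device_side(x)                  # device-side forward pass only
        replay_buffer[batch_id] = quantize(act)   # ~4x smaller than float32
    act = dequantize(*replay_buffer[batch_id])    # later epochs: replay, no transfer
    loss = loss_fn(server_side(act), y)
    opt.zero_grad()
    loss.backward()                               # backprop stops at the split point
    opt.step()
    return loss.item()

In this sketch, only the first epoch for each batch incurs an uplink transfer; afterwards the server trains from its cached, dequantized activations, which mirrors the communication savings claimed above.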
Published in: IEEE Transactions on Parallel and Distributed Systems (Volume: 35, Issue: 3, March 2024)
Page(s): 377 - 390
Date of Publication: 04 January 2024

I. Introduction

Federated learning (FL) is a privacy-preserving machine learning paradigm that facilitates collaborative training without transferring raw data from participating devices to a server [1], [2], [3]. However, running FL training on resource-constrained devices is challenging, since the computationally expensive training of deep neural networks (DNNs) runs solely on the devices. This is a known bottleneck [4], [5], [6].
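
For contrast with DPFL, the following is a minimal sketch of a classic FL round in the FedAvg style, in which each device performs the full forward and backward pass locally and only model weights, never raw data, are exchanged. The model, data loader, and hyperparameters are placeholders, not taken from the paper.

import copy
import torch

def local_update(global_model, loader, epochs=1, lr=0.01):
    """One client's local training: the *entire* DNN trains on the device."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()   # full forward + backward on device
            opt.step()
    return model.state_dict()

def fedavg(states):
    """Server aggregation: average the client weights element-wise."""
    avg = copy.deepcopy(states[0])
    for k in avg:
        avg[k] = torch.stack([s[k].float() for s in states]).mean(0)
    return avg

The cost of the full on-device forward and backward pass in local_update is exactly the compute bottleneck that motivates offloading layers to the server in DPFL.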
