
Towards Efficient Scheduling of Federated Mobile Devices Under Computational and Statistical Heterogeneity



Abstract:

Originating from distributed learning, federated learning enables privacy-preserving collaboration at a new level of abstraction by sharing only the model parameters. While current research focuses mainly on optimizing learning algorithms and minimizing the communication overhead inherited from distributed learning, a considerable gap remains when it comes to real implementation on mobile devices. In this article, we start with an empirical experiment to demonstrate that computational heterogeneity is a more pronounced bottleneck than communication on the current generation of battery-powered mobile devices, and that existing methods are haunted by mobile stragglers. Further, non-identically distributed data across mobile users makes the selection of participants critical to accuracy and convergence. To tackle the computational and statistical heterogeneity, we utilize data as a tuning knob and propose two efficient polynomial-time algorithms to schedule different workloads on various mobile devices, when data is identically or non-identically distributed. For identically distributed data, we combine partitioning and linear bottleneck assignment to achieve near-optimal training time without accuracy loss. For non-identically distributed data, we convert it into an average cost minimization problem and propose a greedy algorithm to find a reasonable balance between computation time and accuracy. We also establish an offline profiler to quantify the runtime behavior of different devices, which serves as the input to the scheduling algorithms. We conduct extensive experiments on a mobile testbed with two datasets and up to 20 devices. Compared with common benchmarks, the proposed algorithms achieve a 2-100x epoch-wise speedup, a 2-7 percent accuracy gain, and boost the convergence rate by more than 100 percent on CIFAR10.
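The scheduling algorithms themselves are developed later in the paper; as a rough, hypothetical illustration of the identically distributed case, the Python sketch below splits a dataset so that every device finishes its local epoch at roughly the same time, using per-sample training times of the kind an offline profiler would supply. It is a simplified min-max heuristic, not the paper's partitioning plus linear bottleneck assignment formulation; the function name and the example timings are assumptions.

# Illustrative sketch (not the paper's exact algorithm): given per-sample
# training times measured by an offline profiler, split an IID dataset so
# that each device's local epoch finishes at roughly the same time,
# i.e., a min-max (bottleneck) objective over device completion times.

def balance_workload(num_samples, per_sample_time):
    """Return a list of sample counts, one per device.

    num_samples     -- total number of training samples to distribute
    per_sample_time -- profiled seconds per sample for each device
    """
    # A device's share is inversely proportional to its per-sample time,
    # so the slowest device is not handed the largest shard.
    speeds = [1.0 / t for t in per_sample_time]
    total_speed = sum(speeds)
    shares = [int(num_samples * s / total_speed) for s in speeds]

    # Hand any rounding leftovers to the fastest devices first.
    leftover = num_samples - sum(shares)
    order = sorted(range(len(speeds)), key=lambda i: -speeds[i])
    for i in range(leftover):
        shares[order[i % len(order)]] += 1
    return shares

if __name__ == "__main__":
    # Hypothetical profiler output: seconds per sample on four phones.
    profile = [0.8, 1.2, 2.5, 5.0]
    shares = balance_workload(50_000, profile)
    finish = [n * t for n, t in zip(shares, profile)]
    print(shares)                          # samples assigned per device
    print([round(f, 1) for f in finish])   # estimated epoch time per device

Under this proportional split, the estimated per-device epoch times are nearly equal, which is the effect the paper's scheduler pursues; the actual algorithm additionally accounts for discrete partitions and assignment constraints.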
Published in: IEEE Transactions on Parallel and Distributed Systems (Volume 32, Issue 2, February 2021)
Page(s): 394-410
Date of Publication: 14 September 2020

1 Introduction

The tremendous success of machine learning has stimulated a new wave of smart applications. Despite their great convenience, these applications consume massive amounts of personal data, at the expense of our privacy. Growing privacy concerns have become one of the major impetuses to shift computation from the centralized cloud to users' end devices such as mobile, edge, and IoT devices. Current solutions support running on-device inference from a pre-trained model in near real time [23], [24], whereas their capability to adapt to new data and learn from each other is still limited.

References

[1] J. Dean et al., "Large scale distributed deep networks", Proc. 25th Int. Conf. Neural Inf. Process. Syst., pp. 1223-1231, 2012.
[2] B. McMahan, E. Moore, D. Ramage, S. Hampson and B. Arcas, "Communication-efficient learning of deep networks from decentralized data", Proc. 20th Int. Conf. Artif. Intell. Statist., pp. 1273-1282, 2017.
[3] X. Li, K. Huang, W. Yang, S. Wang and Z. Zhang, "On the convergence of FedAvg on non-IID data", Proc. Int. Conf. Learn. Representations, 2020.
[4] Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin and V. Chandra, "Federated learning with non-IID data".
[5] E. Jeong, S. Oh, H. Kim, S. Kim, J. Park and M. Bennis, "Communication-efficient on-device machine learning: Federated distillation and augmentation under non-IID private data", Proc. 32nd Conf. Neural Inf. Process. Syst. 2nd Workshop Mach. Learn. Phone Consum. Devices, 2018.
[6] K. Bonawitz et al., "Towards federated learning at scale: System design", Proc. 2nd SysML Conf., 2019.
[7] Y. Chen, S. Biookaghazadeh and M. Zhao, "Exploring the capabilities of mobile devices in supporting deep learning", Proc. 4th ACM/IEEE Symp. Edge Comput., pp. 127-138, 2019.
[8] "Federated learning: Strategies for improving communication efficiency", Proc. Int. Conf. Neural Inf. Process. Syst.
[9] L. Wang, W. Wang and B. Li, "CMFL: Mitigating communication overhead for federated learning", Proc. IEEE 39th Int. Conf. Distrib. Comput. Syst., pp. 954-964, 2019.
[10] X. Lian et al., "Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent", Proc. 31st Int. Conf. Neural Inf. Process. Syst., pp. 5336-5346, 2017.
[11] H. Zhu and Y. Jin, "Multi-objective evolutionary federated learning", IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 4, pp. 1310-1322, 2020.
[12] V. Smith, C. Chiang, M. Sanjabi and A. Talwalkar, "Federated multi-task learning", Proc. 31st Int. Conf. Neural Inf. Process. Syst., pp. 4427-4437, 2017.
[13] L. Liu, J. Zhang, S. H. Song and K. Letaief, "Client-edge-cloud hierarchical federated learning", Proc. IEEE Int. Conf. Commun., pp. 1-6, 2020.
[14] S. Zheng et al., "Asynchronous stochastic gradient descent with delay compensation", Proc. 34th Int. Conf. Mach. Learn., pp. 4120-4129, 2017.
[15] Q. Ho et al., "More effective distributed ML via a stale synchronous parallel parameter server", Proc. 26th Int. Conf. Neural Inf. Process. Syst., pp. 1223-1231, 2013.
[16] J. Vicarte, B. Schriber, R. Paccagnella and C. Fletcher, "Game of threads: Enabling asynchronous poisoning attacks", Proc. 25th Int. Conf. Architectural Support Program. Lang. Operating Syst., pp. 35-52, 2020.
[17] R. Tandon, Q. Lei, A. Dimakis and N. Karampatziakis, "Gradient coding: Avoiding stragglers in distributed learning", Proc. Mach. Learn. Res., vol. 70, pp. 3368-3376, 2017.
[18] B. McMahan and D. Ramage, "Federated learning: Collaborative machine learning without centralized training data".
[19] K. Bonawitz et al., "Practical secure aggregation for privacy-preserving machine learning", Proc. ACM SIGSAC Conf. Comput. Commun. Secur., pp. 1175-1191, 2017.
[20] P. Blanchard, E. Mhamdi, R. Guerraoui and J. Stainer, "Machine learning with adversaries: Byzantine tolerant gradient descent", Proc. 31st Int. Conf. Neural Inf. Process. Syst., pp. 118-128, 2017.
[21] [Online]. Available: https://www.macworld.com/article/3442716/inside-apples-a13-bionic-system-on-chip.html
[22] [Online]. Available: https://consumer.huawei.com/en/campaign/kirin980
[23] X. Zeng, K. Cao and M. Zhang, "MobileDeepPill: A small-footprint mobile deep learning system for recognizing unconstrained pill images", Proc. 15th Annu. Int. Conf. Mobile Syst. Appl. Serv., pp. 56-67, 2017.
[24] A. Mathur, N. Lane, D. Bhattacharya, S. Boran, A. Forlivesi and C. Kawsar, "DeepEye: Resource efficient local execution of multiple deep vision models using wearable commodity hardware", Proc. 15th Annu. Int. Conf. Mobile Syst. Appl. Serv., pp. 68-81, 2017.
[25] S. Han, H. Mao and W. J. Dally, "Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding", Proc. Int. Conf. Learn. Representations, 2016.
[26] J. Frankle and M. Carbin, "The lottery ticket hypothesis: Finding sparse, trainable neural networks", Proc. Int. Conf. Learn. Representations, 2019.
[27] C. Wang, Y. Xiao, X. Gao, L. Li and J. Wang, "Close the gap between deep learning and mobile intelligence by incorporating training in the loop", Proc. 27th ACM Int. Conf. Multimedia, pp. 1419-1427, 2019.
[28] [Online]. Available: https://www.arm.com/why-arm/technologies/big-little
[29] [Online]. Available: https://developer.android.com/guide/topics/manifest/application-element
[30] D. Haussler, "Quantifying inductive bias: AI learning algorithms and Valiant's learning framework", Artif. Intell., vol. 36, no. 2, pp. 177-221, 1988.
