I. Introduction
Sparse empirical risk minimization (ERM) [1]–[4] is an important machine learning paradigm that learns models from data collected from individuals. Despite its usefulness and the large body of research devoted to improving its efficiency [5]–[7], its reliance on sensitive individual data raises growing privacy concerns, especially in healthcare, financial, and biomedical applications. To avoid breaching the privacy of individuals, privacy-preserving techniques have been developed to ensure that an adversary cannot infer any individual's data from the output of the learning process. Beginning with the seminal work [8], which proposed carrying out private ERM training under the formal statistical notion of differential privacy (DP) [9], various differentially private optimization algorithms have been developed to suit different computing contexts. In particular, existing works focus on training models on centralized datasets [10]–[14] and samplewise distributed datasets [15]–[18]. For example, when samples are distributed across user sites, Lou et al. [16] and Han et al. [18] proposed privacy-preserving strategies for the stochastic (sub)gradient descent (SGD) method to prevent leakage of sensitive user information during distributed optimization. Agarwal et al. [15] and Jin et al. [17] further reduced the uplink communication cost by employing gradient quantization techniques [19]–[24]. Although these studies provide strong privacy guarantees with optimal utility and efficiency in the centralized and samplewise distributed settings, they leave featurewise distributed private training barely studied.
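For concreteness, a common instance of sparse ERM (the exact loss and regularizer studied in this paper may differ) is the $\ell_1$-regularized problem

\[
\min_{w \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(w; x_i, y_i\big) + \lambda \|w\|_1,
\]

where $\{(x_i, y_i)\}_{i=1}^{n}$ are the training samples held by individuals, $\ell$ is a convex loss (e.g., logistic or squared loss), and $\lambda > 0$ controls the sparsity of the learned model $w$.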