Introduction
Deep learning has demonstrated promising outcomes in healthcare applications that support medical diagnosis and treatment decisions based on clinical data. It helps with text detection in medical laboratory reports, brain tumor segmentation and classification in Magnetic Resonance Imaging (MRI) scans, cancer diagnosis and prognosis, and many other tasks [1]. Developing deep learning models requires a large amount of training data for reliable performance. In the healthcare domain, such training data are obtained from various clinical resources such as biological sensors, patient cohorts, hospitals, medical research institutions, pharmaceutical companies, etc. In the clinical setting, accessing healthcare data for training a specific deep learning model is challenging. Normally, the amount of data related to a specific disease/condition in a single institution is limited, while obtaining data from other institutions is complicated by privacy and data protection regulations. In Canada, medical data are subject to the Personal Information Protection and Electronic Documents Act (PIPEDA) as well as provincial privacy laws such as British Columbia’s E-Health (Personal Health Information Access and Protection of Privacy) Act, Alberta’s Health Information Act, and Manitoba’s Personal Health Information Act. Data privacy has been called the “most important issue in the next decade” [2]. It has gained prominence as a result of laws like the California Consumer Privacy Act (CCPA) and the General Data Protection Regulation (GDPR) of the European Union. Thus, considerable effort is required to enable model training on private data, such as patients’ medical records.
In the machine-learning literature, several solutions for data privacy-related issues have been established. Homomorphic encryption [3], differential privacy [4], Private Aggregation of Teacher Ensembles (PATE) [5], and secure multiparty computation [6] are traditional techniques for data privacy in Artificial Intelligence (AI). Federated Learning (FL) [7] is a more recent technique, involving multiple clients/institutions, each with their own data. By retaining the data locally at each client, FL enables model training using data from many institutions while avoiding data sharing and the challenges associated with establishing and maintaining large central databases. Although FL addresses the issues with data sharing, it is not without its own challenges. For example, FL assumes that each institution has sufficient computational resources to train its own version of the full deep model (called a local model), which is not always realistic.
Split Learning (SL) [8] was developed to tackle some of the challenges associated with FL, specifically those involving computational resources. SL allows medical institutions to implement only a portion of the model on the hardware resources available to them, while the majority of the computation is carried out at a remote server. At the same time, original data can be kept at each institution to preserve privacy. Combining FL algorithms with the SL framework allows multiple institutions/clients with limited resources to participate in training a large model. Our objective in this paper is to survey the FL, SL, and emerging hybrid Split-Federated Learning (SFL) approaches in healthcare. Such learning faces numerous challenges arising from data properties (e.g., imbalanced or non-identically distributed data at different clients), privacy (embodied in various attacks), hardware heterogeneity, etc. We review these issues throughout this survey article.
There are several related surveys on FL and SL in the literature, but relatively few focus on healthcare, particularly medical image analysis. Moreover, SFL, as a more recent concept, has not been reviewed in detail. Yang et al. [9] summarized the early FL literature for general privacy-preserving techniques. Kulkarni et al. [10] highlighted the need to personalize global models to work better for individual clients. Jin et al. [11] provided a brief overview of the most common semi-supervised FL algorithms, detailing potential approaches, contexts, and difficulties. Lim et al. [12] presented applications of FL for mobile edge network optimization. A survey of the research addressing communication constraints in FL has been presented in [13]. A survey of FL threats has been presented in [14], highlighting the intuition, various approaches, and fundamental premises of various attacks. The authors in [15] provided a detailed analysis of recent advances and open problems in FL.
In addition, [16] discussed the current state of FL in healthcare. The reference [17] presented a comprehensive review of FL in healthcare from the perspective of data properties and applications. Chowdhury et al. [18] reviewed FL in the field of oncology along with its future clinical directions. Focusing mainly on the biomedical space, [19] surveyed the associated challenges and potential solutions. Rieke et al. [20] explored how FL may benefit the future of digital health. Recently, [21] provided a detailed review of current and future research trends of FL in medical applications, highlighting FL’s statistical problems, device challenges, security and privacy concerns, etc.
As seen above, the concept of FL has been reviewed extensively in the recent literature, even in the area of healthcare. However, SL, and especially SFL, have yet to be thoroughly reviewed. This is where our survey differs from the existing literature: we survey all three emerging decentralized learning paradigms (FL, SL, and SFL) in the context of healthcare. This allows us to draw parallels and explore the benefits and weaknesses of all three learning methodologies from the point of view of healthcare. We pay particular attention to medical image analysis, one of the main areas of healthcare research, and review the existing publicly available datasets that can be used in medical image analysis research. To the best of our knowledge, this is the first survey to present a unified treatment of existing FL, SL, and SFL approaches with a view toward medical image analysis, along with related datasets.
The structure of the paper is as follows. Section II presents the preliminaries on centralized and decentralized learning needed for the remainder of the paper. Section III reviews the applications in healthcare from existing literature on FL, SL, and SFL, with a special view toward medical image analysis. The general challenges faced by the three emerging decentralized learning paradigms and proposed solutions are discussed in Section IV. Section V discusses challenges specific to healthcare, current trends, and possible future research directions. Finally, Section VI concludes the paper with a summary of the main conclusions.
Preliminaries
This Section describes the survey’s primary topics in depth: centralized machine learning and the decentralized learning approaches of federated learning, split learning, and hybrid split-federated learning.
A. Centralized Machine Learning
Centralized machine learning is carried out at one location, usually a powerful server or the cloud. The basic premise is that all data is available at that location. In a healthcare setting, this means that all clients (clinics, hospitals) need to transmit their labeled data to the server where the global model is being trained. Once the model is trained, inference may also be carried out at the central server, especially if the clients lack the computational infrastructure to run the model. In this case, clients upload their data to the server and receive inference/prediction results back. This framework is illustrated in Fig. 1. Health service providers have used centralized cloud frameworks, such as Microsoft Azure Healthcare, to promote healthcare solutions.
While centralized solutions address the lack of powerful computational infrastructure for the clients, the main issue with their use is the protection of health data. Both training and inference require uploading data to another location, which may be in a different jurisdiction and subject to different (if any) health-related privacy protection regulations. For this reason, centralized machine learning (and inference) in healthcare may not be feasible except in very special cases.
B. Decentralized Machine Learning
Decentralized learning can alleviate issues with data protection by keeping data at the clients. In the most basic setting, each client would train its own local model. However, this type of solution has many drawbacks. First, clients might not have the infrastructure to train large models, so they would be forced to compromise on the ultimate accuracy of their model, compared to a larger centrally-trained model. Second, since local models are trained only on the client’s local data, they will likely not generalize as well as a centrally-trained model with access to more data. Finally, having different models for different clients may cause inequity in health outcomes, as the clients with more data (e.g., those in urban centers with larger populations and better infrastructure) are likely to end up with better, more accurate models.
Although different local models can be pooled together through ensembling techniques [22], this would still require sharing data and/or predictions from different clients. A simple version of ensembling is illustrated in Fig. 2, where predictions made by the different local models are aggregated at the central server, but for this to work, local models would have to operate on the same input. Federated Learning (FL) [7] was developed as a decentralized learning strategy that avoids such data sharing, yet is able to create a model that benefits from all clients’ data. In the remainder of this section, we present the preliminaries of FL, as well as related decentralized learning strategies of Split Learning (SL) and hybrid Split-Federated Learning (SFL).
1) Federated Learning
Federated Learning [7] is a decentralized machine learning approach that enables multiple clients (data centers, organizations, remote devices, etc.) to train a global model without sharing their data. It was first introduced by Google to improve next-word prediction models on Android mobile phones. In FL, a model is trained using data from different clients, and the process is controlled by a central server, as illustrated in Fig. 3. The basic FL procedure has the following steps; Steps 2–5 comprise one federated round or global epoch and are repeated until stopping criteria are met (a minimal code sketch follows the steps):
Step 1:
The server transmits the initial model to the clients.
Step 2:
Clients train their local models with their own data for a certain number of local epochs.
Step 3:
Each client sends the parameters of its local model to the server.
Step 4:
The server aggregates the model parameters from various clients using an averaging scheme and sends the aggregated parameters back to the clients.
Step 5:
Clients update their local models with the aggregated model.
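To make Steps 1–5 concrete, here is a minimal sketch of one federated round with FedAvg-style weighted parameter averaging [7]. It is written in PyTorch; the model, client data loaders, and hyperparameters are illustrative assumptions, not a reference implementation of any particular FL framework.

```python
import copy
import torch

def local_update(global_model, loader, epochs=1, lr=0.01):
    """Step 2: one client trains a local copy of the model on its own data."""
    local = copy.deepcopy(global_model)
    opt = torch.optim.SGD(local.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    local.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(local(x), y).backward()
            opt.step()
    return local.state_dict(), len(loader.dataset)  # Step 3: sent to the server

def federated_round(global_model, client_loaders):
    """Steps 2-5: one federated round with data-size-weighted averaging."""
    states, sizes = [], []
    for loader in client_loaders:
        state, n = local_update(global_model, loader)
        states.append(state)
        sizes.append(n)
    total = sum(sizes)
    # Step 4: the server averages parameters, weighting by local dataset size.
    averaged = {k: sum(s[k].float() * (n / total) for s, n in zip(states, sizes))
                for k in states[0]}
    global_model.load_state_dict(averaged)  # Step 5: clients sync to this model
    return global_model
```

Note that only the parameter dictionaries cross the network in a real deployment; the raw data never leaves the clients.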
According to [15], FL methods can be divided into cross-silo FL, which takes place among data centers or organizations, and cross-device FL, which takes place among IoT/mobile devices with lower computational resources. Another classification considers clients’ features and labels [11], [15]: horizontal FL, vertical FL, and federated transfer learning. In horizontal FL, datasets at different clients share the same feature space but different label spaces. An example would be different units in the same hospital, which use the same input data (patients’ health records) but are interested in different diagnostics and/or outcomes. In vertical FL, clients share the same label space but different feature spaces. An example would be oncology units in different hospitals or health authorities, which are interested in the same outcomes but may differ in the type of patient data they have access to. Finally, in federated transfer learning, neither the feature space nor the label space is shared across clients. An example would be the generic problem of “cancer detection” across different hospitals or health authorities: while available inputs and expected outputs may differ across clients, knowledge of common characteristics of cancer may be transferable and useful for each client’s model.
FL can be applied to any machine learning model, although the focus in the current literature is on Deep Neural Networks (DNNs). Examples include the Multi-Layer Perceptron (MLP) [23], which is common for making predictions using tabular medical data; Convolutional Neural Networks (CNNs), which are common for medical image analysis [24]; and other models such as auto-encoders [25], Generative Adversarial Networks (GANs) [26], Long Short-Term Memory (LSTM) networks [27], Support Vector Machines (SVMs) [28], etc.
Although FL addresses data protection and privacy concerns, it requires all clients to train their local models, which are of the same size as the global model. This is challenging when considering modern DNNs with hundreds of millions of parameters since the clients might not have the required computational infrastructure and associated Information Technology (IT) support. Another distributed learning strategy, Split Learning, was developed to address this computational imbalance between the clients and the server.
2) Split Learning
In Split Learning (SL), a DNN is split into several parts, which can be located on various devices and/or servers. In its most basic form, the front-end of a DNN (usually the initial few layers) is located on a client device, and the more computationally demanding back-end is located on a server. SL is the learning counterpart of split inference, also known as collaborative inference or collaborative intelligence [29]. It was introduced as SplitNN [8], where the authors described split network training approaches with and without label sharing.
Various SL configurations are possible depending on where the data and the labels are located: simple vanilla SL, where the client keeps its data but shares labels with the server (Fig. 4 (a)); U-shaped SL, where the client keeps both its data and labels, without sharing with the server (Fig. 4 (b)); and SL with vertically partitioned data, which involves multiple clients, each keeping its own data but sharing labels with the server (Fig. 4 (c)). Another classification can be made depending on how the DNN is split between the client(s) and the server. In horizontal SL, the split is introduced between layers (as shown in Fig. 4), so that each layer is fully executed on a single device/client. In vertical SL [30], layers themselves can be split across multiple devices/clients.
The basic steps of vanilla SL (Fig. 4 (a)) are outlined below; Steps 2–5 are repeated until the stopping criteria are met (a minimal code sketch follows the steps). More sophisticated versions of SL extend this procedure with additional transmission of data and gradients between the server and the client(s).
Step 1:
A DNN model is split between the client and the server, taking into account computational resources available at the client, communication link between the client and the server, etc.
Step 2:
A batch of data is loaded and passed through the client-side model front-end, and the features (also known as smashed data) are computed and sent to the server.
Step 3:
The server receives the features, processes them through the back-end, and computes the outputs.
Step 4:
The server compares the outputs with the ground-truth labels, computes error gradients, back-propagates them through the model back-end (updating model parameters along the way), and sends the gradients from the split layer back to the client.
Step 5:
The client receives the error gradients and back-propagates them through the front-end, updating the model parameters along the way.
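The following sketch illustrates Steps 2–5 of vanilla SL in PyTorch, with the client-server “transmissions” simulated by detaching tensors in a single process; the network names and optimizers are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def sl_train_step(client_net, server_net, x, y, client_opt, server_opt):
    # Step 2: the client computes the smashed data and "sends" it to the server.
    smashed = client_net(x)
    received = smashed.detach().requires_grad_()

    # Step 3: the server runs the back-end on the received features.
    outputs = server_net(received)

    # Step 4: the server computes the loss (labels are shared in vanilla SL),
    # back-propagates through the back-end, updates it, and returns the
    # gradient at the split layer to the client.
    loss = F.cross_entropy(outputs, y)
    server_opt.zero_grad()
    loss.backward()
    server_opt.step()
    grad_at_split = received.grad  # "sent" back to the client

    # Step 5: the client continues back-propagation through the front-end.
    client_opt.zero_grad()
    smashed.backward(grad_at_split)
    client_opt.step()
    return loss.item()
```

Only the smashed data and the split-layer gradient cross the network; the client's raw data and front-end parameters stay local.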
In the literature, SL has been demonstrated on a number of popular datasets such as the Modified National Institute of Standards and Technology (MNIST) database [31], the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset [32], and the Canadian Institute For Advanced Research (CIFAR) datasets [33], and it was shown [8] that, assuming perfect communication links, its performance is similar to conventional centralized learning. In practice, SL has been used in real-world IoT settings [34], where the overhead, training time, power consumption, and memory usage were studied. Reference [35] described efficient model training in IoT systems. Their approach, termed ARES (Adaptive Resource-Aware Split Learning for the Internet of Things), accounted for time-varying communication link throughput and computing resources. SplitNet [36] is an SL approach that splits a deep network into a tree of sub-networks, which allows simple model parallelization while simultaneously reducing the number of parameters and computations.
SL solves the issue of computational imbalance between the client(s) and the server that exists in FL, and in certain cases has lower communication requirements than FL. For example, [37] found that SL is more communication-efficient than FL when there are many clients or when the model is large. However, unlike FL, basic SL schemes do not offer a way for clients to collaborate among themselves to increase the pool of data and potentially train better models. For this reason, hybrid split-federated learning (SFL) approaches were developed to combine the best of both worlds.
3) Hybrid Split-Federated Learning
As seen above, FL and SL each have their own advantages and disadvantages. Research has been done to combine these two approaches to yield the best of both worlds. We will refer to such combined approaches as hybrid Split-Federated Learning (SFL). The overall goal of SFL is to appropriately use the computational resources available at the clients and the server (like SL), while allowing clients to collaborate in model training and keep their data private (like FL).
The first reported SFL approach is SplitFed [38]. Its modified architectural design eliminates key drawbacks of FL and SL. SFL enhances overall data privacy and model robustness compared to FL and SL [34], [38], [39], [40], [41], and, by combining the strengths of the SL and FL architectures, allows training better-performing models.
The basic SplitFed procedure has the following steps; Steps 2–10 comprise one communication round (also called a global round or global epoch) and are repeated until stopping criteria are met (a minimal code sketch follows the steps):
Step 1:
The Federated Server (FS) chooses a statistical model, which will serve as the client-side global model to be trained.
Step 2:
The FS transmits this initial client-side global model to the participating clients.
Step 3:
Clients train their client-side models locally with their own data via forward propagation (FP).
Step 4:
Clients send their smashed data (the features at the split layer) to the main server.
Step 5:
The main server performs forward propagation on its server-side model for each client, separately and in parallel.
Step 6:
The main server performs backpropagation (BP) on the server-side model.
Step 7:
The main server sends the gradients to the respective clients.
Step 8:
The main server updates its server-side global model.
Step 9:
Each client receives its gradients and performs back-propagation on its client-side local model, updating the model parameters along the way.
Step 10:
The FS aggregates the clients’ updated client-side local models and updates the client-side global model.
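A minimal sketch of one SFLV1-style global round is given below, reusing sl_train_step from the SL sketch above; the equal-weight averaging helper and the one-batch-per-client simplification are illustrative assumptions rather than SplitFed’s exact training loop.

```python
import copy

def average_state_dicts(states):
    # Equal-weight FedAvg of a list of model state_dicts (illustrative).
    return {k: sum(s[k].float() for s in states) / len(states) for k in states[0]}

def splitfed_round(client_nets, server_nets, client_batches, c_opts, s_opts):
    # Steps 3-9: every client trains with its own server-side sub-network;
    # in a real deployment these run in parallel (sequential here for clarity).
    for i, (x, y) in enumerate(client_batches):
        sl_train_step(client_nets[i], server_nets[i], x, y, c_opts[i], s_opts[i])

    # Step 8: the main server synchronizes its server-side sub-networks.
    server_avg = average_state_dicts([s.state_dict() for s in server_nets])
    for s in server_nets:
        s.load_state_dict(copy.deepcopy(server_avg))

    # Step 10: the Fed server averages the client-side models and redistributes.
    client_avg = average_state_dicts([c.state_dict() for c in client_nets])
    for c in client_nets:
        c.load_state_dict(copy.deepcopy(client_avg))
```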
SplitFed appears in two main variants: SplitFedv1 (SFLV1) and SplitFedv2 (SFLV2) [38]. In SFLV1, the number of server-side sub-networks equals the number of clients. SFLV2 performs sequential server-side sub-network training over the smashed data of the clients and keeps only one copy of the sub-network on the server side. The authors in [34] also discussed a generalized version of SFL (SFLG) that merges SFLV1 and SFLV2, enabling a varying number of server-side sub-networks. SplitFedv3 (SFLV3) [40] is a newer version that involves unique client-side sub-networks and an averaged server-side sub-network; the averaging of the server-side network avoids the problem of catastrophic forgetting. Fig. 5 illustrates the SFLV1 approach without label sharing. Similar designs can be adopted for SFLV2, SFLG, and SFLV3.
Research has also specifically studied the effect of client-side and server-side model synchronization. The Multi-Head Split Learning (MHSL) approach [41], [42] studied SplitFed without client-side model synchronization; the performance of models without client-side parameter aggregation was comparable to that of models with it. The related work on Parallel SL with Split-Layer Gradient Averaging and Learning Rate Splitting (SGLR) [43] proposed a scalable SL framework. In contrast to SplitFed, the server in SGLR broadcasts a common gradient averaged at the split layer, emulating FL without any further client interactions.
In the Federated Split Learning framework suggested in [44], the numbers of clients and servers were equal: the server-side model corresponding to each client was trained on a separate server, and all client-server pairs were trained simultaneously. After each global round, the server-side models were aggregated by an FS and updated on each server. This method leveraged the PySyft and PyGrid libraries. The authors confirmed better accuracy and a privacy guarantee, provided that each client has a reasonable amount of training data. The authors in [45] showed that vanilla SL can overfit due to the sequential nature of training and proposed the Parallel Split Learning (PSL) concept to prevent this overfitting.
Federated Deep Learning with Private Passport (FDL-PP) [46] treated the layers prior to the split as private layers and the following layers as public layers, and suggested adopting FL in the public layers to prevent attacks at the split; the server only aggregated the model parameters of the public layers. PyVertical [47] is a framework that supports vertical federated learning using a split neural network. It enables a data scientist to keep raw data on an owner’s device while training neural networks on data features vertically partitioned among several owners. AdaSplit [48] is another hybrid of SL and FL that enables efficient scaling of SL to low-resource scenarios by reducing bandwidth consumption and improving performance across heterogeneous clients. The authors in [49] suggested a hybrid approach that updates client- and server-side models simultaneously through local-loss-based training, with losses calculated for each local split.
FedFly [50] is an approach for migrating a partitioned neural network between edge servers when devices move (the device mobility challenge) during FL training. FedLite [51] addressed the high communication costs associated with model splitting, using a new clustering scheme and a gradient correction method to compress the extra communication. LocFedMix-SL [52] showed that existing parallel SL algorithms achieve neither scalability nor fast convergence because of an imbalance between the FP and BP updates. To fix this, the authors augmented parallel SL with local parallelism, FL, and mixup data augmentation to keep the FP and BP updates balanced.
In accordance with the principle of “first-parallel-then-sequential”, federated parallel training in SL was made possible by the Cluster-based Parallel SL (CPSL) method [53]. Client devices were organized into several clusters, and each training round in CPSL was divided into two phases: parallel intra-cluster training followed by sequential inter-cluster training. All clients in the same cluster cooperated with the server to carry out parallel training, similar to SplitFedv2. After a single round of intra-cluster training, each client’s client-side model was sent to the server for aggregation and updating; the updated model then initialized the clients of the following cluster to begin their intra-cluster training.
The Hybrid Split and Federated Learning (HSFL) scheme [54] was proposed to deal with the challenges of collaborative learning across highly diverse IoT devices with heterogeneous resources and data distributions. The framework organizes clients into two groups: one group performs FL and the other performs SL.
Figure 6 shows a summary of the described emerging decentralized learning approaches along with their evolutionary timeline. Table 1 outlines a comparative evaluation of FL, SL, and SFL under different criteria.
Decentralized Learning Applications in Healthcare
This Section reviews state-of-the-art (SOTA) FL, SL, and SFL applications in healthcare. Readers are referred to Table 5 in the Appendix for descriptions of the publicly available medical imaging datasets mentioned in this section.
A. FL in Healthcare
FL has found its way into healthcare, from medical imaging use cases to Electronic Health Record (EHR) management, as outlined in Fig. 7. SOTA applications in medical imaging cover a wide range, including the brain, chest, skin, breast, eye, prostate, and abdomen, which are discussed in this Section. Other interesting advanced studies of FL in the healthcare domain include patient similarity learning [55], patient representation learning [56], phenotyping [57], and predictive modeling [28].
1) FL Applications Associated with Brain
The first reported use of FL for multi-institutional applications in medical imaging was by Intel in collaboration with the Centre for Biomedical Image Computing and Analytics at the University of Pennsylvania, for brain image segmentation [58]. The data was a collection of multi-institutional, multi-modal brain MRI scans from glioma patients, publicly available as part of the Brain Tumor Image Segmentation Benchmark (BraTS) challenge 2018 [59]. This system was based on a U-Net, with the server aggregating the model parameters of chosen clients to form the global model. The authors reported that the semantic segmentation performance of the federated model is superior to that of the centralized model. FedDis [60] is a disentangled FL application for unsupervised brain pathology segmentation. The model was trained on two brain MRI datasets, Open Access Series of Imaging Studies-3 (OASIS-3) [61] and Alzheimer’s Disease Neuroimaging Initiative-3 (ADNI-3) [62], and tested on two public Multiple Sclerosis (MS) datasets (MSLUB [63], MSISBI [64]) and an in-house MS and glioblastoma database. Silva et al. [65] suggested an FL architecture to securely access and meta-analyze any biomedical data without disclosing personal information, investigating brain structural relationships across diseases and clinical cohorts. Li et al. [24] proposed a privacy-preserving approach for multi-site Functional Magnetic Resonance Imaging (fMRI) classification using federated transfer learning and domain adaptation. Experiments were conducted with fMRI data from the Autism Brain Imaging Data Exchange (ABIDE) dataset [66]; promising results were achieved in improving neuroimage analysis performance and identifying valid disease-related biomarkers. Fed-BioMed [67] is a robust and scalable open-source FL framework accommodating different models and optimization methods; it was tested with brain imaging datasets from four institutions. The authors in [68] investigated the feasibility of differential privacy techniques by applying their FL algorithm to the BraTS 2018 dataset [59].
2) FL Applications Associated with Chest
FL research related to the chest has mostly been done with pneumonia scans, Computerized Tomography (CT) lung cancer scans, and coronavirus disease 2019 (COVID-19) CT scans. An interesting FL application, C-DistriM (Chained Distributed Machine learning), delivered superior performance on the Non-Small Cell Lung Cancer (NSCLC) radiomics dataset [69] in predicting two-year lung cancer survival [70]. With the COVID-19 pandemic, a major focus of AI shifted to institutional data collaboration through FL. The authors in [71] showed the viability of an FL approach for identifying CT anomalies associated with COVID-19; they investigated FL strategies to create a COVID-19 medical image diagnosis AI model with strong generalizability across seven international datasets. The public dataset developed by Radiology Ai One-Stop solution (RAIOSS) [72] was also used during model validation. The FL research in [73] performed collaborative training of multiple medical institutions’ models using two public chest X-ray screening datasets: the Montgomery County chest X-ray dataset (MC) [74] and the Shenzhen chest X-ray dataset [74]. The authors investigated several key specificities of FL settings, including non-Independent and Identically Distributed (non-IID) and unbalanced data distributions. The FL research in [75] developed an abnormal chest radiograph classification model on GoogLeNet-22 and ResNet-50 using the public Chest X-ray (CXR) dataset [76], and discussed the challenges of sample size and label distribution variability in FL.
3) FL Applications Associated with Skin
FedPerl [77] is a semi-supervised FL method developed to classify skin carcinoma data; it encouraged the community of clients to learn collaboratively to generate more accurate pseudo-labels for unlabeled data. The authors used 71,000 skin lesion images from four publicly available datasets: International Skin Imaging Collaboration (ISIC)19 [78], Human Against Machine with 10,000 training images (HAM10K) [79], Derm7pt [80], and PAD-UFES [81]. The FL model in [82] focused on multimodal melanoma detection using ISIC19 [78] data; its performance was better than that of the centralized model. A gradient aggregation method that better extracts shareable information from multiple local servers is presented in [83]; the authors applied it to skin lesion segmentation using seven datasets, including the HAM10K [79] and Derm7pt [80] datasets. FedMix [84] is a federated skin lesion segmentation model developed with the HAM10K dataset [79].
4) FL Applications Associated with Breast
FedMix [84] was also tested on breast tumor datasets: Breast Ultrasound (BUS) [85], Breast Ultrasound Image Segmentation (BUSIS) [86], and UDIAT [87]. Using a cooperative strategy involving seven clinical institutes worldwide, Roth et al. [88] developed an FL model for breast density categorization. According to their experimental findings, the FL model beat the individually-trained models on the local data of each institute by an average of 6.3%; additionally, when the FL model was assessed on external testing datasets from other participating sites, a relative improvement of 45.8% in generalizability was established. A memory-aware curriculum learning method for FL was proposed in [89]; findings on three mammography imaging datasets demonstrated the benefits of federated adversarial learning for multi-site breast cancer categorization.
5) FL Applications Associated with Eye
The authors in [90] evaluated the performance of an FL framework for DNN-based retinal microvasculature segmentation and referable diabetic retinopathy classification using Optical Coherence Tomography (OCT) and OCT angiography (OCTA) images. FL was applied to diabetic retinopathy detection in [91] using five datasets: EyePACS [92], Methods to Evaluate Segmentation and Indexing Techniques in the field of Retinal Ophthalmology (MESSIDOR) [93], Indian Diabetic Retinopathy image Dataset (IDRiD) [94], Asia Pacific Tele-Ophthalmology Society (APTOS) [95], and the University of Auckland (UoA) diabetic retinopathy database [96]. The model was investigated with three approaches: standard transfer learning, federated averaging, and the federated proximal framework. The authors in [75] also evaluated their models on a diabetic retinopathy detection task using the Diabetic Retinopathy binary classification dataset [97].
6) FL Applications Associated with Prostate
The authors in [98] introduced a flexible FL framework for cross-site training, validation, and evaluation of deep prostate cancer detection models. Their method used an abstract representation of the model architecture and data, enabling the NVFlare FL framework to be used to train prototype deep learning models. Prostate biopsy data were collected from two University of California research hospitals. Their method showed improvements in prostate cancer detection and classification over the SOTA. A personalized FL model for prostate segmentation was developed in [99]; the authors suggested an adaptation strategy that enables a unique model architecture for each client. The strategy was assessed on the MSD1 [100], PROSTATEx [101], PROMISE12 [102], and National Cancer Institute-International Symposium on Biomedical Imaging (NCI-ISBI) [103] datasets and was demonstrated to enhance the local models’ performance following adaptation. The authors in [104] applied a Federated Cross Learning (FedCross) algorithm to prostate cancer MRI segmentation using the same datasets as [99].
7) FL Applications Associated with Abdomen
The authors in [105] investigated automatic segmentation of pancreatic cancer CT scans with three datasets, including the publicly available MSD1 pancreatic dataset [100] and the Synapse dataset [106], applying heterogeneous optimization techniques to achieve significant performance. MoNet [107] is a highly optimized federated pancreatic CT scan segmentation algorithm that also used the MSD1 pancreatic dataset [100].
8) FL Applications Associated with EHR Data
EHRs are a vital source of real-world healthcare data, containing information such as a patient’s medical history, diagnoses, medications, treatment plans, immunization dates, allergies, radiology images, and laboratory test results. Several FL studies with EHR data are available in the literature, covering medications [108], [109], [110], human emotions [108], activity recognition [111], [112], Electroencephalogram (EEG) classification [113], personalized wearable healthcare applications [111], patient hospitalization [114], patient mortality [114], and preterm birth data [115].
B. SL in Healthcare
The applicability of SL in healthcare was first shown in [8], [116], where the SplitNN model pointed to its promising advantages. The first application of an SL-based approach in the medical field was then reported in [117]; the authors applied their model to binary classification of fundus images and multi-label classification of chest X-rays. The authors in [118] first showed the clinical feasibility of SL, discussing inference performance, convergence rates, computational efficiency, and communication requirements in relation to clinical feasibility. SL applications in healthcare still cover only a limited range, including the chest, eye, brain, and EHR.
1) SL Applications Associated with Chest and Eye
The binary classification task in [117] used a dataset of 9,000 fundus images, and the multi-label classification task used a dataset of 156,535 chest X-rays. As the first study of SL in medical settings, this research paved the way for future developments in the collaborative training of DNNs. The authors in [118] developed a 121-layer DenseNet for multi-label classification of chest X-ray images from the CheXpert dataset [119].
2) SL Applications Associated with Brain
The authors in [118] also used their approach for brain tumor segmentation with the BraTS datasets [59]. Another brain tumor classification task is presented in [120], where the authors vertically distributed brain MRI scans and classified them as healthy or tumorous using a partitioned neural network.
3) SL Applications Associated with EHR Data
An SL framework for Electrocardiogram (ECG) classification was proposed in [121]. The system was trained on the Physikalisch-Technische Bundesanstalt (PTB-XL) ECG dataset [122], and results showed a significant reduction in computation and communication overhead with minimal performance loss. The SL approach in [123] was based on a 1D CNN model that detects heart abnormalities from ECG data; the authors studied the effects of privacy preservation on model performance.
C. SFL in Healthcare
The first SFL approach in healthcare was initiated by the authors of SplitFed [38], who applied their proposed model to a skin lesion segmentation task. As with SL, SFL applications in healthcare are less prevalent than FL applications. They cover areas related to the skin, chest, bones, stomach, brain, eye, cervix, and EHR.
1) SFL Applications Associated with Skin
The SplitFed [38] authors evaluated their experiments using the dermatoscopic images of the HAM10K dataset [79]. MHSL [41], [42], described in Section II-B3, was tested on ResNet-18 and 1D CNN architectures with the HAM10K [79], MNIST [31], and CIFAR [33] datasets.
2) SFL Applications Associated with Brain
Split-U-Net [124] applied SFL with vertical FL across four institutions for multi-modal brain tumor segmentation, using the Medical Segmentation Decathlon (MSD1) brain tumor dataset [100]. It was the first application of SFL to a multi-modal image segmentation task; the authors quantified the amount of data leakage in biomedical image segmentation and presented defense strategies. SplitAVG and SplitAVG-V2 [125] are two interesting studies focused on heterogeneity-aware FL; they were tested on a brain tumor segmentation model with the BraTS datasets [59].
3) SFL Applications Associated with Eye
SplitAVG and SplitAVG-V2 were also evaluated on a diabetic retinopathy binary classification task, achieving 96.25% accuracy on the Diabetic Retinopathy binary classification dataset [97].
4) SFL Applications Associated with Chest, Bones, Stomach, and EHR
The “spatio-temporal split learning” presented in [126] allowed collaboration among privacy-sensitive organizations by spatially spreading many clients to cover a variety of datasets from various participants. The algorithm was tested on MURA X-ray images [127] and COVID-19 CT scans from the COVID-CT dataset [128]; FL and SL achieved 95.7% and 98.5%, respectively, on the classification task. The same authors proposed “multi-site split learning” [129], which enabled secure medical data transfer between hospitals; they explored the optimal number of clients via experimental analysis and empirically investigated the optimal data split ratio for the best split learning performance. Federated Learning on Medical Datasets using Partial Networks (FLOP) [130] presented a framework that shares only a partial model between the server and the clients, keeping the remaining layers private within each client’s own space. Experiments were carried out with the COVIDx dataset [131] and the Kvasir dataset [132]; FLOP achieved better performance while reducing privacy and security risks.
Apart from the above, [133] applied vertical SL to five medical datasets related to cervical cancer, diabetes, heart disease, stroke, and stroke rehabilitation. The authors evaluated the impact of different networks and feature distributions on predictive performance and compared the results with a centralized architecture.
Decentralized Learning Challenges and SOTA Solutions
This Section gives a summary of the major challenges in FL, SL, and SFL and the SOTA solutions in tackling them.
A. Challenges Associated with FL
Existing research on FL is mostly focused on reducing the effects of corrupted or noisy clients, dealing with statistical and system heterogeneity issues, improving communication efficiency, addressing privacy issues, improving fairness, handling biased sources, and dealing with system challenges. Researchers have taken steps to evaluate their algorithms under different challenging circumstances. These techniques have helped in developing more robust FL algorithms.
1) Research on Handling Corrupted or Noisy Clients in FL
Data annotation is a complex task, even for experts, and depending on the nature of the data and the considerations for labeling, it is time-consuming. Since local data in an FL setting are gathered by different clients, it is difficult to ensure that the data are annotated correctly. Furthermore, clients may apply different standards according to their domain experts’ knowledge. Thus, clients’ annotated data may have distinct distributions compared to centrally annotated data, and data corruption or mislabeling could occur. Similarly, clients’ local data might include noise, arising from different data acquisition sources or devices, communication barriers, system errors, etc. In a collaborative FL environment, if clients’ data are corrupted or noisy, the global aggregated model produces wrong predictions. Table 2 describes prior research on handling corrupted or noisy clients in FL networks [84], [134], [135], [136], [137], [138], [139].
2) Research on Handling Heterogeneity Issues in FL
Heterogeneity is the quality or state of being diverse in character or content. Although data heterogeneity is not new in machine learning, it is significantly more prominent in FL than in centralized machine learning (CML). Heterogeneity in FL is classified into two categories: statistical heterogeneity, i.e., non-IID data across the network, and systems heterogeneity, i.e., significant variability in system characteristics [7], [140], [141]. These system characteristics include storage, computational, and communication capabilities (memory and CPU/GPU), network connectivity (3G, 4G, 5G, or Wi-Fi), and power capabilities (battery level).
The bounds on convergence caused by heterogeneity were theoretically presented in [142], and the impact of adjusting heterogeneity on model performance was empirically demonstrated in [143]. The challenges of heterogeneity for FL were discussed in [140]. The authors in [144] demonstrated how the clients’ local model weights diverge as a result of data heterogeneity. New insights for protecting against the non-uniformity introduced by data heterogeneity in FL, as a defense against backdooring attacks, were presented in [145]. FedKL [146] used federated reinforcement learning to tackle heterogeneity issues. Data problems and resource heterogeneity in FL were discussed in [147]. The authors in [148] introduced the concept of Virtual Homogeneity Learning (VHL), where a virtual dataset addresses data heterogeneity issues; VHL drastically improved convergence speed and generalization performance.
Fig. 8 illustrates a categorical representation of FL in handling heterogeneity issues.
Methods for handling heterogeneity can be categorized into two groups: fine-tuning local models and fine-tuning the global model. The local model fine-tuning group utilizes personalized FL methods, regularization methods, and non-aggregated methods of training. A variety of parameter aggregation variants (Table 3), fairness tackling methods, and regularization methods are used in global model fine-tuning.
SOTA on personalized FL can be categorized based on the model’s architectural design, the training process, and multi-task learning. FedBN [149], FedAP [150], and FedPer [151] personalize models based on architectural design. Per-FedAvg [152], MetaFed [153], FedFV [154], pFedMe [155], and FedMGDA+ [156] personalize based on the training process. MOCHA [157] and VIRTUAL [158] designed FL for multi-task learning. FedCross [104] presented an FL approach without parameter aggregation, sequentially training the global model across different clients in a round-robin manner. Ditto [159] and Adaptive Personalized Federated Learning (APFL) [160] performed local model fine-tuning using regularization methods.
Fairness in FL is inspired by fair resource allocation and encourages equitable or uniform resource distribution throughout a federated network. Agnostic Federated Learning [161], q-Fair Federated Learning (qFFL)/qFedAvg [162], and Hierarchically Fair Federated Learning (HFFL) [163] are the most distinguished works on fairness in the FL literature. FedProx [141] is a popular regularization method that facilitates stable convergence in heterogeneous settings; it adds a proximal term to the local sub-problem to limit the impact of variable local updates on clients.
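For concreteness, the FedProx local objective at client $k$ in round $t$ is:

```latex
\min_{w}\; h_k(w) = F_k(w) + \frac{\mu}{2}\,\lVert w - w^{t} \rVert^{2}
```

where $F_k$ is client $k$'s local loss, $w^t$ is the current global model, and $\mu \ge 0$ controls how far heterogeneous local updates may drift from the global model; setting $\mu = 0$ recovers the FedAvg local sub-problem.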
3) Research on Handling Communication Efficiency Issues in FL
In FL settings, communication efficiency is a popular topic with several SOTA studies. FedKD [143] presented a communication-efficient FL algorithm based on knowledge distillation and gradient compression, reducing communication costs by 94.89% while achieving competitive results on medical Named Entity Recognition (NER) datasets. FedPAQ [168] is another communication-efficient scheme based on periodic averaging of models at the server, using only a fraction of clients in each round and quantized message passing at edge nodes; it achieved near-optimal theoretical guarantees for strongly convex and non-convex loss functions and demonstrated the communication-computation trade-off. FedAGM [169] improved server-side aggregation using an accelerated model to guide local gradient updates, achieving communication efficiency even with low client participation.
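To illustrate the idea of quantized message passing used by such schemes, below is a minimal sketch of an unbiased stochastic uniform quantizer applied to a model update before transmission; it is a generic example of the technique, not FedPAQ's exact quantizer, and the number of levels is an arbitrary assumption.

```python
import torch

def quantize(update, levels=16):
    """Stochastically round a dense update to `levels` uniform levels.

    Randomized rounding makes the quantizer unbiased:
    E[dequantize(quantize(u))] == u.
    """
    scale = update.abs().max().clamp(min=1e-12)
    normalized = update / scale                    # values in [-1, 1]
    steps = (normalized + 1) / 2 * (levels - 1)    # map to [0, levels-1]
    lower = steps.floor()
    q = lower + torch.bernoulli(steps - lower)     # randomized rounding
    return q.to(torch.uint8), scale                # compact payload + one scale

def dequantize(q, scale, levels=16):
    return (q.float() / (levels - 1) * 2 - 1) * scale
```

Each client would send the quantized tensor plus a single scale per round, cutting the payload from 32 bits per parameter to 8 (or, with bit-packing, 4) at the cost of added quantization noise.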
B. Challenges Associated with SL
Although SL offers improved privacy compared to CML and FL settings [8], [37], [116], [117], it still has several privacy loopholes. In particular, when the client-side model is small, private data can leak through the smashed data.
1) Research on Handling Privacy Issues in SL
The authors in [123] examined privacy-preserving SL training of 1D CNN models, adopting two mitigation techniques: adding more hidden layers on the client side and applying differential privacy. SplitHE [170] presented an empirical security evaluation against membership inference and applied homomorphic encryption; the model was built on top of the SplitNN architecture. Marvell (optiMized perturbAtion to pReVEnt Label Leakage) [171] considered a realistic threat model and proposed a privacy loss metric to quantify label leakage in SL; by directly minimizing the amount of label leakage, it derived the structure of the noise perturbation in a principled way. TPSL (Transcript Private Split Learning) [172] is a similar gradient perturbation-based SL approach that provides a provable differential privacy guarantee; experiments demonstrated robustness and effectiveness against label leakage attacks. SL in the context of private collaborative inference against reconstruction attacks is analyzed in [173]; the approach modifies model training to reduce data leakage while maintaining accuracy. SplitGuard [174], [175] detected and mitigated hijacking attacks in SL. UnSplit [176] studied data-oblivious model inversion, model stealing, and label inference attacks against SL. Together, these studies showed that vanilla SL is not very secure and that additional measures are needed to build secure protocols.
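As a simple illustration of the client-side differential privacy idea mentioned above, the sketch below clips each sample's smashed-data norm and adds Laplace noise before transmission; the clipping bound, privacy parameter, and function name are illustrative assumptions rather than the exact mechanism of any of the cited works.

```python
import torch

def privatize_smashed(smashed, clip=1.0, epsilon=1.0):
    """Clip per-sample L1 norms to `clip`, then add Laplace noise with
    scale clip/epsilon (the Laplace mechanism; values are illustrative)."""
    norms = smashed.flatten(1).norm(p=1, dim=1).clamp(min=1e-12)
    factor = (clip / norms).clamp(max=1.0)
    factor = factor.view(-1, *([1] * (smashed.dim() - 1)))
    clipped = smashed * factor  # bounds each sample's sensitivity
    noise = torch.distributions.Laplace(0.0, clip / epsilon).sample(clipped.shape)
    return clipped + noise
```

The client would transmit privatize_smashed(client_net(x)) instead of the raw features, trading some accuracy for a bounded privacy loss.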
C. Challenges Associated with SFL
Similar to SL, SFL offers the privacy preservation advantages discussed previously. However, SFL can still suffer from several privacy loopholes; as in SL, private data can leak through the smashed data when client-side models are small. Research has also been done to tackle the heterogeneity issues associated with SFL.
1) Research on Handling Privacy Issues in SFL
Research has enabled SFL to be performed in a privacy-preserving manner, using encryption and other security measures to protect sensitive data and limit the amount of data sent during training. The SplitFed authors [38] proposed several privacy preservation mechanisms on the client side, the FS, and the main server, including adding a PixelDP noise layer and using differential privacy to train the client-side model. These methods allowed model training without disclosing sensitive information. The recent work in [177] performed an empirical analysis of SplitFed’s robustness to strong model poisoning attacks.
2) Research on Handling Heterogeneity Issues in SFL
The SFL framework of [178] proposed an energy- and loss-aware selective updating method for heterogeneous systems, updating client-side models based on clients’ energy and loss changes; experiments were conducted using the CIFAR [33] datasets. SplitAVG [125], as previously stated, also dealt with heterogeneous clients.
Healthcare Viewpoint on Challenges, Current Trends, and Future Directions
In this section, we discuss the problems, the current research trends, and some possible directions for the future from the point of view of healthcare.
A. Challenges Specific to Healthcare
Although these decentralized learning approaches have enormous advantages and work well in healthcare applications, the field is still in its early stages due to frequently arising challenges [179], [180]. We categorize these healthcare-specific challenges into three groups: data challenges, privacy and security challenges, and communication challenges. Data challenges involve issues with the data quality, data heterogeneity, or data biases of the healthcare participants. Privacy and security challenges include data poisoning attacks, adversarial attacks, membership inference attacks, and free-riding attacks in the collaborative network. Communication challenges may arise from failures or dropouts of healthcare participants, energy consumption issues, or the computational and communication overhead of the systems. Beyond these technical challenges, some other specific challenges exist, as discussed in Section V-C.
B. Current Trends Specific to Healthcare
Research communities are clearly active in mitigating the associated challenges, as detailed in Section IV. Before local model training begins, all healthcare participants in a collaboration should have mutually agreed on these mitigation mechanisms.
Decentralized learning paradigms require agreements specifying the scope, objectives, and technology used, which can be challenging to determine because these paradigms are still relatively new. Massive projects being undertaken now are paving the way for future norms of inventive, safe, and fair collaboration in healthcare applications. Several university and industrial research institutions jointly lead FL research in the healthcare area and have implemented projects to establish efficient standards for confidential, equitable, and creative collaboration in healthcare applications. These collaborative projects and consortiums aim to unite researchers and practitioners interested in FL: students, professors, and industry leaders from around the world come together to learn more about the topic, identify technical problems, and discuss possible solutions. Table 4 highlights popular consortiums and collaborative healthcare projects associated with FL.
Some open-source software has also been developed to facilitate proofs-of-concept and experiments. These packages provide open federated datasets, including medical datasets, strict evaluation frameworks, and reference implementations that aim to expose the problems and complexities of real-world federated environments. Some of the most accessible medical datasets relate to COVID-19, Alzheimer’s disease, and predictive maintenance. Popular open-source software includes Federated AI Technology Enabler (FATE), Substra, OpenFL, TensorFlow Federated, IBM Federated Learning, NVIDIA Clara, the PySyft and PyGrid platform, and enterprise-grade FL platforms such as Apheris.
C. Future Directions Specific to Healthcare
Here are some possible directions for future research on how FL, SL, and SFL can be used in healthcare.
1) Developing Personalized Models
FL and SFL approaches produce a common global model for all participants, whereas healthcare decisions are generally designed around personalized health management: care may vary by institution, physician, and patient. The SOTA includes some general research on FL personalization, as shown in Fig. 8 under personalized FL. However, how to incorporate medical domain knowledge and produce personalized versions of the global models in collaborative networks is yet to be investigated.
2) Reducing Bias
Although FL and SFL limit the effects of a biased dataset by utilizing multiple datasets, systems with biased data propagation may still exist due to poor system design. In a collaborative network, individual datasets should be appropriately weighted to reduce the risk of biased or insecure data. Even if they are weighted properly initially, bias might emerge later in training, because client characteristics such as data distribution may vary over time (e.g., with the addition of new patients or the demise of existing patients). Thus, developing models that are more robust to bias through proper parameter aggregation schemes is an important research direction.
3) Incorporating Hybrid Non-IID Properties
Healthcare institutions have different sample sizes, label distributions, resolutions, data measurement frequencies, types of laboratory tests, laboratory tools, data acquisition mechanisms, and various demographic features of subjects such as age, gender, etc. Therefore, medical datasets often involve several non-IID features [181]. Most FL studies focus exclusively on one of the non-IID features, such as noisy or corrupted labels. There hasn’t been a thorough investigation of various non-IID features in medical datasets. Therefore, in the future, another focus for the decentralized learning paradigm could be addressing challenges related to multiple non-IID features in medical data.
4) Hyperparameter Tuning
Hyperparameter tuning in machine learning is crucial but time-consuming. Optimizing hyperparameters is significantly challenging in collaborative learning [182], and even more difficult in healthcare collaborations. Future research urgently requires automated techniques or frameworks for choosing the best hyperparameters in decentralized learning models in healthcare.
The following are identified as non-technical directions that need specific attention.
5) Developing Sufficient Incentive Mechanisms
Healthcare organizations may have limited trust in new technological frameworks and might incur considerable communication and computation overheads. Naturally, they may be unwilling to participate in collaborative learning tasks without well-designed incentives. Finding effective ways to encourage organizations holding standard imaging or related data to participate in the learning process is a fundamental problem for the future.
6) Incorporating Domain Expertise Knowledge
The healthcare industry relies on professionalism and accuracy. Physicians and subject-matter experts carefully perform tasks like devising treatment plans, prescribing medications, and forecasting specific health conditions. Even given a large medical dataset, healthcare experts will not immediately trust or accept the predictions of a model trained in a collaborative network. Such predictions require expert knowledge, supervision, and intervention. Experts in the medical field could oversee the entire collaborative learning process, from the first step of collecting data to the last step of making a prediction based on the global model; such expert oversight will lead to more accurate and trusted results.
Conclusion
In this survey, we reviewed the emerging decentralized learning approaches of FL, SL, and SFL and their applications in healthcare, with a particular focus on medical imaging. Several studies have addressed some of the challenges in FL, SL, and SFL settings, and we detailed the major challenges along with SOTA solutions. From the healthcare perspective, these challenges have yet to be fully addressed. Since FL, SL, and SFL are likely to remain active research areas over the next decade, we outlined current trends and possible directions for future research in their healthcare applications. There remains substantial room, and need, for systems and algorithms that are more realistic, perform better, and ensure security and privacy.
Appendix: Publicly Available Medical Imaging Datasets Used in Recent Decentralized Learning Research
Having open access to data is an essential component of research. In healthcare, publicly available data would benefit researchers and academia to a greater extent. Open data helps, without a doubt, in coming up with effective and efficient solutions to life-threatening diseases or long-lasting problems that plague humanity.
As part of this survey, we compiled a collection of publicly available medical imaging datasets that have been used in recent decentralized learning research. These datasets cover a wide range, including data related to the brain, chest, skin, breast, eye, prostate, and abdomen. Table 5 lists our findings.