Introduction
In recent years, the convergence of AI and IoT for efficient smart healthcare systems has been gaining attention in the community [1]–[4]. This convergence allows different health diseases to be detected more efficiently than ever before. Diabetes is a chronic disease that occurs when the body is not able to produce the insulin hormone or is not able to use it effectively. The World Health Organization (WHO) reported that diabetes caused more than 1.6 million deaths in 2016 [5]. Diabetic patients tend to have a high glucose level in the blood, which may cause damage and failure of the body’s organs. According to the International Diabetes Federation (IDF), 1 out of 10 people is diabetic, which is a serious concern. Possible complications of diabetes include heart disease and stroke, kidney failure, serious vision problems, etc.
One of the serious complications of diabetes is Diabetic Retinopathy (DR). DR may cause complete blindness, and many people around the world are affected by it [6], [7]. Around 25% of all diabetic patients are affected by DR, which makes it a widespread complication [8]. Long-term diabetes may result in DR, which is a progressive disease and may cause partial or permanent vision impairment. The majority of people affected by DR are of working age, the main workforce of any growing economy [9]. The IDF also reported that India alone accounts for a large share of diabetic people around the world, and this share is growing rapidly every year [9].
It is difficult to detect the symptoms of DR at an early stage, which is a challenging and important issue in medical science. The initial symptoms of DR are very mild, and patients are usually unaware of them until the disease results in irreversible damage to the retina or it is diagnosed through a medical test. Therefore, it is highly desirable that DR be detected as early as possible. DR can be detected when a highly trained specialist is available to evaluate digital color fundus photographs of the retina. The rear part of the human eye is known as the fundus. These fundus images are evaluated by locating the lesions associated with the vascular abnormalities that arise due to diabetes. An effective solution therefore exists, but it is a time-consuming process and demands the availability of highly skilled medical practitioners.
Deep learning is a very popular approach in healthcare and patient-monitoring systems that require large-scale medical image processing [10]–[16]. Convolutional neural networks (ConvNets) are a deep learning approach that is very effective for image data analysis in several domains [13]. The accuracy of ConvNets can be improved by scaling up different parameters, subject to the availability of resources. These scaled versions of ConvNets are very efficient for the medical domain, where accuracy is highly desirable. Therefore, in this study, the EfficientNet architecture [17] is used to analyze retina images in order to detect DR.
A. Contribution
The major contributions of this work are as follows:
We utilized the state-of-the-art EfficientNet model to identify blindness symptoms at the earliest instance and found a significant improvement in identifying blindness in retinal images, with over 92% validation accuracy, outperforming CNN and ResNet50 models.
We proposed a novel augmentation step, called polar unrolling, for this imbalanced dataset of retinal images, which significantly improves the prediction accuracy during test-time augmentation.
B. Organization
The rest of the article is organized as follows: Section 2 provides a state-of-the-art literature review. In Section 3, the methodology is discussed in detail. Section 4 provides the details of the experiments, followed by the evaluation and discussion of the results in Section 5. The article concludes with a discussion of future scope in Section 6.
Related Work
Researchers [18]–[23] have been working in the area of connected smart health [22], the healthcare Internet of Things (HealthIoT) [24], and patient monitoring [25], where AI and IoT technologies have considerable potential. AI tools and technologies, particularly for Diabetic Retinopathy, will enhance the state of global health. Kumar and Vashist [26] discussed the various challenges and achievements in eye care with a focus on the Indian community. They noted that several eye diseases that may result in vision loss should be given effective care and potential solutions by 2020 in India. Bhalla et al. [27] proposed a model demonstrating an innovative modality to address the issue of DR in India by organizing certification programs for technicians and doctors to improve the knowledge and skills required for early DR detection.
Diabetic Retinopathy is a very common disease among diabetic patients that can lead to partial or complete blindness. Therefore, early diagnosis and detection are very important. The existing state-of-the-art diagnosis methods for detecting DR require continuous observation of patients by a skilled physician. These diagnosis methods are time-consuming and subject to the availability of expensive medical equipment.
In [18], the researchers mention that a group of only 10–15 physicians is responsible for manually diagnosing DR in over 2 million retinal images per year at the largest eye care facility in the world, Aravind Eye Hospital, India. They highlight the extreme effort, in terms of infrastructure and time, required for this task when a large number of cases turn out to be normal.
Automating this process can assist doctors in diagnosing DR patients effectively and save the time spent on diagnosing normal cases. With this aspect in mind, several researchers have worked in this area and used machine learning to devise models that can predict the presence of DR in a patient. In the following paragraphs, we review some of the key studies in this field.
In [18], the authors presented an account of their initial efforts in developing an automated system using computer vision techniques to identify patients in the early stage of DR from retinal color fundus images. The project was a continuation of their previous work on WiLDNet at Aravind Eye Hospital. Their main focus was to create a model based on different retinopathy types using SIFT descriptors; the important features include hemorrhages and exudates. The intention was to develop a robust and long-term model. A Support Vector Machine classifier is trained to label each image patch, after which the patches are aggregated and a decision over the entire image is made based on the patch-level predictions. The authors reported an equal error rate of 87% using 1000 images.
In [20], the researchers developed a solution for early detection of DR using a deep convolutional neural network to detect micro-aneurysms (MAs) in diabetic patients. They also performed multi-label classification [28] by assigning retinal fundus images to five categories. Earlier, they had proposed multi-layer convolutional neural networks (CNNs) with two fully connected layers and a single output layer to efficiently detect DR. To cater to the issue of oversampled classes, they use a small-capacity network with L2 regularization and dropout.
A model built using convolutional neural networks such as AlexNet, VggNet, GoogleNet, and ResNet is proposed in [23] to identify Diabetic Retinopathy in diabetic patients. The fundus images are classified into five classes, based on which the model identifies the stage of DR for the patient. Moreover, transfer learning and hyper-parameter tuning gave the CNN models better accuracy [29], which was not possible before with non-transfer learning on noisy data. Normalization and data augmentation techniques were deployed for preprocessing the images, and non-local means denoising (NLMD) was used to remove noise. The performance of the system was tested and evaluated on datasets available on Kaggle, and a classification accuracy of 95.68% was reported for the application of CNNs with transfer learning.
The EyeWeS model was proposed to achieve high performance and greater efficiency by converting a pre-trained convolutional neural network architecture for DR detection into a weakly-supervised model, eliminating the lesion-wise annotation required for pixel-level training [30], [31]. Moreover, the researchers not only focused on the identification of diabetic retinopathy but also pointed to the region of the eye that was affected. The model uses bag-level labels to train on instances (i.e., image patches) through a pooling function.
Past work [32] employed a deep learning methodology to detect diabetic retinopathy (DR). The authors focused on a regression activation map (RAM) model to capture the discriminative area of the input retina image. The network architecture they proposed uses a global average pooling (GAP) layer to identify each neuron’s contribution to the final prediction.
Methodology
ConvNets can be scaled based on requirements to achieve better accuracy, subject to the availability of resources. There are several ways to achieve this scalability. For example, we can increase the number of layers in ResNet [33] to scale it up from ResNet-18 to ResNet-200. The popular approaches to scaling up a ConvNet model are to increase the width or depth of the model, or to use higher-resolution images for model training and testing. However, these approaches do not use a well-defined criterion to select the width, depth, or resolution of the input image. In this study, EfficientNet [17], a balanced and more accurate deep ConvNet that scales all dimensions using a well-defined criterion, is used to detect DR in the retina image dataset.
A. Model
This section details the recent convolutional neural network-based EfficientNet model [17] adapted for Diabetic Retinopathy, a disease caused by damage to the retina that, if not treated, can progress to blindness. The existing method of diagnosis grades severity on classes 0 to 4, where 0 represents no presence of the disease and 4 represents severe progression. Our proposed model is based on EfficientNet-B5 [17] with ImageNet pre-trained weights and different input image sizes. EfficientNet scales network width, depth, and input resolution jointly through a compound coefficient $\phi$:\begin{align*}&width: w = \beta ^{\phi },\quad depth: d = \alpha ^{\phi },\quad resolution: r = \gamma ^{\phi }, \\&\text {such that} ~\alpha \cdot \beta ^{2}\cdot \gamma ^{2}\approx 2, \\&\alpha \geq 1,\quad \beta \geq 1,\quad \gamma \geq 1\tag{1}\end{align*}
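As a minimal illustration of the compound scaling rule in Eq. (1), the sketch below computes the depth, width, and resolution multipliers for a given compound coefficient $\phi$. The values of $\alpha$, $\beta$, and $\gamma$ are the ones reported in the EfficientNet paper [17]; they are quoted here for illustration only and are not parameters we tuned.

```python
# Minimal sketch of the compound scaling rule in Eq. (1).
# ALPHA, BETA, GAMMA are the coefficients reported in [17] (illustrative here).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases


def compound_scaling(phi: float):
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    depth = ALPHA ** phi        # d = alpha^phi
    width = BETA ** phi         # w = beta^phi
    resolution = GAMMA ** phi   # r = gamma^phi
    # The constraint alpha * beta^2 * gamma^2 ~= 2 keeps FLOPS growth near 2^phi.
    assert abs(ALPHA * BETA ** 2 * GAMMA ** 2 - 2.0) < 0.1
    return depth, width, resolution


if __name__ == "__main__":
    for phi in range(6):
        d, w, r = compound_scaling(phi)
        print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
```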
Our aim is to optimize the quadratic weighted kappa score [34], [35], and we treat this as a regression problem. Framing it as regression gives flexibility in the optimization and can yield higher scores than solely optimizing for accuracy. We optimize a pre-trained EfficientNet-B5 with a few added layers. The metric that we optimize is the mean squared error, i.e., the mean of the squared differences between the predictions and the labels, as given in the formula below. By optimizing this metric we are also optimizing the quadratic weighted kappa, provided we round the predictions afterwards.\begin{equation*} \frac {1}{n}\sum _{i=1}^{n} (Y_{i} - \hat {Y_{i}})^{2}\tag{2}\end{equation*}
Since we do not have much training data (3,662 images), we augment the data to increase the robustness of our proposed model. We rotate the images by arbitrary angles and flip them both horizontally and vertically. Finally, we divide the pixel values by 128 for normalization.
We also examined an earlier work [36] that developed a sampling technique to handle imbalanced image data (in our study, retinal images). However, it did not improve our predictions once the augmentation step removing the black background was applied. Without augmentation it improved results slightly, but the model then appeared to overfit, with the loss plateauing at the same rate without learning the training parameters, even when using a cyclic learning rate scheduler.
Experimental Evaluation
In this section, we detail the employed dataset, followed by the pre-processing and data augmentation techniques as well as the training steps, including inference with the trained models.
A. Dataset
In this work, we employ a real-world eye disease image dataset containing a large collection of high-resolution retina images captured using fundus photography under varied imaging conditions. A trained clinician has assessed the presence of diabetic retinopathy in each image on the standard ICDR (International Clinical Diabetic Retinopathy) severity scale of 0 to 4, where 0 denotes no DR, 1 mild, 2 moderate, 3 severe, and 4 proliferative DR.
The original images are quite large, on the order of 2,896 pixels along one dimension.
B. Preprocessing
This section details the data processing steps performed before modeling this regression task, so that training on such an unbalanced dataset can be made faster. First, we transformed each retinal image into RGB channels, followed by a circular crop of each image around its center to separate the black regions from the actual fundus, keeping the original aspect ratio so that the images appear natural. We also employ Ben Graham’s method to improve the illumination of the images so that we can extract richer information from the eye images. Then, we resize the original images to the target input size.
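A minimal OpenCV sketch of this preprocessing step is shown below, assuming RGB input arrays. The crop threshold and the Gaussian sigma are illustrative choices, not the exact settings of our pipeline.

```python
import cv2
import numpy as np


def crop_black_borders(img, tol=7):
    """Crop the dark background around the fundus (threshold tol is illustrative)."""
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    mask = gray > tol
    if not mask.any():                       # fully dark image: return unchanged
        return img
    rows, cols = np.where(mask)
    return img[rows.min():rows.max() + 1, cols.min():cols.max() + 1]


def circle_crop(img):
    """Keep only the circular fundus region, centred on the image centre."""
    img = crop_black_borders(img)
    h, w = img.shape[:2]
    mask = np.zeros((h, w), dtype=np.uint8)
    cv2.circle(mask, (w // 2, h // 2), min(h, w) // 2, 255, thickness=-1)
    return cv2.bitwise_and(img, img, mask=mask)


def ben_graham(img, sigma=10):
    """Ben Graham-style illumination correction: subtract a blurred copy."""
    blur = cv2.GaussianBlur(img, (0, 0), sigma)
    return cv2.addWeighted(img, 4, blur, -4, 128)
```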
After preprocessing the dataset, we found that images zoomed towards the center exhibited a close relation to the light spots on the left. To address this zoom issue, as reported in Figure 3, we performed augmentation on the processed images to make sure the model generalises better and does not overfit.
C. Data Augmentation
For better generalisation on the processed dataset, we address this problem with several augmentation steps. We employ the Albumentations library [37] to perform augmentation: we flipped images horizontally and vertically, rotated them by up to 360°, and applied random zoom, as sketched below.
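The following is a minimal sketch of such a pipeline with the Albumentations library [37]. The probabilities and the input size are placeholders (456 is the resolution commonly associated with EfficientNet-B5), not the exact values used in our experiments.

```python
import albumentations as A

IMG_SIZE = 456  # placeholder input size; EfficientNet-B5 is commonly run at 456x456

train_transforms = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.Rotate(limit=360, p=0.5),                          # full-circle rotation
    A.ShiftScaleRotate(shift_limit=0.0, scale_limit=0.1,
                       rotate_limit=0, p=0.5),           # mild random zoom
    A.Resize(IMG_SIZE, IMG_SIZE),
])

# Usage: augmented = train_transforms(image=img)["image"]
```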
We adopted a new augmentation step, polar unrolling, which allows us to better leverage the pixel space, remove “rotation” from the augmentation list, and obtain uniformly scaled eye images (for instances with no or partial cropping of the fundus image, with preservation of the radius).
Originally, the images contain noticeable black regions. To whittle away these regions, we first apply an autocrop. After the autocrop, we obtain the circle’s radius (given by the broadest side of the image). The circle is then extracted using polar unrolling. By unrolling, we change the coordinate space so that rotation augmentation is no longer required: a rotation becomes a plain shift along one axis, which does not matter for convolutional neural networks. This is more than just the absence of this type of augmentation; it effectively lets the model consider all possible rotations (except at some borders, which can be handled by a single 50% shift along that axis).
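The sketch below shows one way to implement the polar unrolling step using OpenCV’s warpPolar, assuming the fundus circle has already been auto-cropped so that its radius equals half of the broadest image side. The output dimensions and the random-shift helper are illustrative assumptions rather than our exact implementation.

```python
import cv2
import numpy as np


def polar_unroll(img, out_w=512, out_h=256):
    """Unroll the (already auto-cropped) circular fundus image into polar space.

    In the unrolled image, rows correspond to the angle and columns to the radius,
    so a rotation of the original image becomes a plain circular shift of rows.
    out_w and out_h are illustrative placeholders.
    """
    h, w = img.shape[:2]
    center = (w / 2.0, h / 2.0)
    radius = max(h, w) / 2.0                  # radius from the broadest side
    return cv2.warpPolar(
        img, (out_w, out_h), center, radius,
        flags=cv2.INTER_LINEAR + cv2.WARP_POLAR_LINEAR,
    )


def random_rotation_as_shift(unrolled, rng=np.random):
    """Equivalent of a random rotation: circularly shift rows of the polar image."""
    shift = rng.randint(0, unrolled.shape[0])
    return np.roll(unrolled, shift, axis=0)
```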
Our augmentation technique improves on the existing contour-transform method for retinal images [6] in terms of handling the imbalanced data and the training technique used with the state-of-the-art EfficientNet models [17].
Apart from the above augmentation approaches, we also look into a specific oversampling technique [38] to check for any methodological flaw in polar unrolling as a key part of the augmentation step. Oversampling the data can be an alternative to augmentation given the imbalanced data. We used a fixed image size for this comparison.
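For this comparison, oversampling can be applied to the training index rather than the images themselves. The sketch below is a minimal pandas version under the assumption that the training labels live in a dataframe with a `diagnosis` column; the column name and the seed are hypothetical.

```python
import pandas as pd


def oversample_to_balance(df: pd.DataFrame, label_col: str = "diagnosis",
                          seed: int = 42) -> pd.DataFrame:
    """Randomly oversample minority classes until every class matches the majority count."""
    max_count = df[label_col].value_counts().max()
    balanced = [
        group.sample(max_count, replace=True, random_state=seed)
        for _, group in df.groupby(label_col)
    ]
    return pd.concat(balanced).sample(frac=1.0, random_state=seed)  # shuffle rows
```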
D. Training and Inference
We trained our proposed EfficientNet-B5 based model with a batch size of 32. In the warm-up stage, we trained for 5 epochs with all layers frozen except the last two, a learning rate of 4e-3, the Adam optimizer, and a cosine learning rate (LR) scheduler. In the fine-tuning stage, we unfroze all layers and trained for 30 epochs. We also used early stopping with a patience of 5 epochs, monitoring the validation loss. The LR scheduler at different steps of training is schematically shown in Figure 5, followed by the training and validation loss in Figure 6.
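A condensed Keras sketch of this two-stage schedule is given below. The regression head (pooling, dropout, and dense layers), the use of tf.keras.applications for the backbone, the cosine-decay step count, and the placeholder datasets `train_ds`/`val_ds` are assumptions about reasonable choices rather than our exact configuration.

```python
import tensorflow as tf

IMG_SIZE = 456  # placeholder input resolution


def build_model():
    base = tf.keras.applications.EfficientNetB5(
        include_top=False, weights="imagenet",
        input_shape=(IMG_SIZE, IMG_SIZE, 3))
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    x = tf.keras.layers.Dropout(0.3)(x)                      # illustrative added layers
    out = tf.keras.layers.Dense(1, activation="linear")(x)   # single regression output
    return tf.keras.Model(base.input, out)


model = build_model()

# Warm-up stage: freeze everything except the last two layers.
for layer in model.layers[:-2]:
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(4e-3), loss="mse")
# model.fit(train_ds, validation_data=val_ds, epochs=5)

# Fine-tuning stage: unfreeze all layers, cosine-decayed LR, early stopping.
for layer in model.layers:
    layer.trainable = True
cosine_lr = tf.keras.optimizers.schedules.CosineDecay(4e-3, decay_steps=10_000)
model.compile(optimizer=tf.keras.optimizers.Adam(cosine_lr), loss="mse")
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                              restore_best_weights=True)
# model.fit(train_ds, validation_data=val_ds, epochs=30, callbacks=[early_stop])
```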
We found that the training loss smooths out around ~0.4 and then improves steeply to less than 0.2 by the 30th epoch. This shows that TTA preserves the retinal features during validation while training, and the results could be improved further by training longer with a higher batch size.
We cross-validated our EfficientNet-B5 model over 5 folds and then ran inference with the trained models using test-time augmentation (TTA) [40] 10 times, averaging the predictions of the 5 models, as sketched below.
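A minimal sketch of the fold-and-TTA averaging follows. Here `models`, `images`, and `tta_transform` are placeholders for the corresponding pieces of our pipeline (a list of trained fold models, a batch of raw images, and an Albumentations pipeline).

```python
import numpy as np


def predict_with_tta(models, images, tta_transform, n_tta=10):
    """Average predictions over the cross-validation models and TTA rounds."""
    all_preds = []
    for model in models:                        # 5 cross-validation folds
        for _ in range(n_tta):                  # 10 augmented passes per model
            batch = np.stack(
                [tta_transform(image=img)["image"] for img in images])
            all_preds.append(model.predict(batch, verbose=0).ravel())
    return np.mean(all_preds, axis=0)           # averaged regression scores
```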
We used one NVIDIA Tesla P100 GPU and one Tesla T4 GPU for training the models.
E. Evaluation Measures
To evaluate the detection model, we use the quadratic weighted kappa (Cohen’s kappa), which measures the consensus between expert ratings and submitted ratings. The quadratic weights make the rounding of predictions an additional optimization factor. This metric ranges from 0 (random agreement among raters) to 1 (complete agreement among raters). A perfect score of 1.0 is obtained when the actuals and predictions are identical; the lowest possible score is −1, which arises when the predictions are furthest away from the actuals, e.g., when actuals of 0 are predicted as 4 and vice versa. The weighted kappa is given below:\begin{equation*} \kappa = 1 - \frac {\sum _{i=1}^{k} \sum _{j=1}^{k} w_{ij} x_{ij}}{\sum _{i=1}^{k} \sum _{j=1}^{k} w_{ij} m_{ij}}\tag{3}\end{equation*}
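Eq. (3) can be reproduced directly with scikit-learn’s cohen_kappa_score using quadratic weights, as in the sketch below; the example labels are purely illustrative and not taken from our dataset.

```python
from sklearn.metrics import cohen_kappa_score

# Illustrative ratings on the 0-4 ICDR scale.
y_true = [0, 2, 4, 1, 3, 0, 2, 4]
y_pred = [0, 2, 3, 1, 3, 0, 2, 4]
print(cohen_kappa_score(y_true, y_pred, weights="quadratic"))   # close to 1

# Predictions furthest from the actuals drive the score towards -1.
worst_true = [0] * 5 + [4] * 5
worst_pred = [4] * 5 + [0] * 5
print(cohen_kappa_score(worst_true, worst_pred, weights="quadratic"))  # -1.0
```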
Experimental Results
This section reports the evaluation results, i.e., the predicted DR labels for blindness detection. For evaluation, we predict values from the generator and round them to the nearest integer to obtain valid predictions. We then compute the quadratic weighted kappa score on the training set and the validation set, as reported in Table 1.
We performed a grid search to optimize the validation score over a range of thresholds; the quadratic weighted kappa score is 0.92 with the threshold set (0.5, 1.5, 2.5, 3.5). Our fine-tuned EfficientNet-B5 based model scores 92.32% on the validation set at the chosen input image size.
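The threshold search can be sketched as follows: the continuous regression outputs are bucketed into grades by a set of cut points, and the cut points are varied around the defaults (0.5, 1.5, 2.5, 3.5) to maximise the validation kappa. The candidate offsets below are illustrative, not the grid we actually searched.

```python
import itertools
import numpy as np
from sklearn.metrics import cohen_kappa_score


def bucket(preds, thresholds):
    """Turn continuous regression outputs into ICDR grades 0-4 via cut points."""
    return np.digitize(preds, thresholds)


def grid_search_thresholds(y_true, raw_preds, offsets=(-0.15, 0.0, 0.15)):
    """Search small shifts around the default cut points (0.5, 1.5, 2.5, 3.5)."""
    base = np.array([0.5, 1.5, 2.5, 3.5])
    best_kappa, best_thr = -1.0, base
    for deltas in itertools.product(offsets, repeat=4):
        thr = base + np.array(deltas)
        kappa = cohen_kappa_score(y_true, bucket(raw_preds, thr),
                                  weights="quadratic")
        if kappa > best_kappa:
            best_kappa, best_thr = kappa, thr
    return best_thr, best_kappa
```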
We performed test-time augmentation (the same augmentation steps as mentioned in the section above) 10 times and averaged the TTA predictions of all 5 models. The prediction results are reported in Figure 8 and Figure 9.
We also trained several convolutional neural network based models, reported in Table 2, and found that DenseNet169 performs significantly better than ResNet50, as the former model benefits more from the Adam optimizer.
The prediction scores of the convolutional neural network models reported in Table 2 treat the identification problem as regression, the same formulation we adopted for our proposed model built on top of EfficientNet-B5. These models are trained without any fine-tuning steps, and EfficientNet-B5 performs better than the other CNN models with an improvement of 0.10%.
Case Study of Smart IoT Based Healthcare System
IoT based healthcare systems become more powerful through the integration of deep learning approaches. Recent advancements in IoT technology have revolutionized electronic healthcare research and industry applications. The huge increase in the use of portable smart health devices has improved the quality of health monitoring, diagnosis, and data collection for clinicians, with the potential to perform early diagnosis and provide necessary treatment on time. However, the use of personal medical data and records raises concerns about data security and data sharing policies [47]. Blockchain [41] provides a solution to deal with the privacy and transparency of data. A typical IoT blockchain platform for smart healthcare is illustrated in Fig. 10.
The framework consists of a vital sign monitoring system, an IoT server [42], a blockchain network, and a communication interface to collect patients’ information from the healthcare sensors. All the information is stored securely and communicated to the medical staff for further diagnosis and treatment. This information can further be used to develop decision-making models based on deep learning that provide accurate and timely diagnosis. Once developed, the approach can be optimized into an IoT based smart device that medical staff can use efficiently.
Conclusion and Future Work
In this work, we introduced a state-of-the-art deep learning based smart health system for the identification of blindness due to eye disease (diabetic retinopathy), evaluated on a retinal image dataset in an IoT setting. We have shown that the convergence of IoT with AI can provide an effective smart health system.
Our fine-tuned EfficientNet-B5 based model outperforms CNN and ResNet50 models with 92.32% validation accuracy, predicting the severity of diabetic retinopathy (eye blindness) on a five-point scale from retinal images. Our baseline EfficientNet-B5 model, trained on the average doctor opinion, yields state-of-the-art results on identifying blindness with 90.20% validation accuracy, and the freezing and unfreezing schedule used for the fine-tuned EfficientNet-B5 significantly improved the prediction to 92.32% validation accuracy. The proposed approach has been developed and tested only for the early detection of diabetic retinopathy in diabetic patients; for other medical image diagnosis tasks, the approach needs to be tested before drawing any conclusions. We also performed oversampling strategies for the interpretation of our detection results. We found that labels 0 and 2, i.e., no diabetic retinopathy and moderate retinopathy, account for around 89% of the images.
We intend to adopt other CNN architectures [43]–[45], such as UNet with ResNet backbones and UNet with EfficientNet weights [46], for such imbalanced image collections. Pseudo-labeling the imbalanced dataset may also improve the prediction for the given classes, and treating this identification task as binary classification by individually labeling the data could be an added advantage. The limitation in doing so is processing power, which increases greatly when using EfficientNet-B6 or B7 weights.
Declaration of Competing Interest
We wish to confirm that there are no known conflicts of interest associated with this publication.