Introduction
The services offered in hospitals are a great representation of healthcare. A wide range of specializations and services is offered to different groups of patients with various needs [1]. One of the problems experienced in hospitals is overcrowding. This problem occurs when the number of patients is greater than the available resources, including medical equipment, medical staff, or hospital care areas. When hospitals are overcrowded, the waiting time of patients increases, whereas the quality of service and the efficiency of healthcare providers decrease. Overcrowding is a global issue [2] and a public health problem that affects individuals irrespective of their social class [3]. The occurrence of this problem is attributed to certain factors, including the inflow and outflow of patients, patients' waiting time, the efficiency of healthcare providers, agglomerations, and the availability of resources [4]. As a result of overcrowding, the quality of service decreases, the risk of mortality increases, and social discrimination prevails in the queues [5]. McHugh [5] found a strong relationship between overcrowding and the mortality rate among patients. With the rate at which COVID-19 infection is spreading, an increase in all the aforementioned scenarios has been observed. Therefore, the rising rate of COVID-19 infections poses a new challenge to the health sector: the volume of services expected per day increases, the mortality rate is expected to be high, and the diagnosis process takes a long time.
There is considerable research in the literature on the use of the Internet of Things (IoT) to improve health services, especially during the COVID-19 pandemic. For instance, IoT supports proper monitoring during quarantine: all high-risk patients can be tracked easily through an Internet-based network [6], [7]. This technology is also used for biometric measurements in COVID-19 cases, such as blood pressure, heart rate, and glucose level [8], [9]. The concept of IoT involves using, processing, and storing information in the cloud; such information is made accessible and can be used independently by intelligent objects connected to the cloud via the Internet [10]. Through IoT, data can be shared and processed so that objects specifically designed for the improvement of life can be smartly integrated [1]. A typical application of IoT in service systems can be seen in smart hospitals, which function on the basis of IoT technology and are constructed from the vectors of different application services. This kind of hospital is regarded as a new type of hospital in which different functions, including diagnosis, treatment, management, and decision making, are combined [11]. In addition, smart hospitals are designed by combining the concepts and ideas of intelligent, informative, and digital hospitals [12]. Integrated hospitals reflect the actual dynamic, elaborate, and specific character of hospitals. Implementing a smart hospital may require an application system based on a digital environment. With a smart hospital, relevant information about a service can be obtained rapidly and accurately. Therefore, the smart hospital system facilitates the informatization of diagnosis, scientific decision making, and management standardization. Moreover, by combining application services in the smart hospital system, information can be acquired and shared within the hospital. The aim is to enhance the implementation of smart diagnosis, smart management, smart service, and smart treatment [12].
COVID-19 is often detected using molecular approaches, such as quantitative real-time reverse transcription–polymerase chain reaction [13], and other methods, such as serologic tests [14] and viral throat swab testing [15]. However, previous studies have revealed that abnormalities pointing toward lung disease, including COVID-19, can be efficiently detected using chest radiographs (X-rays) [16] and chest computed tomography (CT) scans [17]. The severity of COVID-19 can be assessed using X-ray tests and CT scans as main detection tools; these tools can also be used for monitoring emergency cases of infected patients and predicting the progression of COVID-19 [18]. However, such emergency situations are accompanied by time limitations; thus, conventional manual diagnosis cannot be relied upon [19]. Such procedures are prone to human error; a specialist doctor is therefore required during the testing, reading, and interpretation of the findings, because errors can lead to inaccurate findings and hence inappropriate treatment. With a spike in the transmission rates of COVID-19 worldwide, hospitals are filled with patients whose health conditions are improving or worsening [20]. Thus, patient tests must be conducted with a high level of efficiency and speed to save as many lives as possible [14]. Intelligent technologies are great assets that can assist in the effective diagnosis and classification of the severity of COVID-19 [16].
Different fields, particularly medical diagnosis and disease detection, are experiencing increasing growth in the use of artificial intelligence (AI) [21]. AI has been used extensively because it facilitates the production of accurate detection results while reducing the workload of healthcare systems [22]. Additionally, AI can reduce the time required for decision-making during the detection process compared with traditional methods [23]. One of the main ways through which the prediction, prevention, and detection of future global health risks can be improved is by developing AI in a manner that allows the recognition of the risks of epidemic diseases [24]. A few researchers have introduced different kinds of AI classifiers that have been evaluated using real COVID-19 datasets with different targets and case studies [18]. Despite the benefits offered by AI techniques in diagnosing and classifying COVID-19, the selection of an AI technique that is appropriate for producing accurate results remains a major challenge [25], [26]. The difficulty associated with selecting the most appropriate technique for the diagnosis and classification of COVID-19 is attributed to the availability of a wide range of AI techniques [27].
The diagnosis of patients with COVID-19 on the basis of machine learning (ML) models and laboratory findings was investigated in [28]. The authors made a remarkable contribution by filtering and balancing a dataset that contains 111 laboratory findings from 5644 patients. They found that only 18 out of 111 laboratory findings are important for 600 patients. This dataset has been tested with different deep learning models, and the best accuracy achieved, 92.3%, was obtained by the CNN-LSTM hybrid model. Even with this remarkable diagnosis accuracy, accurate ML models are still needed, and the accuracy of diagnosis can still be improved. Deep learning models rely on numerous parameter settings and deep layers [29]–[32]; thus, these models cannot be easily implemented in real-time applications because they need considerable resources. In other words, these models are not lightweight architectures. Furthermore, the hybrid model (CNN-LSTM) is complex, and it hardly satisfies the resource requirements for this type of model in real-time environments. Moreover, the same study adopted several selected features based on the recommendations of other studies from the medical point of view, while ignoring feature selection approaches based on the requirements of ML models (the technical point of view), especially because the volume of COVID-19 patient data is unpredictable and urgent medical involvement is needed.
Our study has the following objectives and contributions.
To the best of our knowledge, this study is the first to use three lightweight ML algorithms for COVID-19 prediction based on laboratory data.
The roles of ML and IoT are highlighted in smart hospital environments to solve the overcrowding problem during the COVID-19 pandemic.
This study validates the random forest (RF), Bernoulli naive Bayes (NB), and support vector machine (SVM) methods on COVID-19 diagnosis results based on the original dataset, the normalized dataset, and the laboratory findings selected using the brute-force feature selection technique.
The accuracy of the COVID-19 diagnosis model is improved.
Methodology
The methodology of this study is divided into two stages. The first stage illustrates the role of ML and IoT in smart hospital environments. The second stage explains the steps of the COVID-19 diagnosis model within a target case study.
A. ML and IoT Model in Smart Hospital Environments
Smart hospitals, a novel type of hospital, are based on ML and IoT technologies built with the vectors of various application service systems. These hospitals typically reflect two technologies that have been applied in specialized areas of healthcare. Smart hospitals are considered new because they integrate different functions, such as diagnosis, treatment, management, and decision making. The evolution of ML and IoT in smart hospitals during the COVID-19 pandemic can be explained in three stages (see Fig. 1).
First Stage:
Data generation and collection are conducted. Suspected patients are examined using different medical devices, such as PCR, CT scans, and X-rays. Then, the examination data are generated for further analysis. To capture the X-ray or CT scans, a lab technician performs these procedures remotely from a control room through live streaming of the video images, which can be further processed. In this manner, the time required for the examination and for producing the base data of the COVID-19 case is reduced. Moreover, with this procedure, less contact is needed; in other words, lab technicians do not need to move from door to door to conduct examinations.
Second Stage:
Numerous patients are investigated every single day during the COVID-19 pandemic; thus, a huge volume of patient data is generated every minute. The problem of overcrowding occurs when the number of patients waiting to be attended to exceeds the available resources (medical equipment, medical practitioners, or hospital care areas). Furthermore, the historical data for each COVID-19 patient are valuable for further analysis. With the limited resource capacity of smart hospitals, a practicable solution is required. The data collected in the first stage are therefore transferred through the Internet to the IoT cloud, where services facilitate the instantaneous and on-demand delivery of the computing infrastructure, databases, storage, and applications needed for processing and analyzing COVID-19 patient data generated from hundreds of cases (shown in Fig. 1). In this stage, patient data are stored for further analysis. Then, when a COVID-19 patient diagnosis is requested, data are transferred from the cloud storage to an ML diagnosis model. As mentioned earlier, numerous patients must be diagnosed each day during the COVID-19 pandemic, and the overcrowding of patients, along with a shortage of doctors for patient diagnosis, poses a huge challenge; thus, a feasible solution is needed. Therefore, in the first stage of our proposed methodology, an ML model is adopted to serve as a decision support system for the diagnosis of patients with COVID-19. In this manner, the overcrowding issue and doctors' workload can be reduced. The laboratory findings transferred to the IoT cloud are fed into ML models to diagnose whether a patient is positive or negative for COVID-19. The diagnosis model is implemented on a workstation, where the ML model can be trained offline. Through software integrated into the system, the training process can be performed automatically with new data to keep the cloud updated and improve diagnosis capabilities. Further details of the proposed ML model are highlighted in Section II-B.
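As a rough illustration of this stage, the following Python sketch shows how a diagnosis model could be retrained offline on the workstation and then used to answer diagnosis requests forwarded from the cloud storage. This is only a minimal sketch, assuming scikit-learn, joblib, and pandas are available; the file names, column names, and function names are illustrative, not the exact pipeline of this study.

```python
# Minimal sketch of the second-stage workflow: the ML diagnosis model is
# trained offline on the workstation and then answers requests arriving
# from the IoT cloud storage. Names and paths below are illustrative.
import joblib
import pandas as pd
from sklearn.svm import SVC

FINDINGS = ["hemoglobin", "platelets", "leukocytes"]  # subset for illustration


def train_offline(history_csv: str, model_path: str = "covid_svm.joblib") -> None:
    """Retrain the diagnosis model whenever new labeled records arrive."""
    data = pd.read_csv(history_csv)                # historical COVID-19 records
    X, y = data[FINDINGS], data["covid_positive"]  # laboratory findings, labels
    model = SVC(probability=True).fit(X, y)
    joblib.dump(model, model_path)                 # persist for the diagnosis service


def diagnose(findings: dict, model_path: str = "covid_svm.joblib") -> str:
    """Serve a single diagnosis request forwarded from the cloud."""
    model = joblib.load(model_path)
    sample = pd.DataFrame([findings], columns=FINDINGS)
    return "COVID-19" if model.predict(sample)[0] == 1 else "normal"
```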
Third Stage:
After the diagnosis results are obtained by the ML model, patient health reports are generated and sent to a decision-making unit, where the next action, such as quarantining the patient or transferring the patient to the emergency room, is determined.
B. COVID-19 Diagnosis Model
This section explains the proposed ML diagnosis model, which is composed of three main phases as shown in Fig. 2.
Phase One:
Three main subphases are conducted in this phase.
The base dataset for the experiment was extracted from [28]. This study used a laboratory dataset of patients with COVID-19 from the Israelita Albert Einstein Hospital in São Paulo, Brazil. The patient samples were collected at the beginning of 2020 to identify who was infected by COVID-19. The laboratory dataset contains information on 600 patients with 18 laboratory findings. In this dataset, 520 patients were negative for COVID-19, and 80 were positive.
The relevant information extracted from raw data is critical in COVID-19 classification because of its direct influence on classification performance. The main input for the classification of COVID-19 cases is the value of each laboratory finding in the mentioned dataset. In this stage, 18 laboratory findings (i.e., red blood cells, hemoglobin, platelets, hematocrit, aspartate transaminase, lymphocytes, monocytes, sodium, urea, basophils, creatinine, serum glucose, alanine transaminase, leukocytes, potassium, eosinophils, C reactive protein, and neutrophils) are included in the feature extraction process.
The diagnosis process for COVID-19 patient data was formulated as a classification task. The diagnosis stage was conducted on the basis of the defined dataset, the extracted features, and the implementation of different ML models. Python was used for all classification tasks during the research period. The final output of the diagnosis for COVID-19 was generated in this subphase, where binary classification was conducted. In other words, the ML models need to predict two classes, namely, normal and COVID-19 cases.
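A minimal sketch of this subphase is shown below, assuming the dataset is available as a CSV file with the 18 laboratory findings and a binary label column; the file name, label column, and hyperparameters are illustrative rather than the exact configuration used in this study.

```python
# Phase-one diagnosis sketch: binary classification of the laboratory dataset
# with the three ML models used in this study (RF, Bernoulli NB, SVM).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import SVC

data = pd.read_csv("covid_laboratory_findings.csv")    # 600 patients, 18 findings
X = data.drop(columns=["covid_result"])                 # laboratory findings
y = data["covid_result"]                                # 0 = normal, 1 = COVID-19

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)   # 80-20 train-test split

models = {
    "RF": RandomForestClassifier(random_state=42),
    "Bernoulli NB": BernoulliNB(),
    "SVM": SVC(probability=True, random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "test accuracy:", model.score(X_test, y_test))
```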
Phase Two:
In this phase, almost the same processes as in phase one were applied, the difference being the addition of a data normalization process. The feature extraction and diagnosis processes were conducted on the basis of the normalized dataset. Several ML algorithms attempt to identify trends in the data by comparing the features of data points; however, an issue emerges when the scales of the features vary greatly. Normalization is specifically aimed at bringing every feature to the same scale so that each feature has equal relevance. One of the commonly used methods of data normalization is min–max normalization, which transforms the minimum value of each feature into 0, the maximum value into 1, and every other value into a decimal between 0 and 1.
The dataset had large discrepancies among the values within a single column and between the values across columns because some values were negative and others were positive. Thus, normalization was required to resolve these differences. The formula for min–max normalization is as follows:
\begin{equation*} v'=\frac {v-\min }{\max -\min }.\tag{1}\end{equation*}
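As a brief illustration, the following Python sketch applies (1) to each feature column, both manually and with scikit-learn's MinMaxScaler; the toy values only stand in for laboratory findings and are not data from this study.

```python
# Min-max normalization (1) applied column by column, shown both manually
# and with scikit-learn's MinMaxScaler for comparison.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[13.2, 250.0, -1.4],
              [11.8, 310.0,  0.6],
              [14.6, 180.0, -0.2]])          # toy values for three findings

# Manual application of v' = (v - min) / (max - min) per feature
X_manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Equivalent result with scikit-learn
X_scaled = MinMaxScaler().fit_transform(X)

assert np.allclose(X_manual, X_scaled)       # both map every feature into [0, 1]
```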
Phase Three:
On the basis of the normalized dataset, three main processes were conducted, with only one process differing from the previous phases: feature selection was applied. The selection of laboratory features was performed using the brute-force feature selection method, which exhaustively evaluates all candidate combinations of the input features and then finds the most appropriate subset. This exhaustive search incurs a high computational cost and a relative risk of overfitting; therefore, greedy methods should be considered as an alternative to brute-force feature selection. In the literature, such algorithms have been compared in terms of efficiency and accuracy on real-world datasets obtained from more than 10 domains.
To find the optimal model for the diagnosis of patients with COVID-19, the performance of all ML diagnosis models was measured through an evaluation experiment within each of the three phases. The evaluation experiment was based on the following measurements:
\begin{align*}
\mathrm{Accuracy}&=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}}\times 100 \tag{2}\\
\mathrm{Precision}&=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}} \tag{3}\\
\mathrm{Recall}&=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}} \tag{4}\\
F1&=\frac{2\times \mathrm{Precision}\times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}\tag{5}
\end{align*}
where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
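These measurements, together with the AUC reported in Section III, can be computed directly with scikit-learn, as in the following sketch; the label and score arrays are placeholders, not results from this study.

```python
# Computing the evaluation measures (2)-(5), plus AUC, with scikit-learn.
# The arrays below are placeholders standing in for test labels, predicted
# labels, and predicted probabilities of the positive (COVID-19) class.
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

y_test = [0, 0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 0, 1]
y_score = [0.10, 0.30, 0.90, 0.40, 0.20, 0.80]

print("Accuracy :", accuracy_score(y_test, y_pred) * 100)  # Eq. (2)
print("Precision:", precision_score(y_test, y_pred))       # Eq. (3)
print("Recall   :", recall_score(y_test, y_pred))          # Eq. (4)
print("F1 score :", f1_score(y_test, y_pred))              # Eq. (5)
print("AUC      :", roc_auc_score(y_test, y_score))
```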
Algorithm 1 Pseudocode for the Feature Selection Method
Start brute-force feature selection;
Split the dataset into 80% for the training set and 20% for the testing set;
Suppose I is the set of candidate feature-subset sizes, i = 0, 1, …, n;
a: For each size i, select the feature subset of size i calculated as the minimizer of the error on the training split;
b: Set the test feature set to the selected feature subset of size i and evaluate it on the test split;
Choose the best feature set as the selected subset i that minimizes the test set score.
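A possible Python realization of Algorithm 1 is sketched below, assuming the findings are held in a pandas DataFrame; the classifier, scoring choice, and helper name are illustrative, and the exponential runtime makes this practical only for small feature sets.

```python
# Brute-force feature selection sketch following Algorithm 1: every candidate
# subset of the laboratory findings is scored on the held-out 20% split and
# the best-scoring subset is kept (maximizing accuracy, i.e., minimizing the
# test error). Runtime grows exponentially with the number of features.
from itertools import combinations

from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def brute_force_select(X, y, min_size=1):
    """X is a pandas DataFrame of laboratory findings, y the binary labels."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y)   # 80-20 split
    best_score, best_subset = -1.0, None
    for size in range(min_size, len(X.columns) + 1):
        for subset in combinations(X.columns, size):
            cols = list(subset)
            model = SVC().fit(X_tr[cols], y_tr)
            score = model.score(X_te[cols], y_te)            # test-set accuracy
            if score > best_score:
                best_score, best_subset = score, cols
    return best_subset, best_score
```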
Results and Discussion
In this research, 18 laboratory findings for 600 patients were used in the proposed COVID-19 prediction method. The data of the 600 patients and their tests were obtained from a laboratory dataset. Three diverse ML methods, namely, the RF classifier, Bernoulli NB, and SVM, were implemented and used as ML classifiers. In this section, the results of the proposed model are analyzed and discussed in four parts: COVID-19 diagnosis results based on the original dataset (Section III-A), results based on the normalized dataset (Section III-B), results based on selected laboratory findings (Section III-C), and results against benchmark studies (Section III-D).
A. COVID-19 Diagnosis Results Based on the Original Dataset
In this research, we developed a fully automated approach for COVID-19 prediction using ML methods and medical IoT. Three ML methods (i.e., RF, SVM, and Bernoulli NB) were used on the original dataset. These methods were evaluated in terms of recall, precision, accuracy, F1 score, and AUC. The evaluation results of the methods with an 80–20 train–test split technique are shown in Table I.
In this study, we observed that, measured by the AUC metric (0.88), the SVM classifier is the best COVID-19 prediction model in terms of predictive performance. COVID-19 prediction based on laboratory findings is a difficult task and a challenging process because sample collection requires a long time and a complex procedure. The SVM classifier achieved the best laboratory-findings identification outcomes, with an F1 score of 92%, accuracy of 93.33%, precision of 94%, and recall of 93%. The SVM classifier achieves good results because it is widely used in disease prediction and data sequences, especially when the data include a time series. The results of the model evaluation are shown in Fig. 3.
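For reference, confusion matrices and ROC curves such as those in Figs. 3 and 4 can be produced directly from a fitted classifier, as in the sketch below; the synthetic data only stand in for the laboratory findings, and the display utilities assume a recent scikit-learn release.

```python
# Sketch for plotting a confusion matrix and ROC curve for one fitted model,
# following the 80-20 split used in this study; synthetic data stand in for
# the laboratory findings.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=18, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
model = SVC(probability=True).fit(X_train, y_train)

fig, (ax_cm, ax_roc) = plt.subplots(1, 2, figsize=(10, 4))
ConfusionMatrixDisplay.from_estimator(model, X_test, y_test, ax=ax_cm)
RocCurveDisplay.from_estimator(model, X_test, y_test, ax=ax_roc)
ax_cm.set_title("Confusion matrix")
ax_roc.set_title("ROC curve")
plt.tight_layout()
plt.show()
```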
Fig. 3. Confusion matrix and ROC curve for the three models based on the original dataset. (a) RF. (b) Bernoulli NB. (c) SVM.
Fig. 4. Confusion matrix and ROC curve for the three models based on the normalized dataset. (a) RF. (b) Bernoulli NB. (c) SVM.
B. COVID-19 Diagnosis Results Based on the Normalized Dataset
The number of COVID-19 cases increases every day; thus, understanding laboratory findings takes time. Consequently, impediments in terms of findings and treatments also increase. Given these restrictions and challenges, the need for clinical support approaches with prediction systems has emerged. Prediction systems may ease the strain on medical centers and hospitals by distinguishing COVID-19 cases. To achieve good decision-making and accurate diagnoses, we used the min–max approach to normalize and filter the dataset. Numerous ML techniques discover patterns within data by comparing data point features; thus, a challenge for feature extraction is the different scales of the features. The objective of normalization is to make each data point have the same scale for all the features and make these features similarly critical. Min–max normalization is one of the most popular methods used to normalize data. In this method, the minimum value of every feature is transformed to 0, the maximum value is transformed to 1, and the other values are transformed to a specific ratio between 0 and 1.
We repeated the same experiment using the three ML methods (RF, SVM, and Bernoulli NB) on the normalized dataset and evaluated these methods in terms of recall, precision, accuracy, F1 score, and AUC. According to [28], clinical predictive algorithms can achieve better prediction performance with an 80–20 splitting approach. The evaluation results of the methods with an 80–20 train–test split technique are shown in Table II.
C. COVID-19 Diagnosis Results Based on Selected Laboratory Findings
This study aims to provide an improved COVID-19 identification model using different ML methods. To the best of our knowledge, no research has used different ML techniques with IoT for COVID-19 prediction based on laboratory findings. The present study may encourage researchers to validate the methods by using various laboratory data. To improve accuracy and identify the best model, we used the brute-force technique as the feature selection process to discover the most important laboratory features. Feature selection is important for ML methods because insignificant features influence their prediction performance. Feature selection enhances the precision of prediction and decreases the execution time of ML methods. The brute-force feature selection technique assesses all conceivable combinations of the input features thoroughly; afterward, the best subset is discovered. Clearly, the computational cost of this exhaustive search is prohibitively high, with a considerable risk of overfitting. Consequently, greedy techniques, such as forward selection, are often used instead.
We repeated the third experiment on the normalized dataset and used the brute-force feature selection technique with three ML methods. The evaluation results of the ML methods with an 80–20 train–test split technique on the normalized dataset, along with brute-force feature selection, are shown in Table III.
The importance of feature selection on the COVID-19 laboratory findings is shown by comparing the differences among the accuracies of the ML methods. From the 18 clinical features, we selected the best 15 features and dropped three features (i.e., monocytes, sodium, and alanine transaminase), which is almost consistent with the features selected from the medical point of view. The SVM classifier achieved the best laboratory-findings identification outcomes, with an F1 score of 94%, accuracy of 95%, precision of 95%, and recall of 95%. The results of the models' evaluation are shown in Fig. 5.
Fig. 5. Confusion matrix and ROC curve for the three models with the constraints of the normalized dataset and feature selection. (a) RF. (b) Bernoulli NB. (c) SVM.
D. COVID-19 Diagnosis Against Benchmark Studies
Benchmarking is an essential step in medical processing and diagnosis research to determine the efficiency and reliability of developed approaches. It is usually conducted using a standard dataset or approaches that have been applied to the same problem domain or application. Moreover, benchmarking is achieved by utilizing the best existing methods in the literature for COVID-19 laboratory findings based on ML approaches and feature selection methods. Table IV presents the benchmark studies used in the present work.
As shown in Table IV, different studies that used the same dataset with different traditional ML and deep learning methods are compared. Our SVM model achieved better results in accuracy and other measures than the other studies. Based on the findings, our study is the first to use these different measurements and achieve good results. Our proposed SVM classifier is considered the best method, with an accuracy of 95%, F1 score of 94%, precision of 95%, recall of 95%, and AUC of 95%.
The main limitation of our study is the data size. The data used in this study comprise information on 600 patients, with some laboratory findings not measured for some patients. Thus, our prediction system has a success range of 90%–95%, depending on the dataset used, the type of data used (original or normalized), and the type of feature selection method. Additionally, the laboratory data used in our study were imbalanced; therefore, we balanced the dataset by withdrawing some features and using a normalization method. The performance of these methods can be improved with a larger dataset. Additional experiments and research must be conducted with laboratory findings from other areas to confirm these outcomes.
Conclusion
During the COVID-19 pandemic, numerous patients are expected to be diagnosed, treated, and monitored, thus bringing a huge burden to medical organizations. Adopting additional automated services can reduce doctors' workload, overcrowding, and the mortality rate. Moreover, in smart hospitals, ML approaches and IoT paradigms can serve as clinical decision support systems for handling issues related to the COVID-19 pandemic. To the best of our knowledge, this study is the first to propose an ML and IoT model based on laboratory findings in smart hospital environments. Different ML approaches were implemented, aiming to improve the accuracy of diagnosis for COVID-19 cases. The steps and representation of the proposed model in smart hospitals were described and explained. On the basis of a laboratory dataset, three different experiments were conducted to find the optimal diagnosis results among the selected ML models. Then, the best ML approach was compared with benchmark studies that adopted the same laboratory dataset. The results of this study confirm the following.
Based on the diagnosis results on the original COVID-19 laboratory dataset, SVM outperforms the other ML models in all evaluation metrics.
Based on the diagnosis results on the normalized dataset, SVM outperforms the other ML models in all metrics, except for AUC, where the highest value was obtained by the RF model (89%); at the same time, SVM and RF both scored 92% in F1.
Based on the feature selection approach results, only 15 out of 18 features are effective for the diagnosis of COVID-19 cases, thus reducing the computational load and diagnosis time for the ML models. Furthermore, SVM outperforms the other models in all evaluation metrics, except for precision, where SVM shares an equal value (95%) with RF.
Finally, compared with the other studies that have adopted the same laboratory dataset, the proposed SVM shows a remarkable improvement in terms of diagnosis accuracy (up to 2.7%); the model scored 95% in accuracy, 94% in F1, and 95% in precision, recall, and AUC.
The results of this study are limited to only three ML models. In the future, the authors plan to investigate more laboratory datasets, ML models, and different feature selection methods.
Author Contributions
Conceptualization—Oana Geman, Karrar Hameed Abdulkareem, Mazin Abed Mohammed; Methodology—Karrar Hameed Abdulkareem, Mazin Abed Mohammed; Software—Mazin Abed Mohammed, Ashish Khanna, Karrar Hameed Abdulkareem; Formal analysis—Karrar Hameed Abdulkareem, Mazin Abed Mohammed, Ashish Khanna, Muhammad Arif, Oana Geman, Deepak Gupta; Investigations—Karrar Hameed Abdulkareem, Oana Geman, Ashish Khanna; Resources—Karrar Hameed Abdulkareem; Data analysis—Karrar Hameed Abdulkareem, Oana Geman, Ashish Khanna; Writing—original draft preparation—Karrar Hameed Abdulkareem, Mazin Abed Mohammed, Ahmad Salim; Writing—review and editing—Karrar Hameed Abdulkareem, Mazin Abed Mohammed, Ahmad Salim, Muhammad Arif, Oana Geman, Deepak Gupta, Ashish Khanna. All authors have read and agreed to the published version of the manuscript.
Ethical Statement
This study used a laboratory dataset of patients with COVID-19 from the Israelita Albert Einstein Hospital in São Paulo, Brazil. This dataset is available online and can be used by anyone.
Conflicts of Interest
The authors declare no conflict of interest.