Introduction
Over the past decade, data has been generated at an exponential rate, and it can now be accessed easily and at low cost [1]. Data volumes are growing from terabytes (TB) to petabytes (PB): about 2.5 quintillion bytes of data may be gathered per day, and Walmart reportedly collects and stores about 2.5 PB of data in an hour [2]–[4]. This data may comprise images, video, audio, text, social links, GPS signals, and commercial and social accounts, and may be gathered from sources such as industries, banks, educational institutes, health departments, and other business organizations [5], [6]. The data is structured, semi-structured, or unstructured. According to the International Data Corporation (IDC), structured data makes up around 32% of the data on the internet, while unstructured data makes up around 63% [7]. Data in any of these forms becomes big data whenever its volume, velocity, variety, variability, value, visualization, or veracity surpasses the capacity of the IT system used to store, process, and manage it [3], [8], [9]. Furthermore, when the volume of data grows from terabytes to zettabytes, it results in big data [10], [77], because such a massive volume overflows the processing and storage capacity of the IT system [3], [8], [11]–[13], [81].
According to Apache Hadoop, big data is data whose volume and size are so large that a traditional IT system cannot store and process it within the available time and resources [2]. Although big data has no definite size, because its volume keeps growing from petabytes to exabytes, many organizations have collected huge volumes of data through reviews and surveys conducted all over the world, in order to improve their decision-making and secure the future of their business. Cloud computing is therefore used to process, store, and manage this large volume of data over the internet instead of on traditional IT systems [11], [12], [14], [66], [68], [69], [80], [81].
To manage this large volume of data, software vendor organizations use cloud computing to perform all of these processes over the internet instead of on a local server. Cloud computing is a technology that provides a range of IT-related services to clients over the internet to fulfil their day-to-day needs. The quality of these services depends on the delivery of cloud computing resources, software, and data over the internet, rather than through traditional systems or other devices, according to user needs and at effective cost [8], [9], [11], [15]–[20], [64], [66], [80].
To process and manage this large amount of data, a cloud computing network connects a large number of servers to one another, and the processed data of the end user is stored at an undisclosed storage location that may be anywhere in the world [20]–[22], [64], [70]. Cloud computing provides three basic types of service: “Software as a Service (SaaS)”, “Platform as a Service (PaaS)”, and “Infrastructure as a Service (IaaS)”. Popular examples of cloud-based services include YouTube, Gmail, LinkedIn, ToTok, Facebook, Twitter, Zoom, and Dropbox. Cloud computing provides agility, ease of use, flexibility, and scalability, which is why organizations are rapidly adopting it for data processing and management [1], [8], [12], [17]–[19], [23]–[25], [69], [77].
“Software as a Service (SaaS)” is a cloud computing service that delivers software applications over the internet and does not require the software to be installed on the user's computer; users simply access the cloud services via the internet, for example Google Workspace, Dropbox, and Salesforce [1], [13], [22], [23], [25]. “Platform as a Service (PaaS)” is a cloud computing service that provides users with a development environment to which customers can upload their own software applications and code, such as “Windows Azure”, “Heroku”, “Force.com”, and “Google App Engine” [1], [13], [22], [23], [25]. “Infrastructure as a Service (IaaS)” is a “cloud computing” service in which multiple “computing resources” are provided to customers. Cloud customers access IaaS services over a wide area network such as the internet. IaaS services consist of storage, networking, operating systems, and hardware provided on user request, for example “Amazon Web Services (AWS), Microsoft Azure, and Google Compute Engine (GCE)” [1], [13], [22], [23], [25].
Cloud computing has three main deployment types: “public cloud”, “private cloud”, and “hybrid cloud”. The public cloud consists of computing services offered publicly over the internet by third-party providers. These services are available to all kinds of users and can be accessed freely, although users pay for the services they consume. The private cloud is a computing service provided over the internet to selected customers rather than to all internet users; it offers maximum security and privacy through internal hosting and firewalls. The hybrid cloud is a combination of public and private clouds. In this type of cloud computing, each cloud is managed independently, but applications and data can be shared between the clouds [23], [10], [26], [62].
Cloud computing has become very useful and popular for storing and processing data. It has many advantages, such as multi-tenancy, resource sharing, data storage, and clear virtualization; however, it still faces certain security challenges, which are [1], [16], [17], [22], [59], [63], [67]:
Confidentiality
Privacy
Data Integrity issue
Data Availability issue
Trusted Third Party
Interoperability Issue
Malicious Insider attack
Lack of Trust issue
Losing Control over Data
To address these security challenges, we formulated the following two research questions:
RQ1:
“What are the main security issues / challenges faced by the vendor organization to secure the Big Data on Cloud Computing?”
RQ2:
“What are the existing solutions / practices, as defined in the literature, used for avoiding the security issues / challenges faced by the software vendor organization?”
Keeping these research questions in view, we identified 15 critical security challenges that might affect big data security on cloud computing, and 64 practices that would help overcome them, from the 103 research articles shortlisted for the proposed SLR study. The results show the similarities and differences in the identified security challenges across different periods, continents, databases, and methods. The challenges and practices identified through the SLR will also assist software vendor companies in securing big data on cloud computing. For the proposed literature study, we mainly targeted software vendor organizations that develop software applications using agile software methodology.
Background
In today's digital world, the data produced by every organization is increasing exponentially and is hard to manage with warehouse technology. The large volume of raw data produced by various data sources requires big data techniques to analyze it [1], [27], [28]. According to Walmart, it processes more than a million user records within an hour and stores up to 2.5 petabytes of user data [3], [4], [29], [30]. The Library of Congress reports that it gathers 235 terabytes of new data every year and has stored 60 petabytes of data. In 2014, more than 5.5 billion cell phones were in use, each of which can create terabytes of call records every year [28]. In the 2000s, the International Data Corporation (IDC), a leading international market research firm, reported that the digital world, which was 4.4 ZB in 2003, would grow to 44 ZB by 2020 [28], [31].
In big data analytics, most raw data is unstructured and is obtained from multiple data sources and applications such as weblogs, Facebook, Twitter, LinkedIn, text files, Dropbox, emails, images, and audio or video recordings. Big data technology is designed to handle and manage unstructured data with the help of key-value pairs. The concept of big data has been defined by Gartner and by Will Dailey [28], [32], [35]. According to Gartner [28], [32], big data is data with high volume, velocity, and variety of information that requires cost-effective, advanced forms of data processing and decision making. Similarly, Will Dailey [28] defined big data as, “The super processing environment that is engineered for the parallel computing process through huge volume of distributed data so to analyze that data”.
A large volume of data becomes big data when its volume increases from terabytes to zettabytes [10] and its volume, value, velocity, variety, veracity, variability, and visibility exceed the storage and processing ability of the IT system [3], [8], [11], [12], [35], [70]. Cloud computing mainly faces security challenges on four levels: the network level, the data level, the user authentication level, and the generic level [21], [59]–[62]. The security challenges faced by big data on the cloud are discussed in detail below:
Distributed Nodes: This is a network-level issue in which all the nodes are connected with each other and data can flow anywhere across the nodes. Users cannot know the exact node of computation, the data flow, or the data location, because the data of each user is distributed around the world [15], [21], [33], [34].
Un-encryption of data: For better performance and efficiency, cloud frameworks such as Hadoop and MapReduce store and process data without encryption, which leaves the critical data of end users unsecured and allows malicious insiders to access it easily [12], [13], [15], [18], [33], [35]–[37].
Data Recovery issue: Whenever data loss occurs in cloud computing, for whatever reason, recovering the complete data may not be possible because some parts are deleted permanently. Finding the exact node where the data was deleted, in order to recover it, is also considered a big challenge for cloud computing [13], [16], [26], [38], [39].
Trust issue: Cloud computing has a trust issue because the storage location of the data is not the same as the location of the customer. Moreover, customers have no knowledge of where their critical data is or how it is secured, which is why they do not trust cloud computing and prefer to store their personal data on their own systems at home instead of in the cloud environment [7], [11], [24], [33], [40], [41].
Technical Network issue: In cloud computing, data communication occurs over an unsecured network such as the internet, where any node can alter data or inter-node communication and thereby breach the entire connection of the system [15], [40], [42], [43].
Un-controlled access of Data: In cloud computing, all the nodes are interlinked for data processing. Because every node has free access to user data, harmful nodes may be able to change or steal critical data. This uncontrolled administrative access of each node to the data is therefore a critical issue in cloud computing [13], [15], [10], [36], [38].
The Cloud Security Alliance has also listed the main security challenges of big data in the cloud environment as [26], [36], [44], [45], [61], [67], [76]:
Secure computations in distributed programming frameworks
Security best practices for non-relational databases
Secure data storage and transactions logs
End-point input validation/filtering
Real-time security monitoring
Scalable and composable privacy-preserving data mining and analytics
Cryptographically enforced data centric security
Granular access control
Granular audits
Data provenance (source of data)
It is contended that avoiding, addressing, or mitigating these security challenges by software vendor organizations can lead to successful big data security in the cloud environment. This motivated us to explore in detail the security challenges confronted by software vendor organizations by conducting a Systematic Literature Review (SLR) that helps identify the critical security challenges faced by big data on cloud computing platforms. Furthermore, we are interested in finding the best solutions or practices that software vendor organizations need to apply to identify, avoid, or mitigate these security challenges.
Our contribution is twofold. First, we found through the SLR that big data faces many security challenges on cloud computing platforms. These challenges have been discussed by many researchers, who defined each security challenge separately. To our knowledge, no model derived from an SLR exists in the literature that supports software vendor organizations in the use of big data on cloud computing platforms. Second, our security model for Big Data usage on Cloud Computing (SMBDCC) will provide solutions for security-related challenges. It is a unique contribution that supports software vendor organizations in dealing with the security challenges of big data on cloud computing. Our study aims to identify the security challenges using a systematic literature review (SLR) and an empirical study (ES), and to examine the solutions/practices, through the SLR and an empirical study in software organizations, that address, mitigate, and avoid these security challenges.
Methodology
We used the Systematic Literature Review (SLR) approach to identify the security challenges/issues, which will assist software vendor organizations in using big data on cloud computing [46], [24]. According to Gangawane and Devi [24], an SLR is a method for identifying, analyzing, and gathering all available information about a particular technology in order to understand current research directions and investigate the research questions. In an SLR, related published work is searched for with a pre-defined search string based on the research questions, and pre-defined inclusion/exclusion criteria are used to analyze the collected data. An SLR consists of three phases: planning the review, conducting the review, and reporting the review [46], [71], [78]. The results of an SLR are more reliable than those of an ordinary literature review because an SLR follows a method of systematic evaluation. For the proposed SLR, we followed a step-by-step procedure to identify the challenges and practices of big data on cloud computing. We first selected each article based on its relevance to the topic or search string, and then applied the inclusion/exclusion criteria to exclude irrelevant papers from our search. Next, data was extracted systematically from the selected research papers, followed by data synthesis and an assessment of publication quality. From the proposed SLR, we initially found 19 security challenges, which were reduced to 15 by merging similar critical security challenges for big data security on the cloud.
In our SLR, the planning and conducting phases have already been completed, and we now report the results of the conducting phase. The central objective of this research paper is to highlight all the security challenges/issues and their appropriate management with the help of the SLR.
We found a total of 15 challenges (as shown in Table 2) that are critical for big data usage on cloud computing. These include ‘Data secrecy issue’, ‘Geographical data location issue’, ‘Unauthorized data access issue’, ‘Lack of Control’, ‘Lack of Data Management’, ‘Network level issues’, ‘Data integrity issues’, ‘Data Recovery issues’, ‘Lack of Trust’, ‘Data Sharing Issue’, ‘Data Availability’, ‘Assets Issue’, and ‘Legal Amenabilities’. Challenges were selected on the basis of a frequency >=25%, an approach also used by other researchers [47]. We used this criterion so as to consider more challenges and thereby assist the vendor organizations.
A. Search Process and Practice
To construct the search string for the SLR, we followed these steps:
Identify the population, outcomes, and intervention to define the search terms.
Identify alternative spellings and synonyms of the terms.
Verify and validate the keywords for the search terms in the relevant selected literature.
Use Boolean operators (AND/OR) where needed to guide the search engine toward accurate results.
First, we designed a trial search string and ran it on different databases to identify the relevant research articles. The trial search was run on five search engines: Google Scholar, ACM, IEEE, SpringerLink, and ScienceDirect. The trial search string below returned different results on each database, but the results were still not satisfactory.
((“Big data”) AND (“Cloud computing” OR “Cloud environment”) AND (Challenges OR Barriers OR Issues OR Problems)).
The final search string is:
((vendors OR merchants OR retailers OR contractor OR suppliers) AND (“big data” OR “massive data” OR “data science” OR “data analytics”) AND (“cloud computing” OR “cloud environment” OR “cloud technology” OR “cloud airframe” OR “cloud database”) AND (“security challenges” OR “security issues” OR “security risks” OR barriers OR “security problems”) AND (“security practices” OR “security reviews” OR “security methods” OR approaches OR procedures OR “security solutions”)).
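The construction of the final search string can be sketched in a few lines of Python. This is only an illustration of how synonym groups are joined with OR and then combined with AND; the helper names `quote_term` and `build_search_string` are hypothetical and are not part of the study's tooling, and straight quotation marks are used instead of the typographic ones shown above.

```python
# Hypothetical helpers: join synonyms with OR, then join the groups with AND.

def quote_term(term: str) -> str:
    """Quote multi-word terms so that search engines treat them as phrases."""
    return f'"{term}"' if " " in term else term


def build_search_string(groups: list[list[str]]) -> str:
    """Combine synonym groups into one Boolean search string."""
    clauses = ["(" + " OR ".join(quote_term(t) for t in group) + ")"
               for group in groups]
    return "(" + " AND ".join(clauses) + ")"


# Term groups copied from the final search string of the SLR.
GROUPS = [
    ["vendors", "merchants", "retailers", "contractor", "suppliers"],
    ["big data", "massive data", "data science", "data analytics"],
    ["cloud computing", "cloud environment", "cloud technology",
     "cloud airframe", "cloud database"],
    ["security challenges", "security issues", "security risks",
     "barriers", "security problems"],
    ["security practices", "security reviews", "security methods",
     "approaches", "procedures", "security solutions"],
]

search_string = build_search_string(GROUPS)
print(search_string)
```

Keeping each synonym group as a separate list makes it easy to adapt the string to a database that limits query length, by dropping or shortening individual groups.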
The search results of relevant publications that are obtained with the final search string are listed below in TABLE 1.
For the final selection of research articles, as shown in Table 1, we used inclusion and exclusion criteria, selecting the relevant research publications based on complete reading, paper quality, and verification. Each of these criteria is elaborated below:
1) Inclusion Criteria
The inclusion criteria are based on the following terms:
Papers that discuss big data in detail.
Papers that discuss cloud computing in detail.
Papers that discuss cloud computing security challenges.
Papers that discuss big data security challenges on cloud computing platforms.
Papers that discuss solutions/practices for big data security challenges on cloud computing.
Papers written in English.
Papers whose title is similar to that of our research article.
Papers whose keywords are similar to our defined search terms.
2) Exclusion Criteria
The exclusion criteria are based on the following terms:
Papers that are not related to big data.
Papers that are not relevant to cloud computing.
Papers that do not match our research questions.
Papers whose title does not match our search string.
Papers whose abstract does not match our search terms.
Papers whose keywords do not match our search string.
Duplicate papers.
Papers that are not written in English.
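As a rough illustration, the inclusion and exclusion criteria can be applied as a simple screening filter. The sketch below is an assumption about how such screening might be automated: the `Candidate` fields and the `screen` function are hypothetical, and the relevance flags stand for judgments that a reviewer makes after reading the title, abstract, and keywords.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one search hit; flags are set by a reviewer."""
    title: str
    language: str
    about_big_data: bool
    about_cloud: bool
    matches_research_questions: bool


def screen(candidates: list[Candidate]) -> list[Candidate]:
    """Keep papers that pass the criteria; drop duplicates and off-topic hits."""
    seen_titles: set[str] = set()
    kept: list[Candidate] = []
    for paper in candidates:
        # Normalize case and whitespace so the same title is detected twice.
        key = " ".join(paper.title.lower().split())
        if key in seen_titles:
            continue  # duplicate paper
        seen_titles.add(key)
        if paper.language != "English":
            continue  # not written in English
        if not (paper.about_big_data and paper.about_cloud):
            continue  # not related to big data or to cloud computing
        if not paper.matches_research_questions:
            continue  # does not match the research questions
        kept.append(paper)
    return kept


# Made-up sample entries, for illustration only.
sample = [
    Candidate("Example paper A", "English", True, True, True),
    Candidate("example  paper a", "English", True, True, True),  # duplicate
    Candidate("Example paper B", "German", True, True, True),
    Candidate("Example paper C", "English", True, False, True),
]
selected = screen(sample)
print([p.title for p in selected])
```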
3) Publication Selection & Quality Assessment
Publication selection refers to the criteria for selecting a research publication, which is done on the basis of the paper's title, abstract, and keywords. We initially selected 611 of 23,213 papers. The quality of a publication is assessed once the publication has been finally selected. The quality assessment of a publication depends on the following questions.
Does the author clearly identify the issues faced by vendor organization that can affect security of big data usage on cloud computing?
What are the practices that are adopted by the author to solve these security issues?
4) Data Extraction
For data extraction, we followed the criteria below:
Publication details, i.e. title, authors' names, reference type (whether the paper is published in a journal or a conference), conference name, journal name, issue and volume of the journal, location of the conference, year of publication, number of pages, etc.
Data associated with our research questions.
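The extracted fields listed above map naturally onto a small record type. The following sketch assumes one record per selected paper; the class name `ExtractionRecord`, its field names, and the sample values are hypothetical and only mirror the items named in the extraction criteria.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExtractionRecord:
    """Hypothetical container for the data extracted from one paper."""
    title: str
    authors: list[str]
    reference_type: str              # "journal" or "conference"
    venue_name: str                  # journal name or conference name
    year: int
    volume: Optional[str] = None     # journals only
    issue: Optional[str] = None      # journals only
    conference_location: Optional[str] = None
    number_of_pages: Optional[int] = None
    challenges: list[str] = field(default_factory=list)  # data for RQ1
    practices: list[str] = field(default_factory=list)   # data for RQ2

    def __post_init__(self) -> None:
        if self.reference_type not in ("journal", "conference"):
            raise ValueError("reference_type must be 'journal' or 'conference'")


# Placeholder values, not taken from any paper in the SLR.
record = ExtractionRecord(
    title="Placeholder title",
    authors=["A. Author"],
    reference_type="journal",
    venue_name="Placeholder Journal",
    year=2015,
    challenges=["Data secrecy issue"],
)
print(record.reference_type, record.challenges)
```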
5) Data Synthesis
With the help of the SLR, we identified a list of security challenges from the sample of 103 relevant published research papers included in the SLR. Data synthesis was performed mostly by the first author of the paper, with considerable support from the other authors in validating the SLR results. Initially, we identified 19 security challenges for vendor organizations dealing with cloud data platforms. These were further reviewed and validated, and some of them were merged on the basis of similarity; for example, geographical data location and distributed data storage were merged into one challenge. The final list of 15 challenges is shown in Table 2.
Findings
This section describes in detail the results obtained from the SLR.
A. Challenges/Barriers/Issues Found Through Systematic Literature Review
To answer RQ1, we identified critical security challenges for vendor organizations by critically analysing the research papers reviewed in the SLR; these are shown in Table 2. The critical security challenges of big data usage on cloud computing platforms, along with the share of the research papers in the SLR in which they occur, are: Data Secrecy Issue (97%), meaning that the data secrecy issue is discussed in 100 of the research papers included in the SLR, Geographical data location issue (69%), Unauthorized data access Issue (65%), Lack of Control (60%), Lack of Data Management (59%), Network level issues (58%), Data integrity issues (56%), Data Recovery issues (55%), Lack of Trust (54%), Data Sharing Issue (53%), Data Availability (47%), Assets Issue (35%), Legal Amenabilities (33%), Lack of quality issues (25%), and Lack of consistency (25%). Cloud computing still has security concerns at every stage and from different viewpoints. According to one author, from the vendor's perspective cloud computing still does not offer security as a service at its best. In the cloud, user applications are deployed and run on various virtual machines, so vulnerabilities introduced by the cloud vendor organization may exist in cloud services, operating systems, or user applications and can be exploited by hackers [48], [79].
The Geographical data location issue is reported in 69 percent of the research articles as a possible security challenge that might have a negative impact on the security of big data on cloud computing. Many solutions have been proposed for retrieving and storing large amounts of data in cloud computing, and although many of them have been applied, there still exist many issues that can delay their effective implementation. These include the capacity and performance of cloud technologies for handling a large volume of data, the enhancement of existing file systems to meet the demands of data-retrieval-intensive applications, and the question of how stored data can be easily extracted and transferred among servers [49]. In cloud computing, customer data is stored at several locations around the globe, and users have no idea of the exact location at which their data is stored. In the literature, many issues are linked with data storage, namely challenges related to “shared storage media”, challenges related to “location of data”, and challenges associated with “reliability of storage media” [15].
The unauthorized data access issue is highlighted in 65 percent of the research articles in our study. The data access issue arises when a user accesses their own data or organizational data. In a large-scale cloud computing organization, only the relevant employees are given access to confidential data, but depending on the cloud security policy and the free access granted to users, an unauthorized user may also gain access to that confidential data, which is a severe issue for the organization [16]. According to the CERT Insider Threat Center [50], malicious insiders are users who have authorized access to organizational data, services, or systems and deliberately misuse that data or those services, which can affect organizational security, confidentiality, availability, or integrity.
The Lack of Control security challenge is described in 60 percent of the research articles. Cloud organizations do not have direct control over the data and have no information about the use of their data by others, which can cause a security issue since there is no transparent technique for directly observing all the resources. It is also possible that data is not completely removed from the server at the time of data sharing [51], [62], [65], [75], [79].
The issue of “Data management” is recorded in 59 percent of the relevant publications. Big data still faces security issues when an organization moves it to the cloud, because data management and analysis are provided by different providers [38].
The “Network Level issue” is reported in 58 percent of the research articles and is a critical issue faced by software vendor organizations when transferring data over the network. Network security deals with the security features and network protocols that are used for the transfer of data.
The “Data integrity issues” challenge is described in 56 percent of the published papers. A data integrity issue arises whenever data is covertly modified by malicious insiders, or is modified accidentally, so that the receiver does not see exactly the data that the sender shared [52].
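One common way to detect the kind of hidden modification described above is to attach a keyed hash (HMAC) to the data before it leaves the sender. The sketch below uses Python's standard `hmac` and `hashlib` modules; it is an illustrative assumption on our part, not a practice reported by the reviewed papers, and the function names `integrity_tag` and `is_unmodified` and the sample key are hypothetical.

```python
import hashlib
import hmac


def integrity_tag(data: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag that the sender stores or sends with the data."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def is_unmodified(data: bytes, key: bytes, expected_tag: str) -> bool:
    """Return True only if the data still matches the sender's tag."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(integrity_tag(data, key), expected_tag)


# Hypothetical shared secret and payload, for illustration only.
key = b"shared-secret-known-to-sender-and-receiver"
original = b"quarterly sales figures"
tag = integrity_tag(original, key)

tampered = b"quarterly sales figures (edited)"
print(is_unmodified(original, key, tag))   # data as the sender shared it
print(is_unmodified(tampered, key, tag))   # data changed after it was shared
```

A keyed hash only detects modification; it does not prevent it or hide the data, so it complements rather than replaces encryption and access control.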
The “Data Recovery issues” challenge is stated in 55 percent of the related research work. Data recovery is also a big issue for big data on the cloud, because whenever data loss occurs on the cloud, recovering the complete data is not an easy task [11].
“Lack of Trust” is also a critical issue for big data on cloud computing and is reported in 54 percent of the research articles. Cloud service providers would do well to offer new data control policies to build trust among users [11], [74].
The “Data Sharing Issue” is mentioned and discussed in 53 percent of the research papers. Data sharing is a critical issue for big data, with sub-challenges such as data transfer speed and traffic congestion. Transfer speed describes how fast data moves from one point to another, while traffic congestion occurs between local sites, cities, or worldwide during data sharing [53], [65].
“Data Availability” is a critical issue reported in 47 percent of the research articles. The goal of data availability is free access to data and cloud resources [54].
The “Assets Issue” is a critical challenge for big data on the cloud and is described in 35 percent of the research papers. The assets of multiple tenants are co-located at the same address, and the tenants lack security control information while using the same cloud service. The likelihood of attacks increases when a set of valuable assets is hosted on publicly available infrastructure [55].
“Legal Amenabilities” is reported in 33 percent of the published research papers. Guidelines and regulations, for example HIPAA and SOX, constrain the usage of cloud computing. In a variety of disciplines, legal amenabilities are necessary for specific information technology infrastructures [56].
Our research findings can help software vendor organizations secure their big data usage on cloud computing platforms. The security challenges identified from the literature are listed in TABLE 2.
TABLE 2 shows all 15 critical security challenges along with their frequencies and percentages. All the identified critical security challenges have percentages >=25%, which indicates that they all have a great impact on big data security on cloud computing. Among these security challenges, the Data secrecy issue is the topmost issue that big data faces on cloud platforms, with a percentage of 97%. Software vendor companies can use our research study to overcome these security issues on cloud platforms and better protect their data.
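The percentages and the >=25% cut-off follow from simple arithmetic on the paper counts. The sketch below reproduces figures that are stated in the text (100 of 103 papers for data secrecy; 57 and 55 of the 98 second-decade papers for data recovery and data sharing); the function names `percentage` and `is_critical` are hypothetical, and rounding to the nearest whole percent is an assumption.

```python
TOTAL_PAPERS = 103        # papers included in the SLR
CRITICAL_THRESHOLD = 25   # a challenge is treated as critical at >= 25%


def percentage(frequency: int, total: int = TOTAL_PAPERS) -> int:
    """Share of papers reporting a challenge, rounded to a whole percent."""
    return round(100 * frequency / total)


def is_critical(frequency: int, total: int = TOTAL_PAPERS) -> bool:
    """Apply the >= 25% frequency criterion to an unrounded percentage."""
    return 100 * frequency / total >= CRITICAL_THRESHOLD


# Data secrecy issue: discussed in 100 of the 103 papers.
print(percentage(100))        # share for the data secrecy issue

# Second decade (98 papers): data recovery (57) and data sharing (55).
print(percentage(57, 98), percentage(55, 98))
```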
1) Analysis of the Security Challenges for Big Data Usage on Cloud Computing, Identified Through SLR, for Software Vendors Organization
We performed a statistical analysis of the identified challenges based on four variables: continent, decade, research method, and database used in the paper. The objective of these analyses is to determine whether these security challenges remain the same across continents, decades, research methods, and databases, or whether they differ.
a: Analysis and Comparison of Challenges/Barriers Across Two Decades
We compared the security challenges identified in two decades (2000-2010 and 2011-2020). To answer RQII, Table 3 shows a list of the challenges identified in these two decades. To analyze the identified security challenges, we used the linear-by-linear association Chi-square test to find out whether there are any significant differences between the challenges in the two decades. The linear-by-linear association Chi-square test is considered more powerful than the Pearson Chi-square test [57]. When p < 0.05, the difference is usually regarded as significant. For the Data recovery and Data sharing issues, p < 0.05, which indicates that in the first decade (2000-2010) software vendor organizations did not face these issues because cloud usage for big data was still low. These challenges were therefore not critical in the first decade. In the second decade (2011-2020), however, these two challenges have frequencies of 57 and 55, and percentages of 58% and 56%, respectively. The data recovery issue and the data sharing issue thus differ significantly between the two decades. Furthermore, the data secrecy issue is the most critical issue for big data in both decades, with a percentage of 100% in 2000-2010 and 97% in 2011-2020. Similarly, the geographical data location issue has a percentage of 100% in 2000-2010 and 67% in 2011-2020, which shows that this security issue is also critical in both decades. The lack of control issue has 100% in the first decade and 58% in the second, which means that this issue is also critical to overcome when using big data on cloud computing.
We identified a total of 15 challenges from the research articles for these two decades. Our findings reveal that “Data Secrecy issue”, “Geographical data location issue”, “Unauthorized data access Issue”, “Lack of Control”, “Lack of Data Management”, “Network level issues”, “Data integrity issues”, “Data Recovery issues”, “Lack of Trust”, “Data Sharing Issue”, “Data Availability”, “Assets Issue”, and “Legal Amenabilities” are considered critical challenges in these two decades.
The core objective of this research study is to uncover the challenges that have a negative influence on software vendor organizations with respect to big data security on cloud computing. The security challenges identified across the two decades are shown in TABLE 3 (RQII).
In Figure 1, we analyze the frequencies of the security challenges of big data on cloud computing platforms in the two periods 2000-2010 and 2011-2020. In the first decade, these security challenges were highlighted in only 5 research articles, while in the second decade they were discussed in 98 published articles. This clearly shows that these security challenges are much more critical in the second decade (2011-2020).
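To make the test concrete, the linear-by-linear association statistic can be computed by hand as M² = (N − 1)·r², where r is the Pearson correlation between the row scores and the column scores of the contingency table, and M² is compared with a Chi-square distribution with one degree of freedom. The sketch below uses only the Python standard library. The 2×2 table for the data recovery issue is reconstructed from figures given in the text (5 articles in 2000-2010, none reporting the issue; 98 articles in 2011-2020, 57 reporting it), and the equally spaced scores 0 and 1 are an assumption, so the result is not guaranteed to match the SPSS output behind Table 3.

```python
import math


def linear_by_linear(table, row_scores=None, col_scores=None):
    """Linear-by-linear association test: returns (M2, p_value), df = 1."""
    n_rows, n_cols = len(table), len(table[0])
    u = list(range(n_rows)) if row_scores is None else row_scores
    v = list(range(n_cols)) if col_scores is None else col_scores

    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    n = sum(row_totals)

    mean_u = sum(u[i] * row_totals[i] for i in range(n_rows)) / n
    mean_v = sum(v[j] * col_totals[j] for j in range(n_cols)) / n

    # Weighted sums of squares and cross-products over all table cells.
    cov = sum(table[i][j] * (u[i] - mean_u) * (v[j] - mean_v)
              for i in range(n_rows) for j in range(n_cols))
    var_u = sum(row_totals[i] * (u[i] - mean_u) ** 2 for i in range(n_rows))
    var_v = sum(col_totals[j] * (v[j] - mean_v) ** 2 for j in range(n_cols))

    r_squared = cov * cov / (var_u * var_v)
    m2 = (n - 1) * r_squared
    # Upper tail of a Chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(m2 / 2))
    return m2, p_value


# Rows: decade (2000-2010, 2011-2020); columns: (not reported, reported).
data_recovery = [[5, 0],
                 [41, 57]]
m2, p = linear_by_linear(data_recovery)
print(f"M2 = {m2:.2f}, p = {p:.4f}")
```

With this reconstructed table, the p-value falls below 0.05, which is consistent with the significant difference reported for the data recovery issue.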
b: Analysis and Comparison of Big Data Security on Cloud Computing, Identified Through SLR, Based on Different Continents
We used the SPSS tool to find the frequencies of the security challenges of big data on cloud computing across continents. To analyze these challenges, we used the linear-by-linear association Chi-square test to find significant differences among them across continents. We compared the challenges across six continents (Asia, Europe, North America, South America, Africa, and Australia) and a mixed category (a combination of two or more continents, i.e. Africa and Australia). We identified the similarities and dissimilarities of these challenges across continents. The details of the risks/challenges across continents are shown in TABLE 4.
We found only one significant difference across these continents, for the challenge “Unauthorized data access issue”, as shown in Table 4. This challenge was not reported in South America (0%) and was reported in 42% of articles in North America; jointly the two have 27%. This means that the unauthorized data access issue is not critical for South America. It was reported in 50% of articles in Africa, 57% in Europe, 67% in the mixed category, and 77% in Asia. This shows that for all continents except South America, unauthorized access was considered a critical challenge.
The data availability issue has a frequency of 0% in Australia, which means that this security challenge is not critical in that continent. In Africa it has the highest occurrence, 100% of research articles, which means that this security challenge is very critical for Africa.
“Legal Amenabilities”, “Lack of quality issues”, and “Lack of consistency” were reported at 0% in Australia, which shows that these security issues were not critical in that continent. Moreover, our findings show that these challenges are critical for the rest of the continents and the mixed category.
Figure 2 presents the frequency analysis of the identified security challenges across continents: Asia, Europe, North America, South America, Africa, Australia, and the mixed category of Africa and Australia. Asia ranks highest among these continents, which shows that these challenges are very critical for big data usage on cloud computing in Asia.
c: Analysis of Big Data Security on Cloud Computing, Identified Through SLR, Based on Different Study Strategy (Methods)
TABLE 5 describes the results identified through the SLR based on study strategy. The sample for this study contains 103 research papers identified through the SLR. We used the SLR protocol to extract data from each research paper. Our sample covers six study strategies: interviews, surveys, SLRs, literature reviews, experience reports, and thesis reports. Table 5 describes the challenges identified through the SLR across these methods/strategies. Figure 3 shows the articles grouped by the study method they use.
From our findings, we conclude that the majority of the challenges were highlighted in literature reviews, surveys, and experience reports. In the case of interviews, seven challenges are the most critical, each reported in 100% of research articles: ‘Data secrecy issue’, ‘Geographical data location issue’, ‘Lack of control’, ‘Lack of data management’, ‘Data sharing issue’, ‘Data availability issue’, and ‘Lack of data consistency’. The remaining eight challenges are not properly discussed, or may be out of scope, in the case of interviews, as they are highlighted in 0% of published papers. These include ‘Unauthorized data access Issue’, ‘Data integrity issue’, ‘Data recovery issue’, ‘Lack of Trust’, ‘Assets issue’, ‘Legal amenabilities’, and ‘Lack of quality issue’.
In the case of surveys, all the challenges were critical except ‘Lack of quality control issue’, which is reported in 17% of research articles. ‘Data secrecy issue’ is the most frequently reported challenge in this strategy, at 97%. The rest of the challenges have percentages greater than 25%. In SLRs, all the challenges are described in 100% of published articles except ‘Lack of quality control’ and ‘Lack of data consistency’, which are at 0% in the related articles.
In the literature review strategy, ‘Data secrecy issue’ is the most reported challenge, at 97%, while ‘Lack of data consistency’ is described in 19% of the related articles, which shows that this issue was not critical for this method.
The remaining challenges are critical because each has a percentage greater than 25%, as shown in TABLE 5. ‘Data secrecy issue’ is the most critical challenge for both experience reports and thesis reports, as it is reported in 100% of published research work for both strategies. In the experience report strategy, ‘Lack of quality control process’ is described in 0% of research articles, and ‘Network level issue’, ‘Assets issue’, ‘Legal amenabilities’, and ‘Lack of data consistency’ were each recorded in 17% of published work. In the thesis report strategy, the ‘Data sharing’ issue is highlighted in 0% of articles. To identify the main differences among the study strategies, we used the linear-by-linear Chi-square test across the 15 critical big data security challenges on the cloud.
Our findings show that across all these strategies ‘Data secrecy issue’ is the most frequently reported challenge that can affect software vendor organizations when placing big data on a cloud platform. Our results reveal more similarities than differences.
Our findings also indicate the importance of the different strategies used in this literature and which method is most suitable for our research work.
d: Analysis of Big Data Security on Cloud Computing, Identified Through SLR, Based on Different Databases
TABLE 6 describes the databases we used in our research for relevant results and data extraction. We used five databases in our research process: IEEE Xplore, ACM, Science Direct, Springer Link, and Google Scholar. The majority of the challenges were gathered from Google Scholar. The security issues ‘Assets issues’, ‘Legal amenabilities’, and ‘Lack of data consistency’ are reported in 9% of the related articles in IEEE Xplore, which shows that they are not critical for this database.
‘Data secrecy’ and ‘un-authorized data access’ are the most critical challenges in IEEE Xplore, at 91% and 82% respectively. ‘Data recovery’ and ‘legal amenabilities’ are reported in 0% of research articles in the ACM database, while the data secrecy and un-authorized data access issues are the most critical challenges for the ACM database, both reported in 100% of the relevant published work.
e: Practices to Enhance Data Secrecy Issue
TABLE 7 presents the practices for addressing the data secrecy issue.
The challenges ‘Data secrecy issue’, ‘un-authorized data access issue’, ‘Lack of control’, ‘Data integrity issue’, ‘Data recovery issue’, and ‘Lack of data consistency’ are the most critical challenges in the Science Direct database, each reported in 100% of published research work. The remaining challenges are not critical for this database, as they are reported in 0% of research articles. For the Springer Link database, ‘Lack of quality control issue’ and ‘Lack of data consistency’ are described in 18% and 23% of articles respectively, which shows that they are not critical challenges. The remaining 13 challenges are critical for this database and are reported in more than 35% of research papers. The data secrecy issue is the most critical, highlighted in 91% of research papers. In the Google Scholar database, all the challenges are reported in more than 30% of published work except ‘Lack of quality control’ and ‘Lack of data consistency issue’, which both occur in 26% of research articles. ‘Data secrecy issue’ is the most critical challenge, reported in 100% of research work. To identify the main differences among the databases, we used the linear-by-linear Chi-square test to find significant differences among the 15 big data security challenges on cloud computing. Our results reveal more similarities than differences. Our findings also indicate the importance of the different databases used in this literature and which database is most suitable for searching for papers relevant to our research. We found key differences across databases for ‘Un-authorized data access issue’, ‘Legal Amenabilities’, and ‘Assets issues’. The table below lists the challenges identified through the SLR across the databases [7], [11], [24], [33], [40], [41].
2) Practices Identified Through SLR, for Addressing the Identified Big Data Security Challenges on Cloud Computing
With the help of the SLR, we identified a total of 63 practices for securing big data usage on cloud computing from the vendors' perspective. They are discussed in detail below.
The tables of solutions/practices below use the following abbreviations:
‘CSC’ stands for “Critical Security Challenge”
‘PCSC’ stands for “Practices for Critical Security Challenge”
‘P’ stands for paper; e.g. P-1 denotes paper 1.
Analytical Hierarchy Process (AHP)
The process of choosing or identifying alternatives from a given set of factors on the basis of decision-makers' preferences is known as the multi-criteria decision-making (MCDM) approach. This procedure becomes complex when it involves multiple criteria; MCDM methods derive findings from conflicting criteria, e.g. benefit and cost criteria [72], [73].
For this MCDM problem, the AHP method is very important and is used by many researchers for prioritization and analysis [87]–[89]. To prioritize and analyze the identified security challenges of big data, we used the MCDM technique based on the AHP method. AHP is mainly used to determine the relative weights of multiple criteria, to identify and prioritize the challenges, and to calculate criteria weights through pairwise comparison within decision-making problems [83]–[85]. AHP has three main phases [86], [73], [87]:
Decomposition of the complex decision-making challenges into basic hierarchical structure.
Determination of the priority weights of the challenges and their sub-challenges through pairwise comparisons.
Verification of the consistency level of results.
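The three phases above can be sketched in code. The sketch below uses the commonly taught column-normalization and row-averaging approximation of the priority vector, together with Saaty's consistency index CI = (λmax − n)/(n − 1) and consistency ratio CR = CI/RI. The function name `ahp_weights`, the example 3×3 matrix, and the choice of this approximation are assumptions made for illustration; they are not taken from the tables of this study.

```python
# Saaty's random consistency index (RI) for matrix sizes 1..10.
RANDOM_INDEX = {
    1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
    6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49,
}


def ahp_weights(matrix):
    """Return (weights, lambda_max, ci, cr) for a pairwise comparison matrix.

    matrix[i][j] states how strongly item i is preferred over item j,
    with matrix[j][i] == 1 / matrix[i][j] and ones on the diagonal.
    """
    n = len(matrix)

    # Phase 2a: normalize each column so that it sums to 1.
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    normalized = [
        [matrix[i][j] / col_sums[j] for j in range(n)] for i in range(n)
    ]

    # Phase 2b: the priority weight of an item is its row average.
    weights = [sum(row) / n for row in normalized]

    # Phase 3: estimate lambda_max from (A w)_i / w_i, then CI and CR.
    weighted = [
        sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)
    ]
    lambda_max = sum(weighted[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri else 0.0
    return weights, lambda_max, ci, cr


# Hypothetical, perfectly consistent judgments with preferences 4 : 2 : 1.
example = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights, lambda_max, ci, cr = ahp_weights(example)
print([round(w, 3) for w in weights], round(lambda_max, 3), round(cr, 3))
```

By Saaty's conventional rule of thumb, a comparison matrix is treated as acceptably consistent when CR is below 0.1; otherwise the pairwise judgments are usually revisited.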
We use the following AHP equations to prioritize and analyze the security challenges of big data on the cloud platform.
A. Findings of Pair Wise Comparison, Priority Weights and Checking Consistency
Tables 23 and 24 present the pairwise comparison matrix, obtained using equation 1, and the normalized matrix, obtained using equations 3, 4, and 5, for the steadiness level. Where we have
Tables 25 and 26 present the pairwise comparison matrix and the normalized matrix for the management level. Where the
Tables 27 and 28 present the pairwise comparison matrix and the normalized matrix, obtained using the above-mentioned equations, for the control level of challenges. Here
Tables 29 and 30 present the pairwise comparison matrix and the normalized matrix of the eminence level with
Tables 31 and 32 present the pairwise comparison matrix and the normalized matrix of these four levels, where
Table 33 details the local and global weights of the challenges and their priority ranks. The global weight describes the contribution of a specific challenge within the overall study. The global weight of a challenge is the product of its local weight and the weight of its level. For example, the global weight for CSC1 is 0.048, obtained as 0.078*0.612. We derived the global ranking values for the rest of the challenges in the same way. Figure 5 shows the overall global ranking of these challenges in detail. Among the 15 critical security challenges, CSC1 has the highest global value, 0.0477, and CSC12 has the lowest, 0.00026. The rest of the challenges have values between those of CSC12 and CSC1.
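The global-weight calculation can be reproduced with a few lines of code. The sketch below uses only the two figures quoted in the text (local weight 0.078 and level weight 0.612 for CSC1) plus the reported global value for CSC12; the function names `global_weight` and `rank_by_global_weight` are hypothetical helpers introduced for illustration.

```python
def global_weight(local_weight, level_weight):
    """Global weight = local weight of the challenge x weight of its level."""
    return local_weight * level_weight


def rank_by_global_weight(global_weights):
    """Return challenge labels ordered from highest to lowest global weight."""
    return sorted(global_weights, key=global_weights.get, reverse=True)


# Values quoted in the text for CSC1.
csc1 = global_weight(0.078, 0.612)
print(round(csc1, 4))  # 0.0477

# Ranking over the two global values reported in the text; the remaining
# challenges would be added from Table 33 in the same way.
reported = {"CSC1": csc1, "CSC12": 0.00026}
print(rank_by_global_weight(reported))  # ['CSC1', 'CSC12']
```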
Table 34 details the prioritization and ranking of these critical security challenges of big data on the cloud computing platform. CSC1, “data secrecy issue”, is prioritized as the most critical or significant security challenge for big data on the cloud platform, so vendors must take serious action against it. Comparing the SLR results with the AHP approach, CSC1 is also the most serious security challenge according to the SLR, where it is identified in 97% of the 103 research articles. CSC9, “lack of trust issue”, is identified as the second most significant or critical security challenge for big data usage on cloud computing, and CSC12, “assets issue”, is reported as the least significant or critical of these security challenges. The rest are shown in Table 34. The identified practices for each challenge can be prioritized by a similar method.
Limitation
The authors of these research studies were not expected to report the actual reasons why these security issues negatively influence software vendor organizations' use of big data on the cloud. This may be because the majority of the research studies were literature reviews, surveys, and experience reports, which may further be subject to publication bias. With the growing number of publications on big data security on the cloud from the vendor perspective, our SLR procedure may have missed some relevant publications. Moreover, some search engines, such as Google Scholar, did not give complete access for paper extraction.
Conclusion and Future Work
Initially, we identified a list of 19 security challenges with the help of the SLR; after merging some of them, we obtained the 15 challenges shown in Table 2. These 15 challenges, “Data Secrecy issue” (97%), “Geographical data location issue” (69%), “Unauthorized data access issue” (65%), “Lack of Control” (60%), “Lack of Data Management” (59%), “Network level issues” (58%), “Data integrity issues” (56%), “Data Recovery issues” (55%), “Lack of Trust” (54%), “Data Sharing Issue” (53%), “Data Availability” (47%), “Assets Issue” (35%), “Legal Amenabilities” (33%), “Lack of quality control process” (25%), and “Lack of consistency” (25%), are marked as critical security challenges for big data usage on cloud computing from the software vendors’ perspective. We also identified 64 standard practices for these 15 critical security challenges from the selected literature. Software vendor organizations need to focus on these 64 practices to overcome these 15 critical security challenges when using big data on cloud computing.
The goal of our research is to give software vendor organizations a protected way to use big data on cloud computing. Our future work is to validate the identified security challenges and, through an empirical study, to find practices for these challenges beyond those identified and discussed above. Furthermore, we plan to conduct case studies in relevant software vendor organizations, as in the Capability Maturity Model Integration (CMMI) model [82], to identify each vendor organization's level in our proposed security model and finally to assist them in using big data on the cloud. Additionally, we plan to prioritize and analyze these security challenges of big data on the cloud platform by applying the Fuzzy TOPSIS approach to identify the most important and critical challenges among those identified.