Abstract:
The ongoing growth of the information technology sector has increased storage requirements in cloud data centers at an unprecedented pace. According to the EMC Digital Universe study of 2012 [1], global storage reached 2.8 trillion GB and will reach 5247 GB per user by 2020. Data redundancy is one of the root causes of storage scarcity, because clients upload data without knowing what content is already available on the server. The Ponemon Institute detected 18% redundant data in its "National Survey on Data Center Outages" [15]. To address this issue, data deduplication is used: each file has a unique hash identifier that changes with the content of the file. If a client tries to save a duplicate of an existing file, he/she instead receives a pointer for retrieving the existing file. In this way, data deduplication reduces storage consumption and identifies redundant copies of the same files stored at data centers. Consequently, many popular cloud storage vendors such as Amazon, Google Drive, Dropbox, IBM Cloud, Microsoft Azure, SpiderOak, Wuala, and Mozy have adopted data deduplication. In this study, we compare the commonly used file-level deduplication with our proposed block-level deduplication for cloud data centers. We implemented both deduplication approaches on a local dataset and demonstrated that the proposed block-level approach achieves 5% better results than the file-level approach. Furthermore, we expect that performance will improve further with a larger dataset and more users working in similar domains.
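To make the comparison concrete, the Python sketch below contrasts the two approaches under stated assumptions: file-level deduplication hashes the entire file, whereas block-level deduplication hashes fixed-size chunks so that files which overlap only partially can still share storage. This is a minimal illustration, not the authors' implementation; the block size, function names, and in-memory hash-keyed store are assumptions introduced here.

import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for block-level chunking


def file_level_dedup(data: bytes, store: dict) -> str:
    """Store the whole file under its SHA-256 digest; exact duplicates map to one key."""
    digest = hashlib.sha256(data).hexdigest()
    if digest not in store:
        store[digest] = data       # first copy is kept
    return digest                  # pointer handed back to the client


def block_level_dedup(data: bytes, store: dict) -> list:
    """Split the file into fixed-size blocks and deduplicate each block independently."""
    pointers = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block  # only unique blocks consume storage
        pointers.append(digest)    # the file is reconstructed from this pointer list
    return pointers


if __name__ == "__main__":
    file_store, block_store = {}, {}
    file_a = b"A" * 8192 + b"unique tail of file A"
    file_b = b"A" * 8192 + b"different tail of file B"

    file_level_dedup(file_a, file_store)
    file_level_dedup(file_b, file_store)    # files differ slightly -> both stored in full
    block_level_dedup(file_a, block_store)
    block_level_dedup(file_b, block_store)  # shared leading blocks stored only once

    print("file-level objects stored:", len(file_store))   # 2
    print("block-level blocks stored:", len(block_store))  # 3 (one shared block + two tails)

In this toy example, two files that share most of their content are stored twice in full under file-level deduplication, while block-level deduplication stores the shared leading blocks only once, which is the effect the study measures on a real dataset.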
Published in: 2020 3rd International Conference on Computing, Mathematics and Engineering Technologies (iCoMET)
Date of Conference: 29-30 January 2020
Date Added to IEEE Xplore: 23 April 2020