MinervaFS: A User-Space File System for Generalised Deduplication: (Practical experience report)

Abstract:

Deduplication exploits the presence of similar data chunks to reduce storage overhead. Generalised deduplication (GD) uses transformation functions to split data into a basis (common to millions of chunks) and a deviation with respect to the basis. In doing so, it avoids computing additional hashes and comparing or differentiating against previously stored chunks. MinervaFS is the first FUSE-based file system for GD. We implement and evaluate it using several real-world datasets, e.g., satellite images and virtual machine (VM) images, comparing against classical deduplication approaches (ZFS, SDFS), delta compression (xdelta), and compression (Gzip). Compared to ZFS, MinervaFS achieves savings of up to 63.53% (27.38% on average) in storage usage and a speedup of 16% in read-heavy workloads. For VM images, MinervaFS's data compression is on par with Gzip, while outperforming ZFS severalfold. In contrast to ZFS's RAM costs, which grow as more data is stored, MinervaFS's RAM usage is independent of the amount of data stored, making it well suited to handle growing storage demands.
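
The basis/deviation split described above can be illustrated with a minimal sketch. It is not MinervaFS's implementation: the transformation used here (treating the last byte of each fixed-size chunk as the deviation) and the names `CHUNK_SIZE`, `DEVIATION_BYTES`, `split_chunk`, and `ToyGDStore` are all assumptions chosen only to show the idea that similar chunks can share one stored basis.

```python
# Toy sketch of generalised deduplication (GD). The transformation below is a
# hypothetical stand-in, not the one used by MinervaFS.

CHUNK_SIZE = 8        # bytes per chunk (illustrative value)
DEVIATION_BYTES = 1   # trailing bytes treated as the deviation (illustrative)


def split_chunk(chunk):
    """Toy transformation: map a chunk to a (basis, deviation) pair."""
    return chunk[:-DEVIATION_BYTES], chunk[-DEVIATION_BYTES:]


class ToyGDStore:
    """Stores each distinct basis once; deviations are kept per chunk."""

    def __init__(self):
        self.basis_ids = {}  # basis bytes -> index into self.bases
        self.bases = []      # unique bases, in insertion order

    def write(self, data):
        """Return a recipe: one (basis_id, deviation) pair per chunk."""
        if len(data) % CHUNK_SIZE != 0:
            raise ValueError("data length must be a multiple of CHUNK_SIZE")
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            basis, deviation = split_chunk(data[offset:offset + CHUNK_SIZE])
            basis_id = self.basis_ids.get(basis)
            if basis_id is None:
                # First time this basis is seen: store it once.
                basis_id = len(self.bases)
                self.basis_ids[basis] = basis_id
                self.bases.append(basis)
            recipe.append((basis_id, deviation))
        return recipe

    def read(self, recipe):
        """Rebuild the original bytes from a recipe."""
        return b"".join(self.bases[i] + dev for i, dev in recipe)


store = ToyGDStore()
data = b"sensor-A" + b"sensor-B" + b"sensor-C"  # three similar 8-byte chunks
recipe = store.write(data)
restored = store.read(recipe)
```

In this toy run the three chunks differ only in their last byte, so they map to a single stored basis plus three one-byte deviations, and `store.read(recipe)` reconstructs the input exactly. Classical deduplication would treat the three chunks as distinct, since none is an exact duplicate of another.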

Published in: 2021 40th International Symposium on Reliable Distributed Systems (SRDS)

Date of Conference: 20-23 September 2021

Date Added to IEEE Xplore: 22 November 2021

DOI: 10.1109/SRDS53918.2021.00033

Conference Location: Chicago, IL, USA

I. Introduction

In the last decade, we have witnessed a stark increase in data generated by various sources, e.g., social networks and the Internet of Things (IoT), and persistently stored on public clouds (an estimated 175 ZB by 2025 [1]). Cloud-based storage systems put enormous pressure on their storage backends, as they must continuously expand the storage infrastructure to cope with such ever-growing demand. Different techniques exist to lower the storage footprint: (a) compression [2] and deduplication [3], [4], (b) erasure coding [5], [6] as an alternative to storing multiple replicas for reliability, or (c) a combination of them [7].
