FOSDEM 2026

Dedup for S3: Smarter Storage, Zero Duplicates

Track: Software Defined Storage
Room: UB4.136
Day: Saturday
Start: 18:05
End: 18:35
Video only: ub4136

Modern S3 workloads generate large amounts of duplicate data, from backup chains to model checkpoints, that quietly consume petabytes of capacity. Ceph’s new S3 data deduplication feature addresses this by identifying identical content through chunking and cryptographic hashing, storing each unique chunk only once, and tracking references with a lightweight dedup index.

This talk explains how dedup works inside Ceph RGW: how chunks are created, how refcounts stay consistent under parallel writes, versioning, and deletes, and how the system avoids corruption using atomic metadata updates and safe garbage collection. We’ll also share early performance insights from large-scale tests and show how dedup can significantly reduce capacity usage, I/O, and network overhead without requiring any changes to S3 applications.
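
The general idea (chunk the data, hash each chunk, store each unique chunk once, and count references) can be illustrated with a small in-memory sketch. This is not Ceph RGW's implementation: the `DedupStore` class, the fixed-size chunking, the choice of SHA-256, and the single lock standing in for atomic metadata updates are all assumptions made for illustration only.

```python
import hashlib
import threading


class DedupStore:
    """Toy in-memory model of chunk-level dedup (hypothetical, not Ceph's design)."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size   # assumed fixed-size chunking
        self.chunks = {}               # digest -> chunk bytes, stored once
        self.refcounts = {}            # digest -> number of references
        self.manifests = {}            # object key -> ordered list of digests
        self._lock = threading.Lock()  # stand-in for atomic metadata updates

    def put(self, key, data):
        with self._lock:
            digests = []
            for i in range(0, len(data), self.chunk_size):
                chunk = data[i:i + self.chunk_size]
                digest = hashlib.sha256(chunk).hexdigest()
                if digest not in self.chunks:
                    self.chunks[digest] = chunk  # first copy: store it
                self.refcounts[digest] = self.refcounts.get(digest, 0) + 1
                digests.append(digest)
            old = self.manifests.get(key)
            self.manifests[key] = digests
            # On overwrite, release the old references only after the new
            # ones are counted, so shared chunks are never dropped early.
            if old is not None:
                self._release(old)

    def get(self, key):
        with self._lock:
            return b"".join(self.chunks[d] for d in self.manifests[key])

    def delete(self, key):
        with self._lock:
            self._release(self.manifests.pop(key))

    def _release(self, digests):
        # Decrement refcounts; reclaim a chunk only when nothing references it.
        for d in digests:
            self.refcounts[d] -= 1
            if self.refcounts[d] == 0:
                del self.refcounts[d]
                del self.chunks[d]


store = DedupStore(chunk_size=4)
store.put("backup-1", b"AAAABBBB")
store.put("backup-2", b"AAAACCCC")  # shares the chunk b"AAAA" with backup-1
unique_chunks = len(store.chunks)
```

In this sketch two objects of two chunks each occupy three stored chunks, and deleting one object leaves the shared chunk in place because the other object still references it. A real system would additionally need content-aware handling of versioning, crash recovery, and deferred garbage collection, which the talk covers.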

If you're interested in building efficient, scalable, open-source object storage, this session shows how Ceph makes S3 smarter with zero duplicates.

Speakers

Vidushi Mishra

Brussels / 31 January & 1 February 2026
