[WIP] "packing" object storage design documentation

This diff adds a design considerations document to the objstorage documentation.

It outlines the problem our object storage is trying to solve, the solutions we've come up with so far, and a draft of a new design for a more disk-efficient "packed" object storage, based on experiments and some literature review around Ceph.

And yes, this is the description of a (somewhat crude) filesystem, trying to balance cramming tiny objects together to avoid wasting space with the ability to store files (way) larger than RADOS supports efficiently.

There are a few TODO points that need to be cleared before this can be implemented:

  • How to efficiently handle index blocks. There is some literature on B-Trees backed by RADOS/Ceph that might be worth investigating: https://ceph.com/wp-content/uploads/2017/01/CawthonKeyValueStore.pdf. The only issue I can see is that erasure-coded pools don't support OMAP metadata, which would force the index to be written to a separate, replicated pool.

  • When adding a small object, how to select which data block to write it to. Easy to solve for a single writer (just keep a list of the last block you've written to for the given object size), harder to do properly with several distributed writers.

  • How to handle object restores (i.e. overwriting data on an index node) and deletions. By default, erasure-coded data pools only support create and append; overwriting objects requires explicitly turning a knob on for the pool (the allow_ec_overwrites pool flag).

  • Add some more links to the documents that inspired the design.
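
To make the single-writer case from the block selection TODO concrete, here is a minimal sketch of "keep a list of the last block you've written to for the given object size". It is not part of the diff: the block capacity, the power-of-two size classes, the integer block ids and all names (BLOCK_CAPACITY, size_class, SingleWriterAllocator) are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical data block size in bytes; the design does not fix this value.
BLOCK_CAPACITY = 4 * 1024 * 1024


def size_class(length: int) -> int:
    """Round an object length up to the next power of two (assumed size classes)."""
    return 1 << max(length - 1, 0).bit_length()


@dataclass
class SingleWriterAllocator:
    """Remember, per size class, the last block written to and how full it is."""

    # size class -> (block id, bytes already used in that block)
    open_blocks: dict = field(default_factory=dict)
    # Source of fresh (hypothetical) block ids: 0, 1, 2, ...
    _next_id: count = field(default_factory=count)

    def allocate(self, length: int):
        """Return (block id, offset) where an object of `length` bytes should go."""
        slot = size_class(length)
        if slot > BLOCK_CAPACITY:
            raise ValueError("object too large to be packed into a data block")
        # An unknown size class behaves like a full block, forcing a new one.
        block_id, used = self.open_blocks.get(slot, (None, BLOCK_CAPACITY))
        if used + slot > BLOCK_CAPACITY:
            block_id, used = next(self._next_id), 0
        self.open_blocks[slot] = (block_id, used + slot)
        return block_id, used
```

With several distributed writers this in-memory table is no longer enough, since two writers could pick the same offset in the same block; that is the part that still needs a design.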

Test Plan

cd docs; make html


Migrated from D398 (view on Phabricator)