Using git to store objects
To brainstorm part of an idea: I'm wondering about using Git's still-in-development partial clone work, with the caveat that you NEVER intend to check out the entire repository at once.
Ideally, using some manner of FUSE filesystem (similar to Git Virtual Filesystem) with an index-only clone, naive clients could access the objects they wanted. Each object would be fetched on demand from the Git server, which holds mostly Git packs plus a few loose objects that are waiting to be packed.
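As a rough sketch of the read path, the on-demand fetch could be approximated with a blobless partial clone. This assumes that the "index-only clone" maps to `git clone --filter=blob:none --no-checkout`, and that the server allows filtered fetches (`uploadpack.allowFilter`); the helper names and the idea of wrapping the commands in Python are illustrative only, and the FUSE layer itself is not shown.

```python
import subprocess
from typing import List


def partial_clone_cmd(url: str, dest: str) -> List[str]:
    """Build a clone command that fetches commits and trees but no blobs.

    --filter=blob:none defers every blob until it is actually read;
    --no-checkout avoids ever populating a working tree.
    """
    return ["git", "clone", "--filter=blob:none", "--no-checkout", url, dest]


def read_blob_cmd(repo: str, rev: str, path: str) -> List[str]:
    """Build a command that prints one file's content at a given revision.

    In a partial clone, a blob that is missing locally is fetched from the
    remote on demand when git needs to read it.
    """
    return ["git", "-C", repo, "cat-file", "blob", f"{rev}:{path}"]


def read_blob(repo: str, rev: str, path: str) -> bytes:
    """Run the read command and return the raw object content."""
    result = subprocess.run(
        read_blob_cmd(repo, rev, path), check=True, capture_output=True
    )
    return result.stdout
```

A FUSE `read()` handler could call something like `read_blob(repo, "HEAD", path)`, so that only the objects a client actually touches are ever transferred.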
The write path on ingest clients would involve sending the new data back to the server, with Git background processes packing the loose objects into new packfiles at some regular interval.
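The storage side of that write path might look roughly like the following. This is only a sketch: it assumes new data lands as loose objects via `git hash-object -w` and that the periodic job is a plain `git repack -d`; how the data is transported from the ingest client to the server is left out, and the function names are made up for illustration.

```python
import subprocess
from typing import List


def store_blob_cmd(repo: str) -> List[str]:
    """Command that writes stdin into the object store as a loose blob."""
    return ["git", "-C", repo, "hash-object", "-w", "--stdin"]


def repack_cmd(repo: str) -> List[str]:
    """Command that packs unpacked objects into a new packfile.

    -d removes the loose objects and packs made redundant by the new pack.
    """
    return ["git", "-C", repo, "repack", "-d", "-q"]


def store_blob(repo: str, data: bytes) -> str:
    """Write one object and return its object id as printed by git."""
    result = subprocess.run(
        store_blob_cmd(repo), input=data, check=True, capture_output=True
    )
    return result.stdout.decode().strip()


def repack(repo: str) -> None:
    """Intended to be run from a timer or cron job on the server."""
    subprocess.run(repack_cmd(repo), check=True)
```

Running `repack` from a scheduler rather than on every write keeps ingest cheap and leaves the consolidation into large packfiles to the background job.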
Running this on top of CephFS for now means that you get the ability to move it to future storage systems more easily than any custom RBD/EOS development you might do: bring up enough space, sync the files over, profit.
Git handles the deduplication, compression, and access methods, and it generates large pack files, which Ceph can store more efficiently than a plethora of tiny objects.
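To illustrate why deduplication comes for free: Git addresses a blob by the hash of a short header plus its content, so identical content always maps to the same object. The sketch below reimplements that in plain Python, assuming the default SHA-1 object format; the in-memory `store` dictionary is a stand-in for the object database, not how Git lays out data on disk.

```python
import hashlib
import zlib
from typing import Dict


def git_blob_id(data: bytes) -> str:
    """Compute the object id Git assigns to a blob (SHA-1 object format)."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def put(store: Dict[str, bytes], data: bytes) -> str:
    """Store content under its id, zlib-compressed as loose objects are.

    Identical content yields the same id, so a second put is a no-op.
    """
    oid = git_blob_id(data)
    if oid not in store:
        store[oid] = zlib.compress(b"blob %d\x00" % len(data) + data)
    return oid


store: Dict[str, bytes] = {}
first = put(store, b"same content\n")
second = put(store, b"same content\n")
# Both writes resolve to a single stored object.
assert first == second and len(store) == 1
```

Packfiles go further than this by delta-compressing similar objects against each other, which the sketch does not attempt.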
[snip]
Being able to take a backup of the Git-on-CephFS is also made a lot easier sin
Migrated from T3065 (view on Phabricator)