Scale out object storage explorations

For the record, the following discussions and explorations took place while looking for off-the-shelf solutions that would match the Software Heritage workload.

Explorations

Scale out data and metadata
#3064 (closed) ambry
#3052 (closed) RADOS space benchmark (requires development to reduce the space overhead while maintaining performance)
??? RGW
Object packing
#3066 (closed) RocksDB SST
ambry partition format (append only)
#3068 (closed) Sorted String Table (read only)
#3050 (closed) libcephsqlite or SQLite on top of RBD (read write)
#3046 (closed) Using xz-file-format for 1TB archive
#3045 (closed) Using pixz for 1TB archives
#3048 (closed) Using a custom format for 1TB archive
#3069 (closed) Using MZ as a file format
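To make the read-only packing idea above concrete, here is a minimal in-memory sketch of a sorted string table: objects are concatenated in key order and a sorted index of (offset, length) pairs allows lookup by binary search. The function names (`build_sst`, `sst_get`, `object_id`) and the use of SHA-256 as the key are assumptions made for illustration; this is not the format of RocksDB, ambry or any explored candidate.

```python
import bisect
import hashlib


def object_id(data):
    """Hypothetical object key: the SHA-256 digest of the content."""
    return hashlib.sha256(data).digest()


def build_sst(objects):
    """Pack a dict {key: bytes} into one blob plus a sorted index.

    Returns (keys, entries, blob) where keys is sorted and
    entries[i] is the (offset, length) of keys[i] inside blob.
    """
    keys, entries, chunks = [], [], []
    offset = 0
    for key in sorted(objects):
        data = objects[key]
        keys.append(key)
        entries.append((offset, len(data)))
        chunks.append(data)
        offset += len(data)
    return keys, entries, b"".join(chunks)


def sst_get(keys, entries, blob, key):
    """Binary search the sorted keys; return the object or None."""
    i = bisect.bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return None
    offset, length = entries[i]
    return blob[offset:offset + length]


contents = [b"hello", b"", b"world" * 3]
objects = {object_id(c): c for c in contents}
keys, entries, blob = build_sst(objects)
```

Because the table is written once and never modified, the blob carries no per-object padding: its size is exactly the sum of the object sizes, and the only extra space is the index.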
Scale out data and scale up metadata. The metadata is kept in a database (RocksDB, etc.) that must be queried to find where the data is stored, as described in the paper "Finding a needle in Haystack: Facebook’s photo storage".
#3049 (closed) Distributed database + RBD space benchmark (requires development on top of these building blocks)
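The split between scaled-out data and a metadata database can be sketched as follows: objects are appended to large volumes, and an index maps each object id to the volume, offset and length where it is found. The `PackedStore` class, its `volume_size` parameter and the use of a Python dict in place of a real database are all assumptions for illustration, not the design of Haystack or of any system listed here.

```python
import hashlib


class PackedStore:
    """Toy store: append-only volumes plus a separate metadata index."""

    def __init__(self, volume_size=1024):
        self.volume_size = volume_size  # hypothetical size cap per volume
        self.volumes = [bytearray()]    # stands in for large files or RBD images
        self.index = {}                 # object id -> (volume number, offset, length)

    def put(self, data):
        oid = hashlib.sha256(data).hexdigest()
        if oid in self.index:
            return oid  # content already stored, nothing to append
        volume = self.volumes[-1]
        # Start a new volume when the current one is non-empty and would overflow.
        if len(volume) > 0 and len(volume) + len(data) > self.volume_size:
            volume = bytearray()
            self.volumes.append(volume)
        offset = len(volume)
        volume.extend(data)
        self.index[oid] = (len(self.volumes) - 1, offset, len(data))
        return oid

    def get(self, oid):
        # One metadata lookup, then one read at a known offset.
        volume_number, offset, length = self.index[oid]
        return bytes(self.volumes[volume_number][offset:offset + length])


store = PackedStore(volume_size=10)
first = store.put(b"aaaaaa")
second = store.put(b"bbbbbb")
```

In this sketch every read costs one index lookup followed by one read in a volume, which is why the index has to scale up with the number of objects even though the volumes themselves scale out.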
Storage systems with blockers
#3051 (closed) EOS is too complex (uses RBD + Paxos + QuarkDB for namespace)
#3057 (closed) Seaweedfs is not yet mature (uses large files to pack objects + Paxos + internal database for metadata)
https://github.com/open-io: replication is a proprietary feature (see https://docs.openio.io/latest/source/admin-guide/configuration_replicator.html)
https://ipfs.io/ does not provide replication or self-healing. Performance and space overhead are probably the same as with the current Software Heritage storage system.
https://www.rozosystems.com/about claims a software patent on the implementation
http://www.orangefs.org/ and http://beegfs.io/ focus on high-performance computing
https://www.lustre.org/ and https://moosefs.com/ are distributed file systems, not object / block storage
min.io stores each object in an individual file on a file system, so its space overhead is identical to that of the current Software Heritage storage system.
Swift stores each object in an individual file on a file system, so its space overhead is identical to that of the current Software Heritage storage system.
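One common source of space overhead when each object lives in its own file is that file systems allocate whole blocks, so small objects are rounded up. The arithmetic below is a rough sketch of that effect only: the 4096-byte block size and the object sizes are assumptions, not measurements of the Software Heritage archive, and inode or directory costs are ignored.

```python
def allocated_size(size, block_size=4096):
    """Bytes occupied by a file of `size` bytes, rounded up to whole blocks."""
    blocks = -(-size // block_size)  # ceiling division
    return blocks * block_size


def overhead_ratio(sizes, block_size=4096):
    """Allocated bytes divided by payload bytes for a set of object sizes."""
    stored = sum(allocated_size(s, block_size) for s in sizes)
    payload = sum(sizes)
    return stored / payload


# Hypothetical small objects: 1 KiB, 1 KiB and 2 KiB, each in its own file.
ratio = overhead_ratio([1024, 1024, 2048])
```

With these made-up sizes, three files hold 4 KiB of payload but occupy three 4 KiB blocks, which is the kind of overhead that packing many objects into one large file avoids.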
Inspiration
#3065 (closed) git partial clone (in part because it does packing, in part because it is source code related)
Hardware
- Hardware for object storage

Discussions

Migrated from T3107 (view on Phabricator)
