Scale out object storage explorations
For the record, the following discussions and explorations took place while looking for off-the-shelf solutions that would match the Software Heritage workload.
Explorations
- Scale out data and metadata
- #3064 (closed) ambry
- #3052 (closed) RADOS space benchmark (requires development to reduce the space overhead while maintaining performance)
- ??? RGW
- Object packing
- #3066 (closed) RocksDB SST
- ambry partition format (append only)
- #3068 (closed) Sorted String Table (read only)
- #3050 (closed) libcephsqlite or SQLite on top of RBD (read write)
- #3046 (closed) Using xz-file-format for 1TB archive
- #3045 (closed) Using pixz for 1TB archives
- #3048 (closed) Using a custom format for 1TB archive
- #3069 (closed) Using MZ as a file format
- Scale out data and scale up metadata. The metadata is kept in a database (RocksDB, etc.) that must be queried to find out where the data is stored, as described in "Finding a needle in Haystack: Facebook’s photo storage".
- #3049 (closed) Distributed database + RBD space benchmark (requires development on top of these building blocks)
- Storage systems with blockers
- #3051 (closed) EOS is too complex (uses RBD + Paxos + QuarkDB for namespace)
- #3057 (closed) SeaweedFS is not yet mature (uses large files to pack objects + Paxos + internal database for metadata)
- https://github.com/open-io: replication is a proprietary feature (https://docs.openio.io/latest/source/admin-guide/configuration_replicator.html)
- https://ipfs.io/ does not provide replication or self-healing. Performance and space overhead are probably the same as the current Software Heritage storage system.
- https://www.rozosystems.com/about claims a software patent on the implementation
- http://www.orangefs.org/ or http://beegfs.io/ have a focus on high-end computing
- https://www.lustre.org/ and https://moosefs.com/ are distributed file systems, not object / block storage
- min.io stores each object in an individual file on a file system, so its space overhead is identical to that of the current Software Heritage storage system.
- Swift stores each object in an individual file on a file system, so its space overhead is identical to that of the current Software Heritage storage system.
- Inspiration
- #3065 (closed) git partial clone (in part because it does packing, in part because it is source code related)
- Hardware
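The "scale out data and scale up metadata" approach listed above can be illustrated with a minimal sketch: objects are appended to a few large volumes, and a separate index records where each one lives. Everything here is an assumption made for illustration: the class names `PackedVolume` and `ObjectStore` are hypothetical, an in-memory buffer stands in for a large file or RBD image, and a Python dict stands in for the metadata database (RocksDB, etc.). It is not the design retained by any of the explorations.

```python
import hashlib
import io


class PackedVolume:
    """Append-only container packing many small objects into one byte stream."""

    def __init__(self):
        # In-memory stand-in for a large file or block device (assumption).
        self._data = io.BytesIO()

    def append(self, payload: bytes) -> tuple[int, int]:
        # Always write at the end and report where the payload landed.
        offset = self._data.seek(0, io.SEEK_END)
        self._data.write(payload)
        return offset, len(payload)

    def read(self, offset: int, length: int) -> bytes:
        self._data.seek(offset)
        return self._data.read(length)


class ObjectStore:
    """Data is spread over volumes; one index maps each key to its location."""

    def __init__(self, volume_count: int = 2):
        self.volumes = [PackedVolume() for _ in range(volume_count)]
        # Stand-in for the metadata database: key -> (volume, offset, length).
        self.index = {}

    def put(self, payload: bytes) -> str:
        key = hashlib.sha256(payload).hexdigest()
        if key not in self.index:  # objects are immutable, so store them once
            volume_id = int(key[:8], 16) % len(self.volumes)
            offset, length = self.volumes[volume_id].append(payload)
            self.index[key] = (volume_id, offset, length)
        return key

    def get(self, key: str) -> bytes:
        # The index must be consulted before the data can be read.
        volume_id, offset, length = self.index[key]
        return self.volumes[volume_id].read(offset, length)
```

Every read goes through the index first, which is why the metadata database has to scale up as the number of objects grows, while the volumes themselves can be spread over as many machines as needed.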
Discussions
- Redis as a K/V store for billions of objects
- Looking for hardware to benchmark the object storage design
- Scale out object storage design (take 1)
- Hardware for object storage
- Storing 20 billions of immutable objects in Ceph, 75% <16KB
- Small RGW objects and RADOS 64KB minimum size
- Using RBD to pack billions of small files
- Benchmarking RBD to store artifacts
- Durable self healing distributed append only storage
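As a rough illustration of why small objects dominate several of these discussions, the figures in the titles above (20 billion objects, 75% smaller than 16KB, a 64KB minimum size) can be combined in a back-of-the-envelope calculation. It assumes that every small object occupies one full 64KB allocation unit and ignores replication, erasure coding and metadata; those assumptions are not taken from the discussions themselves.

```python
KIB = 1024
TIB = 1024**4

TOTAL_OBJECTS = 20_000_000_000   # from the discussion title
SMALL_FRACTION = 0.75            # share of objects smaller than 16KB
SMALL_MAX_SIZE = 16 * KIB        # upper bound on the size of a small object
MIN_ALLOCATION = 64 * KIB        # assumed space used per object, however small

small_objects = int(TOTAL_OBJECTS * SMALL_FRACTION)

# Upper bound on the payload actually held by the small objects.
payload_upper_bound = small_objects * SMALL_MAX_SIZE

# Space consumed if each small object takes one full allocation unit.
allocated = small_objects * MIN_ALLOCATION

print(f"small objects:        {small_objects:,}")
print(f"payload (at most):    {payload_upper_bound / TIB:.1f} TiB")
print(f"allocated (assumed):  {allocated / TIB:.1f} TiB")
print(f"overhead factor (>=): {allocated / payload_upper_bound:.0f}x")
```

Under these assumptions the small objects alone would consume at least four times the space of their payload, which is the motivation for packing them into larger containers.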
Migrated from T3107 (view on Phabricator)