Scale out object storage design
[Parent task for all related tasks]
Current status
An object storage design was described and preliminary benchmarks were run.
The winery backend implementation will start October 1st.
Design
Discussions
- Object storage: benchmark updates & preliminary hardware architecture
- Object storage: horizon 2022
- A good name for the object storage
- A practical approach to efficiently store 100 billions small objects in Ceph
- Ceph Quincy CDS & immutable objects
Migrated from T3054 (view on Phabricator)
Activity
- Loïc Dachary assigned to @dachary
- Phabricator Migration user marked this issue as related to swh/devel/swh-objstorage#3050 (closed)
- Phabricator Migration user marked this issue as related to swh/devel/swh-objstorage#3048 (closed)
- Phabricator Migration user marked this issue as related to swh/devel/swh-objstorage#3051 (closed)
- Loïc Dachary changed the description
- Loïc Dachary changed the description
- Loïc Dachary changed the description
- Loïc Dachary changed the description
- Maintainer
Thanks for this summary/status, very useful. Regarding goals, I think we also want a read goal for time to first byte, a performance metric that is particularly bad in the current filesystem-based object storage. Not sure what would be a reasonable target though. Poke @olasd: any idea about a good target for this?
- Maintainer
Here's the output of the following query, which computes exact aggregates for objects smaller than the size boundaries of the original quartiles:
```sql
WITH stats AS (
    SELECT
        count(*) FILTER (WHERE length <= 1024) AS num_1k,
        sum(length) FILTER (WHERE length <= 1024) AS size_1k,
        count(*) FILTER (WHERE length <= 1024*3) AS num_3k,
        sum(length) FILTER (WHERE length <= 1024*3) AS size_3k,
        count(*) FILTER (WHERE length <= 1024*13) AS num_13k,
        sum(length) FILTER (WHERE length <= 1024*13) AS size_13k,
        count(*) AS num_total,
        sum(length) AS size_total
    FROM content
)
SELECT
    num_1k AS "1k objects",
    num_1k * 100.0 / num_total AS "1k percentile",
    pg_size_pretty(size_1k) AS "1k total size",
    size_1k * 100.0 / size_total AS "1k size percentile",
    num_3k AS "3k objects",
    num_3k * 100.0 / num_total AS "3k percentile",
    pg_size_pretty(size_3k) AS "3k total size",
    size_3k * 100.0 / size_total AS "3k size percentile",
    num_13k AS "13k objects",
    num_13k * 100.0 / num_total AS "13k percentile",
    pg_size_pretty(size_13k) AS "13k total size",
    size_13k * 100.0 / size_total AS "13k size percentile",
    num_total AS "all objects",
    pg_size_pretty(size_total) AS "total size"
FROM stats;
```
| size bound | objects | % of objects | total size | % of total size |
|------------|---------|--------------|------------|-----------------|
| ≤ 1k  | 2399461986 | 24.54 | 1008 GB | 0.13 |
| ≤ 3k  | 4739399165 | 48.47 | 5155 GB | 0.69 |
| ≤ 13k | 7358523364 | 75.26 | 21 TB   | 2.87 |
| all   | 9777777086 | 100   | 732 TB  | 100  |
(these are cumulative, i.e. the 1k percentiles are subsets of the 13k percentiles)
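The same cumulative breakdown can be sketched in Python. This is a toy illustration with made-up sizes, not Software Heritage data; the real numbers come from the `content` table via the SQL above:

```python
def size_breakdown(lengths, thresholds=(1024, 3 * 1024, 13 * 1024)):
    """Cumulative object-count and byte-size breakdown per size bound.

    Mirrors the SQL query's FILTER aggregates: each row counts every
    object at or below the threshold, so rows are cumulative subsets.
    """
    total_num = len(lengths)
    total_size = sum(lengths)
    rows = []
    for t in thresholds:
        subset = [length for length in lengths if length <= t]
        rows.append({
            "threshold": t,
            "objects": len(subset),
            "object_pct": 100.0 * len(subset) / total_num,
            "size": sum(subset),
            "size_pct": 100.0 * sum(subset) / total_size,
        })
    return rows

# Toy data: one object per bucket plus one large outlier.
rows = size_breakdown([500, 2000, 10000, 1_000_000])
```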
- Author
For the record, stats from January 2021.
- Loïc Dachary changed the description
- Maintainer
@zack, very good point about having a target for the "time to first byte when reading an object".
I don't know what would be a "good" target for that metric; my gut says that staying within 100 ms for any given object would be acceptable, as long as the number of parallel readers doesn't degrade that latency too much (within the IOPS limits of the underlying media, of course).
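As a rough illustration of the metric, time to first byte can be measured by timing how long it takes for the first chunk of an object to arrive. A minimal sketch, assuming a hypothetical `get_stream(obj_id)` callable that returns an iterator of byte chunks (standing in for whatever streaming read API the object storage exposes):

```python
import time

def time_to_first_byte(get_stream, obj_id):
    """Seconds elapsed until the first chunk of obj_id is received.

    `get_stream` is a hypothetical streaming-read callable, not a real
    swh-objstorage API; it returns an iterator of byte chunks.
    """
    start = time.monotonic()
    chunks = get_stream(obj_id)
    next(iter(chunks))  # block until the first bytes are available
    return time.monotonic() - start

# Toy usage with an in-memory "storage"; a 100 ms target would mean
# checking ttfb < 0.1 in a real benchmark run.
fake = {"abc": [b"hello", b"world"]}
ttfb = time_to_first_byte(lambda oid: iter(fake[oid]), "abc")
```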
- Loïc Dachary changed the description
- Phabricator Migration user marked this issue as related to #3056