- build a list of 1M sha1s (extracted from the storage's content table)
- retrieve these 1M objects from each objstorage from a nearby machine (azure vm for azure, rocq machine for uffizi, ec2 machine for s3) and measure the time it takes.
- retrieve these 1M objects from each objstorage from a distant machine (if possible) and measure the time it takes.
The script used to run the benchmark is swh/meta$817; a few probing runs (limited to 30s) with different numbers of workers and threads were used to find a sweet spot.
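For illustration only, here is a minimal sketch of the kind of timed retrieval loop such a bench boils down to. This is not the swh/meta$817 script: the objstorage URL, the dataset file name and the `parse_sha1`/`fetch` helpers are made up for the example, and `get_objstorage`/`.get()` are used as I understand the swh.objstorage API (sha1 bytes as object id; exact config keys may differ across versions).

```python
import time
from concurrent.futures import ThreadPoolExecutor

from swh.objstorage.factory import get_objstorage

WORKERS = 32                        # thread count; probing runs vary this to find a sweet spot
SHA1_FILE = "content_sha1_block2"   # hypothetical dataset file, one hex sha1 per line

# Hypothetical remote objstorage config; the URL and keys depend on which
# backend (azure, s3, uffizi) is being probed and on the swh.objstorage version.
objstorage = get_objstorage(cls="remote", url="http://uffizi.internal:5003/")


def parse_sha1(line: str) -> bytes:
    h = line.strip()
    if h.startswith("\\x"):  # psql renders bytea values as \x-prefixed hex
        h = h[2:]
    return bytes.fromhex(h)


def fetch(sha1: bytes) -> int:
    # .get() returns the raw object content; only its size and timing matter here.
    return len(objstorage.get(sha1))


with open(SHA1_FILE) as f:
    sha1s = [parse_sha1(line) for line in f if line.strip()]

start = time.monotonic()
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    sizes = list(pool.map(fetch, sha1s))
elapsed = time.monotonic() - start
print(f"{len(sizes)} objects ({sum(sizes)} bytes) in {elapsed:.1f}s "
      f"-> {len(sizes) / elapsed:.0f} obj/s")
```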
Since the results on uffizi above suffered from a few caveats, I've run a few more tests:
- a first result was obtained with a dataset that only contained objects stored on the XFS part of the objstorage;
- a second dataset was then created (with the order by sha256 clause to spread the sha1s),
- but those results are a mix of hot and cold cache tests.
I made a new dataset using:

```sql
select sha1 from content where sha256 > '\x000729010ac682fa942e4bfedb2366da310ca438c1677ef0812dbb53c42bcea2' order by sha256 limit 100000 \g content_sha1_block2
```
The given sha256 is the last one of the first dataset.
It's a smaller dataset (100k) just to get rough numbers for now; a 1M test case will follow.
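For the record, the same block extraction could also be scripted (e.g. for the upcoming 1M dataset) instead of going through psql. A rough sketch, where the connection string, output file name and block size are assumptions:

```python
import psycopg2

DSN = "service=swh"       # hypothetical connection string
BLOCK_SIZE = 100_000      # bump to 1_000_000 for the follow-up test case
OUTPUT = "content_sha1_block2"

# Last sha256 of the previous dataset: the next block starts right after it.
last_sha256 = bytes.fromhex(
    "000729010ac682fa942e4bfedb2366da310ca438c1677ef0812dbb53c42bcea2"
)

with psycopg2.connect(DSN) as db, db.cursor() as cur, open(OUTPUT, "w") as out:
    cur.execute(
        "select sha1, sha256 from content where sha256 > %s"
        " order by sha256 limit %s",
        (last_sha256, BLOCK_SIZE),
    )
    for sha1, sha256 in cur:
        out.write(bytes(sha1).hex() + "\n")
        last_sha256 = bytes(sha256)

# Record where the next block should start.
print("next block starts after sha256 =", last_sha256.hex())
```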
I've run this test on XFS and ZFS separately, using local objstorage configurations instead of the RPC server running on uffizi (so these tests were executed on uffizi itself).
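For context, "local objstorage configurations" here means instantiating a pathslicing objstorage directly on the filesystem rather than going through the RPC server. A hedged sketch, where the root paths and slicing value are placeholders (not the real uffizi layout) and the exact config keys may differ across swh.objstorage versions:

```python
from swh.objstorage.factory import get_objstorage

# Placeholder roots for the XFS and ZFS parts of the objstorage; the real
# uffizi mount points and slicing scheme may differ.
xfs_objstorage = get_objstorage(
    cls="pathslicing",
    root="/srv/softwareheritage/objects-xfs",
    slicing="0:2/2:4/4:6",
)
zfs_objstorage = get_objstorage(
    cls="pathslicing",
    root="/srv/softwareheritage/objects-zfs",
    slicing="0:2/2:4/4:6",
)

# The same timed retrieval loop can then be run against each instance in turn.
```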
Here, cold means the sha1s had not been retrieved in a previous run (fresh dataset), whereas hot means the same test was run a second time immediately afterwards.
Note: XFS has cache=data enabled whereas ZFS only has primarycache=metadata; this might explain the big difference between the two in the hot cache test case.