cassandra: Add alternative algorithms to list missing objects (!727) · Merge requests · Platform / Development / swh-storage

vlorentz requested to merge generated-differential-D6423-source into generated-differential-D6423-target Oct 06, 2021

The existing implementation is now referred to as 'grouped-naive'. It performs poorly because it groups ids together without regard to where they are stored, so a single grouped request may need to be dispatched to multiple servers.

'concurrent' is a new naive strategy that is easy to implement and should perform well.
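To illustrate the general idea, here is a minimal sketch of a concurrent strategy, assuming it means issuing one existence check per id and running those checks in parallel. The description above does not spell this out, so treat it as an assumption. `missing_concurrent` and the `exists` callback are hypothetical names, and a thread pool stands in for whatever concurrency mechanism the Cassandra driver provides.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def missing_concurrent(
    ids: Iterable[bytes],
    exists: Callable[[bytes], bool],  # hypothetical single-id lookup
    max_workers: int = 8,
) -> List[bytes]:
    """Return the ids for which `exists` is False, checking ids in parallel."""
    ids = list(ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One lookup per id; pool.map keeps results in input order.
        found = list(pool.map(exists, ids))
    return [id_ for id_, ok in zip(ids, found) if not ok]


# Usage with an in-memory set standing in for the database.
stored = {b"a", b"c"}
print(missing_concurrent([b"a", b"b", b"c", b"d"], stored.__contains__))
```

Since every request carries a single id, no request has to be split across servers, at the cost of sending many small requests.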

'grouped-pk-serial' and 'grouped-pk-concurrent' still group the ids they request, but in a smarter way, so that each request group only needs to access a single server. I expect 'grouped-pk-concurrent' to be faster than 'grouped-pk-serial'; it may also be faster than 'concurrent', but we need benchmarks to know.
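The difference between naive grouping and grouping by server can be sketched as follows. This is a simplified illustration, not the code from this merge request: `server_for` is a hypothetical stand-in for the mapping from partition key to server (a real cluster uses token ranges and replicas), and `NB_SERVERS` is an assumed cluster size.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List

NB_SERVERS = 3  # assumed cluster size, for illustration only


def server_for(id_: bytes) -> int:
    """Hypothetical mapping from an id (partition key) to a server index."""
    digest = hashlib.sha1(id_).digest()
    return int.from_bytes(digest[:8], "big") % NB_SERVERS


def group_naive(ids: Iterable[bytes], group_size: int) -> List[List[bytes]]:
    """Chunk ids in input order; a chunk may span several servers."""
    ids = list(ids)
    return [ids[i : i + group_size] for i in range(0, len(ids), group_size)]


def group_by_server(ids: Iterable[bytes], group_size: int) -> List[List[bytes]]:
    """Bucket ids by server first, then chunk each bucket separately."""
    per_server: Dict[int, List[bytes]] = defaultdict(list)
    for id_ in ids:
        per_server[server_for(id_)].append(id_)
    groups: List[List[bytes]] = []
    for server_ids in per_server.values():
        groups.extend(group_naive(server_ids, group_size))
    return groups


ids = [bytes([i]) for i in range(50)]
for group in group_by_server(ids, group_size=10):
    # Every id in a group maps to the same server.
    assert len({server_for(id_) for id_ in group}) == 1
```

Under this sketch, the serial and concurrent variants would differ only in whether the resulting groups are sent one after another or in parallel.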

Should address this issue: https://forge.softwareheritage.org/swh/infra/sysadm-environment#3577

Test Plan

Two tests will fail because of a side effect I don't understand yet, but the failures won't affect actual deployments.

Migrated from D6423 (view on Phabricator)
