cassandra: Add alternative algorithms to list missing objects

The existing implementation is now referred to as 'grouped-naive'. It performs poorly because it groups together ids that live on different servers, so a single grouped request has to be dispatched to multiple servers.

'concurrent' is a new naive strategy that is easy to implement and should perform nicely.

'grouped-pk-serial' and 'grouped-pk-concurrent' still group the ids they request, but in a smarter way, so that each request group only needs to access a single server. I expect 'grouped-pk-concurrent' to be faster than 'grouped-pk-serial'. It may also be faster than 'concurrent', but we need benchmarks to know.
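To make the difference between the strategies concrete, here is a rough sketch of 'concurrent' and 'grouped-pk-concurrent' against an in-memory stand-in for a cluster. It is not the code in this diff: `owner_of`, `make_cluster`, `query_server` and `N_SERVERS` are hypothetical names, and the placement function is a toy replacement for Cassandra's token ring.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Set

N_SERVERS = 3  # hypothetical cluster size


def owner_of(obj_id: bytes) -> int:
    """Toy placement function (assumption); a real cluster uses the token ring."""
    return sum(obj_id) % N_SERVERS


def make_cluster(present: Iterable[bytes]) -> Dict[int, Set[bytes]]:
    """Build a fake cluster: server index -> ids stored on that server."""
    cluster: Dict[int, Set[bytes]] = {i: set() for i in range(N_SERVERS)}
    for obj_id in present:
        cluster[owner_of(obj_id)].add(obj_id)
    return cluster


def query_server(cluster, server: int, ids: List[bytes]) -> List[bytes]:
    """Pretend request: return the subset of `ids` present on `server`."""
    return [i for i in ids if i in cluster[server]]


def missing_concurrent(cluster, ids: Iterable[bytes]) -> List[bytes]:
    """'concurrent': one single-id request per id, all issued in parallel."""
    ids = list(ids)
    with ThreadPoolExecutor(max_workers=8) as pool:
        hits = pool.map(lambda i: query_server(cluster, owner_of(i), [i]), ids)
        return [i for i, hit in zip(ids, hits) if not hit]


def missing_grouped_pk_concurrent(cluster, ids: Iterable[bytes]) -> List[bytes]:
    """'grouped-pk-concurrent': one request per owning server, in parallel."""
    ids = list(ids)
    groups = defaultdict(list)
    for i in ids:
        groups[owner_of(i)].append(i)  # each group targets a single server
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = pool.map(
            lambda kv: query_server(cluster, kv[0], kv[1]), groups.items()
        )
        present = {i for hits in results for i in hits}
    return [i for i in ids if i not in present]
```

In this sketch, a 'grouped-pk-serial' variant would be the same grouping with a plain `for` loop over `groups.items()` instead of the thread pool.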

Should address this issue: https://forge.softwareheritage.org/swh/infra/sysadm-environment#3577

Test Plan

Two tests will fail because of a side effect I don't understand, but the failures won't affect actual deployments.


Migrated from D6423 (view on Phabricator)
