cassandra: Make content_missing query in batches
Previously, content_missing called content_find() for each object, which takes two queries per object. It now queries in batches instead.
Given the latency of Cassandra queries, this should be a significant speed-up (possibly up to 100 times faster, since 100 is the value of PARTITION_KEY_RESTRICTION_MAX_SIZE).
This also changes the schema, because CQL does not allow `IN` queries on compound partition keys.
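A minimal sketch of the batching idea, for illustration only; it is not the code from this diff. `find_present` is a hypothetical callable standing in for one batched lookup that returns the subset of hashes already stored, and `grouper` and `missing_in_batches` are illustrative names. The constant is redefined locally with the value stated above.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Set

# Value taken from the description above; redefined here so the sketch
# is self-contained (assumption: it bounds the number of keys per query).
PARTITION_KEY_RESTRICTION_MAX_SIZE = 100


def grouper(items: Iterable[bytes], n: int) -> Iterator[List[bytes]]:
    """Yield successive lists of at most n items."""
    it = iter(items)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch


def missing_in_batches(
    hashes: Iterable[bytes],
    find_present: Callable[[List[bytes]], Set[bytes]],
    batch_size: int = PARTITION_KEY_RESTRICTION_MAX_SIZE,
) -> Iterator[bytes]:
    """Yield the hashes that find_present does not report as stored.

    find_present is called once per batch instead of once per hash,
    so the number of round trips drops by up to a factor of batch_size.
    """
    for batch in grouper(hashes, batch_size):
        present = find_present(batch)  # one (hypothetical) batched query
        for h in batch:
            if h not in present:
                yield h
```

With 250 hashes and the default batch size, `find_present` is called three times (batches of 100, 100 and 50) rather than once or more per hash.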
Test Plan
Both branches are already covered by existing tests
Migrated from D6118 (view on Phabricator)
Activity
Build is green
Patch application report for D6118 (id=22137)
Rebasing onto 9f00eb9d...
Current branch diff-target is up to date.
Changes applied before test
commit 0f89a9dc7c86eec7dbf2c75180dfd008d6881196
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 20 13:52:17 2021 +0200

    cassandra: Make content_missing query in batches

    Instead of calling content_find() for each object, which needs to make two queries for each.
    Given the latency of Cassandra queries, this should be a significant speed-up (possibly up to 100 times faster, as this is the value of PARTITION_KEY_RESTRICTION_MAX_SIZE).
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1362/ for more details.
Build was aborted
Patch application report for D6118 (id=22138)
Rebasing onto 9f00eb9d...
Current branch diff-target is up to date.
Changes applied before test
commit a3cc0dc7b104bc8b7f05988a7e0e26fae462ac7f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 20 13:52:17 2021 +0200

    cassandra: Make content_missing query in batches

    Instead of calling content_find() for each object, which needs to make two queries for each.
    Given the latency of Cassandra queries, this should be a significant speed-up (possibly up to 100 times faster, as this is the value of PARTITION_KEY_RESTRICTION_MAX_SIZE).
    This also changes the schema, because CQL does not allow doing `IN` queries on compound partition keys.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1363/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1363/console
Build is green
Patch application report for D6118 (id=22138)
Rebasing onto 9f00eb9d...
Current branch diff-target is up to date.
Changes applied before test
commit a3cc0dc7b104bc8b7f05988a7e0e26fae462ac7f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 20 13:52:17 2021 +0200

    cassandra: Make content_missing query in batches

    Instead of calling content_find() for each object, which needs to make two queries for each.
    Given the latency of Cassandra queries, this should be a significant speed-up (possibly up to 100 times faster, as this is the value of PARTITION_KEY_RESTRICTION_MAX_SIZE).
    This also changes the schema, because CQL does not allow doing `IN` queries on compound partition keys.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1364/ for more details.
Build has FAILED
Patch application report for D6118 (id=22162)
Rebasing onto 7113198f...
Current branch diff-target is up to date.
Changes applied before test
commit 54b5abfb26267ad56a67ad9fa2dd9d5d075e30f0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 20 13:52:17 2021 +0200

    cassandra: Make content_missing query in batches

    Instead of calling content_find() for each object, which needs to make two queries for each.
    Given the latency of Cassandra queries, this should be a significant speed-up (possibly up to 100 times faster, as this is the value of PARTITION_KEY_RESTRICTION_MAX_SIZE).
    This also changes the schema, because CQL does not allow doing `IN` queries on compound partition keys.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1368/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1368/console