Add test for origin_visit_get_latest in presence of mismatched id and date orders
Closed
requested to merge generated-differential-D6121-source into generated-differential-D6121-target
It was unclear this actually worked; I had to write this test to realize the code wasn't buggy.
Also replaced a conditional that is always False (because Cassandra always returns results in the order of the clustering key) with an assertion, so the code is less confusing.
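To illustrate the scenario the new test covers, here is a minimal, self-contained sketch. It is not the code from swh/storage: the `Visit` class, `rows_in_clustering_order` and `get_latest` are hypothetical names, and it assumes that "latest" means the visit with the most recent date while rows arrive sorted by visit id (the clustering key).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Visit:
    """Hypothetical stand-in for an origin visit row."""

    visit: int  # visit id, assumed to be the clustering key
    date: datetime


def rows_in_clustering_order(visits: List[Visit]) -> List[Visit]:
    """Mimic a backend that always returns rows sorted by clustering key."""
    return sorted(visits, key=lambda v: v.visit)


def get_latest(rows: List[Visit]) -> Optional[Visit]:
    """Return the visit with the most recent date (assumed meaning of 'latest')."""
    ids = [r.visit for r in rows]
    # An assertion documents the ordering guarantee instead of a
    # conditional branch that could never be taken.
    assert ids == sorted(ids), "rows must arrive in clustering-key order"
    return max(rows, key=lambda r: r.date, default=None)


# Mismatched orders: the visit with the higher id has the older date.
visits = [
    Visit(visit=1, date=datetime(2021, 8, 20, tzinfo=timezone.utc)),
    Visit(visit=2, date=datetime(2021, 8, 19, tzinfo=timezone.utc)),
]
latest = get_latest(rows_in_clustering_order(visits))
```

Under these assumptions, `latest` is the visit with id 1, even though id 2 is the higher one.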
Migrated from D6121 (view on Phabricator)
Activity
Build is green
Patch application report for D6121 (id=22143)
Could not rebase; Attempt merge onto 9f00eb9d...
Updating 9f00eb9d..e291c74b
Fast-forward
 swh/storage/cassandra/cql.py       | 45 ++++++++++++++++++++++++
 swh/storage/cassandra/model.py     |  4 +--
 swh/storage/cassandra/schema.py    |  2 +-
 swh/storage/cassandra/storage.py   | 26 ++++++++++++--
 swh/storage/in_memory.py           | 11 ++++++
 swh/storage/tests/storage_tests.py | 70 ++++++++++++++++++++++++++++++++++++++
 6 files changed, 152 insertions(+), 6 deletions(-)
Changes applied before test
commit e291c74b04b8e7501f4e41ea237591038ff2d9b8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 20 20:11:51 2021 +0200

    Add test for origin_visit_get_latest in presence of mismatched id and date orders

    It was unclear this actually worked; I had to write this test to realize the code wasn't buggy.

    Also replaced a conditional that is always False (because Cassandra always returns results in the order of the clustering key) with an assertion, so the code is less confusing.

commit 724a67e06fd6e6c9ed93c28dae79db43239e7fc9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 20 18:12:26 2021 +0200

    cassandra: Bump next_visit_id when origin_visit_add is called by a replayer

    When called by a replayer, the visit.visit field is set; but origin.next_visit_id was never incremented, so on the next loader run, the visit id would be 1 even if there is already a visit with that id.

commit a3cc0dc7b104bc8b7f05988a7e0e26fae462ac7f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 20 13:52:17 2021 +0200

    cassandra: Make content_missing query in batches

    Instead of calling content_find() for each object, which needs to make two queries for each. Given the latency of Cassandra queries, this should be a significant speed-up (possibly up to 100 times faster, as this is the value of PARTITION_KEY_RESTRICTION_MAX_SIZE).

    This also changes the schema, because CQL does not allow doing `IN` queries on compound partition keys.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1366/ for more details.
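The batching described in the "Make content_missing query in batches" commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: `batched`, `missing` and `fake_fetch` are hypothetical helpers, the backend lookup is replaced by an in-memory set, and only the constant name and its value of 100 come from the commit message.

```python
from typing import Callable, Iterator, List, Set

# Value taken from the commit message above.
PARTITION_KEY_RESTRICTION_MAX_SIZE = 100


def batched(items: List[bytes], size: int) -> Iterator[List[bytes]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start : start + size]


def missing(
    hashes: List[bytes], fetch_present: Callable[[List[bytes]], Set[bytes]]
) -> List[bytes]:
    """Return the hashes not known to the backend, one lookup per batch.

    `fetch_present` is a hypothetical callback standing in for a single
    `IN` query; it returns the subset of the batch that exists.
    """
    result: List[bytes] = []
    for batch in batched(hashes, PARTITION_KEY_RESTRICTION_MAX_SIZE):
        present = fetch_present(batch)
        result.extend(h for h in batch if h not in present)
    return result


# In-memory stand-in for the database: every other hash is stored.
hashes = [i.to_bytes(2, "big") for i in range(250)]
stored = set(hashes[::2])
calls: List[int] = []


def fake_fetch(batch: List[bytes]) -> Set[bytes]:
    calls.append(len(batch))  # record the size of each simulated query
    return stored.intersection(batch)


result = missing(hashes, fake_fetch)
```

With 250 hashes, the sketch issues three simulated lookups (100, 100 and 50 items) instead of one or more per object.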
Build is green
Patch application report for D6121 (id=22164)
Could not rebase; Attempt merge onto 7113198f...
Updating 7113198f..8f1cdf65
Fast-forward
 swh/storage/cassandra/cql.py       | 45 ++++++++++++++++++++++++
 swh/storage/cassandra/model.py     |  4 +--
 swh/storage/cassandra/schema.py    |  2 +-
 swh/storage/cassandra/storage.py   | 26 ++++++++++++--
 swh/storage/in_memory.py           | 11 ++++++
 swh/storage/tests/storage_tests.py | 70 ++++++++++++++++++++++++++++++++++++++
 6 files changed, 152 insertions(+), 6 deletions(-)
Changes applied before test
commit 8f1cdf65a1056dac42755e8c70ae38f3d34aa459
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 20 20:11:51 2021 +0200

    Add test for origin_visit_get_latest in presence of mismatched id and date orders

    It was unclear this actually worked; I had to write this test to realize the code wasn't buggy.

    Also replaced a conditional that is always False (because Cassandra always returns results in the order of the clustering key) with an assertion, so the code is less confusing.

commit cf880db30bb549ccbdbb2cdd05b61d124ed90be7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 20 18:12:26 2021 +0200

    cassandra: Bump next_visit_id when origin_visit_add is called by a replayer

    When called by a replayer, the visit.visit field is set; but origin.next_visit_id was never incremented, so on the next loader run, the visit id would be 1 even if there is already a visit with that id.

commit 54b5abfb26267ad56a67ad9fa2dd9d5d075e30f0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 20 13:52:17 2021 +0200

    cassandra: Make content_missing query in batches

    Instead of calling content_find() for each object, which needs to make two queries for each. Given the latency of Cassandra queries, this should be a significant speed-up (possibly up to 100 times faster, as this is the value of PARTITION_KEY_RESTRICTION_MAX_SIZE).

    This also changes the schema, because CQL does not allow doing `IN` queries on compound partition keys.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1370/ for more details.