Add test for origin_visit_get_latest in presence of mismatched id and date orders

It was unclear whether this actually worked; I had to write this test to confirm that the code wasn't buggy.

Also replaced a conditional that is always False (because Cassandra always returns results in the order of the clustering key) with an assertion, so the code is less confusing.
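
To make the scenario concrete, here is a minimal sketch of picking the latest visit when visit ids and visit dates disagree. It is not the swh-storage implementation: the `Visit` dataclass and the `latest_visit` helper are hypothetical names, and the sketch assumes rows arrive in ascending visit id order (the clustering key order mentioned above) and that "latest" means the greatest date, with ties going to the higher id.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class Visit:
    """Hypothetical stand-in for a visit row; not the swh.model class."""

    visit: int  # visit id, assumed to be the clustering key
    date: datetime


def latest_visit(rows: Iterable[Visit]) -> Optional[Visit]:
    """Return the visit with the greatest date, or None if there are no rows.

    Rows are assumed to arrive in ascending visit id order. That ordering is
    asserted rather than handled with a branch that could never be taken.
    """
    latest: Optional[Visit] = None
    previous_id: Optional[int] = None
    for row in rows:
        assert previous_id is None or row.visit > previous_id, (
            "rows are expected in clustering key order"
        )
        previous_id = row.visit
        # ">=" lets the higher id win when two visits share the same date.
        if latest is None or row.date >= latest.date:
            latest = row
    return latest


# Mismatched orders: visit 2 has a higher id but an earlier date than visit 1.
visits = [
    Visit(visit=1, date=datetime(2021, 8, 20, tzinfo=timezone.utc)),
    Visit(visit=2, date=datetime(2021, 8, 10, tzinfo=timezone.utc)),
]
result = latest_visit(visits)
```

With these two rows the helper selects visit 1, since its date is later even though its id is lower. Feeding the rows in descending id order trips the assertion instead of silently returning a result.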


Migrated from D6121 (view on Phabricator)

Closed by Phabricator Migration user 3 years ago (Aug 24, 2021 2:14pm UTC)

Activity

  • Build is green

    Patch application report for D6121 (id=22143)

    Could not rebase; Attempt merge onto 9f00eb9d...

    Updating 9f00eb9d..e291c74b
    Fast-forward
     swh/storage/cassandra/cql.py       | 45 ++++++++++++++++++++++++
     swh/storage/cassandra/model.py     |  4 +--
     swh/storage/cassandra/schema.py    |  2 +-
     swh/storage/cassandra/storage.py   | 26 ++++++++++++--
     swh/storage/in_memory.py           | 11 ++++++
     swh/storage/tests/storage_tests.py | 70 ++++++++++++++++++++++++++++++++++++++
     6 files changed, 152 insertions(+), 6 deletions(-)
    Changes applied before test
    commit e291c74b04b8e7501f4e41ea237591038ff2d9b8
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Fri Aug 20 20:11:51 2021 +0200
    
        Add test for origin_visit_get_latest in presence of mismatched id and date orders
        
        It was unclear this actually worked; I had to write this test to realize
        the code wasn't buggy.
        
        Also replaced a conditional that is always False (because Cassandra
        always returns results in the order of the clustering key) with an
        assertion, so the code is less confusing.
    
    commit 724a67e06fd6e6c9ed93c28dae79db43239e7fc9
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Fri Aug 20 18:12:26 2021 +0200
    
        cassandra: Bump next_visit_id when origin_visit_add is called by a replayer
        
        When called by a replayer, the visit.visit field is set; but
        origin.next_visit_id was never incremented, so on the next loader
        run, the visit id would be 1 even if there is already a visit
        with that id.
    
    commit a3cc0dc7b104bc8b7f05988a7e0e26fae462ac7f
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Fri Aug 20 13:52:17 2021 +0200
    
        cassandra: Make content_missing query in batches
        
        Instead of calling content_find() for each object, which needs to make
        two queries for each.
        
        Given the latency of Cassandra queries, this should be a significant
        speed-up (possibly up to 100 times faster, as this is the value of
        PARTITION_KEY_RESTRICTION_MAX_SIZE).
        
        This also changes the schema, because CQL does not allow doing `IN`
        queries on compound partition keys.

    See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1366/ for more details.

  • Merge request was accepted

  • Antoine Lambert approved this merge request

  • Author Maintainer

    rebase

  • Build is green

    Patch application report for D6121 (id=22164)

    Could not rebase; Attempt merge onto 7113198f...

    Updating 7113198f..8f1cdf65
    Fast-forward
     swh/storage/cassandra/cql.py       | 45 ++++++++++++++++++++++++
     swh/storage/cassandra/model.py     |  4 +--
     swh/storage/cassandra/schema.py    |  2 +-
     swh/storage/cassandra/storage.py   | 26 ++++++++++++--
     swh/storage/in_memory.py           | 11 ++++++
     swh/storage/tests/storage_tests.py | 70 ++++++++++++++++++++++++++++++++++++++
     6 files changed, 152 insertions(+), 6 deletions(-)
    Changes applied before test
    commit 8f1cdf65a1056dac42755e8c70ae38f3d34aa459
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Fri Aug 20 20:11:51 2021 +0200
    
        Add test for origin_visit_get_latest in presence of mismatched id and date orders
        
        It was unclear this actually worked; I had to write this test to realize
        the code wasn't buggy.
        
        Also replaced a conditional that is always False (because Cassandra
        always returns results in the order of the clustering key) with an
        assertion, so the code is less confusing.
    
    commit cf880db30bb549ccbdbb2cdd05b61d124ed90be7
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Fri Aug 20 18:12:26 2021 +0200
    
        cassandra: Bump next_visit_id when origin_visit_add is called by a replayer
        
        When called by a replayer, the visit.visit field is set; but
        origin.next_visit_id was never incremented, so on the next loader
        run, the visit id would be 1 even if there is already a visit
        with that id.
    
    commit 54b5abfb26267ad56a67ad9fa2dd9d5d075e30f0
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Fri Aug 20 13:52:17 2021 +0200
    
        cassandra: Make content_missing query in batches
        
        Instead of calling content_find() for each object, which needs to make
        two queries for each.
        
        Given the latency of Cassandra queries, this should be a significant
        speed-up (possibly up to 100 times faster, as this is the value of
        PARTITION_KEY_RESTRICTION_MAX_SIZE).
        
        This also changes the schema, because CQL does not allow doing `IN`
        queries on compound partition keys.

    See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1370/ for more details.

  • Author Maintainer

    Merge request was merged

  • closed
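
The `content_missing` commit listed in the patch reports above replaces per-object lookups with batched queries. The following is a rough sketch of that batching idea only, not the code from the diff: `grouper`, `missing_in_batches`, and `fake_find_present` are hypothetical names, the backend is an in-memory set rather than Cassandra, and the assumption that each batch costs a single query is mine. The batch size of 100 is the value the commit message gives for `PARTITION_KEY_RESTRICTION_MAX_SIZE`.

```python
from typing import Callable, Iterator, List, Sequence, Set

# Value quoted in the commit message; used here as the batch size.
PARTITION_KEY_RESTRICTION_MAX_SIZE = 100


def grouper(items: Sequence[bytes], size: int) -> Iterator[Sequence[bytes]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start : start + size]


def missing_in_batches(
    hashes: Sequence[bytes],
    find_present: Callable[[Sequence[bytes]], Set[bytes]],
    batch_size: int = PARTITION_KEY_RESTRICTION_MAX_SIZE,
) -> List[bytes]:
    """Return the hashes that `find_present` does not report as stored.

    `find_present` is a hypothetical callback standing in for one batched
    lookup (for example a single `IN` query); it is called once per batch.
    """
    missing: List[bytes] = []
    for batch in grouper(hashes, batch_size):
        present = find_present(batch)
        missing.extend(h for h in batch if h not in present)
    return missing


# In-memory stand-in for the database: only even-numbered hashes are stored.
stored = {bytes([i]) for i in range(0, 250, 2)}
calls: List[int] = []


def fake_find_present(batch: Sequence[bytes]) -> Set[bytes]:
    calls.append(len(batch))  # record the size of each simulated query
    return {h for h in batch if h in stored}


wanted = [bytes([i]) for i in range(250)]
missing = missing_in_batches(wanted, fake_find_present)
```

In this toy run, 250 lookups are served by three simulated queries (batches of 100, 100, and 50) instead of one or more queries per object, which is the kind of saving the commit message anticipates.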
