storage: Refactor internal implementations to use "origin visit update" model representation

(origin visit update objects as first-class objects in swh model)

(pairing with @vlorentz)

Current status:

  • in-memory implementation
  • pg storage implementation
  • cassandra implementation
  • sql migration script
  • Land swh-model!272 (closed) (to make those tests go green here)
  • narrow the scope to only include the origin-visit-update changes
  • split diff into multiple diffs (1 per backend, D2937, D2938, D2939)

Related to T2310
Related to swh-model!272 (closed)
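
To make the "origin visit update as first-class object" idea concrete, here is a minimal sketch of how a backend could store updates as separate, append-only objects and derive the current state of a visit from them. The class `OriginVisitUpdate`, its field names, and `InMemoryVisitStore` are illustrative assumptions, not the actual swh-model or swh-storage definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class OriginVisitUpdate:
    """Hypothetical stand-in for the model object; field names are assumptions."""

    origin: str
    visit: int
    date: datetime
    status: str
    snapshot: Optional[bytes] = None


class InMemoryVisitStore:
    """Hypothetical append-only store: each state change is a new update object."""

    def __init__(self) -> None:
        self._updates: Dict[Tuple[str, int], List[OriginVisitUpdate]] = {}

    def add_update(self, update: OriginVisitUpdate) -> None:
        # Never mutate a previous update; only append a new one.
        key = (update.origin, update.visit)
        self._updates.setdefault(key, []).append(update)

    def latest(self, origin: str, visit: int) -> Optional[OriginVisitUpdate]:
        # The current state of a visit is its most recent update by date.
        updates = self._updates.get((origin, visit), [])
        return max(updates, key=lambda u: u.date) if updates else None
```

With this shape, reading a visit means looking up its latest update instead of reading mutable columns on the visit itself.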

Test Plan

tox


Migrated from D2879 (view on Phabricator)

Activity

    • test_storage: Remove no longer relevant validation test
    • storage*: Align origin_visit_update interface and implementations

    still wip

  • Plug to a dedicated branch

  • Rebase on !365 (closed)

    Depends on !365 (closed)

    • Rebase on latest master
    • storage*: Align origin_visit_update interface and implementations
    • wip: pg storage: Start using origin_visit_update

    still wip

  • DONE:

    • in_memory: Adapt internal implementations to use origin visit update
    • storage*: Align origin_visit_update interface and implementations
    • wip: pg storage: Start using origin_visit_update
    • storage*: Add types to origin_visit_get
    • inline now (to amend in wip pg storage)
    • storage: Define a now() function
    • Add missing type annotations on origin_visit_get* endpoints
    • pg storage: Implement origin visit update
    • fix type

    TODO:

    • still wip
    • remains cassandra backend to implement

    Note: This will be split later.
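
    The commit "storage: Define a now() function" is listed above without its body. A plausible sketch of such a helper, assuming it returns a timezone-aware UTC timestamp (the real implementation is not shown on this page):

```python
from datetime import datetime, timezone


def now() -> datetime:
    """Return the current time as a timezone-aware UTC datetime.

    Assumption: the storage wants aware datetimes so that update dates
    compare consistently across backends.
    """
    return datetime.now(tz=timezone.utc)
```

    Centralizing this in one function also gives tests a single place to patch when they need a fixed date.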

    • Ensure visit id is set in origin_visit_upsert before kafka write
    • wip: cassandra: Implement origin_visit_update internally

    cassandra implementation is not complete yet
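
    The commit "Ensure visit id is set in origin_visit_upsert before kafka write" describes an ordering constraint: the visit id must be known before the object is sent to the journal. The sketch below illustrates that ordering only; `Visit`, `FakeJournal`, and `upsert_visits` are hypothetical names, not the swh-storage API.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Visit:
    """Hypothetical visit record; `visit` is None until an id is assigned."""

    origin: str
    date: datetime
    visit: Optional[int] = None


class FakeJournal:
    """Hypothetical journal writer that rejects objects lacking a visit id."""

    def __init__(self) -> None:
        self.written: List[Visit] = []

    def write(self, obj: Visit) -> None:
        if obj.visit is None:
            raise ValueError("visit id must be set before writing to the journal")
        self.written.append(obj)


def upsert_visits(
    visits: List[Visit], journal: FakeJournal, last_ids: Dict[str, int]
) -> List[Visit]:
    """Assign missing visit ids first, then write each visit to the journal."""
    stored = []
    for visit in visits:
        if visit.visit is None:
            # Allocate the next id for this origin before anything is written.
            last_ids[visit.origin] = last_ids.get(visit.origin, 0) + 1
            visit = replace(visit, visit=last_ids[visit.origin])
        else:
            # Keep the counter ahead of explicitly provided ids.
            last_ids[visit.origin] = max(last_ids.get(visit.origin, 0), visit.visit)
        journal.write(visit)
        stored.append(visit)
    return stored
```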

  • Build has FAILED

    Patch application report for D2879 (id=10397)

    Rebasing onto 623a1b75...

    First, rewinding head to replay your work on top of it...
    Applying: storage: Define a now() function
    Applying: storage*: Align origin_visit_update interface and implementations
    Applying: storage*: Add types to origin_visit_get
    Applying: in_memory: Adapt internal implementations to use origin visit update
    Applying: pg-storage: Start using origin_visit_update
    Applying: Add missing type annotations on origin_visit_get* endpoints
    Applying: pg-storage: Implement origin visit update
    Using index info to reconstruct a base tree...
    M	swh/storage/tests/test_storage.py
    M	swh/storage/validate.py
    Falling back to patching base and 3-way merge...
    Auto-merging swh/storage/validate.py
    CONFLICT (content): Merge conflict in swh/storage/validate.py
    Auto-merging swh/storage/tests/test_storage.py
    Patch failed at 0007 pg-storage: Implement origin visit update
    
    Resolve all conflicts manually, mark them as resolved with
    "git add/rm <conflicted_files>", then run "git rebase --continue".
    You can instead skip this commit: run "git rebase --skip".
    To abort and get back to the state before "git rebase", run "git rebase --abort".
    

    Rebase failed (ret=1)!

    Could not rebase; Attempt merge onto 623a1b75...

    Already up to date.
    Changes applied before test
    commit 62e7718868af8715d110703b9182130ba04bdc2d
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Mon Mar 30 17:01:13 2020 +0200
    
        Ensure visit id is set in origin_visit_upsert before kafka write
    
    commit 0a0953de12d1b63681a29f1622943434f7f04c02
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Mon Mar 30 13:08:32 2020 +0200
    
        wip: cassandra: Implement origin_visit_update internally
        
        methods:
        - origin_visit_add
        - origin_visit_update
        - origin_visit_upsert
        - origin_visit_get
        - origin_visit_find_by_date
        - origin_visit_get_by
        - origin_visit_get_latest
    
    commit dd54c6a2ad96fec391252bd393e6d30ec6cc0030
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Fri Mar 27 17:38:58 2020 +0100
    
        pg-storage: Implement origin visit update
    
    commit f6ad731889ed78a656b294a85723a4c55e7bdcac
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Fri Mar 27 17:28:56 2020 +0100
    
        Add missing type annotations on origin_visit_get* endpoints
    
    commit ffff91e1b08856ab0af1d4b0c307795bf60c4c72
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Thu Mar 26 14:15:17 2020 +0100
    
        pg-storage: Start using origin_visit_update
    
    commit f07b0edc98f992a1707b9cf2ae9842ddd0b2c5bc
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Wed Mar 25 17:53:48 2020 +0100
    
        in_memory: Adapt internal implementations to use origin visit update
        
        (pairing with @vlorentz)
        
        Related to [T2310](https://forge.softwareheritage.org/T2310 'view original for T2310 on Phabricator')
    
    commit 0961d931edc1c68171e48e9a96e56026ec48b8ea
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Fri Mar 27 14:40:36 2020 +0100
    
        storage*: Add types to origin_visit_get
    
    commit 5859dc794a1d8fa1a43049506ef9711d225a0b09
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Thu Mar 26 08:57:18 2020 +0100
    
        storage*: Align origin_visit_update interface and implementations
    
    commit dd1dd1c62b2bed4492130894265f8b2ed22d9daa
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Fri Mar 27 15:08:35 2020 +0100
    
        storage: Define a now() function

    Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/29/
    See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/29/console

  • Finish cassandra implementation

  • Build has FAILED

    Patch application report for D2879 (id=10401)

    Rebasing onto 623a1b75...

    First, rewinding head to replay your work on top of it...
    Applying: storage: Define a now() function
    Applying: storage*: Align origin_visit_update interface and implementations
    Applying: storage*: Add types to origin_visit_get
    Applying: cassandra/cql: Simplify type using Iterator
    Applying: in_memory: Adapt internal implementations to use origin visit update
    Applying: pg-storage: Start using origin_visit_update
    Applying: Add missing type annotations on origin_visit_get* endpoints
    Applying: pg-storage: Adapt internal implementations to use origin visit update
    Using index info to reconstruct a base tree...
    M	swh/storage/tests/test_storage.py
    M	swh/storage/validate.py
    Falling back to patching base and 3-way merge...
    Auto-merging swh/storage/validate.py
    CONFLICT (content): Merge conflict in swh/storage/validate.py
    Auto-merging swh/storage/tests/test_storage.py
    Patch failed at 0008 pg-storage: Adapt internal implementations to use origin visit update
    
    Resolve all conflicts manually, mark them as resolved with
    "git add/rm <conflicted_files>", then run "git rebase --continue".
    You can instead skip this commit: run "git rebase --skip".
    To abort and get back to the state before "git rebase", run "git rebase --abort".
    

    Rebase failed (ret=1)!

    Could not rebase; Attempt merge onto 623a1b75...

    Already up to date.
    Changes applied before test
    commit 9fa8a16610ecd60d8fd76406f1248378af9e0a3d
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Mon Mar 30 17:01:13 2020 +0200
    
        Ensure visit id is set in origin_visit_upsert before kafka write
    
    commit b41430fac90e15c6482719a4f025748c6a9059f2
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Mon Mar 30 13:08:32 2020 +0200
    
        cassandra: Adapt internal implementations to use origin visit update
    
    commit 28ba789262abec36e55cd2f67e6a25d6fe9a1902
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Fri Mar 27 17:38:58 2020 +0100
    
        pg-storage: Adapt internal implementations to use origin visit update
    
    commit 5b765bd879c3d783f44a5e7121586799da401886
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Fri Mar 27 17:28:56 2020 +0100
    
        Add missing type annotations on origin_visit_get* endpoints
    
    commit 8a66356234f3925e7b2ef5658e3f12dbebc2d6a8
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Thu Mar 26 14:15:17 2020 +0100
    
        pg-storage: Start using origin_visit_update
    
    commit 46d7ebfabda7b6169eb8f1219e8a02d215f6355d
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Wed Mar 25 17:53:48 2020 +0100
    
        in_memory: Adapt internal implementations to use origin visit update
        
        (pairing with @vlorentz)
        
        Related to [T2310](https://forge.softwareheritage.org/T2310 'view original for T2310 on Phabricator')
    
    commit f3824d3ccc1a178f7d09c2036ce91a4343cecc6c
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Mon Mar 30 20:30:44 2020 +0200
    
        cassandra/cql: Simplify type using Iterator
    
    commit 0961d931edc1c68171e48e9a96e56026ec48b8ea
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Fri Mar 27 14:40:36 2020 +0100
    
        storage*: Add types to origin_visit_get
    
    commit 5859dc794a1d8fa1a43049506ef9711d225a0b09
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Thu Mar 26 08:57:18 2020 +0100
    
        storage*: Align origin_visit_update interface and implementations
    
    commit dd1dd1c62b2bed4492130894265f8b2ed22d9daa
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Fri Mar 27 15:08:35 2020 +0100
    
        storage: Define a now() function

    Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/30/
    See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/30/console

  • Add migration script (schema, data)

  • Build has FAILED

    Patch application report for D2879 (id=10402)

    Rebasing onto 623a1b75...

    First, rewinding head to replay your work on top of it...
    Applying: storage: Define a now() function
    Applying: storage*: Align origin_visit_update interface and implementations
    Applying: storage*: Add types to origin_visit_get
    Applying: cassandra/cql: Simplify type using Iterator
    Applying: in_memory: Adapt internal implementations to use origin visit update
    Applying: pg-storage: Start using origin_visit_update
    Applying: Add missing type annotations on origin_visit_get* endpoints
    Applying: pg-storage: Adapt internal implementations to use origin visit update
    Using index info to reconstruct a base tree...
    M	swh/storage/tests/test_storage.py
    M	swh/storage/validate.py
    Falling back to patching base and 3-way merge...
    Auto-merging swh/storage/validate.py
    CONFLICT (content): Merge conflict in swh/storage/validate.py
    Auto-merging swh/storage/tests/test_storage.py
    Patch failed at 0008 pg-storage: Adapt internal implementations to use origin visit update
    
    Resolve all conflicts manually, mark them as resolved with
    "git add/rm <conflicted_files>", then run "git rebase --continue".
    You can instead skip this commit: run "git rebase --skip".
    To abort and get back to the state before "git rebase", run "git rebase --abort".
    

    Rebase failed (ret=1)!

    Could not rebase; Attempt merge onto 623a1b75...

    Already up to date.
    Changes applied before test
    commit 72d2928e96ebcfd4e73640a70ef063fb5650dc9e
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Mon Mar 30 17:01:13 2020 +0200
    
        Ensure visit id is set in origin_visit_upsert before kafka write
    
    commit 1b7e7c795e837f70ca0c45ea58d61a1981f5387e
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Mon Mar 30 13:08:32 2020 +0200
    
        cassandra: Adapt internal implementations to use origin visit update
    
    commit 6959e96817a49eb2955d7be148edaeb878993d6b
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Fri Mar 27 17:38:58 2020 +0100
    
        pg-storage: Adapt internal implementations to use origin visit update
    
    commit 5b765bd879c3d783f44a5e7121586799da401886
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Fri Mar 27 17:28:56 2020 +0100
    
        Add missing type annotations on origin_visit_get* endpoints
    
    commit 8a66356234f3925e7b2ef5658e3f12dbebc2d6a8
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Thu Mar 26 14:15:17 2020 +0100
    
        pg-storage: Start using origin_visit_update
    
    commit 46d7ebfabda7b6169eb8f1219e8a02d215f6355d
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Wed Mar 25 17:53:48 2020 +0100
    
        in_memory: Adapt internal implementations to use origin visit update
        
        (pairing with @vlorentz)
        
        Related to [T2310](https://forge.softwareheritage.org/T2310 'view original for T2310 on Phabricator')
    
    commit f3824d3ccc1a178f7d09c2036ce91a4343cecc6c
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Mon Mar 30 20:30:44 2020 +0200
    
        cassandra/cql: Simplify type using Iterator
    
    commit 0961d931edc1c68171e48e9a96e56026ec48b8ea
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Fri Mar 27 14:40:36 2020 +0100
    
        storage*: Add types to origin_visit_get
    
    commit 5859dc794a1d8fa1a43049506ef9711d225a0b09
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Thu Mar 26 08:57:18 2020 +0100
    
        storage*: Align origin_visit_update interface and implementations
    
    commit dd1dd1c62b2bed4492130894265f8b2ed22d9daa
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Fri Mar 27 15:08:35 2020 +0100
    
        storage: Define a now() function

    Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/31/
    See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/31/console

    • test_retry: Use datetime instead of string
  • Build has FAILED

    Patch application report for D2879 (id=10415)

    Rebasing onto 623a1b75...

    First, rewinding head to replay your work on top of it...
    Applying: storage: Define a now() function
    Applying: storage*: Align origin_visit_update interface and implementations
    Applying: storage*: Add types to origin_visit_get
    Applying: cassandra/cql: Simplify type using Iterator
    Applying: in_memory: Adapt internal implementations to use origin visit update
    Applying: pg-storage: Start using origin_visit_update
    Applying: Add missing type annotations on origin_visit_get* endpoints
    Applying: pg-storage: Adapt internal implementations to use origin visit update
    Using index info to reconstruct a base tree...
    M	swh/storage/tests/test_storage.py
    M	swh/storage/validate.py
    Falling back to patching base and 3-way merge...
    Auto-merging swh/storage/validate.py
    CONFLICT (content): Merge conflict in swh/storage/validate.py
    Auto-merging swh/storage/tests/test_storage.py
    Patch failed at 0008 pg-storage: Adapt internal implementations to use origin visit update
    
    Resolve all conflicts manually, mark them as resolved with
    "git add/rm <conflicted_files>", then run "git rebase --continue".
    You can instead skip this commit: run "git rebase --skip".
    To abort and get back to the state before "git rebase", run "git rebase --abort".
    

    Rebase failed (ret=1)!

    Could not rebase; Attempt merge onto 623a1b75...

    Already up to date.
    Changes applied before test
    commit 05ed7d143ad0794831ee2031d4a7cc7a1e89a89e
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Tue Mar 31 12:51:26 2020 +0200
    
        test_retry: Use datetime instead of string
    
    commit 72d2928e96ebcfd4e73640a70ef063fb5650dc9e
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Mon Mar 30 17:01:13 2020 +0200
    
        Ensure visit id is set in origin_visit_upsert before kafka write
    
    commit 1b7e7c795e837f70ca0c45ea58d61a1981f5387e
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Mon Mar 30 13:08:32 2020 +0200
    
        cassandra: Adapt internal implementations to use origin visit update
    
    commit 6959e96817a49eb2955d7be148edaeb878993d6b
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Fri Mar 27 17:38:58 2020 +0100
    
        pg-storage: Adapt internal implementations to use origin visit update
    
    commit 5b765bd879c3d783f44a5e7121586799da401886
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Fri Mar 27 17:28:56 2020 +0100
    
        Add missing type annotations on origin_visit_get* endpoints
    
    commit 8a66356234f3925e7b2ef5658e3f12dbebc2d6a8
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Thu Mar 26 14:15:17 2020 +0100
    
        pg-storage: Start using origin_visit_update
    
    commit 46d7ebfabda7b6169eb8f1219e8a02d215f6355d
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Wed Mar 25 17:53:48 2020 +0100
    
        in_memory: Adapt internal implementations to use origin visit update
        
        (pairing with @vlorentz)
        
        Related to [T2310](https://forge.softwareheritage.org/T2310 'view original for T2310 on Phabricator')
    
    commit f3824d3ccc1a178f7d09c2036ce91a4343cecc6c
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Mon Mar 30 20:30:44 2020 +0200
    
        cassandra/cql: Simplify type using Iterator
    
    commit 0961d931edc1c68171e48e9a96e56026ec48b8ea
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Fri Mar 27 14:40:36 2020 +0100
    
        storage*: Add types to origin_visit_get
    
    commit 5859dc794a1d8fa1a43049506ef9711d225a0b09
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Thu Mar 26 08:57:18 2020 +0100
    
        storage*: Align origin_visit_update interface and implementations
    
    commit dd1dd1c62b2bed4492130894265f8b2ed22d9daa
    Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
    Date:   Fri Mar 27 15:08:35 2020 +0100
    
        storage: Define a now() function

    Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/32/
    See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/32/console

  • Rebase on latest master
