Introduce new scheduling policy to grab origins without last update

Related to #2345

Test Plan

tox


Migrated from D5956 (view on Phabricator)

Activity

  • Build has FAILED

    Patch application report for D5956 (id=21386)

    Could not rebase; Attempt merge onto 1006f0ae...

    Updating 1006f0a..97d7828
    Fast-forward
     sql/updates/29.sql                         |  27 +++++++
     swh/scheduler/backend.py                   |  57 ++++++++++---
     swh/scheduler/interface.py                 |  21 +++++
     swh/scheduler/journal_client.py            |  98 ++++++++++++++++++++---
     swh/scheduler/model.py                     |   8 ++
     swh/scheduler/sql/30-schema.sql            |  21 ++++-
     swh/scheduler/tests/test_api_client.py     |   2 +
     swh/scheduler/tests/test_journal_client.py | 115 ++++++++++++++++++---------
     swh/scheduler/tests/test_scheduler.py      | 123 ++++++++++++++++++++++++++---
     9 files changed, 403 insertions(+), 69 deletions(-)
     create mode 100644 sql/updates/29.sql
    Changes applied before test
    commit 97d7828110f5df2160084090c831ade723524c0f
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Thu Jul 1 12:18:49 2021 +0200
    
        Introduce new scheduling policy to grab origins without last update
        
        Related to #2345
    
    commit f3d182b0d38ca6c617805bf2011d2001a0c8bb6c
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Tue Jun 29 16:00:01 2021 +0200
    
        journal_client: Compute next position for origin visit
        
        For origin without any last_update information [1], the journal client is now also in
        charge of moving their next position in the queue for rescheduling. Depending on their
        status, the next position offset and next_visit_queue_position are updated after each
        visit completes:
        
        - if the visit has failed, increase the next visit target by the minimal visit
          interval (to take into account transient loading issues)
        - if the visit is successful, and records some changes, decrease the visit interval
          index by 2 (visit the origin *way* more often).
        - if the visit is successful, and records no changes, increase the visit interval index
          by 1 (visit the origin less often).
        
        We then set the next visit target to its current value + the new visit interval
        multiplied by a random fudge factor (picked in the -/+ 10% range).
        
        The fudge factor allows the visits to spread out, avoiding "bursts" of loaded origins
        e.g. when a number of origins from a single hoster are processed at once.
        
        Note that the computations happen for all origins for simplicity and code maintenance
        but it will only be used by a new soon-to-be scheduling policy.
        
        - [1] Lister cannot provide it for some reason.
    
    commit cb1edf1ab24d1c8db5821578a7fb2633fab50ff4
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Wed Jun 23 18:07:59 2021 +0200
    
        Introduce storage for the recurrent visit scheduler queue position
    
    commit ec6e69f6415a007611c46f25e7c48e909a793d53
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Wed Jun 23 16:42:26 2021 +0200
    
        Start handling of recurrent loading tasks in scheduler
        
        This deals first and foremost with the next_position_offset update done by the scheduler
        journal client.
    
    commit c486b28ece7c0b127fea10bbb4d7f5d1ad5c50ba
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Tue Jun 29 14:41:07 2021 +0200
    
        journal_client: Explicit docstring
    
    commit 98f99b9fd457820dc2d4b5dab7e89cb8261a34a4
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Wed Jun 23 16:39:40 2021 +0200
    
        journal_client: Only check last_* fields for some permutation tests
        
        In a future commit, we will add new fields whose values will be permutation dependent.

    Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/400/
    See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/400/console
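
For readers skimming the log, the rescheduling rule described in the "journal_client: Compute next position for origin visit" commit can be sketched roughly as follows. This is only an illustration under assumptions, not the code from this diff: the `VISIT_INTERVALS` table, the `next_visit` helper, the clamping of the index to the table bounds, and the reading that the fudge factor also applies after a failed visit are all hypothetical.

```python
import random
from datetime import datetime, timedelta, timezone

# Hypothetical interval table; the actual values are not given in this thread.
VISIT_INTERVALS = [timedelta(days=d) for d in (1, 2, 4, 8, 16, 32, 64)]


def next_visit(next_visit_queue_position, next_position_offset, status, changed,
               rng=random):
    """Return (new queue position, new interval index) after a completed visit."""
    if status == "failed":
        # Failed visit: keep the index and retry after the minimal interval.
        new_offset = next_position_offset
        interval = VISIT_INTERVALS[0]
    else:
        # Successful visit: index - 2 if changes were recorded, index + 1 if not.
        step = -2 if changed else 1
        # Assumption: the index is clamped to the bounds of the table.
        new_offset = max(0, min(next_position_offset + step,
                                len(VISIT_INTERVALS) - 1))
        interval = VISIT_INTERVALS[new_offset]
    # Random fudge factor in the -/+ 10% range, to spread visits out.
    fudge = rng.uniform(0.9, 1.1)
    return next_visit_queue_position + interval * fudge, new_offset


start = datetime(2021, 7, 1, tzinfo=timezone.utc)
position, offset = next_visit(start, 3, "successful", changed=True)
```

With the table above, an origin at index 3 that recorded changes moves to index 1, so its next position lands about two days after the current one, give or take 10%.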

  • Rebase so tests are happy

  • Build is green

    Patch application report for D5956 (id=21403)

    Could not rebase; Attempt merge onto 1006f0ae...

    Updating 1006f0a..cebd184
    Fast-forward
     sql/updates/29.sql                         |  27 +++
     swh/scheduler/backend.py                   |  57 ++++-
     swh/scheduler/interface.py                 |  21 ++
     swh/scheduler/journal_client.py            |  98 ++++++++-
     swh/scheduler/model.py                     |   8 +
     swh/scheduler/sql/30-schema.sql            |  21 +-
     swh/scheduler/tests/test_api_client.py     |   2 +
     swh/scheduler/tests/test_journal_client.py | 336 ++++++++++++++++++-----------
     swh/scheduler/tests/test_scheduler.py      | 123 +++++++++--
     9 files changed, 533 insertions(+), 160 deletions(-)
     create mode 100644 sql/updates/29.sql
    Changes applied before test
    commit cebd1842e1be4738c280125ea0bbeb30bf40180a
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Thu Jul 1 12:18:49 2021 +0200
    
        Introduce new scheduling policy to grab origins without last update
        
        Related to #2345
    
    commit faac6f895e0bd3565441c48c5ad3207ce141e8cb
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Tue Jun 29 16:00:01 2021 +0200
    
        journal_client: Compute next position for origin visit
        
        For origin without any last_update information [1], the journal client is now also in
        charge of moving their next position in the queue for rescheduling. Depending on their
        status, the next position offset and next_visit_queue_position are updated after each
        visit completes:
        
        - if the visit has failed, increase the next visit target by the minimal visit
          interval (to take into account transient loading issues)
        - if the visit is successful, and records some changes, decrease the visit interval
          index by 2 (visit the origin *way* more often).
        - if the visit is successful, and records no changes, increase the visit interval index
          by 1 (visit the origin less often).
        
        We then set the next visit target to its current value + the new visit interval
        multiplied by a random fudge factor (picked in the -/+ 10% range).
        
        The fudge factor allows the visits to spread out, avoiding "bursts" of loaded origins
        e.g. when a number of origins from a single hoster are processed at once.
        
        Note that the computations happen for all origins for simplicity and code maintenance
        but it will only be used by a new soon-to-be scheduling policy.
        
        - [1] Lister cannot provide it for some reason.
    
    See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/402/ for more details.

  • Clarify test a bit

  • Build is green

    Patch application report for D5956 (id=21407)

    Could not rebase; Attempt merge onto 1006f0ae...

    Updating 1006f0a..cc41f0c
    Fast-forward
     sql/updates/29.sql                         |  27 +++
     swh/scheduler/backend.py                   |  59 ++++-
     swh/scheduler/interface.py                 |  21 ++
     swh/scheduler/journal_client.py            |  98 ++++++++-
     swh/scheduler/model.py                     |   8 +
     swh/scheduler/sql/30-schema.sql            |  21 +-
     swh/scheduler/tests/test_api_client.py     |   2 +
     swh/scheduler/tests/test_journal_client.py | 336 ++++++++++++++++++-----------
     swh/scheduler/tests/test_scheduler.py      | 140 ++++++++++--
     9 files changed, 552 insertions(+), 160 deletions(-)
     create mode 100644 sql/updates/29.sql
    Changes applied before test
    commit cc41f0cd579011034e52c245738929fb77fe4a01
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Thu Jul 1 12:18:49 2021 +0200
    
        Introduce new scheduling policy to grab origins without last update
        
        Related to #2345
    
    See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/403/ for more details.

  • Adapt according to review

  • Build is green

    Patch application report for D5956 (id=21518)

    Could not rebase; Attempt merge onto 1006f0ae...

    Updating 1006f0a..e3fc744
    Fast-forward
     sql/updates/29.sql                         |  27 ++
     swh/scheduler/backend.py                   |  57 +++-
     swh/scheduler/interface.py                 |  19 ++
     swh/scheduler/journal_client.py            | 136 +++++++++-
     swh/scheduler/model.py                     |   8 +
     swh/scheduler/sql/30-schema.sql            |  21 +-
     swh/scheduler/tests/test_api_client.py     |   2 +
     swh/scheduler/tests/test_journal_client.py | 411 ++++++++++++++++++++---------
     swh/scheduler/tests/test_scheduler.py      | 173 +++++++++++-
     9 files changed, 695 insertions(+), 159 deletions(-)
     create mode 100644 sql/updates/29.sql
    Changes applied before test
    commit e3fc744f5224c29863b621bcc44a26877b97cc99
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Thu Jul 1 12:18:49 2021 +0200
    
        Introduce new scheduling policy to grab origins without last update
        
        This is in charge of scheduling origins without last update. This also updates the
        global queue position so the journal client can initialize correctly the next position
        per origin and visit type.
        
        Related to #2345
    
    commit 8c4ae9f14d6abdca41a4f01b438310501ecb6259
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Tue Jun 29 16:00:01 2021 +0200
    
        journal_client: Compute next position for origin visit
        
        For origin without any last_update information [1], the journal client is now also in
        charge of moving their next position in the queue for rescheduling. Depending on their
        status, the next position offset and next_visit_queue_position are updated after each
        visit completes:
        
        - if the visit has failed, increase the next visit target by the minimal visit
          interval (to take into account transient loading issues)
        - if the visit is successful, and records some changes, decrease the visit interval
          index by 2 (visit the origin *way* more often).
        - if the visit is successful, and records no changes, increase the visit interval index
          by 1 (visit the origin less often).
        
        We then set the next visit target to its current value + the new visit interval
        multiplied by a random fudge factor (picked in the -/+ 10% range).
        
        The fudge factor allows the visits to spread out, avoiding "bursts" of loaded origins
        e.g. when a number of origins from a single hoster are processed at once.
        
        Note that the computations happen for all origins for simplicity and code maintenance
        but it will only be used by a new soon-to-be scheduling policy.
        
        - [1] Lister cannot provide it for some reason.
    
    See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/407/ for more details.
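
The updated commit message says the new policy schedules origins without a last update and also moves the global queue position per visit type. A minimal in-memory sketch of that idea, under stated assumptions, might look like the following. The `Origin` dataclass, the `grab_origins_without_last_update` helper, the "never-queued origins first" ordering, and the "advance to the newest selected position" rule are all hypothetical; the actual change operates on the scheduler database and may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Origin:
    url: str
    visit_type: str
    last_update: Optional[datetime] = None
    next_visit_queue_position: Optional[datetime] = None


def grab_origins_without_last_update(
    origins: List[Origin],
    queue_position: Dict[str, datetime],
    visit_type: str,
    count: int,
) -> List[Origin]:
    """Pick up to `count` origins lacking last_update and advance the queue."""
    candidates = [
        o for o in origins
        if o.visit_type == visit_type and o.last_update is None
    ]
    # Assumption: origins never queued before go first, then oldest position.
    never_queued = [o for o in candidates if o.next_visit_queue_position is None]
    queued = sorted(
        (o for o in candidates if o.next_visit_queue_position is not None),
        key=lambda o: o.next_visit_queue_position,
    )
    selected = (never_queued + queued)[:count]

    # Assumption: the global position for this visit type only moves forward,
    # up to the newest position among the selected origins.
    positions = [
        o.next_visit_queue_position
        for o in selected
        if o.next_visit_queue_position is not None
    ]
    if positions:
        current = queue_position.get(visit_type)
        newest = max(positions)
        queue_position[visit_type] = newest if current is None else max(current, newest)
    return selected
```

Keeping the global position per visit type is what would let a journal client seed a starting position for an origin it has not seen before, as the commit message describes.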

  • Revert unneeded modifications

  • Build is green

    Patch application report for D5956 (id=21519)

    Could not rebase; Attempt merge onto 1006f0ae...

    Updating 1006f0a..474a337
    Fast-forward
     sql/updates/29.sql                         |  27 ++
     swh/scheduler/backend.py                   |  43 ++-
     swh/scheduler/interface.py                 |  19 ++
     swh/scheduler/journal_client.py            | 136 +++++++++-
     swh/scheduler/model.py                     |   8 +
     swh/scheduler/sql/30-schema.sql            |  21 +-
     swh/scheduler/tests/test_api_client.py     |   2 +
     swh/scheduler/tests/test_journal_client.py | 411 ++++++++++++++++++++---------
     swh/scheduler/tests/test_scheduler.py      | 173 +++++++++++-
     9 files changed, 687 insertions(+), 153 deletions(-)
     create mode 100644 sql/updates/29.sql
    Changes applied before test
    commit 474a3379d53f876241c265fb6619c7dd3910199d
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Thu Jul 1 12:18:49 2021 +0200
    
        Introduce new scheduling policy to grab origins without last update
        
        This is in charge of scheduling origins without last update. This also updates the
        global queue position so the journal client can initialize correctly the next position
        per origin and visit type.
        
        Related to #2345
    
    See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/408/ for more details.

  • Revert one last unneeded change

  • Build is green

    Patch application report for D5956 (id=21521)

    Could not rebase; Attempt merge onto 1006f0ae...

    Updating 1006f0a..b02db7c
    Fast-forward
     sql/updates/29.sql                         |  27 ++
     swh/scheduler/backend.py                   |  41 ++-
     swh/scheduler/interface.py                 |  19 ++
     swh/scheduler/journal_client.py            | 136 +++++++++-
     swh/scheduler/model.py                     |   8 +
     swh/scheduler/sql/30-schema.sql            |  21 +-
     swh/scheduler/tests/test_api_client.py     |   2 +
     swh/scheduler/tests/test_journal_client.py | 411 ++++++++++++++++++++---------
     swh/scheduler/tests/test_scheduler.py      | 173 +++++++++++-
     9 files changed, 686 insertions(+), 152 deletions(-)
     create mode 100644 sql/updates/29.sql
    Changes applied before test
    commit b02db7ce6222feeb5db7a7aff83a11c3a3697bd3
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Thu Jul 1 12:18:49 2021 +0200
    
        Introduce new scheduling policy to grab origins without last update
        
        This is in charge of scheduling origins without last update. This also updates the
        global queue position so the journal client can initialize correctly the next position
        per origin and visit type.
        
        Related to #2345
    
    See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/409/ for more details.

  • make the handling of CTEs more modular

  • Build is green

    Patch application report for D5956 (id=21740)

    Could not rebase; Attempt merge onto 1006f0ae...

    Updating 1006f0a..d58776a
    Fast-forward
     sql/updates/29.sql                         |  27 ++
     swh/scheduler/backend.py                   | 116 +++++---
     swh/scheduler/interface.py                 |  19 ++
     swh/scheduler/journal_client.py            | 136 +++++++++-
     swh/scheduler/model.py                     |   8 +
     swh/scheduler/sql/30-schema.sql            |  21 +-
     swh/scheduler/tests/test_api_client.py     |   2 +
     swh/scheduler/tests/test_journal_client.py | 411 ++++++++++++++++++++---------
     swh/scheduler/tests/test_scheduler.py      | 173 +++++++++++-
     9 files changed, 730 insertions(+), 183 deletions(-)
     create mode 100644 sql/updates/29.sql
    Changes applied before test
    commit d58776ab0b41ccaf93cc64c86688712db5b44c07
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Thu Jul 22 12:22:24 2021 +0200
    
        Introduce new scheduling policy to grab origins without last update
        
        This is in charge of scheduling origins without last update. This also updates the
        global queue position so the journal client can initialize correctly the next position
        per origin and visit type.
        
        Related to #2345
    
    commit 825e8cfe7d245d025c70384439d0f739b878eadd
    Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
    Date:   Thu Jul 22 12:19:42 2021 +0200
    
        grab_next_visits: make the handling of CTEs more modular
        
        This allows us to insert extra CTEs if a scheduling policy needs it.
    
    See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/417/ for more details.
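
The "grab_next_visits: make the handling of CTEs more modular" commit is about letting a scheduling policy insert extra CTEs into the query. One way to picture that, as a sketch only, is to hold the CTEs as a list of (name, body) pairs and assemble the `WITH` clause at the end. The `build_query` helper and the `origins` table with its columns are hypothetical, and SQLite is used here only so the example runs without a database server.

```python
import sqlite3
from typing import List, Tuple


def build_query(ctes: List[Tuple[str, str]], final_select: str) -> str:
    """Join named CTEs into a single WITH clause ahead of the final SELECT."""
    if not ctes:
        return final_select
    with_clause = ",\n".join(f"{name} AS (\n{body}\n)" for name, body in ctes)
    return f"WITH {with_clause}\n{final_select}"


# Base query: one CTE selecting candidate rows (hypothetical table and columns).
ctes = [("selected", "SELECT url, position FROM origins ORDER BY position LIMIT 2")]
# A policy that needs extra bookkeeping appends its own CTE.
ctes.append(("newest", "SELECT MAX(position) AS position FROM selected"))
query = build_query(ctes, "SELECT position FROM newest")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE origins (url TEXT, position INTEGER)")
db.executemany("INSERT INTO origins VALUES (?, ?)", [("a", 3), ("b", 1), ("c", 2)])
result = db.execute(query).fetchone()
```

Because each CTE is just an entry in a list, a policy can add or skip its own step without touching the string that builds the base query.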

  • Antoine R. Dumont mentioned in merge request !328 (closed)

  • Antoine R. Dumont mentioned in issue #2345

  • Antoine R. Dumont mentioned in merge request !190 (closed)

  • Even better ;)

  • Merge request was accepted

  • Antoine R. Dumont approved this merge request
