- Oct 28, 2021
-
-
For each known visit type, we run a loop which:
- monitors the size of the relevant celery queue;
- schedules more visits of the relevant type once the number of available slots goes over a given threshold (currently set to 5% of the max queue size).

The scheduling of visits combines multiple scheduling policies, for now using static ratios set in the `POLICY_RATIOS` dict. We emit a warning if the ratio of origins fetched for each policy is skewed with respect to the original request (allowing, for now, manual adjustment of the ratios).

The CLI endpoint spawns one thread for each visit type; each thread handles its connections to RabbitMQ and the scheduler backend separately. For now, we handle exceptions in the visit scheduling threads by (stupidly) respawning the relevant thread directly. We should probably improve this to give up after a specific number of tries.

Co-authored-by: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
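The refill loop described above can be sketched roughly as follows. All names, ratios, and sizes here are illustrative assumptions, not the actual `swh-scheduler` code; the real backend call is replaced by a stand-in.

```python
# Hypothetical static ratios; keys and values are illustrative, not the
# actual POLICY_RATIOS content from the codebase.
POLICY_RATIOS = {
    "already_visited_order_by_lag": 0.5,
    "never_visited_oldest_update_first": 0.5,
}

MAX_QUEUE_SIZE = 1000
THRESHOLD = 0.05  # refill once free slots exceed 5% of the max queue size


def grab_origins(policy: str, count: int) -> list:
    """Stand-in for the scheduler backend call that fetches origins."""
    # In the real code this queries the scheduler database.
    return [f"{policy}-origin-{i}" for i in range(count)]


def schedule_once(current_queue_size: int) -> list:
    """One iteration of the refill loop for a single visit type."""
    free_slots = MAX_QUEUE_SIZE - current_queue_size
    if free_slots < MAX_QUEUE_SIZE * THRESHOLD:
        return []  # queue is full enough, nothing to do
    scheduled = []
    for policy, ratio in POLICY_RATIOS.items():
        wanted = int(free_slots * ratio)
        origins = grab_origins(policy, wanted)
        if len(origins) < wanted:
            # skewed ratio with respect to the request: warn, so the
            # static ratios can be adjusted manually
            print(f"warning: policy {policy} returned {len(origins)}/{wanted} origins")
        scheduled.extend(origins)
    return scheduled
```

One such loop would run per visit type, each in its own thread.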
- Oct 27, 2021
-
-
Nicolas Dandrimont authored
When the database is in a non-UTC timezone with DST, and a `timestamptz - interval` calculation crosses a DST change, the result of the calculation can be one hour off from the expected value: PostgreSQL will vary the timestamp by the number of days in the interval, keeping the same (local) wall-clock time, which ends up offset by an hour because of the DST change. Doing the datetime +- timedelta calculations in Python instead of PostgreSQL avoids this caveat altogether.
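A minimal illustration of the pitfall, using Europe/Paris as an example zone (arithmetic on an aware datetime in Python keeps the wall-clock time within the zone, mimicking PostgreSQL's behavior, while arithmetic on a UTC datetime moves along the absolute timeline):

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

paris = ZoneInfo("Europe/Paris")

# A timestamp shortly after the 2021-10-31 DST change (clocks went back).
ts = datetime(2021, 10, 31, 12, 0, tzinfo=paris)

# Wall-clock arithmetic: same local time, one day earlier, but the
# UTC offset changed from +02:00 to +01:00 across the DST boundary.
wall_clock = ts - timedelta(days=1)

# Absolute-time arithmetic: convert to UTC first, then subtract.
absolute = ts.astimezone(timezone.utc) - timedelta(days=1)

# The two results differ by the one-hour DST shift.
assert absolute - wall_clock == timedelta(hours=1)
```

Doing the arithmetic on UTC datetimes in Python, as this change does, always yields the absolute-time result.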
-
- Oct 22, 2021
-
-
Antoine R. Dumont authored
Otherwise, in some edge cases, such as when run in Docker, the install fails on a conflict. Related to P1205#8092
-
- Oct 20, 2021
-
-
Antoine R. Dumont authored
Related to T3667
-
Antoine R. Dumont authored
It's been deprecated for enough time. Related to T3667
-
Antoine R. Dumont authored
-
- Oct 18, 2021
-
-
Antoine R. Dumont authored
This actually fixes the debian build failure. Related to T3666
- Oct 15, 2021
-
-
Antoine R. Dumont authored
This scenario happens with the one-shot loader, for example: that loader deals with more than one type of origin to ingest in the same queue, so the computation in that function could return a negative value [1], which is ultimately not possible to execute in SQL. This commit fixes that behavior, and the docstring now makes explicit that the function must return positive values. [1] ``` ... psycopg2.errors.InvalidRowCountInLimitClause: LIMIT must not be negative ```
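The fix amounts to clamping the computed row count so it can never go below zero before it reaches the SQL `LIMIT` clause. A sketch with a hypothetical signature (not the actual function name):

```python
def visits_to_schedule(max_queue_size: int, queued: int) -> int:
    """Number of visits to fetch for a queue (illustrative helper).

    Several visit types may share one queue, so `queued` can exceed
    `max_queue_size`; clamp the result so the SQL LIMIT is never
    negative.
    """
    return max(0, max_queue_size - queued)
```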
- Sep 02, 2021
-
-
Antoine R. Dumont authored
- Aug 27, 2021
-
-
Antoine R. Dumont authored
In the non-optimal case, we may want to trigger specific cases (not-yet-enabled origins, origins from a specific lister, ...). Related to T3350
-
- Aug 26, 2021
-
-
Nicolas Dandrimont authored
For origins that have never been visited, and for which we don't have a queue position yet, we want to visit them in the order they've been added.
-
Nicolas Dandrimont authored
The subcommand bypasses the legacy task-based mechanism to directly send new origin visits to celery
-
Nicolas Dandrimont authored
Running common operations on all git origins is pretty intense. Using table sampling gives us the opportunity to at least schedule some jobs in (decently small) time.
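The idea is to use PostgreSQL's `TABLESAMPLE` clause, which scans only a random fraction of the table's pages instead of the whole table. A sketch of what such a query could look like (table name, column names, and the sampling percentage are illustrative, not the actual query):

```python
# TABLESAMPLE SYSTEM (0.1) reads a random ~0.1% of the table's pages,
# so picking candidate git origins stays fast even on a huge table.
# Hypothetical schema: the real query and table differ.
query = """
SELECT url
FROM origin_visit_stats TABLESAMPLE SYSTEM (0.1)
WHERE visit_type = 'git'
LIMIT %(limit)s
"""
```

The tradeoff is that sampling is not uniform over rows (it is uniform over pages) and may return fewer rows than requested, which is acceptable when the goal is just to schedule *some* jobs quickly.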
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
Queue positions are dates, and the current next_position_offset used to compute the new queue position was not bounded. This had the side effect of causing overflow errors. This commit adapts the journal client computations to limit next_position_offset to 10. This value was chosen because above that exponent the dates overflow (and we are already way in the future). Related to T3502
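In effect the journal client now clamps the offset before using it as an exponent. A sketch of the bound (the helper name and the lower bound are assumptions, not the actual swh-scheduler code):

```python
MAX_NEXT_POSITION_OFFSET = 10  # above this exponent, computed dates overflow


def bounded_offset(current_offset: int, delta: int) -> int:
    """Apply a delta to next_position_offset, keeping it in range
    (illustrative helper, not the actual implementation)."""
    return min(max(current_offset + delta, 0), MAX_NEXT_POSITION_OFFSET)
```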
-
- Aug 18, 2021
-
-
vlorentz authored
We changed the task name/interface a while ago
-
- Aug 06, 2021
- Aug 03, 2021
-
-
Antoine R. Dumont authored
This disables origins after 3 failed or not-found attempts in a row. It's not definitive though, as it's the lister's responsibility to re-enable origins if they get listed again. Related to T2345
-
This maintains the number of successive visits resulting in the same status. This will help implement the disabling of origins after too many successive failed or not_found visits. Related to T2345
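Together with the disable-after-3 rule above, the counter could be sketched like this. The dict layout and function names are hypothetical, not the actual swh-scheduler data model:

```python
MAX_SUCCESSIVE_ATTEMPTS = 3  # failed/not_found visits in a row before disabling


def record_visit(stats: dict, status: str) -> dict:
    """Track successive visits ending in the same status
    (hypothetical structure, not the actual swh-scheduler model)."""
    if stats.get("last_status") == status:
        return {"last_status": status,
                "successive_visits": stats["successive_visits"] + 1}
    # status changed: restart the counter
    return {"last_status": status, "successive_visits": 1}


def should_disable(stats: dict) -> bool:
    # Not definitive: the lister re-enables the origin if it is listed again.
    return (
        stats["last_status"] in ("failed", "not_found")
        and stats["successive_visits"] >= MAX_SUCCESSIVE_ATTEMPTS
    )
```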
-
- Jul 30, 2021
-
-
Antoine R. Dumont authored
Related to T2345
-
Antoine R. Dumont authored
This is no longer required as it's called once. Related to T2345
-
- Jul 23, 2021
-
-
Nicolas Dandrimont authored
After using this schema for a while, all queries can be implemented in terms of these two timestamps, instead of the four original last_eventful, last_uneventful, last_failed and last_notfound timestamps. This ends up simplifying the logic within the journal client, as well as that of the grab_next_visits query builder.

To make this change work, we also stop considering out-of-order messages altogether in the journal client. This welcome simplification is an accuracy tradeoff that is explained in the updated documentation of the journal client:

.. [1] Ignoring out-of-order messages makes the initialization of the origin_visit_status table (from a full journal) less deterministic: only the `last_visit`, `last_visit_state` and `last_successful` fields are guaranteed to be exact; the `next_position_offset` field is a best-effort estimate (which should converge once the client has run for a while on in-order messages).
-
Antoine R. Dumont authored
Related to D5917
-
Antoine R. Dumont authored
This simplifies and unifies properly the utility test function to compare visit stats.
-
- Jul 22, 2021
-
-
This is in charge of scheduling origins without a last update. This also updates the global queue position so the journal client can correctly initialize the next position per origin and visit type. Related to T2345
-
Nicolas Dandrimont authored
This allows us to insert extra CTEs if a scheduling policy needs it.
-
- Jul 06, 2021
-
-
Antoine R. Dumont authored
For origins without any last_update information [1], the journal client is now also in charge of moving their next position in the queue for rescheduling. Depending on the visit status, the next position offset and next_visit_queue_position are updated after each visit completes:
- if the visit has failed, increase the next visit target by the minimal visit interval (to take transient loading issues into account);
- if the visit is successful and records some changes, decrease the visit interval index by 2 (visit the origin *way* more often);
- if the visit is successful and records no changes, increase the visit interval index by 1 (visit the origin less often).

We then set the next visit target to its current value plus the new visit interval, multiplied by a random fudge factor (picked in the -/+10% range). The fudge factor allows the visits to spread out, avoiding "bursts" of loaded origins, e.g. when a number of origins from a single hoster are processed at once.

Note that the computations happen for all origins, for simplicity and code maintenance, but they will only be used by a new soon-to-be scheduling policy.

[1] The lister cannot provide it for some reason.
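The update rules above can be sketched as follows. The base interval, the mapping from interval index to interval, and the function name are all assumptions for illustration, not the actual journal-client code:

```python
import random
from datetime import datetime, timedelta

# Illustrative base interval; the real values live in the journal client.
BASE_INTERVAL = timedelta(days=1)


def update_position(position: datetime, offset: int, status: str, eventful: bool):
    """Compute the next visit target and interval index (sketch)."""
    if status == "failed":
        # transient loading issue: push back by the minimal visit interval
        return position + BASE_INTERVAL, offset
    if eventful:
        offset = max(offset - 2, 0)  # changes recorded: visit *way* more often
    else:
        offset = offset + 1          # no changes: visit less often
    interval = BASE_INTERVAL * 2 ** offset
    fudge = random.uniform(-0.1, 0.1)  # spread visits out, avoid bursts
    return position + interval * (1 + fudge), offset
```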
-
- Jul 01, 2021
-
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
This deals first and foremost with the next_position_offset update done by the scheduler journal client.
-
- Jun 29, 2021
-
-
Antoine R. Dumont authored
-
- Jun 23, 2021
-
-
Antoine R. Dumont authored
In a future commit, we will add new fields whose values will be permutation dependent.
-
Antoine R. Dumont authored
This will help us when adding new fields to the table.
-
Antoine R. Dumont authored
This will help us when adding new fields to the table.
-
Antoine R. Dumont authored
-
Nicolas Dandrimont authored
This allows us to avoid repeating visits on them, until a next pass of the lister can mark them as disabled.
-