Commits · debian/0.13.0-1_swh1_bpo10+1 · Platform / Development / swh-scheduler

Apr 20, 2021
- Updated backport on buster-swh from debian/0.13.0-1_swh1 (unstable-swh) · 14597f21
  Jenkins for Software Heritage authored 3 years ago
  
  debian/0.13.0-1_swh1_bpo10+1
  
  14597f21
- Merge tag 'debian/0.13.0-1_swh1' into debian/buster-swh · 2b2b7b72
  Jenkins for Software Heritage authored 3 years ago
  
  2b2b7b72
- Updated debian changelog for version 0.13.0 · 828d288a
  Jenkins for Software Heritage authored 3 years ago
  
  debian/0.13.0-1_swh1
  
  828d288a
- Update upstream source from tag 'debian/upstream/0.13.0' · f2122c25
  Jenkins for Software Heritage authored 3 years ago
```
Update to upstream version '0.13.0'
with Debian dir 753ba2cc0646d7be73acfff0f55b36cc7fe12e65
```
  f2122c25
- New upstream version 0.13.0 · 15684cb0
  Jenkins for Software Heritage authored 3 years ago
  
  debian/upstream/0.13.0
  
  15684cb0
- scheduler: Clean up priority/ratio task dead code · befccb94
  Antoine R. Dumont authored 3 years ago
```
Since [1], tasks with priority are routed to dedicated queues (see the related tasks for
more details). Tasks with priority that are to be scheduled have their own dedicated
endpoints.

[1] Related to T3084
Related to T3271
```
  v0.13.0
  
  befccb94
- Parse task_ids before calling set_status_tasks. · 4e06bcd7
  vlorentz authored 3 years ago
```
So errors on the CLI side do not trigger an exception on the server
```
  4e06bcd7
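
The idea behind commit 4e06bcd7 (validating task ids on the CLI side before anything reaches the server) can be sketched as follows. This is a minimal illustration, not the actual swh-scheduler code: `parse_task_ids` is a hypothetical helper, and the call to `set_status_tasks` appears only in a comment.

```python
from typing import Iterable, List


def parse_task_ids(raw_ids: Iterable[str]) -> List[int]:
    """Convert CLI arguments to integers, failing early on bad input.

    Hypothetical helper: the name and the error message are assumptions.
    """
    task_ids = []
    for raw in raw_ids:
        try:
            task_ids.append(int(raw))
        except ValueError:
            # Raised locally, so the server never sees the malformed id.
            raise ValueError(f"Invalid task id: {raw!r}") from None
    return task_ids


# Only once parsing has succeeded would the CLI call the backend,
# e.g. set_status_tasks(task_ids, ...) in the real code base.
print(parse_task_ids(["1", "2", "3"]))
```

Parsing up front turns a malformed argument into a local error message instead of a server-side exception.
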
Apr 15, 2021
- tests: Complete checks on message with priority consumption · 974c0c2e
  Antoine R. Dumont authored 3 years ago
```
Related to T3084
```
  974c0c2e
- Updated backport on buster-swh from debian/0.12.0-1_swh1 (unstable-swh) · 8a3a8f94
  Jenkins for Software Heritage authored 3 years ago
  
  debian/0.12.0-1_swh1_bpo10+1
  
  8a3a8f94
- Merge tag 'debian/0.12.0-1_swh1' into debian/buster-swh · e4b9ac5a
  Jenkins for Software Heritage authored 3 years ago
  
  e4b9ac5a
- Updated debian changelog for version 0.12.0 · 38ef60a7
  Jenkins for Software Heritage authored 3 years ago
  
  debian/0.12.0-1_swh1
  
  38ef60a7
- Update upstream source from tag 'debian/upstream/0.12.0' · 9db86c1d
  Jenkins for Software Heritage authored 3 years ago
```
Update to upstream version '0.12.0'
with Debian dir 788879e9b7caf6d54d8f4645d720dc7b7859e069
```
  9db86c1d
- New upstream version 0.12.0 · a8b8fde2
  Jenkins for Software Heritage authored 3 years ago
  
  debian/upstream/0.12.0
  
  a8b8fde2
- Route priority tasks to dedicated save code now queues · 17052c4c
  Antoine R. Dumont authored 3 years ago
```
This splits the task-reading call into two: one for tasks with no priority
(standard), another for tasks with priority. If any tasks with priority are
detected, they are routed to dedicated queues whose names are prefixed with
`save_code_now:` (one per task type).

Related to T3084
```
  v0.12.0
  
  17052c4c
- Fix various Sphinx warnings · bfc1a87b
  vlorentz authored 3 years ago
  
  bfc1a87b
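
The routing described in commit 17052c4c can be pictured with the sketch below. Only the `save_code_now:` prefix comes from the commit message; the task dictionaries, the `queue_name` and `route` helpers, and the use of the task type as the standard queue name are assumptions made for illustration.

```python
from typing import Dict, List, Optional

PRIORITY_QUEUE_PREFIX = "save_code_now:"  # prefix named in the commit message


def queue_name(task_type: str, priority: Optional[str]) -> str:
    """Pick a queue: standard for no priority, prefixed otherwise (assumed naming)."""
    if priority is None:
        return task_type
    return f"{PRIORITY_QUEUE_PREFIX}{task_type}"


def route(tasks: List[dict]) -> Dict[str, List[dict]]:
    """Group tasks by destination queue, one queue per task type."""
    queues: Dict[str, List[dict]] = {}
    for task in tasks:
        name = queue_name(task["type"], task.get("priority"))
        queues.setdefault(name, []).append(task)
    return queues


tasks = [
    {"id": 1, "type": "load-git", "priority": None},
    {"id": 2, "type": "load-git", "priority": "high"},
    {"id": 3, "type": "load-svn", "priority": "low"},
]
for name, queued in route(tasks).items():
    print(name, [t["id"] for t in queued])
```

Any non-null priority sends the task to the prefixed queue, so dedicated workers can consume it separately from standard traffic.
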
Apr 14, 2021
- Updated backport on buster-swh from debian/0.11.0-1_swh1 (unstable-swh) · 641b53f3
  Jenkins for Software Heritage authored 3 years ago
  
  debian/0.11.0-1_swh1_bpo10+1
  
  641b53f3
- Merge tag 'debian/0.11.0-1_swh1' into debian/buster-swh · d5b60329
  Jenkins for Software Heritage authored 3 years ago
  
  d5b60329
- Updated debian changelog for version 0.11.0 · 3bf2f9ab
  Jenkins for Software Heritage authored 3 years ago
  
  debian/0.11.0-1_swh1
  
  3bf2f9ab
- New upstream version 0.11.0 · 77ea5902
  Jenkins for Software Heritage authored 3 years ago
  
  debian/upstream/0.11.0
  
  77ea5902
- Update upstream source from tag 'debian/upstream/0.11.0' · c32aff0c
  Jenkins for Software Heritage authored 3 years ago
```
Update to upstream version '0.11.0'
with Debian dir aac6bb6258a43d962787d39b4006589985c90bcb
```
  c32aff0c
Apr 13, 2021

backend: Open endpoints to peek/grab tasks with any priority · 3e2ae3d4

The notion of priority becomes blurred. Any task with a non-null priority is considered
for reading or grabbing.

In a future commit, this should allow the runner to evolve and reroute tasks with
priority to other queues.

Related to T3084

3e2ae3d4

Feb 11, 2021

Make origin_visit_stats_get return results from all pages · ecab745a

Nicolas Dandrimont authored 3 years ago

psycopg2.extras.execute_values executes queries in batches of 100 by
default. At the end of execute_values, only the last batch of results is
available in the cursor. To fetch all results, one needs to set
fetch=True and use the returned rows instead of reading from the cursor.

ecab745a
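
The pitfall can be reproduced without a database. The sketch below is a pure-Python model of batched execution, not psycopg2 itself: `ToyCursor` and `run_in_batches` are hypothetical stand-ins, and only the `page_size` and `fetch` parameter names mirror those of `execute_values`.

```python
from typing import List, Optional, Sequence


class ToyCursor:
    """Stand-in for a DB cursor: each execute() replaces the previous results."""

    def __init__(self) -> None:
        self._rows: List[int] = []

    def execute(self, batch: Sequence[int]) -> None:
        # Pretend the query returns one row per argument in the batch.
        self._rows = list(batch)

    def fetchall(self) -> List[int]:
        return self._rows


def run_in_batches(
    cur: ToyCursor, args: Sequence[int], page_size: int = 100, fetch: bool = False
) -> Optional[List[int]]:
    """Execute args page by page; collect every page's rows only if fetch is set."""
    collected: List[int] = []
    for start in range(0, len(args), page_size):
        cur.execute(args[start:start + page_size])
        if fetch:
            collected.extend(cur.fetchall())
    return collected if fetch else None


args = list(range(250))

cur = ToyCursor()
run_in_batches(cur, args)
print(len(cur.fetchall()))  # rows left in the cursor: last batch only

print(len(run_in_batches(ToyCursor(), args, fetch=True)))  # rows from all batches
```

With 250 arguments and pages of 100, the cursor ends up holding only the final 50 rows, while the `fetch=True` path returns all 250.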

journal client: Filter out status messages without type · 86ada443
Nicolas Dandrimont authored 3 years ago
```
This allows us to support reading the journal from the beginning,
ignoring messages with the old schema.
```
86ada443
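
A minimal sketch of such a filter, assuming status messages are dictionaries and that old-schema messages either lack the `type` key or carry `None` for it. `keep_typed_statuses` is a hypothetical name, not the function used in swh-scheduler.

```python
from typing import Dict, Iterable, List


def keep_typed_statuses(messages: Iterable[Dict]) -> List[Dict]:
    """Drop messages from the old schema, i.e. those without a visit type."""
    return [msg for msg in messages if msg.get("type") is not None]


messages = [
    {"origin": "https://example.org/a", "status": "full"},  # old schema
    {"origin": "https://example.org/b", "status": "full", "type": "git"},
    {"origin": "https://example.org/c", "status": "partial", "type": None},
]
print(keep_typed_statuses(messages))
```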

Simplify max_date() · cdb1775f

Nicolas Dandrimont authored 3 years ago

The built-in `max` function can take an iterable directly, no need to
reimplement it.

cdb1775f
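
For illustration, a simplified `max_date()` relying on the built-in could look like the sketch below. The handling of `None` values and the error raised on empty input are assumptions, not a copy of the actual function.

```python
from datetime import datetime, timezone
from typing import Optional


def max_date(*dates: Optional[datetime]) -> datetime:
    """Return the most recent of the given dates, ignoring None values."""
    filtered = [date for date in dates if date is not None]
    if not filtered:
        raise ValueError("At least one date should be a valid datetime")
    # The built-in max accepts the iterable directly: no manual loop needed.
    return max(filtered)


d2020 = datetime(2020, 1, 1, tzinfo=timezone.utc)
d2021 = datetime(2021, 1, 1, tzinfo=timezone.utc)
print(max_date(d2020, None, d2021))
```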

Feb 09, 2021

journal_client: Fix date computations for (un)eventful visits · cf32e376

Vincent Sellier authored 3 years ago

Fix a wrong computation when several messages (>=3) for the same
snapshot are received in the wrong order.
For example, before the fix, the following occurs:
```
| date | snapshot |     | last_ev  | last_unev | Snap |
| ---- | -------- | --- | -------- | --------- | ---- |
| 2022 | S2       |     | 2022     |           | S2   |
| 2020 | S2       |     | 2020     | 2022      | S2   |
| 2021 | S2       |     | **2021** | **2020**  | S2   |
```

It should instead be:
```
| date | snapshot |     | last_ev  | last_unev | Snap |
| ---- | -------- | --- | -------- | --------- | ---- |
| 2022 | S2       |     | 2022     |           | S2   |
| 2020 | S2       |     | 2020     | 2022      | S2   |
| 2021 | S2       |     | **2020** | **2022**  | S2   |
```

Related to T3000

cf32e376
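
The expected table can be reproduced with an order-independent update rule: for a given snapshot, the earliest date seen is the eventful visit and the latest one is the last uneventful visit. The sketch below models only that same-snapshot case. `VisitStats` and `record_visit` are hypothetical names, years stand in for full dates, and a message for a different snapshot is simply treated as a new eventful visit, which is a simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VisitStats:
    last_eventful: Optional[int] = None
    last_uneventful: Optional[int] = None
    last_snapshot: Optional[str] = None


def record_visit(stats: VisitStats, date: int, snapshot: str) -> VisitStats:
    """Update stats with one visit message, whatever its arrival order."""
    if stats.last_snapshot != snapshot:
        # Simplification: a different snapshot is a new eventful visit.
        stats.last_eventful = date
        stats.last_snapshot = snapshot
        return stats
    # Same snapshot: earliest date is eventful, latest date is uneventful.
    seen = [d for d in (stats.last_eventful, stats.last_uneventful, date) if d is not None]
    stats.last_eventful = min(seen)
    latest = max(seen)
    stats.last_uneventful = latest if latest != stats.last_eventful else None
    return stats


stats = VisitStats()
for year in (2022, 2020, 2021):  # same order as in the tables above
    record_visit(stats, year, "S2")
    print(year, stats)
```

Because the rule only relies on `min` and `max`, feeding the three messages in any order leads to the same final state.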

Feb 05, 2021
- journal_client: Deal with failed status message · aa507ac5
  Antoine R. Dumont authored 3 years ago
```
As the loader will start to create failed status messages, deal with them if any.

Related to T3030
```
  aa507ac5
Feb 03, 2021
- Updated backport on buster-swh from debian/0.10.0-1_swh1 (unstable-swh) · b4249ca6
  Jenkins for Software Heritage authored 3 years ago
  
  debian/0.10.0-1_swh1_bpo10+1
  
  b4249ca6
- Merge tag 'debian/0.10.0-1_swh1' into debian/buster-swh · f5488227
  Jenkins for Software Heritage authored 3 years ago
  
  f5488227
- Updated debian changelog for version 0.10.0 · a78d5b2f
  Jenkins for Software Heritage authored 3 years ago
  
  debian/0.10.0-1_swh1
  
  a78d5b2f
- New upstream version 0.10.0 · 2cf46e3b
  Jenkins for Software Heritage authored 3 years ago
  
  debian/upstream/0.10.0
  
  2cf46e3b
- Update upstream source from tag 'debian/upstream/0.10.0' · 97ca32ac
  Jenkins for Software Heritage authored 3 years ago
```
Update to upstream version '0.10.0'
with Debian dir 17cb8a0e3def3b15efe5ce2e2ae36c621314e1f3
```
  97ca32ac
- celery: acknowledge tasks as soon as they're received · 14feab95
  Nicolas Dandrimont authored 3 years ago
```
With late acknowledgements, RabbitMQ will re-send tasks to clients even
if they can't ever complete the task (e.g. when the task gets killed
because the machine is out of memory).

This problem only increases over time, leading to complete starvation of
the ingestion system.

Now that we have multiple mechanisms to issue retries of tasks, we can
use early acknowledgements for tasks instead, which should mitigate the
ongoing starvation, at the expense of having to retry tasks externally.
```
  v0.10.0
  
  14feab95
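
In Celery, the timing of acknowledgements is controlled by the `task_acks_late` setting. The fragment below sketches what an early-acknowledgement configuration might look like. The `CELERY_SETTINGS` name is hypothetical, and how swh-scheduler actually sets this option is not shown in the commit message.

```python
# Hypothetical settings fragment; `task_acks_late` is a standard Celery setting.
CELERY_SETTINGS = {
    # False: the message is acknowledged before the task runs (early ack),
    # so the broker does not redeliver it if the worker dies mid-task.
    "task_acks_late": False,
}

# In a real application this would be applied with something like:
#     app.conf.update(CELERY_SETTINGS)
print(CELERY_SETTINGS)
```

The trade-off is the one stated in the commit message: a task killed mid-run is not redelivered by the broker, so retries have to come from an external mechanism.
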
Feb 01, 2021
- Simulator: allow exporting results to a CSV file · aaffff26
  David Douard authored 3 years ago
  
  aaffff26
- Add minimal tests for the SimulationReport.format() method · 9fce3f6f
  David Douard authored 3 years ago
  
  9fce3f6f
Jan 29, 2021
- Make plots optional in simulator cli output · aaf7dd6f
  David Douard authored 3 years ago
  
  aaf7dd6f
Jan 26, 2021
- simulator: stop validating the scheduling policy in the CLI · cf0583b0
  Nicolas Dandrimont authored 3 years ago and vlorentz committed 3 years ago
```
We already do that in the scheduler backend function
```
  cf0583b0
- Run simulator tests on all known scheduling policies · ebb5847e
  Nicolas Dandrimont authored 3 years ago and vlorentz committed 3 years ago
  
  ebb5847e
- simulator: record visit metrics alongside scheduler metrics · 1f77521d
  Nicolas Dandrimont authored 3 years ago and vlorentz committed 3 years ago
```
This allows us to check the behavior of the archive over time in terms
of number of visits.
```
  1f77521d
- simulator: stop using the database as a cache for origin data · 88983944
  Nicolas Dandrimont authored 3 years ago and vlorentz committed 3 years ago
```
This was a significant bottleneck of the simulator. To work around this,
we:

 - Generate snapshot ids consistently in the OriginModel
 - Cache the origin data locally in the simulator, to compute the
   eventfulness of visits
 - Cache the last visit time for all origins to compute the estimated
   run time of visit tasks.
```
  88983944
- grab_next_visits: don't re-schedule visits too fast · c92ead58
  Nicolas Dandrimont authored 3 years ago and vlorentz committed 3 years ago
```
The earlier implementation would just schedule new visits for origins
forever, regardless of whether they were already scheduled or not.
```
  c92ead58
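
The fix in commit c92ead58 can be pictured as a guard on the last scheduling time. The sketch below is an in-memory illustration under stated assumptions: the function signature, the `cooldown` parameter and the dictionary of last scheduling times are hypothetical and do not reflect the actual SQL-backed implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def grab_next_visits(
    last_scheduled: Dict[str, Optional[datetime]],
    now: datetime,
    cooldown: timedelta,
    count: int,
) -> List[str]:
    """Pick up to `count` origins that were not scheduled within `cooldown`."""
    selected: List[str] = []
    for origin, scheduled_at in last_scheduled.items():
        if len(selected) >= count:
            break
        if scheduled_at is not None and now - scheduled_at < cooldown:
            continue  # scheduled too recently: skip instead of re-scheduling
        selected.append(origin)
        last_scheduled[origin] = now  # remember that a visit is now pending
    return selected


now = datetime(2021, 1, 26, tzinfo=timezone.utc)
state: Dict[str, Optional[datetime]] = {"origin-a": None, "origin-b": None}
print(grab_next_visits(state, now, timedelta(days=7), count=10))
print(grab_next_visits(state, now + timedelta(hours=1), timedelta(days=7), count=10))
```

Without the guard, the second call would return the same origins again, which is the endless re-scheduling the commit message describes.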