- Apr 30, 2021
-
-
Nicolas Dandrimont authored
This would only be useful if we had multiple runners running concurrently, but that's not the case.
-
- Apr 26, 2021
-
-
Antoine Lambert authored
Make it possible to check that the package documentation can be built without producing Sphinx warnings. The sphinx environment is designed to be used in continuous integration, to prevent breaking the documentation build when committing changes. The sphinx-dev environment is designed to be used inside a full swh development environment. Related to T3258
-
- Apr 20, 2021
-
-
Antoine R. Dumont authored
The staging scheduler runner was slow when fetching tasks due to that missing index. Related to T3271#63831
-
vlorentz authored
We no longer need to deal with ratios, so let's count the objects directly instead. Plus, the existing tests did not check tasks with None priority (because they did not have access to it when ratios were given by the backend), so they do now.
-
Antoine R. Dumont authored
Related to T3271
-
Antoine R. Dumont authored
Since [1], tasks with priority are routed to dedicated queues (see tasks for more details). The tasks with priority to be scheduled have their own dedicated endpoints to be called. [1] Related to T3084 Related to T3271
-
vlorentz authored
So errors on the CLI side do not trigger an exception on the server
-
- Apr 15, 2021
-
-
Antoine R. Dumont authored
Related to T3084
-
Antoine R. Dumont authored
This splits the calls to read tasks into 2 calls, one for tasks with no priority (standard), another call for tasks with priority. If any tasks with priority are detected, they are routed to dedicated `save_code_now:` prefixed named queues (per task type). Related to T3084
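A minimal sketch of the routing rule described above. Only the `save_code_now:` prefix comes from the commit message; the function names, the task dictionaries, and the example task type are hypothetical and do not reflect the actual runner code.

```python
from typing import Dict, List, Optional

# Prefix taken from the commit message; everything else is illustrative.
PRIORITY_QUEUE_PREFIX = "save_code_now:"


def queue_for(task_type: str, priority: Optional[str]) -> str:
    """Return the queue name for a task (hypothetical helper)."""
    if priority is None:
        # Standard tasks keep using the queue named after their task type.
        return task_type
    # Tasks with any priority go to a dedicated, prefixed queue per task type.
    return f"{PRIORITY_QUEUE_PREFIX}{task_type}"


def route(tasks: List[dict]) -> Dict[str, List[dict]]:
    """Group tasks by destination queue (hypothetical helper)."""
    queues: Dict[str, List[dict]] = {}
    for task in tasks:
        name = queue_for(task["type"], task.get("priority"))
        queues.setdefault(name, []).append(task)
    return queues
```

For instance, `route([{"type": "load-git"}, {"type": "load-git", "priority": "high"}])` yields one entry under `load-git` and one under `save_code_now:load-git`.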
-
vlorentz authored
-
- Apr 13, 2021
-
-
Antoine R. Dumont authored
The notion of priority becomes blurred: any task with a non-null priority is considered for reading or grabbing. In a future commit, this should allow the runner to evolve to reroute tasks with priority to other queues. Related to T3084
-
- Feb 11, 2021
-
-
Nicolas Dandrimont authored
psycopg2.extras.execute_values executes queries in batches of 100 by default. At the end of execute_values, only the last batch of results is available in the cursor; to fetch all results, one needs to set fetch=True instead of using the cursor.
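A minimal sketch of the pattern described above, assuming psycopg2 2.8 or later (where `execute_values` accepts `fetch`). The table name, column names, and the `insert_and_fetch_ids` helper are hypothetical and not taken from the scheduler code.

```python
# Hypothetical query: "example_table" and its columns are placeholders.
INSERT_QUERY = "INSERT INTO example_table (name) VALUES %s RETURNING id"


def insert_and_fetch_ids(conn, names):
    """Insert all names and return every generated id, across all batches."""
    # Imported lazily so the sketch can be read without psycopg2 installed.
    from psycopg2.extras import execute_values

    with conn.cursor() as cur:
        # With fetch=True, execute_values collects the rows returned by each
        # batch; reading from the cursor afterwards would only see the rows
        # of the last batch.
        rows = execute_values(cur, INSERT_QUERY, [(n,) for n in names], fetch=True)
    return [row[0] for row in rows]
```

With more than 100 names, calling `cur.fetchall()` after `execute_values` would return fewer ids than rows inserted, which is the situation the commit addresses.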
-
Nicolas Dandrimont authored
This allows us to support reading the journal from the beginning, ignoring messages with the old schema.
-
Nicolas Dandrimont authored
The built-in `max` function can take an iterable directly, no need to reimplement it.
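As an illustration only (the function names below are hypothetical and the values being compared in the actual code are not shown here), a hand-rolled loop and its built-in equivalent:

```python
def largest_manual(values):
    """Hand-rolled maximum, of the kind this change makes unnecessary."""
    result = None
    for value in values:
        if result is None or value > result:
            result = value
    return result


def largest(values):
    """Same result using the built-in, which accepts any iterable."""
    # default=None avoids a ValueError on an empty iterable.
    return max(values, default=None)
```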
-
- Feb 09, 2021
-
-
Vincent Sellier authored
Fix a wrong computation when several messages (>=3) for the same snapshot are received in the wrong order.

For example, before the fix, the following occurs:

```
| date | snapshot |     | last_ev  | last_unev | Snap |
| ---- | -------- | --- | -------- | --------- | ---- |
| 2022 | S2       |     | 2022     |           | S2   |
| 2020 | S2       |     | 2020     | 2022      | S2   |
| 2021 | S2       |     | **2021** | **2020**  | S2   |
```

whereas it should be:

```
| date | snapshot |     | last_ev  | last_unev | Snap |
| ---- | -------- | --- | -------- | --------- | ---- |
| 2022 | S2       |     | 2022     |           | S2   |
| 2020 | S2       |     | 2020     | 2022      | S2   |
| 2021 | S2       |     | **2020** | **2022**  | S2   |
```

Related to T3000
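The expected behaviour in the table above can be modelled with a small in-memory sketch. This is a simplified illustration, not the actual upsert: the `VisitStats` class and `record_visit` function are hypothetical, dates are plain integers, and only messages about a single snapshot are covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VisitStats:
    last_eventful: Optional[int] = None
    last_uneventful: Optional[int] = None
    last_snapshot: Optional[str] = None


def record_visit(stats: VisitStats, date: int, snapshot: str) -> VisitStats:
    """Fold one message into the stats, whatever order messages arrive in."""
    if stats.last_snapshot is None:
        # First message seen: it is eventful by definition.
        return VisitStats(date, None, snapshot)
    if snapshot != stats.last_snapshot:
        raise NotImplementedError("this sketch only covers a single snapshot")
    seen = [d for d in (stats.last_eventful, stats.last_uneventful, date)
            if d is not None]
    first, last = min(seen), max(seen)
    # The earliest sighting of the snapshot is the eventful visit; the latest
    # one is uneventful, unless it is the very same date (duplicate message).
    return VisitStats(first, last if last > first else None, snapshot)
```

Feeding the dates 2022, 2020, 2021 for snapshot S2 in that order ends with `last_eventful` at 2020 and `last_uneventful` at 2022, matching the corrected table.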
-
- Feb 05, 2021
-
-
Antoine R. Dumont authored
As the loader will start to create failed status messages, deal with them if any are received. Related to T3030
-
- Feb 03, 2021
-
-
Nicolas Dandrimont authored
With late acknowledgements, RabbitMQ will re-send tasks to clients even if they can't ever complete the task (e.g. when the task gets killed because the machine is out of memory). This problem only increases over time, leading to complete starvation of the ingestion system. Now that we have multiple mechanisms to issue retries of tasks, we can use early acknowledgements for tasks instead, which should mitigate the ongoing starvation, at the expense of having to retry tasks externally.
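As a hedged illustration, assuming the workers are Celery workers (the commit message only names RabbitMQ), the switch to early acknowledgements corresponds to a setting along these lines. The `CELERY_SETTINGS` name is hypothetical; `task_acks_late` is the Celery setting that controls when a message is acknowledged.

```python
# Hypothetical settings fragment for a Celery application.
CELERY_SETTINGS = {
    # False means early acknowledgement: the message is acknowledged when the
    # worker picks the task up, so a task killed mid-run (e.g. out of memory)
    # is not re-sent by the broker and must be retried externally.
    "task_acks_late": False,
}
```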
-
- Feb 01, 2021
-
-
David Douard authored
-
David Douard authored
-
- Jan 29, 2021
-
-
David Douard authored
-
- Jan 26, 2021
-
-
We already do that in the scheduler backend function
-
-
This allows us to check the behavior of the archive over time in terms of number of visits.
-
This was a significant bottleneck of the simulator. To work around this, we:
  - Generate snapshot ids consistently in the OriginModel
  - Cache the origin data locally in the simulator, to compute the eventfulness of visits
  - Cache the last visit time for all origins to compute the estimated run time of visit tasks.
-
The earlier implementation would just schedule new visits for origins forever, regardless of whether they were already scheduled or not.
-
This makes the simulator behavior more consistent with reality.
-
-
vlorentz authored
-
- Jan 25, 2021
-
-
vlorentz authored
This generates consistent last_update values according to the model and simulated time.
-
Antoine Lambert authored
Some loaders, for instance the debian one, can have non-string arguments, so change the extra_loader_arguments type of the ListedOrigin model to something more generic. Related to T2979
-
- Jan 23, 2021
-
-
Vincent Sellier authored
Fix the case:
  m1: date2/snapshot1
  m2: date1/snapshot1
which results in:
  last_eventful = date2
  last_uneventful = date2
The upsert was always keeping the most recent date when the eventful/uneventful dates were switched.
Related to T2978
-
Vincent Sellier authored
Avoid copying the eventful date to the uneventful date when a duplicate message (same date, same snapshot) is received. Related to T2978
-
- Jan 22, 2021
-
-
David Douard authored
-
David Douard authored
This makes it possible to follow what the simulation is doing.
-
- Jan 21, 2021
-
-
vlorentz authored
1. consistent with swh-storage and swh-indexer-storage
2. we can use swh.core.api.classes.stream_results on scheduler.get_listed_origins.
-
This policy schedules origins by decreasing order of "visit lag" (that is, origins with the most lag are scheduled first).
-
This policy orders never visited origins by increasing date of last update (scheduling the "oldest" never visited origins first).
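The two orderings described above can be sketched in memory as follows. This is only an illustration: the `Origin` class and both function names are hypothetical, and "visit lag" is assumed here to mean the time between an origin's last visit and its last update, which the commit messages do not spell out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Origin:
    url: str
    last_update: datetime
    last_visit: Optional[datetime] = None  # None means never visited


def by_decreasing_visit_lag(origins: List[Origin]) -> List[Origin]:
    """Already visited origins, the ones lagging the most first."""
    visited = [o for o in origins if o.last_visit is not None]
    # Assumption: lag = last_update - last_visit.
    return sorted(visited, key=lambda o: o.last_update - o.last_visit, reverse=True)


def never_visited_oldest_update_first(origins: List[Origin]) -> List[Origin]:
    """Never visited origins, by increasing date of last update."""
    never = [o for o in origins if o.last_visit is None]
    return sorted(never, key=lambda o: o.last_update)
```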
-
  - factor out test setup and results checking
  - properly exercise corner cases of the oldest_scheduled_first policy
-
This will allow us to easily plug new scheduling policies in that function.
-