- Feb 03, 2021
-
-
Nicolas Dandrimont authored
With late acknowledgements, RabbitMQ will re-send tasks to clients even if they can't ever complete the task (e.g. when the task gets killed because the machine is out of memory). This problem only increases over time, leading to complete starvation of the ingestion system. Now that we have multiple mechanisms to issue retries of tasks, we can use early acknowledgements for tasks instead, which should mitigate the ongoing starvation, at the expense of having to retry tasks externally.
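As a rough illustration of the trade-off described above, the toy model below counts how often a broker hands out one message under each acknowledgement mode. It is a simulation under simplifying assumptions, not RabbitMQ or Celery code; `count_deliveries` and its parameters are hypothetical names. In Celery, the corresponding switch is the `task_acks_late` setting.

```python
from collections import deque


def count_deliveries(acks_late: bool, crashes_worker: bool, max_rounds: int = 5) -> int:
    """Count how many times a toy broker delivers a single message.

    Assumption: the broker requeues any message that was delivered but
    never acknowledged, which is what happens when a worker is killed.
    """
    queue = deque(["task"])
    delivered = 0
    for _ in range(max_rounds):
        if not queue:
            break
        message = queue.popleft()
        delivered += 1
        if not acks_late:
            # Early ack: the message is acknowledged on receipt, so the
            # broker forgets it even if the worker later crashes.
            continue
        if crashes_worker:
            # Late ack: the worker died before acknowledging, so the
            # broker puts the message back in the queue.
            queue.append(message)
    return delivered
```

With late acknowledgements, a task that always kills its worker is redelivered until the simulation stops; with early acknowledgements it is delivered once and any retry has to come from outside the broker.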
- Feb 01, 2021
-
-
David Douard authored
-
David Douard authored
-
- Jan 29, 2021
-
-
David Douard authored
-
- Jan 26, 2021
-
-
We already do that in the scheduler backend function
-
-
This allows us to check the behavior of the archive over time in terms of number of visits.
-
This was a significant bottleneck of the simulator. To work around this, we:
- Generate snapshot ids consistently in the OriginModel
- Cache the origin data locally in the simulator, to compute the eventfulness of visits
- Cache the last visit time for all origins, to compute the estimated run time of visit tasks.
-
The earlier implementation would just schedule new visits for origins forever, regardless of whether they were already scheduled or not.
-
This makes the simulator behavior more consistent with reality.
-
-
vlorentz authored
-
- Jan 25, 2021
-
-
vlorentz authored
This generates consistent last_update values according to the model and simulated time.
-
Antoine Lambert authored
Some loaders, for instance the Debian one, can have non-string arguments, so change the extra_loader_arguments type of the ListedOrigin model to something more generic. Related to T2979
- Jan 23, 2021
-
-
Vincent Sellier authored
Fix the following case:
m1: date2/snapshot1
m2: date1/snapshot1
which results in:
last_eventful = date2
last_uneventful = date2
The upsert was always keeping the most recent date when the eventful/uneventful dates were switched. Related to T2978
-
Vincent Sellier authored
Avoid copying the eventful date to the uneventful date when a duplicated message (same date/same snapshot) is received. Related to T2978
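The two cases above (out-of-order messages and duplicated messages for the same snapshot) can be sketched as a small in-memory update function. This is only an illustration: `VisitStats` and `apply_message` are hypothetical names, and the intended outcome for the out-of-order case (the earlier date becomes the eventful one, the later date the uneventful one) is an assumption inferred from the commit messages, not the actual SQL upsert.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class VisitStats:
    """Hypothetical per-origin state kept by the sketch."""
    last_eventful: Optional[datetime] = None
    last_uneventful: Optional[datetime] = None
    last_snapshot: Optional[str] = None


def _latest(current: Optional[datetime], candidate: datetime) -> datetime:
    """Return the most recent of the two dates, treating None as 'unset'."""
    return candidate if current is None or candidate > current else current


def apply_message(stats: VisitStats, date: datetime, snapshot: str) -> VisitStats:
    if stats.last_eventful is None or snapshot != stats.last_snapshot:
        # First visit, or a snapshot we have not recorded: eventful visit.
        # Older messages carrying a different snapshot are ignored here.
        if stats.last_eventful is None or date > stats.last_eventful:
            stats.last_eventful = date
            stats.last_snapshot = snapshot
        return stats
    if date == stats.last_eventful:
        # Duplicated message (same date, same snapshot): change nothing.
        return stats
    if date < stats.last_eventful:
        # Out-of-order message: the earlier visit is the one that first
        # saw this snapshot, so the later date becomes the uneventful one.
        stats.last_uneventful = _latest(stats.last_uneventful, stats.last_eventful)
        stats.last_eventful = date
    else:
        # Same snapshot seen again later: uneventful visit.
        stats.last_uneventful = _latest(stats.last_uneventful, date)
    return stats
```

Under these assumptions, replaying m1 then m2 yields last_eventful = date1 and last_uneventful = date2, and a duplicated message leaves last_uneventful unset.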
-
- Jan 22, 2021
-
-
David Douard authored
-
David Douard authored
This allows following what the simulation is doing.
-
- Jan 21, 2021
-
-
Vincent Sellier authored
Fix the following case:
m1: date2/snapshot1
m2: date1/snapshot1
which results in:
last_eventful = date2
last_uneventful = date2
The upsert was always keeping the most recent date when the eventful/uneventful dates were switched. Related to T2978
-
Vincent Sellier authored
Avoid copying the eventful date to the uneventful date when a duplicated message (same date/same snapshot) is received. Related to T2978
-
vlorentz authored
1. consistent with swh-storage and swh-indexer-storage
2. we can use swh.core.api.classes.stream_results on scheduler.get_listed_origins.
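The second point relies on token-based pagination. The self-contained sketch below shows the general pattern of draining a paginated endpoint into a single stream. `Page`, `stream` and `get_page` are hypothetical stand-ins written for this illustration; they are not the swh.core API, whose exact signatures should be checked in its documentation.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class Page:
    """Hypothetical page of results plus a token pointing at the next page."""
    results: List[str]
    next_page_token: Optional[str]


def stream(get_page: Callable[[Optional[str]], Page]) -> Iterator[str]:
    """Yield every result by following page tokens until none is left."""
    token: Optional[str] = None
    while True:
        page = get_page(token)
        yield from page.results
        token = page.next_page_token
        if token is None:
            return


# Toy data source standing in for a paginated listing endpoint.
ORIGINS = [f"https://example.org/repo{i}" for i in range(5)]


def get_page(page_token: Optional[str] = None, limit: int = 2) -> Page:
    start = int(page_token) if page_token else 0
    end = start + limit
    next_token = str(end) if end < len(ORIGINS) else None
    return Page(ORIGINS[start:end], next_token)
```

Callers can then iterate over `stream(get_page)` without handling page tokens themselves.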
-
- factor out test setup and results checking
- properly exercise corner cases of the oldest_scheduled_first policy
-
This policy schedules origins by decreasing order of "visit lag" (that is, origins with the most lag are scheduled first).
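A minimal sketch of such an ordering, assuming that "visit lag" means the time elapsed between an origin's last visit and its last known update (the commit message does not define it). `Origin`, `visit_lag` and `by_decreasing_lag` are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Origin:
    """Hypothetical origin record, reduced to what the ordering needs."""
    url: str
    last_update: datetime
    last_visit: datetime


def visit_lag(origin: Origin) -> timedelta:
    # Assumption: lag is how far the last update is ahead of the last visit.
    return origin.last_update - origin.last_visit


def by_decreasing_lag(origins: List[Origin]) -> List[Origin]:
    """Return the origins with the largest lag first."""
    return sorted(origins, key=visit_lag, reverse=True)
```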
-
This policy orders never visited origins by increasing date of last update (scheduling the "oldest" never visited origins first).
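The ordering described above can be sketched as a filter followed by a sort. The `ListedOriginStub` type and the use of `None` to mark an origin as never visited are assumptions made for this illustration.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class ListedOriginStub(NamedTuple):
    """Hypothetical origin record; last_visit is None if never visited."""
    url: str
    last_update: datetime
    last_visit: Optional[datetime]


def never_visited_oldest_first(origins: List[ListedOriginStub]) -> List[ListedOriginStub]:
    """Keep never-visited origins, ordered by increasing last_update."""
    never_visited = [o for o in origins if o.last_visit is None]
    return sorted(never_visited, key=lambda o: o.last_update)
```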
-
This will allow us to easily plug new scheduling policies in that function.
-
Antoine R. Dumont authored
Related to T2967
-
Antoine R. Dumont authored
Related to T2967
- Jan 20, 2021
-
-
David Douard authored
- sort visits by default (there is a test dedicated to dealing with unsorted messages from the journal),
- remove "intermediate checks" in several tests: these do not help much but make the code more difficult to read and maintain,
- rename VISIT_STATUSES1 as VISIT_STATUSES_1 to make it less prone to being confused with VISIT_STATUSES (which also exists).
-
David Douard authored
This reverts commit b03d9782. It's actually not needed, after all...
-
For now, only plot the known_origins and origins_never_visited metrics.
-
vlorentz authored
This reuses the scheduler instantiated by the cli instead of hardcoding our own using the PG* variables.
-
vlorentz authored
-
vlorentz authored
-
-
We extend the Task object with an autogenerated uuid allowing us to track the task lifetime between its creation and the generation of visit statuses, as the task-based scheduler does.
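One way to give each task an autogenerated identifier is a dataclass field with a UUID default factory. The sketch below assumes a simplified `SimTask` type with hypothetical field names; it is not the simulator's actual Task definition.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SimTask:
    """Hypothetical simulator task carrying its own tracking identifier."""
    visit_type: str
    origin: str
    # Generated once per instance, so the same id can be attached to the
    # visit statuses produced when the task runs.
    task_id: uuid.UUID = field(default_factory=uuid.uuid4)
```

Because the factory runs at construction time, two tasks created with the same arguments still get distinct identifiers.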
-