- Apr 14, 2021
- Apr 13, 2021
-
-
Antoine R. Dumont authored
The notion of priority becomes blurred: any task with a non-null priority is considered for reading or grabbing. In a future commit, this should allow the runner to evolve and reroute tasks with a priority to other queues. Related to T3084
-
- Feb 11, 2021
-
-
Nicolas Dandrimont authored
psycopg2.extras.execute_values executes queries in batches of 100 by default. At the end of execute_values, only the last batch of results is available in the cursor; to fetch all results, one needs to set fetch=True and use the returned value instead of reading from the cursor.
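A minimal sketch of the pattern described above, assuming an open psycopg2 cursor and a hypothetical table `task` with a `type` column and a generated `id`. The function and table names are illustrative and are not taken from the commit.

```python
from typing import Any, List, Sequence, Tuple


def insert_returning_ids(cur: Any, rows: Sequence[Tuple[str]]) -> List[tuple]:
    """Insert rows in batches and return every RETURNING row, not only the last batch."""
    # Imported here so the sketch can be loaded without a database driver installed.
    from psycopg2.extras import execute_values

    # Hypothetical table and column, used for illustration only.
    query = "INSERT INTO task (type) VALUES %s RETURNING id"
    # With fetch=True, execute_values collects the results of all batches
    # (page_size rows each, 100 by default) and returns them as a single list.
    return execute_values(cur, query, rows, fetch=True)
```

Calling `cur.fetchall()` after `execute_values(cur, query, rows)` would instead return only the rows produced by the final batch.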
-
Nicolas Dandrimont authored
This allows us to support reading the journal from the beginning, ignoring messages with the old schema.
-
Nicolas Dandrimont authored
The built-in `max` function can take an iterable directly, no need to reimplement it.
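As an illustration of the point above, here is a small sketch in which `max` consumes a generator directly. The helper name `latest` and the idea of skipping `None` values are assumptions made for the example, not the code touched by the commit.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def latest(dates: Iterable[Optional[datetime]]) -> Optional[datetime]:
    """Return the most recent non-None date, or None when there is none."""
    # max() accepts any iterable; default= covers the empty case without a manual loop.
    return max((d for d in dates if d is not None), default=None)


d1 = datetime(2021, 1, 1, tzinfo=timezone.utc)
d2 = datetime(2021, 2, 1, tzinfo=timezone.utc)
newest = latest([d1, None, d2])
```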
-
- Feb 09, 2021
-
-
Vincent Sellier authored
Fix a wrong computation when several messages (>=3) for the same snapshot are received in the wrong order.

For example, before the fix, the following occurs:

```
| date | snapshot |     | last_ev  | last_unev | Snap |
| ---- | -------- | --- | -------- | --------- | ---- |
| 2022 | S2       |     | 2022     |           | S2   |
| 2020 | S2       |     | 2020     | 2022      | S2   |
| 2021 | S2       |     | **2021** | **2020**  | S2   |
```

whereas it should be:

```
| date | snapshot |     | last_ev  | last_unev | Snap |
| ---- | -------- | --- | -------- | --------- | ---- |
| 2022 | S2       |     | 2022     |           | S2   |
| 2020 | S2       |     | 2020     | 2022      | S2   |
| 2021 | S2       |     | **2020** | **2022**  | S2   |
```

Related to T3000
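The expected behavior in the second table can be sketched as follows. This covers only the case where every message carries the same snapshot, uses plain years as dates, and is an illustration of the rule rather than the actual upsert query; `apply_visit` and `replay` are hypothetical names.

```python
from typing import Iterable, Optional, Tuple

# (last_eventful, last_uneventful) for a single, unchanged snapshot
State = Tuple[Optional[int], Optional[int]]


def apply_visit(state: State, date: int) -> State:
    """Fold one visit of the same snapshot into the state, whatever the arrival order."""
    last_ev, last_unev = state
    if last_ev is None:
        # First message seen for this snapshot: the visit is eventful.
        return (date, last_unev)
    if date in (last_ev, last_unev):
        # Duplicate message: nothing changes.
        return state
    if date < last_ev:
        # An older visit already found this snapshot: it becomes the eventful one,
        # and the most recent known date becomes the uneventful one.
        newest = last_ev if last_unev is None else max(last_ev, last_unev)
        return (date, newest)
    # A later visit of an already known snapshot is uneventful.
    newest = date if last_unev is None else max(last_unev, date)
    return (last_ev, newest)


def replay(dates: Iterable[int]) -> State:
    """Apply a sequence of same-snapshot visits, starting from an empty state."""
    state: State = (None, None)
    for date in dates:
        state = apply_visit(state, date)
    return state
```

With this rule, `replay([2022, 2020, 2021])` yields the last row of the corrected table, and any other ordering of the same three messages gives the same result.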
-
- Feb 05, 2021
-
-
Antoine R. Dumont authored
As the loader will start creating failed status messages, deal with them if any. Related to T3030
-
- Feb 03, 2021
-
-
Nicolas Dandrimont authored
With late acknowledgements, RabbitMQ will re-send tasks to clients even if they can't ever complete the task (e.g. when the task gets killed because the machine is out of memory). This problem only increases over time, leading to complete starvation of the ingestion system. Now that we have multiple mechanisms to issue retries of tasks, we can use early acknowledgements for tasks instead, which should mitigate the ongoing starvation, at the expense of having to retry tasks externally.
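As a hedged illustration, and assuming the workers run under Celery (the commit message itself only mentions RabbitMQ), early acknowledgement corresponds to the `task_acks_late` setting being off. The dictionary name below is hypothetical.

```python
# Assumption: tasks are executed by Celery workers consuming from RabbitMQ.
EARLY_ACK_SETTINGS = {
    # Acknowledge the message right before the task executes, so that a worker
    # killed mid-task (e.g. out of memory) does not cause a redelivery.
    "task_acks_late": False,
}

# Typical use with a Celery application object named `app`:
#     app.conf.update(EARLY_ACK_SETTINGS)
```

The trade-off is the one stated above: a task lost this way is not redelivered by the broker and has to be retried by an external mechanism.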
- Feb 01, 2021
-
-
David Douard authored
-
David Douard authored
-
- Jan 29, 2021
-
-
David Douard authored
-
- Jan 26, 2021
-
-
We already do that in the scheduler backend function
-
-
This allows us to check the behavior of the archive over time in terms of number of visits.
-
This was a significant bottleneck of the simulator. To work around this, we:
- Generate snapshot ids consistently in the OriginModel
- Cache the origin data locally in the simulator, to compute the eventfulness of visits
- Cache the last visit time for all origins to compute the estimated run time of visit tasks.
-
The earlier implementation would just schedule new visits for origins forever, regardless of whether they were already scheduled or not.
-
This makes the simulator behavior more consistent with reality.
-
-
vlorentz authored
-
- Jan 25, 2021
-
-
vlorentz authored
This generates consistent last_update values according to the model and simulated time.
-
Antoine Lambert authored
Some loaders, for instance the Debian one, can take non-string arguments, so change the extra_loader_arguments type of the ListedOrigin model to something more generic. Related to T2979
- Jan 23, 2021
-
-
Vincent Sellier authored
Fix the case where m1: date2/snapshot1 is followed by m2: date1/snapshot1, which resulted in last_eventful = date2 and last_uneventful = date2. The upsert was always keeping the most recent date when the eventful/uneventful dates were switched. Related to T2978
-
Vincent Sellier authored
Avoid copying the eventful date to the uneventful date when a duplicate message (same date, same snapshot) is received. Related to T2978
-
- Jan 22, 2021
-
-
David Douard authored
-
David Douard authored
This makes it possible to follow what the simulation is doing.
-
- Jan 21, 2021
-
-
Vincent Sellier authored
Fix the case where m1: date2/snapshot1 is followed by m2: date1/snapshot1, which resulted in last_eventful = date2 and last_uneventful = date2. The upsert was always keeping the most recent date when the eventful/uneventful dates were switched. Related to T2978
-
Vincent Sellier authored
Avoid copying the eventful date to the uneventful date when a duplicate message (same date, same snapshot) is received. Related to T2978
-
vlorentz authored
1. consistent with swh-storage and swh-indexer-storage
2. we can use swh.core.api.classes.stream_results on scheduler.get_listed_origins.
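The general idea of streaming over a paginated endpoint can be sketched as below. `Page`, `stream_pages` and `get_listed_origins_stub` are self-contained, hypothetical stand-ins written for this example; they are not the swh.core or swh-scheduler implementations, and the exact fields of the real result type are assumed here.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, Iterator, List, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Page(Generic[T]):
    """One page of results plus an opaque token pointing to the next page."""
    results: List[T] = field(default_factory=list)
    next_page_token: Optional[str] = None


def stream_pages(fetch: Callable[..., Page[T]], **kwargs) -> Iterator[T]:
    """Yield every result of a paginated endpoint, following page tokens."""
    page_token = None
    while True:
        page = fetch(page_token=page_token, **kwargs)
        yield from page.results
        page_token = page.next_page_token
        if page_token is None:
            break


def get_listed_origins_stub(page_token: Optional[str] = None, limit: int = 2) -> Page[str]:
    """Hypothetical paginated endpoint serving five fake origin URLs."""
    origins = [f"https://example.org/repo{i}" for i in range(5)]
    start = int(page_token) if page_token is not None else 0
    end = start + limit
    next_token = str(end) if end < len(origins) else None
    return Page(results=origins[start:end], next_page_token=next_token)


all_origins = list(stream_pages(get_listed_origins_stub, limit=2))
```

The caller iterates over a flat stream of results and never handles page tokens itself.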
-
- factor out test setup and results checking
- properly exercise corner cases of the oldest_scheduled_first policy
-
This policy schedules origins by decreasing order of "visit lag" (that is, origins with the most lag are scheduled first).
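A minimal sketch of such an ordering, assuming that "visit lag" means the time between an origin's last known update and its last visit. That definition, the `VisitedOrigin` class and the function name are assumptions for illustration; the commit message does not spell them out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass
class VisitedOrigin:
    url: str
    last_update: datetime  # last change reported for the origin
    last_visit: datetime   # last time the origin was visited


def order_by_visit_lag(origins: Iterable[VisitedOrigin]) -> List[VisitedOrigin]:
    """Return origins sorted by decreasing lag (assumed: last_update - last_visit)."""
    return sorted(origins, key=lambda o: o.last_update - o.last_visit, reverse=True)


lagging = [
    VisitedOrigin("https://example.org/a", datetime(2021, 1, 10), datetime(2021, 1, 9)),
    VisitedOrigin("https://example.org/b", datetime(2021, 1, 20), datetime(2021, 1, 1)),
    VisitedOrigin("https://example.org/c", datetime(2021, 1, 5), datetime(2021, 1, 6)),
]
ordered_by_lag = [o.url for o in order_by_visit_lag(lagging)]
```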
-
This policy orders never visited origins by increasing date of last update (scheduling the "oldest" never visited origins first).
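The ordering described above can be sketched as a filter followed by a sort. The `Candidate` class, its fields and the function name are hypothetical, and origins are assumed to always carry a `last_update` value.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class Candidate:
    url: str
    last_update: datetime
    last_visit: Optional[datetime] = None  # None means never visited


def never_visited_oldest_first(origins: Iterable[Candidate]) -> List[Candidate]:
    """Keep never-visited origins, ordered by increasing date of last update."""
    never_visited = [o for o in origins if o.last_visit is None]
    return sorted(never_visited, key=lambda o: o.last_update)


candidates = [
    Candidate("https://example.org/new", datetime(2021, 1, 20)),
    Candidate("https://example.org/seen", datetime(2020, 1, 1), datetime(2020, 6, 1)),
    Candidate("https://example.org/old", datetime(2019, 5, 1)),
]
oldest_first = [o.url for o in never_visited_oldest_first(candidates)]
```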
-
This will allow us to easily plug new scheduling policies in that function.
-
Antoine R. Dumont authored
Related to T2967
-
Antoine R. Dumont authored
Related to T2967
- Jan 20, 2021
-
-
David Douard authored
- sort visits by default (there is a test dedicated to dealing with unsorted messages from the journal),
- remove "intermediate checks" in several tests: these do not help much but make the code more difficult to read and maintain,
- rename VISIT_STATUSES1 to VISIT_STATUSES_1 to make it less prone to being confused with VISIT_STATUSES (which also exists).
-