- Oct 15, 2021
-
-
Antoine R. Dumont authored
This scenario happens with the oneshot loader, for example. This loader ingests more than one type of origin from the same queue, so the computation of that function returned a negative value, which cannot be executed in SQL [1]. This commit fixes that behavior. It also makes explicit in the docstring that the function must return positive values. [1] ``` ... psycopg2.errors.InvalidRowCountInLimitClause: LIMIT must not be negative ```
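A minimal sketch of the kind of guard this message describes. It assumes the function derives a number of free slots by subtracting the current queue length from a maximum; the name `free_queue_slots` and both parameters are hypothetical and not taken from the commit.

```python
def free_queue_slots(max_queue_length: int, current_queue_length: int) -> int:
    """Return how many new entries may be scheduled; never negative.

    Hypothetical helper: when several origin types share one queue, the
    queue can hold more entries than the per-type maximum, so the raw
    difference may drop below zero and must be clamped before it is used
    as a SQL LIMIT.
    """
    return max(0, max_queue_length - current_queue_length)


# The queue is already over capacity: nothing more is scheduled.
slots = free_queue_slots(max_queue_length=10, current_queue_length=15)
```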
-
- Sep 02, 2021
-
-
Antoine R. Dumont authored
-
- Aug 27, 2021
-
-
Antoine R. Dumont authored
In the non-optimal case, we may want to trigger specific cases (not-yet-enabled origins, origins from a specific lister...). Related to T3350
-
- Aug 26, 2021
-
-
Nicolas Dandrimont authored
For origins that have never been visited, and for which we don't have a queue position yet, we want to visit them in the order they've been added.
-
Nicolas Dandrimont authored
The subcommand bypasses the legacy task-based mechanism to directly send new origin visits to celery
-
Nicolas Dandrimont authored
Running common operations on all git origins is pretty intense. Using table sampling gives us the opportunity to at least schedule some jobs in a (decently small) amount of time.
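As an illustration only, table sampling in PostgreSQL can be expressed with the `TABLESAMPLE` clause. The sketch below builds such a query string; the helper name, the table name used in the example call, and the parameters are assumptions, not the query the commit actually adds.

```python
def sampled_query(table: str, percent: float, limit: int) -> str:
    """Build a SELECT that reads only a random sample of a table.

    Hypothetical helper. TABLESAMPLE SYSTEM picks roughly `percent`
    percent of the table's pages, so the query avoids a full scan.
    """
    if not table.isidentifier():
        raise ValueError("table must be a plain identifier")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    if limit < 0:
        raise ValueError("limit must not be negative")
    return f"SELECT * FROM {table} TABLESAMPLE SYSTEM ({percent}) LIMIT {limit}"


# Table name chosen for the example only.
query = sampled_query("origins", 0.1, 100)
```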
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
Queue positions are dates, and the next_position_offset used to compute the new queue position was not bounded, which caused overflow errors. This commit adapts the journal client computations to cap next_position_offset at 10. This value was chosen because above that exponent the dates overflow (and we are way in the future already). Related to T3502
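The cap described above amounts to a simple upper bound. A minimal sketch, where the constant and function names are hypothetical and only the value 10 comes from the commit message:

```python
# Upper bound quoted in the commit message; the name is an assumption.
MAX_NEXT_POSITION_OFFSET = 10


def bounded_offset(next_position_offset: int) -> int:
    """Cap the offset so later date computations cannot overflow."""
    return min(next_position_offset, MAX_NEXT_POSITION_OFFSET)
```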
-
- Aug 18, 2021
-
-
vlorentz authored
We changed the task name/interface a while ago
-
- Aug 03, 2021
-
-
Antoine R. Dumont authored
This disables origins whose visits have failed or ended as not found 3 times in a row. The disabling is not definitive, though: it is the lister's responsibility to re-enable origins if they get listed again. Related to T2345
-
This maintains the number of successive visits resulting in the same status. This will help implement the disabling of origins after too many successive failed or not_found visits. Related to T2345
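The counter and the "3 times in a row" rule from the two messages above could be sketched as follows. The `VisitStats` class, its field names, and `record_visit` are hypothetical; only the statuses `failed` and `not_found` and the threshold of 3 come from the commit messages.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold quoted in the commit message; the name is an assumption.
MAX_SUCCESSIVE_BAD_VISITS = 3


@dataclass
class VisitStats:
    """Hypothetical per-origin record, not the project's actual model."""
    last_status: Optional[str] = None
    successive_visits: int = 0
    enabled: bool = True


def record_visit(stats: VisitStats, status: str) -> VisitStats:
    """Update the same-status streak and disable the origin if needed."""
    if status == stats.last_status:
        stats.successive_visits += 1
    else:
        # A different status starts a new streak.
        stats.last_status = status
        stats.successive_visits = 1
    if (
        status in ("failed", "not_found")
        and stats.successive_visits >= MAX_SUCCESSIVE_BAD_VISITS
    ):
        # Not definitive: a lister may re-enable the origin later.
        stats.enabled = False
    return stats
```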
-
- Jul 30, 2021
-
-
Antoine R. Dumont authored
Related to T2345
-
Antoine R. Dumont authored
This is no longer required as it's called once. Related to T2345
-
- Jul 23, 2021
-
-
Nicolas Dandrimont authored
After using this schema for a while, all queries can be implemented in terms of these two timestamps, instead of the four original last_eventful, last_uneventful, last_failed and last_notfound timestamps. This ends up simplifying the logic within the journal client, as well as that of the grab_next_visits query builder. To make this change work, we also stop considering out of order messages altogether in journal_client. This welcome simplification is an accuracy tradeoff that is explained in the updated documentation of the journal client: .. [1] Ignoring out of order messages makes the initialization of the origin_visit_status table (from a full journal) less deterministic: only the `last_visit`, `last_visit_state` and `last_successful` fields are guaranteed to be exact, the `next_position_offset` field is a best effort estimate (which should converge once the client has run for a while on in-order messages).
-
Antoine R. Dumont authored
Related to D5917
-
Antoine R. Dumont authored
This simplifies and properly unifies the test utility function used to compare visit stats.
-
- Jul 22, 2021
-
-
This is in charge of scheduling origins without a last update. This also updates the global queue position so the journal client can correctly initialize the next position per origin and visit type. Related to T2345
-
Nicolas Dandrimont authored
This allows us to insert extra CTEs if a scheduling policy needs it.
-
- Jul 06, 2021
-
-
Antoine R. Dumont authored
For origins without any last_update information [1], the journal client is now also in charge of moving their next position in the queue for rescheduling.
Depending on their status, the next position offset and next_visit_queue_position are updated after each visit completes:
  - if the visit has failed, increase the next visit target by the minimal visit interval (to take into account transient loading issues).
  - if the visit is successful, and records some changes, decrease the visit interval index by 2 (visit the origin *way* more often).
  - if the visit is successful, and records no changes, increase the visit interval index by 1 (visit the origin less often).
We then set the next visit target to its current value + the new visit interval multiplied by a random fudge factor (picked in the -/+ 10% range). The fudge factor allows the visits to spread out, avoiding "bursts" of loaded origins, e.g. when a number of origins from a single hoster are processed at once.
Note that the computations happen for all origins, for simplicity and code maintenance, but they will only be used by a new, soon-to-be-added scheduling policy.
[1] The lister cannot provide it for some reason.
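The update rules in this message can be sketched as below. This is an illustration under stated assumptions, not the journal client's code: the `VISIT_INTERVALS` ladder, the status labels, the clamping of the index to the ladder's bounds, and applying the fudge factor to failed visits as well are all choices made for the example.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Assumption: visit intervals indexed by the position offset.
VISIT_INTERVALS = [timedelta(days=d) for d in (1, 2, 4, 8, 16, 32)]
MIN_INTERVAL = VISIT_INTERVALS[0]


def reschedule(
    position: datetime,
    offset: int,
    status: str,
    eventful: bool,
    rng: Optional[random.Random] = None,
) -> Tuple[datetime, int]:
    """Return the new (queue position, interval index) after a visit."""
    if rng is None:
        rng = random.Random()
    if status == "failed":
        # Transient issue: retry after the minimal interval, index unchanged.
        interval = MIN_INTERVAL
    else:
        # Changes seen: visit way more often; no changes: less often.
        offset += -2 if eventful else 1
        # Assumption: keep the index within the ladder.
        offset = max(0, min(offset, len(VISIT_INTERVALS) - 1))
        interval = VISIT_INTERVALS[offset]
    # Fudge factor in the -/+ 10% range spreads visits out.
    fudge = rng.uniform(0.9, 1.1)
    return position + interval * fudge, offset


start = datetime(2021, 7, 6, tzinfo=timezone.utc)
new_position, new_offset = reschedule(start, 3, "successful", eventful=True)
```

Passing a seeded `random.Random` as `rng` makes the jitter reproducible in tests.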
-
- Jul 01, 2021
-
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
This deals first and foremost with the next_position_offset update done by the scheduler journal client.
-
- Jun 29, 2021
-
-
Antoine R. Dumont authored
-
- Jun 23, 2021
-
-
Antoine R. Dumont authored
In a future commit, we will add new fields whose values will be permutation dependent.
-
Antoine R. Dumont authored
This will help us when adding new fields to the table.
-
Antoine R. Dumont authored
This will help us when adding new fields to the table.
-
Antoine R. Dumont authored
-
Nicolas Dandrimont authored
This allows us to avoid repeating visits on them, until a next pass of the lister can mark them as disabled.
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
-
- Jun 22, 2021
-
-
Antoine Lambert authored
Add a new method to the scheduler interface that returns the full list of listers registered in the database. Related to T3127
-
- Jun 21, 2021
-
-
Nicolas Dandrimont authored
-
- Jun 10, 2021
-
-
Antoine R. Dumont authored
In effect, this will allow running 2 runners:
  - one for recurring tasks
  - one for the save code now
This should decrease the probability of the save code now scheduling tasks getting stuck behind the main scheduler runner. Related to T3367
-
Antoine R. Dumont authored
This adds coverage as well. This will be needed for subsidiary diffs. Related to T3367
-
- Jun 09, 2021
-
-
Antoine R. Dumont authored
This also makes the missing dependencies explicit.
-
- May 25, 2021
-
-
Antoine Lambert authored
Since the release of kombu 5.1.0, a warning is issued when a hostname is not set in the broker_url config value of a celery app. That change makes the test_celery_monitor_ping test fail because of the new, unexpected warning, so explicitly add the localhost hostname to the broker_url value of the celery TestApp config.
-
- May 06, 2021
-
- Apr 30, 2021
-
-
Nicolas Dandrimont authored
This would only be useful if we had multiple runners running concurrently, but that's not the case.
-
- Apr 26, 2021
-
-
Antoine Lambert authored
Make it possible to check that the package documentation can be built without producing sphinx warnings. The sphinx environment is designed to be used in continuous integration, to prevent changes from breaking the documentation build when they are committed. The sphinx-dev environment is designed to be used inside a full swh development environment. Related to T3258
-
- Apr 20, 2021
-
-
Antoine R. Dumont authored
The staging scheduler runner was slow when fetching tasks because of that missing index. Related to T3271#63831
-