- Apr 15, 2025
-
-
Antoine Lambert authored
DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC)
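A minimal sketch of the replacement the warning suggests. `datetime.UTC` is an alias added in Python 3.11; `datetime.timezone.utc` is used here so the snippet also runs on older versions. The helper name `utcnow` is hypothetical and not taken from the code base.

```python
import datetime


def utcnow() -> datetime.datetime:
    """Return the current time as a timezone-aware datetime in UTC."""
    return datetime.datetime.now(tz=datetime.timezone.utc)


now = utcnow()
# Unlike datetime.datetime.utcnow(), the result carries its timezone.
print(now.tzinfo)
```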
-
- Mar 31, 2025
-
- Mar 28, 2025
-
-
Nicolas Dandrimont authored
-
- Mar 26, 2025
-
-
Nicolas Dandrimont authored
-
- Mar 21, 2025
-
-
The adapter registration becomes simpler, as psycopg understands enums natively. Most changes are about the new parameter substitution. We no longer need to raise a scheduler error if we have multiple visit updates for the same URL. It seemed fine, so I did not do extra work to make sure it happens. One test was related to the psycopg2 page size; we kept it to test "large" batches.
-
- Mar 17, 2025
-
-
Antoine R. Dumont authored
If none is provided, the behavior is unchanged. When patterns are provided, the list of task types is filtered to keep only the task types that start with one of the patterns. Refs. swh/infra/sysadm-environment#5512
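The prefix filtering described above could look roughly like the sketch below. The function name `filter_task_types` and the sample task type names are hypothetical; only the "starts with one of the patterns" rule and the unchanged default come from the commit message.

```python
from typing import Iterable, List, Optional


def filter_task_types(
    task_types: Iterable[str], patterns: Optional[Iterable[str]] = None
) -> List[str]:
    """Keep the task types starting with one of the patterns.

    Without patterns, every task type is kept (behavior unchanged).
    """
    names = list(task_types)
    prefixes = tuple(patterns or ())
    if not prefixes:
        return names
    # str.startswith accepts a tuple and matches any of its prefixes.
    return [name for name in names if name.startswith(prefixes)]


task_types = ["list-gitlab-full", "load-git", "list-pypi"]
print(filter_task_types(task_types, ["list-"]))
```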
-
Antoine R. Dumont authored
It's unclear whether the change in this MR triggers the bug in the simulator. But it is timestamps that need manipulating, and the values are currently datetimes, so the current state fails with [1]. Adding a conversion layer from datetimes to timestamps makes the tests happier.

[1]
```
16:17:06 low = datetime.datetime(2025, 3, 14, 15, 17, 2, 139077, tzinfo=datetime.timezone.utc)
16:17:06 high = datetime.datetime(2025, 3, 14, 15, 17, 2, 139077, tzinfo=datetime.timezone.utc)
16:17:06
16:17:06     def _diff(low, high):
16:17:06         if low == high:
16:17:06             if low == 0:
16:17:06                 return 0.5
16:17:06             else:
16:17:06 >               return abs(low * 0.1)
16:17:06 E       TypeError: unsupported operand type(s) for *: 'datetime.datetime' and 'float'
16:17:06
```

Refs. swh/infra/sysadm-environment#5512
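A small sketch of why the conversion helps, assuming the conversion layer boils down to turning datetimes into POSIX timestamps. The helper name `to_timestamp` is hypothetical; the actual conversion code is not shown in the commit message.

```python
import datetime


def to_timestamp(value: datetime.datetime) -> float:
    """Convert a timezone-aware datetime to a POSIX timestamp in seconds."""
    return value.timestamp()


moment = datetime.datetime(
    2025, 3, 14, 15, 17, 2, 139077, tzinfo=datetime.timezone.utc
)

try:
    moment * 0.1  # the operation that fails in the traceback
except TypeError as exc:
    print(f"datetime: {exc}")

# A float supports the arithmetic that a datetime rejects.
margin = abs(to_timestamp(moment) * 0.1)
print(margin)
```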
-
Antoine R. Dumont authored
Prior to this, the runner called the `grab_ready{_priority}_tasks` methods. Those methods update the tasks' status to 'next_run_scheduled' at listing time, so they write to postgresql immediately. As a result, a failure to write to rabbitmq still left the status updated. The runner now calls the `peek_ready{_priority}_tasks` methods instead, which only return the list of tasks to schedule. At the end of the runner, a call to the `mass_schedule_task_runs` method is now in charge of updating the tasks' status to 'next_run_scheduled' within the same transaction. Refs. swh/infra/sysadm-environment#5512
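The difference between peeking and grabbing can be sketched with an in-memory stand-in. Everything here is a simplification under stated assumptions: `InMemoryScheduler`, `Task`, `run_once` and the initial status name `next_run_not_scheduled` are illustrative, and the real methods take more parameters than shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    id: int
    status: str = "next_run_not_scheduled"  # assumed initial status


class InMemoryScheduler:
    """Hypothetical stand-in for the scheduler backend."""

    def __init__(self, tasks: List[Task]) -> None:
        self.tasks: Dict[int, Task] = {task.id: task for task in tasks}

    def peek_ready_tasks(self) -> List[Task]:
        # Read-only: list the tasks to schedule without touching their status.
        return [
            task
            for task in self.tasks.values()
            if task.status == "next_run_not_scheduled"
        ]

    def mass_schedule_task_runs(self, tasks: List[Task]) -> None:
        # The only place where the status changes, after the messages are sent.
        for task in tasks:
            self.tasks[task.id].status = "next_run_scheduled"


def run_once(scheduler: InMemoryScheduler, send: Callable[[Task], None]) -> int:
    """Send the ready tasks, then mark them as scheduled."""
    ready = scheduler.peek_ready_tasks()
    for task in ready:
        send(task)  # may raise; statuses are still untouched in that case
    scheduler.mass_schedule_task_runs(ready)
    return len(ready)
```

If `send` raises, `run_once` stops before `mass_schedule_task_runs`, so the tasks stay eligible for the next round.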
-
Antoine R. Dumont authored
Messages are now sent to rabbitmq first, then to postgresql. In the nominal case where all writes succeed, this changes nothing compared to the previous implementation (postgresql first, then rabbitmq). In degraded conditions though, it is expected to behave better.

1. If we cannot write to rabbitmq, then we won't write to postgresql either: the function will raise and stop.
2. If we can write to rabbitmq first, then the messages will be consumed independently. If we then cannot write to postgresql (for some reason), we only lose the information that the task was already sent. This means the same task will be rescheduled and we'll have another go at it.

As those kinds of tasks are supposed to be idempotent, that should not be a major issue for their upstream. Also, those tasks are mostly listers now and they have state management of their own, so the reruns should mostly be no-ops (if the ingestion from the previous run went fine). Edge-case scenarios such as a site being down will behave as before. Refs. swh/infra/sysadm-environment#5512
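The two failure cases can be illustrated with a short sketch. The function `schedule_batch` and its `publish` and `record` callables are hypothetical placeholders for the rabbitmq and postgresql writes; they are not the runner's actual API.

```python
from typing import Callable, List, Sequence


def schedule_batch(
    tasks: Sequence[str],
    publish: Callable[[str], None],
    record: Callable[[Sequence[str]], None],
) -> None:
    """Publish every task to the broker, then record the batch in the database."""
    for task in tasks:
        publish(task)  # case 1: a failure here raises before any database write
    record(tasks)  # case 2: a failure here leaves the messages already published


published: List[str] = []
recorded: List[str] = []


def failing_record(batch: Sequence[str]) -> None:
    raise RuntimeError("database unavailable")


try:
    schedule_batch(["task-1", "task-2"], published.append, failing_record)
except RuntimeError:
    pass

# The messages went out but nothing was recorded, so the same tasks
# would be picked up again on the next run.
print(published, recorded)
```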
-
- Mar 14, 2025
-
-
Antoine R. Dumont authored
To make its current behavior explicit. Refs. swh/infra/sysadm-environment#5512
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5512
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5512
-
- Mar 05, 2025
-
-
vlorentz authored
-
- Feb 25, 2025
-
- Feb 17, 2025
-
-
Antoine Lambert authored
-
Antoine Lambert authored
-
Antoine Lambert authored
Bump development tools: mypy, codespell, isort, ... Move all tool configuration into pyproject.toml. Remove mypy overrides that are no longer needed.
-
- Feb 10, 2025
-
-
In the CSV file consumed by the schedule command, allow using the celery backend name as the task type name: the mapping between a backend name and its task type name can easily be retrieved from the scheduler API, and only the celery backend name is available in sentry events data.
-
- Feb 05, 2025
-
-
Antoine Lambert authored
An origin can have been listed by the bulk-save lister but never scheduled, so we need to handle that case to avoid errors when attempting to schedule priority first visits.
-
- Dec 09, 2024
-
-
Antoine Lambert authored
A memory backend was recently introduced, so the temporary backend relying on a postgresql server is no longer needed.
-
- Nov 29, 2024
-
-
David Douard authored
-
- Nov 06, 2024
-
-
Antoine Lambert authored
From now on, requests to the scheduler remote API will be retried when encountering connection errors and transient remote exceptions.
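A generic retry loop in the spirit of this change, written with the standard library only. The helper `call_with_retries`, its parameters and the backoff policy are assumptions made for illustration; the commit message does not say how the retries are actually implemented.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    request: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    attempts: int = 3,
    delay: float = 0.1,
) -> T:
    """Call `request`, retrying on the given exceptions with a growing delay."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except retry_on:
            if attempt == attempts:
                raise  # retries exhausted: let the caller see the error
            time.sleep(delay * 2 ** (attempt - 1))
    raise RuntimeError("not reached: the loop returns or raises")
```

Exceptions outside `retry_on` propagate immediately, which keeps non-transient errors visible to the caller.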
-
- Oct 30, 2024
-
-
David Douard authored
These have been deprecated for ages now.
-
David Douard authored
-
- Oct 28, 2024
-
-
David Douard authored
The former has been deprecated for ages now.
-
- Oct 24, 2024
-
-
David Douard authored
Normalize the scheduler db for swh.core 3.6 with improved `swh db` handling capabilities. Remove test_init.py, it's now outdated.
-
Antoine R. Dumont authored
This log message serves as a crude health check, so we keep it but make it a bit more interesting.
-
- Oct 17, 2024
-
-
Antoine R. Dumont authored
Refs. swh/devel/swh-scheduler#4687
-
Antoine R. Dumont authored
Refs. swh/devel/swh-scheduler#4687
-
Antoine R. Dumont authored
Refs. swh/devel/swh-scheduler#4687
-
Antoine R. Dumont authored
This also makes the function return the number of scheduled origins. Refs. swh/devel/swh-scheduler#4687
-
Antoine R. Dumont authored
The cli as it currently stands was not looping: it did one round, scheduled origins and then crashed in production-like environments. There is no issue in the docker environment, as the loop is implemented there outside the pre-existing cli. That cli is kept to avoid breaking the docker environment. Refs. swh/devel/swh-scheduler#4687
- Oct 14, 2024
-
-
Antoine Lambert authored
This new command in the origin group makes it possible to schedule first visits with high priority for origins registered by listers having the first_visits_priority_queue attribute set. The command ensures the visits of all origins registered by such listers will be scheduled with high priority after the first listing, regardless of whether some have already been scheduled before. Subsequent executions of such listers will no longer trigger visits with high priority though; those will be scheduled by the recurrent visits runner. Related to #4687.
-
Antoine Lambert authored
It returns the set of visit types of the origins listed by a specific lister. Related to #4687.
-
- Oct 09, 2024
-
-
Antoine Lambert authored
This new optional parameter makes it possible to return only the listers whose first visits of listed origins must be scheduled with high priority after a first listing but were not scheduled yet. Those types of listers have the first_visits_queue_prefix attribute set. Related to #4687.
-
Antoine Lambert authored
In order to implement a new scheduler runner that will schedule first visits of listed origins with high priority, add the following new columns to the Lister model:

  - last_listing_finished_at: timestamp at which the last execution of the lister finished
  - first_visits_queue_prefix: optional prefix of message queue names to schedule first visits with high priority
  - first_visits_scheduled_at: timestamp at which all the first visits of listed origins with high priority were scheduled

Related to #4687.
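For orientation, the three columns could be pictured as optional fields on a record. This is only a sketch: `ListerSketch` is not the actual model class, the `Optional` types are assumptions, and the rule in `needs_first_visits_scheduling` is a guess at how the columns combine.

```python
import datetime
from dataclasses import dataclass
from typing import Optional


@dataclass
class ListerSketch:
    """Illustrative subset of a lister record; not the actual model class."""

    name: str
    last_listing_finished_at: Optional[datetime.datetime] = None
    first_visits_queue_prefix: Optional[str] = None
    first_visits_scheduled_at: Optional[datetime.datetime] = None

    def needs_first_visits_scheduling(self) -> bool:
        # Assumed rule: a queue prefix is set, a listing has finished,
        # and the first visits were not scheduled yet.
        return (
            self.first_visits_queue_prefix is not None
            and self.last_listing_finished_at is not None
            and self.first_visits_scheduled_at is None
        )
```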
-
- Sep 10, 2024
-
-
Antoine Lambert authored
There are cases (for instance when running tests on Jenkins) where more than one log record is captured during that test, making it flaky.
-
- Aug 30, 2024
-
-
Antoine Lambert authored
-
Antoine Lambert authored
-