Commits · master · Platform / Development / swh-scheduler

Apr 15, 2025

Fix Python deprecation warnings related to datetime.utcnow · ced3e82f

DeprecationWarning: datetime.datetime.utcnow() is deprecated and
scheduled for removal in a future version. Use timezone-aware objects
to represent datetimes in UTC: datetime.datetime.now(datetime.UTC)
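
As a minimal sketch of the replacement the warning recommends (the variable name is illustrative, not taken from the commit), using `timezone.utc`, which is the object that `datetime.UTC` aliases on Python 3.11 and later:

```python
from datetime import datetime, timezone

# Deprecated since Python 3.12; returns a naive datetime (tzinfo is None):
# naive_now = datetime.utcnow()

# Timezone-aware replacement suggested by the warning message.
aware_now = datetime.now(timezone.utc)

print(aware_now.tzinfo)
```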

ced3e82f

Mar 31, 2025
- Split pytest plugin requirements into a separate swh.scheduler[pytest] extra · e28c11bf
  Nicolas Dandrimont authored 3 weeks ago
  
  Tag: v3.1.0
  
  e28c11bf
Mar 28, 2025
- Require swh-storage[pytest] instead of swh-storage[testing] · 6f076570
  Nicolas Dandrimont authored 3 weeks ago
  
  6f076570
Mar 26, 2025
- Require the swh.journal pytest plugin explicitly · d21de66a
  Nicolas Dandrimont authored 3 weeks ago
  
  d21de66a
Mar 21, 2025

Migration psycopg3 · af1245b0

Pierre-Yves David authored 3 months ago and Nicolas Dandrimont committed 1 month ago

The adapter registration becomes simpler, as psycopg understands enums
natively.

Most changes are about the new parameter substitution.

We no longer need to raise a scheduler error if we have multiple visit
updates for the same URL. It seemed fine, so I did not do extra work to
make sure it happens.

One test was related to the psycopg2 page size; we kept it to test "large"
batches.

af1245b0

Mar 17, 2025

scheduler/runner: Allow specifying task type patterns to schedule · 5f94aa08

Antoine R. Dumont authored 1 month ago

If none is provided, the behavior is unchanged. When patterns are provided,
the list of task types is filtered to keep only the task types whose names
start with one of the patterns.
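
The prefix filtering described above could look roughly like the following sketch. The function name `filter_task_types` and the task type names are hypothetical and only serve the illustration; the runner's actual code may differ.

```python
from typing import Iterable, List, Optional


def filter_task_types(
    task_types: Iterable[str], patterns: Optional[Iterable[str]] = None
) -> List[str]:
    """Keep task types starting with any pattern; no patterns means no filtering."""
    prefixes = tuple(patterns or ())
    if not prefixes:
        return list(task_types)
    # str.startswith accepts a tuple and matches if any prefix matches.
    return [name for name in task_types if name.startswith(prefixes)]


# Hypothetical task type names, for illustration only.
task_types = ["load-git", "load-svn", "list-github", "index-mimetype"]
print(filter_task_types(task_types))
print(filter_task_types(task_types, ["load-"]))
```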

Refs. swh/infra/sysadm-environment#5512

5f94aa08

simulator: Fix timestamps manipulation · 2b7e34b2

Antoine R. Dumont authored 1 month ago

It's unclear whether the change in this MR triggers the bug in the simulator.
But it is timestamps that need to be manipulated, and the code currently
manipulates datetimes. As a result, the current state fails with [1].

Adding a conversion layer from datetimes to timestamps makes the tests happier.

[1]
```
16:17:06  low = datetime.datetime(2025, 3, 14, 15, 17, 2, 139077, tzinfo=datetime.timezone.utc)
16:17:06  high = datetime.datetime(2025, 3, 14, 15, 17, 2, 139077, tzinfo=datetime.timezone.utc)
16:17:06
16:17:06      def _diff(low, high):
16:17:06          if low == high:
16:17:06              if low == 0:
16:17:06                  return 0.5
16:17:06              else:
16:17:06  >               return abs(low * 0.1)
16:17:06  E               TypeError: unsupported operand type(s) for *: 'datetime.datetime' and 'float'
16:17:06
```
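
A minimal sketch of why the conversion helps, assuming a simplified stand-in for the numeric helper shown in the traceback (`diff_margin` and its final branch are assumptions, not the real function):

```python
from datetime import datetime, timezone


def diff_margin(low, high):
    """Simplified stand-in for the helper in the traceback above."""
    if low == high:
        return 0.5 if low == 0 else abs(low * 0.1)
    return abs(high - low)  # assumed branch, not visible in the traceback


moment = datetime(2025, 3, 14, 15, 17, 2, 139077, tzinfo=timezone.utc)

# diff_margin(moment, moment) raises TypeError: datetime * float is undefined.
# Converting to POSIX timestamps (floats) first makes the arithmetic valid.
low = high = moment.timestamp()
margin = diff_margin(low, high)
print(margin)
```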

Refs. swh/infra/sysadm-environment#5512

2b7e34b2

runner: Update task status only after sending the tasks to rabbitmq · 8445489a

Antoine R. Dumont authored 1 month ago

Prior to this, the runner called the `grab_ready{_priority}_tasks` methods.
Those methods update the tasks' status to 'next_run_scheduled' at listing
time, so they write to postgresql immediately.

As a result, the status was updated even when writing to rabbitmq failed. The
runner now calls the `peek_ready{_priority}_tasks` methods instead, which only
return the list of tasks to schedule. At the end of the runner, there is a
call to the `mass_schedule_task_runs` method. This method is now in charge of
updating the tasks' status to 'next_run_scheduled' within the same
transaction.
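
The difference can be illustrated with a toy in-memory model. `ToyScheduler`, `run_once` and the initial status string are assumptions made for this sketch; only the method names `peek_ready_tasks` and `mass_schedule_task_runs` and the 'next_run_scheduled' status come from the message above.

```python
from typing import Callable, Dict, List


class ToyScheduler:
    """Hypothetical in-memory stand-in for the scheduler backend."""

    def __init__(self, task_ids: List[int]) -> None:
        # Initial status name is an assumption for this sketch.
        self.status: Dict[int, str] = {t: "not_scheduled" for t in task_ids}

    def peek_ready_tasks(self) -> List[int]:
        # Read-only listing: no status is written at this point.
        return [t for t, s in self.status.items() if s == "not_scheduled"]

    def mass_schedule_task_runs(self, task_ids: List[int]) -> None:
        # The status update happens here, after the messages were sent.
        for task_id in task_ids:
            self.status[task_id] = "next_run_scheduled"


def run_once(scheduler: ToyScheduler, send: Callable[[int], None]) -> None:
    ready = scheduler.peek_ready_tasks()
    for task_id in ready:
        send(task_id)  # may raise if the broker is unreachable
    scheduler.mass_schedule_task_runs(ready)


def broken_send(task_id: int) -> None:
    raise ConnectionError("broker unreachable")


scheduler = ToyScheduler([1, 2])
try:
    run_once(scheduler, broken_send)
except ConnectionError:
    pass
print(scheduler.status)  # statuses are untouched after the failed send
```

With a listing method that wrote the status immediately, the failed send would have left both tasks marked as scheduled.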

Refs. swh/infra/sysadm-environment#5512

8445489a

celery/runner: Change write order to rabbitmq then postgresql · 259c70f3

Antoine R. Dumont authored 1 month ago

Messages are now sent to rabbitmq first, then recorded in postgresql.

In the nominal case where all writes succeed, this changes nothing compared
to the previous implementation (postgresql first, then rabbitmq).

In degraded conditions, though, this is supposedly better.

1. If we cannot write to rabbitmq, then we won't write to postgresql either:
the function will raise and stop.

2. If we can write to rabbitmq first, then the messages will be consumed
independently of this. And then, if we cannot write to postgresql (for some
reason), we just lose the information that we already sent the task. This
means the same task will be rescheduled and we'll have another go at it. As
those kinds of tasks are supposed to be idempotent, that should not be a major
issue for their upstream.

Also, those tasks are mostly listers now, and they have state management of
their own, so the reruns should mostly be no-ops (if the ingestion from the
previous run went fine). Edge-case scenarios such as a site being down will
behave as before.
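
The two failure cases can be sketched as follows. The function name `write_to_backends`, the callables and the task name are hypothetical; this is an illustration of the ordering, not the runner's code.

```python
from typing import Callable, List


def write_to_backends(
    tasks: List[str],
    send_to_broker: Callable[[str], None],
    record_in_db: Callable[[List[str]], None],
) -> None:
    """Send to the message broker first, then record the runs in the database."""
    for task in tasks:
        send_to_broker(task)  # case 1: raising here skips the database write
    record_in_db(tasks)  # case 2: raising here leaves the messages already sent


sent: List[str] = []
recorded: List[str] = []


def failing_db(tasks: List[str]) -> None:
    raise RuntimeError("database unavailable")


try:
    write_to_backends(["task-a"], sent.append, failing_db)
except RuntimeError:
    pass

# The message went out but was not recorded, so the next run sends it again.
write_to_backends(["task-a"], sent.append, recorded.extend)
print(sent, recorded)
```

The duplicate delivery in case 2 is only acceptable because the tasks are expected to be idempotent, as the message notes.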

Refs. swh/infra/sysadm-environment#5512

259c70f3

Mar 14, 2025
- celery_backend/runner: Extract a write to backends function · 41f6156b
  Antoine R. Dumont authored 1 month ago
  
  To make its current behavior explicit. Refs. swh/infra/sysadm-environment#5512
  41f6156b
- Use logger.warning instead of logger.warn · c4f108cf
  Antoine R. Dumont authored 1 month ago
  
  Refs. swh/infra/sysadm-environment#5512
  c4f108cf
- backend: Fix debug instruction and log level · fe3d8727
  Antoine R. Dumont authored 1 month ago
  
  Refs. swh/infra/sysadm-environment#5512
  fe3d8727
Mar 05, 2025
- Fix mypy error · b477d563
  vlorentz authored 1 month ago
  
  b477d563
Feb 25, 2025
- Migrate from deprecated pkg_resources package to importlib.metadata · 2b8df8d3
  Antoine Lambert authored 1 month ago
  
  Tag: v2.7.3
  
  2b8df8d3
Feb 17, 2025
- mypy: Use type stubs for celery and fix typing · e3d9963e
  Antoine Lambert authored 2 months ago
  
  e3d9963e
- model: Fix black formatting · 9dc49075
  Antoine Lambert authored 2 months ago
  
  9dc49075
- Apply swh-py-template v0.3.3 with copier · 3cedcd2e
  Antoine Lambert authored 2 months ago
  
  Bump development tools: mypy, codespell, isort, ... Move all tool configuration into pyproject.toml. Remove no longer needed mypy overrides.
  3cedcd2e
Feb 10, 2025

cli/task: Allow to use backend name as task type in schedule command · 9f1adc23

Antoine Lambert authored 2 months ago and Antoine Lambert committed 2 months ago

In the CSV file consumed by the schedule command, allow the celery backend
name to be used as the task type name: the mapping between a backend name
and its task type name can easily be retrieved from the scheduler API, and
only the celery backend name is available in Sentry event data.
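
A rough sketch of such a lookup, assuming each task type is a mapping with `type` and `backend_name` keys (the key names, the function `resolve_task_type` and the sample values are assumptions for illustration):

```python
from typing import Dict, Iterable


def resolve_task_type(name: str, task_types: Iterable[Dict[str, str]]) -> str:
    """Return the task type name for `name`, given a type or a backend name."""
    by_backend = {t["backend_name"]: t["type"] for t in task_types}
    if name in by_backend.values():
        return name  # already a task type name
    return by_backend[name]  # KeyError if neither form is known


# Hypothetical task type, for illustration only.
known_task_types = [
    {"type": "load-example", "backend_name": "example.tasks.LoadExample"},
]
print(resolve_task_type("example.tasks.LoadExample", known_task_types))
```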

9f1adc23

Feb 05, 2025

first_visits: Add missing None check before comparing dates · d3422dfe

Antoine Lambert authored 2 months ago

An origin can have been listed by the bulk-save lister but never
scheduled, so we need to handle that case to avoid errors when
attempting to schedule priority first visits.
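
The kind of guard involved can be sketched like this; `scheduled_before` and its parameters are hypothetical names, and the real comparison in the code base may be different.

```python
from datetime import datetime, timezone
from typing import Optional


def scheduled_before(last_scheduled: Optional[datetime], cutoff: datetime) -> bool:
    """True only if the origin was scheduled at all, and before `cutoff`."""
    # Comparing None with a datetime raises TypeError, so check for None first.
    return last_scheduled is not None and last_scheduled < cutoff


cutoff = datetime(2025, 2, 5, tzinfo=timezone.utc)
print(scheduled_before(None, cutoff))  # never scheduled: no error raised
print(scheduled_before(datetime(2025, 1, 1, tzinfo=timezone.utc), cutoff))
```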

d3422dfe

Dec 09, 2024
- backend: Remove no longer needed TemporarySchedulerBackend class · 84a1f8b8
  Antoine Lambert authored 4 months ago
  
  A memory backend was recently introduced, so the temporary backend relying on a PostgreSQL server is no longer needed.
  Tag: v2.7.0
  
  84a1f8b8
Nov 29, 2024
- Add an in-memory scheduler for testing purpose · 76cccbc8
  David Douard authored 4 months ago
  
  76cccbc8
Nov 06, 2024
- api/client: Enable new retry feature of swh.core.api.RPCClient · 758644b9
  Antoine Lambert authored 5 months ago
  
  From now on requests to the scheduler remote API will be retried when encountering connection errors and transient remote exceptions.
  Tag: v2.6.2
  
  758644b9
Oct 30, 2024
- get_scheduler: Remove support for local cls and args argument · dbaf1d2e
  David Douard authored 5 months ago
  
  These have been deprecated for ages now.
  dbaf1d2e
- test: Do not run test_temporary twice · 9bf2b78e
  David Douard authored 5 months ago
  
  9bf2b78e
Oct 28, 2024
- Replace remaining 'local' cls with 'postgresql' · 8f341cfe
  David Douard authored 5 months ago
  
  The former has been deprecated for ages now.
  8f341cfe
Oct 24, 2024
- Declare scheduler backends in the swh.scheduler.classes entry point · a08abb96
  David Douard authored 6 months ago
  
  Normalize the scheduler db for swh.core 3.6 with improved `swh db` handling capabilities. Remove test_init.py, it's now outdated.
  a08abb96
- runner-first-visits: Improve log message · da126619
  Antoine R. Dumont authored 5 months ago
  
  This log message serves as a crude health check, so we keep it but make it a bit more interesting.
  Tag: v2.6.1
  
  da126619
Oct 17, 2024
- scheduler/runner-first-visits: Add tests around cli · f28b3fa4
  Antoine R. Dumont authored 6 months ago
  
  Refs. swh/devel/swh-scheduler#4687
  Tag: v2.6.0
  
  f28b3fa4
- scheduler: Drop unused swh origin schedule-high-priority-first-visits · 18e9ecc4
  Antoine R. Dumont authored 6 months ago
  
  Refs. swh/devel/swh-scheduler#4687
  18e9ecc4
- scheduler/runner-first-visits: Log number of first visits scheduled · ad956e4d
  Antoine R. Dumont authored 6 months ago
  
  Refs. swh/devel/swh-scheduler#4687
  ad956e4d
- scheduler/utils: Move schedule_first_visits in dedicated module · 23dd849b
  Antoine R. Dumont authored 6 months ago
  
  Refs. swh/devel/swh-scheduler#4687
  23dd849b
- scheduler/utils: Move helper functions in dedicated module · 5952077b
  Antoine R. Dumont authored 6 months ago
  
  This also makes the function return the number of scheduled origins. Refs. swh/devel/swh-scheduler#4687
  5952077b
- scheduler: Add a runner for first visits · 859bb348
  Antoine R. Dumont authored 6 months ago
  
  The existing CLI was not looping: it did one round, scheduled origins, and then crashed in production-like environments. There is no issue in the docker environment, as the loop is implemented outside the pre-existing CLI. This commit keeps said CLI to avoid breaking the docker environment. Refs. swh/devel/swh-scheduler#4687
  859bb348
Oct 14, 2024

cli/origin: Add schedule-high-priority-first-visits command · 4adc20b0

Antoine Lambert authored 6 months ago

This new command in the origin group makes it possible to schedule first
visits with high priority for origins registered by listers having
the first_visits_priority_queue attribute set.

The command ensures the visits of all origins registered by such
listers will be scheduled with high priority after the first listing,
regardless of whether some have already been scheduled before it.

Subsequent executions of such listers will no longer trigger visits
with high priority, though; those will be scheduled by the recurrent
visits runner.

Related to #4687.

4adc20b0

interface: Add get_visit_types_for_listed_origins method · 89c99a03

Antoine Lambert authored 6 months ago

It returns the set of visit types of the origins listed
by a specific lister.

Related to #4687.

89c99a03
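
What the get_visit_types_for_listed_origins method computes can be illustrated with a small in-memory sketch. The row layout, the function name `visit_types_for_lister` and all values below are made up; the real method queries the scheduler database.

```python
from typing import Iterable, Set, Tuple

# (lister_id, origin_url, visit_type) rows; all values are made up.
ListedOrigin = Tuple[str, str, str]


def visit_types_for_lister(
    origins: Iterable[ListedOrigin], lister_id: str
) -> Set[str]:
    """Collect the distinct visit types of the origins listed by one lister."""
    return {visit_type for (lid, _url, visit_type) in origins if lid == lister_id}


listed_origins = [
    ("lister-1", "https://example.org/repo-a", "git"),
    ("lister-1", "https://example.org/repo-b", "svn"),
    ("lister-1", "https://example.org/repo-c", "git"),
    ("lister-2", "https://example.org/repo-d", "hg"),
]
print(visit_types_for_lister(listed_origins, "lister-1"))
```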

Oct 09, 2024

interface: Add with_first_visits_to_schedule parameter to get_listers · 6b266002

Antoine Lambert authored 6 months ago

This new optional parameter makes it possible to return only the listers
whose listed origins must have their first visits scheduled with high
priority after a first listing, but whose first visits were not scheduled yet.

Those types of listers have the first_visits_queue_prefix attribute set.
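
One plausible reading of that selection, written as a sketch: the attribute names are those of the Lister model columns, while the `Lister` dataclass, the predicate `needs_first_visits` and the exact combination of conditions are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Lister:
    """Reduced stand-in for the Lister model, for illustration only."""

    name: str
    first_visits_queue_prefix: Optional[str] = None
    last_listing_finished_at: Optional[datetime] = None
    first_visits_scheduled_at: Optional[datetime] = None


def needs_first_visits(lister: Lister) -> bool:
    """Assumed condition: prefix set, one listing done, first visits pending."""
    return (
        lister.first_visits_queue_prefix is not None
        and lister.last_listing_finished_at is not None
        and lister.first_visits_scheduled_at is None
    )


now = datetime(2024, 10, 9, tzinfo=timezone.utc)
listers = [
    Lister("plain"),
    Lister("pending", first_visits_queue_prefix="prio", last_listing_finished_at=now),
    Lister("done", "prio", last_listing_finished_at=now, first_visits_scheduled_at=now),
]
print([lister.name for lister in listers if needs_first_visits(lister)])
```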

Related to #4687.

6b266002

model: Add new columns to Lister model related to priority scheduling · ccee462b

Antoine Lambert authored 6 months ago

In order to implement a new scheduler runner that will schedule first
visits of listed origins with high priority, add the following new
columns to the Lister model:

- last_listing_finished_at: Timestamp at which the last execution of
  the lister finished

- first_visits_queue_prefix: Optional prefix of message queue names
  to schedule first visits with high priority

- first_visits_scheduled_at: Timestamp at which all the first visits
  of listed origins with high priority were scheduled

Related to #4687.

ccee462b

Sep 10, 2024

tests: Prevent flaky test_celery_monitor_ping · a5a7b9b2

Antoine Lambert authored 7 months ago

There are cases (for instance when running tests on Jenkins) where
more than one log record is captured during that test, making it
flaky.

a5a7b9b2

Aug 30, 2024
- sql/upgrades: Fix typo spotted after codespell upgrade · 42b1a34e
  Antoine Lambert authored 7 months ago
  
  42b1a34e
- Fix some formatting after black version bump · a903f2c2
  Antoine Lambert authored 7 months ago
  
  a903f2c2