Implement storage of listed origins
This new API endpoint allows listers to record the origins they have seen during their current run.
Origins are identified by the lister instance, the URL of the origin, and the type of loader that should be used to load that origin.
The implementation lets listers simply send the list of origins they've seen (with some lightweight extra information), leaving the backend to decide whether to insert a new origin or update an existing one.
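The insert-or-update behavior described above maps naturally onto an SQL upsert keyed on (lister, URL, visit type). The sketch below shows that idea with Python's built-in `sqlite3`, whose `INSERT ... ON CONFLICT ... DO UPDATE` syntax (SQLite 3.24+) mirrors PostgreSQL's. The table layout, the `record_listed_origins` helper and the `"lister-1"` identifier are hypothetical simplifications, not the schema or API added by this change.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical, simplified table; the real schema lives in
# swh/scheduler/sql/30-swh-schema.sql and is not reproduced here.
SCHEMA = """
CREATE TABLE listed_origins (
    lister_id  TEXT NOT NULL,
    url        TEXT NOT NULL,
    visit_type TEXT NOT NULL,
    first_seen TEXT NOT NULL,
    last_seen  TEXT NOT NULL,
    PRIMARY KEY (lister_id, url, visit_type)
)
"""

# Insert unseen origins; for known ones, only refresh last_seen so that
# first_seen keeps the timestamp of the first listing that saw them.
UPSERT = """
INSERT INTO listed_origins (lister_id, url, visit_type, first_seen, last_seen)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (lister_id, url, visit_type)
DO UPDATE SET last_seen = excluded.last_seen
"""


def record_listed_origins(conn, lister_id, origins, now=None):
    """Record (url, visit_type) pairs seen by one lister instance (hypothetical helper)."""
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now.isoformat()
    conn.executemany(
        UPSERT,
        [(lister_id, url, visit_type, stamp, stamp) for url, visit_type in origins],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
day1 = datetime(2020, 6, 16, tzinfo=timezone.utc)
day2 = datetime(2020, 6, 17, tzinfo=timezone.utc)

# First run sees one origin; the second run sees it again plus a new one.
record_listed_origins(conn, "lister-1", [("https://example.org/a", "git")], now=day1)
record_listed_origins(
    conn,
    "lister-1",
    [("https://example.org/a", "git"), ("https://example.org/b", "git")],
    now=day2,
)

rows = conn.execute(
    "SELECT url, first_seen, last_seen FROM listed_origins ORDER BY url"
).fetchall()
for row in rows:
    print(row)
```

The lister never needs to know whether an origin is new: origin `a` keeps its first `first_seen` and gets a fresh `last_seen`, while origin `b` is inserted with both set to the second run's timestamp.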
The current implementation doesn't disable origins that have disappeared when doing a full listing run. This step will be done by a separate "origin garbage collection" endpoint, which will peruse the `last_seen` field.
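The garbage-collection step is explicitly left to a future endpoint. One way it could use `last_seen` is sketched below in plain Python, under loud assumptions: the `ListedOrigin` dataclass, its `enabled` flag and the `garbage_collect_origins` helper are all hypothetical, and "disappeared" is taken to mean "not seen since the current full listing run started".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass
class ListedOrigin:
    # Hypothetical, trimmed-down record: lister instance, URL, visit type,
    # last time a listing run saw the origin, and an assumed enabled flag.
    lister_id: str
    url: str
    visit_type: str
    last_seen: datetime
    enabled: bool = True


def garbage_collect_origins(
    origins: Iterable[ListedOrigin], lister_id: str, run_started_at: datetime
) -> List[ListedOrigin]:
    """Disable this lister's origins not seen since the full run started."""
    disabled = []
    for origin in origins:
        if (
            origin.lister_id == lister_id
            and origin.enabled
            and origin.last_seen < run_started_at
        ):
            origin.enabled = False
            disabled.append(origin)
    return disabled


run_start = datetime(2020, 6, 16, 10, 0, tzinfo=timezone.utc)
origins = [
    ListedOrigin("lister-1", "https://example.org/kept", "git", run_start + timedelta(minutes=5)),
    ListedOrigin("lister-1", "https://example.org/gone", "git", run_start - timedelta(days=7)),
]
stale = garbage_collect_origins(origins, "lister-1", run_start)
print([o.url for o in stale])  # ['https://example.org/gone']
```

Keeping this as a separate pass means the recording endpoint stays a pure upsert, and only a run known to be a complete listing triggers disabling.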
Depends on !305 (closed). Related to #2442 (closed).
Test Plan
tox tests added for both the insert and update behaviors
Migrated from D3289 on Phabricator.
Build is green
Patch application report for D3289 (id=11662)
Could not rebase; Attempt merge onto 1c93e553...
Updating 1c93e55..d107a55
Fast-forward
 swh/scheduler/backend.py               | 40 +++++++++++++++-
 swh/scheduler/interface.py             | 16 ++++++-
 swh/scheduler/model.py                 | 88 ++++++++++++++++++++++++++++++----
 swh/scheduler/sql/30-swh-schema.sql    | 33 +++++++++++++
 swh/scheduler/tests/conftest.py        | 26 +++++++++-
 swh/scheduler/tests/test_api_client.py |  1 +
 swh/scheduler/tests/test_model.py      | 19 +++++++-
 swh/scheduler/tests/test_scheduler.py  | 42 ++++++++++++----
 8 files changed, 241 insertions(+), 24 deletions(-)
Changes applied before test
commit d107a5553414ec7f2745a739dbc82e56eb62514e
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jun 16 10:25:08 2020 +0200

    Implement storage of listed origins

    This new API endpoint allows listers to record the origins they have seen during their current run.

    Origins are identified by the lister instance, the url of the origin, and the type of loader that should be used to load this origin.

    The implementation allows listers just send the list of origins they've seen (with some lightweight extra information), leaving the backend to handle whether to do an insertion or an update to an existing origin.

    The current implementation doesn't disable origins that have disappeared when doing a full listing run. This step will be done by a separate "origin garbage collection" endpoint, which will peruse the `last_seen` field.

commit e0fa5c58d38c2cbe39fe1f8e0fbb36591c29b661
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jun 16 10:24:03 2020 +0200

    Move lister addition in scheduler tests to a pytest fixture

    This lets us keep the tests a little DRYer.

commit 04894bd7fb6a1c4d658587395cbbe4f2d60c2a2a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jun 16 10:22:23 2020 +0200

    Lister.instance_name doesn't need a factory/default value

commit f520108a8d0abefec3a91967aedbc29fb1a808f8
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jun 16 10:08:59 2020 +0200

    Improve support of primary keys

    This splits primary keys across "automatic" primary keys (handled by the database) and manual primary keys (managed by the user).

    Use the opportunity to improve/clarify the documentation of field metadata attributes.
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/31/ for more details.
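The "Improve support of primary keys" commit in the log splits keys into database-generated ("automatic") ones and user-supplied ("manual") ones, recorded as field metadata. A minimal sketch of that idea using standard-library dataclasses follows; the metadata keys `primary_key` and `auto_primary_key`, the `Lister` fields and both helper functions are hypothetical illustrations, not the definitions in swh/scheduler/model.py.

```python
from dataclasses import dataclass, field, fields
from typing import Optional, Tuple
from uuid import UUID


@dataclass
class Lister:
    # Hypothetical metadata keys: "primary_key" marks a key the user supplies,
    # "auto_primary_key" marks a key the database generates on insertion.
    name: str = field(metadata={"primary_key": True})
    instance_name: str = field(metadata={"primary_key": True})
    id: Optional[UUID] = field(default=None, metadata={"auto_primary_key": True})


def primary_key_columns(cls) -> Tuple[str, ...]:
    """Columns the caller must provide to identify a row."""
    return tuple(f.name for f in fields(cls) if f.metadata.get("primary_key"))


def auto_primary_key_columns(cls) -> Tuple[str, ...]:
    """Columns filled in by the database and read back after insertion."""
    return tuple(f.name for f in fields(cls) if f.metadata.get("auto_primary_key"))


print(primary_key_columns(Lister))       # ('name', 'instance_name')
print(auto_primary_key_columns(Lister))  # ('id',)
```

Separating the two lets a backend build upsert conflict targets from the manual keys alone, while leaving the automatic key out of the inserted values.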