All the current swh.provenance changes that are running in production...
This is the accumulation of changes that have been tested in production on mmca and met over the past weeks. Much of it has been pair-programmed, and the tests still pass, so we're probably in good shape.
`git log origin/master..` says:
commit 8f476d494b4aeab6e0cd6a7adb5f2bce095e8c60
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Aug 12 13:37:12 2022 +0200
swhgraph: handle empty responses
When the visit_edges response is empty, swh.graph.client generates an
empty tuple, which can't be unpacked. Work around the issue.
swh/provenance/swhgraph/archive.py | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
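A minimal sketch of the workaround, assuming the raw edges arrive as an iterable of tuples in which an empty response shows up as `()`. `iter_edges` is a hypothetical helper, not the actual code in swh/provenance/swhgraph/archive.py:

```python
from typing import Iterable, Iterator, Tuple

Edge = Tuple[str, str]


def iter_edges(raw_edges: Iterable[tuple]) -> Iterator[Edge]:
    """Yield (src, dst) pairs, skipping the empty tuple an empty response yields.

    Hypothetical helper: illustrates the workaround, not the swh.graph API.
    """
    for edge in raw_edges:
        if not edge:
            # An empty visit_edges response shows up as (), which cannot be
            # unpacked into (src, dst); treat it as "no edges".
            continue
        src, dst = edge
        yield src, dst
```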
commit edf00f88894fb9cf407017944dc5cd751b012357
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Aug 12 13:36:39 2022 +0200
Use proper signatures in journal_client
We're always passing the provenance-internal object types, not those of
swh.storage.
swh/provenance/journal_client.py | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
commit 08de80b680bdf008f9a1f45805f2d54a7a397549
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Aug 12 11:26:01 2022 +0200
origin layer: retrieve multiple levels of revision history at once
Replace `revision_get_parents` with `revision_get_some_outbound_edges`,
which can optionally retrieve more levels of history than just a single
one. This lets us make far fewer queries to the swh.graph or
swh.storage backend when the revision exists there.
The swh.storage backend does limited recursion, so we still process the
origin in multiple steps to fetch the whole history.
swh/provenance/archive.py | 15 +++++----
swh/provenance/graph.py | 43 +++++++++++++-------------
swh/provenance/interface.py | 10 +++---
swh/provenance/journal_client.py | 1 -
swh/provenance/model.py | 20 +-----------
swh/provenance/multiplexer/archive.py | 28 ++++++++++-------
swh/provenance/origin.py | 26 ++++++----------
swh/provenance/postgresql/archive.py | 27 +++++++++-------
swh/provenance/provenance.py | 28 ++++++++---------
swh/provenance/storage/archive.py | 12 ++++---
swh/provenance/swhgraph/archive.py | 23 ++++++++------
swh/provenance/tests/test_archive_interface.py | 29 ++++++++++-------
12 files changed, 130 insertions(+), 132 deletions(-)
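A rough sketch of how fetching several levels per call still converges on the full history when the backend's recursion is limited. `walk_history` and the `get_some_outbound_edges` callable are hypothetical stand-ins for `revision_get_some_outbound_edges`; the sketch assumes each response lists all parents of every revision it includes:

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Edge = Tuple[bytes, bytes]  # (revision, parent)


def walk_history(
    head: bytes,
    get_some_outbound_edges: Callable[[bytes], Iterable[Edge]],
) -> Dict[bytes, List[bytes]]:
    """Map every revision reachable from ``head`` to its parents.

    Hypothetical sketch: each backend call may return edges for one or more
    levels of history; we only query revisions not yet covered by a response.
    Assumes a response lists *all* parents of each revision it includes.
    """
    parents: Dict[bytes, List[bytes]] = {}
    expanded: Set[bytes] = set()
    to_visit: List[bytes] = [head]
    while to_visit:
        rev = to_visit.pop()
        if rev in expanded:
            continue  # already covered by an earlier, deeper response
        # Seed with rev so a root revision (no parents) is still marked done.
        by_src: Dict[bytes, List[bytes]] = {rev: []}
        for src, dst in get_some_outbound_edges(rev):
            by_src.setdefault(src, []).append(dst)
        for src, dsts in by_src.items():
            if src in expanded:
                continue
            expanded.add(src)
            parents[src] = dsts
            to_visit.extend(d for d in dsts if d not in expanded)
    return parents
```

Revisions already covered by a deeper response are never queried again, which is where the savings over one-level `revision_get_parents` calls come from; the outer loop handles the case where the backend stops recursing before reaching the roots.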
commit 68e1907e7f37863d732edcb6211be893df94b9c7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Aug 12 11:03:12 2022 +0200
Appease pyright by ensuring target_type is bound
swh/provenance/tests/test_archive_interface.py | 2 ++
1 file changed, 2 insertions(+)
commit d935abf431df5105fec8422e87eb5ee47d3c177a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Aug 12 11:01:37 2022 +0200
Rename origin.proceed_origin to origin.process_origin
swh/provenance/origin.py | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
commit 2ac46f58346f7c3763f1263109885fea6797e155
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Aug 3 18:21:32 2022 +0200
multiplexer: add endpoint counts per backend
swh/provenance/__init__.py | 6 ++-
swh/provenance/multiplexer/archive.py | 61 +++++++++++++++++++-------
swh/provenance/tests/test_archive_interface.py | 4 +-
swh/provenance/tests/test_init.py | 6 ++-
4 files changed, 57 insertions(+), 20 deletions(-)
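A hypothetical sketch of per-backend endpoint counting in a multiplexer that falls back to the next backend on an empty result. `CountingMultiplexer`, the backend names and the `call` method are illustrative only; the real multiplexer in swh/provenance/multiplexer/archive.py may count differently:

```python
from collections import Counter
from typing import Any, List, Sequence, Tuple


class CountingMultiplexer:
    """Hypothetical multiplexer: query backends in order, fall back on an
    empty result, and count which backend served each endpoint."""

    def __init__(self, backends: Sequence[Tuple[str, Any]]) -> None:
        self.backends = list(backends)
        # Keys are (endpoint, backend_name); "none" means every backend was empty.
        self.counts: Counter = Counter()

    def call(self, endpoint: str, *args: Any) -> List[Any]:
        for name, backend in self.backends:
            result = list(getattr(backend, endpoint)(*args))
            if result:
                self.counts[(endpoint, name)] += 1
                return result
        self.counts[(endpoint, "none")] += 1
        return []
```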
commit 8d323c322df2bf9a429a1329de6c87636927df19
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Aug 12 17:46:20 2022 +0200
journal client: only use the provenance context manager once
The context manager for the provenance storage rabbitmq client doesn't
like being used multiple times over the lifetime of a process. Only use
it once in the cli of the journal client.
swh/provenance/cli.py | 6 ++++--
swh/provenance/journal_client.py | 6 ++----
2 files changed, 6 insertions(+), 6 deletions(-)
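A sketch of the pattern, assuming a storage client whose `__enter__` must only run once per process. `FakeStorage`, `process_batch` and `run` are hypothetical names; the point is that the per-batch code receives an already-open client instead of re-entering it:

```python
from typing import Iterable, List


class FakeStorage:
    """Hypothetical stand-in for a client whose context manager may only be
    entered once per process (like the rabbitmq storage client)."""

    def __init__(self) -> None:
        self.entered = 0
        self.written: List[int] = []

    def __enter__(self) -> "FakeStorage":
        if self.entered:
            raise RuntimeError("context manager entered twice")
        self.entered += 1
        return self

    def __exit__(self, *exc_info: object) -> None:
        return None

    def write(self, batch: Iterable[int]) -> None:
        self.written.extend(batch)


def process_batch(storage: FakeStorage, batch: List[int]) -> None:
    # Per-batch work receives the already-open client; it does not re-enter it.
    storage.write(batch)


def run(storage: FakeStorage, batches: Iterable[List[int]]) -> None:
    # Enter the context manager once, at the CLI level, for the whole loop.
    with storage:
        for batch in batches:
            process_batch(storage, batch)
```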
commit f5f8555f8e3d8c72a5d51f4a10d0b761e74c97fe
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Aug 12 17:45:27 2022 +0200
provenance: lower the cache thresholds
Instead of flushing if any entry is over the threshold, flush when the
cumulative count goes over.
swh/provenance/provenance.py | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
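The two flush policies side by side, as a sketch with hypothetical names; `cache` is assumed to map a cache kind to the pending entries of that kind:

```python
from typing import Dict, Set


def should_flush_any(cache: Dict[str, Set[bytes]], threshold: int) -> bool:
    # Old policy: flush as soon as any single cache entry is over the threshold.
    return any(len(entries) > threshold for entries in cache.values())


def should_flush_cumulative(cache: Dict[str, Set[bytes]], threshold: int) -> bool:
    # New policy: flush when the total across all entries is over the threshold.
    return sum(len(entries) for entries in cache.values()) > threshold
```

With `threshold=3` and two kinds holding two entries each, the old check does not fire while the cumulative one does: the same number now bounds the whole cache rather than each kind, which is why this effectively lowers the thresholds.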
commit 4b3de6177b4f2c5b45dede931004c719fdfb0f7d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Aug 12 17:44:47 2022 +0200
revision: only trigger partial flushes when necessary
swh/provenance/revision.py | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
commit 9c936c39779cdb42b0f8f1a40df23d2de3032dfb
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Aug 12 17:43:13 2022 +0200
revision: sort batches by date, improve logging, add incremental flushing
swh/provenance/revision.py | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
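A hypothetical sketch of date-ordered processing with incremental flushing, where a partial flush only happens once the pending row count crosses a threshold. `Rev`, `process_revisions`, `handle` and `flush` are illustrative names, not the revision layer's API:

```python
from datetime import datetime
from typing import Callable, Iterable, NamedTuple


class Rev(NamedTuple):
    id: bytes
    date: datetime


def process_revisions(
    revisions: Iterable[Rev],
    handle: Callable[[Rev], int],  # processes one revision, returns new pending rows
    flush: Callable[[], None],  # writes pending rows to the provenance storage
    threshold: int,
) -> int:
    """Process revisions oldest first, flushing only when needed.

    Hypothetical sketch; returns the number of flushes performed.
    """
    pending = 0
    flushes = 0
    for rev in sorted(revisions, key=lambda r: r.date):
        pending += handle(rev)
        if pending > threshold:
            # Partial flush: only triggered once enough rows have accumulated.
            flush()
            flushes += 1
            pending = 0
    if pending:
        flush()  # final flush for whatever is left over
        flushes += 1
    return flushes
```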
commit 5b66b98e62c50c5958936adcc3b0ab651fb2d279
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Aug 12 17:40:59 2022 +0200
revision: capture datetime exceptions with sentry
swh/provenance/journal_client.py | 3 +++
1 file changed, 3 insertions(+)
commit af09058f0a80aac79a4e477fb2f7bd9800e3603f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Aug 12 17:40:26 2022 +0200
revision: don't process revisions before the epoch
swh/provenance/journal_client.py | 7 +++++++
1 file changed, 7 insertions(+)
commit 3473d4af62d85255845aafc1def6c591090062e7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Aug 12 17:39:30 2022 +0200
revision: don't process revisions with unknown dates
swh/provenance/journal_client.py | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
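A sketch of the date filtering described in the three commits above, assuming revision dates arrive as POSIX timestamps (or `None` when unknown). `revision_date` is a hypothetical helper; where the commit above captures conversion errors with Sentry, this sketch only logs them:

```python
import logging
from datetime import datetime, timezone
from typing import Optional

logger = logging.getLogger(__name__)

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def revision_date(timestamp: Optional[float]) -> Optional[datetime]:
    """Return an aware UTC date for a revision, or None if it should be skipped.

    Hypothetical helper illustrating the three filters described above.
    """
    if timestamp is None:
        return None  # unknown date: don't process the revision
    try:
        date = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    except (OverflowError, ValueError, OSError):
        # The commit above captures these with Sentry; here we only log them.
        logger.warning("unusable revision timestamp %r", timestamp, exc_info=True)
        return None
    if date < EPOCH:
        return None  # before the epoch: don't process the revision
    return date
```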
commit d7d0c3d876059abe6a1d60a6c38ed4245e1b58c9
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Aug 12 17:34:21 2022 +0200
postgresql archive: add support for partially copied databases
The incremental copy of the archive to mmca is not atomic: the directory
table needs to be copied first, then the directory_entry_* tables need
to be updated. This means that the client can view inconsistent entries,
where the directory has been synced but not all the entry rows.
We return an empty list when one of these bogus entries is detected.
This allows smooth fallback to the main database through the
multiplexer.
swh/provenance/postgresql/archive.py | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
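A minimal sketch of the consistency check, assuming the directory row carries the ids of its entries and the matching entry rows have been fetched into a dict. `directory_ls` here is a hypothetical pure-Python stand-in for the SQL-backed method:

```python
from typing import Any, Dict, List, Sequence


def directory_ls(
    entry_ids: Sequence[int],
    entry_rows: Dict[int, Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Return the directory's entries, or [] if some entry rows are missing.

    Hypothetical sketch of the partial-copy check, not the real SQL query.
    """
    entries = [entry_rows[i] for i in entry_ids if i in entry_rows]
    if len(entries) != len(entry_ids):
        # Directory row synced but some directory_entry_* rows not copied yet:
        # report nothing so the multiplexer falls back to the main database.
        return []
    return entries
```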
commit 95eb9622a00ce99d089bb9accdaed0bdbf1bdc37
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Aug 12 17:33:28 2022 +0200
postgresql archive: don't use custom types
The partial copy of the archive on mmca doesn't have them anyway.
swh/provenance/postgresql/archive.py | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
commit 34a9a1ac220bfabdda26b243c79742bdab090d76
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Aug 12 17:32:09 2022 +0200
Remove sneaky caches in the postgresql archive implementation
mypy.ini | 3 ---
requirements.txt | 1 -
swh/provenance/postgresql/archive.py | 3 ---
3 files changed, 7 deletions(-)
commit bae8f4afda455ca28e64e54f1c9c37c6af2214b6
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Aug 12 17:29:45 2022 +0200
rabbitmq: Extend timeouts for reception of acks
The retry logic is not very refined; extending the timeouts makes more
sense.
swh/provenance/api/client.py | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
commit 1efc40c7917feaedfa1204b6e4e395d41530d14c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Aug 12 17:28:31 2022 +0200
rabbitmq: close the consumer only after all acks are received
This is not quite working, but it seems to reduce issues on worker
termination a bit.
swh/provenance/api/client.py | 63 ++++++++++++++++++++++++++++----------------
1 file changed, 41 insertions(+), 22 deletions(-)
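A sketch of waiting for every outstanding ack, with a generous timeout, before closing the consumer. `AckTracker` is a hypothetical helper built on `threading.Condition`; it is not the RabbitMQ client's API, and the `timeout` argument is where an extended value would go:

```python
import threading
from typing import Set


class AckTracker:
    """Hypothetical helper: track outstanding acks so the consumer is only
    closed once all of them have arrived (or the timeout expires)."""

    def __init__(self) -> None:
        self._pending: Set[str] = set()
        self._cond = threading.Condition()

    def sent(self, correlation_id: str) -> None:
        with self._cond:
            self._pending.add(correlation_id)

    def acked(self, correlation_id: str) -> None:
        with self._cond:
            self._pending.discard(correlation_id)
            if not self._pending:
                self._cond.notify_all()  # wake up anyone waiting to close

    def wait_all(self, timeout: float) -> bool:
        # True if every ack arrived before the timeout, False otherwise.
        with self._cond:
            return self._cond.wait_for(lambda: not self._pending, timeout=timeout)
```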
commit ef7cd991712e47a14d7877f726f427a9de22e545
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Aug 12 17:14:58 2022 +0200
Improve logging in the API client and the revision layer
swh/provenance/api/client.py | 39 +++++++++++++++++++++++----------------
swh/provenance/provenance.py | 2 +-
swh/provenance/revision.py | 12 ++++++++++++
3 files changed, 36 insertions(+), 17 deletions(-)
commit 3edf3690258b9e61de5452967c6ee178120276e7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Aug 12 16:53:11 2022 +0200
Add systemd notification support
mypy.ini | 3 +++
swh/provenance/cli.py | 15 +++++++++++++++
swh/provenance/journal_client.py | 9 +++++++++
3 files changed, 27 insertions(+)
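For reference, a systemd notification is a datagram sent to the socket named in `$NOTIFY_SOCKET`. This sketch implements that protocol directly with the standard library; `sd_notify` here is a hypothetical helper and may differ from what the commit actually uses:

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send a status string such as "READY=1" to systemd.

    Hypothetical helper implementing the sd_notify datagram protocol with the
    standard library. Returns False when not running under a notify service.
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    if addr.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), addr)
    return True
```

Outside systemd the call is a no-op, so a journal client can call it unconditionally.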
commit 5cadb13de9eb27b309d2ada3df54dc86452785b3
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Aug 12 16:54:27 2022 +0200
Try to avoid some circular imports
swh/provenance/__init__.py | 2 +-
swh/provenance/api/server.py | 3 ++-
2 files changed, 3 insertions(+), 2 deletions(-)
commit 98254d2e930f639c7b1fdb3c27f5eb2a668b857d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Fri Aug 12 17:17:11 2022 +0200
blacken swhgraph/archive.py
swh/provenance/swhgraph/archive.py | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
Test Plan
Tests pass, and the journal clients seem happy enough.
Migrated from D8243