All the current swh.provenance changes that are running in production...

This is the accumulation of changes that have been tested in production on mmca and met over the past weeks. Much of it was pair-programmed, and the tests still pass, so we're probably in good shape.

git log origin/master.. says:

commit 8f476d494b4aeab6e0cd6a7adb5f2bce095e8c60
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 13:37:12 2022 +0200

    swhgraph: handle empty responses

    When the visit_edges response is empty, swh.graph.client generates an
    empty tuple, which can't be unpacked. Work around the issue.

 swh/provenance/swhgraph/archive.py | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)
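
For reviewers, a minimal sketch of the failure mode this commit describes: unpacking `src, dst = edge` raises `ValueError` when `edge` is an empty tuple. The helper name `iter_edges` and the two-element edge shape are assumptions made for illustration; the actual workaround lives in `swh/provenance/swhgraph/archive.py` and may look different.

```python
from typing import Iterable, Iterator, Tuple


def iter_edges(responses: Iterable[Tuple[str, ...]]) -> Iterator[Tuple[str, str]]:
    """Yield (src, dst) pairs, skipping tuples that cannot be unpacked.

    Hypothetical helper: an empty response is assumed to show up as `()`.
    """
    for edge in responses:
        if len(edge) != 2:
            # `src, dst = ()` would raise ValueError, so skip it instead.
            continue
        src, dst = edge
        yield src, dst
```

With this guard, an input such as `[(), ("rev1", "rev0")]` yields only the well-formed pair.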

commit edf00f88894fb9cf407017944dc5cd751b012357
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 13:36:39 2022 +0200

    Use proper signatures in journal_client

    We're always passing the provenance-internal object types, not those of
    swh.storage.

 swh/provenance/journal_client.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

commit 08de80b680bdf008f9a1f45805f2d54a7a397549
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 11:26:01 2022 +0200

    origin layer: retrieve multiple levels of revision history at once

    Replace `revision_get_parents` with `revision_get_some_outbound_edges`,
    which can optionally retrieve more levels of history than just a single
    one. This allows us to do way fewer queries on the swh.graph or
    swh.storage backend if the revision exists there.

    The swh.storage backend does limited recursion, so we still process the
    origin in multiple steps to fetch the whole history.

 swh/provenance/archive.py                      | 15 +++++----
 swh/provenance/graph.py                        | 43 +++++++++++++-------------
 swh/provenance/interface.py                    | 10 +++---
 swh/provenance/journal_client.py               |  1 -
 swh/provenance/model.py                        | 20 +-----------
 swh/provenance/multiplexer/archive.py          | 28 ++++++++++-------
 swh/provenance/origin.py                       | 26 ++++++----------
 swh/provenance/postgresql/archive.py           | 27 +++++++++-------
 swh/provenance/provenance.py                   | 28 ++++++++---------
 swh/provenance/storage/archive.py              | 12 ++++---
 swh/provenance/swhgraph/archive.py             | 23 ++++++++------
 swh/provenance/tests/test_archive_interface.py | 29 ++++++++++-------
 12 files changed, 130 insertions(+), 132 deletions(-)
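
The idea in this commit can be sketched roughly as follows, assuming an in-memory parent map in place of the real swh.graph or swh.storage backend. `some_outbound_edges` and `full_history` are hypothetical stand-ins, not the signatures of `revision_get_some_outbound_edges` or of the origin layer; the point is only that a depth-limited query returns several levels of edges per call, and that the caller keeps querying the unexpanded frontier until the whole history is known.

```python
from typing import Dict, List, Set, Tuple


def some_outbound_edges(
    parents: Dict[str, List[str]], rev: str, max_depth: int
) -> List[Tuple[str, str]]:
    """Return (revision, parent) edges up to `max_depth` levels below `rev`."""
    edges: List[Tuple[str, str]] = []
    frontier = [rev]
    seen = {rev}
    for _ in range(max_depth):
        next_frontier: List[str] = []
        for node in frontier:
            for parent in parents.get(node, []):
                edges.append((node, parent))
                if parent not in seen:
                    seen.add(parent)
                    next_frontier.append(parent)
        frontier = next_frontier
    return edges


def full_history(
    parents: Dict[str, List[str]], head: str, max_depth: int = 2
) -> Set[str]:
    """Collect every ancestor of `head` using repeated depth-limited queries."""
    expanded: Set[str] = set()  # revisions whose parents were already fetched
    known = {head}
    pending = [head]
    while pending:
        rev = pending.pop()
        if rev in expanded:
            continue
        expanded.add(rev)
        for src, dst in some_outbound_edges(parents, rev, max_depth):
            # Every source in the result had all of its parents returned.
            expanded.add(src)
            if dst not in known:
                known.add(dst)
                pending.append(dst)
    return known
```

On a linear history of four revisions with `max_depth=2`, this needs fewer backend queries than fetching one level of parents at a time.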

commit 68e1907e7f37863d732edcb6211be893df94b9c7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 11:03:12 2022 +0200

    Appease pyright by ensuring target_type is bound

 swh/provenance/tests/test_archive_interface.py | 2 ++
 1 file changed, 2 insertions(+)

commit d935abf431df5105fec8422e87eb5ee47d3c177a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 11:01:37 2022 +0200

    Rename origin.proceed_origin to origin.process_origin

 swh/provenance/origin.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

commit 2ac46f58346f7c3763f1263109885fea6797e155
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Aug 3 18:21:32 2022 +0200

    multiplexer: add endpoint counts per backend

 swh/provenance/__init__.py                     |  6 ++-
 swh/provenance/multiplexer/archive.py          | 61 +++++++++++++++++++-------
 swh/provenance/tests/test_archive_interface.py |  4 +-
 swh/provenance/tests/test_init.py              |  6 ++-
 4 files changed, 57 insertions(+), 20 deletions(-)

commit 8d323c322df2bf9a429a1329de6c87636927df19
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:46:20 2022 +0200

    journal client: only use the provenance context manager once

    The context manager for the provenance storage rabbitmq client doesn't
    like being used multiple times over the lifetime of a process. Only use
    it once in the cli of the journal client.

 swh/provenance/cli.py            | 6 ++++--
 swh/provenance/journal_client.py | 6 ++----
 2 files changed, 6 insertions(+), 6 deletions(-)

commit f5f8555f8e3d8c72a5d51f4a10d0b761e74c97fe
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:45:27 2022 +0200

    provenance: lower the cache thresholds

    Instead of flushing if any entry is over the threshold, flush when the
    cumulative count goes over.

 swh/provenance/provenance.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
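
To make the difference between the two flush conditions concrete, here is a small sketch. The cache layout (a dict of sets keyed by entity type) and both function names are assumptions for illustration; the real check is in `swh/provenance/provenance.py`.

```python
from typing import Dict, Set


def should_flush_any(cache: Dict[str, Set[str]], threshold: int) -> bool:
    """Old behaviour as described: flush if any single entry exceeds the threshold."""
    return any(len(entries) > threshold for entries in cache.values())


def should_flush_cumulative(cache: Dict[str, Set[str]], threshold: int) -> bool:
    """New behaviour as described: flush once the total count exceeds the threshold."""
    return sum(len(entries) for entries in cache.values()) > threshold
```

With two entries of three items each and a threshold of four, the per-entry check does not trigger, while the cumulative check does.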

commit 4b3de6177b4f2c5b45dede931004c719fdfb0f7d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:44:47 2022 +0200

    revision: only trigger partial flushes when necessary

 swh/provenance/revision.py | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

commit 9c936c39779cdb42b0f8f1a40df23d2de3032dfb
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:43:13 2022 +0200

    revision: sort batches by date, improve logging, add incremental flushing

 swh/provenance/revision.py | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

commit 5b66b98e62c50c5958936adcc3b0ab651fb2d279
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:40:59 2022 +0200

    revision: capture datetime exceptions with sentry

 swh/provenance/journal_client.py | 3 +++
 1 file changed, 3 insertions(+)

commit af09058f0a80aac79a4e477fb2f7bd9800e3603f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:40:26 2022 +0200

    revision: don't process revisions before the epoch

 swh/provenance/journal_client.py | 7 +++++++
 1 file changed, 7 insertions(+)

commit 3473d4af62d85255845aafc1def6c591090062e7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:39:30 2022 +0200

    revision: don't process revisions with unknown dates

 swh/provenance/journal_client.py | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)
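
The two date filters in this commit and the one above (unknown dates, dates before the epoch) could be combined along these lines. This is a sketch under assumptions: `is_processable` is a hypothetical name, naive datetimes are treated as UTC, and a date exactly at the epoch is accepted; the actual checks in `swh/provenance/journal_client.py` may differ on each of these points.

```python
from datetime import datetime, timezone
from typing import Optional

# Unix epoch as a timezone-aware datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def is_processable(date: Optional[datetime]) -> bool:
    """Return True if a revision with this date should be processed."""
    if date is None:
        # Unknown date: skip the revision.
        return False
    if date.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC.
        date = date.replace(tzinfo=timezone.utc)
    return date >= EPOCH
```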

commit d7d0c3d876059abe6a1d60a6c38ed4245e1b58c9
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:34:21 2022 +0200

    postgresql archive: add support for partially copied databases

    The incremental copy of the archive to mmca is not atomic: the directory
    table needs to be copied first, then the directory_entry_* tables need
    to be updated. This means that the client can view inconsistent entries,
    where the directory has been synced but not all the entry rows.

    We return an empty list when one of these bogus entries is detected.
    This allows smooth fallback to the main database through the
    multiplexer.

 swh/provenance/postgresql/archive.py | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)
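
The fallback described in this commit message can be pictured like this: a backend that detects an inconsistent directory returns an empty list, and the multiplexer moves on to the next backend. The `Backend` type alias and the `directory_ls` function are simplified, hypothetical stand-ins for the multiplexer's real interface.

```python
from typing import Callable, List, Sequence

# Hypothetical backend: maps a directory id to its list of entry names.
Backend = Callable[[str], List[str]]


def directory_ls(backends: Sequence[Backend], directory_id: str) -> List[str]:
    """Return the entries from the first backend that gives a non-empty answer."""
    for backend in backends:
        entries = backend(directory_id)
        if entries:
            return entries
        # Empty result: the backend has no (or only partial) data, try the next one.
    return []
```

Listing the partial copy first and the main database second gives the smooth fallback described above.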

commit 95eb9622a00ce99d089bb9accdaed0bdbf1bdc37
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:33:28 2022 +0200

    postgresql archive: don't use custom types

    The partial copy of the archive on mmca doesn't have them anyway.

 swh/provenance/postgresql/archive.py | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

commit 34a9a1ac220bfabdda26b243c79742bdab090d76
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:32:09 2022 +0200

    Remove sneaky caches in the postgresql archive implementation

 mypy.ini                             | 3 ---
 requirements.txt                     | 1 -
 swh/provenance/postgresql/archive.py | 3 ---
 3 files changed, 7 deletions(-)

commit bae8f4afda455ca28e64e54f1c9c37c6af2214b6
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:29:45 2022 +0200

    rabbitmq: Extend timeouts for reception of acks

    The retry logic is not very refined, extending the timeouts makes more
    sense.

 swh/provenance/api/client.py | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

commit 1efc40c7917feaedfa1204b6e4e395d41530d14c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:28:31 2022 +0200

    rabbitmq: close the consumer only after all acks are received

    This is not quite working but it seems to reduce issues on worker
    termination a bit.

 swh/provenance/api/client.py | 63 ++++++++++++++++++++++++++++----------------
 1 file changed, 41 insertions(+), 22 deletions(-)

commit ef7cd991712e47a14d7877f726f427a9de22e545
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:14:58 2022 +0200

    Improve logging in the API client and the revision layer

 swh/provenance/api/client.py | 39 +++++++++++++++++++++++----------------
 swh/provenance/provenance.py |  2 +-
 swh/provenance/revision.py   | 12 ++++++++++++
 3 files changed, 36 insertions(+), 17 deletions(-)

commit 3edf3690258b9e61de5452967c6ee178120276e7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 16:53:11 2022 +0200

    Add systemd notification support

 mypy.ini                         |  3 +++
 swh/provenance/cli.py            | 15 +++++++++++++++
 swh/provenance/journal_client.py |  9 +++++++++
 3 files changed, 27 insertions(+)

commit 5cadb13de9eb27b309d2ada3df54dc86452785b3
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 16:54:27 2022 +0200

    Try to avoid some circular imports

 swh/provenance/__init__.py   | 2 +-
 swh/provenance/api/server.py | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

commit 98254d2e930f639c7b1fdb3c27f5eb2a668b857d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:17:11 2022 +0200

    blacken swhgraph/archive.py

 swh/provenance/swhgraph/archive.py | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Test Plan

Tests pass, and the journal clients seem happy enough...


Migrated from D8243 (view on Phabricator)