- Oct 18, 2022
-
David Douard authored
Bump:
- pre-commit from 4.1.0 to 4.3.0,
- codespell from 2.2.1 to 2.2.2,
- black from 22.3.0 to 22.10.0, and
- flake8 from 4.0.1 to 5.0.4.
Also freeze flake8 dependencies, and change flake8's repo config to github (the gitlab mirror being outdated).
-
- Oct 13, 2022
-
David Douard authored
-
David Douard authored
As usual when kafka is involved in tests, the new tests provided here are a bit slow...
-
David Douard authored
instead of depending on the proper behavior of the user of ProvenanceStoragePostgresql.
-
David Douard authored
it's not used, and keeping it makes the code unnecessarily complex.
-
- Oct 12, 2022
-
David Douard authored
-
- Oct 11, 2022
-
David Douard authored
The new ProvenanceStorageJournal is a ProvenanceStorageInterface proxy that pushes added objects to a swh-journal (typically a kafka broker). Journal messages are simple dicts with 2 keys: id (the sharding key) and value (a serializable version of the argument of the xxx_add() method). Use the 'kafka' pytest marker for all kafka-related tests (especially used for tox, see tox.ini).
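A minimal sketch of the message shape described above; the helper name and types are illustrative assumptions, not the actual ProvenanceStorageJournal code:

```python
from typing import Any, Dict, List

Sha1Git = bytes  # 20-byte sha1 identifier, as used throughout swh


def journal_messages(added: Dict[Sha1Git, Any]) -> List[Dict[str, Any]]:
    """Build one {id, value} message per object passed to an xxx_add() call."""
    return [
        {"id": sha1, "value": value}  # "id" doubles as the sharding key
        for sha1, value in added.items()
    ]
```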
-
David Douard authored
and fix all occurrences of the typo.
-
David Douard authored
Make them all accept a Dict[Sha1Git, xxx] as argument, i.e.:
- remove support for Iterable[bytes] in revision_add, and
- replace Iterable[bytes] with Dict[Sha1Git, bytes] for location_add.
Currently, the sha1 of the location path in location_add() is not really used by any backend, so computing these hashes is a waste of resources, but it makes the API of this interface much more consistent, which will be helpful for coming features (like the kafka journal).
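A hedged sketch of the unified argument shape; the helper name is illustrative, and using plain hashlib.sha1 for the path hash is an assumption:

```python
import hashlib
from typing import Dict, Iterable

Sha1Git = bytes


def locations_by_sha1(paths: Iterable[bytes]) -> Dict[Sha1Git, bytes]:
    """Key each location path by its sha1, matching the new interface."""
    return {hashlib.sha1(path).digest(): path for path in paths}


# e.g. storage.location_add(locations_by_sha1([b"src/main.c", b"README.md"]))
```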
-
- Oct 03, 2022
-
David Douard authored
Move ingestion-related code for the 3 "layers" into an algos/ submodule.
-
David Douard authored
- move everything (swh-)archive related into an archive/ submodule,
- move everything provenance-storage related into a storage/ submodule (which remains a not-ideal name, as it may be confused with the general 'storage == swh-storage' meaning in swh),
- rename rabbitmq's backend from api/ to storage/rabbitmq, and
- split interface.py in 3 parts (one for each interface: ProvenanceInterface, ProvenanceStorageInterface and ArchiveInterface).
-
David Douard authored
so that one can easily run tests for the revision or origin layer only.
-
David Douard authored
-
David Douard authored
-
- Sep 08, 2022
-
vlorentz authored
'python' may be Python 2, according to https://peps.python.org/pep-0394/#for-python-runtime-distributors
-
David Douard authored
This allows us to ignore git bombs and other suspiciously large repos.
-
- Sep 01, 2022
-
David Douard authored
Replace the (deprecated) HTTP RPC API used to access the swh-graph service in favor of the grpc server. To be able to test the (now) grpc-based ArchiveGraph, compressed graph versions of all 3 common test datasets (cmdbts2, out-of-order and with-merges) have been generated and included in this revision.
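A rough sketch of connecting to a swh-graph gRPC server; the address is hypothetical and the stub name mentioned in the comment is an assumption, not the exact generated API:

```python
import grpc

GRAPH_GRPC_SERVER = "localhost:50091"  # hypothetical local test server

channel = grpc.insecure_channel(GRAPH_GRPC_SERVER)
# Wait until the server is reachable (raises grpc.FutureTimeoutError otherwise).
grpc.channel_ready_future(channel).result(timeout=5)
# A generated stub, e.g. TraversalServiceStub(channel) (name assumed), would
# then issue traversal requests over this channel instead of HTTP calls.
channel.close()
```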
-
David Douard authored
This will be needed for testing the grpc swh-graph archive backend.
-
- Aug 30, 2022
-
David Douard authored
-
- Aug 12, 2022
-
Nicolas Dandrimont authored
When the visit_edges response is empty, swh.graph.client generates an empty tuple, which can't be unpacked. Work around the issue.
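A hedged sketch of the workaround: skip empty tuples before unpacking the (src, dst) edge pairs. The function name and response shape are illustrative:

```python
from typing import Iterable, Iterator, Tuple


def iter_edges(response: Iterable[tuple]) -> Iterator[Tuple[str, str]]:
    """Yield (src, dst) pairs, tolerating an empty visit_edges response."""
    for edge in response:
        if not edge:  # empty tuple: nothing to unpack, skip it
            continue
        src, dst = edge
        yield src, dst
```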
-
Nicolas Dandrimont authored
We're always passing the provenance-internal object types, not those of swh.storage.
-
Nicolas Dandrimont authored
Replace `revision_get_parents` with `revision_get_some_outbound_edges`, which can optionally retrieve more levels of history than just a single one. This allows us to do way fewer queries on the swh.graph or swh.storage backend if the revision exists there. The swh.storage backend does limited recursion, so we still process the origin in multiple steps to fetch the whole history.
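A sketch of the multi-step history walk described above. The method name comes from the commit message; its exact signature and return shape (here, an iterable of (src, dst) revision edges) are assumptions:

```python
def fetch_full_history(archive, heads):
    """Expand outbound edges batch by batch until the history is complete."""
    seen, frontier = set(), set(heads)
    edges = []
    while frontier:
        batch = list(archive.revision_get_some_outbound_edges(frontier))
        seen |= frontier
        edges.extend(batch)
        # Only recurse from revisions we have not queried yet.
        frontier = {dst for _, dst in batch} - seen
    return edges
```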
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
The context manager for the provenance storage rabbitmq client doesn't like being used multiple times over the lifetime of a process. Only use it once in the cli of the journal client.
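A sketch of the fix under that constraint: enter the storage's context manager exactly once per process instead of once per batch. All names here are illustrative, not the actual cli code:

```python
def run_journal_client(get_storage, batches):
    """Process all batches inside a single storage context."""
    with get_storage() as storage:  # single __enter__/__exit__ per process
        for batch in batches:
            storage.revision_add(batch)  # assumed add method, for illustration
```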
-
Nicolas Dandrimont authored
Instead of flushing if any entry is over the threshold, flush when the cumulative count goes over.
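A sketch of the new flush policy, assuming a simple buffering wrapper (all names are illustrative): keep a running total of buffered entries and flush once the cumulative count crosses the threshold, instead of flushing whenever a single batch exceeds it:

```python
class CumulativeBuffer:
    def __init__(self, flush, threshold=1000):
        self._flush = flush
        self._threshold = threshold
        self._batches, self._count = [], 0

    def add(self, entries):
        self._batches.append(entries)
        self._count += len(entries)
        if self._count >= self._threshold:  # cumulative count, not per-batch
            self._flush(self._batches)
            self._batches, self._count = [], 0
```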
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
The incremental copy of the archive to mmca is not atomic: the directory table needs to be copied first, then the directory_entry_* tables need to be updated. This means that the client can view inconsistent entries, where the directory has been synced but not all the entry rows. We return an empty list when one of these bogus entries is detected. This allows smooth fallback to the main database through the multiplexer.
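An illustrative sketch of that guard; how a bogus entry is actually detected is an assumption here (a stored entry count), as are all names. Returning an empty list lets the multiplexer fall back to the main database:

```python
def directory_entries(db, directory_id):
    """Return a directory's entries, or [] if the copy looks inconsistent."""
    directory = db.directory_get(directory_id)
    entries = db.directory_entries_get(directory_id)
    if directory is not None and len(entries) < directory.entry_count:
        return []  # partially synced directory: defer to the main database
    return entries
```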
-
Nicolas Dandrimont authored
The partial copy of the archive on mmca doesn't have them anyway.
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
The retry logic is not very refined, extending the timeouts makes more sense.
-
Nicolas Dandrimont authored
This is not quite working but it seems to reduce issues on worker termination a bit.
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
-