- May 16, 2023
-
-
Nicolas Dandrimont authored
This should give us shorter transactions and avoid interfering with other workloads.
-
- Apr 13, 2023
-
-
David Douard authored
-
David Douard authored
This is required to prevent errors when dealing with weird/pathological timezones (i.e. unsupported by psycopg, such as UTC offsets over 24h).
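A minimal sketch of the kind of normalization this implies, assuming a hypothetical to_datetime() helper and an offset expressed in minutes; the actual swh-provenance code may differ:

```python
# Illustrative only: UTC offsets of 24h or more cannot be represented by
# Python's timezone nor handed to psycopg, so fall back to plain UTC.
from datetime import datetime, timedelta, timezone

def to_datetime(timestamp: float, offset_minutes: int) -> datetime:
    if abs(offset_minutes) >= 24 * 60:
        tz = timezone.utc  # pathological offset: drop it
    else:
        tz = timezone(timedelta(minutes=offset_minutes))
    return datetime.fromtimestamp(timestamp, tz=tz)
```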
-
David Douard authored
For this, we add a new ArchiveInterface.revisions_get(). Note that the ArchivePostgreSQL and ArchiveGraph backends do not have an "optimized" version of the method yet (it may not really be necessary).
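A rough sketch of what such a batched accessor could look like; the parameter and return types here are assumptions, not the actual swh-provenance signature:

```python
from typing import Any, Dict, Iterable, Iterator

Sha1Git = bytes  # 20-byte git-style sha1 identifiers, as used across swh


class ArchiveInterface:
    def revisions_get(self, ids: Iterable[Sha1Git]) -> Iterator[Dict[str, Any]]:
        """Fetch revision data for all given ids in one batched call,
        so callers do not have to issue one lookup per revision."""
        raise NotImplementedError
```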
-
David Douard authored
-
- Apr 07, 2023
-
-
vlorentz authored
- Feb 16, 2023
-
-
Jérémy Bobbio (Lunar) authored
Related to swh/meta#4959
-
- Dec 20, 2022
-
-
This command allows backfilling a kafka journal from an existing PostgreSQL provenance storage. The command runs a given number of workers in parallel. The state of the backfilling process is saved in a leveldb store, so interrupting and restarting a backfilling process is possible, with one limitation: it won't work properly if the range generation is modified.
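A minimal sketch of the resumable-range idea described here, assuming the plyvel LevelDB binding and a hypothetical process_range() callable; this is not the actual swh-provenance command:

```python
import plyvel  # LevelDB bindings

def backfill(ranges, process_range, state_path="backfill-state"):
    db = plyvel.DB(state_path, create_if_missing=True)
    try:
        for lo, hi in ranges:  # range bounds as bytes; must not change between runs
            key = lo + b":" + hi
            if db.get(key) == b"done":
                continue  # already pushed to the journal in a previous run
            process_range(lo, hi)  # read from PostgreSQL, write to kafka
            db.put(key, b"done")
    finally:
        db.close()
```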
-
- Dec 09, 2022
-
-
David Douard authored
This allows using the journal writing part independently from the ProvenanceStorage proxy class, e.g. for the backfiller mechanism.
-
- Nov 29, 2022
-
-
Nicolas Dandrimont authored
This allows answering the earliest occurrence question in a single query instead of having to go through a join of the revision table.
-
Nicolas Dandrimont authored
It seems very unlikely that we'll ever use it considering the efficiency of zfs compression when running PostgreSQL on it.
-
Nicolas Dandrimont authored
For consistency with other modules
-
Nicolas Dandrimont authored
For consistency with other modules
-
- Nov 23, 2022
-
-
Nicolas Dandrimont authored
This avoids flushing the producer on every message, which will be pretty slow considering that we've increased the number of messages by a lot.
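The pattern at stake, sketched with a bare confluent-kafka producer (swh.journal wraps this differently, so treat the details as illustrative):

```python
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def produce_all(messages):
    for key, value in messages:
        producer.produce("example-topic", key=key, value=value)
        producer.poll(0)   # serve delivery callbacks without blocking
    producer.flush()       # wait for delivery once, at the end of the batch
```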
-
Nicolas Dandrimont authored
We forgot to do this change alongside the actual kafka schema change, oops.
-
David Douard authored
Change the keys for relations to allow those topics to be compacted. Entity topics depend on NOT being compacted, as outdated data can reach the topic when multiple writers process the same content. Update the replayer accordingly, fixing a few behavioural bugs.
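A hypothetical illustration (not the actual swh-provenance key layout) of why compacted relation topics need a key covering the whole entry: Kafka compaction keeps only the latest value per key, so keying by the source object alone would silently collapse all of its relation entries into one.

```python
import hashlib

def relation_key(src: bytes, dst: bytes, path: bytes) -> bytes:
    # Key on the full (src, dst, path) tuple so compaction never merges
    # distinct relation entries that share the same source object.
    return hashlib.sha1(src + dst + path).digest()
```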
-
David Douard authored
-
Nicolas Dandrimont authored
It prevents us from using the journal proxy as well.
-
Nicolas Dandrimont authored
-
- Nov 02, 2022
-
-
Nicolas Dandrimont authored
We still need to pre-create the revision entities to avoid massive add/add conflicts when using the sharded rabbitmq storage backend. This reverts commit e1da37d4.
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
-
David Douard authored
-
David Douard authored
This speeds up test execution significantly.
-
David Douard authored
Start the gRPC server directly instead of starting the HTTP RPC server, since the latter is actually not used and the URL/port detection was possibly misleading.
-
- Oct 18, 2022
-
-
David Douard authored
Also rename test_utils.py to utils.py to make clear that this module is not a set of tests.
-
David Douard authored
-
David Douard authored
- pre-commit from 4.1.0 to 4.3.0
- codespell from 2.2.1 to 2.2.2
- black from 22.3.0 to 22.10.0
- flake8 from 4.0.1 to 5.0.4

Also freeze flake8 dependencies, and change flake8's repo config to github (the gitlab mirror being outdated).
-
- Oct 13, 2022
-
-
David Douard authored
-
David Douard authored
As usual when kafka is involved in tests, the new tests provided here are a bit slow...
-
David Douard authored
instead of depending on the proper behavior of the ProvenanceStoragePostgresql user.
-
David Douard authored
It's not used, and keeping it makes the code unnecessarily complex.
-
- Oct 12, 2022
-
-
David Douard authored
-
- Oct 11, 2022
-
-
David Douard authored
The new ProvenanceStorageJournal is a proxy ProvenanceStorageInterface that pushes added objects to a swh-journal (typically kafka). Journal messages are simple dicts with 2 keys: id (the sharding key) and value (a serializable version of the argument of the xxx_add() method). Use the 'kafka' pytest marker for all kafka-related tests (especially used for tox, see tox.ini).
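Following that description, a message produced for a hypothetical content_add({sha1: date}) call might look roughly like this (field names from the commit message, everything else illustrative):

```python
from datetime import datetime, timezone

content_sha1 = b"\x00" * 20  # dummy 20-byte sha1
date = datetime(2022, 10, 11, tzinfo=timezone.utc)

message = {
    "id": content_sha1,         # the sharding key
    "value": date.isoformat(),  # a serializable form of the xxx_add() argument
}
```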
-
David Douard authored
and fix all occurrences of the typo.
-
David Douard authored
Make them all accept a Dict[Sha1Git, xxx] as argument, i.e.:
- remove support for Iterable[bytes] in revision_add, and
- replace Iterable[bytes] by Dict[Sha1Git, bytes] for location_add

Currently, the sha1 of the location path in location_add() is not really used by any backend, so computing said hashes is a waste of resources, but it makes the API of this interface much more consistent, which will be helpful for upcoming features (like the kafka journal).
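A sketch of the harmonized shape described above; the value type of revision_add is elided and the real interface in swh.provenance may differ in details:

```python
from typing import Any, Dict

Sha1Git = bytes  # 20-byte git-style sha1


class ProvenanceStorageInterface:
    def revision_add(self, revs: Dict[Sha1Git, Any]) -> bool:
        """Plain Iterable[bytes] is no longer accepted."""
        ...

    def location_add(self, paths: Dict[Sha1Git, bytes]) -> bool:
        """Keyed by the sha1 of each path, replacing Iterable[bytes]."""
        ...
```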
-
- Oct 03, 2022
-
-
David Douard authored
Move ingestion-related code for the 3 "layers" into an algos/ submodule.
-
David Douard authored
- move everything (swh-)archive related into an archive/ submodule
- move everything provenance-storage related into a storage/ submodule (which remains a not-ideal name, as it may be confused with the general 'storage == swh-storage' meaning in swh)
- rename rabbitmq's backend from api/ to storage/rabbitmq
- split interface.py in 3 parts (one for each interface: ProvenanceInterface, ProvenanceStorageInterface and ArchiveInterface)
-