  1. May 16, 2023
  2. Apr 13, 2023
  3. Apr 07, 2023
  4. Feb 16, 2023
  5. Dec 20, 2022
    • Add a backfiller cli command · e66a2bf9
      David Douard authored and Nicolas Dandrimont committed
      This command backfills a Kafka journal from an existing
      PostgreSQL provenance storage.
      
      The command will run a given number of workers in parallel. The state of
      the backfilling process is saved in a leveldb store, so interrupting and
      restarting a backfilling process is possible, with limitations: it won't
      work properly if the range generation is modified.
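The resumable behaviour described above can be sketched as follows. This is a minimal illustration, not the actual backfiller: a plain dict stands in for the leveldb store, threads stand in for the workers, and the names `make_ranges`, `range_key` and `backfill` are hypothetical. It shows why changing the range generation breaks a restart: the saved state is keyed by range bounds.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Range = Tuple[int, int]


def make_ranges(total: int, size: int) -> List[Range]:
    """Split [0, total) into fixed-size half-open ranges."""
    return [(start, min(start + size, total)) for start in range(0, total, size)]


def range_key(rng: Range) -> bytes:
    # The key encodes the range bounds, so state saved with one range
    # generation scheme is meaningless under a different one.
    return f"{rng[0]}-{rng[1]}".encode()


def backfill(
    ranges: List[Range],
    state: Dict[bytes, bytes],  # assumption: dict-like stand-in for leveldb
    process: Callable[[Range], None],
    workers: int = 4,
) -> int:
    """Process every range not yet marked done; return how many were run."""
    todo = [rng for rng in ranges if state.get(range_key(rng)) != b"done"]

    def run(rng: Range) -> None:
        process(rng)
        state[range_key(rng)] = b"done"  # checkpoint only after success

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces evaluation so that worker exceptions are re-raised
        list(pool.map(run, todo))
    return len(todo)
```

Calling `backfill` a second time with the same ranges and state does nothing, while calling it with ranges generated using a different size reprocesses everything, because none of the new keys match the saved ones.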
  6. Dec 09, 2022
  7. Nov 29, 2022
  8. Nov 23, 2022
  9. Nov 02, 2022
  10. Oct 18, 2022
  11. Oct 13, 2022
  12. Oct 12, 2022
  13. Oct 11, 2022
    • Add support for kafka journalization of the ProvenanceStorageInterface · 08f2e604
      David Douard authored
      The new ProvenanceStorageJournal is a proxy ProvenanceStorageInterface
      that pushes added objects to a swh-journal (typically Kafka).
      
      Journal messages are simple dicts with 2 keys: id (the sharding key) and
      value (a serializable version of the argument of the xxx_add() method).
      
      Use the 'kafka' pytest marker for all kafka-related tests (especially
      used for tox, see tox.ini).
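The proxy pattern and message shape described in this commit can be sketched roughly as below. Everything here is illustrative: `InMemoryJournal`, `InMemoryStorage`, `JournalingProxy`, the `write()` method and the `"content"` topic name are hypothetical stand-ins, not the swh-journal or swh-provenance API. Writing to the journal before delegating to the storage is also an assumption.

```python
from typing import Any, Dict, List, Tuple


class InMemoryJournal:
    """Hypothetical stand-in for a journal writer; it only records messages."""

    def __init__(self) -> None:
        self.messages: List[Tuple[str, Dict[str, Any]]] = []

    def write(self, topic: str, message: Dict[str, Any]) -> None:
        self.messages.append((topic, message))


class InMemoryStorage:
    """Hypothetical storage backend exposing a single xxx_add() method."""

    def __init__(self) -> None:
        self.contents: Dict[bytes, int] = {}

    def content_add(self, cnts: Dict[bytes, int]) -> bool:
        self.contents.update(cnts)
        return True


class JournalingProxy:
    """Forward calls to the wrapped storage, journaling each added object."""

    def __init__(self, storage: InMemoryStorage, journal: InMemoryJournal) -> None:
        self.storage = storage
        self.journal = journal

    def content_add(self, cnts: Dict[bytes, int]) -> bool:
        for sha1, value in cnts.items():
            # Each message is a dict with two keys: id (the sharding key)
            # and value (the serializable payload for that object).
            self.journal.write("content", {"id": sha1, "value": value})
        return self.storage.content_add(cnts)
```

Because every `_add()` method takes a dict keyed by object id, the same loop shape would apply to each of them, which is what makes a generic proxy practical.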
    • Rename ProvenanceInterface.directory_xxx_flattenned as directory_xxx_flattened · 7e6a62c9
      David Douard authored
      and fix all occurrences of the typo.
    • Normalize _add() methods of the ProvenanceStorage interface · 2bd74fc7
      David Douard authored
      Make them all accept a Dict[Sha1Git, xxx] as argument, i.e.:
      
      - remove support for Iterable[bytes] in revision_add, and
      - replace Iterable[bytes] by Dict[Sha1Git, bytes] for location_add
      
      Currently, the sha1 of the location path in location_add() is not really
      used by any backend, so the computation of said hashes is a waste of
      resources, but it makes the API of this interface much more consistent,
      which will be helpful for upcoming features (like the kafka journal).
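As an illustration of the normalized argument shape, a caller holding a plain iterable of location paths could build the expected dict like this. This is a sketch under assumptions: `Sha1Git` is taken to be an alias for `bytes`, `locations_to_dict` is a hypothetical helper, and hashing the raw path bytes with SHA-1 is a guess at how the key is derived.

```python
import hashlib
from typing import Dict, Iterable

Sha1Git = bytes  # assumption: a 20-byte SHA-1 digest


def locations_to_dict(paths: Iterable[bytes]) -> Dict[Sha1Git, bytes]:
    """Key each location path by the SHA-1 digest of its bytes."""
    # Duplicate paths collapse into a single entry since they share a key.
    return {hashlib.sha1(path).digest(): path for path in paths}
```

The resulting dict has the same `Dict[Sha1Git, xxx]` shape as the arguments of the other `_add()` methods, at the cost of one hash computation per path.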
  14. Oct 03, 2022
    • More core reorganization · 6f4a193e
      David Douard authored
      Move ingestion-related code for the 3 "layers" into an algos/ submodule.
    • Reorganize the code · 7c882f57
      David Douard authored
      - move everything (swh) archive-related into an archive/ submodule
      - move everything provenance-storage-related into a storage/ submodule
        (which remains a less-than-ideal name, since it may be confused with
        the general 'storage == swh-storage' meaning in swh)
      - rename rabbitmq's backend from api/ to storage/rabbitmq
      - split interface.py into 3 parts (one per interface: ProvenanceInterface,
        ProvenanceStorageInterface and ArchiveInterface).