Add a backfiller cli command

Closed David Douard requested to merge generated-differential-D8964-source into master
3 unresolved threads

This command backfills a Kafka journal from an existing PostgreSQL provenance storage.

The command runs a configurable number of workers in parallel. The state of the backfilling process is saved in a LevelDB store, so an interrupted run can be restarted, with one limitation: resuming will not work properly if the range generation has been modified in the meantime.
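To illustrate the restart limitation, here is a minimal sketch of range-based checkpointing. It is not the code from this merge request: a plain `dict` stands in for the LevelDB store, and `byte_ranges`, `pending_ranges`, `mark_done`, and the key layout are hypothetical names chosen for the example.

```python
from typing import Dict, Iterator, List, Tuple

Range = Tuple[bytes, bytes]


def byte_ranges(step: int) -> Iterator[Range]:
    """Split the prefix space [0x0000, 0x10000) into half-open ranges."""
    for lo in range(0, 0x10000, step):
        hi = min(lo + step, 0x10000)
        # The last upper bound (0x10000) needs 3 bytes, so every bound
        # is encoded on 3 bytes to keep the keys fixed-width.
        yield lo.to_bytes(3, "big"), hi.to_bytes(3, "big")


def _key(etype: str, start: bytes, stop: bytes) -> bytes:
    # Hypothetical key layout: one entry per (entity type, range).
    return etype.encode() + b":" + start + b":" + stop


def pending_ranges(state: Dict[bytes, bytes], etype: str, step: int) -> List[Range]:
    """Return the ranges that are not yet marked as done in the state store."""
    return [
        (start, stop)
        for start, stop in byte_ranges(step)
        if state.get(_key(etype, start, stop)) != b"done"
    ]


def mark_done(state: Dict[bytes, bytes], etype: str, start: bytes, stop: bytes) -> None:
    """Record that a range has been fully backfilled."""
    state[_key(etype, start, stop)] = b"done"
```

In this sketch, restarting with the same `step` skips the ranges already marked as done. Restarting with a different `step` produces range boundaries that match none of the stored keys, so the saved state no longer describes the work that was actually completed. That is one way a modified range generation can break a resumed run.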


Migrated from D8964 (view on Phabricator)

Closed by Nicolas Dandrimont on Jan 12, 2023 1:31pm UTC

Merge details

  • The changes were not merged into master.

Activity

  • Build is green

    Patch application report for D8964 (id=32301)

    Rebasing onto c626cc21...

    Current branch diff-target is up to date.
    Changes applied before test
    commit a98494dafd797ae6c1fb8e509d53fa17afa07374
    Author: David Douard <david.douard@sdfa3.org>
    Date:   Fri Dec 16 15:09:11 2022 +0100
    
        Add a backfiller cli command
        
        This command allowd to backfill a kafka journal from an existing
        Postgresql provenance storage.
        
        The command will run a given number of workers in parallel. The state of
        the backfilling process is saved in a leveldb store, so interrupting and
        restarting a backfilling process is possible, with limitations: it won't
        work properly if the range generation is modified.
    
    commit 0b9df1a11c798767beacf09dfed6179ddc593419
    Author: David Douard <david.douard@sdfa3.org>
    Date:   Fri Dec 9 15:06:16 2022 +0100
    
        Extract the journal writer part from the ProvenanceStorageJournal class
        
        This allows to use the journal writing part independently from the
        ProvenanceStorage proxy class, eg. for the backfiller mechanism.

    See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/711/ for more details.

  • vlorentz @vlorentz started a thread on the diff

                            logger.debug("Exiting producer")
                            return
                except StopIteration:
                    logger.info("range generator for %s is over, removing", etype)

        def backfill_worker(self, queue, exiting):
            logger.info("Spawning backfiller worker %s", threading.current_thread().name)
            while not exiting.is_set():
                etype, start, stop = queue.get()
                logger.info("backfill %s [%s, %s)", etype, start.hex(), stop.hex())
                backfiller = getattr(self, f"backfill_{etype}")
                try:
                    n = backfiller(start=start, stop=stop)
                except Exception as e:
                    print("argh", etype, e, repr(e))
                    raise

  • You might want to remove your calls to logger.error() and print() before re-raising
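
    A minimal sketch of what the worker loop could look like once the debugging output is removed. This is an illustration, not the revised diff: the `backfillers` mapping is a hypothetical stand-in for the `getattr(self, f"backfill_{etype}")` lookup, and the `get()` timeout is an addition so that the loop can notice the exit flag while idle.

```python
import logging
import queue
import threading

logger = logging.getLogger(__name__)


def backfill_worker(work_queue, exiting, backfillers):
    """Consume (etype, start, stop) items until `exiting` is set."""
    logger.info("Spawning backfiller worker %s", threading.current_thread().name)
    while not exiting.is_set():
        try:
            etype, start, stop = work_queue.get(timeout=0.1)
        except queue.Empty:
            continue  # nothing to do yet: re-check the exit flag
        logger.info("backfill %s [%s, %s)", etype, start.hex(), stop.hex())
        backfiller = backfillers[etype]
        # No try/except around the call: an exception raised by the
        # backfiller propagates unchanged with its full traceback, so a
        # print() or logger.error() before a bare `raise` adds nothing.
        n = backfiller(start=start, stop=stop)
        logger.debug("backfilled %s rows for %s", n, etype)
```

    If the worker runs as the target of a plain `threading.Thread`, an uncaught exception is reported through `threading.excepthook`, which by default prints the traceback to stderr.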

  • vlorentz @vlorentz started a thread on the diff

                etype, range_gen = range_gens.pop(0)
                try:
                    start, stop = next(range_gen)
                    range_gens.append((etype, range_gen))
                    logger.debug(
                        "Adding range %s: [%s, %s)", etype, start.hex(), stop.hex()
                    )
                    try:
                        queue.put((etype, start, stop), timeout=1)
                    except Full:
                        if exiting.is_set():
                            logger.debug("Exiting producer")
                            return
                except StopIteration:
                    logger.info("range generator for %s is over, removing", etype)

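
    In the excerpt as quoted, a `Full` timeout that occurs while the exit flag is not set appears to fall through without re-queuing the current range. Code outside the quoted lines may handle this, so the following is only a sketch of one way to make the retry explicit. `produce_ranges` is a hypothetical standalone function, not the method from the diff.

```python
import logging
import queue
import threading

logger = logging.getLogger(__name__)


def produce_ranges(range_gens, work_queue, exiting):
    """Feed (etype, start, stop) items to work_queue, round-robin.

    range_gens is a list of (etype, iterator over (start, stop)) pairs.
    """
    range_gens = list(range_gens)
    while range_gens:
        etype, range_gen = range_gens.pop(0)
        try:
            start, stop = next(range_gen)
        except StopIteration:
            logger.info("range generator for %s is over, removing", etype)
            continue
        # Put the generator back at the end of the list (round-robin).
        range_gens.append((etype, range_gen))
        logger.debug("Adding range %s: [%s, %s)", etype, start.hex(), stop.hex())
        # Retry until the item is queued, so a full queue never drops a
        # range; only the exit flag interrupts the retry loop.
        while True:
            if exiting.is_set():
                logger.debug("Exiting producer")
                return
            try:
                work_queue.put((etype, start, stop), timeout=1)
                break
            except queue.Full:
                continue
```

    Keeping the `next()` call alone inside the `try` block also narrows the `except StopIteration` clause to the one statement that is expected to raise it.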
  • Apply @vlorentz's comments

  • Build is green

    Patch application report for D8964 (id=32334)

    Could not rebase; Attempt merge onto c626cc21...

    Updating c626cc2..e66a2bf
    Fast-forward
     mypy.ini                                           |   3 +
     requirements.txt                                   |   1 +
     swh/provenance/cli.py                              |  91 +++++-
     swh/provenance/storage/backfill.py                 | 344 +++++++++++++++++++++
     swh/provenance/storage/journal.py                  | 104 ++++---
     .../tests/test_provenance_journal_writer.py        |  64 ++--
     6 files changed, 539 insertions(+), 68 deletions(-)
     create mode 100644 swh/provenance/storage/backfill.py
    Changes applied before test
    commit e66a2bf98615a59ffbea30f1269c364bdf4db57e
    Author: David Douard <david.douard@sdfa3.org>
    Date:   Fri Dec 16 15:09:11 2022 +0100
    
        Add a backfiller cli command
        
        This command allowd to backfill a kafka journal from an existing
        Postgresql provenance storage.
        
        The command will run a given number of workers in parallel. The state of
        the backfilling process is saved in a leveldb store, so interrupting and
        restarting a backfilling process is possible, with limitations: it won't
        work properly if the range generation is modified.
    
    commit 0b9df1a11c798767beacf09dfed6179ddc593419
    Author: David Douard <david.douard@sdfa3.org>
    Date:   Fri Dec 9 15:06:16 2022 +0100
    
        Extract the journal writer part from the ProvenanceStorageJournal class
        
        This allows to use the journal writing part independently from the
        ProvenanceStorage proxy class, eg. for the backfiller mechanism.

    See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/712/ for more details.

  • merged via e66a2bf9
