
luigi: Add LocalExport task

1 unresolved thread

It allows other packages (e.g. swh-graph) to depend on the presence of the local dataset, with a configurable way to obtain it if it is missing.

Depends on !76 (closed).


Migrated from D8865 (view on Phabricator)
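
The behaviour described above (depend on a local dataset, and obtain it in a configurable way only when it is missing) can be illustrated without Luigi. This is a minimal sketch, not the code from this merge request: `ensure_local_export`, `export_locally` and `download` are hypothetical names, the two helpers merely stand in for `ExportGraph` and `DownloadFromS3`, and treating `meta.json` as the completeness marker is an assumption.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def export_locally(path: Path) -> None:
    """Hypothetical stand-in for producing the dataset directly (cf. ExportGraph)."""
    path.mkdir(parents=True, exist_ok=True)
    (path / "meta.json").write_text(json.dumps({"source": "export"}))


def download(path: Path) -> None:
    """Hypothetical stand-in for fetching the dataset (cf. DownloadFromS3)."""
    path.mkdir(parents=True, exist_ok=True)
    (path / "meta.json").write_text(json.dumps({"source": "download"}))


def ensure_local_export(
    path: Path, obtain: Callable[[Path], None] = download
) -> Path:
    """Return `path`, calling `obtain` first only if the dataset is missing."""
    # Assumption: the presence of meta.json marks a complete local dataset.
    if not (path / "meta.json").exists():
        obtain(path)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dataset = ensure_local_export(Path(tmp) / "dataset")
        print(json.loads((dataset / "meta.json").read_text()))
```

A dependent package would only call `ensure_local_export` and never needs to know which strategy produced the files. In the merge request itself, that choice is exposed through the `export_task_type` parameter.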


Activity

  • Build is green

    Patch application report for D8865 (id=31954)

    Could not rebase; Attempt merge onto 23853dbf...

    Updating 23853db..e4df585
    Fast-forward
     swh/dataset/luigi.py | 125 +++++++++++++++++++++++++++++++++++----------------
     1 file changed, 86 insertions(+), 39 deletions(-)
    Changes applied before test
    commit e4df585f8dd66aa3bca0be967ef79cd6fa8a7c0a
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Mon Nov 21 15:47:03 2022 +0100
    
        luigi: Add LocalExport task
        
        It allows other packages (eg. swh-graph) to depend on the presence of the local
        dataset, with a configurable way to obtain it if missing
    
    commit b39436e38be5fefe16c92d6553845cd113bafd14
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Mon Nov 21 15:42:33 2022 +0100
    
        luigi: Remove copies of stamp files to/from S3
        
        They are only useful while exporting the dataset -- after the export is
        finished, meta.json is good enough and stamp files only save a couple
        of minutes when only some objects types are needed (ie. never in practice)

    See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/166/ for more details.

class LocalExport(luigi.Task):
    """Task that depends on a local dataset being present -- either directly from
    :class:`ExportGraph` or via :class:`DownloadFromS3`.
    """

    local_export_path = PathParameter(is_dir=True)
    formats = luigi.EnumListParameter(enum=Format, batch_method=merge_lists)
    object_types = luigi.EnumListParameter(
        enum=ObjectType, default=list(ObjectType), batch_method=merge_lists
    )
    export_task_type = luigi.TaskParameter(
        default=DownloadFromS3,
        significant=False,
        description="""The task used to get the dataset if it is not present.
        Should be either ``swh.dataset.luigi.ExportGraph`` or
        ``swh.dataset.luigi.DownloadFromS3``.""",
  • Antoine R. Dumont mentioned in merge request !78 (closed)

  • lgtm but fix the typo in the docstring cli parameter ;)

  • Merge request was accepted

  • Antoine R. Dumont approved this merge request

  • Author Maintainer

    fix typo

  • Author Maintainer

    Merge request was merged

  • closed

  • Build is green

    Patch application report for D8865 (id=32024)

    Could not rebase; Attempt merge onto 23853dbf...

    Updating 23853db..0bf9c88
    Fast-forward
     swh/dataset/luigi.py | 125 +++++++++++++++++++++++++++++++++++----------------
     1 file changed, 86 insertions(+), 39 deletions(-)
    Changes applied before test
    commit 0bf9c88d9604184b55735541b890797a890a9182
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Mon Nov 21 15:47:03 2022 +0100
    
        luigi: Add LocalExport task
        
        It allows other packages (eg. swh-graph) to depend on the presence of the local
        dataset, with a configurable way to obtain it if missing
    
    commit b39436e38be5fefe16c92d6553845cd113bafd14
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Mon Nov 21 15:42:33 2022 +0100
    
        luigi: Remove copies of stamp files to/from S3
        
        They are only useful while exporting the dataset -- after the export is
        finished, meta.json is good enough and stamp files only save a couple
        of minutes when only some objects types are needed (ie. never in practice)

    See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/168/ for more details.
