Project 'swh/devel/swh-dataset' was moved to 'swh/devel/swh-export'. Please update any links and bookmarks that may still have the old path.
luigi: Add LocalExport task
Closed
1 unresolved thread
requested to merge generated-differential-D8865-source into generated-differential-D8865-target
It allows other packages (e.g. swh-graph) to depend on the presence of the local dataset, with a configurable way to obtain it if it is missing.
Depends on !76 (closed).
Migrated from D8865 (view on Phabricator)
Build is green
Patch application report for D8865 (id=31954)
Could not rebase; Attempt merge onto 23853dbf...
Updating 23853db..e4df585
Fast-forward
 swh/dataset/luigi.py | 125 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 86 insertions(+), 39 deletions(-)
Changes applied before test
commit e4df585f8dd66aa3bca0be967ef79cd6fa8a7c0a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 21 15:47:03 2022 +0100

    luigi: Add LocalExport task

    It allows other packages (eg. swh-graph) to depend on the presence of
    the local dataset, with a configurable way to obtain it if missing

commit b39436e38be5fefe16c92d6553845cd113bafd14
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 21 15:42:33 2022 +0100

    luigi: Remove copies of stamp files to/from S3

    They are only useful while exporting the dataset -- after the export
    is finished, meta.json is good enough and stamp files only save a
    couple of minutes when only some objects types are needed (ie. never
    in practice)
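The second commit argues that, once an export has finished, meta.json alone is enough to tell whether the dataset is complete, so the stamp files no longer need to be copied to or from S3. A minimal sketch of such a check follows. It assumes, hypothetically, that meta.json sits at the root of the export directory, is a JSON object, and is written only after every object type has been exported; the function name `export_is_complete` is illustrative and not taken from swh-dataset.

```python
import json
import tempfile
from pathlib import Path


def export_is_complete(export_path: Path) -> bool:
    """Return True if the export under ``export_path`` looks finished.

    Hypothetical check: relies only on meta.json, assuming it is written
    after every object type has been exported, so no per-object-type
    stamp files are consulted.
    """
    meta = export_path / "meta.json"
    if not meta.is_file():
        return False
    try:
        # Assumption: a finished export leaves a JSON object in meta.json
        return isinstance(json.loads(meta.read_text()), dict)
    except json.JSONDecodeError:
        # A truncated or corrupt meta.json means the export did not finish cleanly
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        print(export_is_complete(root))  # False: meta.json not written yet
        (root / "meta.json").write_text(json.dumps({}))
        print(export_is_complete(root))  # True
```

The trade-off the commit describes is that a single end-of-export marker cannot say which object types already finished; that only matters while an export is still in progress.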
See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/166/ for more details.
    class LocalExport(luigi.Task):
        """Task that depends on a local dataset being present -- either directly from
        :class:`ExportGraph` or via :class:`DownloadFromS3`.
        """

        local_export_path = PathParameter(is_dir=True)
        formats = luigi.EnumListParameter(enum=Format, batch_method=merge_lists)
        object_types = luigi.EnumListParameter(
            enum=ObjectType, default=list(ObjectType), batch_method=merge_lists
        )
        export_task_type = luigi.TaskParameter(
            default=DownloadFromS3,
            significant=False,
            description="""The task used to get the dataset if it is not present.
            Should be either ``swh.dataset.luigi.ExportGraph`` or
            ``swh.dataset.luigi.DownloadFromS3``.""",

mentioned in merge request !78 (closed)
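The excerpt stops inside the `export_task_type` declaration, so the methods that act on these parameters are not visible here. As a rough illustration of the behaviour the description and docstring state (use the local dataset if it is present, otherwise obtain it through a configurable task), here is a dependency-free Python sketch. It does not use luigi. The class `LocalExportSketch`, the `PRODUCERS` registry, the stand-in producers and the meta.json presence check are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

Producer = Callable[[Path, List[str]], None]


def _make_producer(name: str) -> Producer:
    """Build a hypothetical stand-in for ExportGraph / DownloadFromS3.

    It only records its name in ``log`` and writes meta.json; the real
    tasks export the graph or download it from S3.
    """

    def produce(path: Path, log: List[str]) -> None:
        log.append(name)
        path.mkdir(parents=True, exist_ok=True)
        (path / "meta.json").write_text("{}")

    return produce


# Hypothetical registry: maps a producer name to the stand-in that obtains the dataset
PRODUCERS: Dict[str, Producer] = {
    "ExportGraph": _make_producer("ExportGraph"),
    "DownloadFromS3": _make_producer("DownloadFromS3"),
}


class LocalExportSketch:
    """Luigi-free mimic of the contract described in LocalExport's docstring."""

    def __init__(self, local_export_path: Path, export_task_type: str = "DownloadFromS3") -> None:
        self.local_export_path = local_export_path
        # Plays the role of export_task_type: how to obtain the dataset if missing
        self.export_task_type = export_task_type
        self.log: List[str] = []

    def complete(self) -> bool:
        # Assumption: the local dataset counts as present once meta.json exists
        return (self.local_export_path / "meta.json").is_file()

    def ensure(self) -> None:
        # Like luigi's scheduler: a task that is already complete is not run again
        if not self.complete():
            PRODUCERS[self.export_task_type](self.local_export_path, self.log)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        task = LocalExportSketch(Path(tmp) / "graph", export_task_type="ExportGraph")
        task.ensure()  # dataset missing: runs the ExportGraph stand-in
        task.ensure()  # already complete: nothing runs
        print(task.log)  # ['ExportGraph']
```

In the real excerpt, `export_task_type` is declared with `significant=False`, so luigi leaves it out of the task's identifier: a downstream task depending on `LocalExport` refers to the same task whether the dataset ends up downloaded or freshly exported.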
Build is green
Patch application report for D8865 (id=32024)
Could not rebase; Attempt merge onto 23853dbf...
Updating 23853db..0bf9c88
Fast-forward
 swh/dataset/luigi.py | 125 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 86 insertions(+), 39 deletions(-)
Changes applied before test
commit 0bf9c88d9604184b55735541b890797a890a9182
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 21 15:47:03 2022 +0100

    luigi: Add LocalExport task

    It allows other packages (eg. swh-graph) to depend on the presence of
    the local dataset, with a configurable way to obtain it if missing

commit b39436e38be5fefe16c92d6553845cd113bafd14
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 21 15:42:33 2022 +0100

    luigi: Remove copies of stamp files to/from S3

    They are only useful while exporting the dataset -- after the export
    is finished, meta.json is good enough and stamp files only save a
    couple of minutes when only some objects types are needed (ie. never
    in practice)
See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/168/ for more details.