Add an ORC file loading function
This generates swh.model objects from ORC files, which should make it possible to rebuild a storage from an ORC dataset of the archive.
Note: not all object types are supported yet (e.g. ExtID and metadata-related objects).
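As a rough illustration of the idea described above (turning decoded rows into model objects), here is a minimal sketch. It does not use the actual `orc_loader.py` code or the real swh.model classes: `ReleaseStub`, `rows_to_releases`, and the column names `id`, `name` and `target` are hypothetical stand-ins, and plain dicts stand in for rows read from an ORC file.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, Optional


@dataclass(frozen=True)
class ReleaseStub:
    """Hypothetical stand-in for a swh.model object (not the real class)."""

    id: bytes
    name: bytes
    target: Optional[bytes]


def rows_to_releases(rows: Iterable[Dict[str, Any]]) -> Iterator[ReleaseStub]:
    """Convert decoded rows (column name -> value) into model-like objects.

    Assumption: hashes are stored as hex strings in the dataset and must be
    converted back to bytes; nullable columns may hold None.
    """
    for row in rows:
        target = row["target"]
        yield ReleaseStub(
            id=bytes.fromhex(row["id"]),
            name=row["name"],
            target=bytes.fromhex(target) if target is not None else None,
        )


# Plain dicts stand in for rows decoded from an ORC file (made-up values).
rows = [
    {"id": "00" * 20, "name": b"v1.0", "target": "ab" * 20},
    {"id": "01" * 20, "name": b"v1.1", "target": None},
]
releases = list(rows_to_releases(rows))
```

Writing the conversion as a generator means a large file could be processed row by row rather than loaded into memory at once; whether the actual loader works this way is not stated here.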
Depends on !51 (closed)
Migrated from D7520 (view on Phabricator)
Build has FAILED
Patch application report for D7520 (id=27283)
Could not rebase; Attempt merge onto 9d97f0c0...
Updating 9d97f0c..0c0df2e
Fast-forward
 swh/dataset/cli.py                | 42 ++++---
 swh/dataset/orc_loader.py         | 254 ++++++++++++++++++++++++++++++++++++++
 swh/dataset/test/test_orc_load.py | 26 ++++
 3 files changed, 303 insertions(+), 19 deletions(-)
 create mode 100644 swh/dataset/orc_loader.py
 create mode 100644 swh/dataset/test/test_orc_load.py
Changes applied before test
commit 0c0df2e01c76aed77a662d7f22481af3d3da0c89
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 18 14:16:39 2022 +0100

    Add a ORC file loading function

    this generates swh.model objects from ORC files. This should allow to
    rebuild a storage from an ORC dataset of the archive.

    Note: not all object types are supported for now (eg. ExtID, metadata
    related objects, etc. are not yet supported).

commit 18325cc8e78e99ac35f550687e41b6f21c5d3a9f
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Apr 6 16:16:09 2022 +0200

    Reduce cli's loading time by moving import statements in commands

Link to build: https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/125/
See console output for more information: https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/125/console
Build has FAILED
Patch application report for D7520 (id=27320)
Rebasing onto 18325cc8...
Current branch diff-target is up to date.
Changes applied before test
commit 8be63a1a7d7430794b2e4e31aa6f8af50a074dd4
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 18 14:16:39 2022 +0100

    Add a ORC file loading function

    this generates swh.model objects from ORC files. This should allow to
    rebuild a storage from an ORC dataset of the archive.

    Note: not all object types are supported for now (eg. ExtID, metadata
    related objects, etc. are not yet supported).

Link to build: https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/126/
See console output for more information: https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/126/console
Build is green
Patch application report for D7520 (id=27322)
Rebasing onto 18325cc8...
Current branch diff-target is up to date.
Changes applied before test
commit dbf1b87b0cb59a8c76a9928f1efdacd87abcf4ad
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 18 14:16:39 2022 +0100

    Add a ORC file loading function

    this generates swh.model objects from ORC files. This should allow to
    rebuild a storage from an ORC dataset of the archive.

    Note: not all object types are supported for now (eg. ExtID, metadata
    related objects, etc. are not yet supported).

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/127/ for more details.
- swh/dataset/orc_loader.py 0 → 100644
        swhmodel.OriginVisit,
        swhmodel.OriginVisitStatus,
        swhmodel.Snapshot,
        swhmodel.SnapshotBranch,
        swhmodel.Release,
        swhmodel.Revision,
        swhmodel.Directory,
        swhmodel.DirectoryEntry,
        swhmodel.Content,
        swhmodel.SkippedContent,
        swhmodel.MetadataAuthority,
        swhmodel.MetadataFetcher,
        swhmodel.RawExtrinsicMetadata,
        swhmodel.ExtID,
    )
}
- swh/dataset/orc_loader.py 0 → 100644
# basic utility functions


def hash_to_bytes_or_none(hash: Optional[str]) -> Optional[bytes]:
    return hash_to_bytes(hash) if hash is not None else None


def orc_to_swh_date(d, prefix):
    timestamp = d.pop(f"{prefix}")
    offset_bytes = d.pop(f"{prefix}_raw_offset_bytes")
    if prefix == "committer_date":
        d.pop("committer_offset")
    else:
        d.pop(f"{prefix}_offset")
assigned to @douardda
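The two helpers in the `orc_loader.py` excerpt above follow a simple pattern: convert a hex hash only when it is not null, and pop the date-related columns out of a row dict so they do not leak into the model constructor. The sketch below reproduces that pattern with the standard library only. `hex_to_bytes_or_none` and `pop_date_columns` are hypothetical names, `bytes.fromhex` stands in for `hash_to_bytes`, the row values are made-up placeholders, and the return value is an assumption, since the excerpt is cut off before the original function returns.

```python
from typing import Any, Dict, Optional, Tuple


def hex_to_bytes_or_none(value: Optional[str]) -> Optional[bytes]:
    """None-safe hex decoding, mirroring hash_to_bytes_or_none in the excerpt."""
    return bytes.fromhex(value) if value is not None else None


def pop_date_columns(row: Dict[str, Any], prefix: str) -> Tuple[Any, Any]:
    """Remove the date columns for `prefix` from `row` and return two of them.

    As in the excerpt, the numeric offset column is named "committer_offset"
    for the "committer_date" prefix and "<prefix>_offset" otherwise; it is
    popped and discarded. Returning (timestamp, offset_bytes) is an
    assumption made for this sketch.
    """
    timestamp = row.pop(prefix)
    offset_bytes = row.pop(f"{prefix}_raw_offset_bytes")
    if prefix == "committer_date":
        row.pop("committer_offset")
    else:
        row.pop(f"{prefix}_offset")
    return timestamp, offset_bytes


# A made-up row with both date prefixes; values are placeholders.
row = {
    "id": "ab" * 20,
    "date": 1647609399,
    "date_offset": 60,
    "date_raw_offset_bytes": b"+0100",
    "committer_date": 1647609400,
    "committer_offset": 60,
    "committer_date_raw_offset_bytes": b"+0100",
}
date = pop_date_columns(row, "date")
committer_date = pop_date_columns(row, "committer_date")
row_id = hex_to_bytes_or_none(row["id"])
```

After both calls only the `id` key remains in `row`, which is the point of popping: whatever is left can be passed on without the raw date columns.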
Jenkins job DDATASET/gitlab-builds #57 failed.
See Console Output and Coverage Report for more details.
added mr-reviewed-fall-2023 label