Add an ORC file loading function

Open David Douard requested to merge generated-differential-D7520-source into master
3 unresolved threads

This generates swh.model objects from ORC files, which should make it possible to rebuild a storage from an ORC dataset of the archive.

Note: not all object types are supported yet (e.g. ExtID and the metadata-related objects are still missing).

Depends on !51 (closed)


Migrated from D7520 (view on Phabricator)

Activity

1 from collections import defaultdict
  • vlorentz @vlorentz started a thread on the diff
  •         swhmodel.OriginVisit,
            swhmodel.OriginVisitStatus,
            swhmodel.Snapshot,
            swhmodel.SnapshotBranch,
            swhmodel.Release,
            swhmodel.Revision,
            swhmodel.Directory,
            swhmodel.DirectoryEntry,
            swhmodel.Content,
            swhmodel.SkippedContent,
            swhmodel.MetadataAuthority,
            swhmodel.MetadataFetcher,
            swhmodel.RawExtrinsicMetadata,
            swhmodel.ExtID,
        )
    }
  • vlorentz @vlorentz started a thread on the diff
  • # basic utility functions


    def hash_to_bytes_or_none(hash: Optional[str]) -> Optional[bytes]:
        return hash_to_bytes(hash) if hash is not None else None


    def orc_to_swh_date(d, prefix):
        timestamp = d.pop(f"{prefix}")
        offset_bytes = d.pop(f"{prefix}_raw_offset_bytes")
        if prefix == "committer_date":
            d.pop("committer_offset")
        else:
            d.pop(f"{prefix}_offset")
  • Have you tried using swh.storage.postgresql.converters.db_to_*? They are very similar to these cvrt_* functions, so you'd only need to convert dates from ISO 8601 strings to datetime objects.
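
    The date conversion step mentioned in this comment could look roughly like the sketch below. It is only an illustration: the helper name iso8601_to_datetime is hypothetical and is not part of this diff or of swh.storage, and treating timestamps without an explicit offset as UTC is an assumption about the dataset, not something the merge request states.

```python
from datetime import datetime, timezone
from typing import Optional


def iso8601_to_datetime(value: Optional[str]) -> Optional[datetime]:
    """Hypothetical helper: parse an ISO 8601 string into an aware datetime."""
    if value is None:
        # Mirror hash_to_bytes_or_none: missing values stay missing.
        return None
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: strings without an explicit offset are in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

    With a helper of this kind, each row read from the ORC file would have its date columns converted before being handed to the existing converter functions.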

  • assigned to @douardda

  • Jenkins job DDATASET/gitlab-builds #57 failed.
    See Console Output and Coverage Report for more details.
