Normalize and improve content HashDict/CompositeObjId definitions in swh.model

When dealing with content identifiers, we currently use several types/definitions to represent this multi-hash identifier:

- in swh.objstorage, CompositeObjId (a non-total TypedDict) and ObjId (a Union[bytes, CompositeObjId]) are defined;
- in swh.storage, HashDict (exactly the same definition as CompositeObjId) and TotalHashDict (a HashDict for which total is True, i.e. all keys are mandatory) are defined;
- in swh.model, BaseContent, Content and SkippedContent produce plain untyped dicts that should legitimately be HashDict/CompositeObjId;
- in swh.storage, a few Storage endpoints are inconsistent in the way contents are identified:
  - content_missing returns a list of bytes (hashes; the hash algorithm can be specified with the key_hash argument, which defaults to sha1)
  - content_get_random returns a single bytes value (sha1_git)
  - skipped_content_missing takes a List[Dict[str, Any]] and returns a list of dicts (currently annotated as Dict[str, Any], but it seems to actually be a Dict[str, Optional[bytes]])
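
To make the definitions above concrete, here is a minimal sketch of what the non-total and total variants could look like. It is not copied from swh.objstorage or swh.storage: the key set (sha1, sha1_git, sha256, blake2s256) is an assumption, only sha1 and sha1_git are named in this issue, and `missing_keys` is a hypothetical helper added for illustration.

```python
from typing import FrozenSet, TypedDict, Union

# Assumed set of hash algorithms; only sha1 and sha1_git appear in this issue.
HASH_KEYS = ("sha1", "sha1_git", "sha256", "blake2s256")


class HashDict(TypedDict, total=False):
    """Non-total: any subset of the hash keys may be present."""

    sha1: bytes
    sha1_git: bytes
    sha256: bytes
    blake2s256: bytes


class TotalHashDict(TypedDict):
    """Total: every hash key is mandatory.

    Declared from scratch on purpose: subclassing HashDict with total=True
    would keep the inherited keys optional.
    """

    sha1: bytes
    sha1_git: bytes
    sha256: bytes
    blake2s256: bytes


# Same shape as the ObjId described above: raw bytes or a multi-hash dict.
ObjId = Union[bytes, HashDict]


def missing_keys(hashes: HashDict) -> FrozenSet[str]:
    """Hypothetical helper: hash keys still needed to make `hashes` total."""
    return frozenset(HASH_KEYS) - frozenset(hashes)
```

With a single shared definition of this kind in swh.model, CompositeObjId and HashDict would no longer need to be declared twice.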

The idea is to start from a common definition of a content identifier (in swh.model) and then progressively fix the discrepancies listed above.
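
As one possible illustration of such a fix, the sketch below wraps bare hashes, such as those returned by content_missing, into multi-hash dicts. The names `to_hash_dict` and `wrap_missing` are hypothetical, the key set is an assumption, and the real endpoints may end up being changed differently.

```python
from typing import Iterable, List, TypedDict

# Assumed set of hash algorithms; only sha1 and sha1_git appear in this issue.
HASH_KEYS = ("sha1", "sha1_git", "sha256", "blake2s256")


class HashDict(TypedDict, total=False):
    """Stand-in for the common definition proposed for swh.model."""

    sha1: bytes
    sha1_git: bytes
    sha256: bytes
    blake2s256: bytes


def to_hash_dict(value: bytes, key_hash: str = "sha1") -> HashDict:
    """Wrap a bare hash into a single-key HashDict (hypothetical helper)."""
    if key_hash not in HASH_KEYS:
        raise ValueError(f"unknown hash algorithm: {key_hash!r}")
    # The key is only known at runtime, so static checkers need a hint here.
    return {key_hash: value}  # type: ignore[misc]


def wrap_missing(hashes: Iterable[bytes], key_hash: str = "sha1") -> List[HashDict]:
    """Convert a content_missing-style list of bytes into a list of HashDict."""
    return [to_hash_dict(h, key_hash) for h in hashes]
```

The default of sha1 mirrors the key_hash default mentioned for content_missing; callers identifying contents by another hash would pass the matching key explicitly.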

That would make swh.objstorage depend on swh.model, which seems acceptable.

Activity

David Douard changed milestone to %MRO 2023 2 years ago

David Douard mentioned in issue swh-storage#4678 2 years ago

David Douard mentioned in merge request !331 2 years ago

David Douard mentioned in merge request swh-objstorage!148 (closed) 2 years ago
