Normalize and improve content HashDict/CompositeObjId definitions in swh.model
When dealing with content identifiers, we currently use several types/definitions to represent this multi-hash identifier:
- in swh.objstorage are defined CompositeObjId (a non-total TypedDict) and ObjId (as a Union[bytes, CompositeObjId])
- in swh.storage are defined HashDict (exact same definition as CompositeObjId) and TotalHashDict (a HashDict for which total is True; all keys are mandatory)
- in swh.model, BaseContent, Content and SkippedContent produce simple (untyped) dicts that should legitimately be HashDict/CompositeObjId
- in swh.storage, a few endpoints of the Storage are not consistent in the way contents are identified:
  - content_missing returns a list of bytes (hashes; the hash algorithm can be specified with the key_hash argument and defaults to sha1)
  - content_get_random returns a single bytes value (sha1_git)
  - skipped_content_missing takes a List[Dict[str, Any]] and returns a list of dicts (currently annotated as Dict[str, Any], but it seems to actually be a Dict[str, Optional[bytes]])
The idea is to start from a common definition of a content identifier (in swh.model) and then work towards fixing the discrepancies listed above.
That would add a dependency between swh.objstorage and swh.model, which seems acceptable.
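As a starting point for discussion, here is a minimal sketch of what a shared definition could look like. It is not the actual swh.model code: the set of hash names (only sha1 and sha1_git are named in this issue), the HASH_NAMES constant and the to_hash_dict helper are assumptions made for illustration. Note that TotalHashDict repeats the keys rather than subclassing HashDict, because a TypedDict subclass does not change the requiredness of inherited keys.

```python
from typing import Any, Dict, TypedDict, Union, cast

# Assumption: illustrative list of hash names; only sha1 and sha1_git
# are mentioned explicitly in this issue.
HASH_NAMES = ("sha1", "sha1_git", "sha256", "blake2s256")


class HashDict(TypedDict, total=False):
    """Partial content identifier: any subset of the hashes may be present."""

    sha1: bytes
    sha1_git: bytes
    sha256: bytes
    blake2s256: bytes


class TotalHashDict(TypedDict):
    """Complete content identifier: every hash is mandatory."""

    sha1: bytes
    sha1_git: bytes
    sha256: bytes
    blake2s256: bytes


# Mirrors the ObjId described above: a bare hash or a multi-hash dict.
ObjId = Union[bytes, HashDict]


def to_hash_dict(d: Dict[str, Any]) -> HashDict:
    """Hypothetical helper: keep only known hash keys holding bytes values."""
    return cast(
        HashDict,
        {k: v for k, v in d.items() if k in HASH_NAMES and isinstance(v, bytes)},
    )
```

With such types in swh.model, swh.objstorage could alias CompositeObjId to HashDict, and the Storage endpoints listed above could be annotated with HashDict or TotalHashDict instead of Dict[str, Any].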