Make model objects immutable
The problem
Currently, model objects, as defined in swh.model.model are declared using attrs as frozen
, typically:
@attr.s(frozen=True)
class Person(BaseModel):
"""Represents the author/committer of a revision or release."""
fullname = attr.ib(type=bytes, validator=type_validator())
name = attr.ib(type=Optional[bytes], validator=type_validator())
email = attr.ib(type=Optional[bytes], validator=type_validator())
However, some object models have mutable attributes, which fails the immutable property expected when using frozen entity types.
Besides the inconsistency that this situation provoke, it also prevents these model objects from being hashable, thus from being put in structures likes sets or dicts.
And with the current move to "use swh.model.model
model objects everywhere", it //would be nice// to have consistent core model scaffolding.
Affected model classes are:
-
Revision
because of theparents
attribute (List) and themetadata
attribute (Dict), -
Directory
because of theentries
attribute (List), -
OriginVisit
because of themetadata
attribute (Dict), -
OriginVisitStatus
because of themetadata
attribute (Dict), -
Snapshot
because of thebranches
attribute (Dict), -
Release
because of themetadata
attribute (Dict).
Proposed fix
-
Replace
List
attribute types byTuple
ones (namelyRevision.parents
andDirectory.entries
). An implementation of this idea is proposed is !113 (closed). -
Replace
Snapshot.branches
by aTuple
. It is currently a dict which keys are branch names ({branch_name: branch}
). So it would require to add a newname
attribute to theSnapshotBranch
class and store them in aSnapshot
as a tuple. -
Replace
metadata
attributes by either a json-encoded string or aTuple[Tuple[str, str]]
(or similar, as long as it remains a hashable structure).
Impacts
Tuple
instead of List
The first point should have very limited impact on other swh packages, as long as the .from_dict()
method is nice enough to convert the list into a tuple on the fly.
Snapshot.branches
as a Tuple
Second point (Snapshot.branches
as a tuple) has possible impact on almost everything (namely on swh.deposit
, swh.indexer
, swh.loader.core
, swh.loader.git
, swh.loader.mercurial
, swh.loader.svn
, swh.storage
, swh.web.client
and swh.web
).
However, when grepping accesses to the .branches
attribute, we have only a few hits:
~/swh/swh-environment$ pygrep "[.]branches" swh-*/swh
swh-loader-core/swh/loader/package/loader.py: for rev in snapshot.branches.values()
swh-loader-git/swh/loader/git/loader.py: for branch in base_snapshot.branches.values():
swh-loader-git/swh/loader/git/loader.py: eventful = bool(self.snapshot.branches)
swh-loader-git/swh/loader/git/from_disk.py: eventful = bool(self.snapshot.branches)
swh-loader-mercurial/swh/loader/mercurial/loader.py: self.branches = {}
swh-model/swh/model/tests/test_hypothesis_strategies.py: branches = snapshot.branches
swh-model/swh/model/tests/test_hypothesis_strategies.py: assert len(snapshot.branches) == _min_snp_size
swh-storage/swh/storage/in_memory.py: sorted_branch_names = sorted(snapshot.branches)
swh-storage/swh/storage/in_memory.py: for branch in snapshot.branches.values()
swh-storage/swh/storage/in_memory.py: branch = snapshot.branches[branch_name]
swh-storage/swh/storage/in_memory.py: branch_name: snapshot.branches[branch_name]
swh-storage/swh/storage/storage.py: for name, info in snapshot.branches.items()
swh-storage/swh/storage/cassandra/storage.py: for (branch_name, branch) in snapshot.branches.items():
The only piece of code really using the dict feature here is swh.storage.in_memory
.
So it could be possible to make this modification without too much trouble for other swh packages, by using a bw-compatible .from_dict()
method.
<cls>.metadata
as a str
(json encoded MD structure)
//TODO//
Migrated from T2421 (view on Phabricator)