MR co-authored with @vlorentz
The two main commits for this change are:
This is a simple database of the SWHIDs of objects whose diffusion we have decided, as a matter of policy, to restrict without removing them from the archive, along with a lightweight history structure for the associated object masking requests.
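To make the shape of this data concrete, here is a minimal in-memory sketch. Everything in it is hypothetical: the names `InMemoryMaskingDb`, `MaskingRequest` and `MaskedState`, the method names, and the use of plain SWHID strings as keys are illustrative assumptions, not the schema introduced by this MR. The two state values mirror the temporary and permanent masking states that MaskedObjectException reports.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class MaskedState(Enum):
    # Hypothetical: mirrors the "temporary or permanent" masking states
    TEMPORARY = "temporary"
    PERMANENT = "permanent"


@dataclass
class MaskingRequest:
    # Opaque id handed back to clients, plus a free-form reason
    id: uuid.UUID
    reason: str
    # Lightweight history: (timestamp, message) entries, oldest first
    history: List[Tuple[datetime, str]] = field(default_factory=list)


class InMemoryMaskingDb:
    """Hypothetical in-memory stand-in for the masking database."""

    def __init__(self) -> None:
        self.requests: Dict[uuid.UUID, MaskingRequest] = {}
        # SWHID string -> {request id -> masking state}
        self.masked: Dict[str, Dict[uuid.UUID, MaskedState]] = {}

    def create_request(self, reason: str) -> MaskingRequest:
        request = MaskingRequest(id=uuid.uuid4(), reason=reason)
        self.requests[request.id] = request
        self.add_history(request.id, "Request created")
        return request

    def add_history(self, request_id: uuid.UUID, message: str) -> None:
        entry = (datetime.now(timezone.utc), message)
        self.requests[request_id].history.append(entry)

    def set_state(
        self, request_id: uuid.UUID, swhids: List[str], state: MaskedState
    ) -> None:
        # Record the state of each object under this request, then log the change
        for swhid in swhids:
            self.masked.setdefault(swhid, {})[request_id] = state
        self.add_history(request_id, f"Set {len(swhids)} object(s) to {state.value}")

    def lookup(
        self, swhids: Iterable[str]
    ) -> Dict[str, Dict[uuid.UUID, MaskedState]]:
        # Only SWHIDs present in the database appear in the result
        return {s: dict(self.masked[s]) for s in swhids if s in self.masked}


# Example usage
db = InMemoryMaskingDb()
req = db.create_request("example policy decision")
cnt = "swh:1:cnt:" + "0" * 40
db.set_state(req.id, [cnt], MaskedState.TEMPORARY)
matches = db.lookup([cnt, "swh:1:cnt:" + "1" * 40])  # only cnt is reported
```

Keying the mask by SWHID and request lets one object be covered by several requests, each with its own state and history.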
Implementing this as an overlay, instead of modifying the storage schema for all objects, lets us start separating the concerns of archiving origins (which requires a full view of all the unmodified objects stored in the archive) from the concerns of disseminating those archived objects.
To avoid interfering with archival, the masking policy is only applied to full object retrieval. It is implemented as a new proxy storage, which will be placed in front of all public-facing storages.
This new masking proxy storage intercepts all information retrieval from the underlying storage and matches the SWHIDs of the returned objects against the contents of the masking database.
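A minimal sketch of such a proxy, assuming a toy storage whose objects are dicts carrying a `swhid` key and a masking database modeled as a plain SWHID-to-info dict. The names `MaskingProxyStorage`, `DictStorage`, `object_get`, `object_add` and `MASKED_METHODS` are hypothetical; only the idea of intercepting retrieval and raising MaskedObjectException comes from this MR.

```python
from typing import Any, Dict, List, Optional, Sequence, Set


class MaskedObjectException(Exception):
    """Carries a dict mapping masked SWHIDs to masking information."""

    def __init__(self, masked: Dict[str, Any]) -> None:
        super().__init__(f"{len(masked)} masked object(s)")
        self.masked = masked


class MaskingProxyStorage:
    """Hypothetical proxy placed in front of a public-facing storage."""

    # Hypothetical names of retrieval methods whose results get checked
    MASKED_METHODS: Set[str] = {"object_get"}

    def __init__(self, storage: Any, masking_db: Dict[str, Any]) -> None:
        self.storage = storage
        # Hypothetical: masking database modeled as SWHID -> masking info
        self.masking_db = masking_db

    def _raise_if_masked(self, objects: List[Optional[Dict[str, Any]]]) -> None:
        # Missing objects (None) are skipped; only returned objects are matched
        masked = {
            obj["swhid"]: self.masking_db[obj["swhid"]]
            for obj in objects
            if obj is not None and obj["swhid"] in self.masking_db
        }
        if masked:
            raise MaskedObjectException(masked)

    def __getattr__(self, name: str) -> Any:
        # Python only calls this for attributes not found on the proxy itself,
        # so every storage method is forwarded to the underlying storage.
        attr = getattr(self.storage, name)
        if name not in self.MASKED_METHODS:
            return attr  # e.g. write methods pass through unchanged

        def checked(*args: Any, **kwargs: Any) -> Any:
            results = attr(*args, **kwargs)
            self._raise_if_masked(results)
            return results

        return checked


class DictStorage:
    """Toy underlying storage: objects are dicts with a 'swhid' key."""

    def __init__(self, objects: Dict[str, Dict[str, Any]]) -> None:
        self.objects = objects

    def object_get(self, swhids: Sequence[str]) -> List[Optional[Dict[str, Any]]]:
        return [self.objects.get(s) for s in swhids]

    def object_add(self, obj: Dict[str, Any]) -> None:
        self.objects[obj["swhid"]] = obj
```

Forwarding through `__getattr__` keeps the proxy small: only the listed retrieval methods are wrapped, and everything else reaches the underlying storage untouched. A real implementation would derive SWHIDs from the storage's actual model objects rather than reading a `swhid` key.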
For simplicity, when any of the returned objects matches the masking database, a non-retryable MaskedObjectException is raised, carrying a dict that maps each masked SWHID to information about the masking request, including an opaque id and a masking state (temporary or permanent). It is up to the client to process this exception and display the information in a useful manner. If necessary, a client fetching a batch that includes both masked and non-masked objects can extract the SWHIDs of the masked objects and retry with only the non-masked ones. If this usage becomes prevalent, it could be implemented as an additional proxy.
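The client-side retry can be sketched as follows. This is an illustration under assumptions: `fetch_skipping_masked`, the batch method `object_get`, and the `FakeMaskedStorage` test double are hypothetical, and the exception is assumed to expose its SWHID-to-info dict as a `masked` attribute.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


class MaskedObjectException(Exception):
    """Carries a dict mapping masked SWHIDs to masking information."""

    def __init__(self, masked: Dict[str, Any]) -> None:
        super().__init__(f"{len(masked)} masked object(s)")
        self.masked = masked


def fetch_skipping_masked(
    storage: Any, swhids: Sequence[str]
) -> Tuple[List[Optional[Any]], Dict[str, Any]]:
    """Return (results for non-masked SWHIDs, masking info for masked ones).

    `storage.object_get` is a hypothetical batch retrieval method.
    """
    try:
        return storage.object_get(list(swhids)), {}
    except MaskedObjectException as exc:
        masked = exc.masked
    # Retry once with only the objects that were not reported as masked. This
    # can still raise if the masking database changed between the two calls.
    remaining = [s for s in swhids if s not in masked]
    results = storage.object_get(remaining) if remaining else []
    return results, masked


class FakeMaskedStorage:
    """Toy storage that raises the way a masking proxy would."""

    def __init__(self, objects: Dict[str, Any], masked: Dict[str, Any]) -> None:
        self.objects = objects
        self.masked = masked

    def object_get(self, swhids: List[str]) -> List[Optional[Any]]:
        hits = {s: self.masked[s] for s in swhids if s in self.masked}
        if hits:
            raise MaskedObjectException(hits)
        return [self.objects.get(s) for s in swhids]
```

Note that the returned results only cover the non-masked SWHIDs, in their original order; the masking info is returned separately so the client can display it.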
When an object's SWHID (or a list of SWHIDs) is passed as an argument to a storage function, we first call the underlying function to check whether the object exists, and only then attempt to match it against the masking database. This avoids leaking information out of the masking database until absolutely necessary, and prevents potential issues once a content removal has been processed.
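The ordering can be sketched with two injected callables, assuming the underlying retrieval returns `None` for missing objects. `get_with_masking`, `underlying_get` and `lookup_masked` are hypothetical names. With this ordering, an object that has been removed from the archive but still has an entry in the masking database comes back as missing rather than raising with the masking request details.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


class MaskedObjectException(Exception):
    """Carries a dict mapping masked SWHIDs to masking information."""

    def __init__(self, masked: Dict[str, Any]) -> None:
        super().__init__(f"{len(masked)} masked object(s)")
        self.masked = masked


def get_with_masking(
    underlying_get: Callable[[Sequence[str]], List[Optional[Any]]],
    lookup_masked: Callable[[Sequence[str]], Dict[str, Any]],
    swhids: Sequence[str],
) -> List[Optional[Any]]:
    # 1. Query the underlying storage first; missing objects come back as None.
    results = underlying_get(swhids)
    # 2. Only SWHIDs of objects that actually exist reach the masking database.
    existing = [s for s, r in zip(swhids, results) if r is not None]
    masked = lookup_masked(existing) if existing else {}
    if masked:
        raise MaskedObjectException(masked)
    return results
```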
For now, our implementation does not treat the SWHIDs of masked objects as something that itself needs to be masked. For instance, an unmasked Directory containing masked Contents can still be listed; only accessing the data of a masked Content object raises a MaskedObjectException. This choice limits the impact of masked objects on the overall archive navigation experience.