- Mar 28, 2024
-
-
vlorentz authored
This is meant to replace the current PostgreSQL-specific handling of display names, in a way that can be used with other backends (i.e. Cassandra). This is implemented using a PostgreSQL database, like the masking proxy. Co-Authored-By: David Douard <david.douard@sdfa3.org>
-
- Mar 26, 2024
-
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
Instead of generating a full Content object with the four hashes, which is more expensive.
-
Nicolas Dandrimont authored
We have to explicitly test this codepath as the masking proxy checks the computed swhid of the returned content, so we might as well tell pytest that we know that the call is deprecated.
-
Nicolas Dandrimont authored
To avoid calling __getattr__ multiple times on the same method, we can just use setattr to cache the built method for further calls. This avoids caveats of the LRU cache on instance methods (which can make garbage-collecting difficult).
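A minimal sketch of the caching technique described above. `CachingProxy` and its `getattr_calls` counter are hypothetical names for illustration, not the actual proxy code. Python only calls `__getattr__` when normal attribute lookup fails, so once the built method is stored on the instance with `setattr`, later accesses find it directly.

```python
import functools


class CachingProxy:
    """Hypothetical proxy forwarding method calls to a wrapped object."""

    def __init__(self, wrapped):
        self.wrapped = wrapped
        self.getattr_calls = 0  # illustration only: counts __getattr__ invocations

    def __getattr__(self, name):
        # Only reached when `name` is not found through normal lookup.
        self.getattr_calls += 1
        target = getattr(self.wrapped, name)

        @functools.wraps(target)
        def method(*args, **kwargs):
            return target(*args, **kwargs)

        # Cache the built method in the instance dict, so that the next
        # access to `name` bypasses __getattr__ entirely.
        setattr(self, name, method)
        return method
```

The cache lives in the instance's own `__dict__`, so it is released together with the proxy object.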
-
Nicolas Dandrimont authored
The target argument for raw_extrinsic_metadata_get_authorities is an ExtendedSWHID, there's no need to extend it again.
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
Passing it a CoreSWHID works most of the time as the value is stringified before accessing databases, but doesn't work when using ExtendedSWHID methods/attributes directly, for instance in a proxy.
-
Nicolas Dandrimont authored
The original implementation masked the snapshots targeted by visits if they were masked, when we really want to mask results only if the queried origin itself is masked.
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
-
vlorentz authored
-
vlorentz authored
-
- Mar 25, 2024
-
-
Nicolas Dandrimont authored
This new masking proxy storage intercepts all information retrieval from the underlying storage, and matches the SWHIDs of returned objects to the contents of the masking database.

For simplicity, when any of the returned objects matches the masking database, a non-retryable MaskedObjectException is raised, with a dict mapping the masked SWHIDs to information about the masking request, including an opaque id and a masking state (temporary or permanent). It is up to the client to process this exception to display the information in a useful manner.

If necessary, a client fetching a batch of objects including some masked and non-masked ones could extract the ids of the masked objects and retry for the non-masked objects as well. If this usage becomes prevalent, it could be implemented as one more proxy.

When an object's SWHID (or a list thereof) is passed as argument to the storage function, we first call the underlying function to check the object for existence, before we attempt to match the object with the masking database. This avoids leaking information out of the masking database until it's absolutely needed, avoiding potential issues after a content removal has been processed.

For now, our implementation does not consider that the SWHID of masked objects itself needs to be masked. For instance, an unmasked Directory containing masked Contents will still allow being listed. Only accessing the data of the masked Content object itself would raise a MaskedObjectException. This choice was made to limit the impact of masked objects in the overall archive navigation experience.
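The behaviour described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the actual proxy: SWHIDs are plain strings, a masking entry is an (opaque id, state) pair, and `masking_fetch`, `fetch_unmasked` and the shape of `MaskedObjectException` are stand-ins defined here.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical stand-ins: SWHIDs are plain strings, and a masking entry is an
# (opaque request id, state) pair, with state "temporary" or "permanent".
MaskingEntry = Tuple[str, str]
Fetch = Callable[[Sequence[str]], List[Optional[bytes]]]


class MaskedObjectException(Exception):
    """Stand-in exception carrying the masked SWHIDs and their masking info."""

    def __init__(self, masked: Dict[str, MaskingEntry]) -> None:
        super().__init__(masked)
        self.masked = masked


def masking_fetch(backend: Fetch, masking_db: Dict[str, MaskingEntry]) -> Fetch:
    """Wrap `backend` so that returning a masked object raises instead."""

    def fetch(swhids: Sequence[str]) -> List[Optional[bytes]]:
        # Query the underlying storage first ...
        results = backend(swhids)
        # ... and only match objects that actually exist against the masking
        # database, so that absent objects reveal nothing about its contents.
        masked = {
            swhid: masking_db[swhid]
            for swhid, obj in zip(swhids, results)
            if obj is not None and swhid in masking_db
        }
        if masked:
            raise MaskedObjectException(masked)
        return results

    return fetch


def fetch_unmasked(fetch: Fetch, swhids: Sequence[str]):
    """Client-side retry: return (objects by SWHID, masking info by SWHID)."""
    try:
        return dict(zip(swhids, fetch(swhids))), {}
    except MaskedObjectException as exc:
        # Drop the masked ids reported by the exception and retry the rest.
        remaining = [s for s in swhids if s not in exc.masked]
        return dict(zip(remaining, fetch(remaining))), exc.masked
```

In this sketch, a SWHID that is listed in the masking database but absent from the backend simply comes back as `None`, which mirrors the existence check described above.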
-
Nicolas Dandrimont authored
This is a simple database of the SWHIDs of objects for which we have made a policy decision to restrict the diffusion without removing them from the archive, and a lightweight history structure for the associated object masking requests. Doing this as an overlay, instead of modifying the storage schema for all objects, allows us to start better separating the concerns of archival of origins (which necessitates a full view of all the unmodified objects that are stored in the archive) from the concerns about the dissemination of said archived objects. To avoid interfering with archival, the masking policy will only be applied for full object retrieval and implemented as a new proxy storage, which will be placed in front of all public-facing storages.
-
Nicolas Dandrimont authored
This new type will be used for non-retryable exceptions that will not be storage argument exceptions.
-
Nicolas Dandrimont authored
The intent behind test_types is to test the signature of wrapped storages, to check that they match that of the StorageInterface Protocol. However, the way the test was refactored ended up testing the storage being *wrapped* by the storage under test, masking a few inconsistencies in the way storages are being wrapped. Unfortunately this breaks the tenacious proxy's test in an inscrutable way (even when `functools.wraps`ing the return values of its `__getattr__` function).
-
- Mar 22, 2024
-
-
Nicolas Dandrimont authored
-
Antoine R. Dumont authored
This reverts commit 74caf618. Refs. swh/infra/sysadm-environment#5291
-
- Mar 21, 2024
-
-
Antoine R. Dumont authored
But still retrieve empty entries as None in the model when that makes sense. This should not disturb the current API calls when reading from Cassandra. Refs. swh/infra/sysadm-environment#5287
-
Antoine R. Dumont authored
-
- Mar 11, 2024
-
-
Nicolas Dandrimont authored
All the data has been migrated, so this fallback can now be removed. Ref. swh/infra/sysadm-environment#2564
-
- Mar 05, 2024
-
-
- Feb 13, 2024
-
-
Antoine Lambert authored
This makes it possible to filter on a specific visit type when searching for a visit by date. Related to swh-web#4786.
-
- Feb 09, 2024
-
-
Antoine Lambert authored
-
- Feb 06, 2024
-
-
Antoine Lambert authored
Related to swh/meta#5075.
-
- Feb 02, 2024
-
-
Nicolas Dandrimont authored
swh.storage doesn't actually declare the dependency on pytest-postgresql; it comes through swh.core[testing].
-
- Jan 17, 2024
-
-
David Douard authored
Where async usage got dropped from the discovery protocol.
-
- Dec 11, 2023
-
-
David Douard authored
-
- Dec 05, 2023
-
-
David Douard authored
-
Antoine Lambert authored
-
- Dec 04, 2023
-
-
David Douard authored
and replace comment type annotations by explicit ones.
-
- Dec 03, 2023
-
- Nov 29, 2023
-
-
David Douard authored
-
David Douard authored
-
- Nov 25, 2023
-
- Nov 24, 2023
-
-
Antoine Lambert authored
When raising a QueryTimeout exception, forward the arguments of the caught psycopg2 QueryCanceled exception to it.
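A minimal sketch of this forwarding pattern. To stay runnable without a database driver, `QueryCanceled` below is a local stand-in for the psycopg2 exception, and `QueryTimeout` and `run_query` are hypothetical names defined only for this example.

```python
class QueryCanceled(Exception):
    """Stand-in for the psycopg2 QueryCanceled exception."""


class QueryTimeout(Exception):
    """Stand-in for the storage-level timeout exception."""


def run_query(execute):
    """Call `execute()`, translating a cancelled query into a QueryTimeout."""
    try:
        return execute()
    except QueryCanceled as exc:
        # Forward the original arguments so the message is not lost.
        raise QueryTimeout(*exc.args)
```

Forwarding `exc.args` keeps the database's message available to whoever catches the translated exception.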
-
- Nov 16, 2023
-
-
David Douard authored
Convert README from Markdown to ReST to make it embeddable in docs/index.rst.
-
- Nov 07, 2023
-
-
Jérémy Bobbio (Lunar) authored
`swh storage remove-old-object-reference-partitions 2023-09-01` can be used to remove all partition tables for weeks before the given date. By default, this will print the weeks for which tables would be dropped and ask for confirmation. The `--force` option will just proceed directly. The command will simply refuse to drop all partitions, as that is most probably an error.
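The selection and safety check described above can be sketched as a small pure function. The `Partition` record, its fields, and `partitions_to_remove` are hypothetical. The sketch also assumes that "before the given date" means the partition's exclusive end date is not later than that date. Confirmation and `--force` handling are left out.

```python
import datetime
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Partition:
    """Hypothetical description of one weekly partition table."""

    table_name: str
    start: datetime.date  # first day covered
    end: datetime.date  # first day NOT covered (exclusive bound)


def partitions_to_remove(
    partitions: List[Partition], before: datetime.date
) -> List[Partition]:
    """Select partitions that end on or before `before`.

    Raises ValueError when the selection would cover every existing partition.
    """
    selected = [p for p in partitions if p.end <= before]
    if partitions and len(selected) == len(partitions):
        raise ValueError("refusing to drop all partitions")
    return selected
```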
-
Jérémy Bobbio (Lunar) authored
In order to be able to remove older partitions, we want to list those that actually exist. `object_references_list_partition()` will return a list of ObjectReferencesPartition, a new dataclass describing partitions. This new method replaces `get_object_references_partition_bounds()`, which was only available in tests.
-