  1. Mar 28, 2024
    • Introduce patching proxy · 07e28c2a
      vlorentz authored
      
      This is meant to replace the current postgresql-specific handling of
      display names, in a way that can be used with other backends (e.g.
      Cassandra).
      
      This is implemented using a PostgreSQL database, like the masking proxy.
      
      Co-Authored-By: David Douard <david.douard@sdfa3.org>
  2. Mar 26, 2024
  3. Mar 25, 2024
    • Introduce the object masking proxy storage · 62160a37
      Nicolas Dandrimont authored
      This new masking proxy storage intercepts all information retrieval from
      the underlying storage, and matches the SWHIDs of returned objects to
      the contents of the masking database.
      
      For simplicity, when any of the returned objects matches the masking
      database, a non-retryable MaskedObjectException is raised, with a dict
      mapping the masked SWHIDs to information about the masking request,
      including an opaque id and a masking state (temporary or permanent). It
      is up to the client to process this exception to display the information
      in a useful manner. If necessary, a client fetching a batch of objects
      including some masked and non-masked ones could extract the ids of the
      masked objects and retry for the non-masked objects as well. If this
      usage becomes prevalent, it could be implemented as one more proxy.
      
      When an object's SWHID (or a list thereof) is passed as argument to the
      storage function, we first call the underlying function to check the
      object for existence, before we attempt to match the object with the
      masking database. This avoids leaking information out of the masking
      database until it's absolutely needed, avoiding potential issues after a
      content removal has been processed.
      
      For now, our implementation does not consider that the SWHID of a masked
      object itself needs to be masked. For instance, an unmasked Directory
      containing masked Contents can still be listed. Only accessing
      the data of the masked Content object itself raises a
      MaskedObjectException. This choice was made to limit the impact of
      masked objects on the overall archive navigation experience.
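The behaviour described in this commit message can be sketched as follows. This is a minimal illustration, not the actual swh.storage code: `DictStorage`, `MaskingProxy`, `MaskingInfo`, `MaskedState`, the `get` method and the `get_unmasked` helper are all hypothetical names, the masking database is stood in for by a plain dict, and the SWHIDs are shortened placeholders. Only `MaskedObjectException` and its SWHID-to-information mapping come from the text above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List, Optional, Tuple


class MaskedState(Enum):
    TEMPORARY = "temporary"
    PERMANENT = "permanent"


@dataclass(frozen=True)
class MaskingInfo:
    request_id: str  # opaque id of the masking request (assumed field name)
    state: MaskedState


class MaskedObjectException(Exception):
    """Non-retryable error carrying a mapping of masked SWHIDs to masking info."""

    def __init__(self, masked: Dict[str, MaskingInfo]):
        super().__init__(f"{len(masked)} masked object(s)")
        self.masked = masked


class DictStorage:
    """Hypothetical stand-in for the underlying storage."""

    def __init__(self, objects: Dict[str, bytes]):
        self._objects = objects

    def get(self, swhids: Iterable[str]) -> List[Optional[bytes]]:
        return [self._objects.get(swhid) for swhid in swhids]


class MaskingProxy:
    """Hypothetical proxy; the masking database is modelled as a dict."""

    def __init__(self, storage: DictStorage, masking_db: Dict[str, MaskingInfo]):
        self._storage = storage
        self._masking_db = masking_db

    def get(self, swhids: Iterable[str]) -> List[Optional[bytes]]:
        swhids = list(swhids)
        # Query the underlying storage first: an object that does not exist
        # there is never matched against the masking database, so nothing
        # about the masking database leaks for it.
        results = self._storage.get(swhids)
        masked = {
            swhid: self._masking_db[swhid]
            for swhid, obj in zip(swhids, results)
            if obj is not None and swhid in self._masking_db
        }
        if masked:
            raise MaskedObjectException(masked)
        return results


def get_unmasked(
    proxy: MaskingProxy, swhids: Iterable[str]
) -> Tuple[Dict[str, Optional[bytes]], Dict[str, MaskingInfo]]:
    """Client-side handling: on masking, retry once for the non-masked objects."""
    swhids = list(swhids)
    try:
        return dict(zip(swhids, proxy.get(swhids))), {}
    except MaskedObjectException as exc:
        remaining = [swhid for swhid in swhids if swhid not in exc.masked]
        return dict(zip(remaining, proxy.get(remaining))), exc.masked
```

A client calling `get_unmasked` receives the objects it is allowed to see together with the masking information for the rest, which it can then display as it sees fit.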
    • Introduce a PostgreSQL storage schema and API for object masking information · 53d67fce
      Nicolas Dandrimont authored
      This is a simple database of the SWHIDs of objects for which we have
      made a policy decision to restrict the diffusion without removing them
      from the archive, and a lightweight history structure for the associated
      object masking requests.
      
      Doing this as an overlay, instead of modifying the storage schema for
      all objects, allows us to start better separating the concerns of
      archival of origins (which necessitates a full view of all the
      unmodified objects that are stored in the archive), from the concerns
      about the dissemination of said archived objects.
      
      To avoid interfering with archival, the masking policy will only be
      applied for full object retrieval and implemented as a new proxy
      storage, which will be placed in front of all public-facing storages.
    • retry: introduce a specific exception class for non-retryable exceptions · c57b5501
      Nicolas Dandrimont authored
      This new type will be used for non-retryable exceptions that are not
      storage argument exceptions.
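To illustrate why a dedicated exception type helps, here is a minimal sketch of a retry wrapper that gives up immediately on such errors. The class name `NonRetryableException` and the hand-rolled `retry` helper are assumptions made for this example; the real project may name the class differently and may build its retry policy on a library instead.

```python
import time


class NonRetryableException(Exception):
    """Assumed name: base class for errors that a retry policy must not retry."""


def retry(func, attempts=3, delay=0.0):
    """Call func up to `attempts` times, except on non-retryable errors."""

    def wrapper(*args, **kwargs):
        for attempt in range(1, attempts + 1):
            try:
                return func(*args, **kwargs)
            except NonRetryableException:
                # Retrying cannot change the outcome, so propagate right away.
                raise
            except Exception:
                if attempt == attempts:
                    raise
                time.sleep(delay)

    return wrapper
```

With this shape, a transient `ConnectionError` is retried until the attempts run out, while any subclass of the non-retryable type reaches the caller after a single call.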
    • storage_tests: run test_types on the currently tested storage, not its backend · 8ccfc06b
      Nicolas Dandrimont authored
      The intent behind test_types is to test the signature of wrapped
      storages, to check that they match that of the StorageInterface
      Protocol. However, the way the test was refactored ended up testing the
      storage being *wrapped* by the storage under test, masking a few
      inconsistencies in the way storages are being wrapped.
      
      Unfortunately this breaks the tenacious proxy's test in an inscrutable
      way (even when `functools.wraps`ing the return values of its
      `__getattr__` function).
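The difference between checking the storage under test and checking the storage it wraps can be sketched as below. Everything here is illustrative: the one-method `StorageInterface`, `Backend`, `ForwardingProxy`, `BadProxy` and `check_types` are hypothetical stand-ins, not the project's actual test code, and the tenacious proxy failure mentioned above is not reproduced.

```python
import functools
import inspect
from typing import List, Protocol


class StorageInterface(Protocol):
    def content_get(self, swhid: str) -> bytes: ...


class Backend:
    def content_get(self, swhid: str) -> bytes:
        return b"data"


class ForwardingProxy:
    """Forwards attribute access, preserving signatures with functools.wraps."""

    def __init__(self, storage):
        self.storage = storage

    def __getattr__(self, name):
        target = getattr(self.storage, name)

        @functools.wraps(target)
        def wrapper(*args, **kwargs):
            return target(*args, **kwargs)

        return wrapper


class BadProxy:
    """Overrides a method with a signature that differs from the interface."""

    def __init__(self, storage):
        self.storage = storage

    def content_get(self, swhid, extra):
        return self.storage.content_get(swhid)


def check_types(storage, interface) -> List[str]:
    """Return the names of public methods whose signature differs."""
    mismatches = []
    for name, expected in inspect.getmembers(interface, inspect.isfunction):
        if name.startswith("_"):
            continue
        actual = getattr(storage, name, None)
        if actual is None:
            mismatches.append(name)
            continue
        sig = inspect.signature(expected)
        # Drop `self`: methods fetched from an instance are already bound.
        expected_sig = sig.replace(parameters=list(sig.parameters.values())[1:])
        if inspect.signature(actual) != expected_sig:
            mismatches.append(name)
    return mismatches
```

Running `check_types` on `BadProxy(Backend())` reports `content_get`, whereas running it on `BadProxy(Backend()).storage` reports nothing, which is the kind of masked inconsistency the commit message describes.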
  4. Mar 22, 2024
  5. Mar 21, 2024
  6. Mar 11, 2024
  7. Mar 05, 2024
  8. Feb 13, 2024
  9. Feb 09, 2024
  10. Feb 06, 2024
  11. Feb 02, 2024
  12. Jan 17, 2024
  13. Dec 11, 2023
  14. Dec 05, 2023
  15. Dec 04, 2023
  16. Dec 03, 2023
  17. Nov 29, 2023
  18. Nov 25, 2023
  19. Nov 24, 2023
  20. Nov 16, 2023
  21. Nov 07, 2023
    • Add command line tool to remove old object reference partition tables · 0d5de08d
      Jérémy Bobbio (Lunar) authored
      `swh storage remove-old-object-reference-partitions 2023-09-01` can
      be used to remove all partition tables for weeks before the given date.
      
      By default, this will print the weeks for which tables would be dropped
      and ask for a confirmation. The `--force` option will just proceed
      directly.
      
      The command will simply refuse to drop all partitions, as that is most
      probably an error.
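The control flow described above (list, confirm unless forced, refuse to drop everything) might look roughly like this. The function names, the `confirm` and `drop` callbacks, and the rule that a week qualifies only when it ends on or before the given date are all assumptions for illustration; the real command operates on PostgreSQL partition tables and may select weeks differently.

```python
import datetime
from typing import Callable, List


def partitions_to_drop(
    week_starts: List[datetime.date], before: datetime.date
) -> List[datetime.date]:
    """Select weeks that end on or before `before` (assumed selection rule)."""
    one_week = datetime.timedelta(days=7)
    return [w for w in sorted(week_starts) if w + one_week <= before]


def remove_old_partitions(
    week_starts: List[datetime.date],
    before: datetime.date,
    force: bool = False,
    confirm: Callable[[str], str] = input,
    drop: Callable[[datetime.date], None] = print,
) -> List[datetime.date]:
    selected = partitions_to_drop(week_starts, before)
    if not selected:
        return []
    if len(selected) == len(week_starts):
        # Dropping every partition is most probably a mistake.
        raise ValueError("refusing to drop all partitions")
    if not force:
        for week in selected:
            print(f"would drop partition for week starting {week}")
        if confirm("Drop these partitions? [y/N] ").strip().lower() != "y":
            return []
    for week in selected:
        drop(week)  # stand-in for the actual table removal
    return selected
```

Passing `force=True` skips the listing and the prompt, mirroring the `--force` option.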
    • Add db.object_references_list_partitions() for PostgreSQL · a33a702e
      Jérémy Bobbio (Lunar) authored
      In order to be able to remove older partitions, we want to list those
      that actually exist. `object_references_list_partitions()` will return
      a list of ObjectReferencesPartition, a new dataclass describing
      partitions.
      
      This new method replaces `get_object_references_partition_bounds()`,
      which was only available in tests.
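As a rough idea of what a dataclass describing a weekly partition could hold, consider the sketch below. The commit message only names the class; the fields, the table naming scheme and the `partition_for_week` helper are invented for this example and should not be read as the actual definition.

```python
import datetime
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class ObjectReferencesPartition:
    # All fields below are hypothetical.
    year: int
    week: int
    table_name: str
    start: datetime.date  # first day covered (inclusive)
    end: datetime.date  # first day not covered (exclusive)


def partition_for_week(year: int, week: int) -> ObjectReferencesPartition:
    """Build the description of the partition covering an ISO week."""
    start = datetime.date.fromisocalendar(year, week, 1)  # Monday of that week
    return ObjectReferencesPartition(
        year=year,
        week=week,
        table_name=f"object_references_{year:04d}w{week:02d}",  # assumed scheme
        start=start,
        end=start + datetime.timedelta(days=7),
    )
```

Because the dataclass is ordered on `(year, week, ...)`, a list of such objects sorts chronologically, which is convenient when deciding which partitions are old enough to remove.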