
Regularly scrub all the data stores of SWH

Make sure we have background jobs that regularly or continuously check data integrity in all the SWH data stores:

  • check hashes stored in the main PostgreSQL storage (and replicas?)
  • check objects stored in Kafka
  • check blob hashes for objects stored in all the objstorages (saam, Azure, S3)

For example, while running mirroring tests, I found several blob objects in S3 that appear to be corrupted (but the original copies in the main objstorage are fine).
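
A per-blob check of the kind described above could look roughly like the following. This is only a minimal sketch, not the actual SWH implementation: it assumes each blob is keyed by the hex SHA-1 of its content, and `content_sha1`, `find_corrupted` and the in-memory `store` are hypothetical names used for illustration. A real scrubber would iterate over the objstorage backend instead of a dict.

```python
import hashlib
from typing import Iterable, Iterator, Tuple


def content_sha1(data: bytes) -> str:
    """Return the hex SHA-1 of a blob's raw content."""
    return hashlib.sha1(data).hexdigest()


def find_corrupted(objects: Iterable[Tuple[str, bytes]]) -> Iterator[str]:
    """Yield the IDs of objects whose content no longer matches their ID.

    Assumption: the object ID is the hex SHA-1 of the content.
    """
    for obj_id, data in objects:
        if content_sha1(data) != obj_id:
            yield obj_id


# Hypothetical in-memory stand-in for an objstorage: one intact blob and
# one blob whose bytes changed after it was stored under its original ID.
good = b"hello world\n"
store = {
    content_sha1(good): good,
    content_sha1(b"original bytes"): b"bit-rotted bytes",
}

corrupted = list(find_corrupted(store.items()))
print(corrupted)
```

Objects reported this way could then be re-fetched from a copy that still verifies, such as the main objstorage in the S3 case above.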


Migrated from T3841 (view on Phabricator)
