
add support for reverse lookup from swh:1:ori:... PIDs to origin URLs

Now that we have defined an intrinsic PID schema for origins, and both swh identify and swh-graph support it (the latter as graph roots), we need a way to perform reverse lookups from origin PIDs to origin URLs.

As I understand it, that means:

  • adding a column to the origin table for the origin checksum (either as a PID or, more consistently with the rest of the SQL schema, as a SHA1 checksum)
  • patching the storage functions that create new origins so that they also fill the SHA1 column
  • adding a storage function to perform the SHA1→URL lookup
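
To make the three points above concrete, here is a minimal sketch, not the actual swh-storage code. It assumes the origin checksum is the SHA1 of the UTF-8-encoded origin URL, uses an in-memory SQLite table as a stand-in for the real PostgreSQL origin table, and all function names (origin_sha1, create_schema, origin_add, origin_get_by_sha1, origin_url_from_pid) are hypothetical.

```python
import hashlib
import sqlite3
from typing import Optional

ORIGIN_PID_PREFIX = "swh:1:ori:"


def origin_sha1(url: str) -> bytes:
    """Checksum of an origin (assumption: SHA1 over the UTF-8 bytes of its URL)."""
    return hashlib.sha1(url.encode("utf-8")).digest()


def create_schema(db: sqlite3.Connection) -> None:
    # Stand-in for the real origin table, with the new checksum column.
    db.execute(
        "CREATE TABLE origin ("
        "id INTEGER PRIMARY KEY, url TEXT NOT NULL UNIQUE, sha1 BLOB)"
    )


def origin_add(db: sqlite3.Connection, url: str) -> None:
    """Create an origin and fill its SHA1 column at the same time."""
    db.execute(
        "INSERT OR IGNORE INTO origin (url, sha1) VALUES (?, ?)",
        (url, origin_sha1(url)),
    )


def origin_get_by_sha1(db: sqlite3.Connection, sha1: bytes) -> Optional[str]:
    """SHA1 -> URL lookup; returns None when no origin matches."""
    row = db.execute("SELECT url FROM origin WHERE sha1 = ?", (sha1,)).fetchone()
    return row[0] if row else None


def origin_url_from_pid(db: sqlite3.Connection, pid: str) -> Optional[str]:
    """Resolve a swh:1:ori:... PID to an origin URL."""
    if not pid.startswith(ORIGIN_PID_PREFIX):
        raise ValueError(f"not an origin PID: {pid!r}")
    # bytes.fromhex raises ValueError if the suffix is not valid hex.
    return origin_get_by_sha1(db, bytes.fromhex(pid[len(ORIGIN_PID_PREFIX):]))


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_schema(db)
    origin_add(db, "https://example.org/repo.git")
    pid = ORIGIN_PID_PREFIX + origin_sha1("https://example.org/repo.git").hex()
    print(pid, "->", origin_url_from_pid(db, pid))
```

Storing the raw 20-byte digest rather than the full PID string keeps the column in line with the other checksum columns, and the PID prefix can be added or stripped at the API boundary.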

For the transition we will need to:

  1. initially mark the SHA1 column as NULL-able
  2. deploy in production a storage version that fills the SHA1 for new origins
  3. perform a one-off conversion of all old origins that have NULL SHA1s
  4. mark the SHA1 column as non-NULL-able (and add a B-tree index on it)
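
Steps 1, 3, and 4 of the transition could look roughly like the sketch below. It is only an illustration under stated assumptions: SQLite stands in for the production PostgreSQL database, the checksum is assumed to be the SHA1 of the UTF-8-encoded URL, the origin table is assumed to have id and url columns, and the function and index names are hypothetical.

```python
import hashlib
import sqlite3


def add_nullable_sha1_column(db: sqlite3.Connection) -> None:
    """Step 1: add the checksum column, NULL-able for now."""
    db.execute("ALTER TABLE origin ADD COLUMN sha1 BLOB")


def backfill_origin_sha1(db: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Step 3: fill the SHA1 of old origins in batches; returns rows updated."""
    total = 0
    while True:
        rows = db.execute(
            "SELECT id, url FROM origin WHERE sha1 IS NULL LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return total
        db.executemany(
            "UPDATE origin SET sha1 = ? WHERE id = ?",
            [(hashlib.sha1(url.encode("utf-8")).digest(), oid) for oid, url in rows],
        )
        db.commit()  # commit per batch to keep transactions short
        total += len(rows)


def finalize_origin_sha1(db: sqlite3.Connection) -> None:
    """Step 4: check that no NULL remains, then index the column."""
    remaining = db.execute(
        "SELECT count(*) FROM origin WHERE sha1 IS NULL"
    ).fetchone()[0]
    if remaining:
        raise RuntimeError(f"{remaining} origins still have a NULL sha1")
    # SQLite cannot add NOT NULL to an existing column; in PostgreSQL this
    # step would also run:
    #   ALTER TABLE origin ALTER COLUMN sha1 SET NOT NULL;
    db.execute("CREATE INDEX origin_sha1_idx ON origin (sha1)")
```

Step 2 has to be live before the backfill starts; otherwise origins created during the conversion could still end up with a NULL SHA1 and the final check would fail.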

Migrated from T2045 (view on Phabricator)
