vlorentz changed title from "Allow querying raw_extrinsic_metadata by hash in swh.storage.postgresql" to "Add an index for raw_extrinsic_metadata.id in swh.storage.postgresql"
After a lot of back and forth, and the release of swh.model v2.3.0 and swh.storage v0.26.0, this is now all done and deployed in staging and production.
We carried out the migration as follows:
1. Make swh.model write id fields in the journal.
2. Deploy swh.storage with the new swh.model, so all writes use the new model.
3. Run swh storage backfill on the raw_extrinsic_metadata topic to fill the journal with objects using the new model.
   - Make sure the journal gets compacted so that old versions of each object are removed. This took a combination of setting topic.retention.ms and running the backfill several times before it covered all the real-world data.
4. Run swh storage replay on raw_extrinsic_metadata, using a fork of //swh.storage// that writes objects to a new table with the new schema.
5. Once the replayer has caught up, run some queries to spot-check that all the data was migrated properly.
6. Once validated: stop the workers and the replayer; deploy the new version of //swh.storage// with the new schema; move the new table in place of the old one (taking care of logical replication); then restart the workers.
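The backfill-then-compact steps rely on a compacted Kafka topic eventually keeping only the latest message per key. Below is a minimal Python sketch of that guarantee for a single partition. It assumes the raw_extrinsic_metadata topic uses cleanup.policy=compact and that each backfilled message reuses the key of the object's old message. The compact helper and the message contents are hypothetical. Real Kafka compaction runs asynchronously in the background and never touches the active log segment, so old versions can linger for a while after the backfill.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

# Hypothetical message shape: (key, value) as read from one topic partition.
Message = Tuple[Hashable, dict]


def compact(log: Iterable[Message]) -> List[Message]:
    """Keep only the most recent message for each key, preserving log order.

    This mirrors Kafka's documented guarantee for compacted topics: at least
    the last value for every key is retained within a partition.
    """
    messages = list(log)
    latest: Dict[Hashable, int] = {}
    for offset, (key, _value) in enumerate(messages):
        latest[key] = offset  # later offsets overwrite earlier ones
    return [messages[offset] for offset in sorted(latest.values())]


# Messages written before the model change lack an "id" field; the backfill
# re-sends every object under the same key, now with "id" populated.
log = [
    (b"key-a", {"metadata": b"..."}),
    (b"key-b", {"metadata": b"..."}),
    (b"key-a", {"metadata": b"...", "id": b"hypothetical-id-a"}),
    (b"key-b", {"metadata": b"...", "id": b"hypothetical-id-b"}),
]

compacted = compact(log)
```

Once compaction has fully run, a replayer reading from the start of the topic only sees new-model objects, which is what makes the later replay into a new table safe.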
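The replay, spot-check and swap steps can be sketched end to end with Python's built-in sqlite3 module standing in for PostgreSQL. Everything here is an assumption for illustration: the two-column schema, the _new/_old table names, the compute_id hash (not the one swh.model actually uses) and the replay, spot_check and swap_tables helpers. The sketch shows a unique index on id in the new schema, an idempotent replay, a check based on row counts plus a random sample, and a rename-based swap inside one transaction. Logical replication, which the real swap also had to handle, is not modeled.

```python
import hashlib
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the schema is a simplified,
# hypothetical version of the real one. isolation_level=None means autocommit,
# and multi-statement transactions are opened explicitly with BEGIN.
conn = sqlite3.connect(":memory:", isolation_level=None)

# Objects as they appear in the (compacted) journal: (target, metadata).
journal = [(f"target-{i}", f"metadata-{i}".encode()) for i in range(100)]

# Old schema: no id column.
conn.execute("CREATE TABLE raw_extrinsic_metadata (target TEXT, metadata BLOB)")
conn.executemany("INSERT INTO raw_extrinsic_metadata VALUES (?, ?)", journal)

# New schema: an id column with a unique index on it.
conn.execute(
    "CREATE TABLE raw_extrinsic_metadata_new (id BLOB, target TEXT, metadata BLOB)"
)
conn.execute(
    "CREATE UNIQUE INDEX raw_extrinsic_metadata_new_id_idx "
    "ON raw_extrinsic_metadata_new (id)"
)


def compute_id(target: str, metadata: bytes) -> bytes:
    # Hypothetical id function; swh.model computes the real id differently.
    return hashlib.sha1(target.encode() + b"\x00" + metadata).digest()


def replay(objects) -> None:
    """Stand-in for the forked replayer: write objects to the new table.

    INSERT OR IGNORE (ON CONFLICT DO NOTHING in PostgreSQL) keeps the replay
    idempotent if the journal still contains duplicate messages.
    """
    conn.execute("BEGIN")
    conn.executemany(
        "INSERT OR IGNORE INTO raw_extrinsic_metadata_new VALUES (?, ?, ?)",
        [(compute_id(t, m), t, m) for t, m in objects],
    )
    conn.execute("COMMIT")


def row_count(table: str) -> int:
    # Table names are fixed constants here, so string formatting is safe.
    return conn.execute(f"SELECT count(*) FROM {table}").fetchone()[0]


def spot_check(sample_size: int = 10) -> bool:
    """Compare row counts, then look up a random sample of old rows by id."""
    if row_count("raw_extrinsic_metadata") != row_count("raw_extrinsic_metadata_new"):
        return False
    sample = conn.execute(
        "SELECT target, metadata FROM raw_extrinsic_metadata "
        "ORDER BY random() LIMIT ?",
        (sample_size,),
    ).fetchall()
    for target, metadata in sample:
        found = conn.execute(
            "SELECT target, metadata FROM raw_extrinsic_metadata_new WHERE id = ?",
            (compute_id(target, metadata),),
        ).fetchone()
        if found != (target, metadata):
            return False
    return True


def swap_tables() -> None:
    """Move the new table in place of the old one in a single transaction."""
    conn.execute("BEGIN")
    conn.execute(
        "ALTER TABLE raw_extrinsic_metadata RENAME TO raw_extrinsic_metadata_old"
    )
    conn.execute(
        "ALTER TABLE raw_extrinsic_metadata_new RENAME TO raw_extrinsic_metadata"
    )
    conn.execute("COMMIT")


replay(journal)
if spot_check():
    swap_tables()
```

Keeping the old table under a new name rather than dropping it leaves a cheap rollback path. PostgreSQL also supports transactional DDL, so the same rename-in-one-transaction idea carries over, although the real swap also has to account for the replication setup.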