
Migrate deposit SWHIDs (data) to the new specification


Migrate both "recent" and "old" format deposits [1] to the new specification.

That means the deposit swh_id* fields will be set to:

  • swh_id: directory SWHID (no context)
  • swh_id_context: directory SWHID (with context: origin, visit, anchor, path)

Optionally, the following 2 fields will be kept (for now) and, where they were not set ("old" deposits), realigned to:

  • swh_anchor_id: revision SWHID (no context)
  • swh_anchor_id_context: revision SWHID (context with only origin)
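
As an illustration of the target values listed above, here is a minimal sketch of how the contextual identifiers could be assembled. The helper names, the "/" default for the path, and the all-zero revision id are assumptions made for the example; they are not taken from the migration code.

```python
# Hypothetical helpers: names and the "/" default path are assumptions
# made for illustration, not the actual migration implementation.
def directory_swhid_with_context(dir_id, origin, snapshot_id, revision_id, path="/"):
    """Build a directory SWHID qualified with origin, visit, anchor and path."""
    core = f"swh:1:dir:{dir_id}"
    qualifiers = [
        ("origin", origin),
        ("visit", f"swh:1:snp:{snapshot_id}"),
        ("anchor", f"swh:1:rev:{revision_id}"),
        ("path", path),
    ]
    return core + "".join(f";{key}={value}" for key, value in qualifiers)


def revision_swhid_with_origin(revision_id, origin):
    """Build a revision SWHID whose context only carries the origin."""
    return f"swh:1:rev:{revision_id};origin={origin}"


# Placeholder revision id: the real one is not shown in this description.
PLACEHOLDER_REV = "0" * 40

swh_id_context = directory_swhid_with_context(
    dir_id="c3d06da1a556900e295b64aea1cc5a413b374ae9",
    origin="https://hal.archives-ouvertes.fr/hal-02560320",
    snapshot_id="e5e82d064a9c3df7464223042e0c55d72ccff7f0",
    revision_id=PLACEHOLDER_REV,
)
swh_anchor_id_context = revision_swhid_with_origin(
    PLACEHOLDER_REV, "https://hal.archives-ouvertes.fr/hal-02560320"
)
print(swh_id_context)
print(swh_anchor_id_context)
```

The qualifiers are emitted in the order origin, visit, anchor, path, which matches the order visible in the migrated rows of the test plan.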

Some very "old" deposits are expected not to be migrated, as those values cannot be resolved for them. They will be rescheduled once it becomes possible to do so (after deploying [2]).

[1] "Recent" format means all swh_id* fields are set:

  • swh_id: directory SWHID (no context)
  • swh_id_context: directory SWHID (context with only origin)
  • swh_anchor_id: revision SWHID (no context)
  • swh_anchor_id_context: revision SWHID (context with only origin)

"Old" format means only swh_id is set:

  • swh_id: revision SWHID (no context)
  • swh_id_context: not set
  • swh_anchor_id: not set
  • swh_anchor_id_context: not set

[2] Related to !70 (closed)
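
The distinction between the two formats in [1] can be sketched as a small classifier. This is only an illustration under stated assumptions: the function name and the "unknown" fallback are hypothetical, and the anchor values in the usage example use a placeholder revision id because the real ones are not shown here.

```python
# Hypothetical classifier: the name and the "unknown" fallback are assumptions.
def deposit_format(swh_id, swh_id_context=None, swh_anchor_id=None,
                   swh_anchor_id_context=None):
    """Return "recent", "old" or "unknown" for a deposit's swh_id* fields."""
    others = (swh_id_context, swh_anchor_id, swh_anchor_id_context)
    if not swh_id:
        return "unknown"
    # "recent": swh_id targets a directory and every other field is set.
    if swh_id.startswith("swh:1:dir:") and all(others):
        return "recent"
    # "old": swh_id targets a revision and nothing else is set.
    if swh_id.startswith("swh:1:rev:") and not any(others):
        return "old"
    return "unknown"


# Placeholder revision SWHID: the real anchor is not shown in this description.
PLACEHOLDER_ANCHOR = "swh:1:rev:" + "0" * 40
ORIGIN = "https://hal.archives-ouvertes.fr/hal-02560320"
DIR_SWHID = "swh:1:dir:c3d06da1a556900e295b64aea1cc5a413b374ae9"

recent = deposit_format(
    DIR_SWHID,
    f"{DIR_SWHID};origin={ORIGIN}",
    PLACEHOLDER_ANCHOR,
    f"{PLACEHOLDER_ANCHOR};origin={ORIGIN}",
)
old = deposit_format("swh:1:rev:698771f9ca7ce7605fdcabf27b5851f322ea692c")
print(recent, old)
```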

Related to #2398 (closed)

Test Plan

Restore a dump of the production db into the staging db, then run the migration scripts:

$ SWH_CONFIG_FILENAME=/etc/softwareheritage/deposit/server.yml django-admin migrate --settings=swh.deposit.settings.production --verbosity 3

"Recent" deposits

From

 id  | status |                       swh_id                       |                                                           swh_id_context
-----+--------+----------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------
 608 | done   | swh:1:dir:c3d06da1a556900e295b64aea1cc5a413b374ae9 | swh:1:dir:c3d06da1a556900e295b64aea1cc5a413b374ae9;origin=https://hal.archives-ouvertes.fr/hal-02560320
 607 | done   | swh:1:dir:c3d06da1a556900e295b64aea1cc5a413b374ae9 | swh:1:dir:c3d06da1a556900e295b64aea1cc5a413b374ae9;origin=https://hal.archives-ouvertes.fr/hal-02560320
 606 | done   | swh:1:dir:d85591aeefea2c1c58142e34683fd1923b19c895 | swh:1:dir:d85591aeefea2c1c58142e34683fd1923b19c895;origin=https://doi.org/10.5201/ipol.2018.236
 605 | done   | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea;origin=https://www.softwareheritage.org/check-deposit-2020-05-14T08:28:05.683282
 603 | done   | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea;origin=https://www.softwareheritage.org/check-deposit-2020-05-09T14:09:50.098364
 602 | done   | swh:1:dir:a10423592dd061a00f7d34e4a3c102ba00c3d2ab | swh:1:dir:a10423592dd061a00f7d34e4a3c102ba00c3d2ab;origin=https://doi.org/10.5201/ipol.2018.236
 601 | done   | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea;origin=https://www.softwareheritage.org/check-deposit-2020-05-07T16:05:49.106202
 600 | done   | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea;origin=https://www.softwareheritage.org/check-deposit-2020-05-07T14:09:14.062873
 599 | done   | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea;origin=https://www.softwareheritage.org/check-deposit-2020-05-07T12:52:53.361776
 598 | done   | swh:1:dir:43b7a45a89c836b1baad8849215a51e65a67f80e | swh:1:dir:43b7a45a89c836b1baad8849215a51e65a67f80e;origin=https://hal.archives-ouvertes.fr/hal-02546057
 597 | done   | swh:1:dir:a10423592dd061a00f7d34e4a3c102ba00c3d2ab | swh:1:dir:a10423592dd061a00f7d34e4a3c102ba00c3d2ab;origin=https://doi.org/10.5201/ipol.2018.236
...

to

 id  | status |                       swh_id                       |                                                                                                                        swh_id_context                                    $
-----+--------+----------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------$
 608 | done   | swh:1:dir:c3d06da1a556900e295b64aea1cc5a413b374ae9 | swh:1:dir:c3d06da1a556900e295b64aea1cc5a413b374ae9;origin=https://hal.archives-ouvertes.fr/hal-02560320;visit=swh:1:snp:e5e82d064a9c3df7464223042e0c55d72ccff7f0;anchor=s$
 607 | done   | swh:1:dir:c3d06da1a556900e295b64aea1cc5a413b374ae9 | swh:1:dir:c3d06da1a556900e295b64aea1cc5a413b374ae9;origin=https://hal.archives-ouvertes.fr/hal-02560320;visit=swh:1:snp:3e95ef6e04c381a34cc2f314576bc5644f2c797f;anchor=s$
 606 | done   | swh:1:dir:d85591aeefea2c1c58142e34683fd1923b19c895 | swh:1:dir:d85591aeefea2c1c58142e34683fd1923b19c895;origin=https://doi.org/10.5201/ipol.2018.236;visit=swh:1:snp:07c80b96ab64e714fb69ed725f6b18caf87763ba;anchor=swh:1:rev$
 605 | done   | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea;origin=https://www.softwareheritage.org/check-deposit-2020-05-14T08:28:05.683282;visit=swh:1:snp:4577ab1375d35bab6e316$
 603 | done   | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea;origin=https://www.softwareheritage.org/check-deposit-2020-05-09T14:09:50.098364;visit=swh:1:snp:7e09ab0433291e2c5ea14$
 602 | done   | swh:1:dir:a10423592dd061a00f7d34e4a3c102ba00c3d2ab | swh:1:dir:a10423592dd061a00f7d34e4a3c102ba00c3d2ab;origin=https://doi.org/10.5201/ipol.2018.236;visit=swh:1:snp:994f6ca7c49b1012768c4a5a6470f17f28d0e294;anchor=swh:1:rev$
 601 | done   | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea;origin=https://www.softwareheritage.org/check-deposit-2020-05-07T16:05:49.106202;visit=swh:1:snp:7c6ad0d82051bce0d5ebd$
 600 | done   | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea;origin=https://www.softwareheritage.org/check-deposit-2020-05-07T14:09:14.062873;visit=swh:1:snp:8f2341e340bd883300885$
 599 | done   | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea | swh:1:dir:ef04a768181417fbc5eef4243e2507915f24deea;origin=https://www.softwareheritage.org/check-deposit-2020-05-07T12:52:53.361776;visit=swh:1:snp:ce3d7eb9b08b839171c01$
 598 | done   | swh:1:dir:43b7a45a89c836b1baad8849215a51e65a67f80e | swh:1:dir:43b7a45a89c836b1baad8849215a51e65a67f80e;origin=https://hal.archives-ouvertes.fr/hal-02546057;visit=swh:1:snp:526c43a6e4459f2c72c67031adf931ed6d3bdca7;anchor=s$
 597 | done   | swh:1:dir:a10423592dd061a00f7d34e4a3c102ba00c3d2ab | swh:1:dir:a10423592dd061a00f7d34e4a3c102ba00c3d2ab;origin=https://doi.org/10.5201/ipol.2018.236;visit=swh:1:snp:f7decde6a26a4fa5f0886d71c010ceae827bae92;anchor=swh:1:rev$
 ...

"Old" deposits:

From

 id  | status |                       swh_id                       | swh_id_context
-----+--------+----------------------------------------------------+----------------
 156 | done   | swh:1:rev:698771f9ca7ce7605fdcabf27b5851f322ea692c |
 155 | done   | swh:1:rev:6c9bdcaac6b1b22726752d5d46d04865313d78aa |
 154 | done   | swh:1:rev:8127063816bd4f75e00c2986c0a95fd95d78d876 |
 153 | done   | swh:1:rev:2176d2be0d7e13e89a90447d7d0853af5cbab973 |
 152 | done   | swh:1:rev:e2655c5b28552465a7be15c06f31aa066f64535a |
 151 | done   | swh:1:rev:504a90c58872a8a594886fcf75fc5bfebe151e68 |
 150 | done   | swh:1:rev:c648730299c2a4f4df3c1fe6e527ef3681f9527e |
 149 | done   | swh:1:rev:bb8d72c6646316967ac08a7bc4acc95c50c14d79 |
 147 | done   | swh:1:rev:c8fca417ee9eefe25683042192da67470147be07 |
 146 | done   | swh:1:rev:cccf789c12617208fe188ad3dbc2746d4c884ab7 |

to

 id  | status |                       swh_id                       |                                                                                                          swh_id_context                                                  $
-----+--------+----------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------$
 156 | done   | swh:1:dir:2c01e745c6d89e0eeb9a6ec9590f7ef0750b7002 | swh:1:dir:2c01e745c6d89e0eeb9a6ec9590f7ef0750b7002;origin=https://hal.archives-ouvertes.fr/hal-01831369;visit=swh:1:snp:42f0897956e700a23f5b8aafce43360b8699c0f1;anchor=s$
 155 | done   | swh:1:rev:6c9bdcaac6b1b22726752d5d46d04865313d78aa |
 154 | done   | swh:1:dir:3cb45c908fdad87542c5090e9464fc7f504e1509 | swh:1:dir:3cb45c908fdad87542c5090e9464fc7f504e1509;origin=https://hal.archives-ouvertes.fr/hal-01836266;visit=swh:1:snp:1fbb294bc458809e043bba9073f9d7a8b0b40fc9;anchor=s$
 153 | done   | swh:1:dir:95486800004625900d8365ee968683c7608a3b9d | swh:1:dir:95486800004625900d8365ee968683c7608a3b9d;origin=https://hal.archives-ouvertes.fr/hal-01837101;visit=swh:1:snp:2c2c2e4dcd61753b61739a45669ffbb89104d17a;anchor=s$
 152 | done   | swh:1:dir:f23a9f9d65671aaad715012a1781cb5de6451a3e | swh:1:dir:f23a9f9d65671aaad715012a1781cb5de6451a3e;origin=https://hal.archives-ouvertes.fr/hal-01831364;visit=swh:1:snp:f34ffc4d2fb57ba19a8586b88091fe99714a970a;anchor=s$
 151 | done   | swh:1:dir:f5cba66f896192d98641cf2d801de11dfca9f2a7 | swh:1:dir:f5cba66f896192d98641cf2d801de11dfca9f2a7;origin=https://hal.archives-ouvertes.fr/hal-01836189;visit=swh:1:snp:0e0f73db37ae7d26bf4b29d5599da2bfced30d63;anchor=s$
 150 | done   | swh:1:dir:accc6076ec6104d2125567e4a0c7685fb91f71e7 | swh:1:dir:accc6076ec6104d2125567e4a0c7685fb91f71e7;origin=https://hal.archives-ouvertes.fr/hal-01836169;visit=swh:1:snp:e3640bbfa187762803f29012b02693dd48e0ac88;anchor=s$
 149 | done   | swh:1:rev:bb8d72c6646316967ac08a7bc4acc95c50c14d79 |
 147 | done   | swh:1:dir:f23a9f9d65671aaad715012a1781cb5de6451a3e | swh:1:dir:f23a9f9d65671aaad715012a1781cb5de6451a3e;origin=https://hal.archives-ouvertes.fr/hal-01831364;visit=swh:1:snp:2cce797c46e9d06eb424e2f806a8d7d1fab6bf38;anchor=s$
 146 | done   | swh:1:dir:8a9521f0228d4f79a20d8d20f28523d557f9d2f8 | swh:1:dir:8a9521f0228d4f79a20d8d20f28523d557f9d2f8;origin=https://hal.archives-ouvertes.fr/hal-01831369;visit=swh:1:snp:a0f733bb6f16d6fe65c95194ad76c471fe739e75;anchor=s$

As expected, some deposits are not migrated (see description).

Leftover to reschedule

swh-deposit=> select id, status, swh_id, swh_id_context from deposit where status='done' and swh_id_context is null order by id desc;
 id  | status |                       swh_id                       | swh_id_context
-----+--------+----------------------------------------------------+----------------
 155 | done   | swh:1:rev:6c9bdcaac6b1b22726752d5d46d04865313d78aa |
 149 | done   | swh:1:rev:bb8d72c6646316967ac08a7bc4acc95c50c14d79 |
 127 | done   | swh:1:rev:d76cf5c02ce421f157d3fa624ad134a2efd18193 |
 126 | done   | swh:1:rev:84567c10d3c2383a878a9d8ab6773c1665e08419 |
 125 | done   | swh:1:rev:35ff14e6e4514adae3f950825a4b8b9b9f22767f |
 124 | done   | swh:1:rev:279a8ea930ddd6ef54f10f2f0784ea14a2205215 |
 123 | done   | swh:1:rev:e2a3373925db0f9f4307699e913b9fea9516cf6b |
 116 | done   | swh:1:rev:e2cdf2d3ce49f933ac6d23054183f92eacc4faef |
 114 | done   | swh:1:rev:a5e8b3d276e3a05989d00628e6e611ec7c51252a |
 112 | done   | swh:1:rev:b167902daf3a8a163d947adb62ad4269df471597 |
 110 | done   | swh:1:rev:b260ac6c02987fdf66e7dd1d2e647134cc3bed72 |
 108 | done   | swh:1:rev:d3f9947006289c67be6fd2a5081e466d61a80996 |
  93 | done   | swh:1:rev:734786ca12ca626b3a82a9d2a6fb5f6b968e7bd6 |
  92 | done   | swh:1:rev:4eb1d36683af77b946cdcb5875798d03bd6b775a |
  86 | done   | swh:1:rev:a0b9fc8f8a8bd7e1d29a18b9ac1a7d6e402d31cd |
  85 | done   | swh:1:rev:c29acbad74bb6cc01f9b7d61dd4f01ac747d771d |
  84 | done   | swh:1:rev:afb67a44c5de98891f4f21d04c449cc200b7e739 |
  83 | done   | swh:1:rev:bc3a12c0a288d74eafeb564ba03d8466f5fdb0f2 |
  82 | done   | swh:1:rev:31578998456025e4ebdb396b08dda0a63777b80e |
  81 | done   | swh:1:rev:85a127f023c84b2326c72fa669f0e3ad73a4fb68 |
  80 | done   | swh:1:rev:2a97f21995bab29548d7b41ec75fdd5639dbd325 |
  79 | done   | swh:1:rev:03987f056eaf4596cd20d7b2ee01c9b84ceddfa8 |
  78 | done   | swh:1:rev:7b844a98f54466cb189d27dbc1eede17f39e1c52 |
  77 | done   | swh:1:rev:4cf243a0645d5cd10c689eafd22ab38d685ad2d4 |
(24 rows)
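
The leftover check above can be reproduced on a toy table. The sketch below uses an in-memory SQLite database as a stand-in for the deposit db, which is an assumption made to keep the example self-contained; the table is reduced to the four columns used in the query, and only three sample rows from this description are loaded.

```python
import sqlite3

# In-memory SQLite stands in for the real deposit db (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table deposit ("
    "id integer primary key, status text, swh_id text, swh_id_context text)"
)

rows = [
    # Any non-null context works here; this is deposit 608's pre-migration value.
    (608, "done", "swh:1:dir:c3d06da1a556900e295b64aea1cc5a413b374ae9",
     "swh:1:dir:c3d06da1a556900e295b64aea1cc5a413b374ae9"
     ";origin=https://hal.archives-ouvertes.fr/hal-02560320"),
    # Two of the deposits that could not be migrated: context stays null.
    (155, "done", "swh:1:rev:6c9bdcaac6b1b22726752d5d46d04865313d78aa", None),
    (149, "done", "swh:1:rev:bb8d72c6646316967ac08a7bc4acc95c50c14d79", None),
]
conn.executemany("insert into deposit values (?, ?, ?, ?)", rows)

# Same filter as the query above: done deposits without a contextual SWHID.
leftover_ids = [
    row[0]
    for row in conn.execute(
        "select id from deposit "
        "where status='done' and swh_id_context is null order by id desc"
    )
]
print(leftover_ids)
```

Deposits returned by this filter are the ones to reschedule once their values can be resolved.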

Migrated from D3153 (view on Phabricator)
