
Modify deposit workflow to check duplicated POST requests

After the session with Bruno this week, we saw that multiple requests for the same deposit, queued while waiting for the workers, create a corner case: each request is treated as a different deposit and each is loaded into the archive separately. For example, this deposit, https://archive.softwareheritage.org/browse/origin/https://hal.archives-ouvertes.fr/hal-01862659/visits/, has 9 visits that are not related through the parent history.

post_deposit_workflow Procedure:

  1. if external id exists
     1. if md5 identical
        1. calculate metadata hash
        2. if metadata hash identical
           1. return 400 // we have already received this deposit
     2. mark deposit with last identical external-id as parent-id
        1. if parent is in 'rejected' status, iterate until last non-rejected parent
  2. return 201 with new deposit-id

Comment: when the parent is not in status 'done', the deposit can't be loaded.
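
The procedure above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the deposit server's actual code: the `Deposit` dataclass, the in-memory `deposits` list, the `metadata_hash` helper, and the choice to compare an incoming request only against the most recent deposit with the same external id are all hypothetical. The sketch also does not enforce the 'done' constraint from the comment, since that applies at loading time.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Deposit:
    # Hypothetical record; field names are assumptions of this sketch.
    id: int
    external_id: str
    archive_md5: str
    metadata_hash: str
    status: str = "deposited"
    parent_id: Optional[int] = None


def metadata_hash(metadata: dict) -> str:
    """Hash metadata independently of key order (an assumption of this sketch)."""
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def post_deposit_workflow(deposits, external_id, archive, metadata):
    """Return (http_status, new_deposit_id or None).

    `deposits` is a list of Deposit ordered by creation time.
    """
    md5 = hashlib.md5(archive).hexdigest()
    meta = metadata_hash(metadata)
    same_external_id = [d for d in deposits if d.external_id == external_id]

    parent_id = None
    if same_external_id:
        last = same_external_id[-1]
        # Same archive and same metadata: we have already received this deposit.
        if last.archive_md5 == md5 and last.metadata_hash == meta:
            return 400, None
        # Otherwise link to the last deposit with this external id,
        # walking up the parent chain past any 'rejected' deposits.
        by_id = {d.id: d for d in deposits}
        parent = last
        while parent is not None and parent.status == "rejected":
            parent = by_id.get(parent.parent_id)
        parent_id = parent.id if parent is not None else None

    new = Deposit(len(deposits) + 1, external_id, md5, meta, parent_id=parent_id)
    deposits.append(new)
    return 201, new.id
```

In a real service the duplicate check and the insert would need to happen atomically (for instance inside one database transaction), otherwise two concurrent identical POST requests could both pass the check.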


Migrated from T1171 (view on Phabricator)
