swh.deposit.parsers: Ensure SWHIDs with context are strictly valid
If not, those errors are caught early enough so the deposit users are notified and can fix those wrong inputs.
This further checks the SWHIDs with qualifiers entries. Typically, ensuring the "anchor" and "visit" SWHIDs if provided are valid according to the documentation
-
[1]
-
"visit" is a snapshot core swhid (if any)
-
"anchor" is a core swhid of leaf node (either directory, a revision, a release or a snapshot)
I did not push this in swh.model.identifiers.parse_swhid
but it should ideally go there. As
people already mentioned it was slow already, i thought of keeping it here first.
Related to #2537 (closed)
Test Plan
tox
Migrated from D4490 (view on Phabricator)
Merge request reports
Activity
Build is green
Patch application report for D4490 (id=15928)
Could not rebase; Attempt merge onto a36ea4db...
Updating a36ea4db..f1dfe96d Fast-forward swh/deposit/api/common.py | 169 ++++++++++++- swh/deposit/api/deposit_update.py | 95 +------- swh/deposit/config.py | 5 + swh/deposit/parsers.py | 36 ++- swh/deposit/tests/api/test_deposit_metadata.py | 268 +++++++++++++++++++++ swh/deposit/tests/api/test_parsers.py | 40 +-- swh/deposit/tests/conftest.py | 6 +- .../tests/data/atom/entry-data-with-origin.xml | 13 + .../tests/data/atom/entry-data-with-swhid.xml | 13 + 9 files changed, 532 insertions(+), 113 deletions(-) create mode 100644 swh/deposit/tests/api/test_deposit_metadata.py create mode 100644 swh/deposit/tests/data/atom/entry-data-with-origin.xml create mode 100644 swh/deposit/tests/data/atom/entry-data-with-swhid.xml
Changes applied before test
commit f1dfe96d5fce0368fecdd397e06adc2f2c1d88ed Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Tue Nov 17 11:49:55 2020 +0100 swh.deposit.parsers: Ensure SWHIDs with context are valid If not, those errors are caught early enough so the deposit users are notified and can fix those. This further checks the SWHIDs with qualifiers entries. Typically, ensuring the "anchor" and "visit" SWHIDs if provided are valid according to the documentation - [1] - "visit" is a snapshot core swhid (if any) - "anchor" is a core swhid of leaf node (either directory, a revision, a release or a snapshot) - [1] https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#qualifiers Related to #2537 commit 32104ea235d5c736d7f26712bdf48429a45c0706 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Tue Nov 17 09:55:10 2020 +0100 metadata-only: Finalize deposit with the swhid (if any) and dates Related to #2537 commit c2795d60037e981aee2868851755c019474a992d Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Fri Nov 13 19:15:58 2020 +0100 metadata-only: Add tests on deposit with swhid and origin reference Related to #2537 commit 477a0e0edc1ff01a6bdbd72ea27955a6096ff16a Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Fri Nov 13 18:27:23 2020 +0100 metadata-only: Reuse existing metadata deposit code Related to #2537 commit 77fca0e1d8b452d22e26771cf243e7fa99bab8f2 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Fri Nov 13 14:32:05 2020 +0100 metadata-only: Start deposit It does nothing right now but add the detection of the metadata-only deposit. And fails in case of invalid input. Related to #2537
See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/335/ for more details.
! In !142 (closed), @vlorentz wrote: IMO the
SWHID
class should store the parsed core SWHIDs for its qualifiers, so they don't need to be parsed twice (+ error handling)shouldn't this be a dedicated task?
mentioned in issue swh-model#2788 (closed)
IMO the
SWHID
class should store the parsed core SWHIDs for its qualifiers, so they don't need to be parsed twice (+ error handling) shouldn't this be a dedicated task? trueAs a result, I opened swh-model#2788 (closed). I tried to synthetize the different suggestions from here and the other deposit diff. Feel free to improve the description ;)
Cheers,
Some references in the commit message have been migrated:
- T2537 is now #2537 (closed)
Build is green
Patch application report for D4490 (id=15929)
Rebasing onto a36ea4db...
Current branch diff-target is up to date.
Changes applied before test
commit c1a45162d3313e2cbb19ddc31977deac029d3411 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Tue Nov 17 11:49:55 2020 +0100 swh.deposit.parsers: Ensure SWHIDs with context are valid If not, those errors are caught early enough so the deposit users are notified and can fix those. This further checks the SWHIDs with qualifiers entries. Typically, ensuring the "anchor" and "visit" SWHIDs if provided are valid according to the documentation - [1] - "visit" is a snapshot core swhid (if any) - "anchor" is a core swhid of leaf node (either directory, a revision, a release or a snapshot) - [1] https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#qualifiers Related to #2537
See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/336/ for more details.