codemeta: Fix malformed dates that used to be allowed by the deposit
Closed
requested to merge generated-differential-D8779-source into generated-differential-D8779-target
1 unresolved thread
Build is green
Patch application report for D8779 (id=31645)
Could not rebase; Attempt merge onto a51cbf39...
Updating a51cbf3..3bad414
Fast-forward
 mypy.ini                                          |  3 ++
 requirements.txt                                  |  1 +
 swh/indexer/metadata_dictionary/base.py           | 25 +++++++------
 swh/indexer/metadata_dictionary/cff.py            |  7 +++-
 swh/indexer/metadata_dictionary/codemeta.py       | 32 +++++++++++------
 swh/indexer/metadata_dictionary/github.py         | 13 ++++---
 swh/indexer/metadata_dictionary/maven.py          | 11 +++---
 swh/indexer/metadata_dictionary/npm.py            | 16 ++-------
 swh/indexer/metadata_dictionary/nuget.py          |  4 +--
 swh/indexer/metadata_dictionary/utils.py          | 42 +++++++++++++++++++++-
 .../tests/metadata_dictionary/test_codemeta.py    | 33 +++++++++++++++--
 swh/indexer/tests/metadata_dictionary/test_npm.py | 11 ++++++
 12 files changed, 144 insertions(+), 54 deletions(-)
Changes applied before test
commit 3bad41489c4b5412fbf250d7dd53c3b188956f65
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 26 14:19:26 2022 +0200

    codemeta: Fix malformed dates that used to be allowed by the deposit

commit c0052f8e48fa4cf2c0034c48d2e66355558af62a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 26 14:08:33 2022 +0200

    codemeta: Fix incorrect output namespace for dates and URLs

    Codemeta reexports schema:url, schema:dateCreated, ... with
    `"@type": "@id"` and `"@type": "schema:Date"` so that

    ```
    {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "url": "http://example.org",
        "dateCreated": "2022-10-26"
    }
    ```

    expands to:

    ```
    {
        "http://schema.org/url": {
            "@type": "@id",
            "@value": "http://example.org"
        },
        "http://schema.org/dateCreated": {
            "@type": "http://schema.org/Date",
            "@value": "2022-10-26"
        }
    }
    ```

    However, our translation tried to translate directly to a partially
    expanded form, like this:

    ```
    {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "url": {
            "@value": "http://example.org"
        },
        "dateCreated": {
            "@value": "2022-10-26"
        }
    }
    ```

    which prevents the compaction and expansion algorithms from adding a
    type themselves, causing the document to be compacted to:

    ```
    {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "schema:url": "http://example.org",
        "schema:dateCreated": "2022-10-26"
    }
    ```

    or expanded to:

    ```
    {
        "http://schema.org/url": {
            "@value": "http://example.org"
        },
        "http://schema.org/dateCreated": {
            "@value": "2022-10-26"
        }
    }
    ```

    which are not what we want.

    This commit replaces the hack for `@type` with the right solution that
    works for all properties.

commit a66d5b240ab77e6d8d1b9accf43d571489a3f7f0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Oct 25 16:02:16 2022 +0200

    metadata_dictionary: Systematically check input URLs before adding to graph

    This is hopefully the definitive workaround for the PyLD issue.
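The type-coercion behaviour described in commit c0052f8e can be illustrated with a toy model. This is only a sketch, not PyLD and not the indexer's code: `CONTEXT` is a hand-written, hypothetical excerpt of two term definitions, and `expand_value` and `expand` are illustrative names. It follows the JSON-LD rule that coercion applies to plain strings only, so a value already wrapped in `{"@value": ...}` never receives a type. Two simplifications are assumed: real expansion wraps values in lists, and an `@id`-coerced string becomes a node reference `{"@id": ...}`.

```python
SCHEMA = "http://schema.org/"

# Hypothetical, hand-written excerpt of term definitions (an assumption for
# this sketch; it is not fetched from the real Codemeta context).
CONTEXT = {
    "url": {"@id": SCHEMA + "url", "@type": "@id"},
    "dateCreated": {"@id": SCHEMA + "dateCreated", "@type": SCHEMA + "Date"},
}


def expand_value(term, value):
    """Expand one value; type coercion is applied to plain strings only."""
    definition = CONTEXT[term]
    if isinstance(value, dict) and "@value" in value:
        # Already a value object: coercion is skipped, so no type is added.
        return dict(value)
    if definition["@type"] == "@id":
        # "@id" coercion turns the string into a node reference.
        return {"@id": value}
    return {"@type": definition["@type"], "@value": value}


def expand(document):
    """Expand every known term of a flat document, ignoring "@context"."""
    return {
        CONTEXT[term]["@id"]: expand_value(term, value)
        for term, value in document.items()
        if term != "@context"
    }


# Plain strings: the context adds the expected types.
plain = expand({"url": "http://example.org", "dateCreated": "2022-10-26"})

# Partially expanded input: the values stay untyped.
wrapped = expand(
    {
        "url": {"@value": "http://example.org"},
        "dateCreated": {"@value": "2022-10-26"},
    }
)

print(plain)
print(wrapped)
```

In this model, emitting plain strings and letting the context do the typing is what restores the `schema:Date` and `@id` types, which matches the direction the commit message describes.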
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/521/ for more details.
    # expansion will convert it to a full URI based on
    # "@context": CODEMETA_CONTEXT_URL
    jsonld_child = self.xml_to_jsonld(child)
    if (
        localname
        in (
            "dateCreated",
            "dateModified",
            "datePublished",
        )
        and isinstance(jsonld_child, str)
        and _DATE_RE.match(jsonld_child)
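The excerpt above stops before the body of the `if`, so what follows is only a guess at the kind of fix the commit title suggests. It assumes "malformed" means dates missing zero padding, such as `2022-1-5`. The names `_LOOSE_DATE_RE`, `DATE_PROPERTIES`, and `normalize_date` are hypothetical and do not come from the patch; the real `_DATE_RE` is not shown and may differ.

```python
import datetime
import re

# Assumed pattern: year-month-day where month and day may lack zero padding.
# The actual _DATE_RE used by the patch is not visible in the excerpt.
_LOOSE_DATE_RE = re.compile(r"^(\d{4})-(\d{1,2})-(\d{1,2})$")

DATE_PROPERTIES = ("dateCreated", "dateModified", "datePublished")


def normalize_date(localname, value):
    """Return a zero-padded ISO 8601 date for date properties, else the input."""
    if localname not in DATE_PROPERTIES or not isinstance(value, str):
        return value
    match = _LOOSE_DATE_RE.match(value)
    if match is None:
        return value
    year, month, day = (int(part) for part in match.groups())
    try:
        return datetime.date(year, month, day).isoformat()
    except ValueError:
        # Not a real calendar date (for example month 13): leave it unchanged.
        return value


print(normalize_date("dateCreated", "2022-1-5"))
print(normalize_date("datePublished", "2022-10-26"))
```

Returning the input unchanged when it cannot be parsed keeps the sketch conservative: it never invents a date, and values for other properties pass through untouched.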