codemeta: Fix incorrect output namespace for dates and URLs
Codemeta re-exports schema:url, schema:dateCreated, ... with `"@type": "@id"` for URLs and `"@type": "schema:Date"` for dates, so that
```
{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "url": "http://example.org",
  "dateCreated": "2022-10-26"
}
```
expands to:
```
{
  "http://schema.org/url": {
    "@id": "http://example.org"
  },
  "http://schema.org/dateCreated": {
    "@type": "http://schema.org/Date",
    "@value": "2022-10-26"
  }
}
```
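To show the type coercion at work above, here is a minimal Python sketch of the JSON-LD value-expansion rule. It is a simplified model, not PyLD and not the full JSON-LD 1.1 expansion algorithm. The context is a hand-copied two-term subset of the CodeMeta 2.0 context (an assumption; a real processor would fetch the whole context), and relative IRIs and `@vocab` are ignored. `expand_document` and `expand_term_value` are hypothetical names. Unlike the listings above, the sketch keeps the arrays that real expanded output wraps values in.

```python
from typing import Dict, Tuple

# Hypothetical, hand-copied subset of the CodeMeta 2.0 context (assumption:
# a real JSON-LD processor would fetch and process the full context).
SCHEMA = "http://schema.org/"
CONTEXT: Dict[str, dict] = {
    "url": {"@id": SCHEMA + "url", "@type": "@id"},
    "dateCreated": {"@id": SCHEMA + "dateCreated", "@type": SCHEMA + "Date"},
}


def expand_term_value(term: str, value: str) -> Tuple[str, dict]:
    """Return (property IRI, expanded value) for a plain string value of `term`."""
    definition = CONTEXT[term]
    type_mapping = definition.get("@type")
    if type_mapping == "@id":
        # "@type": "@id" turns the string into a node reference, not a literal
        expanded = {"@id": value}
    elif type_mapping is not None:
        # Any other type mapping yields a typed literal
        expanded = {"@type": type_mapping, "@value": value}
    else:
        expanded = {"@value": value}
    return definition["@id"], expanded


def expand_document(doc: dict) -> dict:
    """Expand a flat compact document whose keys are all terms from CONTEXT."""
    result = {}
    for key, value in doc.items():
        if key == "@context":
            continue  # the context drives expansion but is not part of the output
        iri, expanded = expand_term_value(key, value)
        result[iri] = [expanded]  # expanded JSON-LD wraps property values in arrays
    return result


doc = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "url": "http://example.org",
    "dateCreated": "2022-10-26",
}
print(expand_document(doc))
# {'http://schema.org/url': [{'@id': 'http://example.org'}], 'http://schema.org/dateCreated': [{'@type': 'http://schema.org/Date', '@value': '2022-10-26'}]}
```

The key point is that the type is added by the processor only because the input value is a bare string. The processor then consults the term's type mapping.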
However, our translation produced a partially expanded form directly, like this:
```
{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "url": {
    "@value": "http://example.org"
  },
  "dateCreated": {
    "@value": "2022-10-26"
  }
}
```
which prevents the expansion and compaction algorithms from applying the context's type mappings themselves, so the document is compacted to:
```
{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "schema:url": "http://example.org",
  "schema:dateCreated": "2022-10-26"
}
```
or expanded to:
```
{
  "http://schema.org/url": {
    "@value": "http://example.org"
  },
  "http://schema.org/dateCreated": {
    "@value": "2022-10-26"
  }
}
```
which are not what we want.
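The compaction half of the problem can be modelled with a similarly simplified sketch of term selection. In JSON-LD, a value object that already contains `@value` is not re-typed during expansion, because type mappings apply only to bare values. A term is then chosen during compaction only when the value's type matches the term's type mapping. An untyped `{"@value": ...}` therefore never matches `url` or `dateCreated`, and the compactor falls back to a `schema:` compact IRI. This is not the full inverse-context algorithm of the JSON-LD 1.1 API. The context subset (including the `schema` prefix) is an assumption, and `compact_property` and `value_type` are hypothetical names.

```python
from typing import Optional, Tuple

# Hypothetical, hand-copied subset of the CodeMeta 2.0 context (assumption).
SCHEMA = "http://schema.org/"
TERMS = {  # term -> (property IRI, type mapping)
    "url": (SCHEMA + "url", "@id"),
    "dateCreated": (SCHEMA + "dateCreated", SCHEMA + "Date"),
}
PREFIXES = {"schema": SCHEMA}


def value_type(value: dict) -> Optional[str]:
    """The type mapping a term needs in order to be selected for `value`."""
    if "@id" in value:
        return "@id"
    return value.get("@type")  # None for an untyped value object


def compact_property(iri: str, value: dict) -> Tuple[str, object]:
    """Choose a key and a compacted value for one expanded property value."""
    wanted = value_type(value)
    for term, (term_iri, type_mapping) in TERMS.items():
        if term_iri == iri and type_mapping == wanted:
            # The term's type mapping matches, so the value collapses to a string
            return term, (value["@id"] if wanted == "@id" else value["@value"])
    # No term matches: fall back to a compact IRI such as "schema:url"
    key = iri
    for prefix, base in PREFIXES.items():
        if iri.startswith(base):
            key = prefix + ":" + iri[len(base):]
            break
    if set(value) == {"@value"}:
        return key, value["@value"]  # an untyped literal still compacts to a string
    return key, value


wrong = {"@value": "http://example.org"}  # expansion of the old translation output
right = {"@id": "http://example.org"}     # expansion of a bare "url" string
print(compact_property(SCHEMA + "url", wrong))  # ('schema:url', 'http://example.org')
print(compact_property(SCHEMA + "url", right))  # ('url', 'http://example.org')
```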
This commit replaces the hack for `@type` with the right solution, which works for all properties.
I noticed this issue while writing tests for the diff that will resolve T4654.
Migrated from D8778.
Build is green
Patch application report for D8778 (id=31643)
Could not rebase; Attempt merge onto a51cbf39...
Updating a51cbf3..5b8b04a Fast-forward swh/indexer/metadata_dictionary/base.py | 25 +++++++------ swh/indexer/metadata_dictionary/cff.py | 7 +++- swh/indexer/metadata_dictionary/codemeta.py | 16 ++++----- swh/indexer/metadata_dictionary/github.py | 13 ++++--- swh/indexer/metadata_dictionary/maven.py | 11 +++--- swh/indexer/metadata_dictionary/npm.py | 16 ++------- swh/indexer/metadata_dictionary/nuget.py | 4 +-- swh/indexer/metadata_dictionary/utils.py | 42 +++++++++++++++++++++- .../tests/metadata_dictionary/test_codemeta.py | 4 ++- swh/indexer/tests/metadata_dictionary/test_npm.py | 11 ++++++ 10 files changed, 96 insertions(+), 53 deletions(-)
Changes applied before test
commit 5b8b04ab55eb73fdd506ae6b00b21f51e183d883
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Oct 26 14:08:33 2022 +0200
codemeta: Fix incorrect output namespace for dates and URLs

commit a66d5b240ab77e6d8d1b9accf43d571489a3f7f0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Oct 25 16:02:16 2022 +0200
metadata_dictionary: Systematically check input URLs before adding to graph
This is hopefully the definitive workaround for the PyLD issue.
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/519/ for more details.
Build is green
Patch application report for D8778 (id=31644)
Could not rebase; Attempt merge onto a51cbf39...
Updating a51cbf3..c0052f8 Fast-forward swh/indexer/metadata_dictionary/base.py | 25 +++++++------ swh/indexer/metadata_dictionary/cff.py | 7 +++- swh/indexer/metadata_dictionary/codemeta.py | 16 ++++----- swh/indexer/metadata_dictionary/github.py | 13 ++++--- swh/indexer/metadata_dictionary/maven.py | 11 +++--- swh/indexer/metadata_dictionary/npm.py | 16 ++------- swh/indexer/metadata_dictionary/nuget.py | 4 +-- swh/indexer/metadata_dictionary/utils.py | 42 +++++++++++++++++++++- .../tests/metadata_dictionary/test_codemeta.py | 11 ++++-- swh/indexer/tests/metadata_dictionary/test_npm.py | 11 ++++++ 10 files changed, 102 insertions(+), 54 deletions(-)
Changes applied before test
commit c0052f8e48fa4cf2c0034c48d2e66355558af62a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Wed Oct 26 14:08:33 2022 +0200
codemeta: Fix incorrect output namespace for dates and URLs

commit a66d5b240ab77e6d8d1b9accf43d571489a3f7f0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Oct 25 16:02:16 2022 +0200
metadata_dictionary: Systematically check input URLs before adding to graph
This is hopefully the definitive workaround for the PyLD issue.
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/520/ for more details.
mentioned in merge request !477 (closed)