codemeta: Fix incorrect output namespace for dates and URLs

Codemeta reexports schema:url, schema:dateCreated, ... with "@type": "@id" and "@type": "schema:Date" so that

{
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "url": "http://example.org",
    "dateCreated": "2022-10-26"
}

expands to:

{
    "http://schema.org/url": {
        "@id": "http://example.org"
    },
    "http://schema.org/dateCreated": {
        "@type": "http://schema.org/Date",
        "@value": "2022-10-26"
    }
}

However, our translation tried to translate directly to a partially expanded form, like this:

{
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "url": {
        "@value": "http://example.org"
    },
    "dateCreated": {
        "@value": "2022-10-26"
    }
}

which prevents the compaction and expansion algorithms from adding a type themselves, causing the document to be compacted to:

{
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "schema:url": "http://example.org",
    "schema:dateCreated": "2022-10-26"
}

or expanded to:

{
    "http://schema.org/url": {
        "@value": "http://example.org"
    },
    "http://schema.org/dateCreated": {
        "@value": "2022-10-26"
    }
}

neither of which is what we want.
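
To illustrate why untyped value objects end up under schema:-prefixed keys, here is a toy model of how compaction picks a key. It is only a sketch, neither the real JSON-LD compaction algorithm nor the code of this diff: `TERMS`, `PREFIXES`, `value_type` and `compact_key` are hypothetical names, and the two term definitions are assumed to mirror the type coercions described above for the CodeMeta context.

```python
# Toy model of JSON-LD term selection during compaction.
# Assumption: this is a heavily simplified illustration, not PyLD's behaviour
# in full and not the implementation used in swh-indexer.
SCHEMA = "http://schema.org/"

# Hypothetical excerpt of the context: each term maps to an IRI plus a type coercion.
TERMS = {
    "url": {"@id": SCHEMA + "url", "@type": "@id"},
    "dateCreated": {"@id": SCHEMA + "dateCreated", "@type": SCHEMA + "Date"},
}
PREFIXES = {"schema": SCHEMA}


def value_type(value):
    """Return the type carried by an expanded value, or None if it is untyped."""
    if "@id" in value:
        # IRI references are written {"@id": ...} in expanded form.
        return "@id"
    return value.get("@type")


def compact_key(iri, value):
    """Pick the key under which one expanded property/value pair is compacted."""
    # A term is usable only if both its IRI and its type coercion match the value.
    for term, definition in TERMS.items():
        if definition["@id"] == iri and definition.get("@type") == value_type(value):
            return term
    # Otherwise fall back to a prefixed (compact) IRI, then to the full IRI.
    for prefix, namespace in PREFIXES.items():
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    return iri


typed = {"@type": SCHEMA + "Date", "@value": "2022-10-26"}
untyped = {"@value": "2022-10-26"}
print(compact_key(SCHEMA + "dateCreated", typed))    # dateCreated
print(compact_key(SCHEMA + "dateCreated", untyped))  # schema:dateCreated
```

In this model the untyped value matches no term, because every candidate term requires a type, so the only key left is the prefixed one.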

This commit replaces the hack for @type with the right solution that works for all properties.

I noticed this issue while writing tests for the diff that will resolve T4654.


Migrated from D8778 (view on Phabricator)

  • Author Maintainer

    add missing test

  • Build is green

    Patch application report for D8778 (id=31643)

    Could not rebase; Attempt merge onto a51cbf39...

    Updating a51cbf3..5b8b04a
    Fast-forward
     swh/indexer/metadata_dictionary/base.py            | 25 +++++++------
     swh/indexer/metadata_dictionary/cff.py             |  7 +++-
     swh/indexer/metadata_dictionary/codemeta.py        | 16 ++++-----
     swh/indexer/metadata_dictionary/github.py          | 13 ++++---
     swh/indexer/metadata_dictionary/maven.py           | 11 +++---
     swh/indexer/metadata_dictionary/npm.py             | 16 ++-------
     swh/indexer/metadata_dictionary/nuget.py           |  4 +--
     swh/indexer/metadata_dictionary/utils.py           | 42 +++++++++++++++++++++-
     .../tests/metadata_dictionary/test_codemeta.py     |  4 ++-
     swh/indexer/tests/metadata_dictionary/test_npm.py  | 11 ++++++
     10 files changed, 96 insertions(+), 53 deletions(-)
    Changes applied before test
    commit 5b8b04ab55eb73fdd506ae6b00b21f51e183d883
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Wed Oct 26 14:08:33 2022 +0200
    
        codemeta: Fix incorrect output namespace for dates and URLs
    
    commit a66d5b240ab77e6d8d1b9accf43d571489a3f7f0
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Oct 25 16:02:16 2022 +0200
    
        metadata_dictionary: Systematically check input URLs before adding to graph
        
        This is hopefully the definitive workaround for the PyLD issue.

    See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/519/ for more details.

  • Build is green

    Patch application report for D8778 (id=31644)

    Could not rebase; Attempt merge onto a51cbf39...

    Updating a51cbf3..c0052f8
    Fast-forward
     swh/indexer/metadata_dictionary/base.py            | 25 +++++++------
     swh/indexer/metadata_dictionary/cff.py             |  7 +++-
     swh/indexer/metadata_dictionary/codemeta.py        | 16 ++++-----
     swh/indexer/metadata_dictionary/github.py          | 13 ++++---
     swh/indexer/metadata_dictionary/maven.py           | 11 +++---
     swh/indexer/metadata_dictionary/npm.py             | 16 ++-------
     swh/indexer/metadata_dictionary/nuget.py           |  4 +--
     swh/indexer/metadata_dictionary/utils.py           | 42 +++++++++++++++++++++-
     .../tests/metadata_dictionary/test_codemeta.py     | 11 ++++--
     swh/indexer/tests/metadata_dictionary/test_npm.py  | 11 ++++++
     10 files changed, 102 insertions(+), 54 deletions(-)
    Changes applied before test
    commit c0052f8e48fa4cf2c0034c48d2e66355558af62a
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Wed Oct 26 14:08:33 2022 +0200
    
        codemeta: Fix incorrect output namespace for dates and URLs
    
    commit a66d5b240ab77e6d8d1b9accf43d571489a3f7f0
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Oct 25 16:02:16 2022 +0200
    
        metadata_dictionary: Systematically check input URLs before adding to graph
        
        This is hopefully the definitive workaround for the PyLD issue.

    See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/520/ for more details.

  • Antoine Lambert mentioned in merge request !477 (closed)

  • Merge request was accepted

  • Antoine Lambert approved this merge request

  • Author Maintainer

    Merge request was merged

  • closed
