
github and gitea: Use html_url as @id and clone_url as codeRepository

These fields are a closer semantic match: 'html_url' is the main page of the repository, so it is the best way to identify it, and 'clone_url' is the URL that should be given to 'git clone', as documented by https://schema.org/codeRepository

Additionally, the '@id' property was missing so far, but a future commit will need it to identify fork relationships (node ids are required to represent relationships between documents, as we cannot use blank nodes for that).
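The mapping described above can be sketched as follows. This is a minimal illustration, not the code from this merge request: the function name `repo_to_jsonld`, the `SoftwareSourceCode` node type, the `forge.example` URLs, and the `FORKED_FROM` property IRI are all assumptions made for the example. Only the `html_url`, `clone_url`, and `parent` field names come from the forge API payloads.

```python
from typing import Any, Dict

SCHEMA = "http://schema.org/"
# Placeholder property IRI, used only for this illustration.
FORKED_FROM = "https://example.org/vocab#forkedFrom"


def repo_to_jsonld(repo: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a forge API repository payload into a JSON-LD-style node."""
    node: Dict[str, Any] = {
        # The repository's main page identifies the node.
        "@id": repo["html_url"],
        # Assumed node type for this sketch.
        "@type": SCHEMA + "SoftwareSourceCode",
        # The URL that would be passed to 'git clone'.
        SCHEMA + "codeRepository": {"@id": repo["clone_url"]},
    }
    parent = repo.get("parent")
    if parent:
        # A relationship between two documents has to point at a named
        # node; a blank node could not be matched across documents.
        node[FORKED_FROM] = {"@id": parent["html_url"]}
    return node


upstream = {
    "html_url": "https://forge.example/alice/tool",
    "clone_url": "https://forge.example/alice/tool.git",
}
fork = {
    "html_url": "https://forge.example/bob/tool",
    "clone_url": "https://forge.example/bob/tool.git",
    "parent": upstream,
}

upstream_node = repo_to_jsonld(upstream)
fork_node = repo_to_jsonld(fork)
```

Because both documents use 'html_url' as their '@id', the fork's reference resolves to the same identifier as the upstream node, which is what makes the fork relationship expressible.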

Depends on !391 (closed).


Migrated from D8468 (view on Phabricator)


Closed by Phabricator Migration user on Sep 27, 2022 3:37pm UTC


Activity

  • Build has FAILED

    Patch application report for D8468 (id=30509)

    Could not rebase; Attempt merge onto e25a2f4e...

    Merge made by the 'recursive' strategy.
     swh/indexer/data/Gitea.csv                         |  76 +++++++++++
     swh/indexer/metadata_dictionary/__init__.py        |  15 ++-
     swh/indexer/metadata_dictionary/base.py            | 108 ++++++++++------
     swh/indexer/metadata_dictionary/cff.py             |   5 +-
     swh/indexer/metadata_dictionary/gitea.py           | 124 ++++++++++++++++++
     swh/indexer/metadata_dictionary/github.py          |  17 ++-
     .../tests/metadata_dictionary/test_gitea.py        | 143 +++++++++++++++++++++
     .../tests/metadata_dictionary/test_github.py       |  10 +-
     swh/indexer/tests/test_cli.py                      |   1 +
     9 files changed, 451 insertions(+), 48 deletions(-)
     create mode 100644 swh/indexer/data/Gitea.csv
     create mode 100644 swh/indexer/metadata_dictionary/gitea.py
     create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
    Changes applied before test
    commit 2df4dc14566284c7339f70e284742dddb7363a26
    Merge: e25a2f4 d2e42fa
    Author: Jenkins user <jenkins@localhost>
    Date:   Tue Sep 13 15:08:20 2022 +0000
    
        Merge branch 'diff-target' into HEAD
    
    commit d2e42fae761ca540b8708145563fef712a7c329d
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Sep 13 17:06:08 2022 +0200
    
        github and gitea: Use html_url as @id and clone_url as codeRepository
        
        They are closer semantics as 'html_url' is the main page of the repository,
        so it is the best to identify it; and 'clone_url' is the URL that should
        be given to 'git clone', as documented by https://schema.org/codeRepository
        
        Additionally, that property was missing so far; but a future commit will
        need to use it to identify fork relationships (node ids are required to
        representation relationships between documents as we cannot use blank
        nodes for that)
    
    commit 9f6b75cad02745311f3d29a564b3db2d5b756af7
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Sep 13 13:30:54 2022 +0200
    
        Add Gitea metadata mapping
    
    commit 3a3a348bd86e714ab016a93617bc197010ee145d
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Sep 13 12:34:22 2022 +0200
    
        GitHub: use correct JSON-LD types for URLs and dates

    Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/493/ See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/493/console

  • Author Maintainer

    fix type

  • Build has FAILED

    Patch application report for D8468 (id=30539)

    Could not rebase; Attempt merge onto e25a2f4e...

    Merge made by the 'recursive' strategy.
     swh/indexer/data/Gitea.csv                         |  76 +++++++++++
     swh/indexer/metadata_dictionary/__init__.py        |  15 ++-
     swh/indexer/metadata_dictionary/base.py            | 108 ++++++++++------
     swh/indexer/metadata_dictionary/cff.py             |   5 +-
     swh/indexer/metadata_dictionary/gitea.py           | 124 ++++++++++++++++++
     swh/indexer/metadata_dictionary/github.py          |  17 ++-
     .../tests/metadata_dictionary/test_gitea.py        | 143 +++++++++++++++++++++
     .../tests/metadata_dictionary/test_github.py       |  10 +-
     swh/indexer/tests/test_cli.py                      |   1 +
     9 files changed, 451 insertions(+), 48 deletions(-)
     create mode 100644 swh/indexer/data/Gitea.csv
     create mode 100644 swh/indexer/metadata_dictionary/gitea.py
     create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
    Changes applied before test
    commit 395d0aae0a41c91cd40d472618849c3a6249a8bc
    Merge: e25a2f4 8055d0d
    Author: Jenkins user <jenkins@localhost>
    Date:   Thu Sep 15 06:41:19 2022 +0000
    
        Merge branch 'diff-target' into HEAD
    
    commit 8055d0d6390364cdd6fcb73eaedf7203d7c10185
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Sep 13 17:06:08 2022 +0200
    
        github and gitea: Use html_url as @id and clone_url as codeRepository
        
        They are closer semantics as 'html_url' is the main page of the repository,
        so it is the best to identify it; and 'clone_url' is the URL that should
        be given to 'git clone', as documented by https://schema.org/codeRepository
        
        Additionally, that property was missing so far; but a future commit will
        need to use it to identify fork relationships (node ids are required to
        representation relationships between documents as we cannot use blank
        nodes for that)
    
    commit 9f6b75cad02745311f3d29a564b3db2d5b756af7
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Sep 13 13:30:54 2022 +0200
    
        Add Gitea metadata mapping
    
    commit 3a3a348bd86e714ab016a93617bc197010ee145d
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Sep 13 12:34:22 2022 +0200
    
        GitHub: use correct JSON-LD types for URLs and dates

    Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/494/ See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/494/console

  • Author Maintainer

    fix test

  • Build was aborted

    Patch application report for D8468 (id=30558)

    Could not rebase; Attempt merge onto e25a2f4e...

    Merge made by the 'recursive' strategy.
     swh/indexer/data/Gitea.csv                         |  76 +++++++++++
     swh/indexer/metadata_dictionary/__init__.py        |  15 ++-
     swh/indexer/metadata_dictionary/base.py            | 108 ++++++++++------
     swh/indexer/metadata_dictionary/cff.py             |   5 +-
     swh/indexer/metadata_dictionary/gitea.py           | 124 ++++++++++++++++++
     swh/indexer/metadata_dictionary/github.py          |  19 ++-
     .../tests/metadata_dictionary/test_gitea.py        | 143 +++++++++++++++++++++
     .../tests/metadata_dictionary/test_github.py       |  10 +-
     swh/indexer/tests/test_cli.py                      |   1 +
     swh/indexer/tests/test_metadata.py                 |   3 +-
     10 files changed, 455 insertions(+), 49 deletions(-)
     create mode 100644 swh/indexer/data/Gitea.csv
     create mode 100644 swh/indexer/metadata_dictionary/gitea.py
     create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
    Changes applied before test
    commit a4c38aebe66ccba13f348682a599ae6a29deb705
    Merge: e25a2f4 c518541
    Author: Jenkins user <jenkins@localhost>
    Date:   Thu Sep 15 12:02:55 2022 +0000
    
        Merge branch 'diff-target' into HEAD
    
    commit c518541b21bfbf1dd6415a369777a57ef3430c7b
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Sep 13 17:06:08 2022 +0200
    
        github and gitea: Use html_url as @id and clone_url as codeRepository
        
        They are closer semantics as 'html_url' is the main page of the repository,
        so it is the best to identify it; and 'clone_url' is the URL that should
        be given to 'git clone', as documented by https://schema.org/codeRepository
        
        Additionally, that property was missing so far; but a future commit will
        need to use it to identify fork relationships (node ids are required to
        representation relationships between documents as we cannot use blank
        nodes for that)
    
    commit 9f6b75cad02745311f3d29a564b3db2d5b756af7
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Sep 13 13:30:54 2022 +0200
    
        Add Gitea metadata mapping
    
    commit 3a3a348bd86e714ab016a93617bc197010ee145d
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Sep 13 12:34:22 2022 +0200
    
        GitHub: use correct JSON-LD types for URLs and dates

    Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/496/ See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/496/console

  • Build has FAILED

    Patch application report for D8468 (id=30558)

    Could not rebase; Attempt merge onto e25a2f4e...

    Merge made by the 'recursive' strategy.
     swh/indexer/data/Gitea.csv                         |  76 +++++++++++
     swh/indexer/metadata_dictionary/__init__.py        |  15 ++-
     swh/indexer/metadata_dictionary/base.py            | 108 ++++++++++------
     swh/indexer/metadata_dictionary/cff.py             |   5 +-
     swh/indexer/metadata_dictionary/gitea.py           | 124 ++++++++++++++++++
     swh/indexer/metadata_dictionary/github.py          |  19 ++-
     .../tests/metadata_dictionary/test_gitea.py        | 143 +++++++++++++++++++++
     .../tests/metadata_dictionary/test_github.py       |  10 +-
     swh/indexer/tests/test_cli.py                      |   1 +
     swh/indexer/tests/test_metadata.py                 |   3 +-
     10 files changed, 455 insertions(+), 49 deletions(-)
     create mode 100644 swh/indexer/data/Gitea.csv
     create mode 100644 swh/indexer/metadata_dictionary/gitea.py
     create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
    Changes applied before test
    commit e8cce65695857490924764b1b04d050969614b57
    Merge: e25a2f4 c518541
    Author: Jenkins user <jenkins@localhost>
    Date:   Fri Sep 16 12:51:51 2022 +0000
    
        Merge branch 'diff-target' into HEAD
    
    commit c518541b21bfbf1dd6415a369777a57ef3430c7b
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Sep 13 17:06:08 2022 +0200
    
        github and gitea: Use html_url as @id and clone_url as codeRepository
        
        They are closer semantics as 'html_url' is the main page of the repository,
        so it is the best to identify it; and 'clone_url' is the URL that should
        be given to 'git clone', as documented by https://schema.org/codeRepository
        
        Additionally, that property was missing so far; but a future commit will
        need to use it to identify fork relationships (node ids are required to
        representation relationships between documents as we cannot use blank
        nodes for that)
    
    commit 9f6b75cad02745311f3d29a564b3db2d5b756af7
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Sep 13 13:30:54 2022 +0200
    
        Add Gitea metadata mapping
    
    commit 3a3a348bd86e714ab016a93617bc197010ee145d
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Sep 13 12:34:22 2022 +0200
    
        GitHub: use correct JSON-LD types for URLs and dates

    Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/498/ See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/498/console

  • Build has FAILED

    Patch application report for D8468 (id=30558)

    Could not rebase; Attempt merge onto e25a2f4e...

    Merge made by the 'recursive' strategy.
     swh/indexer/data/Gitea.csv                         |  76 +++++++++++
     swh/indexer/metadata_dictionary/__init__.py        |  15 ++-
     swh/indexer/metadata_dictionary/base.py            | 108 ++++++++++------
     swh/indexer/metadata_dictionary/cff.py             |   5 +-
     swh/indexer/metadata_dictionary/gitea.py           | 124 ++++++++++++++++++
     swh/indexer/metadata_dictionary/github.py          |  19 ++-
     .../tests/metadata_dictionary/test_gitea.py        | 143 +++++++++++++++++++++
     .../tests/metadata_dictionary/test_github.py       |  10 +-
     swh/indexer/tests/test_cli.py                      |   1 +
     swh/indexer/tests/test_metadata.py                 |   3 +-
     10 files changed, 455 insertions(+), 49 deletions(-)
     create mode 100644 swh/indexer/data/Gitea.csv
     create mode 100644 swh/indexer/metadata_dictionary/gitea.py
     create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
    Changes applied before test
    commit 1c3e4c305bf3714ccf9457d9174a20affcfbe638
    Merge: e25a2f4 c518541
    Author: Jenkins user <jenkins@localhost>
    Date:   Fri Sep 16 12:52:17 2022 +0000
    
        Merge branch 'diff-target' into HEAD
    
    commit c518541b21bfbf1dd6415a369777a57ef3430c7b
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Sep 13 17:06:08 2022 +0200
    
        github and gitea: Use html_url as @id and clone_url as codeRepository
        
        They are closer semantics as 'html_url' is the main page of the repository,
        so it is the best to identify it; and 'clone_url' is the URL that should
        be given to 'git clone', as documented by https://schema.org/codeRepository
        
        Additionally, that property was missing so far; but a future commit will
        need to use it to identify fork relationships (node ids are required to
        representation relationships between documents as we cannot use blank
        nodes for that)
    
    commit 9f6b75cad02745311f3d29a564b3db2d5b756af7
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Sep 13 13:30:54 2022 +0200
    
        Add Gitea metadata mapping
    
    commit 3a3a348bd86e714ab016a93617bc197010ee145d
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Sep 13 12:34:22 2022 +0200
    
        GitHub: use correct JSON-LD types for URLs and dates

    Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/499/ See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/499/console

  • Author Maintainer

    fix test

  • Build is green

    Patch application report for D8468 (id=30598)

    Could not rebase; Attempt merge onto e25a2f4e...

    Merge made by the 'recursive' strategy.
     swh/indexer/data/Gitea.csv                         |  76 +++++++++++
     swh/indexer/metadata_dictionary/__init__.py        |  15 ++-
     swh/indexer/metadata_dictionary/base.py            | 108 ++++++++++------
     swh/indexer/metadata_dictionary/cff.py             |   5 +-
     swh/indexer/metadata_dictionary/gitea.py           | 124 ++++++++++++++++++
     swh/indexer/metadata_dictionary/github.py          |  19 ++-
     .../tests/metadata_dictionary/test_gitea.py        | 143 +++++++++++++++++++++
     .../tests/metadata_dictionary/test_github.py       |  10 +-
     swh/indexer/tests/test_cli.py                      |   2 +
     swh/indexer/tests/test_metadata.py                 |   3 +-
     10 files changed, 456 insertions(+), 49 deletions(-)
     create mode 100644 swh/indexer/data/Gitea.csv
     create mode 100644 swh/indexer/metadata_dictionary/gitea.py
     create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
    Changes applied before test
    commit 26653bae18c047acfb4e7219d705d53680bb5652
    Merge: e25a2f4 9d7a6a4
    Author: Jenkins user <jenkins@localhost>
    Date:   Sun Sep 18 12:18:01 2022 +0000
    
        Merge branch 'diff-target' into HEAD
    
    commit 9d7a6a47e157d443849dc749765ecb010ba856c2
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Sep 13 17:06:08 2022 +0200
    
        github and gitea: Use html_url as @id and clone_url as codeRepository
        
        They are closer semantics as 'html_url' is the main page of the repository,
        so it is the best to identify it; and 'clone_url' is the URL that should
        be given to 'git clone', as documented by https://schema.org/codeRepository
        
        Additionally, that property was missing so far; but a future commit will
        need to use it to identify fork relationships (node ids are required to
        representation relationships between documents as we cannot use blank
        nodes for that)
    
    commit 9f6b75cad02745311f3d29a564b3db2d5b756af7
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Sep 13 13:30:54 2022 +0200
    
        Add Gitea metadata mapping
    
    commit 3a3a348bd86e714ab016a93617bc197010ee145d
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Sep 13 12:34:22 2022 +0200
    
        GitHub: use correct JSON-LD types for URLs and dates

    See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/500/ for more details.

  • This seems reasonable in terms of "ontology design".

    I wonder whether the inconsistency between origin URLs (which are sometimes the HTML URL and sometimes the clone URL, depending on the lister) and object ids in metadata will end up biting us eventually. Since extrinsic metadata is always attached to an origin URL, it should be fine, but it should probably be documented for future users of the metadata.

  • Merge request was accepted

  • Nicolas Dandrimont approved this merge request

  • Author Maintainer

    rebase

  • Build was aborted

    Patch application report for D8468 (id=30858)

    Could not rebase; Attempt merge onto e25a2f4e...

    Updating e25a2f4..ac0e263
    Fast-forward
     swh/indexer/data/Gitea.csv                         |  76 +++++++++++
     swh/indexer/metadata_dictionary/__init__.py        |  15 ++-
     swh/indexer/metadata_dictionary/base.py            | 108 ++++++++++------
     swh/indexer/metadata_dictionary/cff.py             |   5 +-
     swh/indexer/metadata_dictionary/gitea.py           | 124 ++++++++++++++++++
     swh/indexer/metadata_dictionary/github.py          |  19 ++-
     .../tests/metadata_dictionary/test_gitea.py        | 143 +++++++++++++++++++++
     .../tests/metadata_dictionary/test_github.py       |  10 +-
     swh/indexer/tests/test_cli.py                      |   2 +
     swh/indexer/tests/test_metadata.py                 |   3 +-
     10 files changed, 456 insertions(+), 49 deletions(-)
     create mode 100644 swh/indexer/data/Gitea.csv
     create mode 100644 swh/indexer/metadata_dictionary/gitea.py
     create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
    Changes applied before test
    commit ac0e263bbfc17ee2905b97bbbbbb4929419170cd
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Sep 13 17:06:08 2022 +0200
    
        github and gitea: Use html_url as @id and clone_url as codeRepository
        
        They are closer semantics as 'html_url' is the main page of the repository,
        so it is the best to identify it; and 'clone_url' is the URL that should
        be given to 'git clone', as documented by https://schema.org/codeRepository
        
        Additionally, that property was missing so far; but a future commit will
        need to use it to identify fork relationships (node ids are required to
        representation relationships between documents as we cannot use blank
        nodes for that)
    
    commit cb435e59ca91ac7b71cff18e5e6b3885e5be9ac1
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Sep 13 13:30:54 2022 +0200
    
        Add Gitea metadata mapping
    
    commit 20becf4a90fa6b626e972bba3d57db46604cb7b2
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Sep 13 12:34:22 2022 +0200
    
        GitHub: use correct JSON-LD types for URLs and dates

    Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/507/ See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/507/console

  • Author Maintainer

    Merge request was merged

  • closed
