github and gitea: Use html_url as @id and clone_url as codeRepository
These are semantically closer: 'html_url' is the main page of the repository, so it is the best URL to identify it, and 'clone_url' is the URL that should be given to 'git clone', as documented by https://schema.org/codeRepository
Additionally, @id was not set so far, but a future commit will need it to identify fork relationships (node ids are required to represent relationships between documents, as blank nodes cannot be used for that).
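The mapping described above can be sketched in Python. This is a minimal, hypothetical illustration, not the code in swh/indexer/metadata_dictionary/github.py or gitea.py: the function name `translate_repo` and the `https://schema.org/` context are assumptions. It takes a decoded repository object from the GitHub or Gitea REST API, uses `html_url` as the node's `@id`, and uses `clone_url` as `codeRepository`, written as an explicit `{"@id": ...}` so that JSON-LD treats it as a URL rather than a plain string.

```python
from typing import Any, Dict

# Assumed context for this sketch; the real indexer may use a different one.
SCHEMA_CONTEXT = "https://schema.org/"


def translate_repo(api_repo: Dict[str, Any]) -> Dict[str, Any]:
    """Map a GitHub/Gitea-style repository API object to a JSON-LD dict.

    Hypothetical helper for illustration only.
    """
    doc: Dict[str, Any] = {
        "@context": SCHEMA_CONTEXT,
        "@type": "SoftwareSourceCode",
    }
    # html_url is the repository's main web page: use it as the node id,
    # so other documents can point at this node (no blank node).
    if api_repo.get("html_url"):
        doc["@id"] = api_repo["html_url"]
    # clone_url is what one passes to `git clone`: use it as codeRepository,
    # typed as a URL via an explicit {"@id": ...} object.
    if api_repo.get("clone_url"):
        doc["codeRepository"] = {"@id": api_repo["clone_url"]}
    if api_repo.get("name"):
        doc["name"] = api_repo["name"]
    return doc


if __name__ == "__main__":
    # Hypothetical repository object, trimmed to the fields used above.
    repo = {
        "name": "repo",
        "html_url": "https://github.com/example/repo",
        "clone_url": "https://github.com/example/repo.git",
    }
    print(translate_repo(repo))
```

Because the resulting node has a stable `@id` instead of being a blank node, another metadata document (for example, a fork's) could refer to it by that URL.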
Depends on !391 (closed).
Migrated from D8468 (view on Phabricator)
Activity
Build has FAILED
Patch application report for D8468 (id=30509)
Could not rebase; Attempt merge onto e25a2f4e...
Merge made by the 'recursive' strategy.
 swh/indexer/data/Gitea.csv                   |  76 +++++++++++
 swh/indexer/metadata_dictionary/__init__.py  |  15 ++-
 swh/indexer/metadata_dictionary/base.py      | 108 ++++++++++------
 swh/indexer/metadata_dictionary/cff.py       |   5 +-
 swh/indexer/metadata_dictionary/gitea.py     | 124 ++++++++++++++++++
 swh/indexer/metadata_dictionary/github.py    |  17 ++-
 .../tests/metadata_dictionary/test_gitea.py  | 143 +++++++++++++++++++++
 .../tests/metadata_dictionary/test_github.py |  10 +-
 swh/indexer/tests/test_cli.py                |   1 +
 9 files changed, 451 insertions(+), 48 deletions(-)
 create mode 100644 swh/indexer/data/Gitea.csv
 create mode 100644 swh/indexer/metadata_dictionary/gitea.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
Changes applied before test
commit 2df4dc14566284c7339f70e284742dddb7363a26
Merge: e25a2f4 d2e42fa
Author: Jenkins user <jenkins@localhost>
Date:   Tue Sep 13 15:08:20 2022 +0000

    Merge branch 'diff-target' into HEAD

commit d2e42fae761ca540b8708145563fef712a7c329d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 17:06:08 2022 +0200

    github and gitea: Use html_url as @id and clone_url as codeRepository

    They are closer semantics as 'html_url' is the main page of the repository, so it is the best to identify it; and 'clone_url' is the URL that should be given to 'git clone', as documented by https://schema.org/codeRepository

    Additionally, that property was missing so far; but a future commit will need to use it to identify fork relationships (node ids are required to representation relationships between documents as we cannot use blank nodes for that)

commit 9f6b75cad02745311f3d29a564b3db2d5b756af7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 13:30:54 2022 +0200

    Add Gitea metadata mapping

commit 3a3a348bd86e714ab016a93617bc197010ee145d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 12:34:22 2022 +0200

    GitHub: use correct JSON-LD types for URLs and dates
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/493/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/493/console
Build has FAILED
Patch application report for D8468 (id=30539)
Could not rebase; Attempt merge onto e25a2f4e...
Merge made by the 'recursive' strategy.
 swh/indexer/data/Gitea.csv                   |  76 +++++++++++
 swh/indexer/metadata_dictionary/__init__.py  |  15 ++-
 swh/indexer/metadata_dictionary/base.py      | 108 ++++++++++------
 swh/indexer/metadata_dictionary/cff.py       |   5 +-
 swh/indexer/metadata_dictionary/gitea.py     | 124 ++++++++++++++++++
 swh/indexer/metadata_dictionary/github.py    |  17 ++-
 .../tests/metadata_dictionary/test_gitea.py  | 143 +++++++++++++++++++++
 .../tests/metadata_dictionary/test_github.py |  10 +-
 swh/indexer/tests/test_cli.py                |   1 +
 9 files changed, 451 insertions(+), 48 deletions(-)
 create mode 100644 swh/indexer/data/Gitea.csv
 create mode 100644 swh/indexer/metadata_dictionary/gitea.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
Changes applied before test
commit 395d0aae0a41c91cd40d472618849c3a6249a8bc
Merge: e25a2f4 8055d0d
Author: Jenkins user <jenkins@localhost>
Date:   Thu Sep 15 06:41:19 2022 +0000

    Merge branch 'diff-target' into HEAD

commit 8055d0d6390364cdd6fcb73eaedf7203d7c10185
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 17:06:08 2022 +0200

    github and gitea: Use html_url as @id and clone_url as codeRepository

    They are closer semantics as 'html_url' is the main page of the repository, so it is the best to identify it; and 'clone_url' is the URL that should be given to 'git clone', as documented by https://schema.org/codeRepository

    Additionally, that property was missing so far; but a future commit will need to use it to identify fork relationships (node ids are required to representation relationships between documents as we cannot use blank nodes for that)

commit 9f6b75cad02745311f3d29a564b3db2d5b756af7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 13:30:54 2022 +0200

    Add Gitea metadata mapping

commit 3a3a348bd86e714ab016a93617bc197010ee145d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 12:34:22 2022 +0200

    GitHub: use correct JSON-LD types for URLs and dates
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/494/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/494/console
Build was aborted
Patch application report for D8468 (id=30558)
Could not rebase; Attempt merge onto e25a2f4e...
Merge made by the 'recursive' strategy.
 swh/indexer/data/Gitea.csv                   |  76 +++++++++++
 swh/indexer/metadata_dictionary/__init__.py  |  15 ++-
 swh/indexer/metadata_dictionary/base.py      | 108 ++++++++++------
 swh/indexer/metadata_dictionary/cff.py       |   5 +-
 swh/indexer/metadata_dictionary/gitea.py     | 124 ++++++++++++++++++
 swh/indexer/metadata_dictionary/github.py    |  19 ++-
 .../tests/metadata_dictionary/test_gitea.py  | 143 +++++++++++++++++++++
 .../tests/metadata_dictionary/test_github.py |  10 +-
 swh/indexer/tests/test_cli.py                |   1 +
 swh/indexer/tests/test_metadata.py           |   3 +-
 10 files changed, 455 insertions(+), 49 deletions(-)
 create mode 100644 swh/indexer/data/Gitea.csv
 create mode 100644 swh/indexer/metadata_dictionary/gitea.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
Changes applied before test
commit a4c38aebe66ccba13f348682a599ae6a29deb705
Merge: e25a2f4 c518541
Author: Jenkins user <jenkins@localhost>
Date:   Thu Sep 15 12:02:55 2022 +0000

    Merge branch 'diff-target' into HEAD

commit c518541b21bfbf1dd6415a369777a57ef3430c7b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 17:06:08 2022 +0200

    github and gitea: Use html_url as @id and clone_url as codeRepository

    They are closer semantics as 'html_url' is the main page of the repository, so it is the best to identify it; and 'clone_url' is the URL that should be given to 'git clone', as documented by https://schema.org/codeRepository

    Additionally, that property was missing so far; but a future commit will need to use it to identify fork relationships (node ids are required to representation relationships between documents as we cannot use blank nodes for that)

commit 9f6b75cad02745311f3d29a564b3db2d5b756af7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 13:30:54 2022 +0200

    Add Gitea metadata mapping

commit 3a3a348bd86e714ab016a93617bc197010ee145d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 12:34:22 2022 +0200

    GitHub: use correct JSON-LD types for URLs and dates
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/496/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/496/console
Build has FAILED
Patch application report for D8468 (id=30558)
Could not rebase; Attempt merge onto e25a2f4e...
Merge made by the 'recursive' strategy.
 swh/indexer/data/Gitea.csv                   |  76 +++++++++++
 swh/indexer/metadata_dictionary/__init__.py  |  15 ++-
 swh/indexer/metadata_dictionary/base.py      | 108 ++++++++++------
 swh/indexer/metadata_dictionary/cff.py       |   5 +-
 swh/indexer/metadata_dictionary/gitea.py     | 124 ++++++++++++++++++
 swh/indexer/metadata_dictionary/github.py    |  19 ++-
 .../tests/metadata_dictionary/test_gitea.py  | 143 +++++++++++++++++++++
 .../tests/metadata_dictionary/test_github.py |  10 +-
 swh/indexer/tests/test_cli.py                |   1 +
 swh/indexer/tests/test_metadata.py           |   3 +-
 10 files changed, 455 insertions(+), 49 deletions(-)
 create mode 100644 swh/indexer/data/Gitea.csv
 create mode 100644 swh/indexer/metadata_dictionary/gitea.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
Changes applied before test
commit e8cce65695857490924764b1b04d050969614b57
Merge: e25a2f4 c518541
Author: Jenkins user <jenkins@localhost>
Date:   Fri Sep 16 12:51:51 2022 +0000

    Merge branch 'diff-target' into HEAD

commit c518541b21bfbf1dd6415a369777a57ef3430c7b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 17:06:08 2022 +0200

    github and gitea: Use html_url as @id and clone_url as codeRepository

    They are closer semantics as 'html_url' is the main page of the repository, so it is the best to identify it; and 'clone_url' is the URL that should be given to 'git clone', as documented by https://schema.org/codeRepository

    Additionally, that property was missing so far; but a future commit will need to use it to identify fork relationships (node ids are required to representation relationships between documents as we cannot use blank nodes for that)

commit 9f6b75cad02745311f3d29a564b3db2d5b756af7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 13:30:54 2022 +0200

    Add Gitea metadata mapping

commit 3a3a348bd86e714ab016a93617bc197010ee145d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 12:34:22 2022 +0200

    GitHub: use correct JSON-LD types for URLs and dates
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/498/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/498/console
Build has FAILED
Patch application report for D8468 (id=30558)
Could not rebase; Attempt merge onto e25a2f4e...
Merge made by the 'recursive' strategy.
 swh/indexer/data/Gitea.csv                   |  76 +++++++++++
 swh/indexer/metadata_dictionary/__init__.py  |  15 ++-
 swh/indexer/metadata_dictionary/base.py      | 108 ++++++++++------
 swh/indexer/metadata_dictionary/cff.py       |   5 +-
 swh/indexer/metadata_dictionary/gitea.py     | 124 ++++++++++++++++++
 swh/indexer/metadata_dictionary/github.py    |  19 ++-
 .../tests/metadata_dictionary/test_gitea.py  | 143 +++++++++++++++++++++
 .../tests/metadata_dictionary/test_github.py |  10 +-
 swh/indexer/tests/test_cli.py                |   1 +
 swh/indexer/tests/test_metadata.py           |   3 +-
 10 files changed, 455 insertions(+), 49 deletions(-)
 create mode 100644 swh/indexer/data/Gitea.csv
 create mode 100644 swh/indexer/metadata_dictionary/gitea.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
Changes applied before test
commit 1c3e4c305bf3714ccf9457d9174a20affcfbe638
Merge: e25a2f4 c518541
Author: Jenkins user <jenkins@localhost>
Date:   Fri Sep 16 12:52:17 2022 +0000

    Merge branch 'diff-target' into HEAD

commit c518541b21bfbf1dd6415a369777a57ef3430c7b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 17:06:08 2022 +0200

    github and gitea: Use html_url as @id and clone_url as codeRepository

    They are closer semantics as 'html_url' is the main page of the repository, so it is the best to identify it; and 'clone_url' is the URL that should be given to 'git clone', as documented by https://schema.org/codeRepository

    Additionally, that property was missing so far; but a future commit will need to use it to identify fork relationships (node ids are required to representation relationships between documents as we cannot use blank nodes for that)

commit 9f6b75cad02745311f3d29a564b3db2d5b756af7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 13:30:54 2022 +0200

    Add Gitea metadata mapping

commit 3a3a348bd86e714ab016a93617bc197010ee145d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 12:34:22 2022 +0200

    GitHub: use correct JSON-LD types for URLs and dates
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/499/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/499/console
Build is green
Patch application report for D8468 (id=30598)
Could not rebase; Attempt merge onto e25a2f4e...
Merge made by the 'recursive' strategy.
 swh/indexer/data/Gitea.csv                   |  76 +++++++++++
 swh/indexer/metadata_dictionary/__init__.py  |  15 ++-
 swh/indexer/metadata_dictionary/base.py      | 108 ++++++++++------
 swh/indexer/metadata_dictionary/cff.py       |   5 +-
 swh/indexer/metadata_dictionary/gitea.py     | 124 ++++++++++++++++++
 swh/indexer/metadata_dictionary/github.py    |  19 ++-
 .../tests/metadata_dictionary/test_gitea.py  | 143 +++++++++++++++++++++
 .../tests/metadata_dictionary/test_github.py |  10 +-
 swh/indexer/tests/test_cli.py                |   2 +
 swh/indexer/tests/test_metadata.py           |   3 +-
 10 files changed, 456 insertions(+), 49 deletions(-)
 create mode 100644 swh/indexer/data/Gitea.csv
 create mode 100644 swh/indexer/metadata_dictionary/gitea.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
Changes applied before test
commit 26653bae18c047acfb4e7219d705d53680bb5652
Merge: e25a2f4 9d7a6a4
Author: Jenkins user <jenkins@localhost>
Date:   Sun Sep 18 12:18:01 2022 +0000

    Merge branch 'diff-target' into HEAD

commit 9d7a6a47e157d443849dc749765ecb010ba856c2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 17:06:08 2022 +0200

    github and gitea: Use html_url as @id and clone_url as codeRepository

    They are closer semantics as 'html_url' is the main page of the repository, so it is the best to identify it; and 'clone_url' is the URL that should be given to 'git clone', as documented by https://schema.org/codeRepository

    Additionally, that property was missing so far; but a future commit will need to use it to identify fork relationships (node ids are required to representation relationships between documents as we cannot use blank nodes for that)

commit 9f6b75cad02745311f3d29a564b3db2d5b756af7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 13:30:54 2022 +0200

    Add Gitea metadata mapping

commit 3a3a348bd86e714ab016a93617bc197010ee145d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 12:34:22 2022 +0200

    GitHub: use correct JSON-LD types for URLs and dates
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/500/ for more details.
This seems reasonable in terms of "ontology design".
I wonder if the inconsistency between origin URLs (which are sometimes the HTML URL and sometimes the clone URL, depending on the lister) and object ids in the metadata will end up biting us eventually. Since extrinsic metadata is always attached to an origin URL, it should be fine, but it should probably be documented for future users of the metadata.
Build was aborted
Patch application report for D8468 (id=30858)
Could not rebase; Attempt merge onto e25a2f4e...
Updating e25a2f4..ac0e263
Fast-forward
 swh/indexer/data/Gitea.csv                   |  76 +++++++++++
 swh/indexer/metadata_dictionary/__init__.py  |  15 ++-
 swh/indexer/metadata_dictionary/base.py      | 108 ++++++++++------
 swh/indexer/metadata_dictionary/cff.py       |   5 +-
 swh/indexer/metadata_dictionary/gitea.py     | 124 ++++++++++++++++++
 swh/indexer/metadata_dictionary/github.py    |  19 ++-
 .../tests/metadata_dictionary/test_gitea.py  | 143 +++++++++++++++++++++
 .../tests/metadata_dictionary/test_github.py |  10 +-
 swh/indexer/tests/test_cli.py                |   2 +
 swh/indexer/tests/test_metadata.py           |   3 +-
 10 files changed, 456 insertions(+), 49 deletions(-)
 create mode 100644 swh/indexer/data/Gitea.csv
 create mode 100644 swh/indexer/metadata_dictionary/gitea.py
 create mode 100644 swh/indexer/tests/metadata_dictionary/test_gitea.py
Changes applied before test
commit ac0e263bbfc17ee2905b97bbbbbb4929419170cd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 17:06:08 2022 +0200

    github and gitea: Use html_url as @id and clone_url as codeRepository

    They are closer semantics as 'html_url' is the main page of the repository, so it is the best to identify it; and 'clone_url' is the URL that should be given to 'git clone', as documented by https://schema.org/codeRepository

    Additionally, that property was missing so far; but a future commit will need to use it to identify fork relationships (node ids are required to representation relationships between documents as we cannot use blank nodes for that)

commit cb435e59ca91ac7b71cff18e5e6b3885e5be9ac1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 13:30:54 2022 +0200

    Add Gitea metadata mapping

commit 20becf4a90fa6b626e972bba3d57db46604cb7b2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Sep 13 12:34:22 2022 +0200

    GitHub: use correct JSON-LD types for URLs and dates
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/507/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/507/console