Choose/define an ontology to use for indexed extrinsic origin metadata
Indexers will translate the API responses of GitHub/GitLab/Gogs/Gitea/... into this ontology.
The grammar itself would most likely be JSON-LD, so that it is compatible with our other metadata (CodeMeta/schema.org).
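
To make the idea concrete, here is a minimal sketch of what such a translation could look like for GitHub. It assumes, purely for illustration, that schema.org and ForgeFed terms from the table below end up being used; no ontology has been chosen yet. The function name, the mapping dict, the use of `html_url` as `@id`, and the sample response are all hypothetical.

```python
import json

# Hypothetical mapping from GitHub API fields to ontology terms, picked from
# the table below for illustration only (no ontology has been chosen yet).
GITHUB_TO_TERM = {
    "name": "schema:name",
    "description": "schema:description",
    "topics": "schema:keywords",
    "created_at": "schema:dateCreated",
    "language": "schema:programmingLanguage",
    "clone_url": "forge:cloneUri",
}

# Assumption: the trailing "/" and "#" are added so that compact IRIs such as
# "schema:name" expand to full IRIs.
CONTEXT = {"schema": "http://schema.org/", "forge": "https://forgefed.org/ns#"}


def translate_github(repo: dict) -> dict:
    """Translate a GitHub repository API response into a JSON-LD document."""
    # Assumption: the repository's web URL is used as the document identifier.
    doc = {"@context": CONTEXT, "@id": repo["html_url"]}
    for field, term in GITHUB_TO_TERM.items():
        value = repo.get(field)
        # Skip fields that are missing or empty in the API response.
        if value not in (None, "", []):
            doc[term] = value
    return doc


# Made-up API response, trimmed to the fields used above.
sample = {
    "html_url": "https://github.com/example-org/example",
    "name": "example",
    "description": None,
    "topics": ["metadata", "forge"],
    "created_at": "2020-01-01T00:00:00Z",
    "language": "Python",
    "clone_url": "https://github.com/example-org/example.git",
}

doc = translate_github(sample)
print(json.dumps(doc, indent=2))
```

An indexer for another forge would only need a different mapping dict, which is the main appeal of settling on one target ontology.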
Current options:
- create our own
- pick an existing forge's ontology, and map everything to it -> this would eventually turn into inventing our own ontology anyway
- ForgeFed's ontology (and ActivityPub) -> not very suitable for us; at the moment it's a disjoint set from what we want (ForgeFed only cares about issues and PRs)
- GHTorrent -> I am told they are working on a generic way to represent data that isn't tied to GitHub, but I cannot find it
- schema.org (not codemeta, as it is only meant to describe source code and applications, not repositories, which are different objects)
- Wikidata (e.g. https://www.wikidata.org/wiki/Property:P9100)
- DOAP
Namespace prefixes used below:
- as = https://www.w3.org/ns/activitystreams
- codemeta = https://codemeta.github.io/terms/
- doap = http://usefulinc.com/ns/doap#
- forge = https://forgefed.org/ns
- schema = http://schema.org
Description | GitHub (no NS) | GitLab (no NS) | Gitea (no NS) | schema.org / CodeMeta | Wikidata | DOAP | ForgeFed (and ActivityStreams) | libraries.io (no NS) |
---|---|---|---|---|---|---|---|---|
Name and description | ||||||||
-- | -- | -- | -- | -- | -- | -- | -- | -- |
Project name | name | name | name / full_name | schema:name | relies on RDF | relies on RDF + doap:name | as:name | |
Project avatar | avatar_url | schema:image | P18 (image) | logo_url | ||||
Owner name | owner.login | namespace.name | being discussed | |||||
Owner avatar | owner.avatar_url | namespace.avatar_url | being discussed | |||||
Owner homepage | ? | ? | being discussed | |||||
description | description | description | description | schema:description | relies on RDF | doap:shortdesc + doap:description | as:description (not forge:description!) | description |
homepage | website | schema:url | P2699 (URL) P856 (official website, less accurate but more popular) | doap:homepage + old-homepage | homepage | |||
Tags/labels | topics | topics | schema:keywords | P9100 (Github topic) | keywords | |||
schema:applicationCategory + schema:applicationSubcategory | doap:category | |||||||
URL to clone/checkout the repo | clone_url | ssh_url_to_repo / http_url_to_repo | clone_url | P1324 (source code repository) (kind of) | doap:repository -> (doap:anon-root / doap:location) | forge:cloneUri | ||
Dates | ||||||||
-- | -- | -- | -- | -- | -- | -- | -- | -- |
created_at | created_at | created_at | schema:dateCreated | P571 (inception) | doap:created | created_at | ||
updated_at | last_activity_at | updated_at | schema:dateModified (kind of) | as:updated | updated_at | |||
pushed_at | schema:dateModified (kind of) | (doesn't make sense in its data model) | pushed_at | |||||
schema:datePublished | P577 (publication date) | as:published | ||||||
marked_for_deletion_on | ||||||||
Relationship with other repositories + current status | ||||||||
-- | -- | -- | -- | -- | -- | -- | -- | -- |
whether it's a fork | fork | fork | ||||||
what it's a fork of | parent + source | parent.web_url | parent.html_url | schema:isBasedOn | P144 (based on) | forge:forkedFrom | ||
number of forks | forks_count | forks_count | forks_count | forge:forks -> as:totalItems | forks (on projects) / forks_count (on repositories) | |||
list of forks | forge:forks | |||||||
Whether the repository is disabled | archived | archived | archived | being discussed | ||||
empty | ||||||||
whether it's a mirror | mirror | mirror | ||||||
mirror_interval | ||||||||
mirror_updated | ||||||||
what it is a mirror of | mirror_url (deprecated) | (no API) | original_url | being discussed | mirror_url | |||
Whether this is a template repository | is_template | template | ||||||
Presumably, what template repository was used to create this one | template_repository | |||||||
visibility (private/internal/public) | visibility (private/internal/public) | internal + private (booleans) | ||||||
Social features | ||||||||
-- | -- | -- | -- | -- | -- | -- | -- | -- |
stargazers_count | star_count | stars_count | schema:interactionStatistic -> filter on schema:LikeAction -> schema:userInteractionCount | as:likes -> as:totalItems | stargazers_count | |||
watchers_count | watchers_count | schema:interactionStatistic -> filter on schema:FollowAction -> schema:userInteractionCount | as:followers -> as:totalItems | subscribers_count | ||||
watchers | as:followers | |||||||
open_issues / open_issues_count | open_issues_count | open_issues_count | open_issues_count | |||||
open_pr_counter | ||||||||
(always true if repo not archived) | merge_request_enabled | has_pull_requests | ||||||
Configuration | ||||||||
-- | -- | -- | -- | -- | -- | -- | -- | -- |
default_branch | default_branch | default_branch | default_branch | |||||
has_issues | has_issues | has_issues | ||||||
codemeta:issueTracker | P1401 bug tracking system | doap:bug-database | forge:ticketsTrackedBy / forge:sendPatchesTo (see also) | |||||
internal_tracker.* + external_tracker.* | ||||||||
doap:mailing-list | ||||||||
doap:support-forum | ||||||||
doap:developer-forum | ||||||||
jobs_enabled | ||||||||
snippets_enabled | ||||||||
can_create_merge_request_in | ||||||||
resolve_outdated_diff_discussions | ||||||||
(different semantics) | merge_method | allow_merge_commits + allow_rebase + allow_rebase_explicit + allow_squash_merge + default_merge_style | ||||||
squash_option | ||||||||
has_projects | has_projects | |||||||
has_downloads | ||||||||
has_wiki | wiki_enabled | has_wiki | has_wiki | |||||
external_wiki.* | ||||||||
has_pages | ||||||||
merge_commit_template | ||||||||
squash_commit_template | ||||||||
Statistics | ||||||||
-- | -- | -- | -- | -- | -- | -- | -- | -- |
not documented | size | |||||||
not documented | size | |||||||
not documented | size | |||||||
statistics.commit_count | ||||||||
statistics.storage_size | ||||||||
statistics.repository_size | ||||||||
statistics. | ||||||||
release_counter | ||||||||
License | ||||||||
-- | -- | -- | -- | -- | -- | -- | -- | -- |
SPDX id | license.spdx_id | |||||||
license URL (usually on a small set of domains) | by dereferencing license.url then getting html_url | license.html_url / license.source_url | schema:license | |||||
license URI | schema:license | P275 (copyright license) | doap:license | |||||
possibly inconsistent | license.nickname | |||||||
possibly inconsistent | license.key | license.key | ||||||
possibly inconsistent | license.name | license.name | ||||||
licenses / licenses_normalized / repository_license | ||||||||
Other mined metadata | ||||||||
-- | -- | -- | -- | -- | -- | -- | -- | -- |
programming language | language | language | schema:programmingLanguage | P277 (programming language) | doap:programming-language | language | ||
readme_url | codemeta:readme | |||||||
readme filename | has_readme |
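
The schema.org entries in the "Social features" rows ("schema:interactionStatistic -> filter on ... -> schema:userInteractionCount") describe a lookup rather than a single property. A minimal sketch of that lookup, assuming the counters are serialized as JSON-LD objects whose `schema:interactionType` is an object with an `@type`; that serialization, the helper name, and the sample numbers are assumptions, not something decided in this task:

```python
from typing import Optional


def interaction_count(doc: dict, action: str) -> Optional[int]:
    """Return the userInteractionCount of the counter matching `action`."""
    stats = doc.get("schema:interactionStatistic", [])
    # JSON-LD allows a single value where a list is expected.
    if isinstance(stats, dict):
        stats = [stats]
    for counter in stats:
        interaction_type = counter.get("schema:interactionType", {})
        if interaction_type.get("@type") == action:
            return counter.get("schema:userInteractionCount")
    return None


# Made-up document with one counter for stars and one for watchers.
repo_doc = {
    "schema:name": "example",
    "schema:interactionStatistic": [
        {
            "@type": "schema:InteractionCounter",
            "schema:interactionType": {"@type": "schema:LikeAction"},
            "schema:userInteractionCount": 42,
        },
        {
            "@type": "schema:InteractionCounter",
            "schema:interactionType": {"@type": "schema:FollowAction"},
            "schema:userInteractionCount": 7,
        },
    ],
}

stars = interaction_count(repo_doc, "schema:LikeAction")
watchers = interaction_count(repo_doc, "schema:FollowAction")
print(stars, watchers)
```

This shows the cost of the schema.org representation: a flat field such as `stargazers_count` becomes a nested structure that every consumer has to filter.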
Migrated from T4249 (view on Phabricator)
Edited by vlorentz