Choose/define an ontology to use for indexed extrinsic origin metadata
Indexers will translate the API responses of GitHub/GitLab/Gogs/Gitea/... into this ontology.
The grammar itself would most likely be JSON-LD, so that it is compatible with our other metadata (CodeMeta/schema.org).
Current options:
- create our own
- pick an existing forge's ontology, and map everything to it -> this would eventually turn into inventing our own ontology anyway
- ForgeFed's ontology (and ActivityPub) -> not very suitable for us; at the moment it's a disjoint set from what we want (ForgeFed only cares about issues and PRs)
- GHTorrent -> I am told they are working on a generic way to represent data that isn't tied to GitHub, but I cannot find it
- schema.org (not codemeta, as it is only meant to describe source code and applications, not repositories, which are different objects)
- Wikidata (e.g. https://www.wikidata.org/wiki/Property:P9100)
- DOAP
Namespace prefixes used below:
- as = https://www.w3.org/ns/activitystreams
- codemeta = https://codemeta.github.io/terms/
- doap = http://usefulinc.com/ns/doap#
- forge = https://forgefed.org/ns
- schema = http://schema.org
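
As an illustration of how these prefixes could be used with JSON-LD, here is a minimal Python sketch that collects them into an `@context` and expands compact IRIs such as `schema:name`. The names `PREFIXES`, `make_context` and `expand` are hypothetical. The trailing `#` or `/` appended to the `as`, `forge` and `schema` namespaces is an assumption made so that prefix expansion yields usable term IRIs; it is not taken from the list above.

```python
import json

# Prefix -> namespace IRI, based on the list above.
# ASSUMPTION: "as", "forge" and "schema" get a trailing "#" or "/" appended,
# because a JSON-LD prefix is concatenated directly with the local name.
PREFIXES = {
    "as": "https://www.w3.org/ns/activitystreams#",
    "codemeta": "https://codemeta.github.io/terms/",
    "doap": "http://usefulinc.com/ns/doap#",
    "forge": "https://forgefed.org/ns#",
    "schema": "http://schema.org/",
}


def make_context() -> dict:
    """Return a JSON-LD document skeleton declaring all prefixes."""
    return {"@context": dict(PREFIXES)}


def expand(compact_iri: str) -> str:
    """Expand a compact IRI like 'schema:name' into a full IRI."""
    prefix, sep, local = compact_iri.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {compact_iri!r}")
    return PREFIXES[prefix] + local


if __name__ == "__main__":
    print(json.dumps(make_context(), indent=2))
    print(expand("doap:homepage"))
```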
| Description | GitHub (no NS) | GitLab (no NS) | Gitea (no NS) | schema.org / CodeMeta | Wikidata | DOAP | ForgeFed (and ActivityStreams) | libraries.io (no NS) |
|---|---|---|---|---|---|---|---|---|
| Name and description | ||||||||
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| Project name | name | name | name / full_name | schema:name | relies on RDF | relies on RDF + doap:name | as:name | |
| Project avatar | avatar_url | schema:image | P18 (image) | logo_url | ||||
| Owner name | owner.login | namespace.name | being discussed | |||||
| Owner avatar | owner.avatar_url | namespace.avatar_url | being discussed | |||||
| Owner homepage | ? | ? | being discussed | |||||
| description | description | description | description | schema:description | relies on RDF | doap:shortdesc + doap:description | as:description (not forge:description!) | description |
| homepage | website | schema:url | P2699 (URL) P856 (official website, less accurate but more popular) | doap:homepage + old-homepage | homepage | |||
| Tags/labels | topics | topics | schema:keywords | P9100 (Github topic) | keywords | |||
| | | | | schema:applicationCategory + schema:applicationSubcategory | | doap:category | | |
| URL to clone/checkout the repo | clone_url | ssh_url_to_repo / http_url_to_repo | clone_url | | P1324 (source code repository) (kind of) | doap:repository -> (doap:anon-root / doap:location) | forge:cloneUri | |
| Dates | ||||||||
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| | created_at | created_at | created_at | schema:dateCreated | P571 (inception) | doap:created | | created_at |
| | updated_at | last_activity_at | updated_at | schema:dateModified (kind of) | | | as:updated | updated_at |
| pushed_at | schema:dateModified (kind of) | (doesn't make sense in its data model) | pushed_at | |||||
| | | | | schema:datePublished | P577 (publication date) | | as:published | |
| | | marked_for_deletion_on | | | | | | |
| Relationship with other repositories + current status | ||||||||
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| whether it's a fork | fork | fork | ||||||
| what it's a fork of | parent + source | parent.web_url | parent.html_url | schema:isBasedOn | P144 (based on) | | forge:forkedFrom | |
| number of forks | forks_count | forks_count | forks_count | | | | forge:forks -> as:totalItems | forks (on projects) / forks_count (on repositories) |
| list of forks | | | | | | | forge:forks | |
| Whether the repository is disabled | archived | archived | archived | being discussed | ||||
| empty | ||||||||
| whether it's a mirror | mirror | mirror | ||||||
| | | | mirror_interval | | | | | |
| | | | mirror_updated | | | | | |
| what it is a mirror of | mirror_url (deprecated) | (no API) | original_url | being discussed | mirror_url | |||
| Whether this is a template repository | is_template | | template | | | | | |
| Presumably, what template repository was used to create this one | template_repository | | | | | | | |
| visibility (private/internal/public) | visibility (private/internal/public) | internal + private (booleans) | ||||||
| Social features | ||||||||
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| | stargazers_count | star_count | stars_count | schema:interactionStatistic -> filter on schema:LikeAction -> schema:userInteractionCount | | | as:likes -> as:totalItems | stargazers_count |
| | watchers_count | | watchers_count | schema:interactionStatistic -> filter on schema:FollowAction -> schema:userInteractionCount | | | as:followers -> as:totalItems | subscribers_count |
| watchers | as:followers | |||||||
| | open_issues / open_issues_count | open_issues_count | open_issues_count | | | | | open_issues_count |
| | | | open_pr_counter | | | | | |
| | (always true if repo not archived) | merge_request_enabled | has_pull_requests | | | | | |
| Configuration | ||||||||
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| | default_branch | default_branch | default_branch | | | | | default_branch |
| has_issues | has_issues | has_issues | ||||||
| | | | | codemeta:issueTracker | P1401 bug tracking system | doap:bug-database | forge:ticketsTrackedBy / forge:sendPatchesTo (see also) | |
| | | | internal_tracker.* + external_tracker.* | | | | | |
| | | | | | | doap:mailing-list | | |
| | | | | | | doap:support-forum | | |
| | | | | | | doap:developer-forum | | |
| | | jobs_enabled | | | | | | |
| | | snippets_enabled | | | | | | |
| | | can_create_merge_request_in | | | | | | |
| | | resolve_outdated_diff_discussions | | | | | | |
| | (different semantics) | merge_method | allow_merge_commits + allow_rebase + allow_rebase_explicit + allow_squash_merge + default_merge_style | | | | | |
| | | squash_option | | | | | | |
| has_projects | has_projects | |||||||
| | has_downloads | | | | | | | |
| | has_wiki | wiki_enabled | has_wiki | | | | | has_wiki |
| | | | external_wiki.* | | | | | |
| has_pages | ||||||||
| | | merge_commit_template | | | | | | |
| | | squash_commit_template | | | | | | |
| Statistics | ||||||||
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| not documented | size | |||||||
| not documented | size | |||||||
| not documented | size | |||||||
| | | statistics.commit_count | | | | | | |
| | | statistics.storage_size | | | | | | |
| | | statistics.repository_size | | | | | | |
| | | statistics. | | | | | | |
| | | | release_counter | | | | | |
| License | ||||||||
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| SPDX id | license.spdx_id | | | | | | | |
| license URL (usually on a small set of domains) | by dereferencing license.url then getting html_url | license.html_url / license.source_url | | schema:license | | | | |
| license URI | | | | schema:license | P275 (copyright license) | doap:license | | |
| possibly inconsistent | | license.nickname | | | | | | |
| possibly inconsistent | license.key | license.key | | | | | | |
| possibly inconsistent | license.name | license.name | | | | | | |
| licenses / licenses_normalized / repository_license | ||||||||
| Other mined metadata | ||||||||
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| programming language | language | | language | schema:programmingLanguage | P277 (programming language) | doap:programming-language | | language |
| readme_url | codemeta:readme | |||||||
| readme filename | has_readme |
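
To make the indexer step concrete, here is a minimal Python sketch of how a GitHub API response could be translated into a JSON-LD document using a few of the GitHub to schema.org mappings from the table above (`name`, `description`, `topics`, `created_at`). This is only a sketch under assumptions, not an actual indexer: the function name `github_to_jsonld`, the `GITHUB_TO_SCHEMA` dictionary, the sample data, and the choice to silently drop unmapped or null fields are all hypothetical.

```python
import json

# Subset of the table above: GitHub field -> schema.org term.
# ASSUMPTION: only these four fields are handled; a real indexer
# would cover many more rows and the other forges.
GITHUB_TO_SCHEMA = {
    "name": "schema:name",
    "description": "schema:description",
    "topics": "schema:keywords",
    "created_at": "schema:dateCreated",
}


def github_to_jsonld(repo: dict) -> dict:
    """Translate a GitHub repository API response into a JSON-LD dict.

    Fields that are missing, null, or not in the mapping are dropped.
    """
    # ASSUMPTION: schema.org terms are expanded with a trailing slash.
    doc = {"@context": {"schema": "http://schema.org/"}}
    for field, term in GITHUB_TO_SCHEMA.items():
        value = repo.get(field)
        if value is not None:
            doc[term] = value
    return doc


if __name__ == "__main__":
    # Hypothetical sample shaped like a GitHub repository response.
    sample = {
        "name": "example-repo",
        "description": None,
        "topics": ["archive", "metadata"],
        "created_at": "2020-01-01T00:00:00Z",
        "stargazers_count": 3,
    }
    print(json.dumps(github_to_jsonld(sample), indent=2))
```

Keeping the mapping as plain data rather than code would let each forge get its own dictionary, with rows that have "different semantics" handled by dedicated functions instead.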
Migrated from T4249 (view on Phabricator)
Edited by vlorentz