
Choose/define an ontology to use for indexed extrinsic origin metadata

Indexers will translate the API responses of GitHub, GitLab, Gogs, Gitea, etc. into this ontology.
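Here is a minimal sketch of that translation step. It assumes the ontology ends up reusing schema.org terms. Each forge gets its own table from API field names to common terms, and a single function applies that table. The field pairs are taken from the comparison table in this issue, limited to rows where the forge columns are unambiguous. `FIELD_MAP` and `translate` are hypothetical names, not existing indexer code.

```python
from typing import Any

# Hypothetical per-forge tables: forge API field name -> common term.
# Pairs are taken from the comparison table in this issue.
FIELD_MAP: dict[str, dict[str, str]] = {
    "github": {
        "name": "schema:name",
        "description": "schema:description",
        "topics": "schema:keywords",
        "created_at": "schema:dateCreated",
        "updated_at": "schema:dateModified",
    },
    "gitlab": {
        "name": "schema:name",
        "description": "schema:description",
        "topics": "schema:keywords",
        "created_at": "schema:dateCreated",
        "last_activity_at": "schema:dateModified",
    },
    "gitea": {
        "name": "schema:name",
        "description": "schema:description",
        "created_at": "schema:dateCreated",
        "updated_at": "schema:dateModified",
    },
}


def translate(forge: str, response: dict[str, Any]) -> dict[str, Any]:
    """Map one repository API response onto the common vocabulary.

    Fields missing from the response (or set to null) are skipped, and
    response fields with no entry in FIELD_MAP are ignored.
    """
    mapping = FIELD_MAP[forge]
    result: dict[str, Any] = {}
    for source_key, term in mapping.items():
        value = response.get(source_key)
        if value is not None:
            result[term] = value
    return result


# Made-up GitLab-style response, for illustration only.
example = {"name": "example", "description": None, "last_activity_at": "2021-01-01T00:00:00Z"}
print(translate("gitlab", example))
# {'schema:name': 'example', 'schema:dateModified': '2021-01-01T00:00:00Z'}
```

Keeping the per-forge differences in data tables rather than in code means that adding another forge (Gogs, for instance) only requires a new table entry.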

The grammar itself would most likely be JSON-LD, so that it stays compatible with our other metadata (CodeMeta/schema.org).
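Here is one way "compatible" could work in practice. This sketch serializes already-translated fields under the published CodeMeta 2.0 context. Tools that already read codemeta.json files would then resolve `name`, `dateCreated`, etc. to the same schema.org IRIs. The function name is hypothetical. No `@type` is set, because the class a repository should have is still an open question here.

```python
import json
from typing import Any

# Context IRI published by the CodeMeta project for version 2.0.
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"


def to_jsonld(fields: dict[str, Any]) -> str:
    """Serialize translated fields as a compact JSON-LD document.

    Keys are the short terms defined by the CodeMeta context ("name",
    "description", "dateCreated", ...), which that context maps to
    schema.org IRIs. No "@type" is set: which class a repository should
    be is still undecided.
    """
    document = {"@context": CODEMETA_CONTEXT, **fields}
    return json.dumps(document, indent=2, sort_keys=True)


print(to_jsonld({"name": "example", "dateCreated": "2020-01-01"}))
```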

Current options:

  • create our own
  • pick an existing forge's ontology, and map everything to it -> this would eventually turn into inventing our own ontology anyway
  • ForgeFed's ontology (and ActivityPub) -> not very suitable for us; at the moment it's a disjoint set from what we want (ForgeFed only cares about issues and PRs)
  • GHTorrent -> I am told they are working on a generic way to represent data that isn't tied to GitHub, but I cannot find it
  • schema.org (not codemeta, as it is only meant to describe source code and applications, not repositories, which are different objects)
  • Wikidata (e.g. https://www.wikidata.org/wiki/Property:P9100)
  • DOAP

Namespace prefixes used below:
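For reference, these prefixes could be bound in a JSON-LD `@context`. The namespace IRIs below are the ones each vocabulary commonly publishes, but they should be double-checked against each spec, ForgeFed's in particular. `wdt:` is introduced here only to give Wikidata's bare P-numbers a prefix. `expand` is a hypothetical helper that mimics how a JSON-LD processor expands compact IRIs.

```python
# Assumed namespace IRIs; verify against each vocabulary's specification.
PREFIXES = {
    "schema": "http://schema.org/",
    "codemeta": "https://codemeta.github.io/terms/",
    "doap": "http://usefulinc.com/ns/doap#",
    "as": "https://www.w3.org/ns/activitystreams#",
    "forge": "https://forgefed.org/ns#",
    "wdt": "http://www.wikidata.org/prop/direct/",  # hypothetical prefix for Wikidata properties
}

# The object that would appear at the top of each JSON-LD document.
CONTEXT = {"@context": PREFIXES}


def expand(term: str) -> str:
    """Expand a compact IRI such as "doap:name" to a full IRI.

    Terms without a known prefix (including plain names and absolute
    IRIs like "https://...") are returned unchanged.
    """
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + suffix
    return term


print(expand("wdt:P9100"))  # http://www.wikidata.org/prop/direct/P9100
```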

Description | GitHub (no NS) | GitLab (no NS) | Gitea (no NS) | schema.org / CodeMeta | Wikidata | DOAP | ForgeFed (and ActivityStreams) | libraries.io (no NS)
Name and description
-- -- -- -- -- -- -- -- --
Project name name name name / full_name schema:name relies on RDF relies on RDF + doap:name as:name
Project avatar avatar_url schema:image P18 (image) logo_url
Owner name owner.login namespace.name being discussed
Owner avatar owner.avatar_url namespace.avatar_url being discussed
Owner homepage ? ? being discussed
description description description description schema:description relies on RDF doap:shortdesc + doap:description as:description (not forge:description!) description
homepage website schema:url P2699 (URL) P856 (official website, less accurate but more popular) doap:homepage + old-homepage homepage
Tags/labels topics topics schema:keywords P9100 (Github topic) keywords
schema:applicationCategory + schema:applicationSubcategory doap:category
URL to clone/checkout the repo clone_url ssh_url_to_repo / http_url_to_repo clone_url P1324 (source code repository) (kind of) doap:repository -> (doap:anon-root / doap:location) forge:cloneUri
Dates
-- -- -- -- -- -- -- -- --
created_at created_at created_at schema:dateCreated P571 (inception) doap:created created_at
updated_at last_activity_at updated_at schema:dateModified (kind of) as:updated updated_at
pushed_at schema:dateModified (kind of) (doesn't make sense in its data model) pushed_at
schema:datePublished P577 (publication date) as:published
marked_for_deletion_on
Relationship with other repositories + current status
-- -- -- -- -- -- -- -- --
whether it's a fork fork fork
what it's a fork of parent + source parent.web_url parent.html_url schema:isBasedOn P144 (based on) forge:forkedFrom
number of forks forks_count forks_count forks_count forge:forks -> as:totalItems forks (on projects) / forks_count (on repositories)
list of forks forge:forks
Whether the repository is disabled archived archived archived being discussed
empty
whether it's a mirror mirror mirror
mirror_interval
mirror_updated
what it is a mirror of mirror_url (deprecated) (no API) original_url being discussed mirror_url
Whether this is a template repository is_template template
Presumably, what template repository was used to create this one template_repository
visibility (private/internal/public) visibility (private/internal/public) internal + private (booleans)
Social features
-- -- -- -- -- -- -- -- --
stargazers_count star_count stars_count schema:interactionStatistic -> filter on schema:LikeAction -> schema:userInteractionCount as:likes -> as:totalItems stargazers_count
watchers_count watchers_count schema:interactionStatistic -> filter on schema:FollowAction -> schema:userInteractionCount as:followers -> as:totalItems subscribers_count
watchers as:followers
open_issues / open_issues_count open_issues_count open_issues_count open_issues_count
open_pr_counter
(always true if repo not archived) merge_request_enabled has_pull_requests
Configuration
-- -- -- -- -- -- -- -- --
default_branch default_branch default_branch default_branch
has_issues has_issues has_issues
codemeta:issueTracker P1401 bug tracking system doap:bug-database forge:ticketsTrackedBy / forge:sendPatchesTo (see also)
internal_tracker.* + external_tracker.*
doap:mailing-list
doap:support-forum
doap:developer-forum
jobs_enabled
snippets_enabled
can_create_merge_request_in
resolve_outdated_diff_discussions
(different semantics) merge_method allow_merge_commits + allow_rebase + allow_rebase_explicit + allow_squash_merge + default_merge_style
squash_option
has_projects has_projects
has_downloads
has_wiki wiki_enabled has_wiki has_wiki
external_wiki.*
has_pages
merge_commit_template
squash_commit_template
Statistics
-- -- -- -- -- -- -- -- --
not documented size
not documented size
not documented size
statistics.commit_count
statistics.storage_size
statistics.repository_size
statistics.
release_counter
License
-- -- -- -- -- -- -- -- --
SPDX id license.spdx_id
license URL (usually on a small set of domains) by dereferencing license.url then getting html_url license.html_url / license.source_url schema:license
license URI schema:license P275 (copyright license) doap:license
possibly inconsistent license.nickname
possibly inconsistent license.key license.key
possibly inconsistent license.name license.name
licenses / licenses_normalized / repository_license
Other mined metadata
-- -- -- -- -- -- -- -- --
programming language language language schema:programmingLanguage P277 (programming language) doap:programming-language language
readme_url codemeta:readme
readme filename has_readme
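In the schema.org column, stars and watchers map to `schema:interactionStatistic` rather than to a plain property. The count lives in a `schema:InteractionCounter` node, and readers filter on its interaction type. The sketch below shows both directions. It assumes `schema:LikeAction` for stars and `schema:FollowAction` for watchers, as proposed in the table. The helper names are hypothetical.

```python
from typing import Any, Optional


def interaction_statistics(stars: int, watchers: int) -> list[dict[str, Any]]:
    """Encode star and watcher counts as schema:InteractionCounter nodes."""
    return [
        {
            "@type": "schema:InteractionCounter",
            "schema:interactionType": {"@id": "schema:LikeAction"},
            "schema:userInteractionCount": stars,
        },
        {
            "@type": "schema:InteractionCounter",
            "schema:interactionType": {"@id": "schema:FollowAction"},
            "schema:userInteractionCount": watchers,
        },
    ]


def count_for(statistics: list[dict[str, Any]], action: str) -> Optional[int]:
    """Read a count back by filtering on the interaction type."""
    for counter in statistics:
        if counter["schema:interactionType"]["@id"] == action:
            return counter["schema:userInteractionCount"]
    return None


repo = {"schema:interactionStatistic": interaction_statistics(stars=12, watchers=3)}
print(count_for(repo["schema:interactionStatistic"], "schema:LikeAction"))  # 12
```

The star count could come from GitHub's `stargazers_count`, GitLab's `star_count` or Gitea's `stars_count`. The extra filtering step is the price of reusing a generic schema.org structure instead of defining a dedicated property.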

Migrated from Phabricator task T4249.

Edited by vlorentz