Discuss the project <-> origin mapping
The current db schema maps each "project" to a single "origin" (while keeping the history of this mapping).
This prevents us from having a single project point to, e.g., a tarball directory and a git/hg repository at the same time.
Do we want to register that Python-3.5.0.tar.xz is different to the 3.5.0 tag on hg.python.org?
Migrated from T3 (view on Phabricator)
Activity
- Nicolas Dandrimont added Storage manager priority:Triage labels
- Stefano Zacchiroli added priority:Normal label and removed priority:Triage label
- Author Maintainer
I've had some thoughts about this problem; here are my proposals.
The most flexible way to store the Project to Origin mapping is a three-way map:
- Organization
- Project
- Origin
The idea is that each project can be hosted by different organizations, and that we can map those together.
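To make the three-way map concrete, here is a minimal sketch in Python. The `Hosting` class, the `origins_for` helper, the organization names and the URLs are all hypothetical illustrations, not the actual db schema; the example reuses the Python 3.5.0 case from the issue description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hosting:
    """One row of the (hypothetical) organization/project/origin map."""
    organization: str
    project: str
    origin: str  # illustrative URL of the importable artefact


# Illustrative rows only: the same project reachable through two organizations.
rows = [
    Hosting("python.org tarballs", "cpython",
            "https://www.python.org/ftp/python/3.5.0/Python-3.5.0.tar.xz"),
    Hosting("hg.python.org", "cpython", "https://hg.python.org/cpython"),
]


def origins_for(rows, project):
    """Return the (organization, origin) pairs recorded for one project."""
    return [(r.organization, r.origin) for r in rows if r.project == project]


cpython_origins = origins_for(rows, "cpython")
```

With such a map, a single project can point to a tarball and to a VCS repository at the same time, which the current one-project-one-origin schema cannot express.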
In our current schema, we should create one organization per way of listing importable artefacts:
- GitHub
  - GitHub git hosting -> #lister-github, generates suborganizations for "GitHub organizations"
    - hylang -> no specific lister, "shallow" organization
    - zacchiro
    - olasd
    - ...
  - GitHub asset hosting -> swh-loader-git#17
- Debian
  - snapshot.debian.org -> swh-loader-core#119 (closed)
  - archive.debian.org -> swh-loader-core#120 (closed)
  - alioth.debian.org -> #lister-cgit
Each GitHub repo would be one project. Forks would be associated with the same project, but with another organization (or even the same organization but a different origin).
Each Debian source package name would be one project too, and would be associated with two origins (if applicable): one for the snapshot.d.o organization, and one for the archive.d.o organization.
We then need a way to "deduplicate" projects (for instance, to associate Debian's python-hy with GitHub's hylang/hy). My preference is to leave the "automatically generated" projects alone, and to keep a separate "association table" that would be filled separately.
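A rough sketch of how such an association table could work, in Python. The data layout (projects keyed by an `(organization, name)` pair) and the `equivalents` helper are assumptions made for illustration; nothing here reflects an existing table.

```python
from collections import defaultdict

# Autogenerated projects, keyed by (organization, name). These are left alone.
projects = {
    ("Debian", "python-hy"),
    ("GitHub", "hylang/hy"),
    ("Debian", "bash"),
}

# Separate association table, filled independently of the listers.
associations = [
    (("Debian", "python-hy"), ("GitHub", "hylang/hy")),
]


def equivalents(project, associations):
    """Return every project reachable from `project` through association rows."""
    neighbours = defaultdict(set)
    for left, right in associations:
        neighbours[left].add(right)
        neighbours[right].add(left)
    seen = {project}
    stack = [project]
    while stack:
        current = stack.pop()
        for other in neighbours[current] - seen:
            seen.add(other)
            stack.append(other)
    return seen


hy_group = equivalents(("Debian", "python-hy"), associations)
```

Because the associations live in their own table, rerunning a lister can regenerate the projects without touching the manual matching.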
- Author Maintainer
! In #3 (closed), @olasd wrote:
- GitHub
  - GitHub git hosting -> #lister-github, generates suborganizations for "GitHub organizations"
    - hylang -> no specific lister, "shallow" organization
    - zacchiro
    - olasd
    - ...
  - GitHub asset hosting -> swh-loader-git#17
Mulling this over, this could be a bit different:
- GitHub
  - GitHub hosting
    - GitHub git hosting
    - GitHub asset hosting
  - GitHub organizations
    - hylang
    - debian
    - ...
  - GitHub users
    - olasd
    - zacchiro
    - ...
- Debian
  - Debian hosting
    - snapshot.debian.org
    - archive.debian.org
    - alioth.debian.org
    - git.debian.org
    - svn.debian.org
    - ...
  - Debian teams (generated from alioth)
    - pkg-foo
    - pkg-bar
    - ...
  - Debian People (generated from alioth in the /users/ hierarchy)
    - olasd
    - baz-guest
    - ...
- GNU
  - GNU Hosting
    - mirror.gnu.org/gnu/
    - mirror.gnu.org/old-gnu/
  - GNU Projects (generated from the mirror hierarchy)
    - bash
    - glibc
    - ...
- Apache
  - Apache Hosting
    - archive.apache.org
  - Apache Projects (generated from the archive hierarchy)
    - httpd
    - ...
We probably need to add an "autogenerated" flag to organizations, and a "matching" table like we do for projects.
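As a sketch of what the hierarchy plus the "autogenerated" flag could look like, here is a small Python model. The `Organization` class, its `parent` field and the `path` method are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Organization:
    """Hypothetical node in the organization hierarchy."""
    name: str
    parent: Optional["Organization"] = None
    autogenerated: bool = False  # True when created by a lister, not by hand

    def path(self) -> str:
        """Return the names from the root down to this node, joined by ' / '."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return " / ".join(reversed(parts))


github = Organization("GitHub")
github_orgs = Organization("GitHub organizations", parent=github)
hylang = Organization("hylang", parent=github_orgs, autogenerated=True)
```

Here `hylang.path()` walks up through the parents, and the flag separates lister-generated entries from the hand-curated top of the tree.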
- Author Maintainer
@zack points out that "organization" does not feel like the right term anymore.
Possible alternatives:
- (source) entity
- umbrella
- authority
- source (probably overloaded)
We will also need to define an entity typology:
- organization (= Software Heritage, Debian, GNU, GitHub, Apache, ...)
- group of entities (for hierarchy-only entities like "GitHub Hosting")
- hosting facility (= snapshot.debian.org, GitHub git hosting, ...)
- group of persons (= GitHub Organization, Debian Team)
- person (= GitHub User, Debian People)
- project (= GNU Projects, Apache Projects)
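One possible encoding of this typology, sketched as a Python enum. The `EntityType` name, the member names and the `examples` mapping are assumptions for illustration; the values mirror the list above.

```python
from enum import Enum


class EntityType(Enum):
    """Hypothetical entity typology, one member per item in the list above."""
    ORGANIZATION = "organization"
    GROUP_OF_ENTITIES = "group of entities"
    HOSTING_FACILITY = "hosting facility"
    GROUP_OF_PERSONS = "group of persons"
    PERSON = "person"
    PROJECT = "project"


# Example classification, using entities named earlier in this thread.
examples = {
    "Debian": EntityType.ORGANIZATION,
    "GitHub Hosting": EntityType.GROUP_OF_ENTITIES,
    "snapshot.debian.org": EntityType.HOSTING_FACILITY,
    "hylang": EntityType.GROUP_OF_PERSONS,
    "olasd": EntityType.PERSON,
    "bash": EntityType.PROJECT,
}
```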
- Phabricator Migration user mentioned in commit 5c2a6fc2
- Phabricator Migration user mentioned in commit swh-objstorage@efe2dc31
- Nicolas Dandrimont closed
- Phabricator Migration user mentioned in issue #49 (closed)