Listers: Canonicalize listed GitHub origins
While working on the maven lister, we noticed that some URLs get listed even though they are not the canonical URL of the repository. This results in needless duplication of origins.
So let's reuse the existing URL canonicalization code (for gh origins) in the listers wherever possible. That code should already exist in swh-web; it should be refactored out into swh.core and then reused both in swh-web and in the listers (starting with the maven one; the nixguix and packagist listers can possibly be done later as well).
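To make the idea concrete, here is a minimal sketch of what such a canonicalization helper could look like. It is not the swh-web or swh.core code: the names `parse_github_repo`, `canonicalize_github_url` and the `lookup` callback are hypothetical, and lowercasing owner/repo when no lookup is available is an assumption made only so that syntactic variants collapse to one origin. In practice the authoritative name would come from asking GitHub (e.g. to follow renames), which is what the `lookup` hook stands in for.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urlparse

# Hosts treated as GitHub in this sketch (assumption, not an exhaustive list).
_GITHUB_HOSTS = {"github.com", "www.github.com"}

# Hypothetical hook: given (owner, repo), return the canonical (owner, repo)
# as known by GitHub, or None if it cannot be resolved.
Lookup = Callable[[str, str], Optional[Tuple[str, str]]]


def parse_github_repo(url: str) -> Optional[Tuple[str, str]]:
    """Return (owner, repo) if the URL looks like a GitHub repository URL."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https", "git"):
        return None
    if (parsed.hostname or "") not in _GITHUB_HOSTS:
        return None
    # Drop empty segments so trailing slashes do not matter.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    if not owner or not repo:
        return None
    return owner, repo


def canonicalize_github_url(url: str, lookup: Optional[Lookup] = None) -> str:
    """Return a single normalized form for GitHub URLs; leave others untouched."""
    parsed = parse_github_repo(url)
    if parsed is None:
        return url  # not a GitHub origin, nothing to canonicalize
    owner, repo = parsed
    resolved = lookup(owner, repo) if lookup is not None else None
    if resolved is not None:
        owner, repo = resolved
    else:
        # Assumption: without an authoritative answer, fold case so that
        # variants of the same repository map to the same origin URL.
        owner, repo = owner.lower(), repo.lower()
    return f"https://github.com/{owner}/{repo}"
```

A lister could then pass every listed URL through `canonicalize_github_url` before recording it, so that `http://www.github.com/Foo/Bar.git/` and `https://github.com/foo/bar` end up as a single origin.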
Plan:
- swh-core!271 (closed): Compute canonical gh urls in an exposed library function in swh.core
- swh-core!322 (closed): Refactor GitHubSession request management out of swh.lister into swh.core
- Release (2.6.0)
- Unstick the Debian build if there is a problem (new deps)
- swh-core!273 (closed): Use GitHubSession to make the canonical computation deal with rate limit
- Release (2.7.0)
- !284 (closed): Refactor swh.lister to reuse the code moved in swh.core
- swh-core!274 (closed): Add missing canonical case in swh.core
- Release (2.8.0)
- !335 (closed): (Goal) Adapt maven lister to list canonical gh urls if any
- swh-core!328 (closed): Extra work for exotic github urls (deployed on staging)
Extra plan got extracted out of this task [1]
- [1] #4279
Note: gh refers to GitHub
Migrated from T4232 (view on Phabricator)
Edited by Phabricator Migration user