packagist: Improve extract package metadata information algorithm
The current lister implementation lists very few metadata with the hard-coded /p/ base url (404 on mostly all packages). The packagist api implementation must have evolved since the initial implementation of the lister (and the first deployment on staging).
Following the upstream documentation [1], it's sensible to first use the /p2/ as it's performant from the packagist api side. It's then fallbacking to use /p2/+~dev url scheme, then the /p/ scheme and finally the /packages/ base url if previous result are either not found or empty (different than no modification since the last visit).
It keeps the initial implementation behavior of stopping immediately if a 304 NotModifiedSince is returned by the server.
[1] https://repo.packagist.org/apidoc
Refs. swh/meta#5001 (closed)
Merge request reports
Activity
Jenkins job DLS/gitlab-builds #164 succeeded .
See Console Output and Coverage Report for more details.The upstream documentation mentions:
- urls of the form https://repo.packagist.org/p2/... (preferred, static, efficient on their side as it can be cached)
- urls of the form https://repo.packagist.org/p/... (composer v1 metadata, static, efficient but deprecated on their side)
- urls of the form https://repo.packagist.org/packages/... (dynamic, inefficient on their end, last resort)
It seems that the first two URL forms only reference modules with numbered, non-development releases, which the example you've provided is not (it only has a dev-master release).
For that module, the p2 URL looks like a valid (but empty) response, the p URL is a 404.
I think we should only fall back to the /packages/ URL if the /p2/ URL doesn't yield something that makes our lister happy (or not at all).
For your example module, the "p2 dev" form of the URL, https://repo.packagist.org/p2/ajthinking/data-query~dev.json, seems to yield a sensible result too.
- Resolved by Antoine R. Dumont
(so I guess we should go the /p2/ route and fall back to /p2/+~dev if that yields an empty response?)
added 1 commit
- b6e36327 - packagist: Extract metadata in order from /p2/ then /packages/ base urls
Jenkins job DLS/gitlab-builds #165 succeeded .
See Console Output and Coverage Report for more details.added 1 commit
- b9afe098 - packagist: Extract metadata with fallbacks from /p2/, /p2/+~dev then /packages/ urls
Jenkins job DLS/gitlab-builds #166 succeeded .
See Console Output and Coverage Report for more details.Jenkins job DLS/gitlab-builds #167 succeeded .
See Console Output and Coverage Report for more details.- Resolved by Antoine R. Dumont
In the earlier logic there were two reasons for an empty response:
- 304 not modified (expected, the outcome was to skip the package)
- or a 404 (unexpected, warning raised)
From what I understand of the current logic, the fallback to ~dev (and then to the /package/ url, and then to the /package/ + ~dev) applies even if the first request has returned a 304 Not-Modified (which would yield an empty response).
I think the logic should be changed to handle 304 status codes explicitly (properly skipping the unmodified packages) instead of relying on the formerly implicit noop of having no packages listed.
I also think the URL formatting logic is a bit brittle. It may be clearer to use a list of string formats, e.g.
PACKAGIST_PACKAGE_URL_FORMATS = [ "https://repo.packagist.org/p2/{package}.json", "https://repo.packagist.org/p2/{package}~dev.json", "https://repo.packagist.org/packages/{package}.json", ]
so that it's crystal clear which URLs will be used.
added 1 commit
- f1ae6825 - packagist: Improve extract package metadata information algorithm
Jenkins job DLS/gitlab-builds #168 succeeded .
See Console Output and Coverage Report for more details.