We want to be able to list all packages on PyPI and, more importantly, to list new (releases of) packages since the last state of PyPI that has been ingested into Software Heritage.
PyPI exposes several interfaces:
basic JSON API [1], which returns information on a per-project basis (no listing) (~> foresee the use of this one for the loader)
deprecated XML-RPC API [2] (this one can list packages ~> that would be for the lister)
HTML page (listing all packages)
RSS feed (update events)
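To make the lister/loader split concrete, here is a minimal Python sketch of how the HTML listing page and the per-project JSON endpoint could be combined. The URL patterns, the function names, and the assumed shape of the JSON document (a `releases` mapping from version to a list of files carrying a `url` key) are assumptions for illustration, not taken from the references in this thread. The code only parses content that has already been fetched, so the transport layer is left open.

```python
import json
from html.parser import HTMLParser

# Assumed endpoint patterns (not given in this thread).
SIMPLE_INDEX_URL = "https://pypi.org/simple/"
JSON_API_URL = "https://pypi.org/pypi/{project}/json"


class SimpleIndexParser(HTMLParser):
    """Collect the text of every <a> element of the HTML listing page."""

    def __init__(self):
        super().__init__()
        self.projects = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        # Only keep non-empty text found inside an anchor.
        if self._in_anchor and data.strip():
            self.projects.append(data.strip())


def list_projects(index_html):
    """Lister side: extract project names from the listing page."""
    parser = SimpleIndexParser()
    parser.feed(index_html)
    parser.close()
    return parser.projects


def project_url(project):
    """Loader side: build the per-project JSON URL (hypothetical helper)."""
    return JSON_API_URL.format(project=project)


def releases_of(project_json):
    """Map each version to its artifact URLs (assumed document shape)."""
    doc = json.loads(project_json)
    return {
        version: [f["url"] for f in files]
        for version, files in doc["releases"].items()
    }
```

The lister would feed the body of the listing page to `list_projects`, and the loader would fetch `project_url(name)` for each project and pass the response body to `releases_of`.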
As their FAQ already mentions, they push towards mirroring. Quoting [3]:
If your consumer is actually an organization or service that will be downloading a lot of packages from PyPI, consider using your own index mirror or cache.
That's not a sustainable way. If we choose that path for all the forges we need to archive... that will be difficult in terms of infrastructure and maintenance.
In #422 (closed), @ardumont wrote:
If your consumer is actually an organization or service that will be downloading a lot of packages from PyPI, consider using your own index mirror or cache.
That's not a sustainable way. If we choose that path for all the forges we need to archive... that will be difficult in terms of infrastructure and maintenance.
Agreed: we do not maintain actual mirrors of the other big "things" we archive (e.g., GitHub, GitLab.com, Debian, etc.), and for a reason. We really want to hook into existing PyPI APIs to incrementally ingest new stuff that arrives there, //without// maintaining an actual mirror.
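As a rough illustration of incremental ingestion without a mirror, the sketch below compares the state recorded at the last visit with the state currently reported by PyPI and keeps only what is new. The function name and the in-memory state format (project name mapped to a collection of version strings) are hypothetical. How the current state is obtained is left open.

```python
def new_releases(previous, current):
    """Return {project: new versions} present in `current` but not in `previous`.

    Both arguments map a project name to an iterable of version strings.
    Versions are sorted as plain strings, not by version ordering.
    """
    delta = {}
    for project, versions in current.items():
        unseen = set(versions) - set(previous.get(project, ()))
        if unseen:
            delta[project] = sorted(unseen)
    return delta
```

Only the projects and versions returned by `new_releases` would need to be loaded. Everything already ingested is skipped, so no local copy of the artifacts has to be kept.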