rubygems: Use gems database dump to improve listing output
Instead of using an undocumented rubygems HTTP endpoint that only provides gem names, use the daily PostgreSQL dump of the rubygems.org database. The dump makes it possible to list all gems as well as all versions of each gem and their release artifacts.

For each release artifact, the following information is extracted: version, download URL, sha256 checksum, release date, plus some extra metadata.

The lister now sets the list of artifacts and the list of metadata as extra loader arguments when sending a listed origin to the scheduler database. A last_update date is also computed, which should ensure that loading tasks for rubygems are scheduled only when new releases have become available since the last loading.

Note that the lister spawns a temporary postgres instance, which requires the initdb executable from a postgres server installation to be available in the execution environment.

Related to T1777
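As an illustration of the description above, here is a minimal sketch of how per-release rows could be turned into extra loader arguments and a last_update date. It is not the code from this change: the function name `build_loader_arguments`, the row keys, and the argument names `artifacts` and `rubygems_metadata` are assumptions made for the example.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Tuple


def build_loader_arguments(
    releases: List[Dict[str, Any]],
) -> Tuple[Dict[str, List[Dict[str, Any]]], Optional[datetime]]:
    """Hypothetical helper: build extra loader arguments from release rows.

    Each row is assumed to carry the fields named in the description:
    version, download URL, sha256 checksum and release date.
    """
    artifacts = []
    metadata = []
    for release in releases:
        artifacts.append(
            {
                "version": release["version"],
                "url": release["url"],
                "checksums": {"sha256": release["sha256"]},
            }
        )
        metadata.append(
            {
                "version": release["version"],
                "date": release["date"].isoformat(),
            }
        )
    # The most recent release date serves as last_update, so a loading task
    # only needs to be rescheduled when a newer release shows up.
    last_update = max((release["date"] for release in releases), default=None)
    # Argument names below are assumptions, not taken from the diff.
    return {"artifacts": artifacts, "rubygems_metadata": metadata}, last_update


# Sample rows with made-up values, for illustration only.
sample_releases = [
    {
        "version": "0.0.1",
        "url": "https://rubygems.org/downloads/example-0.0.1.gem",
        "sha256": "a" * 64,
        "date": datetime(2020, 1, 1, tzinfo=timezone.utc),
    },
    {
        "version": "0.0.2",
        "url": "https://rubygems.org/downloads/example-0.0.2.gem",
        "sha256": "b" * 64,
        "date": datetime(2021, 6, 1, tzinfo=timezone.utc),
    },
]
loader_arguments, last_update = build_loader_arguments(sample_releases)
```

Taking the maximum release date keeps the scheduling decision independent of the order in which rows come out of the dump.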
Showing
- mypy.ini: 6 additions, 0 deletions
- requirements.txt: 2 additions, 0 deletions
- swh/lister/rubygems/lister.py: 171 additions, 32 deletions
- swh/lister/rubygems/tests/data/https_rubygems.org/versions: 0 additions, 6 deletions
- swh/lister/rubygems/tests/data/rubygems_dumps.xml: 22 additions, 0 deletions
- swh/lister/rubygems/tests/data/rubygems_pgsql_dump.tar: 0 additions, 0 deletions
- swh/lister/rubygems/tests/data/small_rubygems_dump.sh: 38 additions, 0 deletions
- swh/lister/rubygems/tests/test_lister.py: 140 additions, 13 deletions
@@ -7,3 +7,5 @@ launchpadlib
 tenacity >= 6.2
 lxml
 dulwich
+testing.postgresql
+psycopg2
File added