Reimplement PyPI lister using new Lister API
This is a straight port of the old lister. The lister supports full listing only. It scrapes the pypi.org list of packages. Rate-limiting was not encountered but is handled generically.
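For illustration only, the scraping step described above might look like the sketch below. It is not the code in this diff: it assumes the package list is an HTML page containing one `<a>` element per package, and that an origin URL can be built from a hypothetical `PROJECT_URL_TEMPLATE`. The names `list_origin_urls` and `_AnchorTextParser` are made up for this example. Fetching the page is left out so the parsing logic can be run on any HTML string.

```python
from html.parser import HTMLParser
from typing import Iterator, List

# Assumption: origin URLs follow this pattern (not stated in this description).
PROJECT_URL_TEMPLATE = "https://pypi.org/project/{name}/"


class _AnchorTextParser(HTMLParser):
    """Collect the text content of every <a> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self._in_anchor = False
        self.names: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self.names.append("")

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current entry.
        if self._in_anchor:
            self.names[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False


def list_origin_urls(index_html: str) -> Iterator[str]:
    """Yield one origin URL per package link found in the index page."""
    parser = _AnchorTextParser()
    parser.feed(index_html)
    parser.close()
    for name in parser.names:
        name = name.strip()
        if name:
            yield PROJECT_URL_TEMPLATE.format(name=name)
```

Because the whole listing comes from a single page, a full listing amounts to one pass over the parsed links; there is no pagination state to keep between runs.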
Related to #2956 (closed)
Test Plan
tox
Migrated from D4867 (view on Phabricator)
Activity
Build has FAILED
Patch application report for D4867 (id=17237)
Could not rebase; Attempt merge onto c7822752...
Updating c782275..a51be23
Fast-forward
 requirements-test.txt | 1 +
 swh/lister/bitbucket/__init__.py | 9 +-
 swh/lister/bitbucket/lister.py | 269 +++++--
 swh/lister/bitbucket/models.py | 16 -
 swh/lister/bitbucket/tasks.py | 68 +-
 swh/lister/bitbucket/tests/conftest.py | 2 +-
 .../tests/data/bb_api_repositories_page1.json | 124 ++++
 .../tests/data/bb_api_repositories_page2.json | 123 ++++
 ...ies,after=1970-01-01T00:00:00+00:00,pagelen=100 | 806 ---------------------
 .../https_api.bitbucket.org/empty_response.json | 4 -
 .../data/https_api.bitbucket.org/response.json | 1 -
 swh/lister/bitbucket/tests/test_lister.py | 290 +++++---
 swh/lister/bitbucket/tests/test_tasks.py | 81 +--
 swh/lister/pypi/__init__.py | 3 +-
 swh/lister/pypi/lister.py | 106 ++-
 swh/lister/pypi/tasks.py | 10 +-
 swh/lister/pypi/tests/test_lister.py | 53 ++
 swh/lister/pypi/tests/test_tasks.py | 5 +-
 18 files changed, 845 insertions(+), 1126 deletions(-)
 delete mode 100644 swh/lister/bitbucket/models.py
 create mode 100644 swh/lister/bitbucket/tests/data/bb_api_repositories_page1.json
 create mode 100644 swh/lister/bitbucket/tests/data/bb_api_repositories_page2.json
 delete mode 100644 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/2.0_repositories,after=1970-01-01T00:00:00+00:00,pagelen=100
 delete mode 100644 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/empty_response.json
 delete mode 120000 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/response.json
Changes applied before test
commit a51be23a3617d95a20c937e4ebd8d18bf3716861
Author: tenma <tenma+swh@mailbox.org>
Date: Thu Jan 14 18:47:26 2021 +0100

    [WIP] Reimplement PyPI lister using new Lister API

    The new lister has only full listing capability. It scrapes pypi.org list of packages. Rate-limiting was not encountered but is handled generically.

commit 4dd90ca2f489f406ef924daad33832a38fef96b1
Author: tenma <tenma+swh@mailbox.org>
Date: Wed Jan 13 15:44:07 2021 +0100

    [WIP] Reimplement Bitbucket lister using new Lister API

    The new lister has incremental and full listing capability. It can request the Bitbucket API in anonymous and HTTP basic authentication modes. Rate-limiting is not aggressive and is handled. Listing mode, credentials and pagination parameters can be updated after creation.
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/96/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/96/console
Build has FAILED
Patch application report for D4867 (id=17239)
Could not rebase; Attempt merge onto c7822752...
Updating c782275..480caad
Fast-forward
 requirements-test.txt | 1 +
 swh/lister/bitbucket/__init__.py | 9 +-
 swh/lister/bitbucket/lister.py | 269 +++++--
 swh/lister/bitbucket/models.py | 16 -
 swh/lister/bitbucket/tasks.py | 68 +-
 swh/lister/bitbucket/tests/conftest.py | 2 +-
 .../tests/data/bb_api_repositories_page1.json | 124 ++++
 .../tests/data/bb_api_repositories_page2.json | 123 ++++
 ...ies,after=1970-01-01T00:00:00+00:00,pagelen=100 | 806 ---------------------
 .../https_api.bitbucket.org/empty_response.json | 4 -
 .../data/https_api.bitbucket.org/response.json | 1 -
 swh/lister/bitbucket/tests/test_lister.py | 290 +++++---
 swh/lister/bitbucket/tests/test_tasks.py | 81 +--
 swh/lister/pypi/__init__.py | 3 +-
 swh/lister/pypi/lister.py | 131 ++--
 swh/lister/pypi/tasks.py | 10 +-
 swh/lister/pypi/tests/test_lister.py | 53 ++
 swh/lister/pypi/tests/test_tasks.py | 5 +-
 18 files changed, 827 insertions(+), 1169 deletions(-)
 delete mode 100644 swh/lister/bitbucket/models.py
 create mode 100644 swh/lister/bitbucket/tests/data/bb_api_repositories_page1.json
 create mode 100644 swh/lister/bitbucket/tests/data/bb_api_repositories_page2.json
 delete mode 100644 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/2.0_repositories,after=1970-01-01T00:00:00+00:00,pagelen=100
 delete mode 100644 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/empty_response.json
 delete mode 120000 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/response.json
Changes applied before test
commit 480caadc455e282441ff56b81bb931e96fb35149
Author: tenma <tenma+swh@mailbox.org>
Date: Thu Jan 14 18:47:26 2021 +0100

    [WIP] Reimplement PyPI lister using new Lister API

    The new lister has only full listing capability. It scrapes pypi.org list of packages. Rate-limiting was not encountered but is handled generically.

commit 4dd90ca2f489f406ef924daad33832a38fef96b1
Author: tenma <tenma+swh@mailbox.org>
Date: Wed Jan 13 15:44:07 2021 +0100

    [WIP] Reimplement Bitbucket lister using new Lister API

    The new lister has incremental and full listing capability. It can request the Bitbucket API in anonymous and HTTP basic authentication modes. Rate-limiting is not aggressive and is handled. Listing mode, credentials and pagination parameters can be updated after creation.
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/97/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/97/console
Build is green
Patch application report for D4867 (id=17240)
Could not rebase; Attempt merge onto c7822752...
Updating c782275..083e858
Fast-forward
 requirements-test.txt | 1 +
 swh/lister/bitbucket/__init__.py | 9 +-
 swh/lister/bitbucket/lister.py | 269 +++++--
 swh/lister/bitbucket/models.py | 16 -
 swh/lister/bitbucket/tasks.py | 68 +-
 swh/lister/bitbucket/tests/conftest.py | 2 +-
 .../tests/data/bb_api_repositories_page1.json | 124 ++++
 .../tests/data/bb_api_repositories_page2.json | 123 ++++
 ...ies,after=1970-01-01T00:00:00+00:00,pagelen=100 | 806 ---------------------
 .../https_api.bitbucket.org/empty_response.json | 4 -
 .../data/https_api.bitbucket.org/response.json | 1 -
 swh/lister/bitbucket/tests/test_lister.py | 290 +++++---
 swh/lister/bitbucket/tests/test_tasks.py | 81 +--
 swh/lister/pypi/__init__.py | 3 +-
 swh/lister/pypi/lister.py | 131 ++--
 swh/lister/pypi/tasks.py | 10 +-
 swh/lister/pypi/tests/test_lister.py | 53 ++
 swh/lister/pypi/tests/test_tasks.py | 9 +-
 18 files changed, 829 insertions(+), 1171 deletions(-)
 delete mode 100644 swh/lister/bitbucket/models.py
 create mode 100644 swh/lister/bitbucket/tests/data/bb_api_repositories_page1.json
 create mode 100644 swh/lister/bitbucket/tests/data/bb_api_repositories_page2.json
 delete mode 100644 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/2.0_repositories,after=1970-01-01T00:00:00+00:00,pagelen=100
 delete mode 100644 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/empty_response.json
 delete mode 120000 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/response.json
Changes applied before test
commit 083e8585a50e3d2ece89e6f001befa0561d6312f
Author: tenma <tenma+swh@mailbox.org>
Date: Thu Jan 14 18:47:26 2021 +0100

    [WIP] Reimplement PyPI lister using new Lister API

    The new lister has only full listing capability. It scrapes pypi.org list of packages. Rate-limiting was not encountered but is handled generically.

commit 4dd90ca2f489f406ef924daad33832a38fef96b1
Author: tenma <tenma+swh@mailbox.org>
Date: Wed Jan 13 15:44:07 2021 +0100

    [WIP] Reimplement Bitbucket lister using new Lister API

    The new lister has incremental and full listing capability. It can request the Bitbucket API in anonymous and HTTP basic authentication modes. Rate-limiting is not aggressive and is handled. Listing mode, credentials and pagination parameters can be updated after creation.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/98/ for more details.
Build is green
Patch application report for D4867 (id=17241)
Rebasing onto c7822752...
Current branch diff-target is up to date.
Changes applied before test
commit 9de12fb5d2fef03177be38180a8cdd851a79a6a5
Author: tenma <tenma+swh@mailbox.org>
Date: Thu Jan 14 18:47:26 2021 +0100

    [WIP] Reimplement PyPI lister using new Lister API

    The new lister has only full listing capability. It scrapes pypi.org list of packages. Rate-limiting was not encountered but is handled generically.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/99/ for more details.
Build is green
Patch application report for D4867 (id=17245)
Rebasing onto c7822752...
Current branch diff-target is up to date.
Changes applied before test
commit 46a216ee820f87e88c7104b8c8e991f02813e6f8
Author: tenma <tenma+swh@mailbox.org>
Date: Thu Jan 14 18:47:26 2021 +0100

    [WIP] Reimplement PyPI lister using new Lister API

    The new lister has only full listing capability. It scrapes pypi.org list of packages. Rate-limiting was not encountered but is handled generically.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/100/ for more details.
Thanks!
I have made a few comments inline. All in all, as this lister makes a single request, I think we can drop the whole ratelimit handling logic, which will make it much, much simpler.
You'll also need to remove the models.py file which should not be used any longer, as well as clean up the commented / stringified old code.
Build is green
Patch application report for D4867 (id=17290)
Rebasing onto c7822752...
Current branch diff-target is up to date.
Changes applied before test
commit c087c93e8e2a81d93b846c50dff3d293e54ac713
Author: tenma <tenma+swh@mailbox.org>
Date: Thu Jan 14 18:47:26 2021 +0100

    [WIP] Reimplement PyPI lister using new Lister API

    The new lister has only full listing capability. It scrapes pypi.org list of packages. Rate-limiting was not encountered but is handled generically.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/108/ for more details.
Looks fine to me, thanks. Please apply @anlambert's remaining comments before landing! :)
Build is green
Patch application report for D4867 (id=17331)
Rebasing onto 9fd91f00...
Current branch diff-target is up to date.
Changes applied before test
commit a0ef8dfd795e96958a07847a6bd101f8dac1d9f8
Author: tenma <tenma+swh@mailbox.org>
Date: Thu Jan 14 18:47:26 2021 +0100

    [WIP] Reimplement PyPI lister using new Lister API

    The new lister has only full listing capability. It scrapes pypi.org list of packages. Rate-limiting was not encountered but is handled generically.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/122/ for more details.