Skip to content
Snippets Groups Projects

Reimplement PyPI lister using new Lister API

This is a straight port of the old lister. The lister has only full listing capability. It scrapes pypi.org list of packages. Rate-limiting was not encountered but is handled generically.

Related to #2956 (closed)

Test Plan

tox


Migrated from D4867 (view on Phabricator)

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Build has FAILED

    Patch application report for D4867 (id=17237)

    Could not rebase; Attempt merge onto c7822752...

    Updating c782275..a51be23
    Fast-forward
     requirements-test.txt                              |   1 +
     swh/lister/bitbucket/__init__.py                   |   9 +-
     swh/lister/bitbucket/lister.py                     | 269 +++++--
     swh/lister/bitbucket/models.py                     |  16 -
     swh/lister/bitbucket/tasks.py                      |  68 +-
     swh/lister/bitbucket/tests/conftest.py             |   2 +-
     .../tests/data/bb_api_repositories_page1.json      | 124 ++++
     .../tests/data/bb_api_repositories_page2.json      | 123 ++++
     ...ies,after=1970-01-01T00:00:00+00:00,pagelen=100 | 806 ---------------------
     .../https_api.bitbucket.org/empty_response.json    |   4 -
     .../data/https_api.bitbucket.org/response.json     |   1 -
     swh/lister/bitbucket/tests/test_lister.py          | 290 +++++---
     swh/lister/bitbucket/tests/test_tasks.py           |  81 +--
     swh/lister/pypi/__init__.py                        |   3 +-
     swh/lister/pypi/lister.py                          | 106 ++-
     swh/lister/pypi/tasks.py                           |  10 +-
     swh/lister/pypi/tests/test_lister.py               |  53 ++
     swh/lister/pypi/tests/test_tasks.py                |   5 +-
     18 files changed, 845 insertions(+), 1126 deletions(-)
     delete mode 100644 swh/lister/bitbucket/models.py
     create mode 100644 swh/lister/bitbucket/tests/data/bb_api_repositories_page1.json
     create mode 100644 swh/lister/bitbucket/tests/data/bb_api_repositories_page2.json
     delete mode 100644 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/2.0_repositories,after=1970-01-01T00:00:00+00:00,pagelen=100
     delete mode 100644 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/empty_response.json
     delete mode 120000 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/response.json
    Changes applied before test
    commit a51be23a3617d95a20c937e4ebd8d18bf3716861
    Author: tenma <tenma+swh@mailbox.org>
    Date:   Thu Jan 14 18:47:26 2021 +0100
    
        [WIP] Reimplement PyPI lister using new Lister API
        
        The new lister has only full listing capability.
        It scrapes pypi.org list of packages.
        Rate-limiting was not encountered but is handled generically.
    
    commit 4dd90ca2f489f406ef924daad33832a38fef96b1
    Author: tenma <tenma+swh@mailbox.org>
    Date:   Wed Jan 13 15:44:07 2021 +0100
    
        [WIP] Reimplement Bitbucket lister using new Lister API
        
        The new lister has incremental and full listing capability.
        It can request the Bitbucket API in anonymous and HTTP basic authentication
        modes. Rate-limiting is not aggressive and is handled.
        Listing mode, credentials and pagination parameters can be updated after
        creation.

    Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/96/ See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/96/console

  • Removed commented out code

  • Build has FAILED

    Patch application report for D4867 (id=17239)

    Could not rebase; Attempt merge onto c7822752...

    Updating c782275..480caad
    Fast-forward
     requirements-test.txt                              |   1 +
     swh/lister/bitbucket/__init__.py                   |   9 +-
     swh/lister/bitbucket/lister.py                     | 269 +++++--
     swh/lister/bitbucket/models.py                     |  16 -
     swh/lister/bitbucket/tasks.py                      |  68 +-
     swh/lister/bitbucket/tests/conftest.py             |   2 +-
     .../tests/data/bb_api_repositories_page1.json      | 124 ++++
     .../tests/data/bb_api_repositories_page2.json      | 123 ++++
     ...ies,after=1970-01-01T00:00:00+00:00,pagelen=100 | 806 ---------------------
     .../https_api.bitbucket.org/empty_response.json    |   4 -
     .../data/https_api.bitbucket.org/response.json     |   1 -
     swh/lister/bitbucket/tests/test_lister.py          | 290 +++++---
     swh/lister/bitbucket/tests/test_tasks.py           |  81 +--
     swh/lister/pypi/__init__.py                        |   3 +-
     swh/lister/pypi/lister.py                          | 131 ++--
     swh/lister/pypi/tasks.py                           |  10 +-
     swh/lister/pypi/tests/test_lister.py               |  53 ++
     swh/lister/pypi/tests/test_tasks.py                |   5 +-
     18 files changed, 827 insertions(+), 1169 deletions(-)
     delete mode 100644 swh/lister/bitbucket/models.py
     create mode 100644 swh/lister/bitbucket/tests/data/bb_api_repositories_page1.json
     create mode 100644 swh/lister/bitbucket/tests/data/bb_api_repositories_page2.json
     delete mode 100644 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/2.0_repositories,after=1970-01-01T00:00:00+00:00,pagelen=100
     delete mode 100644 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/empty_response.json
     delete mode 120000 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/response.json
    Changes applied before test
    commit 480caadc455e282441ff56b81bb931e96fb35149
    Author: tenma <tenma+swh@mailbox.org>
    Date:   Thu Jan 14 18:47:26 2021 +0100
    
        [WIP] Reimplement PyPI lister using new Lister API
        
        The new lister has only full listing capability.
        It scrapes pypi.org list of packages.
        Rate-limiting was not encountered but is handled generically.
    
    commit 4dd90ca2f489f406ef924daad33832a38fef96b1
    Author: tenma <tenma+swh@mailbox.org>
    Date:   Wed Jan 13 15:44:07 2021 +0100
    
        [WIP] Reimplement Bitbucket lister using new Lister API
        
        The new lister has incremental and full listing capability.
        It can request the Bitbucket API in anonymous and HTTP basic authentication
        modes. Rate-limiting is not aggressive and is handled.
        Listing mode, credentials and pagination parameters can be updated after
        creation.

    Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/97/ See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/97/console

  • Fix offending test, thanks anlambert

  • Build is green

    Patch application report for D4867 (id=17240)

    Could not rebase; Attempt merge onto c7822752...

    Updating c782275..083e858
    Fast-forward
     requirements-test.txt                              |   1 +
     swh/lister/bitbucket/__init__.py                   |   9 +-
     swh/lister/bitbucket/lister.py                     | 269 +++++--
     swh/lister/bitbucket/models.py                     |  16 -
     swh/lister/bitbucket/tasks.py                      |  68 +-
     swh/lister/bitbucket/tests/conftest.py             |   2 +-
     .../tests/data/bb_api_repositories_page1.json      | 124 ++++
     .../tests/data/bb_api_repositories_page2.json      | 123 ++++
     ...ies,after=1970-01-01T00:00:00+00:00,pagelen=100 | 806 ---------------------
     .../https_api.bitbucket.org/empty_response.json    |   4 -
     .../data/https_api.bitbucket.org/response.json     |   1 -
     swh/lister/bitbucket/tests/test_lister.py          | 290 +++++---
     swh/lister/bitbucket/tests/test_tasks.py           |  81 +--
     swh/lister/pypi/__init__.py                        |   3 +-
     swh/lister/pypi/lister.py                          | 131 ++--
     swh/lister/pypi/tasks.py                           |  10 +-
     swh/lister/pypi/tests/test_lister.py               |  53 ++
     swh/lister/pypi/tests/test_tasks.py                |   9 +-
     18 files changed, 829 insertions(+), 1171 deletions(-)
     delete mode 100644 swh/lister/bitbucket/models.py
     create mode 100644 swh/lister/bitbucket/tests/data/bb_api_repositories_page1.json
     create mode 100644 swh/lister/bitbucket/tests/data/bb_api_repositories_page2.json
     delete mode 100644 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/2.0_repositories,after=1970-01-01T00:00:00+00:00,pagelen=100
     delete mode 100644 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/empty_response.json
     delete mode 120000 swh/lister/bitbucket/tests/data/https_api.bitbucket.org/response.json
    Changes applied before test
    commit 083e8585a50e3d2ece89e6f001befa0561d6312f
    Author: tenma <tenma+swh@mailbox.org>
    Date:   Thu Jan 14 18:47:26 2021 +0100
    
        [WIP] Reimplement PyPI lister using new Lister API
        
        The new lister has only full listing capability.
        It scrapes pypi.org list of packages.
        Rate-limiting was not encountered but is handled generically.
    
    commit 4dd90ca2f489f406ef924daad33832a38fef96b1
    Author: tenma <tenma+swh@mailbox.org>
    Date:   Wed Jan 13 15:44:07 2021 +0100
    
        [WIP] Reimplement Bitbucket lister using new Lister API
        
        The new lister has incremental and full listing capability.
        It can request the Bitbucket API in anonymous and HTTP basic authentication
        modes. Rate-limiting is not aggressive and is handled.
        Listing mode, credentials and pagination parameters can be updated after
        creation.

    See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/98/ for more details.

  • Rebase

  • Build is green

    Patch application report for D4867 (id=17241)

    Rebasing onto c7822752...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 9de12fb5d2fef03177be38180a8cdd851a79a6a5
    Author: tenma <tenma+swh@mailbox.org>
    Date:   Thu Jan 14 18:47:26 2021 +0100
    
        [WIP] Reimplement PyPI lister using new Lister API
        
        The new lister has only full listing capability.
        It scrapes pypi.org list of packages.
        Rate-limiting was not encountered but is handled generically.

    See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/99/ for more details.

  • Use requests.Response.raise_for_status to handle errors generically

  • Build is green

    Patch application report for D4867 (id=17245)

    Rebasing onto c7822752...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 46a216ee820f87e88c7104b8c8e991f02813e6f8
    Author: tenma <tenma+swh@mailbox.org>
    Date:   Thu Jan 14 18:47:26 2021 +0100
    
        [WIP] Reimplement PyPI lister using new Lister API
        
        The new lister has only full listing capability.
        It scrapes pypi.org list of packages.
        Rate-limiting was not encountered but is handled generically.

    See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/100/ for more details.

  • Thanks!

    I have made a few comments inline. All in all, as this lister makes a single request, I think we can drop the whole ratelimit handling logic which will make it much, much simpler.

    You'll also need to remove the models.py file which should not be used any longer, as well as clean up the commented / stringified old code.

  • Merge request was returned for changes

  • In short, let's KISS!

    • remove useless code as per feeback
    • improve tests
  • Build is green

    Patch application report for D4867 (id=17290)

    Rebasing onto c7822752...

    Current branch diff-target is up to date.
    Changes applied before test
    commit c087c93e8e2a81d93b846c50dff3d293e54ac713
    Author: tenma <tenma+swh@mailbox.org>
    Date:   Thu Jan 14 18:47:26 2021 +0100
    
        [WIP] Reimplement PyPI lister using new Lister API
        
        The new lister has only full listing capability.
        It scrapes pypi.org list of packages.
        Rate-limiting was not encountered but is handled generically.

    See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/108/ for more details.

  • Looks fine to me, thanks. Please apply @anlambert's remaining comments before landing! :)

  • Merge request was accepted

  • Nicolas Dandrimont approved this merge request

    approved this merge request

  • Clean code and fix page type

  • Build is green

    Patch application report for D4867 (id=17331)

    Rebasing onto 9fd91f00...

    Current branch diff-target is up to date.
    Changes applied before test
    commit a0ef8dfd795e96958a07847a6bd101f8dac1d9f8
    Author: tenma <tenma+swh@mailbox.org>
    Date:   Thu Jan 14 18:47:26 2021 +0100
    
        [WIP] Reimplement PyPI lister using new Lister API
        
        The new lister has only full listing capability.
        It scrapes pypi.org list of packages.
        Rate-limiting was not encountered but is handled generically.

    See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/122/ for more details.

  • Remove "WIP" from the commit message Remove the lister model

  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
Please register or sign in to reply
Loading