packagist: Reimplement lister using new Lister API

The previous implementation was generating tasks for a non-implemented Packagist loader.

The new implementation extracts the source repository URL, VCS type and last update date for each package referenced by Packagist and sends that information to the scheduler.
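As an illustration of the extraction step, here is a minimal sketch. It assumes each package's metadata is a list of version entries carrying a `source` object (`type`, `url`) and an ISO 8601 `time` field with a UTC offset. The `ListedPackage` record and the `extract_origin` helper are hypothetical names and are not taken from the actual lister code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class ListedPackage:
    """Hypothetical record for what gets forwarded to the scheduler."""

    url: str
    visit_type: str
    last_update: Optional[datetime]


def extract_origin(versions: List[Dict[str, Any]]) -> Optional[ListedPackage]:
    """Return the source URL, VCS type and latest update date of a package.

    Assumption: a package without any usable source URL is skipped
    (None is returned) rather than reported.
    """
    url = None
    visit_type = None
    last_update = None
    for version in versions:
        source = version.get("source") or {}
        # Keep the first complete (url, type) pair found.
        if url is None and source.get("url") and source.get("type"):
            url, visit_type = source["url"], source["type"]
        # Track the most recent "time" value across all versions.
        time_str = version.get("time")
        if time_str:
            updated = datetime.fromisoformat(time_str)
            if last_update is None or updated > last_update:
                last_update = updated
    if url is None:
        return None
    return ListedPackage(url, visit_type, last_update)
```

The VCS type maps directly onto the visit type (git, hg or svn), and a package whose versions carry no `time` field is still returned, with `last_update` left as `None`.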

Package metadata is retrieved using Packagist API endpoints whose responses are served from static files, which are guaranteed to be efficient on the Packagist side (no dynamic queries). Furthermore, subsequent listings will send the If-Modified-Since HTTP header to retrieve only package metadata updated since the previous listing operation, in order to save bandwidth and return only origins which might have newly released versions.
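The conditional request described above could look roughly like the following sketch, which uses only the Python standard library. The helper names `conditional_headers` and `fetch_if_modified` are hypothetical, and the actual lister may well rely on a different HTTP client.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, Optional
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def conditional_headers(last_listing: Optional[datetime]) -> Dict[str, str]:
    """Build the If-Modified-Since header from the previous listing date.

    No header is sent on the first listing (last_listing is None).
    A naive datetime would be interpreted as local time by astimezone().
    """
    if last_listing is None:
        return {}
    as_utc = last_listing.astimezone(timezone.utc)
    # HTTP dates are always expressed in GMT.
    return {"If-Modified-Since": format_datetime(as_utc, usegmt=True)}


def fetch_if_modified(url: str, last_listing: Optional[datetime]) -> Optional[bytes]:
    """Fetch url, returning None when the server answers 304 Not Modified."""
    request = Request(url, headers=conditional_headers(last_listing))
    try:
        with urlopen(request) as response:
            return response.read()
    except HTTPError as error:
        # urllib reports 304 as an HTTPError; treat it as "nothing new".
        if error.code == 304:
            return None
        raise
```

With this shape, a package whose metadata file has not changed since the last listing costs one small request and yields no origin.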

I tested the lister extensively yesterday and it worked without any issues each time I executed it. The first execution took around 90 minutes and listed 286510 origins with three different visit types: git, hg and svn. Subsequent calls took less time thanks to the If-Modified-Since HTTP header and only returned packages modified since the last listing.

Closes #2991 (closed)


Migrated from D4990 (view on Phabricator)

Activity

  • Build is green

    Patch application report for D4990 (id=17798)

    Rebasing onto 8e4dd178...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 478081c1513b240f85c78cc66e9a3109eff91608
    Author: Antoine Lambert <antoine.lambert@inria.fr>
    Date:   Mon Feb 1 17:34:10 2021 +0100
    
        packagist: Reimplement lister using new Lister API
        
        The previous implementation was generating tasks for a non-implemented
        Packagist loader.
        
        The new implementation extracts source repository URL, VCS type and
        last update date for each package referenced by Packagist and sends
        that information to the scheduler.
        
        Packages metadata are retrieved using Packagist API endpoints whose
        responses are served from static files, which are guaranteed to be
        efficient on the Packagist side (no dynamic queries).
        Furthermore, subsequent listing will send the "If-Modified-Since" HTTP
        header to only retrieve packages metadata updated since the previous
        listing operation in order to save bandwidth and return only origins
        which might have new released versions.
        
        Closes #2991

    See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/234/ for more details.

  • lgtm

    But it's missing some coverage on conditionals (according to Jenkins).

    Maybe simply enrich the current test dataset with some of those skipped packages from the new dataset you added (one Bitbucket entry, another with a missing origin_url, another with a missing time, etc.)

  • Author Maintainer

    In !204 (closed), @ardumont wrote: lgtm

    But it's missing some coverage on conditionals (according to Jenkins).

    Maybe simply enrich the current test dataset with some of those skipped packages from the new dataset you added (one Bitbucket entry, another with a missing origin_url, another with a missing time, etc.)

    Ack, will improve coverage then.

  • Author Maintainer

    Rebase and improve coverage

  • Build is green

    Patch application report for D4990 (id=17810)

    Rebasing onto 82ab96ad...

    Current branch diff-target is up to date.
    Changes applied before test
    commit ff05191b7db7b217c8682e9888338b8813e2df6a
    Author: Antoine Lambert <antoine.lambert@inria.fr>
    Date:   Mon Feb 1 17:34:10 2021 +0100
    
        packagist: Reimplement lister using new Lister API
        
        The previous implementation was generating tasks for a non-implemented
        Packagist loader.
        
        The new implementation extracts source repository URL, VCS type and
        last update date for each package referenced by Packagist and sends
        that information to the scheduler.
        
        Packages metadata are retrieved using Packagist API endpoints whose
        responses are served from static files, which are guaranteed to be
        efficient on the Packagist side (no dynamic queries).
        Furthermore, subsequent listing will send the "If-Modified-Since" HTTP
        header to only retrieve packages metadata updated since the previous
        listing operation in order to save bandwidth and return only origins
        which might have new released versions.
        
        Closes #2991

    See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/238/ for more details.

  • Antoine R. Dumont mentioned in merge request !357 (closed)

  • Thanks.

    (I had forgotten to actually validate it ¯_(ツ)_/¯ )

  • Merge request was accepted

  • Antoine R. Dumont approved this merge request

  • Author Maintainer

    Merge request was merged
