Hackage: Loads Hackage Listed origins

The loader makes an HTTP API call to retrieve the versions of a package. It then downloads the tar.gz archive for each version.
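
As a rough illustration of the second step, the sketch below turns a package's version listing into one archive URL per version. The function name `archive_urls` is hypothetical, and the tarball URL layout is an assumption about Hackage rather than something taken from this diff.

```python
from typing import Dict, List

# Assumed base URL for package pages and archives.
HACKAGE_BASE = "https://hackage.haskell.org/package"


def archive_urls(package: str, versions: Dict[str, str]) -> List[str]:
    """Build one tar.gz URL per version listed for a package.

    `versions` mirrors the JSON object returned for a package,
    for example {"0.1": "normal", "0.1.1": "normal"}.
    """
    urls = []
    for version in versions:
        name = f"{package}-{version}"
        # Assumed layout: <base>/<pkg>-<version>/<pkg>-<version>.tar.gz
        urls.append(f"{HACKAGE_BASE}/{name}/{name}.tar.gz")
    return urls
```

The real loader may build these URLs differently; the point is only that the version listing drives one download per version.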


Migrated from D8379 (view on Phabricator)

  • Build has FAILED

    Patch application report for D8379 (id=30240)

    Rebasing onto 68e68e3f...

    First, rewinding head to replay your work on top of it...
    Applying: Hackage: Loads Hackage Listed origins
    Using index info to reconstruct a base tree...
    M	setup.py
    Falling back to patching base and 3-way merge...
    Auto-merging setup.py
    CONFLICT (content): Merge conflict in setup.py
    Patch failed at 0001 Hackage: Loads Hackage Listed origins
    
    Resolve all conflicts manually, mark them as resolved with
    "git add/rm <conflicted_files>", then run "git rebase --continue".
    You can instead skip this commit: run "git rebase --skip".
    To abort and get back to the state before "git rebase", run "git rebase --abort".
    

    Rebase failed (ret=1)!

    Could not rebase; Attempt merge onto 68e68e3f...

    Already up to date.
    Changes applied before test
    commit 78b960bcb579a682cb89182734d1c77324bae904
    Author: Franck Bret <franck.bret@octobus.net>
    Date:   Fri Sep 2 09:06:15 2022 +0200
    
        Hackage: Loads Hackage Listed origins
        
        The loader make an http api call to retrieve package related versions.
        It then download tar.gz archive for each version.

    Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/850/ See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/850/console

  • Add Hackage loader entry to Package Loader specifications documentation

  • Build is green

    Patch application report for D8379 (id=30243)

    Rebasing onto 68e68e3f...

    First, rewinding head to replay your work on top of it...
    Applying: Hackage: Loads Hackage Listed origins
    Using index info to reconstruct a base tree...
    M	docs/package-loader-specifications.rst
    M	setup.py
    Falling back to patching base and 3-way merge...
    Auto-merging setup.py
    CONFLICT (content): Merge conflict in setup.py
    Auto-merging docs/package-loader-specifications.rst
    Patch failed at 0001 Hackage: Loads Hackage Listed origins
    
    Resolve all conflicts manually, mark them as resolved with
    "git add/rm <conflicted_files>", then run "git rebase --continue".
    You can instead skip this commit: run "git rebase --skip".
    To abort and get back to the state before "git rebase", run "git rebase --abort".
    

    Rebase failed (ret=1)!

    Could not rebase; Attempt merge onto 68e68e3f...

    Already up to date.
    Changes applied before test
    commit d8b15a65dbded63842c31f3a447c2d488e4e39a8
    Author: Franck Bret <franck.bret@octobus.net>
    Date:   Fri Sep 2 09:06:15 2022 +0200
    
        Hackage: Loads Hackage Listed origins
        
        The loader make an http api call to retrieve package related versions.
        It then download tar.gz archive for each version.

    See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/851/ for more details.

  • @ardumont @vlorentz Did not find a way to get a release date per version with this one.

  • ! In !313 (closed), @franckbret wrote: @ardumont @vlorentz Did not find a way to get a release date per version with this one.

    EDIT: never mind, it's not per-version

  • Found a way:

    $ curl https://hackage.haskell.org/package/colors-0.1/revisions/ -H "Accept: application/json"   
    [{"number":0,"time":"2013-06-01T13:59:19Z","user":"FumiakiKinoshita"}]
  • ! In !313 (closed), @vlorentz wrote: Found a way:

    $ curl https://hackage.haskell.org/package/colors-0.1/revisions/ -H "Accept: application/json"   
    [{"number":0,"time":"2013-06-01T13:59:19Z","user":"FumiakiKinoshita"}]

    Yep, looks good. I had explored it but was misled by the results of the revisions endpoint, which did not return the same number of entries as the version listing:

    (.venv) franck@debian-franck:~/playground/swh$ http -b --json https://hackage.haskell.org/package/colors
    {
        "0.1": "normal",
        "0.1.1": "normal",
        "0.2": "normal",
        "0.2.0.1": "normal",
        "0.3": "normal",
        "0.3.0.1": "normal",
        "0.3.0.2": "normal"
    }
    
    (.venv) franck@debian-franck:~/playground/swh$ http -b --json https://hackage.haskell.org/package/colors/revisions/
    [
        {
            "number": 0,
            "time": "2015-02-23T03:50:39Z",
            "user": "FumiakiKinoshita"
        },
        {
            "number": 1,
            "time": "2015-06-06T14:20:04Z",
            "user": "FumiakiKinoshita"
        },
        {
            "number": 2,
            "time": "2022-08-27T11:55:04Z",
            "user": "FumiakiKinoshita"
        }
    ]
    

    Reading more at https://github.com/haskell-infra/hackage-trustees/blob/master/revisions-information.md I understand that a version can have several revisions, each with its own date. Also, calling https://hackage.haskell.org/package/colors/revisions/ returns the same results as for the latest version.

    (.venv) franck@debian-franck:~/playground/swh$ http -b --json https://hackage.haskell.org/package/colors-0.3.0.2/revisions/
    [
        {
            "number": 0,
            "time": "2015-02-23T03:50:39Z",
            "user": "FumiakiKinoshita"
        },
        {
            "number": 1,
            "time": "2015-06-06T14:20:04Z",
            "user": "FumiakiKinoshita"
        },
        {
            "number": 2,
            "time": "2022-08-27T11:55:04Z",
            "user": "FumiakiKinoshita"
        }
    ]
     
    

    For the colors package I've checked all versions/revisions and it looks good. I will now implement an API call for each version to get its last revision date.
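
A minimal sketch of the "last revision date" step, assuming the revisions payload has the shape shown in the outputs above (a list of objects with `number`, `time` and `user`). The helper name `last_revision_date` is hypothetical and not taken from the diff.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Union

Revision = Dict[str, Union[int, str]]


def last_revision_date(revisions: List[Revision]) -> Optional[datetime]:
    """Return the most recent revision time, or None when the list is empty."""
    times = [
        # Timestamps look like "2015-02-23T03:50:39Z"; treat them as UTC.
        datetime.strptime(str(rev["time"]), "%Y-%m-%dT%H:%M:%SZ").replace(
            tzinfo=timezone.utc
        )
        for rev in revisions
    ]
    return max(times, default=None)
```

Taking the maximum rather than the last element avoids relying on the endpoint returning revisions in order.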

  • Ensure we make a JSON GET request with the correct headers
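
For illustration only, here is one way to prepare such a request with the standard library, mirroring the `Accept: application/json` header used in the curl call earlier in the thread. The function name `revisions_request` is hypothetical, and the loader itself may well use a different HTTP client.

```python
import urllib.request


def revisions_request(package: str, version: str) -> urllib.request.Request:
    """Prepare (without sending) a GET request for a version's revisions."""
    url = f"https://hackage.haskell.org/package/{package}-{version}/revisions/"
    # The Accept header asks for JSON, as in the curl example in this thread.
    return urllib.request.Request(url, headers={"Accept": "application/json"})
```

The request object can then be passed to `urllib.request.urlopen` when a network call is actually wanted.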

  • Build is green

    Patch application report for D8379 (id=30253)

    Rebasing onto 68e68e3f...

    First, rewinding head to replay your work on top of it...
    Applying: Hackage: Loads Hackage Listed origins
    Using index info to reconstruct a base tree...
    M	docs/package-loader-specifications.rst
    M	setup.py
    Falling back to patching base and 3-way merge...
    Auto-merging setup.py
    CONFLICT (content): Merge conflict in setup.py
    Auto-merging docs/package-loader-specifications.rst
    Patch failed at 0001 Hackage: Loads Hackage Listed origins
    
    Resolve all conflicts manually, mark them as resolved with
    "git add/rm <conflicted_files>", then run "git rebase --continue".
    You can instead skip this commit: run "git rebase --skip".
    To abort and get back to the state before "git rebase", run "git rebase --abort".
    

    Rebase failed (ret=1)!

    Could not rebase; Attempt merge onto 68e68e3f...

    Already up to date.
    Changes applied before test
    commit 7bc768e7150d49bc5cec69c4f6afc1ab8d64b1ba
    Author: Franck Bret <franck.bret@octobus.net>
    Date:   Fri Sep 2 09:06:15 2022 +0200
    
        Hackage: Loads Hackage Listed origins
        
        The loader make an http api call to retrieve package related versions.
        It then download tar.gz archive for each version.

    See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/852/ for more details.

  • Cabal keys may be Capitalized

    Just discovered while running the loader in docker that some .cabal files may have their key names capitalized. Handle that case and add a test.
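
A small sketch of one way to cope with capitalized keys: lowercase every key while reading the top-level `key: value` fields. This is a simplified reader written for illustration, not the parser from the diff, and `parse_cabal_fields` is a hypothetical name. It ignores indented lines, so multi-line values and stanza contents are skipped.

```python
import re
from typing import Dict

# A field name, a colon, then the value on the same line.
FIELD_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*)\s*:\s*(.*)$")


def parse_cabal_fields(text: str) -> Dict[str, str]:
    """Collect top-level `key: value` fields, lowercasing the keys."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        # Skip blank lines, indented lines (continuations, stanza bodies)
        # and `--` comments.
        if not line or line[0].isspace() or line.startswith("--"):
            continue
        match = FIELD_RE.match(line)
        if match:
            key, value = match.groups()
            # Keep the first occurrence of each normalized key.
            fields.setdefault(key.lower(), value.strip())
    return fields
```

With this normalization, `Name: colors` and `name: colors` both end up under the `name` key.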

  • Build is green

    Patch application report for D8379 (id=30254)

    Rebasing onto 68e68e3f...

    First, rewinding head to replay your work on top of it...
    Applying: Hackage: Loads Hackage Listed origins
    Using index info to reconstruct a base tree...
    M	docs/package-loader-specifications.rst
    M	setup.py
    Falling back to patching base and 3-way merge...
    Auto-merging setup.py
    CONFLICT (content): Merge conflict in setup.py
    Auto-merging docs/package-loader-specifications.rst
    Patch failed at 0001 Hackage: Loads Hackage Listed origins
    
    Resolve all conflicts manually, mark them as resolved with
    "git add/rm <conflicted_files>", then run "git rebase --continue".
    You can instead skip this commit: run "git rebase --skip".
    To abort and get back to the state before "git rebase", run "git rebase --abort".
    

    Rebase failed (ret=1)!

    Could not rebase; Attempt merge onto 68e68e3f...

    Already up to date.
    Changes applied before test
    commit 59765fb4a36b2b362d8fe09125471b65f6909284
    Author: Franck Bret <franck.bret@octobus.net>
    Date:   Fri Sep 2 09:06:15 2022 +0200
    
        Hackage: Loads Hackage Listed origins
        
        The loader make an http api call to retrieve package related versions.
        It then download tar.gz archive for each version.

    See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/853/ for more details.

  • Fill release date

    p_info now calls an HTTP API endpoint to retrieve the revision dates for a version and uses the most recent one to set the last_modified date. Add a fixture and adapt the tests.

  • Build is green

    Patch application report for D8379 (id=30272)

    Rebasing onto 68e68e3f...

    First, rewinding head to replay your work on top of it...
    Applying: Hackage: Loads Hackage Listed origins
    Using index info to reconstruct a base tree...
    M	docs/package-loader-specifications.rst
    M	setup.py
    Falling back to patching base and 3-way merge...
    Auto-merging setup.py
    CONFLICT (content): Merge conflict in setup.py
    Auto-merging docs/package-loader-specifications.rst
    Patch failed at 0001 Hackage: Loads Hackage Listed origins
    
    Resolve all conflicts manually, mark them as resolved with
    "git add/rm <conflicted_files>", then run "git rebase --continue".
    You can instead skip this commit: run "git rebase --skip".
    To abort and get back to the state before "git rebase", run "git rebase --abort".
    

    Rebase failed (ret=1)!

    Could not rebase; Attempt merge onto 68e68e3f...

    Already up to date.
    Changes applied before test
    commit c42cd46a89c60851b99c629cf9474f63ec8bbb5e
    Author: Franck Bret <franck.bret@octobus.net>
    Date:   Fri Sep 2 09:06:15 2022 +0200
    
        Hackage: Loads Hackage Listed origins
        
        The loader make an http api call to retrieve package related versions.
        It then download tar.gz archive for each version.

    See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/855/ for more details.

  • I've run the loader tasks for a few hours: no errors, some not_found.

    swh-scheduler=# select count(*) from origin_visit_stats where visit_type='hackage' and last_visit_status='not_found';                                                    
     count 
    -------
         8
    (1 row)
    
    swh-scheduler=# select count(*) from origin_visit_stats where visit_type='hackage' and last_visit_status='successful';                                                   
     count 
    -------
      9195
    (1 row)
    
    swh-scheduler=# select count(*) from origin_visit_stats where visit_type='hackage' and last_visit_status='failed';                                                       
     count 
    -------
         0
    
  • @vlorentz @ardumont Can we merge this one?

  • Build is green

    Patch application report for D8379 (id=30349)

    Rebasing onto 6cdf6d30...

    Current branch diff-target is up to date.
    Changes applied before test
    commit fb5ccdf5085998df8b4d20ef00fd337ea03b9c5c
    Author: Franck Bret <franck.bret@octobus.net>
    Date:   Fri Sep 2 09:06:15 2022 +0200
    
        Hackage: Loads Hackage Listed origins
        
        The loader make an http api call to retrieve package related versions.
        It then download tar.gz archive for each version.

    See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/863/ for more details.

  • Could you investigate the source of the not_found packages? It shouldn't happen unless the package was deleted between listing time and loading time, which seems unlikely here.

  • That diff requires some changes as the api_info function got renamed (see inline comments).

    Also while testing the loader in docker, I got a couple of errors on some packages, see below:

    docker-swh-loader-1  | [2022-09-26 13:06:42,922: ERROR/ForkPoolWorker-8] Failed to load branch releases/0.1.0 for https://hackage.haskell.org/package/numeric-qq
    docker-swh-loader-1  | Traceback (most recent call last):
    docker-swh-loader-1  |   File "/src/swh-loader-core/swh/loader/package/loader.py", line 672, in load
    docker-swh-loader-1  |     res = self._load_release(p_info, origin)
    docker-swh-loader-1  |   File "/src/swh-loader-core/swh/loader/package/loader.py", line 851, in _load_release
    docker-swh-loader-1  |     p_info, uncompressed_path, directory=directory.hash
    docker-swh-loader-1  |   File "/src/swh-loader-core/swh/loader/package/hackage/loader.py", line 171, in build_release
    docker-swh-loader-1  |     assert version == p_info.version
    docker-swh-loader-1  | AssertionError
    docker-swh-loader-1  | [2022-09-26 13:08:03,416: ERROR/ForkPoolWorker-11] Failed to load branch releases/1.0.0.0 for https://hackage.haskell.org/package/haskell2010
    docker-swh-loader-1  | Traceback (most recent call last):
    docker-swh-loader-1  |   File "/src/swh-loader-core/swh/loader/package/loader.py", line 672, in load
    docker-swh-loader-1  |     res = self._load_release(p_info, origin)
    docker-swh-loader-1  |   File "/src/swh-loader-core/swh/loader/package/loader.py", line 851, in _load_release
    docker-swh-loader-1  |     p_info, uncompressed_path, directory=directory.hash
    docker-swh-loader-1  |   File "/src/swh-loader-core/swh/loader/package/hackage/loader.py", line 172, in build_release
    docker-swh-loader-1  |     author = Person.from_fullname(intrinsic_metadata["author"].encode())
    docker-swh-loader-1  | KeyError: 'author'
    docker-swh-loader-1  | [2022-09-26 13:21:31,790: ERROR/ForkPoolWorker-40] Failed to load branch releases/0.1.0.0 for https://hackage.haskell.org/package/hs-inspector
    docker-swh-loader-1  | Traceback (most recent call last):
    docker-swh-loader-1  |   File "/src/swh-loader-core/swh/loader/package/loader.py", line 672, in load
    docker-swh-loader-1  |     res = self._load_release(p_info, origin)
    docker-swh-loader-1  |   File "/src/swh-loader-core/swh/loader/package/loader.py", line 851, in _load_release
    docker-swh-loader-1  |     p_info, uncompressed_path, directory=directory.hash
    docker-swh-loader-1  |   File "/src/swh-loader-core/swh/loader/package/hackage/loader.py", line 173, in build_release
    docker-swh-loader-1  |     description: str = intrinsic_metadata["synopsis"]
    docker-swh-loader-1  | KeyError: 'synopsis'
    docker-swh-loader-1  | Traceback (most recent call last):
    docker-swh-loader-1  |   File "/src/swh-loader-core/swh/loader/package/loader.py", line 672, in load
    docker-swh-loader-1  |     res = self._load_release(p_info, origin)
    docker-swh-loader-1  |   File "/src/swh-loader-core/swh/loader/package/loader.py", line 851, in _load_release
    docker-swh-loader-1  |     p_info, uncompressed_path, directory=directory.hash
    docker-swh-loader-1  |   File "/src/swh-loader-core/swh/loader/package/hackage/loader.py", line 170, in build_release
    docker-swh-loader-1  |     version: str = intrinsic_metadata["version"]
    docker-swh-loader-1  | KeyError: 'version'
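
The three KeyError tracebacks above come from indexing `intrinsic_metadata` directly. One possible way to tolerate missing fields is sketched below. The function `release_fields` and its fallback choices are assumptions made for illustration, not the fix adopted in the diff. The version mismatch behind the AssertionError is deliberately left out.

```python
from typing import Dict, Optional, Tuple


def release_fields(
    intrinsic_metadata: Dict[str, str], listed_version: str
) -> Tuple[str, Optional[str], str]:
    """Read version, author and synopsis without raising on missing keys."""
    # Fall back to the version known from the listing when the
    # metadata has no "version" field.
    version = intrinsic_metadata.get("version", listed_version)
    # A missing author is reported as None so the caller can decide
    # what to record instead.
    author = intrinsic_metadata.get("author")
    # A missing synopsis becomes an empty description.
    synopsis = intrinsic_metadata.get("synopsis", "")
    return version, author, synopsis
```

Whether an empty description or a missing author is acceptable for a release is a decision for the loader, so the caller still has to handle the `None` case explicitly.
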
  • Merge request was returned for changes
