Skip to content
Snippets Groups Projects

{Content|Directory}Loader: Register tasks

Related to T3781 Depends on !437 (closed)


Migrated from D8601 (view on Phabricator)

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Build is green

    Patch application report for D8601 (id=31059)

    Could not rebase; Attempt merge onto dbf7f3dc...

    Updating dbf7f3d..ccb8f36
    Fast-forward
     docs/README.rst                      |   7 ++-
     setup.py                             |   2 +
     swh/loader/core/__init__.py          |  27 ++++++++++
     swh/loader/core/loader.py            |  73 +++++++++++++------------
     swh/loader/core/tasks.py             |  20 +++++++
     swh/loader/core/tests/test_loader.py | 100 +++++++++++++++++++++++++----------
     swh/loader/core/tests/test_tasks.py  |  34 ++++++++++++
     7 files changed, 196 insertions(+), 67 deletions(-)
     create mode 100644 swh/loader/core/tasks.py
     create mode 100644 swh/loader/core/tests/test_tasks.py
    Changes applied before test
    commit ccb8f36834bd9a7463f7182321a4adfc7915c26f
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Sun Oct 2 19:18:23 2022 +0200
    
        {Content|Directory}Loader: Register tasks
        
        Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')
    
    commit 39c33a66c27c030c7ae3cfba4a393c2ced468fbc
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Fri Sep 30 15:10:49 2022 +0200
    
        {Content|Directory}Loader: Adapt support for checksums
        
        This adapts the content/directory loader implementations to use directly a checksums
        dict which is now sent by the listers.
        
        This improves the loader to check those checksums when retrieving the artifact (content
        or tarball). Thanks to a bump in the swh.model version, this is now able to deal with
        sha512 checksums checks as well.
        
        This also aligns with the current package loaders which now are also checking the
        integrity of the tarballs they ingest.
        
        Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')

    See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/944/ for more details.

  • I recalled it is also important to test the loader parameters when created from ListedOrigin instances so I have added this test below:

    diff --git a/swh/loader/core/tests/test_tasks.py b/swh/loader/core/tests/test_tasks.py
    index bf8d2e1..c0ab361 100644
    --- a/swh/loader/core/tests/test_tasks.py
    +++ b/swh/loader/core/tests/test_tasks.py
    @@ -3,8 +3,13 @@
     # License: GNU General Public License version 3, or any later version
     # See top-level LICENSE file for more information
     
    +import uuid
    +
     import pytest
     
    +from swh.scheduler.model import ListedOrigin, Lister
    +from swh.scheduler.utils import create_origin_task_dict
    +
     NAMESPACE = "swh.loader.core"
     
     
    @@ -32,3 +37,44 @@ def test_tasks_content_loader(
         assert res.successful()
         assert mock_load.called
         assert res.result == {"status": "eventful"}
    +
    +
    +@pytest.fixture
    +def nixguix_lister():
    +    return Lister(name="nixguix", instance_name="example", id=uuid.uuid4())
    +
    +
    +@pytest.mark.parametrize("loader_name", ["Content", "Directory"])
    +def test_tasks_loader_for_listed_origin(
    +    mocker,
    +    swh_scheduler_celery_app,
    +    swh_scheduler_celery_worker,
    +    swh_config,
    +    nixguix_lister,
    +    loader_name,
    +):
    +    mock_load = mocker.patch(f"{NAMESPACE}.loader.{loader_name}Loader.load")
    +    mock_load.return_value = {"status": "eventful"}
    +
    +    listed_origin = ListedOrigin(
    +        lister_id=nixguix_lister.id,
    +        url="https://example.org/archives",
    +        visit_type=loader_name.lower(),
    +        extra_loader_arguments={
    +            "origin_url": "https://example.org/artifact/artifact",
    +            "fallback_urls": ["https://example.org/mirror/artifact-0.0.1.pkg.xz"],
    +            "checksums": {"sha256": "some-valid-checksum"},
    +        },
    +    )
    +
    +    task_dict = create_origin_task_dict(listed_origin, nixguix_lister)
    +
    +    res = swh_scheduler_celery_app.send_task(
    +        f"{NAMESPACE}.tasks.Load{loader_name}",
    +        kwargs=task_dict["arguments"]["kwargs"],
    +    )
    +    assert res
    +    res.wait()
    +    assert res.successful()
    +    assert mock_load.called
    +    assert res.result == {"status": "eventful"}

    I got the following error when running tests:

    [2022-10-03 15:08:28,803: ERROR/MainProcess] Task swh.loader.core.tasks.LoadContent[5da7f77a-d698-44d4-b6e5-4a3b2d72e852] raised unexpected: TypeError("__init__() got an unexpected keyword argument 'url'")
    Traceback (most recent call last):
      File "/home/anlambert/.virtualenvs/swh/lib/python3.9/site-packages/celery/app/trace.py", line 451, in trace_task
        R = retval = fun(*args, **kwargs)
      File "/home/anlambert/swh/swh-environment/swh-scheduler/swh/scheduler/task.py", line 61, in __call__
        result = super().__call__(*args, **kwargs)
      File "/home/anlambert/.virtualenvs/swh/lib/python3.9/site-packages/celery/app/task.py", line 392, in __call__
        return self.run(*args, **kwargs)
      File "/home/anlambert/swh/swh-environment/swh-loader-core/swh/loader/core/tasks.py", line 14, in load_content
        return ContentLoader.from_configfile(**kwargs).load()
      File "/home/anlambert/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 188, in from_configfile
        return cls.from_config(**config)
      File "/home/anlambert/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 174, in from_config
        return cls(storage=storage_instance, **config)
      File "/home/anlambert/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 713, in __init__
        super().__init__(*args, **kwargs)
      File "/home/anlambert/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 675, in __init__
        super().__init__(*args, **kwargs)
    TypeError: __init__() got an unexpected keyword argument 'url'

    It seems some loader parameters should be renamed or made explicit or the loading task will fail to execute otherwise.

  • Merge request was returned for changes

  • ah! yeah, it's the first time we are scheduling ListedOrigins for non package loaders... Thanks for raising this up!

  • Fix constructor mess

  • Build is green

    Patch application report for D8601 (id=31075)

    Rebasing onto c631349a...

    Current branch diff-target is up to date.
    Changes applied before test
    commit a5255f1064533efc5f9032eb66ad7f12c2468f9b
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Sun Oct 2 19:18:23 2022 +0200
    
        {Content|Directory}Loader: Register tasks
        
        Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')

    See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/951/ for more details.

  • fwiw, this is working in docker ^.

  • Looks good to me, makes me think that I should deduplicate the tests for the tasks created from ListedOrigins in loaders through a fixture and add missing ones, I will push a diff for that.

  • Merge request was accepted

  • Antoine Lambert approved this merge request

    approved this merge request

  • Merge request was merged

Please register or sign in to reply
Loading