
mercurial.loader: Make it run within docker

  • [1] Without this diff, the mercurial loader fails when run within docker, with multiple errors (each one only appears once the previous one is fixed):
  • TypeError: can not serialize 'map' object
  • TypeError: can not serialize 'set' object

So this diff fixes those:

  • map objects cannot be serialized when calling storage.content_missing
  • set objects cannot be serialized when calling storage.{revision|release}_missing
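Both errors share one cause: the storage client has to serialize its call arguments before sending them over the wire, and lazy iterators (map objects) and sets are not types its serializer accepts. The sketch below is illustrative only. It uses the standard-library json module as a stand-in for the real serializer because json rejects the same two types. The names encode_request and is_encodable are hypothetical, not part of swh.core.

```python
import json
from typing import Any, Dict, List, Set


def encode_request(payload: Any) -> str:
    """Hypothetical stand-in for an RPC client's request encoder.

    json, like most wire formats, only understands concrete containers
    such as lists and dicts, so map objects and sets raise TypeError.
    """
    return json.dumps(payload)


def is_encodable(payload: Any) -> bool:
    """Return True if encode_request accepts the payload."""
    try:
        encode_request(payload)
    except TypeError:
        return False
    return True


contents: List[Dict[str, str]] = [{"sha1": "a1"}, {"sha1": "b2"}]
revision_ids: Set[str] = {"r1", "r2"}

# Shapes that fail: a lazy map object and a set.
assert not is_encodable(map(lambda c: c["sha1"], contents))
assert not is_encodable(revision_ids)

# Shapes that work: a list comprehension instead of map, and an explicit
# list() wrap around the set (the two approaches taken by the commits here).
assert is_encodable([c["sha1"] for c in contents])
assert is_encodable(list(revision_ids))
```

A list is the simplest fix because every serializer handles it. The only cost is materializing the values in memory, which these call sites already do when building the set.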

I have no idea why the tests do not catch any of these issues, though. I'm just unblocking this so people can run it within docker.
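One plausible explanation, not confirmed in this thread, is that the tests call an in-process storage directly, so arguments are never serialized and a map or set goes through unnoticed. The sketch below shows a hypothetical test wrapper that would surface this class of bug. It round-trips every call's arguments through json before delegating. InMemoryStorage and SerializingProxy are illustrative names, not swh APIs.

```python
import json
from typing import Any, Callable, Iterable, List, Set


class InMemoryStorage:
    """Hypothetical in-process storage: it accepts any iterable, so a map
    or set argument works here even though it fails over RPC."""

    def __init__(self) -> None:
        self.known: Set[str] = {"a1"}

    def content_missing(self, ids: Iterable[str]) -> List[str]:
        return [i for i in ids if i not in self.known]


class SerializingProxy:
    """Hypothetical test wrapper mimicking a remote client: every call's
    arguments are encoded and decoded before reaching the backend."""

    def __init__(self, backend: Any) -> None:
        self._backend = backend

    def __getattr__(self, name: str) -> Callable[..., Any]:
        method = getattr(self._backend, name)

        def call(*args: Any, **kwargs: Any) -> Any:
            # json.dumps raises TypeError for map objects and sets, much as
            # the docker setup raised "can not serialize ..." errors.
            wire = json.dumps({"args": args, "kwargs": kwargs})
            decoded = json.loads(wire)
            return method(*decoded["args"], **decoded["kwargs"])

        return call


storage = SerializingProxy(InMemoryStorage())
print(storage.content_missing(["a1", "b2"]))  # ['b2']
# storage.content_missing(map(str, ["a1", "b2"])) would raise TypeError
```

Running the existing loader tests against such a wrapper, instead of the bare in-process backend, would presumably have failed on the map and set arguments fixed here.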

  • [1] The initial problem was along those lines (exactly like D3258#79482):
- swh.core.api.RemoteException: <RemoteException 500 AttributeError: ["'dict' object has no attribute 'url'"]>

where self.origin, as written to storage, was a dict instead of an Origin model object [1].

That error is gone with loader-core v0.2.0 or later.
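The AttributeError above comes from storage code reading origin.url, which a plain dict does not support. The sketch below reproduces that failure mode and shows one way to normalize a dict into an object first. It is an illustration under stated assumptions, not how loader-core v0.2.0 fixed it: Origin here is a minimal hypothetical stand-in for the real model class, and origin_url and to_origin are illustrative helpers.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass(frozen=True)
class Origin:
    """Minimal hypothetical stand-in for a model Origin object."""

    url: str


def origin_url(origin: Any) -> str:
    # Mirrors the failing storage line: it reads the `url` attribute,
    # which raises AttributeError when `origin` is a plain dict.
    return origin.url


def to_origin(origin: Union[Dict[str, Any], Origin]) -> Origin:
    """Convert a legacy dict into an Origin object; pass objects through."""
    if isinstance(origin, dict):
        return Origin(url=origin["url"])
    return origin


legacy = {"url": "https://www.mercurial-scm.org/repo/evolve/"}
print(origin_url(to_origin(legacy)))
# origin_url(legacy) raises: 'dict' object has no attribute 'url'
```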

Test Plan

tox + run on docker:

docker-compose.override.yml:

version: '2'

services:
  swh-loader:
    volumes:
      # - "$SWH_ENVIRONMENT_HOME/swh-loader-core:/src/swh-loader-core"
      - "$SWH_ENVIRONMENT_HOME/swh-loader-mercurial:/src/swh-loader-mercurial"

$ doco up
$ doco exec swh-loader run mercurial https://www.mercurial-scm.org/repo/evolve/

Finally:

$ time doco exec swh-loader swh loader run mercurial https://www.mercurial-scm.org/repo/evolve/
WARNING:swh.core.cli:Could not load subcommand search: cannot import name 'get_journal_client' from 'swh.journal.cli' (/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/journal/cli.py)
INFO:swh.core.config:Loading config file /loader.yml
WARNING:swh.loader.mercurial.Bundle20Loader:No matching revision for tag 5.6.1 (hg changeset: 70694b2621ba9d919bc38303f8901e84caf5da0f). Skipping
{'status': 'eventful'}
docker-compose exec swh-loader swh loader run mercurial   0.59s user 0.61s system 0% cpu 2:10.59 total
$  time doco exec swh-loader swh loader run mercurial https://www.mercurial-scm.org/repo/evolve/
WARNING:swh.core.cli:Could not load subcommand search: cannot import name 'get_journal_client' from 'swh.journal.cli' (/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/journal/cli.py)
INFO:swh.core.config:Loading config file /loader.yml
WARNING:swh.loader.mercurial.Bundle20Loader:No matching revision for tag 5.6.1 (hg changeset: 70694b2621ba9d919bc38303f8901e84caf5da0f). Skipping
{'status': 'uneventful'}
docker-compose exec swh-loader swh loader run mercurial   0.59s user 0.53s system 2% cpu 40.954 total

Migrated from D3258 on Phabricator.

  • Build is green

    Patch application report for D3258 (id=11549)

    Rebasing onto 03c34b9e...

    Current branch diff-target is up to date.
    Changes applied before test
    commit f20891013265ed64094e763c75cf2a4d3ff330cd
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Wed Jun 10 16:26:34 2020 +0200
    
        mercurial.loader: Add missing type annotation to respect base class

    See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/7/ for more details.

    • mercurial.loader: Use list comprehension over map
    • mercurial.loader: Wrap list when calling _missing endpoints
  • Build is green

    Patch application report for D3258 (id=11550)

    Rebasing onto 03c34b9e...

    Current branch diff-target is up to date.
    Changes applied before test
    commit f1866671a417194e94a158583924985fedaae293
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Wed Jun 10 16:49:35 2020 +0200
    
        mercurial.loader: Wrap list when calling <object>_missing endpoints
        
        Prior to this commit, those calls were raising type error:
        
        ```
        TypeError: can not serialize 'set' object
        ```
    
    commit 1cbcc8ddb59ed8c6a37df78ad003a58a80f2cdc4
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Wed Jun 10 16:37:09 2020 +0200
    
        mercurial.loader: Use list comprehension over map
        
        Prior to this commit, map was raising type error during serialization step
        
        ```
        TypeError: can not serialize 'map' object
        ```
    
    commit f20891013265ed64094e763c75cf2a4d3ff330cd
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Wed Jun 10 16:26:34 2020 +0200
    
        mercurial.loader: Add missing type annotation to respect base class

    See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/8/ for more details.

  • Looks good to me. I have just tested with docker: prior to this diff, the mercurial loader was failing with this error:

    swh-loader_1                    | [2020-06-10 15:19:48,291: ERROR/ForkPoolWorker-1] Task swh.loader.mercurial.tasks.LoadMercurial[92e86f02-f56c-4cdd-8c59-580d9850b739] raised unexpected: RemoteException({'type': 'AttributeError', 'args': ["'dict' object has no attribute 'url'"], 'message': "'dict' object has no attribute 'url'", 'traceback': ['Traceback (most recent call last):\n', '  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request\n    rv = self.dispatch_request()\n', '  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request\n    return self.view_functions[rule.endpoint](**req.view_args)\n', '  File "<decorator-gen-110>", line 2, in origin_add_one\n', '  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/negotiation.py", line 148, in _negotiate\n    return f.negotiator(*args, **kwargs)\n', '  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/negotiation.py", line 82, in __call__\n    result = self.func(*args, **kwargs)\n', '  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 453, in _f\n    return obj_meth(**kw)\n', '  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/metrics.py", line 24, in d\n    return f(*a, **kw)\n', '  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/db/common.py", line 62, in _meth\n    return meth(self, *args, db=db, cur=cur, **kwargs)\n', '  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/storage.py", line 1201, in origin_add_one\n    origin_row = list(db.origin_get_by_url([origin.url], cur))[0]\n', "AttributeError: 'dict' object has no attribute 'url'\n"]})
    swh-loader_1                    | Traceback (most recent call last):
    swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 412, in trace_task
    swh-loader_1                    |     R = retval = fun(*args, **kwargs)
    swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/scheduler/task.py", line 51, in __call__
    swh-loader_1                    |     result = super().__call__(*args, **kwargs)
    swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 704, in __protected_call__
    swh-loader_1                    |     return self.run(*args, **kwargs)
    swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/mercurial/tasks.py", line 22, in load_hg
    swh-loader_1                    |     return loader.load()
    swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 293, in load
    swh-loader_1                    |     self._store_origin_visit()
    swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 170, in _store_origin_visit
    swh-loader_1                    |     self.storage.origin_add_one(self.origin)
    swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 181, in meth_
    swh-loader_1                    |     return self.post(meth._endpoint_path, post_data)
    swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 278, in post
    swh-loader_1                    |     return self._decode_response(response)
    swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 352, in _decode_response
    swh-loader_1                    |     self.raise_for_status(response)
    swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/api/client.py", line 30, in raise_for_status
    swh-loader_1                    |     super().raise_for_status(response)
    swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 342, in raise_for_status
    swh-loader_1                    |     raise exception from None
    swh-loader_1                    | swh.core.api.RemoteException: <RemoteException 500 AttributeError: ["'dict' object has no attribute 'url'"]>
    

    Applying arc patch !113 in swh-loader-mercurial and using it through docker-compose.override.yml makes the issue go away.

    I think it is time to add a docker test for the mercurial loader, as we currently only have one for the git loader.

  • Merge request was accepted

  • Antoine Lambert approved this merge request


  • > I think it is time to add a docker test for the mercurial loader, as we currently only have one for the git loader.

    I agree, but my understanding is that it will be rewritten completely soon. So maybe not immediately ;)

    I have fixed it so @azecar (irc) could work without first having to debug this ;)

  • Merge request was merged
