mercurial.loader: Make it run within docker
Without this diff, the mercurial loader fails when run within docker [1], with several errors appearing one after another:
- TypeError: can not serialize 'map' object
- TypeError: can not serialize 'set' object
So this diff fixes those:
- map is not ok when calling storage.content_missing
- set is not ok when calling storage.{revision|release}_missing
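A minimal sketch of why both calls fail and why wrapping in a list helps. It uses the standard-library json encoder purely as a stand-in for the serialization step of the remote storage client, so the error text differs from the one above; `encode` and `ids` are hypothetical names, not part of the loader.

```python
import json


def encode(payload):
    """Stand-in for the client-side serialization step (json, for illustration only)."""
    return json.dumps(payload)


ids = ["a1", "b2", "a1"]

# A lazy map object and a set are both rejected by the encoder.
failures = []
for bad in (map(str.upper, ids), set(ids)):
    try:
        encode(bad)
    except TypeError:
        failures.append(type(bad).__name__)

# The fix: materialize into a list before handing the value to the client.
from_map = encode([i.upper() for i in ids])
# sorted() is used here only to get a deterministic list out of the set.
from_set = encode(sorted(set(ids)))
```

An in-process storage never goes through such an encoding step, which may be one reason a test suite would not hit these errors; that is a guess, not something checked against the loader's tests.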
No idea why the tests do not catch any of those issues though. I'm just unblocking this so people can run it within docker.
[1] The initial problem was along these lines (exactly like D3258#79482):
- swh.core.api.RemoteException: <RemoteException 500 AttributeError: ["'dict' object has no attribute 'url'"]>
where the self.origin being written to storage was a dict instead of an Origin model object.
That error is gone with a current loader-core (at least v0.2.0).
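For illustration, the dict-versus-model mismatch can be reproduced in isolation. The `Origin` class below is a hypothetical stand-in with a single `url` field, not the real model class, and `origin_url` only mimics the attribute access that fails in the storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Origin:
    """Hypothetical stand-in for the Origin model object."""

    url: str


def origin_url(origin):
    # Mimics the storage side, which reads origin.url
    return origin.url


as_dict = {"url": "https://www.mercurial-scm.org/repo/evolve/"}

# Passing the plain dict fails on attribute access.
try:
    origin_url(as_dict)
    error = None
except AttributeError as exc:
    error = str(exc)

# Building the model object first makes the same call succeed.
url = origin_url(Origin(**as_dict))
```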
Test Plan
tox + run on docker:
docker-compose.override.yml:
version: '2'
services:
  swh-loader:
    volumes:
      # - "$SWH_ENVIRONMENT_HOME/swh-loader-core:/src/swh-loader-core"
      - "$SWH_ENVIRONMENT_HOME/swh-loader-mercurial:/src/swh-loader-mercurial"
$ doco up
$ doco exec swh-loader swh loader run mercurial https://www.mercurial-scm.org/repo/evolve/
Finally:
$ time doco exec swh-loader swh loader run mercurial https://www.mercurial-scm.org/repo/evolve/
WARNING:swh.core.cli:Could not load subcommand search: cannot import name 'get_journal_client' from 'swh.journal.cli' (/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/journal/cli.py)
INFO:swh.core.config:Loading config file /loader.yml
WARNING:swh.loader.mercurial.Bundle20Loader:No matching revision for tag 5.6.1 (hg changeset: 70694b2621ba9d919bc38303f8901e84caf5da0f). Skipping
{'status': 'eventful'}
docker-compose exec swh-loader swh loader run mercurial 0.59s user 0.61s system 0% cpu 2:10.59 total
$ time doco exec swh-loader swh loader run mercurial https://www.mercurial-scm.org/repo/evolve/
WARNING:swh.core.cli:Could not load subcommand search: cannot import name 'get_journal_client' from 'swh.journal.cli' (/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/journal/cli.py)
INFO:swh.core.config:Loading config file /loader.yml
WARNING:swh.loader.mercurial.Bundle20Loader:No matching revision for tag 5.6.1 (hg changeset: 70694b2621ba9d919bc38303f8901e84caf5da0f). Skipping
{'status': 'uneventful'}
docker-compose exec swh-loader swh loader run mercurial 0.59s user 0.53s system 2% cpu 40.954 total
Migrated from D3258 (view on Phabricator)
Activity
Build is green
Patch application report for D3258 (id=11549)
Rebasing onto 03c34b9e...
Current branch diff-target is up to date.
Changes applied before test
commit f20891013265ed64094e763c75cf2a4d3ff330cd
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date: Wed Jun 10 16:26:34 2020 +0200

    mercurial.loader: Add missing type annotation to respect base class
See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/7/ for more details.
Build is green
Patch application report for D3258 (id=11550)
Rebasing onto 03c34b9e...
Current branch diff-target is up to date.
Changes applied before test
commit f1866671a417194e94a158583924985fedaae293
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date: Wed Jun 10 16:49:35 2020 +0200

    mercurial.loader: Wrap list when calling <object>_missing endpoints

    Prior to this commit, those calls were raising type error:
    ```
    TypeError: can not serialize 'set' object
    ```

commit 1cbcc8ddb59ed8c6a37df78ad003a58a80f2cdc4
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date: Wed Jun 10 16:37:09 2020 +0200

    mercurial.loader: Use list comprehension over map

    Prior to this commit, map was raising type error during serialization step
    ```
    TypeError: can not serialize 'map' object
    ```

commit f20891013265ed64094e763c75cf2a4d3ff330cd
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date: Wed Jun 10 16:26:34 2020 +0200

    mercurial.loader: Add missing type annotation to respect base class
See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/8/ for more details.
Looks good to me. I have just tested with docker, and prior to this diff the mercurial loader was failing with this error:
swh-loader_1 | [2020-06-10 15:19:48,291: ERROR/ForkPoolWorker-1] Task swh.loader.mercurial.tasks.LoadMercurial[92e86f02-f56c-4cdd-8c59-580d9850b739] raised unexpected: RemoteException({'type': 'AttributeError', 'args': ["'dict' object has no attribute 'url'"], 'message': "'dict' object has no attribute 'url'", 'traceback': ['Traceback (most recent call last):\n', ' File "/srv/softwareheritage/venv/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request\n rv = self.dispatch_request()\n', ' File "/srv/softwareheritage/venv/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request\n return self.view_functions[rule.endpoint](**req.view_args)\n', ' File "<decorator-gen-110>", line 2, in origin_add_one\n', ' File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/negotiation.py", line 148, in _negotiate\n return f.negotiator(*args, **kwargs)\n', ' File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/negotiation.py", line 82, in __call__\n result = self.func(*args, **kwargs)\n', ' File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 453, in _f\n return obj_meth(**kw)\n', ' File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/metrics.py", line 24, in d\n return f(*a, **kw)\n', ' File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/db/common.py", line 62, in _meth\n return meth(self, *args, db=db, cur=cur, **kwargs)\n', ' File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/storage.py", line 1201, in origin_add_one\n origin_row = list(db.origin_get_by_url([origin.url], cur))[0]\n', "AttributeError: 'dict' object has no attribute 'url'\n"]})
swh-loader_1 | Traceback (most recent call last):
swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 412, in trace_task
swh-loader_1 |     R = retval = fun(*args, **kwargs)
swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/scheduler/task.py", line 51, in __call__
swh-loader_1 |     result = super().__call__(*args, **kwargs)
swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 704, in __protected_call__
swh-loader_1 |     return self.run(*args, **kwargs)
swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/mercurial/tasks.py", line 22, in load_hg
swh-loader_1 |     return loader.load()
swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 293, in load
swh-loader_1 |     self._store_origin_visit()
swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 170, in _store_origin_visit
swh-loader_1 |     self.storage.origin_add_one(self.origin)
swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 181, in meth_
swh-loader_1 |     return self.post(meth._endpoint_path, post_data)
swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 278, in post
swh-loader_1 |     return self._decode_response(response)
swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 352, in _decode_response
swh-loader_1 |     self.raise_for_status(response)
swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/api/client.py", line 30, in raise_for_status
swh-loader_1 |     super().raise_for_status(response)
swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 342, in raise_for_status
swh-loader_1 |     raise exception from None
swh-loader_1 | swh.core.api.RemoteException: <RemoteException 500 AttributeError: ["'dict' object has no attribute 'url'"]>
Applying arc patch !113 in swh-loader-mercurial and using it through docker-compose.override.yml makes the issue go away.
I think it is time to add a docker test for the mercurial loader, as we currently only have one for the git loader.
I agree, but my understanding is that it will be rewritten completely soon. So maybe not immediately ;)
I have fixed it so @azecar (irc) could work without first having to debug this ;)