
core.lister_base: Ensure deterministic _task_key return value

While playing with the Phabricator lister, I sometimes encountered the following error:

$ python3 test_phabricator_lister.py 
2019-05-15 15:29:42,959 DEBUG swh.lister.core.lister_base Loading config from lister_phabricator
2019-05-15 15:29:42,960 INFO swh.core.config Loading config file /home/antoine/.config/swh/lister_phabricator.yml
2019-05-15 15:29:42,962 DEBUG swh.lister.core.lister_base <swh.lister.phabricator.lister.PhabricatorLister object at 0x7f747ce698d0> CONFIG={'lister': {'cls': 'local', 'args': {'db': 'postgresql:///lister-phabricator'}}, 'content_size_limit': 104857600, 'log_db': 'dbname=softwareheritage-log', 'cache_responses': False, 'scheduler': {'cls': 'remote', 'args': {'url': 'http://localhost:5008/'}}, 'cache_dir': '/home/antoine/.cache/swh/lister/phabricator', 'storage': {'cls': 'remote', 'args': {'url': 'http://localhost:5002/'}}, 'credentials': []}
2019-05-15 15:29:42,980 DEBUG urllib3.connectionpool Starting new HTTPS connection (1): phabricator.wikimedia.org:443
2019-05-15 15:29:43,325 DEBUG urllib3.connectionpool https://phabricator.wikimedia.org:443 "GET /api/diffusion.repository.search?api.token=api-3hqsvzuf3f7lxvlt33tbl7xcvqnm&order=oldest&attachments[uris]=1&after=&order=oldest&limit=1 HTTP/1.1" 200 None
2019-05-15 15:29:43,353 DEBUG urllib3.connectionpool Starting new HTTP connection (1): localhost:5002
2019-05-15 15:29:43,356 DEBUG urllib3.connectionpool http://localhost:5002 "POST /origin/add_multi HTTP/1.1" 200 77
2019-05-15 15:29:43,357 DEBUG urllib3.connectionpool Starting new HTTP connection (1): localhost:5008
2019-05-15 15:29:43,379 DEBUG urllib3.connectionpool http://localhost:5008 "POST /create_tasks HTTP/1.1" 200 312
Traceback (most recent call last):
  File "test_phabricator_lister.py", line 19, in <module>
    api_token='api-3hqsvzuf3f7lxvlt33tbl7xcvqnm')
  File "/usr/lib/python3/dist-packages/celery/local.py", line 191, in __call__
    return self._get_current_object()(*a, **kw)
  File "/home/antoine/swh/swh-environment/swh-scheduler/swh/scheduler/task.py", line 45, in __call__
    return super().__call__(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/celery/app/task.py", line 375, in __call__
    return self.run(*args, **kwargs)
  File "/home/antoine/swh/swh-environment/swh-lister/swh/lister/phabricator/tasks.py", line 23, in full_phabricator_lister
    lister.run()
  File "/home/antoine/swh/swh-environment/swh-lister/swh/lister/phabricator/lister.py", line 102, in run
    min_bound = self._bootstrap_repositories_listing()
  File "/home/antoine/swh/swh-environment/swh-lister/swh/lister/phabricator/lister.py", line 87, in _bootstrap_repositories_listing
    self.create_missing_origins_and_tasks(models_list, injected)
  File "/home/antoine/swh/swh-environment/swh-lister/swh/lister/core/lister_base.py", line 503, in create_missing_origins_and_tasks
    ir, m, _ = tasks[_task_key(task)]
KeyError: 'origin-update-git-{"kwargs": {}, "args": ["https://phabricator.wikimedia.org/source/mediawiki.git"]}'

This is because the private function _task_key does not return a deterministic value: the keys of the JSON document it dumps are not emitted in a fixed order. For instance, with the above example, it can return either:

  • 'origin-update-git-{"kwargs": {}, "args": ["https://phabricator.wikimedia.org/source/mediawiki.git"]}'
  • 'origin-update-git-{"args": ["https://phabricator.wikimedia.org/source/mediawiki.git"], "kwargs": {}}'
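The underlying behaviour can be reproduced in isolation. In Python 3.7+ dicts keep insertion order, and json.dumps emits keys in that order unless told otherwise, so two equal argument dicts built in a different order serialize to two different strings. A minimal sketch (the dicts are illustrative, not taken from the lister code):

```python
import json

url = "https://phabricator.wikimedia.org/source/mediawiki.git"

# Two equal argument dicts whose keys were inserted in a different order
a = {"kwargs": {}, "args": [url]}
b = {"args": [url], "kwargs": {}}

assert a == b  # dict equality ignores key order...
print(json.dumps(a))  # ...but the serialized forms differ
print(json.dumps(b))
```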

The proper fix for this issue is to sort the keys of the JSON document when dumping it.


Migrated from D1474 (view on Phabricator)
