NPM lister is failing with a database update conflict
The npm listing has been stuck since 1 December 2021:
softwareheritage-scheduler=> select * from task where type = 'list-npm-full';
id | type | arguments | next_run | current_interval | status | policy | retries_left | priority
-----------+---------------+----------------------------+-------------------------------+------------------+--------------------+-----------+--------------+----------
153874548 | list-npm-full | {"args": [], "kwargs": {}} | 2021-12-01 15:31:30.993357+00 | 12:00:00 | next_run_scheduled | recurring | 0 |
(1 row)
The listing runs are failing with this error [1], which prevents the scheduler task's status from being updated, which in turn prevents another run:
Dec 06 11:29:54 worker11 python3[292484]: [2021-12-06 11:29:54,941: ERROR/ForkPoolWorker-6] Task swh.lister.npm.tasks.NpmListerTask[4fe2bd37-5c48-47b8-b55e-6ee14ffd9684] raised unexpected: RemoteException({'type': 'CardinalityViolation', 'module': 'psycopg2.errors', 'args': ['ON CONFLICT DO UPDATE command cannot affect row a second time\nHINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.\n'], 'message': 'ON CONFLICT DO UPDATE command cannot affect row a second time\nHINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.\n', 'traceback': ['Traceback (most recent call last):\n', ' File "/usr/lib/python3/dist-packages/flask/app.py", line 1813, in full_dispatch_request\n rv = self.dispatch_request()\n', ' File "/usr/lib/python3/dist-packages/flask/app.py", line 1799, in dispatch_request\n return self.view_functions[rule.endpoint](**req.view_args)\n', ' File "/usr/lib/python3/dist-packages/swh/core/api/negotiation.py", line 153, in newf\n return f.negotiator(*args, **kwargs)\n', ' File "/usr/lib/python3/dist-packages/swh/core/api/negotiation.py", line 81, in __call__\n result = self.func(*args, **kwargs)\n', ' File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 460, in _f\n return obj_meth(**kw)\n', ' File "/usr/lib/python3/dist-packages/swh/core/db/common.py", line 62, in _meth\n return meth(self, *args, db=db, cur=cur, **kwargs)\n', ' File "/usr/lib/python3/dist-packages/swh/scheduler/backend.py", line 280, in record_listed_origins\n fetch=True,\n', ' File "/usr/lib/python3/dist-packages/psycopg2/extras.py", line 1281, in execute_values\n cur.execute(b\'\'.join(parts))\n', ' File "/usr/lib/python3/dist-packages/psycopg2/extras.py", line 243, in execute\n return super(RealDictCursor, self).execute(query, vars)\n', 'psycopg2.errors.CardinalityViolation: ON CONFLICT DO UPDATE command cannot affect row a second time\nHINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.\n\n']})
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 385, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/lib/python3/dist-packages/swh/scheduler/task.py", line 55, in __call__
result = super().__call__(*args, **kwargs)
File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 650, in __protected_call__
return self.run(*args, **kwargs)
File "/usr/lib/python3/dist-packages/sentry_sdk/integrations/celery.py", line 161, in _inner
reraise(*exc_info)
File "/usr/lib/python3/dist-packages/sentry_sdk/_compat.py", line 57, in reraise
raise value
File "/usr/lib/python3/dist-packages/sentry_sdk/integrations/celery.py", line 156, in _inner
return f(*args, **kwargs)
File "/usr/lib/python3/dist-packages/swh/lister/npm/tasks.py", line 14, in list_npm_full
return lister.run().dict()
File "/usr/lib/python3/dist-packages/swh/lister/pattern.py", line 130, in run
full_stats.origins += self.send_origins(origins)
File "/usr/lib/python3/dist-packages/swh/lister/pattern.py", line 234, in send_origins
ret = self.scheduler.record_listed_origins(batch_origins)
File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 181, in meth_
return self.post(meth._endpoint_path, post_data)
File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 278, in post
return self._decode_response(response)
File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 354, in _decode_response
self.raise_for_status(response)
File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 344, in raise_for_status
raise exception from None
swh.core.api.RemoteException: <RemoteException 500 CardinalityViolation: ['ON CONFLICT DO UPDATE command cannot affect row a second time\nHINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.\n']>
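The PostgreSQL hint suggests that one batch passed to record_listed_origins contains the same origin twice. A possible workaround, sketched below, is to deduplicate each batch on the constrained columns before sending it. This is only an illustration, not the actual fix: the `Origin` class and `dedupe_batch` helper are hypothetical, and the key `(lister_id, url, visit_type)` is an assumption about the unique constraint involved.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Origin:
    """Hypothetical stand-in for a listed origin record (not the swh model)."""

    lister_id: str
    url: str
    visit_type: str
    last_update: str = ""


def dedupe_batch(origins: Iterable[Origin]) -> List[Origin]:
    """Return at most one origin per assumed key (lister_id, url, visit_type).

    When a key appears several times, the last occurrence is kept, placed at
    the position where the key was first seen.
    """
    unique: Dict[Tuple[str, str, str], Origin] = {}
    for origin in origins:
        key = (origin.lister_id, origin.url, origin.visit_type)
        # Overwriting keeps the dict slot of the first occurrence.
        unique[key] = origin
    return list(unique.values())


if __name__ == "__main__":
    batch = [
        Origin("npm", "https://www.npmjs.com/package/a", "npm", "2021-11-30"),
        Origin("npm", "https://www.npmjs.com/package/b", "npm"),
        Origin("npm", "https://www.npmjs.com/package/a", "npm", "2021-12-01"),
    ]
    for origin in dedupe_batch(batch):
        print(origin.url, origin.last_update)
```

With a batch cleaned this way, a single INSERT ... ON CONFLICT DO UPDATE statement would no longer propose the same row twice. Whether the duplicates should be dropped in the lister or in the scheduler backend is left open here.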
Migrated from T3769 (view on Phabricator)
Edited by Antoine R. Dumont