NPM lister is failing with a database update conflict

The npm listing has been stuck since 1 December 2021:

softwareheritage-scheduler=> select * from task where type = 'list-npm-full';
    id     |     type      |         arguments          |           next_run            | current_interval |       status       |  policy   | retries_left | priority 
-----------+---------------+----------------------------+-------------------------------+------------------+--------------------+-----------+--------------+----------
 153874548 | list-npm-full | {"args": [], "kwargs": {}} | 2021-12-01 15:31:30.993357+00 | 12:00:00         | next_run_scheduled | recurring |            0 | 
(1 row)

The lister runs are failing with the error below, which prevents the scheduler task's status from being updated, which in turn prevents another run:

Dec 06 11:29:54 worker11 python3[292484]: [2021-12-06 11:29:54,941: ERROR/ForkPoolWorker-6] Task swh.lister.npm.tasks.NpmListerTask[4fe2bd37-5c48-47b8-b55e-6ee14ffd9684] raised unexpected: RemoteException({'type': 'CardinalityViolation', 'module': 'psycopg2.errors', 'args': ['ON CONFLICT DO UPDATE command cannot affect row a second time\nHINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values.\n'], 'message': 'ON CONFLICT DO UPDATE command cannot affect row a second time\nHINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values.\n', 'traceback': ['Traceback (most recent call last):\n', '  File "/usr/lib/python3/dist-packages/flask/app.py", line 1813, in full_dispatch_request\n    rv = self.dispatch_request()\n', '  File "/usr/lib/python3/dist-packages/flask/app.py", line 1799, in dispatch_request\n    return self.view_functions[rule.endpoint](**req.view_args)\n', '  File "/usr/lib/python3/dist-packages/swh/core/api/negotiation.py", line 153, in newf\n    return f.negotiator(*args, **kwargs)\n', '  File "/usr/lib/python3/dist-packages/swh/core/api/negotiation.py", line 81, in __call__\n    result = self.func(*args, **kwargs)\n', '  File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 460, in _f\n    return obj_meth(**kw)\n', '  File "/usr/lib/python3/dist-packages/swh/core/db/common.py", line 62, in _meth\n    return meth(self, *args, db=db, cur=cur, **kwargs)\n', '  File "/usr/lib/python3/dist-packages/swh/scheduler/backend.py", line 280, in record_listed_origins\n    fetch=True,\n', '  File "/usr/lib/python3/dist-packages/psycopg2/extras.py", line 1281, in execute_values\n    cur.execute(b\'\'.join(parts))\n', '  File "/usr/lib/python3/dist-packages/psycopg2/extras.py", line 243, in execute\n    return super(RealDictCursor, self).execute(query, vars)\n', 'psycopg2.errors.CardinalityViolation: ON CONFLICT DO UPDATE command cannot affect row a second time\nHINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values.\n\n']})
                                          Traceback (most recent call last):
                                            File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 385, in trace_task
                                              R = retval = fun(*args, **kwargs)
                                            File "/usr/lib/python3/dist-packages/swh/scheduler/task.py", line 55, in __call__
                                              result = super().__call__(*args, **kwargs)
                                            File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 650, in __protected_call__
                                              return self.run(*args, **kwargs)
                                            File "/usr/lib/python3/dist-packages/sentry_sdk/integrations/celery.py", line 161, in _inner
                                              reraise(*exc_info)
                                            File "/usr/lib/python3/dist-packages/sentry_sdk/_compat.py", line 57, in reraise
                                              raise value
                                            File "/usr/lib/python3/dist-packages/sentry_sdk/integrations/celery.py", line 156, in _inner
                                              return f(*args, **kwargs)
                                            File "/usr/lib/python3/dist-packages/swh/lister/npm/tasks.py", line 14, in list_npm_full
                                              return lister.run().dict()
                                            File "/usr/lib/python3/dist-packages/swh/lister/pattern.py", line 130, in run
                                              full_stats.origins += self.send_origins(origins)
                                            File "/usr/lib/python3/dist-packages/swh/lister/pattern.py", line 234, in send_origins
                                              ret = self.scheduler.record_listed_origins(batch_origins)
                                            File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 181, in meth_
                                              return self.post(meth._endpoint_path, post_data)
                                            File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 278, in post
                                              return self._decode_response(response)
                                            File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 354, in _decode_response
                                              self.raise_for_status(response)
                                            File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 344, in raise_for_status
                                              raise exception from None
                                          swh.core.api.RemoteException: <RemoteException 500 CardinalityViolation: ['ON CONFLICT DO UPDATE command cannot affect row a second time\nHINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values.\n']>
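
The PostgreSQL hint points at a single batch containing two or more rows with the same constrained values, so the same row would be updated twice by one ON CONFLICT DO UPDATE statement. One possible mitigation, sketched below, is to deduplicate each batch before it is sent to record_listed_origins. This is only an illustration, not the actual fix: the deduplicate_batch helper, the dict-shaped rows and the (lister_id, url, visit_type) key are assumptions and may not match the real scheduler model or its unique constraint.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

# Assumed unique key of a listed origin (hypothetical, check the real schema).
KEY_FIELDS: Tuple[str, ...] = ("lister_id", "url", "visit_type")


def deduplicate_batch(
    rows: Iterable[dict], key_fields: Tuple[str, ...] = KEY_FIELDS
) -> List[dict]:
    """Keep one row per constrained key; the last occurrence wins.

    Rows are returned in order of first appearance of their key.
    """
    unique: Dict[Tuple[Hashable, ...], dict] = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        unique[key] = row  # a later duplicate overwrites the earlier one
    return list(unique.values())


# Made-up example batch: the first and third rows share the same key.
batch = [
    {"lister_id": "npm", "url": "https://example.org/a", "visit_type": "npm", "last_update": 1},
    {"lister_id": "npm", "url": "https://example.org/b", "visit_type": "npm", "last_update": 1},
    {"lister_id": "npm", "url": "https://example.org/a", "visit_type": "npm", "last_update": 2},
]
deduplicated = deduplicate_batch(batch)
print(len(batch), "->", len(deduplicated))
```

Letting the last occurrence win mirrors what a sequence of single-row upserts would do. Whether that is the right choice for the npm lister depends on why the duplicates show up in the first place.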

Migrated from T3769 (view on Phabricator)

Edited by Antoine R. Dumont