Race condition on person insertion in pgsql storage
While running a parallel script that inserts many revisions with the same author (\x, i.e. the empty string) into an empty database, I got this error:
Traceback (most recent call last):
  File "./scripts/cassandra-bench-tools.py", line 264, in <module>
    cli(auto_envvar_prefix='SWH_BENCH')
  File "/home/vlorentz/.local/lib/python3.5/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/vlorentz/.local/lib/python3.5/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/vlorentz/.local/lib/python3.5/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/vlorentz/.local/lib/python3.5/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/vlorentz/.local/lib/python3.5/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/vlorentz/.local/lib/python3.5/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/vlorentz/.local/lib/python3.5/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "./scripts/cassandra-bench-tools.py", line 259, in pgsql_import_dataset
    push_revisions('/dev/stdin', storage)
  File "./scripts/cassandra-bench-tools.py", line 127, in push_revisions
    stats = storage.revision_add(batch)
  File "/home/vlorentz/swh-environment/swh-core/swh/core/db/common.py", line 49, in _meth
    return meth(self, *args, db=db, cur=cur, **kwargs)
  File "/home/vlorentz/swh-environment/swh-storage/swh/storage/storage.py", line 717, in revision_add
    db.revision_add_from_temp(cur)
  File "/home/vlorentz/swh-environment/swh-core/swh/core/db/db_utils.py", line 33, in _meth
    self._cursor(cur).execute('SELECT %s()' % stored_proc)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "person_fullname_idx"
DETAIL: Key (fullname)=(\x) already exists.
CONTEXT: SQL statement "with t as (
    select author_fullname as fullname, author_name as name, author_email as email from tmp_revision
    union
    select committer_fullname as fullname, committer_name as name, committer_email as email from tmp_revision
  ) insert into person (fullname, name, email)
  select distinct on (fullname) fullname, name, email from t
  where not exists (
    select 1
    from person p
    where t.fullname = p.fullname
  )"
PL/pgSQL function swh_person_add_from_revision() line 3 at SQL statement
SQL statement "SELECT swh_person_add_from_revision()"
PL/pgSQL function swh_revision_add() line 3 at PERFORM
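
A possible reading of the trace, sketched rather than confirmed: the "where not exists" check and the insert are not atomic across concurrent transactions, so two workers can both see that the person is missing and then both try to insert it. The second insert then hits person_fullname_idx. The Python sketch below replays that interleaving deterministically. It is only an illustration under several assumptions: an in-memory SQLite database stands in for PostgreSQL, the two "workers" are simulated sequentially on one connection, the empty text string stands in for the empty bytea value \x, and the helper names (missing, plain_insert, insert_ignoring_conflict) are hypothetical and do not exist in swh-storage.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (fullname TEXT, name TEXT, email TEXT)")
conn.execute("CREATE UNIQUE INDEX person_fullname_idx ON person (fullname)")


def missing(fullname):
    """Mimic the 'where not exists' check: True if no matching row exists."""
    row = conn.execute(
        "SELECT 1 FROM person WHERE fullname = ?", (fullname,)
    ).fetchone()
    return row is None


def plain_insert(fullname):
    """Insert without any conflict handling, as in the failing statement."""
    conn.execute(
        "INSERT INTO person (fullname, name, email) VALUES (?, ?, ?)",
        (fullname, "", ""),
    )


def insert_ignoring_conflict(fullname):
    """Conflict-tolerant insert. This is the SQLite spelling; PostgreSQL
    offers INSERT ... ON CONFLICT DO NOTHING for the same purpose."""
    conn.execute(
        "INSERT OR IGNORE INTO person (fullname, name, email) VALUES (?, ?, ?)",
        (fullname, "", ""),
    )


author = ""  # stands in for the empty fullname shown as \x in the error

# Simulated interleaving: both workers run the check before either inserts.
worker_a_sees_missing = missing(author)
worker_b_sees_missing = missing(author)

race_error = None
for sees_missing in (worker_a_sees_missing, worker_b_sees_missing):
    if sees_missing:
        try:
            plain_insert(author)
        except sqlite3.IntegrityError as exc:
            # The second worker trips over the unique index.
            race_error = exc

# The conflict-tolerant insert can be repeated without raising.
insert_ignoring_conflict(author)
insert_ignoring_conflict(author)
person_count = conn.execute("SELECT count(*) FROM person").fetchone()[0]

print("race error:", race_error)
print("rows in person:", person_count)
```

If this reading is right, one option would be to let the insert itself tolerate the conflict (ON CONFLICT DO NOTHING in PostgreSQL 9.5 and later) instead of relying on the "where not exists" filter. Whether that fits swh_person_add_from_revision() is an assumption that would need checking against the actual schema.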
Migrated from T1684