postgresql: improve db.origin_visit_status_add to fix possible replayer issues
In case of an existing OriginVisitStatus entry (typically with the 'created' status), adding a new version of this entry should do an update instead of ignoring it. This could happen during a replayer session where the insertion of the OriginVisit used to automatically create an OriginVisitStatus in "created" status. This would then prevent the following replayer to actually insert the correct OriginVisitStatus.
Closes #4704 (closed)
Merge request reports
Activity
mentioned in issue #4704 (closed)
Jenkins job DSTO/gitlab-builds #450 succeeded .
See Console Output and Coverage Report for more details.994 999 f"(origin, {', '.join(cols)}, metadata) " 995 1000 f"VALUES ((select id from origin_id), " 996 1001 f"{', '.join(['%s']*len(cols))}, %s) " 997 f"ON CONFLICT (origin, visit, date) do nothing", 1002 f"ON CONFLICT (origin, visit, date) " 1003 "do update " 1004 " set " 1005 " type=excluded.type, " 1006 " status=excluded.status, " 1007 " snapshot=excluded.snapshot, " 1008 " metadata=excluded.metadata " 1009 " where origin_visit_status.origin=excluded.origin " 1010 " and origin_visit_status.visit=excluded.visit " 1011 " and origin_visit_status.date=excluded.date " 1012 " and origin_visit_status.status='created' " 1013 " and origin_visit_status.status!=excluded.status", - Comment on lines +1002 to +1013
1002 f"ON CONFLICT (origin, visit, date) " 1003 "do update " 1004 " set " 1005 " type=excluded.type, " 1006 " status=excluded.status, " 1007 " snapshot=excluded.snapshot, " 1008 " metadata=excluded.metadata " 1009 " where origin_visit_status.origin=excluded.origin " 1010 " and origin_visit_status.visit=excluded.visit " 1011 " and origin_visit_status.date=excluded.date " 1012 " and origin_visit_status.status='created' " 1013 " and origin_visit_status.status!=excluded.status", 1002 f"ON CONFLICT (origin, visit, date) " 1003 "DO UPDATE " 1004 " SET " 1005 " type=excluded.type, " 1006 " status=excluded.status, " 1007 " snapshot=excluded.snapshot, " 1008 " metadata=excluded.metadata " 1009 " WHERE origin_visit_status.origin=excluded.origin " 1010 " AND origin_visit_status.visit=excluded.visit " 1011 " AND origin_visit_status.date=excluded.date " 1012 " AND origin_visit_status.status='created' " 1013 " AND origin_visit_status.status!=excluded.status",
992 997 f"WITH origin_id as (select id from origin where url=%s) " 993 998 f"INSERT INTO origin_visit_status " 994 999 f"(origin, {', '.join(cols)}, metadata) " 995 1000 f"VALUES ((select id from origin_id), " 996 1001 f"{', '.join(['%s']*len(cols))}, %s) " 997 f"ON CONFLICT (origin, visit, date) do nothing", 1002 f"ON CONFLICT (origin, visit, date) " 1003 "do update " 1004 " set " 1005 " type=excluded.type, " 1006 " status=excluded.status, " 1007 " snapshot=excluded.snapshot, " 1008 " metadata=excluded.metadata " 1009 " where origin_visit_status.origin=excluded.origin " 1010 " and origin_visit_status.visit=excluded.visit " 1011 " and origin_visit_status.date=excluded.date " Even though I proposed it, I'm not too fond of the idea of adding back an UPDATE statement in the PostgreSQL, rather than rely on the idempotence of the contents of the table, especially as a workaround for a bug, and even more so on a table that we've created specifically to stop having to do UPDATEs.
If anything I'd prefer moving towards never creating origin_visit_status implicitly when creating an origin_visit, to make sure this never occurs again.
I think we should consider the option of pruning the table of entries with origin_visit_status.date = origin_visit.date and status=created and relaunching the replayer?
Even though I proposed it, I'm not too fond of the idea of adding back an UPDATE statement in the PostgreSQL, rather than rely on the idempotence of the contents of the table, especially as a workaround for a bug, and even more so on a table that we've created specifically to stop having to do UPDATEs.
Good remark, makes sense indeed...
If anything I'd prefer moving towards never creating origin_visit_status implicitly when creating an origin_visit, to make sure this never occurs again.
the question is: when/why do we need this implicitly created entry? what problem does it actually solve (apart from some idea of consistency maybe?)
I think we should consider the option of pruning the table of entries with origin_visit_status.date = origin_visit.date and status=created and relaunching the replayer?
Let me play with this idea, but as we discussed previously, we probably can ditch all the 'created' OVS entries (without the filtering on dates). Don't see why it would be an issue to do so.
If anything I'd prefer moving towards never creating origin_visit_status implicitly when creating an origin_visit, to make sure this never occurs again.
the question is: when/why do we need this implicitly created entry? what problem does it actually solve (apart from some idea of consistency maybe?)
It records that an origin visit was attempted, even if it failed so hard that it couldn't record anything else. So basically it guarantees that at least one entry in origin_visit_status exists for each origin_visit.
I think we should consider the option of pruning the table of entries with origin_visit_status.date = origin_visit.date and status=created and relaunching the replayer?
Let me play with this idea, but as we discussed previously, we probably can ditch all the 'created' OVS entries (without the filtering on dates). Don't see why it would be an issue to do so.
Yeah, that sounds fine to me.
Edited by vlorentzIt records that an origin visit was attempted, even if it failed so hard that it couldn't record anything else. So basically it guarantees that at least one entry in origin_visit_status exists for each origin_visit.
I get that, but do we really need a "created" row in the OVS table to carry that information? Why cannot this state be carried by an OV row with no corresponding OVS entry?
I see that currently, at least
Db.origin_visit_get()
depends on this (but it can be refactored), maybe there are other places depending on this behavior?
mentioned in issue swh/meta#4370