postgresql: Rewrite swh_visit_find_by_date to properly use existing indexes
swh_visit_find_by_date tries to do an index only scan on origin_visit (origin, date)
. However, by sorting the results on an extra column (the
visit id), the postgresql planner would end up doing an index scan,
which times out on origins with lots of visits.
Splitting the work across two steps (first find the date, which works with index only scans, then find the highest visit id, which usually only returns one row) fixes this issue.
Merge request reports
Activity
Filter activity
changed milestone to %MRO 2023
assigned to @olasd
on belvedere:
softwareheritage=# select * from tmp_swh_visit_find_by_date('https://foss.heptapod.net/mercurial/hgview'); origin | visit | date | type -----------+---------+-------------------------------+------ 155049819 | 1315926 | 2023-03-09 15:14:43.098082+00 | hg (1 row) Time: 1,453 ms softwareheritage=# select * from swh_visit_find_by_date('https://foss.heptapod.net/mercurial/hgview'); origin | visit | date | type -----------+---------+-------------------------------+------ 155049819 | 1315926 | 2023-03-09 15:14:43.098082+00 | hg (1 row) Time: 102106,104 ms (01:42,106)
Edited by Nicolas DandrimontJenkins job DSTO/gitlab-builds #116 failed .
See Console Output and Coverage Report for more details.added 1 commit
- 38987444 - postgresql: Rewrite swh_visit_find_by_date to properly use existing indexes
Jenkins job DSTO/gitlab-builds #117 succeeded .
See Console Output and Coverage Report for more details.
Please register or sign in to reply