Skip to content

postgresql: ensure origin_visit(_status) queries use index

When using an inner join for the single origin value, instead of a subquery, the query fails to use the (origin, visit) indexes and falls back to fetching all the visits (or all the statuses) for the origin and sorting them.

This breaks down for origins with a lot of visits, such as the ones that are being used for end to end monitoring.

Using a subselect to generate a single origin id value ensures that the queries can use the proper indexes.

Test Plan

unit tests pass unchanged

Comparison of postgresql query plans:

13:56 guest@softwareheritage => explain SELECT ov.visit, ov.date, ov.type FROM origin_visit ov INNER JOIN origin o ON o.id = ov.origin WHERE o.url = 'https://subversion.renater.fr/anonscm/svn/panda' ORDER BY ov.visit ASC LIMIT 1001;
                                                   QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Limit  (cost=612.53..612.54 rows=7 width=20)
   ->  Sort  (cost=612.53..612.54 rows=7 width=20)
         Sort Key: ov.visit
         ->  Nested Loop  (cost=0.58..612.43 rows=7 width=20)
               ->  Index Scan using origin_url_idx1 on origin o  (cost=0.00..2.02 rows=1 width=8)
                     Index Cond: (url = 'https://subversion.renater.fr/anonscm/svn/panda'::text)
               ->  Index Scan using origin_visit_pkey on origin_visit ov  (cost=0.58..604.24 rows=617 width=28)
                     Index Cond: (origin = o.id)
(8 lignes)

Temps : 10,580 ms
13:56 guest@softwareheritage => explain SELECT ov.visit, ov.date, ov.type FROM origin_visit ov INNER JOIN origin o ON o.id = ov.origin WHERE ov.origin = (select id from origin where url = 'https://subversion.renater.fr/anonscm/svn/panda') ORDER BY ov.visit ASC LIMIT 1001;
                                                QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────────────
 Limit  (cost=3.17..616.56 rows=617 width=20)
   InitPlan 1 (returns $0)
     ->  Index Scan using origin_url_idx1 on origin  (cost=0.00..2.02 rows=1 width=8)
           Index Cond: (url = 'https://subversion.renater.fr/anonscm/svn/panda'::text)
   ->  Nested Loop  (cost=1.15..614.54 rows=617 width=20)
         ->  Index Scan using origin_visit_pkey on origin_visit ov  (cost=0.58..604.24 rows=617 width=28)
               Index Cond: (origin = $0)
         ->  Materialize  (cost=0.57..2.59 rows=1 width=8)
               ->  Index Only Scan using origin_pkey on origin o  (cost=0.57..2.59 rows=1 width=8)
                     Index Cond: (id = $0)
(10 lignes)

Temps : 4,595 ms
13:56 guest@softwareheritage => explain SELECT ov.visit, ov.date, ov.type FROM origin_visit ov INNER JOIN origin o ON o.id = ov.origin WHERE ov.origin = (select id from origin where url = 'https://subversion.renater.fr/anonscm/svn/panda') ORDER BY ov.visit DESC LIMIT 1001;
                                                    QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Limit  (cost=3.17..616.56 rows=617 width=20)
   InitPlan 1 (returns $0)
     ->  Index Scan using origin_url_idx1 on origin  (cost=0.00..2.02 rows=1 width=8)
           Index Cond: (url = 'https://subversion.renater.fr/anonscm/svn/panda'::text)
   ->  Nested Loop  (cost=1.15..614.54 rows=617 width=20)
         ->  Index Scan Backward using origin_visit_pkey on origin_visit ov  (cost=0.58..604.24 rows=617 width=28)
               Index Cond: (origin = $0)
         ->  Materialize  (cost=0.57..2.59 rows=1 width=8)
               ->  Index Only Scan using origin_pkey on origin o  (cost=0.57..2.59 rows=1 width=8)
                     Index Cond: (id = $0)
(10 lignes)

Temps : 4,347 ms
13:56 guest@softwareheritage =>

Actual query time before change: 15s; after change: 15ms!


Migrated from D7554 (view on Phabricator)

Merge request reports