Skip to content
Snippets Groups Projects

Increase runtime of origin_visit_find_by_date

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Build is green

    Patch application report for D7977 (id=28736)

    Rebasing onto c19f53f1...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 0695d2b91a102dae65082c0277e88a018a6802e6
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Thu Jun 9 16:08:38 2022 +0200
    
        Increase runtime of origin_visit_find_by_date
        
        https://sentry.softwareheritage.org/share/issue/83a40553efc74f30bd6714ce023e6b3f/
        shows it very uniformly times out.

    See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1622/ for more details.

  • I think the proper fix would be to add a specific index to the origin_visit table in PostgreSQL database (see swh-web#4198).

    It looks like main database has it:

    15:56 $ psql service=swh
    psql (12.11 (Debian 12.11-1.pgdg110+1), server 12.9 (Debian 12.9-1.pgdg110+1))
    SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
    Type "help" for help.
    
    softwareheritage=> \d origin_visit
                        Table "public.origin_visit"
     Column |           Type           | Collation | Nullable | Default 
    --------+--------------------------+-----------+----------+---------
     origin | bigint                   |           | not null | 
     visit  | bigint                   |           | not null | 
     date   | timestamp with time zone |           | not null | 
     type   | text                     |           | not null | 
    Indexes:
        "origin_visit_pkey" PRIMARY KEY, btree (origin, visit)
        "origin_visit_date_idx" btree (date)
        "origin_visit_type_date" btree (type, date)
    Foreign-key constraints:
        "origin_visit_origin_fkey" FOREIGN KEY (origin) REFERENCES origin(id)
    Referenced by:
        TABLE "origin_visit_status" CONSTRAINT "origin_visit_status_origin_visit_fkey" FOREIGN KEY (origin, visit) REFERENCES origin_visit(origin, visit)
    Publications:
        "softwareheritage"

    while replica database, used by the webapp in production, does not have it:

    16:25 $ psql service=swh-replica
    psql (12.11 (Debian 12.11-1.pgdg110+1), server 13.5 (Debian 13.5-1.pgdg110+1))
    WARNING: psql major version 12, server major version 13.
             Some psql features might not work.
    SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
    Type "help" for help.
    
    softwareheritage=> \d origin_visit
                        Table "public.origin_visit"
     Column |           Type           | Collation | Nullable | Default 
    --------+--------------------------+-----------+----------+---------
     origin | bigint                   |           | not null | 
     visit  | bigint                   |           | not null | 
     date   | timestamp with time zone |           | not null | 
     type   | text                     |           | not null | 
    Indexes:
        "origin_visit_pkey" PRIMARY KEY, btree (origin, visit)
    
  • It seems @anlambert is right, some index [3] is missing on the replica [2] vs the main db [1] hence the query plans divergence ([2] is more costly). The main query plan [1] uses the missing replica index. So that missing index is currently being created on somerset [3].

    HTH

    • [3]
    create index concurrently on origin_visit(date);
    • [1]
    16:52:03 softwareheritage@belvedere:5432=> explain  with closest_two_visits as ((
    softwareheritage(>     select ov, (date - '2022-06-09 14:53:39.751496+00'), visit as interval
    softwareheritage(>     from origin_visit ov
    softwareheritage(>     where ov.origin = 31236370
    softwareheritage(>           and ov.date >= '2022-06-09 14:53:39.751496+00'
    softwareheritage(>     order by ov.date asc, ov.visit desc
    softwareheritage(>     limit 1
    softwareheritage(>   ) union (
    softwareheritage(>     select ov, ('2022-06-09 14:53:39.751496+00' - date), visit as interval
    softwareheritage(>     from origin_visit ov
    softwareheritage(>     where ov.origin = 31236370
    softwareheritage(>           and ov.date < '2022-06-09 14:53:39.751496+00'
    softwareheritage(>     order by ov.date desc, ov.visit desc
    softwareheritage(>     limit 1
    softwareheritage(>   )) select (ov).* from closest_two_visits order by interval, visit limit 1;
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |                                                                             QUERY PLAN                                                                              |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Limit  (cost=888.88..888.89 rows=1 width=64)                                                                                                                        |
    |   ->  Sort  (cost=888.88..888.89 rows=2 width=64)                                                                                                                   |
    |         Sort Key: closest_two_visits."interval", ((closest_two_visits.ov).visit)                                                                                    |
    |         ->  Subquery Scan on closest_two_visits  (cost=888.83..888.87 rows=2 width=64)                                                                              |
    |               ->  Unique  (cost=888.83..888.85 rows=2 width=56)                                                                                                     |
    |                     ->  Sort  (cost=888.83..888.84 rows=2 width=56)                                                                                                 |
    |                           Sort Key: "*SELECT* 1".ov, "*SELECT* 1"."?column?", "*SELECT* 1"."interval"                                                               |
    |                           ->  Append  (cost=20.10..888.82 rows=2 width=56)                                                                                          |
    |                                 ->  Subquery Scan on "*SELECT* 1"  (cost=20.10..20.12 rows=1 width=76)                                                              |
    |                                       ->  Limit  (cost=20.10..20.11 rows=1 width=84)                                                                                |
    |                                             ->  Sort  (cost=20.10..20.11 rows=1 width=84)                                                                           |
    |                                                   Sort Key: ov.date, ov.visit DESC                                                                                  |
    |                                                   ->  Bitmap Heap Scan on origin_visit ov  (cost=19.07..20.09 rows=1 width=84)                                      |
    |                                                         Recheck Cond: ((date >= '2022-06-09 14:53:39.751496+00'::timestamp with time zone) AND (origin = 31236370)) |
    |                                                         ->  BitmapAnd  (cost=19.07..19.07 rows=1 width=0)                                                           |
    |                                                               ->  Bitmap Index Scan on origin_visit_date_idx  (cost=0.00..6.58 rows=401 width=0)                    |
    |                                                                     Index Cond: (date >= '2022-06-09 14:53:39.751496+00'::timestamp with time zone)                 |
    |                                                               ->  Bitmap Index Scan on origin_visit_pkey  (cost=0.00..12.24 rows=888 width=0)                       |
    |                                                                     Index Cond: (origin = 31236370)                                                                 |
    |                                 ->  Subquery Scan on "*SELECT* 2"  (cost=868.69..868.70 rows=1 width=76)                                                            |
    |                                       ->  Limit  (cost=868.69..868.69 rows=1 width=84)                                                                              |
    |                                             ->  Sort  (cost=868.69..870.91 rows=888 width=84)                                                                       |
    |                                                   Sort Key: ov_1.date DESC, ov_1.visit DESC                                                                         |
    |                                                   ->  Index Scan using origin_visit_pkey on origin_visit ov_1  (cost=0.58..864.25 rows=888 width=84)                |
    |                                                         Index Cond: (origin = 31236370)                                                                             |
    |                                                         Filter: (date < '2022-06-09 14:53:39.751496+00'::timestamp with time zone)                                  |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    (26 rows)
    • [2]
    16:54:21 softwareheritage@somerset:5432=> explain  with closest_two_visits as ((
    softwareheritage(>     select ov, (date - '2022-06-09 14:53:39.751496+00'), visit as interval
    softwareheritage(>     from origin_visit ov
    softwareheritage(>     where ov.origin = 31236370
    softwareheritage(>           and ov.date >= '2022-06-09 14:53:39.751496+00'
    softwareheritage(>     order by ov.date asc, ov.visit desc
    softwareheritage(>     limit 1
    softwareheritage(>   ) union (
    softwareheritage(>     select ov, ('2022-06-09 14:53:39.751496+00' - date), visit as interval
    softwareheritage(>     from origin_visit ov
    softwareheritage(>     where ov.origin = 31236370
    softwareheritage(>           and ov.date < '2022-06-09 14:53:39.751496+00'
    softwareheritage(>     order by ov.date desc, ov.visit desc
    softwareheritage(>     limit 1
    softwareheritage(>   )) select (ov).* from closest_two_visits order by interval, visit limit 1;
    +-------------------------------------------------------------------------------------------------------------------------------------------------------+
    |                                                                      QUERY PLAN                                                                       |
    +-------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Limit  (cost=3400.87..3400.88 rows=1 width=64)                                                                                                        |
    |   ->  Sort  (cost=3400.87..3400.88 rows=2 width=64)                                                                                                   |
    |         Sort Key: closest_two_visits."interval", ((closest_two_visits.ov).visit)                                                                      |
    |         ->  Subquery Scan on closest_two_visits  (cost=3400.82..3400.86 rows=2 width=64)                                                              |
    |               ->  Unique  (cost=3400.82..3400.84 rows=2 width=56)                                                                                     |
    |                     ->  Sort  (cost=3400.82..3400.83 rows=2 width=56)                                                                                 |
    |                           Sort Key: "*SELECT* 1".ov, "*SELECT* 1"."?column?", "*SELECT* 1"."interval"                                                 |
    |                           ->  Append  (cost=1697.06..3400.81 rows=2 width=56)                                                                         |
    |                                 ->  Subquery Scan on "*SELECT* 1"  (cost=1697.06..1697.08 rows=1 width=76)                                            |
    |                                       ->  Limit  (cost=1697.06..1697.07 rows=1 width=84)                                                              |
    |                                             ->  Sort  (cost=1697.06..1697.07 rows=1 width=84)                                                         |
    |                                                   Sort Key: ov.date, ov.visit DESC                                                                    |
    |                                                   ->  Index Scan using origin_visit_pkey on origin_visit ov  (cost=0.58..1697.05 rows=1 width=84)     |
    |                                                         Index Cond: (origin = 31236370)                                                               |
    |                                                         Filter: (date >= '2022-06-09 14:53:39.751496+00'::timestamp with time zone)                   |
    |                                 ->  Subquery Scan on "*SELECT* 2"  (cost=1703.71..1703.72 rows=1 width=76)                                            |
    |                                       ->  Limit  (cost=1703.71..1703.71 rows=1 width=84)                                                              |
    |                                             ->  Sort  (cost=1703.71..1705.93 rows=888 width=84)                                                       |
    |                                                   Sort Key: ov_1.date DESC, ov_1.visit DESC                                                           |
    |                                                   ->  Index Scan using origin_visit_pkey on origin_visit ov_1  (cost=0.58..1699.27 rows=888 width=84) |
    |                                                         Index Cond: (origin = 31236370)                                                               |
    |                                                         Filter: (date < '2022-06-09 14:53:39.751496+00'::timestamp with time zone)                    |
    +-------------------------------------------------------------------------------------------------------------------------------------------------------+
    (22 rows)
    
    Time: 8.337 ms
  • Author Maintainer

    oh my.

    thanks!

  • Author Maintainer

    Merge request was abandoned

  • closed

  • index created and now the query plans are the same on both db.

Please register or sign in to reply
Loading