Skip to content

pgstorage: Empty temp tables instead of dropping them

Due to our pattern of adding objects [1], vacuum is triggered regularly on pg_catalog.*, having an heavy impact on performance [2]

This commit tries to avoid the "dropping temporary tables" part, emptying them instead (they still are dropped at the end of the session but less often). This should decrease the bloat on pg_catalog.* tables.

  • [1] adding object pattern:

  • create temporary table

  • insert data from temporary table to production table with filtering

  • drop temporary table

  • [2] context:

irc discussion on sysadm:

...
10:09 <douardda> ardumont: any idea why the git loading workload seems to have drop drastically?
10:10 <+ardumont> no
10:11 <+ardumont> well, that could be we are finally ingesting relevant repositories
10:11 <+ardumont> relevant as in with new data
10:11 <douardda> not really https://grafana.softwareheritage.org/d/u8eBQQVZz/object-creation-rate?orgId=1&from=now-12h&to=now&var-instance=uffizi.internal.softwareheritage.org&var-object_type=content&var-interval=$__auto_interval_interval
10:12 <douardda> (same with revisions)
...
10:26 <+ardumont> it'd worth checking the storage backend
10:27 <+ardumont> pg_activity shows me a lot more (i don't recall seeing that much yesterday) select queries
10:27 <+ardumont> on the main belvedere storage (the one to write, not really for reading)
...
10:30 <+ardumont> i also realized that's it's not only the git tasks, it's also for the indexers
10:31 <+ardumont> well all tasks even, i barely see that there are other colors
10:42 <+olasd> I don't have my laptop but yesterday postgres was starting to look like it was spending a lot of time vacuuming the catalog tables (because of our usage of temporary tables)
10:43 <+ardumont> it still does
10:45 <+olasd> I think there's a way to reduce churn on temporary tables (making transactions empty them instead of dropping them) that might help
10:46 <+olasd> or get rid of temp tables altogether
10:46 <+olasd> (but that's less incremental)
10:52 <+olasd> https://stackoverflow.com/questions/50366509/temporary-tables-bloating-pg-attribute
10:55 <+ardumont> shall we try?
10:55 <+ardumont> one of the stanza to change are at https://forge.softwareheritage.org/source/swh-storage/browse/master/swh/storage/sql/40-swh-func.sql$22
10:55 <+ardumont> (i'm saying for david ;)
10:56 <+ardumont> the part 'on commit delete rows'

Test Plan

tox


Migrated from D2580 (view on Phabricator)

Merge request reports

Loading