clean up swh-scheduler model db
The current scheduler holds lots of information that are no longer necessary:
- beside load-nixguix recurrent tasks all other are no longer scheduled by this old model
- only oneshot tasks are interesting but no more than 2 months old (save code now, deposit, vault, ...)
Those can be archived (and later on be dropped if we don't read in those [1] This will make the current mechanisms that needs read/write in those tables faster.
- [1] we usually don't.
Migrated from T3837 (view on Phabricator)
Activity
-
Newest first Oldest first
-
Show all activity Show comments only Show history only
- Author Owner
wip script [1]
Ongoing dump [2] from the staging db to check the script [1] is ok (within equivalent db mounted in docker)
-
[1] migrated/migration$1243
-
[2]
postgres@db1:/srv/softwareheritage/postgres/12/main/backup$ pg_dump --host=db1.internal.staging.swh.network --dbname=swh-scheduler --port=5432 --username=swh-scheduler --table task_type --table priority_ratio --table task --table task_run --data-only --no-owner | gzip -c - > $(date +%Y-%m-%dT%H:%M:%SZ)-staging-swh-scheduler-first-model.sql.gz
-
- Antoine R. Dumont changed the description
changed the description
- Antoine R. Dumont added state:wip label
added state:wip label
- Phabricator Migration user mentioned in commit swh/devel/swh-scheduler@b5477ea2
mentioned in commit swh/devel/swh-scheduler@b5477ea2
- Author Owner
So script is ok and landed. I'll trigger it on staging first.
- Author Owner
staging:
Triggered the migration (using a transaction as the dataset is not that huge) and all went well.
Add to do the extra steps:
alter table task owner to "swh-scheduler"; alter table task_run owner to "swh-scheduler"; grant select on all tables in schema public to guest;
Result:
swh-scheduler=# \d+ List of relations Schema | Name | Type | Owner | Persistence | Access method | Size | Description --------+--------------------------------+----------+---------------+-------------+---------------+------------+------------------------------------------------------------------------------------------------------------------- public | archive_task | table | swh-scheduler | permanent | heap | 14 GB | Schedule of recurring tasks public | archive_task_run | table | swh-scheduler | permanent | heap | 12 GB | History of task runs sent to the job-running backend public | dbversion | table | swh-scheduler | permanent | heap | 48 kB | Schema update tracking public | listed_origins | table | swh-scheduler | permanent | heap | 1327 MB | Origins known to the origin visit scheduler public | listers | table | swh-scheduler | permanent | heap | 2344 kB | Lister instances known to the origin visit scheduler public | origin_visit_stats | table | swh-scheduler | permanent | heap | 370 MB | public | priority_ratio | table | swh-scheduler | permanent | heap | 40 kB | Oneshot task's reading ratio per priority public | scheduler_metrics | table | swh-scheduler | permanent | heap | 56 kB | Cache of per-lister metrics for the scheduler, collated between the listed_origins and origin_visit_stats tables. public | task | table | swh-scheduler | permanent | heap | 137 MB | public | task_id_seq | sequence | swh-scheduler | permanent | | 8192 bytes | public | task_run | table | swh-scheduler | permanent | heap | 192 MB | public | task_run_id_seq | sequence | swh-scheduler | permanent | | 8192 bytes | public | task_type | table | swh-scheduler | permanent | heap | 56 kB | Types of schedulable tasks public | visit_scheduler_queue_position | table | swh-scheduler | permanent | heap | 48 kB | Current queue position for the recurrent visit scheduler (14 rows)
- Author Owner
And same goes for production [1].
We gain a bit of space in those tables:
- task: 209G -> 864M
- task_run: 190G -> 84M
At some point, we can reclaim space by dropping the archive_* tables.
- [1]
15:05:07 softwareheritage-scheduler@belvedere:5432=> \d+ List of relations +--------+--------------------------------+----------+--------------+------------+-------------------------------------------------------------------------------------------------------------------+ | Schema | Name | Type | Owner | Size | Description | +--------+--------------------------------+----------+--------------+------------+-------------------------------------------------------------------------------------------------------------------+ | public | archive_task | table | swhscheduler | 209 GB | Schedule of recurring tasks | | public | archive_task_run | table | swhscheduler | 190 GB | History of task runs sent to the job-running backend | | public | dbversion | table | swhscheduler | 48 kB | Schema update tracking | | public | listed_origins | table | swhscheduler | 79 GB | Origins known to the origin visit scheduler | | public | listers | table | swhscheduler | 12 MB | Lister instances known to the origin visit scheduler | | public | origin_visit_stats | table | swhscheduler | 23 GB | | | public | pg_stat_statements | view | postgres | 0 bytes | | | public | priority_ratio | table | swhscheduler | 40 kB | Oneshot task's reading ratio per priority | | public | scheduler_metrics | table | swhscheduler | 56 kB | Cache of per-lister metrics for the scheduler, collated between the listed_origins and origin_visit_stats tables. | | public | task | table | swhscheduler | 864 MB | | | public | task_id_seq | sequence | swhscheduler | 8192 bytes | | | public | task_run | table | swhscheduler | 84 MB | | | public | task_run_id_seq | sequence | swhscheduler | 8192 bytes | | | public | task_type | table | swhscheduler | 56 kB | Types of schedulable tasks | | public | visit_scheduler_queue_position | table | swhscheduler | 56 kB | Current queue position for the recurrent visit scheduler | +--------+--------------------------------+----------+--------------+------------+-------------------------------------------------------------------------------------------------------------------+ (15 rows)
- Antoine R. Dumont removed state:wip label
removed state:wip label
- Antoine R. Dumont closed
closed
- Phabricator Migration user mentioned in merge request swh/devel/swh-scheduler!271 (closed)
mentioned in merge request swh/devel/swh-scheduler!271 (closed)