Closed
Issue created Jul 18, 2022 by Antoine R. Dumont (@ardumont, Owner). 15 of 15 checklist items completed.

Migrate azure worker vms to cheaper and more efficient vms

Reduce Azure cost: move the workers to 'b2ms' VMs (the current 'ds2v2' VMs are underused and costly).

Plan:

  • Reasoning: https://hedgedoc.softwareheritage.org/0_eK1R3iSFmMWxwHDQfqOw?edit
  • Provision vault-worker[01-02] as b2ms (terraform)
  • Decommission worker13
  • Check that the vault workers are doing their job [1]
  • Decommission worker[11-12]
  • Adapt the puppet manifest to the FQDN changes ^ and deploy
  • Provision indexer-worker[01-02] as b2ms (terraform)
  • Check that everything is fine ^ (a firewall rule must be edited to allow the connection)
  • Decommission ds2v2 worker[07-10]
  • Provision indexer-worker[03-06] as b2ms (terraform)
  • Decommission the remaining ds2v2 worker[03-06]
  • Update the firewall rule + alias
  • Update the inventory with the VMs and network interfaces according to ^
  • Kept worker[01-02] for now (so they finish their current job consuming old queue messages) [2]
  • Clean up old oneshot tasks related to ^ [4]
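
The plan uses a bracket shorthand such as worker[07-10] for groups of nodes. A minimal sketch of how that shorthand could be expanded into full host names, for instance when updating the firewall alias or the inventory, might look like this. The helper name `expand_hosts` is hypothetical, and the default domain suffix is an assumption taken from the vault-worker log lines in note [1]; it may not apply to every node.

```python
import re

# Assumption: suffix as seen in the vault-worker log lines of note [1].
DOMAIN = "euwest.azure.internal.softwareheritage.org"

# Matches the shorthand used in the plan, e.g. "worker[07-10]".
_RANGE = re.compile(r"^(?P<prefix>[a-z-]+)\[(?P<start>\d+)-(?P<end>\d+)\]$")


def expand_hosts(spec: str, domain: str = DOMAIN) -> list[str]:
    """Expand 'name[NN-MM]' into FQDNs; a plain name yields a single FQDN."""
    match = _RANGE.match(spec)
    if match is None:
        return [f"{spec}.{domain}"]
    start, end = match["start"], match["end"]
    width = len(start)  # preserve zero padding, e.g. "07"
    return [
        f"{match['prefix']}{n:0{width}d}.{domain}"
        for n in range(int(start), int(end) + 1)
    ]


if __name__ == "__main__":
    for host in expand_hosts("vault-worker[01-02]"):
        print(host)
```

Keeping the zero padding from the first bound means the output matches the host names as they are written in the plan.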

Note:

  • This concerns the worker*.euwest.azure nodes

  • Decommissioning means deleting the node, then removing references to it from the puppet master, then updating the inventory

  • [1]

Jul 18 14:45:59 vault-worker01 python3[2648]: [2022-07-18 14:45:59,239: INFO/MainProcess] vault_cooker@vault-worker01.euwest.azure.internal.softwareheritage.org ready.
Jul 18 14:58:49 vault-worker01 python3[2648]: [2022-07-18 14:58:49,852: INFO/MainProcess] Received task: swh.vault.cooking_tasks.SWHCookingTask[a3c95ae7-4256-4231-bca7-d3224a9149ce]
Jul 18 14:58:54 vault-worker01 python3[2670]: [2022-07-18 14:58:54,821: INFO/ForkPoolWorker-16] Task swh.vault.cooking_tasks.SWHCookingTask[a3c95ae7-4256-4231-bca7-d3224a9149ce] succeeded in 4.852631129999963s: None
Jul 18 15:01:58 vault-worker02 python3[617]: [2022-07-18 15:01:58,023: INFO/MainProcess] Connected to amqp://swhconsumer:**@rabbitmq:5672//
Jul 18 15:01:58 vault-worker02 python3[617]: [2022-07-18 15:01:58,293: INFO/MainProcess] vault_cooker@vault-worker02.euwest.azure.internal.softwareheritage.org ready.
Jul 18 15:02:59 vault-worker02 python3[617]: [2022-07-18 15:02:59,734: INFO/MainProcess] Received task: swh.vault.cooking_tasks.SWHCookingTask[e3649dcc-9d53-4d88-8245-2543e97d584a]
Jul 18 15:03:19 vault-worker02 python3[997]: [2022-07-18 15:03:19,915: INFO/ForkPoolWorker-16] Task swh.vault.cooking_tasks.SWHCookingTask[e3649dcc-9d53-4d88-8245-2543e97d584a] succeeded in 20.0749026s: None
  • [2] The lag is too large to subside in a reasonable time with only 2 VMs. Instead, since the new VMs will work on the reset topics and will go over the missing data [3], we can simply scrap those two VMs in the end.

  • [3] #4282 (closed)

  • [4]

11:50:47 softwareheritage-scheduler@belvedere:5432=> select now(), status, count(*) from task where type = 'index-origin-metadata' group by status;
+-------------------------------+------------------------+---------+
|              now              |         status         |  count  |
+-------------------------------+------------------------+---------+
| 2022-07-19 09:50:55.403248+00 | next_run_not_scheduled | 9802941 |
| 2022-07-19 09:50:55.403248+00 | next_run_scheduled     |    5263 |
| 2022-07-19 09:50:55.403248+00 | completed              | 3225591 |
| 2022-07-19 09:50:55.403248+00 | disabled               |    5736 |
+-------------------------------+------------------------+---------+
(4 rows)

Time: 27451.213 ms (00:27.451)

softwareheritage-scheduler=# update task set status='disabled' where type = 'index-origin-metadata' and status in ('next_run_scheduled', 'next_run_not_scheduled');
UPDATE 9808204
12:28:16 softwareheritage-scheduler@belvedere:5432=> select now(), status, count(*) from task where type = 'index-origin-metadata' group by status;
+-------------------------------+-----------+---------+
|              now              |  status   |  count  |
+-------------------------------+-----------+---------+
| 2022-07-19 10:28:26.489037+00 | completed | 3225591 |
| 2022-07-19 10:28:26.489037+00 | disabled  | 9813940 |
+-------------------------------+-----------+---------+
(2 rows)

Time: 32793.481 ms (00:32.793)

(ongoing ^)
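
The counts in note [4] can be cross-checked with simple bookkeeping: the rows updated should equal the tasks that were pending, and the new 'disabled' total should equal the old one plus those rows. A minimal sketch of that check follows. The numbers are copied from the psql output above; the function name `check_disable` is hypothetical.

```python
# Counts copied from the two psql outputs in note [4].
before = {
    "next_run_not_scheduled": 9_802_941,
    "next_run_scheduled": 5_263,
    "completed": 3_225_591,
    "disabled": 5_736,
}
after = {"completed": 3_225_591, "disabled": 9_813_940}
updated_rows = 9_808_204  # from "UPDATE 9808204"

# Statuses targeted by the UPDATE statement.
PENDING = ("next_run_scheduled", "next_run_not_scheduled")


def check_disable(before, after, updated_rows, pending=PENDING):
    """Return True when the counts are consistent with disabling all pending tasks."""
    moved = sum(before.get(status, 0) for status in pending)
    # Expected state: pending statuses vanish, their rows move to 'disabled'.
    expected = {s: n for s, n in before.items() if s not in pending}
    expected["disabled"] = expected.get("disabled", 0) + moved
    return moved == updated_rows and expected == after


if __name__ == "__main__":
    print(check_disable(before, after, updated_rows))
```

The check only covers the 'index-origin-metadata' task type shown in the queries; it says nothing about other task types in the scheduler database.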


Migrated from T4395 (view on Phabricator)

Edited Oct 18, 2022 by Antoine R. Dumont