All servers were set up by @vsellier first and then by me, following the docs vince started (cea/Readme.md).
I've iterated over the documentation to clarify some tidbits ;).
The plan got updated accordingly.
What remains is the actual ceph massaging.
I've now installed OSDs on all 12 disks on each host.
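(For the record, on a non-cephadm deployment like this one, the OSD creation on the 12 data disks boils down to something like the ceph-volume call below; the real invocation, including db devices and sizing, is the one documented in cea/Readme.md, so treat this strictly as a sketch with illustrative device names.)

[on each storage host, sketch only]
$ sudo ceph-volume lvm batch --bluestore /dev/sdb /dev/sdc /dev/sdd   # ... one entry per data disk
$ sudo ceph-volume lvm list                                           # check the OSDs were created and activated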
The data movement is in progress.
$ ceph status
  cluster:
    id:     e0a98ad0-fd1f-4079-894f-ed4554ce40c6
    health: HEALTH_WARN
            104 OSD(s) experiencing BlueFS spillover
            4914 pgs not deep-scrubbed in time
            6244 pgs not scrubbed in time

  services:
    mon: 3 daemons, quorum dwalin001,dwalin003,dwalin002 (age 34h)
    mgr: dwalin002(active, since 34h), standbys: dwalin003, dwalin001
    osd: 312 osds: 312 up (since 9h), 312 in (since 9h); 4328 remapped pgs

  data:
    pools:   7 pools, 6497 pgs
    objects: 130.12M objects, 496 TiB
    usage:   643 TiB used, 2.7 PiB / 3.3 PiB avail
    pgs:     170710916/799357874 objects misplaced (21.356%)
             4257 active+remapped+backfill_wait
             2169 active+clean
             71   active+remapped+backfilling

  io:
    client:   1.2 MiB/s rd, 19 op/s rd, 0 op/s wr
    recovery: 2.5 GiB/s, 645 objects/s
I've done some tuning to try and speed up the recovery.
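(For reference, the tuning in question is the usual recovery/backfill throttling knobs; the values below are a sketch, not necessarily the exact ones applied here.)

$ ceph config set osd osd_max_backfills 4             # allow more concurrent backfills per OSD
$ ceph config set osd osd_recovery_max_active_hdd 8   # more in-flight recovery ops per OSD
$ ceph config set osd osd_recovery_sleep_hdd 0        # remove the sleep between recovery ops on HDDs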
The pgs won't be scrubbed properly until the backfill is over, which will take a couple of weeks.
If the recovery seems stuck, restarting one of the OSDs acting as primary for one of the pgs in backfill or backfill_wait status seems to unstick it. To find such a pg (and its primary OSD), run:
$ ceph pg ls | grep backfill | head -10
PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG LOG_DUPS STATE SINCE VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP LAST_SCRUB_DURATION SCRUB_SCHEDULING
8.0 27323 0 54646 0 114595454976 0 0 1929 3000 active+remapped+backfill_wait 9h 109626'203400 111242:892716 [219,18,191,304,6,97]p219 [219,18,191,37,257,97]p219 2024-07-22T21:14:19.817114+0200 2024-07-22T21:14:19.817114+0200 3055 queued for deep scrub
8.1 27324 0 54648 0 114603294720 0 0 2017 3000 active+remapped+backfill_wait 9h 109586'203105 111242:785986 [254,128,87,0,306,53]p254 [217,128,87,0,256,53]p217 2024-07-22T16:46:08.040455+0200 2024-07-22T16:46:08.040455+0200 3546 queued for deep scrub
8.7 27124 0 54248 0 113761914880 0 0 1866 3000 active+remapped+backfill_wait 9h 109641'203703 111243:937106 [7,199,32,224,248,177]p7 [7,199,32,224,46,251]p7 2024-07-21T13:23:38.208798+0200 2024-07-16T01:10:32.254270+0200 105 queued for deep scrub
8.8 27233 0 81699 0 114219597824 0 0 1856 3000 active+remapped+backfill_wait 9h 109626'203644 111242:895743 [121,268,251,252,180,154]p121 [121,109,34,142,180,154]p121 2024-07-21T07:25:57.932327+0200 2024-07-21T07:25:57.932327+0200 3279 queued for deep scrub
8.9 27531 0 82593 0 115473383424 0 0 1873 3000 active+remapped+backfill_wait 9h 109560'204017 111242:851377 [263,176,102,6,62,307]p263 [107,176,24,6,62,78]p107 2024-07-23T03:04:33.086225+0200 2024-07-21T19:12:20.314656+0200 107 queued for deep scrub
8.b 27412 0 54824 0 114974261248 0 0 1831 3000 active+remapped+backfill_wait 9h 109630'203070 111243:846876 [80,278,66,280,139,178]p80 [80,253,66,151,139,178]p80 2024-07-22T16:04:47.769489+0200 2024-07-18T17:41:28.716859+0200 106 queued for deep scrub
8.c 27164 0 81492 0 113930366976 0 0 1730 3000 active+remapped+backfill_wait 9h 109641'202429 111242:831995 [243,55,262,274,213,219]p243 [29,55,11,125,213,219]p29 2024-07-22T06:26:30.674646+0200 2024-07-19T13:51:01.415851+0200 105 queued for deep scrub
8.f 27605 0 55210 0 115783561216 0 0 2115 3000 active+remapped+backfill_wait 9h 109641'202710 111242:899567 [35,182,224,301,58,271]p35 [35,182,224,146,58,244]p35 2024-07-19T18:38:09.383061+0200 2024-07-18T07:23:06.313555+0200 105 queued for deep scrub
8.10 27313 0 27313 0 114559025152 0 0 1961 3000 active+remapped+backfill_wait 9h 109625'204192 111243:814538 [18,126,286,181,36,198]p18 [18,126,162,181,36,198]p18 2024-07-22T09:56:04.555345+0200 2024-07-20T22:55:19.531073+0200 36 queued for deep scrub
8.11 27328 0 54656 0 114621939712 0 0 1780 3000 active+remapped+backfill_wait 9h 109641'204344 111243:1013967 [111,262,299,134,81,295]p111 [111,262,196,134,81,245]p111 2024-07-22T15:35:55.708052+0200 2024-07-22T15:35:55.708052+0200 3749 queued for deep scrub
The primary OSD for the PG is the number after the closing bracket in the "ACTING" column (for instance, for pg 8.0 this is OSD 219; for pg 8.c it would be OSD 29).
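(If grepping that wide table gets tedious, the json output can be parsed instead. This is a sketch assuming the json form of `ceph pg ls` exposes pgid, state and acting_primary fields, which it does on recent releases as far as I can tell.)

$ ceph pg ls -f json \
    | jq -r '.pg_stats[] | select(.state | test("backfill")) | "\(.pgid) \(.acting_primary)"' \
    | head -10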
To restart an OSD cleanly, we need to prevent (further) data movement while it is down, then restart the OSD:
[on the OSD host]
$ ceph osd set noout
$ sudo systemctl restart ceph-osd@${osd_number}
$ ceph status   # wait for the OSD to be back "up" and "in"
$ ceph osd unset noout
All data movement has completed; the OSDs are fully in service.
Pending scrubs are running; it's not clear yet whether they'll catch up (that was already tracked in #5368).
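(To keep an eye on whether the scrub backlog is actually shrinking, the counters in the health summary are enough.)

$ ceph health | tr ';' '\n' | grep 'scrubbed in time'   # shows the "pgs not (deep-)scrubbed in time" counts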
It seems that some combination of pgs stuck in backfilling mode and OSDs being restarted has caused heavy issues on the frontend servers (blocking all reads); this is tracked separately in #5378.