[rancher] staging and production admin clusters crash regularly
This is the root cause of #4874 (closed).
It is not yet clear whether the crashes are caused by etcd timeouts or by a problem communicating with the Rancher manager.
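One way to test the etcd-timeout hypothesis would be to query etcd directly on a management node. A sketch, assuming a default rke2 install (the cert paths under `/var/lib/rancher/rke2/server/tls/etcd/` are the rke2 defaults; adjust if the data dir was relocated):

```shell
# Run on a management node as root; requires etcdctl on the PATH.
# Paths below assume the default rke2 data dir — verify locally.
export ETCDCTL_API=3
etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/rke2/server/tls/etcd/server-client.key \
  endpoint status --write-out=table

# High fsync/commit durations here would point at disk latency on the
# node itself rather than at communication with the Rancher manager.
curl -sk https://127.0.0.1:2379/metrics \
  --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
  --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key \
  | grep -E 'etcd_disk_(wal_fsync|backend_commit)_duration_seconds_sum'
```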
Several errors appear on the management nodes:
May 12 08:04:49 rancher-node-staging-rke2-mgmt1 rke2[1655994]: {"level":"warn","ts":"2023-05-12T08:04:25.756Z","logger":"etcd-client","caller":"v3@v3.5.4-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00077dc00/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
May 12 08:04:49 rancher-node-staging-rke2-mgmt1 rke2[1655994]: time="2023-05-12T08:04:25Z" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
May 12 08:04:49 rancher-node-staging-rke2-mgmt1 rke2[1655994]: {"level":"warn","ts":"2023-05-12T08:04:40.758Z","logger":"etcd-client","caller":"v3@v3.5.4-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00077dc00/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
May 12 08:04:49 rancher-node-staging-rke2-mgmt1 rke2[1655994]: time="2023-05-12T08:04:40Z" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
May 12 08:04:50 rancher-node-staging-rke2-mgmt1 rancher-system-agent[841]: time="2023-05-12T08:04:20Z" level=error msg="[K8s] received secret to process that was older than the last secret operated on. (301509225 vs 301509321)"
May 12 08:04:50 rancher-node-staging-rke2-mgmt1 rancher-system-agent[841]: time="2023-05-12T08:04:20Z" level=error msg="error syncing 'fleet-default/custom-8e8eb25d9b24-machine-plan': handler secret-watch: secret received was too old, requeuing"
May 12 08:04:50 rancher-node-staging-rke2-mgmt1 rancher-system-agent[841]: time="2023-05-12T08:04:25Z" level=error msg="[K8s] received secret to process that was older than the last secret operated on. (301509321 vs 301509449)"
May 12 08:04:50 rancher-node-staging-rke2-mgmt1 rancher-system-agent[841]: time="2023-05-12T08:04:25Z" level=error msg="error syncing 'fleet-default/custom-8e8eb25d9b24-machine-plan': handler secret-watch: secret received was too old, requeuing"
May 12 08:04:50 rancher-node-staging-rke2-mgmt1 rancher-system-agent[841]: time="2023-05-12T08:04:31Z" level=error msg="[K8s] received secret to process that was older than the last secret operated on. (301509449 vs 301509526)"
May 12 08:04:50 rancher-node-staging-rke2-mgmt1 rancher-system-agent[841]: time="2023-05-12T08:04:31Z" level=error msg="error syncing 'fleet-default/custom-8e8eb25d9b24-machine-plan': handler secret-watch: secret received was too old, requeuing"
...
May 12 08:04:50 rancher-node-staging-rke2-mgmt1 rke2[1655994]: E0512 08:04:49.994435 1655994 leaderelection.go:330] error retrieving resource lock kube-system/rke2: Get "https://127.0.0.1:6443/api/v1/namespaces/kube-system/configmaps/rke2": context deadline exceeded
May 12 08:04:50 rancher-node-staging-rke2-mgmt1 rke2[1655994]: I0512 08:04:49.996688 1655994 leaderelection.go:283] failed to renew lease kube-system/rke2: timed out waiting for the condition
May 12 08:04:50 rancher-node-staging-rke2-mgmt1 rke2[1655994]: E0512 08:04:49.996876 1655994 leaderelection.go:306] Failed to release lock: resource name may not be empty
May 12 08:04:50 rancher-node-staging-rke2-mgmt1 rke2[1655994]: time="2023-05-12T08:04:49Z" level=fatal msg="leaderelection lost for rke2"
...
The unit run-k3s-containerd-io.containerd.runtime.v2.task-k8s.io-18f779c5dea2e6c0e3963489872a79c7ce195b76cccc431ab724fa6a9f969f91-rootfs.mount has successfully entered the 'dead' state.
May 12 08:04:50 rancher-node-staging-rke2-mgmt1 systemd[1]: rke2-server.service: Main process exited, code=exited, status=1/FAILURE
...
May 12 08:05:21 rancher-node-staging-rke2-mgmt1 rke2[1722765]: time="2023-05-12T08:05:21Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
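The fatal `leaderelection lost for rke2` follows a run of etcd-client `DeadlineExceeded` retries, so counting those retries per crash window may help correlate the crashes with etcd load. A quick log-scraping sketch (a hypothetical helper, not part of any existing tooling here), fed with `journalctl -u rke2-server` output:

```python
import re

# Matches the rke2 etcd-client retry lines shown above; the DeadlineExceeded
# errors are what precede the lost leader election and the fatal exit.
DEADLINE_RE = re.compile(r'"ts":"(?P<ts>[^"]+)".*DeadlineExceeded')

def deadline_error_timestamps(journal_lines):
    """Return the timestamps of etcd DeadlineExceeded retries in the log."""
    hits = []
    for line in journal_lines:
        m = DEADLINE_RE.search(line)
        if m:
            hits.append(m.group("ts"))
    return hits

# Two of the lines from the excerpt above, trimmed to the JSON payload:
sample = [
    '{"level":"warn","ts":"2023-05-12T08:04:25.756Z","logger":"etcd-client",'
    '"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}',
    '{"level":"warn","ts":"2023-05-12T08:04:40.758Z","logger":"etcd-client",'
    '"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}',
]
print(deadline_error_timestamps(sample))
# → ['2023-05-12T08:04:25.756Z', '2023-05-12T08:04:40.758Z']
```

Clusters of such timestamps just before each `rke2-server.service` exit would support the etcd-timeout explanation; their absence would point back at the Rancher manager connection.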