[rancher] staging and production clusters' admin nodes regularly crash
It's the root cause of #4874 (closed).
It's not yet clear whether this is due to etcd timeouts or to a communication issue with the Rancher manager.
There are several errors in the management nodes' logs:
May 12 08:04:49 rancher-node-staging-rke2-mgmt1 rke2[1655994]: {"level":"warn","ts":"2023-05-12T08:04:25.756Z","logger":"etcd-client","caller":"v3@v3.5.4-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00077dc00/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
May 12 08:04:49 rancher-node-staging-rke2-mgmt1 rke2[1655994]: time="2023-05-12T08:04:25Z" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
May 12 08:04:49 rancher-node-staging-rke2-mgmt1 rke2[1655994]: {"level":"warn","ts":"2023-05-12T08:04:40.758Z","logger":"etcd-client","caller":"v3@v3.5.4-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00077dc00/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
May 12 08:04:49 rancher-node-staging-rke2-mgmt1 rke2[1655994]: time="2023-05-12T08:04:40Z" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
May 12 08:04:50 rancher-node-staging-rke2-mgmt1 rancher-system-agent[841]: time="2023-05-12T08:04:20Z" level=error msg="[K8s] received secret to process that was older than the last secret operated on. (301509225 vs 301509321)"
May 12 08:04:50 rancher-node-staging-rke2-mgmt1 rancher-system-agent[841]: time="2023-05-12T08:04:20Z" level=error msg="error syncing 'fleet-default/custom-8e8eb25d9b24-machine-plan': handler secret-watch: secret received was too old, requeuing"
May 12 08:04:50 rancher-node-staging-rke2-mgmt1 rancher-system-agent[841]: time="2023-05-12T08:04:25Z" level=error msg="[K8s] received secret to process that was older than the last secret operated on. (301509321 vs 301509449)"
May 12 08:04:50 rancher-node-staging-rke2-mgmt1 rancher-system-agent[841]: time="2023-05-12T08:04:25Z" level=error msg="error syncing 'fleet-default/custom-8e8eb25d9b24-machine-plan': handler secret-watch: secret received was too old, requeuing"
May 12 08:04:50 rancher-node-staging-rke2-mgmt1 rancher-system-agent[841]: time="2023-05-12T08:04:31Z" level=error msg="[K8s] received secret to process that was older than the last secret operated on. (301509449 vs 301509526)"
May 12 08:04:50 rancher-node-staging-rke2-mgmt1 rancher-system-agent[841]: time="2023-05-12T08:04:31Z" level=error msg="error syncing 'fleet-default/custom-8e8eb25d9b24-machine-plan': handler secret-watch: secret received was too old, requeuing"
...
May 12 08:04:50 rancher-node-staging-rke2-mgmt1 rke2[1655994]: E0512 08:04:49.994435 1655994 leaderelection.go:330] error retrieving resource lock kube-system/rke2: Get "https://127.0.0.1:6443/api/v1/namespaces/kube-system/configmaps/rke2": context deadline exceeded
May 12 08:04:50 rancher-node-staging-rke2-mgmt1 rke2[1655994]: I0512 08:04:49.996688 1655994 leaderelection.go:283] failed to renew lease kube-system/rke2: timed out waiting for the condition
May 12 08:04:50 rancher-node-staging-rke2-mgmt1 rke2[1655994]: E0512 08:04:49.996876 1655994 leaderelection.go:306] Failed to release lock: resource name may not be empty
May 12 08:04:50 rancher-node-staging-rke2-mgmt1 rke2[1655994]: time="2023-05-12T08:04:49Z" level=fatal msg="leaderelection lost for rke2"
...
The unit run-k3s-containerd-io.containerd.runtime.v2.task-k8s.io-18f779c5dea2e6c0e3963489872a79c7ce195b76cccc431ab724fa6a9f969f91-rootfs.mount has successfully entered the 'dead' state.
May 12 08:04:50 rancher-node-staging-rke2-mgmt1 systemd[1]: rke2-server.service: Main process exited, code=exited, status=1/FAILURE
...
May 12 08:05:21 rancher-node-staging-rke2-mgmt1 rke2[1722765]: time="2023-05-12T08:05:21Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
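A quick way to confirm whether the local etcd member itself is struggling (a sketch using the standard RKE2 certificate paths on a management node; adjust if the layout differs):
export ETCDCTL_API=3
etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/rke2/server/tls/etcd/server-client.key \
  endpoint status --write-out=table   # shows leader, raft term, DB size and any endpoint errors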
Activity
- Vincent Sellier changed milestone to %Dynamic infrastructure [Roadmap - Tooling and infrastructure]
- Vincent Sellier added kubernetes rancher labels
- Vincent Sellier assigned to @vsellier
- Owner
Off the top of my head, is the setup with a rancher manager on the far end of a VPN (and internet) connection supported?
- Antoine R. Dumont mentioned in issue #4883 (closed)
- Author Owner
It seems the cluster crashed due to etcd timeouts.
The management nodes have episodes of high I/O pressure resulting in etcd timeouts. When that happens, etcd tries to elect a new leader, which takes some time.
etcd compaction is enabled by default and runs every 5 minutes. It takes ~100ms, which cannot explain 30s of sustained I/O.
2023-06-06T13:43:08.213081789Z stderr F {"level":"info","ts":"2023-06-06T13:43:08.212Z","caller":"mvcc/kvstore_compaction.go:57","msg":"finished scheduled compaction","compact-revision":103639660,"took":"105.157507ms"}
2023-06-06T13:48:08.222568541Z stderr F {"level":"info","ts":"2023-06-06T13:48:08.222Z","caller":"mvcc/kvstore_compaction.go:57","msg":"finished scheduled compaction","compact-revision":103642088,"took":"96.983351ms"}
2023-06-06T13:53:08.253215681Z stderr F {"level":"info","ts":"2023-06-06T13:53:08.252Z","caller":"mvcc/kvstore_compaction.go:57","msg":"finished scheduled compaction","compact-revision":103644294,"took":"103.33081ms"}
2023-06-06T13:58:08.265838827Z stderr F {"level":"info","ts":"2023-06-06T13:58:08.265Z","caller":"mvcc/kvstore_compaction.go:57","msg":"finished scheduled compaction","compact-revision":103646750,"took":"105.242367ms"}
2023-06-06T14:03:08.289396585Z stderr F {"level":"info","ts":"2023-06-06T14:03:08.287Z","caller":"mvcc/kvstore_compaction.go:57","msg":"finished scheduled compaction","compact-revision":103649311,"took":"114.237806ms"}
2023-06-06T14:08:08.310598499Z stderr F {"level":"info","ts":"2023-06-06T14:08:08.310Z","caller":"mvcc/kvstore_compaction.go:57","msg":"finished scheduled compaction","compact-revision":103651803,"took":"123.080496ms"}
2023-06-06T14:13:08.306273667Z stderr F {"level":"info","ts":"2023-06-06T14:13:08.306Z","caller":"mvcc/kvstore_compaction.go:57","msg":"finished scheduled compaction","compact-revision":103654361,"took":"105.553023ms"}
2023-06-06T14:18:08.321618311Z stderr F {"level":"info","ts":"2023-06-06T14:18:08.321Z","caller":"mvcc/kvstore_compaction.go:57","msg":"finished scheduled compaction","compact-revision":103656886,"took":"109.28005ms"}
2023-06-06T14:23:08.409672322Z stderr F {"level":"info","ts":"2023-06-06T14:23:08.405Z","caller":"mvcc/kvstore_compaction.go:57","msg":"finished scheduled compaction","compact-revision":103659442,"took":"178.950617ms"}
2023-06-06T14:28:08.333047296Z stderr F {"level":"info","ts":"2023-06-06T14:28:08.332Z","caller":"mvcc/kvstore_compaction.go:57","msg":"finished scheduled compaction","compact-revision":103662027,"took":"94.758173ms"}
2023-06-06T14:33:08.898939098Z stderr F {"level":"info","ts":"2023-06-06T14:33:08.898Z","caller":"mvcc/kvstore_compaction.go:57","msg":"finished scheduled compaction","compact-revision":103664561,"took":"180.766812ms"}
2023-06-06T14:38:08.858047073Z stderr F {"level":"info","ts":"2023-06-06T14:38:08.857Z","caller":"mvcc/kvstore_compaction.go:57","msg":"finished scheduled compaction","compact-revision":103666858,"took":"122.698765ms"}
2023-06-06T14:43:08.849632485Z stderr F {"level":"info","ts":"2023-06-06T14:43:08.849Z","caller":"mvcc/kvstore_compaction.go:57","msg":"finished scheduled compaction","compact-revision":103669424,"took":"102.120928ms"}
----system---- --total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system-- ---load-avg--- ------memory-usage----- ----swap---
time |usr sys idl wai stl| read writ| recv send| in out | int csw | 1m 5m 15m | used free buff cach| used free|
06-06 13:00:54| 8 3 89 0 0| 0 0 | 14k 17k| 0 0 |8199 15k|5.48 3.28 2.71|9663M 3948M 173M 2037M|1036k 975M|
06-06 13:00:55| 5 4 81 10 0| 0 48k| 15k 20k| 0 0 |8833 16k|5.48 3.28 2.71|9664M 3946M 173M 2037M|1036k 975M|
06-06 13:00:56| 6 2 68 24 0| 0 0 | 30k 51k| 0 0 |8295 15k|5.48 3.28 2.71|9652M 3958M 173M 2037M|1036k 975M|
06-06 13:00:57| 15 5 59 21 0| 0 0 | 26k 39k| 0 0 |9970 19k|5.48 3.28 2.71|9433M 4177M 173M 2037M|1036k 975M|
06-06 13:00:58| 4 2 70 24 0| 0 0 | 111k 79k| 0 0 |7321 14k|5.68 3.36 2.74|9433M 4177M 173M 2037M|1036k 975M|
06-06 13:00:59| 11 8 60 21 0| 0 0 | 28k 54k| 0 0 |8784 15k|5.68 3.36 2.74|9432M 4178M 173M 2037M|1036k 975M|
06-06 13:01:00| 16 20 42 21 0| 0 0 | 364k 259k| 0 0 |9209 16k|5.68 3.36 2.74|9432M 4178M 173M 2037M|1036k 975M|
06-06 13:01:01| 15 19 27 40 0| 0 0 | 14k 41k| 0 0 |8706 16k|5.68 3.36 2.74|9423M 4188M 173M 2037M|1036k 975M|
...
06-06 13:01:34| 4 2 0 94 0| 0 0 | 33k 108k| 0 0 |5916 10k|8.32 4.28 3.07|9470M 4140M 173M 2037M|1036k 975M|
06-06 13:01:35| 4 2 0 95 0| 0 0 | 28k 62k| 0 0 |6488 12k|8.32 4.28 3.07|9470M 4140M 173M 2037M|1036k 975M|
06-06 13:01:36| 4 2 0 95 0| 0 0 | 11k 55k| 0 0 |6051 11k|8.32 4.28 3.07|9470M 4140M 173M 2037M|1036k 975M|
06-06 13:01:37| 4 1 6 89 0| 0 888k| 14k 45k| 0 0 |4934 8760 |8.32 4.28 3.07|9470M 4140M 173M 2037M|1036k 975M|
06-06 13:01:38| 4 42 33 21 0| 0 45M| 20k 23k| 0 0 |6022 12k|7.97 4.28 3.08|9390M 4220M 173M 2037M|1036k 975M|
06-06 13:01:39| 31 13 51 5 0| 0 534k| 107k 686k| 0 0 |7669 12k|7.97 4.28 3.08|9369M 4210M 173M 2068M|1036k 975M|
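To check whether the slowness is on etcd's disk path, its own latency histograms can be pulled from the metrics endpoint (a sketch; same RKE2 certificate paths as the etcdctl commands elsewhere in this issue, metric names from upstream etcd):
# rule of thumb: wal_fsync p99 should stay well below 10ms on healthy storage
curl -s --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
  --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
  --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key \
  https://127.0.0.1:2379/metrics \
  | grep -E 'etcd_disk_(wal_fsync|backend_commit)_duration_seconds'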
- Author Owner
I have made and copied a snapshot of staging's etcd.
I will move the vdb disk where the data is stored from Ceph to the local disk to check whether it's better.
If it is, it's probably because etcd is very sensitive to disk performance and is impacted by latencies on the Ceph storage.
In that case, we could try to add more etcd nodes on different hypervisors using their local storage.
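To quantify the before/after difference, the fdatasync latency etcd depends on can be measured directly with fio (a sketch following the usual etcd disk benchmark; the target directory is illustrative, point it at the disk holding the etcd data):
fio --name=etcd-disk-check --directory=/var/lib/rancher/rke2/server/db \
  --rw=write --ioengine=sync --fdatasync=1 --bs=2300 --size=22m
# the fsync/fdatasync latency percentiles in the output are the numbers to compare (p99 ideally well under 10ms)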
- Author Owner
After the disk move, there hasn't been a single alert related to a long response time for more than 15 minutes. It's a great improvement.
- Author Owner
It has remained true since yesterday, so it looks like disk (Ceph) latencies were the culprit.
- Vincent Sellier mentioned in issue #4924 (closed)
- Vincent Sellier marked this issue as related to #4924 (closed)
- Vincent Sellier marked this issue as related to #4925 (closed)
- Vincent Sellier mentioned in issue #4925 (closed)
- Author Owner
The fix will be implemented in the linked issues.
- Vincent Sellier closed
- Owner
I'm very, very surprised that the latency of our ceph-based storage is "too high".
I think increasing our deployment complexity by at least a factor of 10 (local storage, VM placement constraints, ...) for something that should already be within spec is a bit much?
- Owner
Was an etcd benchmark run on the original deployment? https://etcd.io/docs/v3.5/op-guide/performance/
Isn't Rancher doing something unexpected with etcd?
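For reference, the write benchmark from that page looks roughly like this (a sketch; the benchmark tool ships with the etcd sources, and the endpoint, totals and TLS flags would need to be adapted to our setup):
benchmark --endpoints=https://127.0.0.1:2379 --target-leader --conns=1 --clients=1 \
  put --key-size=8 --sequential-keys --total=10000 --val-size=256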
- Author Owner
It looks like there is a real issue in the Ceph cluster. After some investigation, it seems to be related to disk or controller issues on beaubourg.
The VMs doing I/O, not only the etcd ones, regularly freeze for a dozen seconds.
This happens exactly when PVE latencies and disk controller errors are logged in beaubourg's journal.
- rancher-node-production-rke2-mgmt1
I/O waits here | \/ ----system---- --total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system-- ---load-avg--- ------memory-usage----- ----swap--- time |usr sys idl wai stl| read writ| recv send| in out | int csw | 1m 5m 15m | used free buff cach| used free 09-06 09:18:02| 28 7 65 0 0| 76k 444k| 296k 1277k| 0 0 | 19k 33k|1.22 2.27 2.67|6405M 708M 23.2M 720M| 0 0 09-06 09:18:03| 14 5 77 4 0| 0 1705k| 99k 449k| 0 0 | 13k 24k|1.52 2.31 2.68|6395M 718M 23.2M 720M| 0 0 09-06 09:18:04| 6 2 91 0 0| 0 80k| 42k 71k| 0 0 |9864 18k|1.52 2.31 2.68|6395M 718M 23.2M 720M| 0 0 09-06 09:18:05| 12 8 68 13 0| 0 88k| 52k 97k| 0 0 | 13k 23k|1.52 2.31 2.68|6396M 717M 23.2M 720M| 0 0 <-- High I/O wait starts here and no effective reads/writes are detected 09-06 09:18:06| 15 19 47 19 0| 0 0 | 27k 32k| 0 0 | 11k 20k|1.52 2.31 2.68|6399M 714M 23.2M 720M| 0 0 09-06 09:18:07| 19 20 36 24 0| 0 0 | 213k 52k| 0 0 | 12k 23k|1.52 2.31 2.68|6414M 700M 23.2M 720M| 0 0 09-06 09:18:08| 14 10 22 55 0| 0 0 | 30k 31k| 0 0 | 12k 21k|1.96 2.39 2.70|6416M 698M 23.2M 720M| 0 0 09-06 09:18:09| 6 3 23 68 0| 0 0 | 26k 57k| 0 0 | 11k 20k|1.96 2.39 2.70|6402M 712M 23.2M 720M| 0 0 09-06 09:18:10| 3 2 24 72 0| 0 0 | 88k 64k| 0 0 |8165 16k|1.96 2.39 2.70|6402M 712M 23.2M 720M| 0 0 09-06 09:18:11| 3 2 24 72 0| 0 0 |6730B 26k| 0 0 |7743 15k|1.96 2.39 2.70|6402M 712M 23.2M 720M| 0 0 09-06 09:18:12| 9 3 22 66 0| 0 0 |9660B 9315B| 0 0 | 11k 20k|1.96 2.39 2.70|6422M 692M 23.2M 720M| 0 0 09-06 09:18:13| 3 3 24 71 0| 0 0 |5158B 8163B| 0 0 |7372 14k|2.53 2.50 2.74|6423M 690M 23.2M 720M| 0 0 09-06 09:18:14| 5 2 23 70 0| 0 0 |5432B 5752B| 0 0 |8107 15k|2.53 2.50 2.74|6413M 700M 23.2M 720M| 0 0 09-06 09:18:15| 9 5 22 64 0| 80k 0 | 130k 97k| 0 0 | 11k 20k|2.53 2.50 2.74|6401M 712M 23.2M 721M| 0 0 09-06 09:18:16| 16 19 0 65 0| 0 0 | 16k 14k| 0 0 | 10k 19k|2.53 2.50 2.74|6403M 710M 23.2M 721M| 0 0 09-06 09:18:17| 21 22 0 57 0| 0 0 | 371k 249k| 0 0 | 13k 24k|2.53 2.50 2.74|6250M 863M 23.2M 721M| 0 0 09-06 09:18:18| 8 9 0 84 0| 0 0 | 23k 88k| 0 0 |9633 18k|3.68 2.74 2.81|6252M 861M 23.2M 721M| 0 0 09-06 09:18:19| 5 3 0 93 0| 0 0 | 708k 1736k| 0 0 | 11k 20k|3.68 2.74 2.81|6241M 872M 23.2M 721M| 0 0 ----system---- --total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system-- ---load-avg--- ------memory-usage----- ----swap--- time |usr sys idl wai stl| read writ| recv send| in out | int csw | 1m 5m 15m | used free buff cach| used free 09-06 09:18:20| 4 2 0 94 0| 0 0 | 335k 211k| 0 0 |9392 17k|3.68 2.74 2.81|6237M 876M 23.2M 721M| 0 0 09-06 09:18:21| 3 3 0 94 0| 0 0 | 12k 53k| 0 0 |7698 14k|3.68 2.74 2.81|6237M 876M 23.2M 721M| 0 0 09-06 09:18:22| 8 4 0 88 0| 0 0 | 27k 7269B| 0 0 | 10k 18k|3.68 2.74 2.81|6256M 857M 23.2M 721M| 0 0 09-06 09:18:23| 2 2 0 96 0| 0 0 |8352B 10k| 0 0 |6833 13k|4.83 3.00 2.90|6256M 856M 23.2M 721M| 0 0 09-06 09:18:24| 3 1 0 96 0| 0 0 |3248B 4019B| 0 0 |7276 14k|4.83 3.00 2.90|6235M 877M 23.2M 721M| 0 0 09-06 09:18:25| 6 6 0 88 0| 0 0 | 15k 20k| 0 0 |9159 16k|4.83 3.00 2.90|6231M 882M 23.2M 721M| 0 0 09-06 09:18:26| 17 20 0 64 0| 0 0 | 32k 250k| 0 0 |9647 17k|4.83 3.00 2.90|6232M 880M 23.2M 721M| 0 0 09-06 09:18:27| 19 19 0 62 0| 0 0 | 17k 19k| 0 0 |9888 18k|4.83 3.00 2.90|6245M 867M 23.2M 721M| 0 0 09-06 09:18:28| 11 11 0 78 0| 0 0 | 10k 12k| 0 0 |9024 17k|5.97 3.26 2.98|6249M 864M 23.2M 721M| 0 0 09-06 09:18:29| 4 2 0 94 0| 0 0 |1684B 2321B| 0 0 |6675 13k|5.97 3.26 2.98|6238M 875M 23.2M 721M| 0 0 09-06 09:18:30| 4 2 0 94 0| 0 0 |8290B 78k| 0 0 |7169 14k|5.97 3.26 2.98|6238M 875M 23.2M 721M| 0 0 09-06 09:18:31| 3 2 0 95 
0| 0 0 | 17k 40k| 0 0 | 11k 21k|5.97 3.26 2.98|6238M 874M 23.2M 721M| 0 0 09-06 09:18:32| 9 4 0 87 0| 0 0 |3067B 4630B| 0 0 | 16k 29k|5.97 3.26 2.98|6261M 852M 23.2M 721M| 0 0 09-06 09:18:33| 2 2 0 95 0| 0 0 |2994B 3537B| 0 0 | 13k 26k|6.93 3.51 3.06|6261M 852M 23.2M 721M| 0 0 09-06 09:18:34| 3 1 0 96 0| 0 0 |7408B 9111B| 0 0 | 13k 26k|6.93 3.51 3.06|6240M 872M 23.2M 721M| 0 0 09-06 09:18:35| 17 9 0 73 0| 66k 0 | 697k 670k| 0 0 | 18k 32k|6.93 3.51 3.06|6042M 1071M 23.2M 721M| 0 0 09-06 09:18:36| 23 18 0 59 0| 0 0 |6058k 11M| 0 0 | 15k 27k|6.93 3.51 3.06|6062M 1050M 23.2M 721M| 0 0 09-06 09:18:37| 19 20 0 61 0| 0 0 | 11k 11k| 0 0 | 14k 25k|6.93 3.51 3.06|6073M 1040M 23.2M 721M| 0 0 09-06 09:18:38| 12 12 0 76 0| 0 0 |2661B 4447B| 0 0 | 13k 23k|8.14 3.81 3.17|6072M 1041M 23.2M 721M| 0 0 09-06 09:18:39| 4 2 0 94 0| 0 0 | 23k 25k| 0 0 | 14k 26k|8.14 3.81 3.17|6062M 1051M 23.2M 721M| 0 0 09-06 09:18:40| 3 2 0 95 0| 0 0 |3057B 5034B| 0 0 | 17k 34k|8.14 3.81 3.17|6061M 1051M 23.2M 721M| 0 0 09-06 09:18:41| 2 2 0 96 0| 0 0 |5032B 40k| 0 0 | 17k 34k|8.14 3.81 3.17|6061M 1051M 23.2M 721M| 0 0 09-06 09:18:42| 8 4 0 88 0| 0 0 |5357B 4641B| 0 0 | 19k 37k|8.14 3.81 3.17|6081M 1031M 23.2M 721M| 0 0 09-06 09:18:43| 2 1 0 97 0| 0 0 |2829B 2328B| 0 0 | 17k 33k|9.33 4.13 3.27|6081M 1031M 23.2M 721M| 0 0 09-06 09:18:44| 4 1 0 95 0| 0 0 |3696B 4655B| 0 0 | 17k 34k|9.33 4.13 3.27|6062M 1050M 23.2M 721M| 0 0 09-06 09:18:45| 7 5 0 88 0| 43k 4096B| 33k 49k| 0 0 | 15k 28k|9.33 4.13 3.27|6059M 1053M 23.2M 721M| 0 0 <---- High I/Os ends here 09-06 09:18:46| 8 12 18 62 0|1509k 424k| 34k 40k| 0 0 |9704 18k|9.33 4.13 3.27|5869M 1241M 23.2M 723M| 0 0 09-06 09:18:47| 4 1 47 49 0| 0 8192B| 43k 89k| 0 0 |5380 9802 |9.33 4.13 3.27|5869M 1241M 23.2M 723M| 0 0 09-06 09:18:48| 2 2 48 48 0| 0 0 | 25k 58k| 0 0 |5208 9574 |9.22 4.20 3.30|5869M 1242M 23.2M 723M| 0 0 09-06 09:18:49| 4 2 48 47 0| 0 0 | 35k 59k| 0 0 |5670 10k|9.22 4.20 3.30|5870M 1241M 23.2M 723M| 0 0 09-06 09:18:50| 2 1 49 48 0| 0 0 | 37k 58k| 0 0 |5754 11k|9.22 4.20 3.30|5870M 1241M 23.2M 723M| 0 0 09-06 09:18:51| 9 4 43 44 0|1952k 204k| 27k 91k| 0 0 |7683 13k|9.22 4.20 3.30|5885M 1224M 23.2M 725M| 0 0 09-06 09:18:52| 1 2 49 49 0| 96k 0 | 30k 36k| 0 0 |4712 8798 |9.22 4.20 3.30|5885M 1224M 23.2M 725M| 0 0 09-06 09:18:53| 2 2 48 48 0| 188k 0 | 20k 26k| 0 0 |4422 8272 |9.12 4.26 3.32|5885M 1223M 23.2M 725M| 0 0 09-06 09:18:54| 2 26 38 34 0| 29M 2724k| 54k 65k| 0 0 |8732 15k|9.12 4.26 3.32|6093M 1006M 23.2M 735M| 0 0 09-06 09:18:55| 3 12 44 41 0|5690k 23M| 133k 139k| 0 0 | 10k 18k|9.12 4.26 3.32|6155M 944M 23.2M 735M| 0 0 09-06 09:18:56| 42 27 29 1 0|3641k 18M| 200k 437k| 0 0 |7831 11k|9.12 4.26 3.32|6046M 1025M 23.2M 763M| 0 0 09-06 09:18:57| 33 10 57 0 0|1196k 128k| 18k 22k| 0 0 |6548 10k|9.12 4.26 3.32|6025M 1044M 23.2M 764M| 0 0 09-06 09:18:58| 3 2 95 0 0| 0 52k| 112k 79k| 0 0 |5129 9433 |8.39 4.19 3.31|6020M 1049M 23.2M 764M| 0 0
- pergamon
----system---- --total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system-- ---load-avg--- ------memory-usage----- ----swap--- time |usr sys idl wai stl| read writ| recv send| in out | int csw | 1m 5m 15m | used free buff cach| used free 09-06 09:17:59| 3 1 96 0 0| 0 1792k| 100k 22k| 0 0 |1946 2772 |0.25 0.68 1.19|10.8G 5616M 1323M 27.4G| 806M 3288M 09-06 09:18:00| 5 3 91 1 0| 0 1924k| 143k 39k| 0 0 |2382 3208 |0.25 0.68 1.19|10.8G 5616M 1323M 27.4G| 806M 3288M 09-06 09:18:01| 14 4 80 1 0| 0 2484k| 102k 53k| 0 0 |2181 2354 |0.25 0.68 1.19|10.8G 5576M 1323M 27.4G| 806M 3288M 09-06 09:18:02| 2 1 93 4 0| 0 2160k| 73k 41k| 0 0 |1641 1957 |0.23 0.67 1.18|10.8G 5577M 1323M 27.4G| 806M 3288M 09-06 09:18:03| 11 2 52 35 0| 16k 336k| 79k 219k| 0 0 |2270 3173 |0.23 0.67 1.18|10.8G 5568M 1323M 27.4G| 806M 3288M <-- High I/O wait starts here and no effective reads/writes are detected, exactly at the same moment as mgmt1 09-06 09:18:04| 6 5 47 42 0| 0 0 | 289k 68k| 0 0 |2246 2620 |0.23 0.67 1.18|10.8G 5470M 1323M 27.4G| 806M 3288M 09-06 09:18:05| 8 3 46 43 0| 0 36k| 543k 91k| 0 0 |2149 2830 |0.23 0.67 1.18|10.8G 5468M 1323M 27.4G| 806M 3288M 09-06 09:18:06| 2 5 47 45 0| 0 0 | 75k 54k| 0 0 | 13k 2003 |0.23 0.67 1.18|10.8G 5466M 1323M 27.4G| 806M 3288M 09-06 09:18:07| 3 7 47 43 0| 0 0 | 330k 96k| 0 0 | 19k 2745 |0.77 0.77 1.21|10.8G 5468M 1323M 27.4G| 806M 3288M 09-06 09:18:08| 3 1 48 48 0| 0 0 | 140k 60k| 0 0 |1776 2488 |0.77 0.77 1.21|10.8G 5466M 1323M 27.4G| 806M 3288M 09-06 09:18:09| 3 1 46 50 0| 0 0 | 64k 31k| 0 0 |1500 1915 |0.77 0.77 1.21|10.8G 5464M 1323M 27.4G| 806M 3288M 09-06 09:18:10| 2 2 24 72 0| 0 20k| 79k 122k| 0 0 |1266 1619 |0.77 0.77 1.21|10.8G 5461M 1323M 27.4G| 806M 3288M 09-06 09:18:11| 2 1 24 73 0| 0 0 | 114k 47k| 0 0 |1541 1930 |0.77 0.77 1.21|10.8G 5460M 1323M 27.4G| 806M 3288M 09-06 09:18:12| 2 1 25 73 0| 0 0 | 111k 27k| 0 0 |1253 1604 |1.43 0.91 1.25|10.8G 5461M 1323M 27.4G| 806M 3288M 09-06 09:18:13| 2 1 24 73 0| 0 0 | 155k 19k| 0 0 |1475 2098 |1.43 0.91 1.25|10.8G 5462M 1323M 27.4G| 806M 3288M 09-06 09:18:14| 4 1 24 71 0| 0 0 | 235k 44k| 0 0 |1852 2428 |1.43 0.91 1.25|10.8G 5509M 1323M 27.4G| 806M 3288M 09-06 09:18:15| 5 3 24 68 0| 0 8192B| 79k 33k| 0 0 |1970 2597 |1.43 0.91 1.25|10.8G 5556M 1323M 27.4G| 806M 3288M 09-06 09:18:16| 8 3 23 66 0| 0 0 | 198k 75k| 0 0 |2092 2371 |1.43 0.91 1.25|10.9G 5531M 1323M 27.4G| 806M 3288M 09-06 09:18:17| 6 1 24 69 0| 0 0 | 574k 37k| 0 0 |2066 2669 |2.28 1.10 1.31|10.8G 5549M 1323M 27.4G| 806M 3288M 09-06 09:18:18| 2 1 24 73 0| 0 0 | 98k 47k| 0 0 |1453 1909 |2.28 1.10 1.31|10.8G 5549M 1323M 27.4G| 806M 3288M 09-06 09:18:19| 2 1 24 74 0| 0 0 | 110k 35k| 0 0 |1647 2276 |2.28 1.10 1.31|10.8G 5552M 1323M 27.4G| 806M 3288M 09-06 09:18:20| 5 3 22 70 0| 0 0 | 139k 41k| 0 0 |1873 2396 |2.28 1.10 1.31|10.8G 5549M 1323M 27.4G| 806M 3288M 09-06 09:18:21| 4 1 23 72 0| 0 0 | 178k 54k| 0 0 |2017 2541 |2.28 1.10 1.31|10.9G 5539M 1323M 27.4G| 806M 3288M 09-06 09:18:22| 5 2 23 70 0| 0 0 |5557k 32k| 0 0 |1678 2150 |3.06 1.28 1.37|10.9G 5536M 1323M 27.4G| 806M 3288M 09-06 09:18:23| 5 1 23 71 0| 0 0 | 330k 53k| 0 0 |2008 2687 |3.06 1.28 1.37|10.9G 5532M 1323M 27.4G| 806M 3288M 09-06 09:18:24| 2 4 24 70 0| 0 0 | 83k 40k| 0 0 |1518 1925 |3.06 1.28 1.37|10.9G 5433M 1323M 27.4G| 806M 3288M 09-06 09:18:25| 7 2 19 71 0| 0 0 | 12M 37k| 0 0 |2436 3225 |3.06 1.28 1.37|10.9G 5419M 1323M 27.4G| 806M 3288M ----system---- --total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system-- ---load-avg--- ------memory-usage----- ----swap--- time |usr sys idl wai stl| 
read writ| recv send| in out | int csw | 1m 5m 15m | used free buff cach| used free 09-06 09:18:26| 3 1 24 71 0| 0 0 | 203k 44k| 0 0 |1884 2404 |3.06 1.28 1.37|10.9G 5421M 1323M 27.4G| 806M 3288M 09-06 09:18:27| 3 1 24 73 0| 0 0 | 253k 41k| 0 0 |2031 2883 |3.77 1.45 1.43|10.9G 5415M 1323M 27.4G| 806M 3288M 09-06 09:18:28| 2 1 24 73 0| 0 0 | 55k 36k| 0 0 |1767 2520 |3.77 1.45 1.43|10.9G 5415M 1323M 27.4G| 806M 3288M 09-06 09:18:29| 1 1 25 74 0| 0 0 | 84k 15k| 0 0 | 995 1332 |3.77 1.45 1.43|10.9G 5414M 1323M 27.4G| 806M 3288M 09-06 09:18:30| 2 2 24 73 0| 0 0 | 81k 26k| 0 0 |1507 2113 |3.77 1.45 1.43|10.9G 5408M 1323M 27.4G| 806M 3288M 09-06 09:18:31| 3 2 23 72 0| 0 0 | 233k 36k| 0 0 |1419 1810 |3.77 1.45 1.43|10.9G 5397M 1323M 27.4G| 806M 3288M 09-06 09:18:32| 2 1 24 74 0| 0 0 | 427k 13k| 0 0 |1134 1619 |4.43 1.63 1.48|10.9G 5396M 1323M 27.4G| 806M 3288M 09-06 09:18:33| 5 1 22 72 0| 0 0 | 157k 33k| 0 0 |1660 2274 |4.43 1.63 1.48|10.9G 5377M 1323M 27.4G| 806M 3288M 09-06 09:18:34| 3 1 24 72 0| 0 0 | 102k 49k| 0 0 |1738 2291 |4.43 1.63 1.48|10.9G 5474M 1323M 27.4G| 806M 3288M 09-06 09:18:35| 5 2 1 92 0| 0 0 | 477k 20k| 0 0 |1710 2485 |4.43 1.63 1.48|10.9G 5469M 1323M 27.4G| 806M 3288M 09-06 09:18:36| 2 1 0 98 0| 0 0 | 91k 45k| 0 0 |1530 1990 |4.43 1.63 1.48|10.9G 5466M 1323M 27.4G| 806M 3288M 09-06 09:18:37| 3 1 0 97 0| 0 0 | 101k 26k| 0 0 |1367 1898 |5.12 1.82 1.55|10.9G 5464M 1323M 27.4G| 806M 3288M 09-06 09:18:38| 2 1 0 98 0| 0 0 | 60k 19k| 0 0 |1018 1480 |5.12 1.82 1.55|10.9G 5461M 1323M 27.4G| 806M 3288M 09-06 09:18:39| 4 2 0 94 0| 0 0 | 153k 29k| 0 0 |1746 2435 |5.12 1.82 1.55|10.9G 5455M 1323M 27.4G| 806M 3288M 09-06 09:18:40| 4 4 0 93 0| 0 0 | 105k 23k| 0 0 |3130 1899 |5.12 1.82 1.55|10.9G 5452M 1323M 27.4G| 806M 3288M 09-06 09:18:41| 2 1 0 97 0| 0 0 | 51k 41k| 0 0 |1349 1595 |5.12 1.82 1.55|10.9G 5454M 1323M 27.4G| 806M 3288M 09-06 09:18:42| 5 2 0 94 0| 0 0 | 53k 20k| 0 0 |6616 12k|5.75 2.00 1.61|10.9G 5454M 1323M 27.4G| 806M 3288M 09-06 09:18:43| 3 2 0 96 0| 0 0 | 159k 32k| 0 0 |1843 2444 |5.75 2.00 1.61|10.9G 5448M 1323M 27.4G| 806M 3288M 09-06 09:18:44| 3 5 0 93 0| 0 136k| 304k 46k| 0 0 |2235 2745 |5.75 2.00 1.61|10.9G 5346M 1323M 27.4G| 806M 3288M 09-06 09:18:45| 3 4 0 94 0| 0 12k| 284k 37k| 0 0 |2589 3458 |5.75 2.00 1.61|11.0G 5338M 1323M 27.4G| 806M 3288M 09-06 09:18:46| 59 10 7 22 2| 0 7680k| 657k 73k| 0 0 |4646 3617 |5.75 2.00 1.61|11.1G 5153M 1323M 27.4G| 806M 3288M <-- <---- High I/Os ends here (same time) 09-06 09:18:47| 96 4 0 0 0| 0 0 | 115k 17k| 0 0 |3629 1600 |6.33 2.19 1.67|11.2G 5095M 1323M 27.4G| 806M 3288M 09-06 09:18:48| 35 10 14 42 0|4096B 0 | 515k 56k| 0 0 |4860 3055 |6.33 2.19 1.67|11.2G 4917M 1323M 27.6G| 806M 3288M 09-06 09:18:49| 3 1 23 72 0| 0 0 | 305k 50k| 0 0 |2312 3214 |6.33 2.19 1.67|11.2G 4919M 1323M 27.6G| 806M 3288M 09-06 09:18:50| 5 1 2 91 0| 0 0 | 297k 34k| 0 0 |2084 3173 |6.33 2.19 1.67|11.2G 4923M 1323M 27.6G| 806M 3288M 09-06 09:18:51| 1 1 0 98 0| 0 12k| 125k 39k| 0 0 |1343 1707 |6.33 2.19 1.67|11.2G 4923M 1323M 27.6G| 806M 3288M 09-06 09:18:52| 2 0 0 98 0| 0 8192B| 198k 19k| 0 0 |1307 1888 |6.62 2.32 1.71|11.2G 4923M 1323M 27.6G| 806M 3288M ----system---- --total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system-- ---load-avg--- ------memory-usage----- ----swap--- time |usr sys idl wai stl| read writ| recv send| in out | int csw | 1m 5m 15m | used free buff cach| used free 09-06 09:18:53| 6 1 2 92 0| 0 1964k| 505k 18k| 0 0 |1448 1901 |6.62 2.32 1.71|11.2G 4924M 1323M 27.6G| 806M 3288M 09-06 09:18:54| 70 8 6 14 1| 84k 15M| 569k 207k| 0 0 
|6079 9717 |6.62 2.32 1.71|11.2G 5021M 1323M 27.6G| 806M 3288M 09-06 09:18:55| 72 11 15 2 1| 276k 7164k| 206k 1633k| 0 0 |5241 11k|6.62 2.32 1.71|11.2G 5029M 1323M 27.6G| 806M 3288M
- hypervisor3
---system---- --total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system-- ---load-avg--- ------memory-usage----- ----swap--- - time |usr sys idl wai stl| read writ| recv send| in out | int csw | 1m 5m 15m | used free buff cach| used free| 09-06 09:18:01| 23 3 73 0 0| 113M 13M| 30M 57M| 0 0 | 68k 83k|17.7 17.7 17.1| 174G 6181M 65.6G 904M| 11G 4264M| <-- nothing really special 09-06 09:18:02| 31 4 64 0 0| 105M 24M| 39M 37M| 0 0 | 77k 85k|17.7 17.7 17.1| 174G 6130M 65.6G 900M| 11G 4264M| 09-06 09:18:03| 29 4 66 0 0| 74M 37M| 69M 64M| 0 0 | 76k 81k|17.7 17.7 17.1| 174G 6139M 65.6G 905M| 11G 4264M| 09-06 09:18:04| 23 3 74 0 0| 122M 5272k| 45M 47M| 0 0 | 58k 68k|17.7 17.7 17.1| 174G 6376M 65.6G 900M| 11G 4264M| 09-06 09:18:05| 22 2 75 0 0| 118M 3112k| 72M 43M| 0 0 | 54k 64k|18.1 17.8 17.1| 174G 6362M 65.6G 901M| 11G 4264M| 09-06 09:18:06| 21 2 77 0 0| 121M 1984k| 66M 37M| 0 0 | 53k 66k|18.1 17.8 17.1| 174G 6344M 65.6G 901M| 11G 4264M| 09-06 09:18:07| 21 2 77 0 0| 125M 3400k| 11M 15M| 0 0 | 55k 51k|18.1 17.8 17.1| 174G 6347M 65.6G 900M| 11G 4264M| 09-06 09:18:08| 20 2 77 0 0| 117M 2504k| 50M 97M| 0 0 | 47k 59k|18.1 17.8 17.1| 174G 6374M 65.6G 901M| 11G 4264M| 09-06 09:18:09| 19 2 78 0 0| 122M 1096k| 11M 20M| 0 0 | 41k 53k|18.1 17.8 17.1| 174G 6378M 65.6G 901M| 11G 4264M| 09-06 09:18:10| 22 2 75 0 0| 119M 1252k|6773k 7031k| 0 0 | 42k 53k|18.0 17.7 17.1| 174G 6375M 65.6G 900M| 11G 4264M| 09-06 09:18:11| 20 2 77 0 0| 110M 280k|6545k 24M| 0 0 | 41k 49k|18.0 17.7 17.1| 174G 6353M 65.6G 901M| 11G 4264M| 09-06 09:18:12| 19 2 78 0 0| 127M 984k| 14M 26M| 0 0 | 55k 69k|18.0 17.7 17.1| 174G 6350M 65.6G 906M| 11G 4264M| 09-06 09:18:13| 20 2 78 0 0| 119M 1612k| 14M 54M| 0 0 | 47k 55k|18.0 17.7 17.1| 174G 6348M 65.6G 901M| 11G 4264M| 09-06 09:18:14| 20 2 78 0 0| 130M 5928k|9900k 9520k| 0 0 | 39k 50k|18.0 17.7 17.1| 174G 6357M 65.6G 905M| 11G 4264M| 09-06 09:18:15| 20 2 78 0 0| 125M 28k|6291k 4024k| 0 0 | 38k 51k|17.7 17.7 17.1| 174G 6358M 65.6G 900M| 11G 4264M| 09-06 09:18:16| 21 2 77 0 0| 119M 756k| 26M 33M| 0 0 | 44k 51k|17.7 17.7 17.1| 174G 6352M 65.6G 901M| 11G 4264M| 09-06 09:18:17| 23 2 74 0 0| 119M 892k| 13M 12M| 0 0 | 53k 58k|17.7 17.7 17.1| 174G 6403M 65.6G 903M| 11G 4264M| 09-06 09:18:18| 21 2 76 1 0| 128M 1424k| 19M 21M| 0 0 | 42k 50k|17.7 17.7 17.1| 174G 6421M 65.6G 900M| 11G 4264M| 09-06 09:18:19| 20 2 78 0 0| 129M 212k|4580k 4293k| 0 0 | 35k 45k|17.7 17.7 17.1| 174G 6431M 65.6G 901M| 11G 4264M| 09-06 09:18:20| 21 2 77 0 0| 128M 392k| 17M 20M| 0 0 | 44k 57k|17.6 17.7 17.1| 174G 6421M 65.6G 902M| 11G 4264M| 09-06 09:18:21| 19 2 78 0 0| 114M 892k| 16M 19M| 0 0 | 41k 53k|17.6 17.7 17.1| 174G 6395M 65.6G 900M| 11G 4264M| 09-06 09:18:22| 19 2 79 0 0| 135M 536k| 12M 32M| 0 0 | 43k 52k|17.6 17.7 17.1| 174G 6393M 65.6G 901M| 11G 4264M| 09-06 09:18:23| 19 2 79 0 0| 124M 168k| 10M 24M| 0 0 | 52k 48k|17.6 17.7 17.1| 174G 6402M 65.6G 901M| 11G 4264M| 09-06 09:18:24| 22 2 76 0 0| 133M 664k|9970k 10M| 0 0 | 44k 58k|17.6 17.7 17.1| 174G 6395M 65.6G 901M| 11G 4264M| 09-06 09:18:25| 17 3 80 0 0| 120M 620k| 11M 18M| 788k 0 | 56k 48k|17.3 17.6 17.1| 173G 7316M 65.6G 901M| 11G 4265M| 09-06 09:18:26| 20 2 78 0 0| 122M 3204k| 16M 50M| 0 0 | 43k 55k|17.3 17.6 17.1| 173G 7256M 65.6G 901M| 11G 4265M| ----system---- --total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system-- ---load-avg--- ------memory-usage----- ----swap--- time |usr sys idl wai stl| read writ| recv send| in out | int csw | 1m 5m 15m | used free buff cach| used free| 09-06 09:18:27| 18 2 79 0 0| 117M 516k|5237k 37M| 0 0 | 41k 50k|17.3 17.6 
17.1| 173G 7198M 65.6G 906M| 11G 4265M| 09-06 09:18:28| 17 2 80 0 0| 114M 604k|5723k 24M| 0 0 | 41k 52k|17.3 17.6 17.1| 173G 7176M 65.6G 901M| 11G 4265M| 09-06 09:18:29| 17 2 81 0 0| 111M 496k|4194k 9513k| 0 0 | 34k 46k|17.3 17.6 17.1| 173G 7104M 65.6G 905M| 11G 4265M| 09-06 09:18:30| 19 2 78 0 0| 123M 296k| 58M 112M| 0 0 | 41k 54k|17.1 17.6 17.1| 174G 7010M 65.6G 901M| 11G 4265M| 09-06 09:18:31| 20 2 77 1 0| 124M 628k|5238k 5703k| 0 0 | 56k 59k|17.1 17.6 17.1| 174G 6975M 65.6G 904M| 11G 4265M| 09-06 09:18:32| 19 2 79 0 0| 77M 260k| 42M 111M| 0 0 | 38k 45k|17.1 17.6 17.1| 174G 6932M 65.6G 890M| 11G 4265M| 09-06 09:18:33| 18 1 81 0 0|2256k 580k|8906k 16M| 0 0 | 40k 56k|17.1 17.6 17.1| 174G 6961M 65.6G 889M| 11G 4265M| 09-06 09:18:34| 15 1 84 0 0| 0 132k|3960k 5071k| 0 0 | 35k 46k|17.1 17.6 17.1| 174G 6972M 65.6G 889M| 11G 4265M| 09-06 09:18:35| 19 1 80 0 0| 684k 1000k|5877k 4836k| 684k 0 | 42k 53k|16.5 17.4 17.0| 173G 7287M 65.6G 889M| 11G 4265M| 09-06 09:18:36| 17 1 81 0 0|2316k 372k|3694k 4162k| 0 0 | 34k 45k|16.5 17.4 17.0| 173G 7242M 65.6G 889M| 11G 4265M| 09-06 09:18:37| 17 1 82 0 0| 12k 880k|4275k 5946k| 0 0 | 35k 44k|16.5 17.4 17.0| 173G 7221M 65.6G 889M| 11G 4265M| 09-06 09:18:38| 16 1 83 0 0| 0 256k|3790k 4358k| 0 0 | 30k 39k|16.5 17.4 17.0| 173G 7212M 65.6G 889M| 11G 4265M| 09-06 09:18:39| 17 1 81 0 0|8192B 732k|3941k 3681k| 0 0 | 41k 44k|16.5 17.4 17.0| 173G 7200M 65.6G 889M| 11G 4265M| 09-06 09:18:40| 20 1 78 0 0| 0 292k|4244k 4244k| 0 0 | 36k 49k|16.3 17.4 17.0| 173G 7163M 65.6G 889M| 11G 4265M| 09-06 09:18:41| 19 1 80 0 0| 0 636k|5937k 5896k| 0 0 | 36k 46k|16.3 17.4 17.0| 173G 7125M 65.6G 889M| 11G 4265M| 09-06 09:18:42| 19 2 80 0 0|5120k 524k|5582k 6305k| 0 0 | 45k 58k|16.3 17.4 17.0| 174G 7094M 65.6G 889M| 11G 4265M| 09-06 09:18:43| 18 1 80 0 0| 24k 492k|5208k 5165k| 0 0 | 36k 48k|16.3 17.4 17.0| 174G 7080M 65.6G 890M| 11G 4265M| 09-06 09:18:44| 17 2 81 0 0| 0 1472k|3637k 3256k| 0 0 | 34k 47k|16.3 17.4 17.0| 174G 7050M 65.6G 890M| 11G 4265M| 09-06 09:18:45| 22 2 76 0 0| 460k 8460k| 13M 8543k| 432k 0 | 57k 67k|15.8 17.3 17.0| 173G 7376M 65.6G 890M| 11G 4265M| 09-06 09:18:46| 32 4 64 0 0| 676k 40M| 92M 86M|4096B 0 | 123k 146k|15.8 17.3 17.0| 173G 7265M 65.6G 888M| 11G 4265M| 09-06 09:18:47| 26 2 71 0 0| 56k 2312k| 74M 123M| 0 0 | 87k 95k|15.8 17.3 17.0| 173G 7306M 65.6G 890M| 11G 4265M| 09-06 09:18:48| 31 2 67 0 0| 16k 2004k| 32M 31M| 0 0 | 87k 89k|15.8 17.3 17.0| 173G 7251M 65.6G 890M| 11G 4265M| 09-06 09:18:49| 26 2 72 0 0| 48k 2368k| 21M 19M| 44k 0 | 65k 69k|15.8 17.3 17.0| 174G 7088M 65.6G 890M| 11G 4265M| 09-06 09:18:50| 24 2 75 0 0| 28k 2600k| 16M 15M|8192B 0 | 65k 54k|16.0 17.3 17.0| 174G 7009M 65.6G 890M| 11G 4265M| 09-06 09:18:51| 24 2 75 0 0| 152k 9844k| 27M 36M| 0 0 | 70k 82k|16.0 17.3 17.0| 174G 7006M 65.6G 890M| 11G 4265M| 09-06 09:18:52| 17 1 82 0 0|5372k 2868k| 12M 10M| 0 0 | 42k 54k|16.0 17.3 17.0| 174G 6992M 65.6G 890M| 11G 4265M|
- beaubourg
09-06 09:18:02| 9 8 75 8 0| 563M 18M| 41M 22M| 0 0 | 27k 64k|7.80 8.44 8.75| 121G 79.9G 261G 62.8G| 0 0 09-06 09:18:03| 3 2 73 22 0| 101M 5236k| 12M 8567k| 0 0 | 11k 20k|7.80 8.44 8.75| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:04| 3 0 68 29 0| 0 0 |8311k 4808k| 0 0 |3600 4635 |7.80 8.44 8.75| 121G 80.0G 261G 62.8G| 0 0 <---- boum 09-06 09:18:05| 4 0 65 31 0| 0 0 |4348k 5231k| 0 0 |3968 5233 |7.80 8.44 8.75| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:06| 4 0 65 31 0| 0 0 |1878k 1007k| 0 0 |2166 2651 |8.22 8.52 8.77| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:07| 4 0 57 39 0| 0 0 |8297k 9089k| 0 0 |3998 3939 |8.22 8.52 8.77| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:08| 4 0 56 40 0| 0 0 |3487k 3646k| 0 0 |3314 4021 |8.22 8.52 8.77| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:09| 4 1 56 39 0| 0 0 |2402k 2550k| 0 0 |4193 4989 |8.22 8.52 8.77| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:10| 4 0 56 40 0| 0 0 |1938k 1140k| 0 0 |2232 2009 |8.22 8.52 8.77| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:11| 4 0 56 40 0| 0 0 |1329k 1565k| 0 0 |3450 4095 |9.16 8.71 8.83| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:12| 4 0 56 40 0| 0 0 |1051k 1158k| 0 0 |3193 5046 |9.16 8.71 8.83| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:13| 4 0 56 40 0| 0 28k|2213k 2807k| 0 0 |2815 3207 |9.16 8.71 8.83| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:14| 5 1 55 40 0| 0 0 |1035k 1031k| 0 0 |8983 22k|9.16 8.71 8.83| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:15| 4 0 56 40 0| 0 0 |1068k 459k| 0 0 |9416 4303 |9.16 8.71 8.83| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:16| 3 0 51 45 0| 0 0 |1493k 1324k| 0 0 | 12k 3513 |10.3 8.96 8.92| 121G 80.0G 261G 62.8G| 0 0 ----system---- --total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system-- ---load-avg--- ------memory-usage----- ----swap--- time |usr sys idl wai stl| read writ| recv send| in out | int csw | 1m 5m 15m | used free buff cach| used free 09-06 09:18:17| 4 0 43 53 0| 0 0 |1720k 2336k| 0 0 | 11k 4721 |10.3 8.96 8.92| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:18| 3 0 40 56 0| 0 0 |1826k 929k| 0 0 |2316 2405 |10.3 8.96 8.92| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:19| 3 0 40 56 0| 0 0 | 938k 649k| 0 0 |1995 2256 |10.3 8.96 8.92| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:20| 3 0 37 59 0| 0 0 | 601k 405k| 0 0 |1536 1501 |10.3 8.96 8.92| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:21| 3 0 37 59 0| 0 0 |1667k 938k| 0 0 |1979 2303 |12.8 9.49 9.09| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:22| 4 0 33 64 0| 0 0 |1406k 746k| 0 0 |2559 2556 |12.8 9.49 9.09| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:23| 3 0 31 65 0| 0 0 | 624k 672k| 0 0 |1609 1730 |12.8 9.49 9.09| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:24| 3 0 31 65 0| 0 0 |1204k 513k| 0 0 |1889 2200 |12.8 9.49 9.09| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:25| 4 0 31 65 0| 0 0 |1022k 965k| 0 0 |3159 3653 |12.8 9.49 9.09| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:26| 4 0 31 65 0| 0 0 | 612k 581k| 0 0 |1879 2195 |15.3 10.1 9.28| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:27| 4 0 31 65 0| 0 0 |1095k 671k| 0 0 |2578 3250 |15.3 10.1 9.28| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:28| 3 0 31 65 0| 0 0 |1053k 894k| 0 0 |2127 2096 |15.3 10.1 9.28| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:29| 3 0 31 65 0| 0 0 | 741k 551k| 0 0 |1533 1503 |15.3 10.1 9.28| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:30| 4 0 31 65 0| 0 0 |1040k 897k| 0 0 |1863 2114 |15.3 10.1 9.28| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:31| 6 0 30 64 0| 0 0 | 915k 964k| 0 0 |6016 8122 |18.0 10.7 9.49| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:32| 7 0 30 63 0| 0 0 | 784k 579k| 0 0 |6165 8370 |18.0 10.7 9.49| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:33| 7 0 29 64 0| 
0 0 |1523k 1102k| 0 0 |9261 13k|18.0 10.7 9.49| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:34| 6 0 29 64 0| 0 0 | 564k 100k| 0 0 |8901 13k|18.0 10.7 9.49| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:35| 7 0 28 65 0| 0 0 |2387k 1573k| 0 0 |9775 15k|18.0 10.7 9.49| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:36| 8 1 30 62 0| 0 0 | 788k 808k| 0 0 | 13k 24k|20.6 11.4 9.71| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:37| 7 0 31 62 0| 0 0 | 500k 556k| 0 0 |8967 14k|20.6 11.4 9.71| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:38| 4 0 31 64 0| 0 0 | 518k 208k| 0 0 |4886 6677 |20.6 11.4 9.71| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:39| 4 0 31 65 0| 0 0 | 791k 658k| 0 0 |5017 6785 |20.6 11.4 9.71| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:40| 3 0 31 65 0| 0 20k| 519k 1127k| 0 0 |5004 6911 |20.6 11.4 9.71| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:41| 4 0 31 65 0| 0 0 | 771k 492k| 0 0 |6079 8603 |23.0 12.0 9.93| 121G 80.0G 261G 62.8G| 0 0 ----system---- --total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system-- ---load-avg--- ------memory-usage----- ----swap--- time |usr sys idl wai stl| read writ| recv send| in out | int csw | 1m 5m 15m | used free buff cach| used free 09-06 09:18:42| 3 0 31 65 0| 0 0 | 699k 590k| 0 0 |5484 8172 |23.0 12.0 9.93| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:43| 3 0 31 65 0| 0 0 | 335k 238k| 0 0 |5121 7363 |23.0 12.0 9.93| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:44| 3 0 31 65 0|1424k 13M| 971k 2219k| 0 0 |7268 11k|23.0 12.0 9.93| 121G 80.0G 261G 62.8G| 0 0 09-06 09:18:45| 5 3 60 32 0| 131M 22M|1569k 3753k| 0 0 |8893 21k|23.0 12.0 9.93| 121G 80.1G 261G 62.8G| 0 0
There are alerts logged on beaubourg:
Jun 09 09:18:00 beaubourg sudo[2259195]: pam_unix(sudo:session): session closed for user root Jun 09 09:18:07 beaubourg sshd[2262443]: Connection closed by 192.168.100.29 port 50410 [preauth] Jun 09 09:18:45 beaubourg ceph-mgr[12237]: ::ffff:192.168.100.29 - - [09/Jun/2023:09:18:08] "GET /metrics HTTP/1.1" 200 - "" "Prometheus/2.7.1+ds" Jun 09 09:18:15 beaubourg systemd[1]: Starting Collect ipmitool sensor metrics for prometheus-node-exporter... Jun 09 09:18:44 beaubourg zed[2262750]: eid=37674 class=delay pool='data' vdev=wwn-0x500003965c898908-part1 size=102400 offset=527745323008 priority=3 err=0 flags=0x180880 delay=38247ms bookmark=515:40824:0:3771 Jun 09 09:18:45 beaubourg systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded. Jun 09 09:18:44 beaubourg zed[2262749]: eid=37673 class=delay pool='data' vdev=wwn-0x50000396ec892211-part1 size=4096 offset=2471137456128 priority=3 err=0 flags=0x180880 delay=38246ms bookmark=515:71702:2:0 Jun 09 09:18:45 beaubourg systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter. Jun 09 09:18:44 beaubourg zed[2262768]: eid=37675 class=delay pool='data' vdev=wwn-0x50000396ec892211-part1 size=61440 offset=2554489651200 priority=3 err=0 flags=0x180880 delay=38246ms bookmark=515:26150:1:3 Jun 09 09:18:45 beaubourg pve_exporter[9463]: Exception thrown while rendering view Jun 09 09:18:45 beaubourg pve_exporter[9463]: Traceback (most recent call last): Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 699, in urlopen Jun 09 09:18:45 beaubourg pve_exporter[9463]: httplib_response = self._make_request( Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 445, in _make_request Jun 09 09:18:45 beaubourg pve_exporter[9463]: six.raise_from(e, None) Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "<string>", line 3, in raise_from Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 440, in _make_request Jun 09 09:18:45 beaubourg pve_exporter[9463]: httplib_response = conn.getresponse() Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3.9/http/client.py", line 1347, in getresponse Jun 09 09:18:45 beaubourg pve_exporter[9463]: response.begin() Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3.9/http/client.py", line 307, in begin Jun 09 09:18:45 beaubourg pve_exporter[9463]: version, status, reason = self._read_status() Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3.9/http/client.py", line 276, in _read_status Jun 09 09:18:45 beaubourg pve_exporter[9463]: raise RemoteDisconnected("Remote end closed connection without" Jun 09 09:18:45 beaubourg pve_exporter[9463]: http.client.RemoteDisconnected: Remote end closed connection without response Jun 09 09:18:45 beaubourg pve_exporter[9463]: During handling of the above exception, another exception occurred: Jun 09 09:18:45 beaubourg pve_exporter[9463]: Traceback (most recent call last): Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3/dist-packages/requests/adapters.py", line 439, in send Jun 09 09:18:45 beaubourg pve_exporter[9463]: resp = conn.urlopen( Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 755, in urlopen Jun 09 09:18:45 beaubourg pve_exporter[9463]: retries = retries.increment( Jun 09 09:18:45 beaubourg pve_exporter[9463]: File 
"/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 532, in increment Jun 09 09:18:45 beaubourg pve_exporter[9463]: raise six.reraise(type(error), error, _stacktrace) Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3/dist-packages/six.py", line 718, in reraise Jun 09 09:18:45 beaubourg pve_exporter[9463]: raise value.with_traceback(tb) Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 699, in urlopen Jun 09 09:18:45 beaubourg pve_exporter[9463]: httplib_response = self._make_request( Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 445, in _make_request Jun 09 09:18:45 beaubourg pve_exporter[9463]: six.raise_from(e, None) Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "<string>", line 3, in raise_from Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 440, in _make_request Jun 09 09:18:45 beaubourg pve_exporter[9463]: httplib_response = conn.getresponse() Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3.9/http/client.py", line 1347, in getresponse Jun 09 09:18:45 beaubourg pve_exporter[9463]: response.begin() Jun 09 09:18:44 beaubourg zed[2262774]: eid=37676 class=delay pool='data' vdev=wwn-0x500003965c898908-part1 size=61440 offset=527748186112 priority=3 err=0 flags=0x180880 delay=38246ms bookmark=515:40824:1:3 Jun 09 09:18:45 beaubourg ceph-osd[1383751]: 2023-06-09T09:18:45.871+0000 7f92da79f700 -1 osd.1 20966 heartbeat_check: no reply from 192.168.100.32:6806 osd.2 since back 2023-06-09T09:18:17.007157+0000 front 2023-06-09T09:18:17.007286+0000 (oldest deadline 2023-06-09T09:18:41.707544+0000) Jun 09 09:18:45 beaubourg ceph-osd[1383751]: 2023-06-09T09:18:45.875+0000 7f92da79f700 -1 osd.1 20966 get_health_metrics reporting 9 slow ops, oldest is osd_op(client.234005970.0:187787146 1.9a 1:594fd069:::rbd_data.7d741c621bb64d.0000000000000f76:head [stat out=16b,write 102400~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e20966) Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3.9/http/client.py", line 307, in begin Jun 09 09:18:45 beaubourg pve_exporter[9463]: version, status, reason = self._read_status() Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3.9/http/client.py", line 276, in _read_status Jun 09 09:18:45 beaubourg pve_exporter[9463]: raise RemoteDisconnected("Remote end closed connection without" Jun 09 09:18:45 beaubourg pve_exporter[9463]: urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) Jun 09 09:18:45 beaubourg pve_exporter[9463]: During handling of the above exception, another exception occurred: Jun 09 09:18:45 beaubourg pve_exporter[9463]: Traceback (most recent call last): Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3/dist-packages/pve_exporter/http.py", line 96, in view Jun 09 09:18:45 beaubourg pve_exporter[9463]: return view_registry[endpoint](**params) Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3/dist-packages/pve_exporter/http.py", line 38, in on_pve Jun 09 09:18:45 beaubourg pve_exporter[9463]: output = collect_pve(self._config[module], target, self._collectors) Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3/dist-packages/pve_exporter/collector.py", line 316, in collect_pve Jun 09 09:18:45 beaubourg 
pve_exporter[9463]: pve = ProxmoxAPI(host, **config) Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3/dist-packages/proxmoxer/core.py", line 106, in __init__ Jun 09 09:18:45 beaubourg pve_exporter[9463]: self._backend = importlib.import_module('.backends.%s' % backend, 'proxmoxer').Backend(host, **kwargs) Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3/dist-packages/proxmoxer/backends/https.py", line 125, in __init__ Jun 09 09:18:45 beaubourg pve_exporter[9463]: self.auth = ProxmoxHTTPAuth(self.base_url, user, password, verify_ssl) Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3/dist-packages/proxmoxer/backends/https.py", line 42, in __init__ Jun 09 09:18:45 beaubourg pve_exporter[9463]: response_data = requests.post(base_url + "/access/ticket", Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3/dist-packages/requests/api.py", line 119, in post Jun 09 09:18:45 beaubourg pve_exporter[9463]: return request('post', url, data=data, json=json, **kwargs) Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3/dist-packages/requests/api.py", line 61, in request Jun 09 09:18:45 beaubourg pve_exporter[9463]: return session.request(method=method, url=url, **kwargs) Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3/dist-packages/requests/sessions.py", line 542, in request Jun 09 09:18:45 beaubourg pve_exporter[9463]: resp = self.send(prep, **send_kwargs) Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3/dist-packages/requests/sessions.py", line 655, in send Jun 09 09:18:45 beaubourg pve_exporter[9463]: r = adapter.send(request, **kwargs) Jun 09 09:18:45 beaubourg pve_exporter[9463]: File "/usr/lib/python3/dist-packages/requests/adapters.py", line 498, in send Jun 09 09:18:45 beaubourg pve_exporter[9463]: raise ConnectionError(err, request=request) Jun 09 09:18:45 beaubourg ceph-osd[901983]: 2023-06-09T09:18:45.843+0000 7fe1c5bcb700 -1 osd.0 20966 heartbeat_check: no reply from 192.168.100.32:6830 osd.1 since back 2023-06-09T09:18:18.138217+0000 front 2023-06-09T09:18:18.138551+0000 (oldest deadline 2023-06-09T09:18:41.638640+0000) Jun 09 09:18:45 beaubourg ceph-osd[901983]: 2023-06-09T09:18:45.843+0000 7fe1c5bcb700 -1 osd.0 20966 heartbeat_check: no reply from 192.168.100.32:6806 osd.2 since back 2023-06-09T09:18:18.138499+0000 front 2023-06-09T09:18:18.138189+0000 (oldest deadline 2023-06-09T09:18:41.638640+0000) Jun 09 09:18:45 beaubourg ceph-osd[901983]: 2023-06-09T09:18:45.843+0000 7fe1c5bcb700 -1 osd.0 20966 get_health_metrics reporting 4 slow ops, oldest is osd_op(client.234005970.0:187787181 1.7a 1:5e63b769:::rbd_data.7d741c621bb64d.0000000000000e84:head [stat,write 3670016~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e20966) Jun 09 09:18:44 beaubourg zed[2262780]: eid=37677 class=delay pool='data' vdev=wwn-0x50000396ec89220d-part1 size=61440 offset=2554490093568 priority=3 err=0 flags=0x180880 delay=38246ms bookmark=515:26150:1:4 Jun 09 09:18:45 beaubourg pve_exporter[9463]: requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) Jun 09 09:18:45 beaubourg pve_exporter[9463]: 192.168.100.29 - - [09/Jun/2023 09:18:45] "GET /pve?target=127.0.0.1 HTTP/1.1" 500 - Jun 09 09:18:44 beaubourg zed[2262784]: eid=37678 class=delay pool='data' vdev=wwn-0x50000396ec89220d-part1 size=61440 offset=2554489651200 priority=3 err=0 flags=0x180880 delay=38246ms 
bookmark=515:26150:1:3 Jun 09 09:18:44 beaubourg zed[2262791]: eid=37679 class=delay pool='data' vdev=wwn-0x50000396ec89220d-part1 size=77824 offset=2482146983936 priority=0 err=0 flags=0x180880 delay=41762ms bookmark=515:117070:0:3948 Jun 09 09:18:44 beaubourg zed[2262796]: eid=37680 class=delay pool='data' vdev=wwn-0x500003976c880145-part1 size=61440 offset=429071273984 priority=3 err=0 flags=0x180880 delay=38247ms bookmark=515:26682:1:1 Jun 09 09:18:44 beaubourg zed[2262799]: eid=37681 class=delay pool='data' vdev=wwn-0x500003976c880145-part1 size=61440 offset=528495591424 priority=3 err=0 flags=0x180880 delay=38247ms bookmark=515:28560:1:3 Jun 09 09:18:44 beaubourg zed[2262806]: eid=37683 class=delay pool='data' vdev=wwn-0x50000397bc8969ad-part1 size=102400 offset=2954688135168 priority=3 err=0 flags=0x180880 delay=38247ms bookmark=515:28560:0:3042 Jun 09 09:18:44 beaubourg zed[2262805]: eid=37682 class=delay pool='data' vdev=wwn-0x50000397bc8969b5-part1 size=102400 offset=2954688135168 priority=3 err=0 flags=0x180880 delay=38247ms bookmark=515:28560:0:3042 Jun 09 09:18:44 beaubourg zed[2262809]: eid=37684 class=delay pool='data' vdev=wwn-0x50000397bc8969ad-part1 size=45056 offset=2697892405248 priority=0 err=0 flags=0x180880 delay=41761ms bookmark=515:99886:0:1986 Jun 09 09:18:44 beaubourg zed[2262812]: eid=37685 class=delay pool='data' vdev=wwn-0x50000397bc8969ad-part1 size=102400 offset=2954688749568 priority=3 err=0 flags=0x180880 delay=38247ms bookmark=515:27780:0:2026 Jun 09 09:18:44 beaubourg zed[2262815]: eid=37686 class=delay pool='data' vdev=wwn-0x50000397bc8969b5-part1 size=81920 offset=2207910199296 priority=0 err=0 flags=0x180880 delay=41762ms bookmark=515:105586:0:1840 Jun 09 09:18:44 beaubourg zed[2262826]: eid=37687 class=delay pool='data' vdev=wwn-0x50000397bc8969b5-part1 size=102400 offset=2954689507328 priority=3 err=0 flags=0x180880 delay=38247ms bookmark=515:26682:0:1123 Jun 09 09:18:44 beaubourg zed[2262843]: eid=37688 class=delay pool='data' vdev=wwn-0x50000397bc8969ad-part1 size=45056 offset=474415509504 priority=0 err=0 flags=0x180880 delay=41591ms bookmark=515:76516:0:256 Jun 09 09:18:44 beaubourg zed[2262855]: eid=37689 class=delay pool='data' vdev=wwn-0x50000396ec892211-part1 size=73728 offset=183821381632 priority=0 err=0 flags=0x180880 delay=41748ms bookmark=515:82402:0:5485 Jun 09 09:18:44 beaubourg zed[2262856]: eid=37690 class=delay pool='data' vdev=wwn-0x50000396ec892211-part1 size=65536 offset=106711961600 priority=0 err=0 flags=0x180880 delay=41763ms bookmark=515:113938:0:3016 Jun 09 09:18:44 beaubourg zed[2262859]: eid=37691 class=delay pool='data' vdev=wwn-0x50000396ec892205-part1 size=4096 offset=358977654784 priority=3 err=0 flags=0x180880 delay=38250ms bookmark=515:71702:2:0 Jun 09 09:18:44 beaubourg zed[2262862]: eid=37692 class=delay pool='data' vdev=wwn-0x50000396ec892205-part1 size=61440 offset=2294103654400 priority=3 err=0 flags=0x180880 delay=38250ms bookmark=515:26150:1:4 Jun 09 09:18:44 beaubourg zed[2262865]: eid=37693 class=delay pool='data' vdev=wwn-0x50000396ec892251-part1 size=61440 offset=527748186112 priority=3 err=0 flags=0x180880 delay=38251ms bookmark=515:40824:1:3 Jun 09 09:18:44 beaubourg zed[2262867]: eid=37694 class=delay pool='data' vdev=wwn-0x50000396ec892251-part1 size=102400 offset=527745323008 priority=3 err=0 flags=0x180880 delay=38252ms bookmark=515:40824:0:3771 Jun 09 09:18:44 beaubourg zed[2262871]: eid=37695 class=delay pool='data' vdev=wwn-0x50000396ec892251-part1 size=98304 offset=554396454912 priority=0 err=0 
flags=0x180880 delay=41677ms bookmark=515:26682:0:1598 Jun 09 09:18:44 beaubourg zed[2262873]: eid=37696 class=delay pool='data' vdev=wwn-0x50000396ec894e39-part1 size=4096 offset=358977654784 priority=3 err=0 flags=0x180880 delay=38251ms bookmark=515:71702:2:0 Jun 09 09:18:44 beaubourg zed[2262876]: eid=37698 class=delay pool='data' vdev=wwn-0x50000396ec894e39-part1 size=81920 offset=655563485184 priority=0 err=0 flags=0x180880 delay=41702ms bookmark=515:10476:0:2846 Jun 09 09:18:44 beaubourg zed[2262877]: eid=37697 class=delay pool='data' vdev=wwn-0x50000396ec894e39-part1 size=61440 offset=2294103715840 priority=3 err=0 flags=0x180880 delay=38251ms bookmark=515:26682:1:0 Jun 09 09:18:45 beaubourg pveproxy[923104]: problem with client ::ffff:127.0.0.1; Connection timed out Jun 09 09:18:45 beaubourg pve-firewall[16031]: firewall update time (38.377 seconds) Jun 09 09:18:46 beaubourg ceph-mon[12265]: 2023-06-09T09:18:46.255+0000 7fd066db7700 -1 mon.beaubourg@0(leader) e9 get_health_metrics reporting 1 slow ops, oldest is log(1 entries from seq 3778789 at 2023-06-09T09:18:02.953204+0000) Jun 09 09:18:46 beaubourg pmxcfs[11649]: [status] notice: received log Jun 09 09:18:46 beaubourg pvescheduler[2262427]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout Jun 09 09:18:46 beaubourg pmxcfs[11649]: [status] notice: received log Jun 09 09:18:46 beaubourg pmxcfs[11649]: [status] notice: received log Jun 09 09:18:46 beaubourg pmxcfs[11649]: [status] notice: received log Jun 09 09:18:46 beaubourg pvestatd[16064]: status update time (39.138 seconds) Jun 09 09:18:46 beaubourg pmxcfs[11649]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-node/beaubourg: -1 Jun 09 09:18:46 beaubourg pmxcfs[11649]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/110: -1 Jun 09 09:18:46 beaubourg pmxcfs[11649]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/beaubourg/proxmox: -1 Jun 09 09:18:46 beaubourg pmxcfs[11649]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/beaubourg/local: -1 Jun 09 09:18:46 beaubourg pmxcfs[11649]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/beaubourg/proxmox-cephfs: -1 Jun 09 09:18:46 beaubourg ceph-osd[901983]: 2023-06-09T09:18:46.855+0000 7fe1c5bcb700 -1 osd.0 20966 get_health_metrics reporting 5 slow ops, oldest is osd_op(client.234014830.0:58091941 1.e9 1:97457025:::rbd_data.1bea316ef5ffc8.0000000000000d30:head [read 2543616~4096] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e20966) Jun 09 09:18:46 beaubourg ceph-osd[1383751]: 2023-06-09T09:18:46.887+0000 7f92da79f700 -1 osd.1 20966 heartbeat_check: no reply from 192.168.100.32:6806 osd.2 since back 2023-06-09T09:18:17.007157+0000 front 2023-06-09T09:18:17.007286+0000 (oldest deadline 2023-06-09T09:18:41.707544+0000) Jun 09 09:18:46 beaubourg ceph-osd[901992]: 2023-06-09T09:18:46.959+0000 7fdd02c62700 -1 osd.3 20966 heartbeat_check: no reply from 192.168.100.32:6806 osd.2 since back 2023-06-09T09:18:16.691842+0000 front 2023-06-09T09:18:16.691828+0000 (oldest deadline 2023-06-09T09:18:41.992279+0000) Jun 09 09:18:46 beaubourg ceph-osd[887873]: 2023-06-09T09:18:46.983+0000 7fb5967ac700 -1 osd.2 20966 heartbeat_check: no reply from 192.168.100.32:6814 osd.3 since back 2023-06-09T09:18:16.159237+0000 front 2023-06-09T09:18:16.159377+0000 (oldest deadline 2023-06-09T09:18:40.259573+0000) Jun 09 09:18:46 beaubourg ceph-osd[887873]: 2023-06-09T09:18:46.983+0000 7fb5967ac700 -1 osd.2 20966 get_health_metrics 
reporting 1 slow ops, oldest is osd_op(client.229229085.0:177881041 1.a8 1:155a1194:::rbd_data.6c7753f9e2dc6f.00000000000010db:head [stat,write 782336~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e20966) Jun 09 09:18:47 beaubourg kernel: sd 0:0:2:0: [sda] tag#148 Sense Key : Recovered Error [current] [descriptor] Jun 09 09:18:47 beaubourg kernel: sd 0:0:2:0: [sda] tag#148 Add. Sense: Defect list not found Jun 09 09:18:47 beaubourg ceph-osd[901983]: 2023-06-09T09:18:47.907+0000 7fe1c5bcb700 -1 osd.0 20966 get_health_metrics reporting 5 slow ops, oldest is osd_op(client.234014830.0:58091941 1.e9 1:97457025:::rbd_data.1bea316ef5ffc8.0000000000000d30:head [read 2543616~4096] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e20966) Jun 09 09:18:47 beaubourg ceph-osd[901992]: 2023-06-09T09:18:47.927+0000 7fdd02c62700 -1 osd.3 20966 heartbeat_check: no reply from 192.168.100.32:6806 osd.2 since back 2023-06-09T09:18:16.691842+0000 front 2023-06-09T09:18:16.691828+0000 (oldest deadline 2023-06-09T09:18:41.992279+0000) Jun 09 09:18:47 beaubourg ceph-osd[887873]: 2023-06-09T09:18:47.959+0000 7fb5967ac700 -1 osd.2 20966 heartbeat_check: no reply from 192.168.100.32:6814 osd.3 since back 2023-06-09T09:18:16.159237+0000 front 2023-06-09T09:18:16.159377+0000 (oldest deadline 2023-06-09T09:18:40.259573+0000) Jun 09 09:18:47 beaubourg ceph-osd[887873]: 2023-06-09T09:18:47.959+0000 7fb5967ac700 -1 osd.2 20966 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.229229085.0:177881041 1.a8 1:155a1194:::rbd_data.6c7753f9e2dc6f.00000000000010db:head [stat,write 782336~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e20966) Jun 09 09:18:48 beaubourg kernel: sd 0:0:3:0: [sdb] tag#278 Sense Key : Recovered Error [current] [descriptor] Jun 09 09:18:48 beaubourg kernel: sd 0:0:3:0: [sdb] tag#278 Add. Sense: Defect list not found Jun 09 09:18:48 beaubourg ceph-osd[901992]: 2023-06-09T09:18:48.899+0000 7fdd02c62700 -1 osd.3 20966 heartbeat_check: no reply from 192.168.100.32:6806 osd.2 since back 2023-06-09T09:18:16.691842+0000 front 2023-06-09T09:18:16.691828+0000 (oldest deadline 2023-06-09T09:18:41.992279+0000) Jun 09 09:18:48 beaubourg ceph-osd[901983]: 2023-06-09T09:18:48.899+0000 7fe1c5bcb700 -1 osd.0 20966 get_health_metrics reporting 5 slow ops, oldest is osd_op(client.234014830.0:58091941 1.e9 1:97457025:::rbd_data.1bea316ef5ffc8.0000000000000d30:head [read 2543616~4096] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e20966) Jun 09 09:18:48 beaubourg ceph-osd[887873]: 2023-06-09T09:18:48.915+0000 7fb5967ac700 -1 osd.2 20966 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.229229085.0:177881041 1.a8 1:155a1194:::rbd_data.6c7753f9e2dc6f.00000000000010db:head [stat,write 782336~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e20966) Jun 09 09:18:49 beaubourg ceph-osd[887873]: 2023-06-09T09:18:49.891+0000 7fb5967ac700 -1 osd.2 20966 get_health_metrics reporting 2 slow ops, oldest is osd_op(client.229229085.0:177881041 1.a8 1:155a1194:::rbd_data.6c7753f9e2dc6f.00000000000010db:head [stat,write 782336~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e20966) Jun 09 09:18:49 beaubourg kernel: sd 0:0:4:0: [sdc] tag#274 Sense Key : Recovered Error [current] [descriptor] Jun 09 09:18:49 beaubourg kernel: sd 0:0:4:0: [sdc] tag#274 Add. 
Sense: Defect list not found Jun 09 09:18:49 beaubourg ceph-osd[901983]: 2023-06-09T09:18:49.943+0000 7fe1c5bcb700 -1 osd.0 20966 get_health_metrics reporting 5 slow ops, oldest is osd_op(client.234014830.0:58091941 1.e9 1:97457025:::rbd_data.1bea316ef5ffc8.0000000000000d30:head [read 2543616~4096] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e20966) Jun 09 09:18:50 beaubourg pve-ha-crm[17598]: loop take too long (45 seconds) Jun 09 09:18:50 beaubourg ceph-osd[887873]: 2023-06-09T09:18:50.939+0000 7fb5967ac700 -1 osd.2 20966 get_health_metrics reporting 2 slow ops, oldest is osd_op(client.229229085.0:177881041 1.a8 1:155a1194:::rbd_data.6c7753f9e2dc6f.00000000000010db:head [stat,write 782336~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e20966) Jun 09 09:18:50 beaubourg ceph-osd[901983]: 2023-06-09T09:18:50.975+0000 7fe1c5bcb700 -1 osd.0 20966 get_health_metrics reporting 5 slow ops, oldest is osd_op(client.234014830.0:58091941 1.e9 1:97457025:::rbd_data.1bea316ef5ffc8.0000000000000d30:head [read 2543616~4096] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e20966) Jun 09 09:18:51 beaubourg kernel: sd 0:0:5:0: [sdd] tag#284 Sense Key : Recovered Error [current] [descriptor] Jun 09 09:18:51 beaubourg kernel: sd 0:0:5:0: [sdd] tag#284 Add. Sense: Defect list not found Jun 09 09:18:51 beaubourg pve-ha-lrm[19180]: loop take too long (45 seconds) Jun 09 09:18:51 beaubourg ceph-osd[887873]: 2023-06-09T09:18:51.947+0000 7fb5967ac700 -1 osd.2 20966 get_health_metrics reporting 2 slow ops, oldest is osd_op(client.229229085.0:177881041 1.a8 1:155a1194:::rbd_data.6c7753f9e2dc6f.00000000000010db:head [stat,write 782336~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e20966) Jun 09 09:18:52 beaubourg ceph-osd[901983]: 2023-06-09T09:18:52.003+0000 7fe1c5bcb700 -1 osd.0 20966 get_health_metrics reporting 5 slow ops, oldest is osd_op(client.234014830.0:58091941 1.e9 1:97457025:::rbd_data.1bea316ef5ffc8.0000000000000d30:head [read 2543616~4096] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e20966) Jun 09 09:18:52 beaubourg kernel: sd 0:0:6:0: [sde] tag#245 Sense Key : Recovered Error [current] [descriptor] Jun 09 09:18:52 beaubourg kernel: sd 0:0:6:0: [sde] tag#245 Add. Sense: Defect list not found Jun 09 09:18:52 beaubourg ceph-osd[887873]: 2023-06-09T09:18:52.995+0000 7fb5967ac700 -1 osd.2 20966 get_health_metrics reporting 2 slow ops, oldest is osd_op(client.229229085.0:177881041 1.a8 1:155a1194:::rbd_data.6c7753f9e2dc6f.00000000000010db:head [stat,write 782336~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e20966) Jun 09 09:18:53 beaubourg kernel: sd 0:0:9:0: [sdg] tag#142 Sense Key : Recovered Error [current] Jun 09 09:18:53 beaubourg kernel: sd 0:0:9:0: [sdg] tag#142 Add. Sense: Defect list not found Jun 09 09:18:53 beaubourg kernel: sd 0:0:11:0: [sdi] tag#198 Sense Key : Recovered Error [current] [descriptor] Jun 09 09:18:53 beaubourg kernel: sd 0:0:11:0: [sdi] tag#198 Add. Sense: Defect list not found Jun 09 09:18:53 beaubourg kernel: sd 0:0:12:0: [sdj] tag#110 Sense Key : Recovered Error [current] [descriptor] Jun 09 09:18:53 beaubourg kernel: sd 0:0:12:0: [sdj] tag#110 Add. 
Sense: Defect list not found Jun 09 09:18:53 beaubourg ceph-osd[887873]: 2023-06-09T09:18:53.975+0000 7fb5967ac700 -1 osd.2 20966 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.229229085.0:177881041 1.a8 1:155a1194:::rbd_data.6c7753f9e2dc6f.00000000000010db:head [stat,write 782336~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e20966)
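For completeness, a sketch of commands that pull the same correlated events out of beaubourg's journal (time window and patterns taken from the excerpts above):
# ZFS I/O-delay events reported by zed (the class=delay entries with ~38-41s delays)
journalctl -t zed --since "2023-06-09 09:15" --until "2023-06-09 09:20" | grep 'class=delay'
# kernel-level disk/controller errors
journalctl -k --since "2023-06-09 09:15" --until "2023-06-09 09:20" | grep -E 'Sense Key|Defect list'
# slow Ceph ops and OSD heartbeat failures in the same window
journalctl -u 'ceph-osd@*' --since "2023-06-09 09:15" --until "2023-06-09 09:20" | grep -E 'slow ops|heartbeat_check'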
- Vincent Sellier mentioned in issue #4929 (closed)
- Vincent Sellier marked this issue as related to #4929 (closed)
- Author Owner
For the record, an etcdctl check perf run on an etcd backed by Ceph storage:
root@rancher-node-admin-mgmt1:/# etcdctl check perf
 60 / 60 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
PASS: Throughput is 149 writes/s
Slowest request took too long: 1.372047s
PASS: Stddev is 0.095261s
FAIL
compared to (with local disk storage):
rancher-node-staging-rke2-mgmt1:/ # etcdctl --cacert=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt --cert=/var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key=/var/lib/rancher/rke2/server/tls/etcd/server-client.key check perf
 60/60 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
PASS: Throughput is 150 writes/s
PASS: Slowest request took 0.044872s
PASS: Stddev is 0.001569s
PASS