crio context deadline exceeded #8007
Can you post the cri-o logs while this is happening?
`journalctl -u cri-o`?
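For scoping the journal to the incident window, something like the following is a reasonable sketch (the unit name `crio` is an assumption — some packages name it `cri-o`; check with `systemctl list-units | grep -i cri`):

```shell
# Pull cri-o and kubelet logs for the same window so the
# DeadlineExceeded errors on both sides can be correlated.
journalctl -u crio --since "2024-04-11 21:45" --until "2024-04-11 23:05" \
  --no-pager -o short-precise > crio-incident.log
journalctl -u kubelet --since "2024-04-11 21:45" --until "2024-04-11 23:05" \
  --no-pager -o short-precise > kubelet-incident.log
```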
Yeah!
This is the message that keeps appearing:
Apr 11 21:50:04 node7 crio: time="2024-04-11 21:50:04.343272892+08:00" level=info msg="Checking image status: registry.aliyuncs.com/linkflow/pause:3.9" id=76e1a622-ce3d-4a1f-a686-58b026b73c8c name=/runtime.v1.ImageService/ImageStatus
Apr 11 21:50:04 node7 crio: time="2024-04-11 21:50:04.343488104+08:00" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:e6f1816883972d4be47bd48879a08919b96afcd344132622e4d444987919323c,RepoTags:[registry.aliyuncs.com/linkflow/pause:3.9],RepoDigests:[registry.aliyuncs.com/linkflow/pause@sha256:0fc1f3b764be56f7c881a69cbd553ae25a2b5523c6901fbacb8270307c29d0c4],Size_:748009,Uid:&Int64Value{Value:65535,},Username:,Spec:&ImageSpec{Image:,Annotations:map[string]string{},},Pinned:false,},Info:map[string]string{},}" id=76e1a622-ce3d-4a1f-a686-58b026b73c8c name=/runtime.v1.ImageService/ImageStatus
Apr 11 21:51:01 node7 systemd: Started Session 74940 of user root.
Apr 11 21:52:01 node7 systemd: Started Session 74941 of user root.
Apr 11 21:53:01 node7 systemd: Started Session 74942 of user root.
Apr 11 21:54:01 node7 systemd: Started Session 74943 of user root.
Apr 11 21:55:01 node7 systemd: Started Session 74944 of user root.
Apr 11 21:55:04 node7 crio: time="2024-04-11 21:55:04.346715313+08:00" level=info msg="Checking image status: registry.aliyuncs.com/linkflow/pause:3.9" id=efa5b073-1589-4f68-ba45-2ac4b892dd76 name=/runtime.v1.ImageService/ImageStatus
Apr 11 21:55:04 node7 crio: time="2024-04-11 21:55:04.346949509+08:00" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:e6f1816883972d4be47bd48879a08919b96afcd344132622e4d444987919323c,RepoTags:[registry.aliyuncs.com/linkflow/pause:3.9],RepoDigests:[registry.aliyuncs.com/linkflow/pause@sha256:0fc1f3b764be56f7c881a69cbd553ae25a2b5523c6901fbacb8270307c29d0c4],Size_:748009,Uid:&Int64Value{Value:65535,},Username:,Spec:&ImageSpec{Image:,Annotations:map[string]string{},},Pinned:false,},Info:map[string]string{},}" id=efa5b073-1589-4f68-ba45-2ac4b892dd76 name=/runtime.v1.ImageService/ImageStatus
Apr 11 21:56:01 node7 systemd: Started Session 74945 of user root.
Apr 11 21:57:01 node7 systemd: Started Session 74946 of user root.
Apr 11 21:58:01 node7 systemd: Started Session 74947 of user root.
Apr 11 21:59:01 node7 systemd: Started Session 74948 of user root.
Apr 11 22:00:01 node7 systemd: Started Session 74949 of user root.
Apr 11 22:00:04 node7 crio: time="2024-04-11 22:00:04.350012002+08:00" level=info msg="Checking image status: registry.aliyuncs.com/linkflow/pause:3.9" id=66416206-0a2d-4460-89a3-f2d749917e50 name=/runtime.v1.ImageService/ImageStatus
Apr 11 22:00:04 node7 crio: time="2024-04-11 22:00:04.350245730+08:00" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:e6f1816883972d4be47bd48879a08919b96afcd344132622e4d444987919323c,RepoTags:[registry.aliyuncs.com/linkflow/pause:3.9],RepoDigests:[registry.aliyuncs.com/linkflow/pause@sha256:0fc1f3b764be56f7c881a69cbd553ae25a2b5523c6901fbacb8270307c29d0c4],Size_:748009,Uid:&Int64Value{Value:65535,},Username:,Spec:&ImageSpec{Image:,Annotations:map[string]string{},},Pinned:false,},Info:map[string]string{},}" id=66416206-0a2d-4460-89a3-f2d749917e50 name=/runtime.v1.ImageService/ImageStatus
Apr 11 22:01:01 node7 systemd: Started Session 74950 of user root.
Apr 11 22:01:01 node7 systemd: Started Session 74951 of user root.
Apr 11 22:02:01 node7 systemd: Started Session 74952 of user root.
Apr 11 22:03:01 node7 systemd: Started Session 74953 of user root.
Apr 11 22:03:35 node7 kubelet: E0411 22:03:35.137593 14277 remote_runtime.go:479] "ExecSync cmd from runtime service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" containerID="586b18245a375bf09f4da25d029c2471d1e72f943fd5b5f78f784f6db43d8e0f" cmd=[/bin/calico-node -bird-ready -felix-ready]
Apr 11 22:04:01 node7 systemd: Started Session 74954 of user root.
Apr 11 22:04:11 node7 kubelet: E0411 22:04:11.103826 14277 remote_runtime.go:277] "ListPodSandbox with filter from runtime service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" filter="&PodSandboxFilter{Id:,State:&PodSandboxStateValue{State:SANDBOX_READY,},LabelSelector:map[string]string{},}"
Apr 11 22:04:11 node7 kubelet: E0411 22:04:11.103895 14277 kuberuntime_sandbox.go:297] "Failed to list pod sandboxes" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 11 22:04:11 node7 kubelet: E0411 22:04:11.103915 14277 kubelet_pods.go:1103] "Error listing containers" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 11 22:04:11 node7 kubelet: E0411 22:04:11.103928 14277 kubelet.go:2427] "Failed cleaning pods" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 11 22:04:11 node7 kubelet: E0411 22:04:11.103941 14277 kubelet.go:2431] "Housekeeping took longer than expected" err="housekeeping took too long" expected="1s" actual="2m50.48s"
Apr 11 22:04:11 node7 kubelet: E0411 22:04:11.103956 14277 kubelet.go:2281] "Skipping pod synchronization" err="container runtime is down"
Apr 11 22:04:11 node7 kubelet: E0411 22:04:11.103843 14277 remote_runtime.go:415] "ContainerStatus from runtime service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" containerID="5107706393b4c2f329c856f1dce971434f64dc790cb2897440ed8464f5f496d3"
Apr 11 22:04:11 node7 kubelet: E0411 22:04:11.103985 14277 remote_runtime.go:277] "ListPodSandbox with filter from runtime service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" filter="nil"
Apr 11 22:04:11 node7 kubelet: E0411 22:04:11.103994 14277 kuberuntime_sandbox.go:297] "Failed to list pod sandboxes" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 11 22:04:11 node7 kubelet: E0411 22:04:11.104005 14277 container_log_manager.go:235] "Failed to get container status" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" containerID="5107706393b4c2f329c856f1dce971434f64dc790cb2897440ed8464f5f496d3"
Apr 11 22:04:11 node7 kubelet: E0411 22:04:11.104218 14277 kubelet.go:1371] "Container garbage collection failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 11 22:05:01 node7 systemd: Started Session 74955 of user root.
Apr 11 22:06:01 node7 systemd: Started Session 74956 of user root.
Apr 11 22:06:11 node7 kubelet: E0411 22:06:11.113781 14277 remote_runtime.go:415] "ContainerStatus from runtime service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" containerID="7f375a77bb311b99f25074f99058bdbb6f5b44ece39378c15d92503bc36bc49e"
Apr 11 22:06:11 node7 kubelet: E0411 22:06:11.113844 14277 container_log_manager.go:235] "Failed to get container status" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" containerID="7f375a77bb311b99f25074f99058bdbb6f5b44ece39378c15d92503bc36bc49e"
Apr 11 22:06:11 node7 crio: time="2024-04-11 22:06:11.120433243+08:00" level=info msg="Checking image status: registry.aliyuncs.com/linkflow/pause:3.9" id=2815d546-2efa-4cef-9f10-ba0dcdaaec66 name=/runtime.v1.ImageService/ImageStatus
Apr 11 22:06:11 node7 crio: time="2024-04-11 22:06:11.120643208+08:00" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:e6f1816883972d4be47bd48879a08919b96afcd344132622e4d444987919323c,RepoTags:[registry.aliyuncs.com/linkflow/pause:3.9],RepoDigests:[registry.aliyuncs.com/linkflow/pause@sha256:0fc1f3b764be56f7c881a69cbd553ae25a2b5523c6901fbacb8270307c29d0c4],Size_:748009,Uid:&Int64Value{Value:65535,},Username:,Spec:&ImageSpec{Image:,Annotations:map[string]string{},},Pinned:false,},Info:map[string]string{},}" id=2815d546-2efa-4cef-9f10-ba0dcdaaec66 name=/runtime.v1.ImageService/ImageStatus
Apr 11 22:06:11 node7 kubelet: I0411 22:06:11.121778 14277 setters.go:552] "Node became not ready" node="node7" condition={Type:Ready Status:False LastHeartbeatTime:2024-04-11 22:06:11.121712389 +0800 CST m=+4423015.057618713 LastTransitionTime:2024-04-11 22:06:11.121712389 +0800 CST m=+4423015.057618713 Reason:KubeletNotReady Message:PLEG is not healthy: pleg was last seen active 3m59.017556433s ago; threshold is 3m0s}
Apr 11 22:06:16 node7 kubelet: E0411 22:06:16.164187 14277 remote_runtime.go:479] "ExecSync cmd from runtime service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" containerID="586b18245a375bf09f4da25d029c2471d1e72f943fd5b5f78f784f6db43d8e0f" cmd=[/bin/calico-node -bird-ready -felix-ready]
Apr 11 22:07:01 node7 systemd: Started Session 74957 of user root.
Apr 11 22:08:01 node7 systemd: Started Session 74958 of user root.
Apr 11 22:08:11 node7 kubelet: E0411 22:08:11.123785 14277 remote_runtime.go:277] "ListPodSandbox with filter from runtime service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" filter="&PodSandboxFilter{Id:,State:&PodSandboxStateValue{State:SANDBOX_READY,},LabelSelector:map[string]string{},}"
Apr 11 22:08:11 node7 kubelet: E0411 22:08:11.123866 14277 remote_runtime.go:277] "ListPodSandbox with filter from runtime service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" filter="nil"
Apr 11 22:08:11 node7 kubelet: E0411 22:08:11.123903 14277 kuberuntime_sandbox.go:297] "Failed to list pod sandboxes" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 11 22:08:11 node7 kubelet: E0411 22:08:11.123871 14277 kuberuntime_sandbox.go:297] "Failed to list pod sandboxes" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 11 22:08:11 node7 kubelet: E0411 22:08:11.123952 14277 kubelet_pods.go:1103] "Error listing containers" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 11 22:08:11 node7 kubelet: E0411 22:08:11.123971 14277 kubelet.go:2427] "Failed cleaning pods" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Apr 11 22:08:11 node7 kubelet: E0411 22:08:11.123990 14277 kubelet.go:2431] "Housekeeping took longer than expected" err="housekeeping took too long" expected="1s" actual="3m59.919s"
Apr 11 22:08:11 node7 kubelet: E0411 22:08:11.124018 14277 kubelet.go:2281] "Skipping pod synchronization" err="[container runtime is down, PLEG is not healthy: pleg was last seen active 3m59.00963546s ago; threshold is 3m0s]"
This is the cri-o log:
-- Logs begin at Tue 2024-02-20 17:20:35 CST, end at Fri 2024-04-12 22:53:01 CST. --
4月 11 21:50:04 node7 crio[6968]: time="2024-04-11 21:50:04.343272892+08:00" level=info msg="Checking image status: registry.aliyuncs.com/linkflow/pause:3.9" id=76e1a622-ce3d-4a1f-a686-58b026b73c8c name=/runtime.v1.ImageService/ImageStatus
4月 11 21:50:04 node7 crio[6968]: time="2024-04-11 21:50:04.343488104+08:00" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:e6f1816883972d4be47bd48879a08919b96afcd344132622e4d444987919323c,RepoTags:[registry.aliyuncs.com/
4月 11 21:55:04 node7 crio[6968]: time="2024-04-11 21:55:04.346715313+08:00" level=info msg="Checking image status: registry.aliyuncs.com/linkflow/pause:3.9" id=efa5b073-1589-4f68-ba45-2ac4b892dd76 name=/runtime.v1.ImageService/ImageStatus
4月 11 21:55:04 node7 crio[6968]: time="2024-04-11 21:55:04.346949509+08:00" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:e6f1816883972d4be47bd48879a08919b96afcd344132622e4d444987919323c,RepoTags:[registry.aliyuncs.com/
4月 11 22:00:04 node7 crio[6968]: time="2024-04-11 22:00:04.350012002+08:00" level=info msg="Checking image status: registry.aliyuncs.com/linkflow/pause:3.9" id=66416206-0a2d-4460-89a3-f2d749917e50 name=/runtime.v1.ImageService/ImageStatus
4月 11 22:00:04 node7 crio[6968]: time="2024-04-11 22:00:04.350245730+08:00" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:e6f1816883972d4be47bd48879a08919b96afcd344132622e4d444987919323c,RepoTags:[registry.aliyuncs.com/
4月 11 22:06:11 node7 crio[6968]: time="2024-04-11 22:06:11.120433243+08:00" level=info msg="Checking image status: registry.aliyuncs.com/linkflow/pause:3.9" id=2815d546-2efa-4cef-9f10-ba0dcdaaec66 name=/runtime.v1.ImageService/ImageStatus
4月 11 22:06:11 node7 crio[6968]: time="2024-04-11 22:06:11.120643208+08:00" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:e6f1816883972d4be47bd48879a08919b96afcd344132622e4d444987919323c,RepoTags:[registry.aliyuncs.com/
4月 11 22:14:11 node7 crio[6968]: time="2024-04-11 22:14:11.152505028+08:00" level=info msg="Checking image status: registry.aliyuncs.com/linkflow/pause:3.9" id=aed7c6bf-d6b6-450b-b4cd-8e6433addf63 name=/runtime.v1.ImageService/ImageStatus
4月 11 22:14:11 node7 crio[6968]: time="2024-04-11 22:14:11.152874810+08:00" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:e6f1816883972d4be47bd48879a08919b96afcd344132622e4d444987919323c,RepoTags:[registry.aliyuncs.com/
4月 11 22:22:11 node7 crio[6968]: time="2024-04-11 22:22:11.175745207+08:00" level=info msg="Checking image status: registry.aliyuncs.com/linkflow/pause:3.9" id=d6f88bac-1113-48b0-8bbf-0a281c28ce49 name=/runtime.v1.ImageService/ImageStatus
4月 11 22:22:11 node7 crio[6968]: time="2024-04-11 22:22:11.176084107+08:00" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:e6f1816883972d4be47bd48879a08919b96afcd344132622e4d444987919323c,RepoTags:[registry.aliyuncs.com/
4月 11 22:30:11 node7 crio[6968]: time="2024-04-11 22:30:11.198768268+08:00" level=info msg="Checking image status: registry.aliyuncs.com/linkflow/pause:3.9" id=d48c3c46-57e7-49f8-8526-c12cb33cd380 name=/runtime.v1.ImageService/ImageStatus
4月 11 22:30:11 node7 crio[6968]: time="2024-04-11 22:30:11.199023930+08:00" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:e6f1816883972d4be47bd48879a08919b96afcd344132622e4d444987919323c,RepoTags:[registry.aliyuncs.com/
4月 11 22:38:11 node7 crio[6968]: time="2024-04-11 22:38:11.237726770+08:00" level=info msg="Checking image status: registry.aliyuncs.com/linkflow/pause:3.9" id=97745b1a-4526-4090-92f4-9d3bc4d1481d name=/runtime.v1.ImageService/ImageStatus
4月 11 22:38:11 node7 crio[6968]: time="2024-04-11 22:38:11.237985537+08:00" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:e6f1816883972d4be47bd48879a08919b96afcd344132622e4d444987919323c,RepoTags:[registry.aliyuncs.com/
4月 11 22:46:11 node7 crio[6968]: time="2024-04-11 22:46:11.273071957+08:00" level=info msg="Checking image status: registry.aliyuncs.com/linkflow/pause:3.9" id=41c06890-1cf3-405b-90d5-5b5952943139 name=/runtime.v1.ImageService/ImageStatus
4月 11 22:46:11 node7 crio[6968]: time="2024-04-11 22:46:11.273854824+08:00" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:e6f1816883972d4be47bd48879a08919b96afcd344132622e4d444987919323c,RepoTags:[registry.aliyuncs.com/
4月 11 22:50:11 node7 crio[6968]: time="2024-04-11 22:50:11.293857819+08:00" level=info msg="Stopping container: adc20c68f0b41b037ff20bfb4f2bfec2d2c8c5e8a86db35cf26ec37fa76dce54 (timeout: 30s)" id=09d05bda-c52d-42f7-9d56-1b139ea5ffb9 name=/r
4月 11 22:50:11 node7 crio[6968]: time="2024-04-11 22:50:11.294096082+08:00" level=info msg="Stopping container: 5107706393b4c2f329c856f1dce971434f64dc790cb2897440ed8464f5f496d3 (timeout: 30s)" id=a12df590-31cc-4572-b65f-b35bb2bb6a45 name=/r
4月 11 22:50:11 node7 crio[6968]: time="2024-04-11 22:50:11.294225625+08:00" level=info msg="Stopping container: fa2bd8715b03b25ae517513b8d0b0b67d963239dedcec63e8d293d730e897586 (timeout: 30s)" id=c4921199-cb7e-48b7-98ba-de1e66b1e3cf name=/r
4月 11 22:52:11 node7 crio[6968]: time="2024-04-11 22:52:11.300156247+08:00" level=warning msg="Stopping container adc20c68f0b41b037ff20bfb4f2bfec2d2c8c5e8a86db35cf26ec37fa76dce54 with stop signal timed out: context canceled" id=09d05bda-c52
4月 11 22:52:11 node7 crio[6968]: time="2024-04-11 22:52:11.300197590+08:00" level=info msg="Stopping pod sandbox: 50d6c02598dc65bd386245ec41b9a1b7276cd93fec725f2c44a2e2089bf4a33d" id=699f3c89-b39d-4760-bc28-59dac6a215d3 name=/runtime.v1.Run
4月 11 22:52:11 node7 crio[6968]: time="2024-04-11 22:52:11.300332571+08:00" level=info msg="Stopping pod sandbox: 7b3dab6725265e8a2339f22d2a966550bd4683cfb475c31e0aa9b407189a6cea" id=e7e87036-26a7-4862-9989-84a5c0c2ea58 name=/runtime.v1.Run
4月 11 22:52:11 node7 crio[6968]: time="2024-04-11 22:52:11.300403789+08:00" level=info msg="Got pod network &{Name:salesorder2hudi-3242818dc5f34103-exec-1 Namespace:cdp ID:50d6c02598dc65bd386245ec41b9a1b7276cd93fec725f2c44a2e2089bf4a33d UID
4月 11 22:52:11 node7 crio[6968]: time="2024-04-11 22:52:11.300159723+08:00" level=warning msg="Stopping container 5107706393b4c2f329c856f1dce971434f64dc790cb2897440ed8464f5f496d3 with stop signal timed out: timeout reached after 30 seconds
4月 11 22:52:11 node7 crio[6968]: time="2024-04-11 22:52:11.300615444+08:00" level=info msg="Deleting pod cdp_salesorder2hudi-3242818dc5f34103-exec-1 from CNI network \"k8s-pod-network\" (type=calico)"
4月 11 22:52:11 node7 crio[6968]: time="2024-04-11 22:52:11.300730065+08:00" level=info msg="Got pod network &{Name:journey2hudi-69f59a8dc5f3477e-exec-1 Namespace:cdp ID:7b3dab6725265e8a2339f22d2a966550bd4683cfb475c31e0aa9b407189a6cea UID:d8
4月 11 22:52:11 node7 crio[6968]: time="2024-04-11 22:52:11.300878130+08:00" level=info msg="Deleting pod cdp_journey2hudi-69f59a8dc5f3477e-exec-1 from CNI network \"k8s-pod-network\" (type=calico)"
4月 11 22:52:11 node7 crio[6968]: time="2024-04-11 22:52:11.300910807+08:00" level=info msg="Stopping pod sandbox: 5191dc7b42bf05769b0c164deb6cb1bd62a1cbafb07a94931723eae410e79301" id=1c284dd0-4ace-4bfe-8d52-e0ba4800c8a9 name=/runtime.v1.Run
4月 11 22:52:11 node7 crio[6968]: time="2024-04-11 22:52:11.301247991+08:00" level=info msg="Got pod network &{Name:dim2hudi-2bc8818dc5f333b7-exec-1 Namespace:cdp ID:5191dc7b42bf05769b0c164deb6cb1bd62a1cbafb07a94931723eae410e79301 UID:31dba5
4月 11 22:52:11 node7 crio[6968]: time="2024-04-11 22:52:11.301418523+08:00" level=info msg="Deleting pod cdp_dim2hudi-2bc8818dc5f333b7-exec-1 from CNI network \"k8s-pod-network\" (type=calico)"
4月 11 22:54:11 node7 crio[6968]: time="2024-04-11 22:54:11.313770085+08:00" level=info msg="Checking image status: registry.aliyuncs.com/linkflow/pause:3.9" id=d0ba6b91-38d9-48e9-aa6d-ed20484c631c name=/runtime.v1.ImageService/ImageStatus
4月 11 22:54:11 node7 crio[6968]: time="2024-04-11 22:54:11.314008887+08:00" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:e6f1816883972d4be47bd48879a08919b96afcd344132622e4d444987919323c,RepoTags:[registry.aliyuncs.com/
4月 11 22:58:11 node7 crio[6968]: time="2024-04-11 22:58:11.338855472+08:00" level=info msg="Stopped container fa2bd8715b03b25ae517513b8d0b0b67d963239dedcec63e8d293d730e897586: cdp/dim2hudi-2bc8818dc5f333b7-exec-1/spark-kubernetes-executor"
4月 11 23:02:11 node7 crio[6968]: time="2024-04-11 23:02:11.353814587+08:00" level=info msg="Checking image status: registry.aliyuncs.com/linkflow/pause:3.9" id=0ee1ebef-94d7-4c49-926c-bbc228f62040 name=/runtime.v1.ImageService/ImageStatus
How are the resources on the node? Is system.slice big enough? I haven't seen this exact thing happen, but I'm wondering if cri-o is being throttled or something. The other thought is that something is weird with localhost communication, so some messages aren't getting connected. Both of these guesses are long shots, though.
No resource problems. One of these clusters had just had k8s installed and was not running any workloads; about two months later, one node (of 9) went down. But some nodes in the same cluster haven't had any problems for a year.
[root@node7 cni]# free -g
total used free shared buff/cache available
Mem: 125 2 114 0 9 122
Swap: 0 0 0
[root@node7 cni]#
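To answer the system.slice question concretely, one way to inspect the limits and live usage is sketched below (property names are an assumption that varies between cgroup v1 and v2, and the unit name `crio` may differ by package):

```shell
# Which slice cri-o runs in, and any CPU/memory caps on that slice
systemctl show crio -p Slice
systemctl show system.slice -p CPUQuotaPerSecUSec -p MemoryMax -p MemoryLimit

# Live per-slice usage; sustained high CPU here would support
# the throttling theory
systemd-cgtop --iterations=1 | head -n 20
```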
How did you identify which log reflects the communication issue?
Well, I see cri-o doing some communicating with the kubelet as it reacts to stop-container requests, but for some reason the execs are timing out. Maybe they're taking too long? Can you try enabling the debug log_level in cri-o to see if that helps find the issue?
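Enabling debug logging could look like the drop-in below (the drop-in directory is the common default path, and placing `log_level` under `[crio.runtime]` matches the stock config layout — verify both against your installed version):

```toml
# /etc/crio/crio.conf.d/99-debug.conf
[crio.runtime]
log_level = "debug"
```

Restart the service afterwards (`systemctl restart crio`) for the change to take effect.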
Okay, I can start debugging, but reproducing the problem may take a long time.
A friendly reminder that this issue had no activity for 30 days. |
What happened?
[root@node7 ~]# crictl ps
FATA[0002] validate service connection: validate CRI v1 runtime API for endpoint "unix:///var/run/crio/crio.sock": rpc error: code = DeadlineExceeded desc = context deadline exceeded
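When this happens, it can help to distinguish a merely slow runtime from a wedged one by raising crictl's client timeout (the endpoint below is the one from the error message; the default timeout is only 2 seconds):

```shell
# If even a 60s timeout still returns DeadlineExceeded, the server
# side is stuck rather than just slow.
crictl --runtime-endpoint unix:///var/run/crio/crio.sock --timeout 60s ps
```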
What did you expect to happen?
The k8s node becomes NotReady.
How can we reproduce it (as minimally and precisely as possible)?
It had been running for a while, and the issue is not easy to reproduce. Over the past year, I have encountered this problem about five or six times with CRI-O in different clusters.
Anything else we need to know?
When I run `systemctl restart crio`, the problem goes away.
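Since the restart clears it, capturing state before restarting next time would help diagnosis. A sketch, assuming a CRI-O build that dumps goroutine stacks to the journal on SIGUSR1 — that handler is an assumption, and on a build without it the signal would terminate the process, so verify against your version first:

```shell
# Ask crio to dump goroutine stacks (SIGUSR1 handler assumed; if the
# running build lacks it, the default action kills the process).
kill -USR1 "$(pidof crio)"
sleep 5
# Save the recent log, stack dump included, then apply the workaround.
journalctl -u crio --since "-10 minutes" --no-pager > /tmp/crio-stuck.log
systemctl restart crio
```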
CRI-O and Kubernetes version
OS version
Additional environment details (AWS, VirtualBox, physical, etc.)