
Pods not syncing quickly: pod-syncer Error updating pod: context deadline exceeded #1765

Open
alextricity25 opened this issue May 10, 2024 · 7 comments


alextricity25 commented May 10, 2024

What happened?

I'm experiencing a rather strange bug where some pods appear stuck in the "Init: 0/1" status, but only when connected to the vcluster context. When connected to the host cluster context, the pod status reports correctly; in the vcluster context, however, pods occasionally get stuck in either "PendingCreate" or "Init: 0/1", which causes downstream issues with my Helm chart installation flow. The pod events while connected to the vcluster context show the following:

Warning  SyncError  13m   pod-syncer         Error updating pod: context deadline exceeded

The image below shows the bug in action with two panes. The left pane is K9s connected to the vcluster context, and the right is connected to the host cluster context. As you can see, the pods in the host cluster are "Running", but the same pods in the vcluster context are stuck in the "Init:0/1" status.
[screenshot: K9s in two panes — vcluster context (left) vs. host cluster context (right)]

Looking at the pod events, I see the following:
[screenshot: pod events showing the pod-syncer SyncError]

The pod-syncer error only appears when connected to the vcluster.

The only "error" that I noticed in the vcluster syncer logs is:

filters/wrap.go:54	timeout or abort while handling: method=GET URI="/api/v1/namespaces/xrdm/pods/xxxx-portal-worker-66bc6969b-r4qqr/log?container=xxxx-portal-worker&follow=true&tailLines=100&timestamps=true" audit-ID="1d310b7a-7010-4c0e-a116-7b4127d94193"

What did you expect to happen?

I expect the pod status seen while connected to the vcluster context to reflect the correct status.

How can we reproduce it (as minimally and precisely as possible)?

  1. Install vcluster version v0.20.0-beta.5 using the Helm chart.
  2. Connect to the vcluster and create a batch of pods.
  3. Observe that the pod-syncer is not updating pod status as intended (a rough sketch of these steps follows this list).
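A minimal sketch of those steps, assuming the loft.sh Helm repository and placeholder names (my-vcluster, vcluster-test, and stress-test are made up for illustration):

helm repo add loft https://charts.loft.sh
helm upgrade --install my-vcluster loft/vcluster \
  --namespace vcluster-test --create-namespace \
  --version 0.20.0-beta.5

# vcluster connect switches the current kube-context to the virtual cluster
vcluster connect my-vcluster -n vcluster-test

# Create a batch of pods to exercise the pod-syncer, then watch their status
kubectl create deployment stress-test --image=nginx --replicas=30
kubectl get pods -w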

Anything else we need to know?

The pods, while connected to the vcluster context, will eventually report the correct status, but sometimes it takes 5 or 10 minutes before they do.
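A crude way to measure that lag, assuming two kubeconfig contexts with placeholder names host-ctx and vcluster-ctx, is to timestamp the watch output from both sides and compare when each side reports the transition:

kubectl --context host-ctx get pods -w --no-headers | while read line; do echo "$(date +%T) host:     $line"; done &
kubectl --context vcluster-ctx get pods -w --no-headers | while read line; do echo "$(date +%T) vcluster: $line"; done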

Host cluster Kubernetes version

$ kubectl version
Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.0

Host cluster Kubernetes distribution

GKE-1.29.0

vcluster version

v0.20.0-beta.5

vcluster Kubernetes distribution (k3s (default), k8s, k0s)

k8s

OS and Arch

OS:  GKE containerd image
Arch:
@heiko-braun
Contributor

Let me summarise: the pods get scheduled on the host, but the vcluster api server doesn’t reflect the status correctly?

Do you observe significant load (i.e. request latency, total requests increased) on the host api server when this happens?
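A rough way to spot-check that, assuming permission to read the host API server's /metrics endpoint (host-ctx is a placeholder context name):

# Request latency and volume as seen by the host API server
kubectl --context host-ctx get --raw /metrics | grep -E 'apiserver_request_duration_seconds_(sum|count)' | head

# Overall node headroom on the host (requires metrics-server on the host)
kubectl --context host-ctx top nodes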

@everflux

Looks similar to #1589 to me

@alextricity25
Author

@heiko-braun

Let me summarise: the pods get scheduled on the host, but the vcluster api server doesn’t reflect the status correctly?

That's correct!

Do you observe significant load (i.e. request latency, total requests increased) on the host api server when this happens?

I do not. The vcluster pods on the host cluster are given a good amount of resources: 3 vCPUs and 4Gi of memory. Usually the node this vcluster pod is on is nowhere near these limits.

@everflux

Looks similar to #1589 to me

Yes, indeed! I suppose this issue can be marked as a duplicate. Thanks for catching that!

@FabianKramm
Member

FabianKramm commented May 14, 2024

@alextricity25 would you mind trying virtual Kubernetes version v1.29.4 as there was a pretty significant bug in v1.29.0 that caused issues (kubernetes/kubernetes#123448), which could be the problem for this

@alextricity25
Author

@alextricity25 would you mind trying virtual Kubernetes version v1.29.4 as there was a pretty significant bug in v1.29.0 that caused issues (kubernetes/kubernetes#123448), which could be the problem for this

@FabianKramm The default is v1.29.0. I'll override this to 1.29.4 to see if that does anything. It's difficult to iterate and test whether a given change fixes this issue, because it doesn't happen all the time. I'll let you know!
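For reference, a sketch of how that override might look with the v0.20 chart; the exact values key has changed between releases, so treat the path below as an assumption and verify it against the chart's vcluster.yaml reference:

helm upgrade --install my-vcluster loft/vcluster \
  --namespace vcluster-test \
  --version 0.20.0-beta.5 \
  --set controlPlane.distro.k8s.version=v1.29.4  # key path is an assumption; check the chart values

# Confirm the virtual API server version from inside the vcluster
vcluster connect my-vcluster -n vcluster-test -- kubectl version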

@everflux

@FabianKramm I tested 1.29.4 (Server Version: v1.29.4+k3s1, I think embedded db/sqlite) with helm chart v0.20.0-beta.5 and still observed the issue once the metrics server was present in the host cluster. (host is 1.18.2)
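For anyone correlating this with the metrics server, a quick way to check whether metrics-server is active on the host (it typically lives in kube-system, but the deployment name and namespace can vary by distribution; host-ctx is a placeholder):

kubectl --context host-ctx -n kube-system get deployment metrics-server
kubectl --context host-ctx get apiservices v1beta1.metrics.k8s.io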

@alextricity25
Author

@FabianKramm I also observed the issue again on 1.29.5. Screenshot below. The pods in the vCluster context eventually did report the correct status, but it took about 1-2 minutes before they did.
[screenshot: the issue reproduced on 1.29.5]
