hub-agent fails when clusterresourcesnapshot takes more than 30 seconds to respond #709
Unanswered · d4rkhunt33r asked this question in Q&A
Hi,
I am new to this project, but my company is evaluating whether we should use Azure Fleet to sync objects between Kubernetes clusters for disaster-recovery purposes.
During the evaluation we installed the hub-agent and the member-agent, but after load testing with a large number of Kubernetes objects, the hub-agent keeps crashing.
I think the issue is related to the time it takes to list the ClusterResourceSnapshot objects from the Kubernetes API server:
I0304 21:18:59.379494 1 controller/controller.go:214] "Shutdown signal received, waiting for all workers to finish" controller="cluster-resource-placement-controller-v1beta1"
I0304 21:18:59.379510 1 controller/controller.go:214] "Shutdown signal received, waiting for all workers to finish" controller="resource-change-controller"
E0304 21:18:59.380129 1 v1beta1/membercluster_controller.go:98] "failed to join" err="failed to sync namespace: failed to get namespace fleet-member-istio-replica: Timeout: failed waiting for *v1.Namespace Informer to sync" memberCluster="istio-replica"
W0304 21:19:00.579680 1 cache/reflector.go:535] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: failed to list *v1beta1.ClusterResourceSnapshot: the server was unable to return a response in the time allotted, but may still be processing the request (get clusterresourcesnapshots.placement.kubernetes-fleet.io)
I0304 21:19:00.579938 1 trace/trace.go:236] Trace[1130795302]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229 (04-Mar-2024 21:18:00.538) (total time: 60040ms):
Trace[1130795302]: ---"Objects listed" error:the server was unable to return a response in the time allotted, but may still be processing the request (get clusterresourcesnapshots.placement.kubernetes-fleet.io) 60040ms (21:19:00.579)
Trace[1130795302]: [1m0.04089546s] [1m0.04089546s] END
E0304 21:19:00.580063 1 cache/reflector.go:147] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: Failed to watch *v1beta1.ClusterResourceSnapshot: failed to list *v1beta1.ClusterResourceSnapshot: the server was unable to return a response in the time allotted, but may still be processing the request (get clusterresourcesnapshots.placement.kubernetes-fleet.io)
I0304 21:19:01.379413 1 trace/trace.go:236] Trace[1018449428]: "DeltaFIFO Pop Process" ID:ptm-device-status-api,Depth:2542,Reason:slow event handlers blocking the queue (04-Mar-2024 21:19:01.249) (total time: 130ms):
Trace[1018449428]: [130.285897ms] [130.285897ms] END
E0304 21:19:29.389490 1 hubagent/main.go:171] "problem starting manager" err="[failed to wait for clusterschedulingpolicysnapshot caches to sync: timed out waiting for cache to be synced for Kind *v1beta1.ClusterSchedulingPolicySnapshot, failed waiting for all runnables to end within grace period of 30s: context deadline exceeded]"
We have 2841 ClusterResourceSnapshot objects in the cluster, and the response to kubectl get clusterresourcesnapshot takes more than 1 minute.
Is it possible to change the amount of time the Kubernetes client library waits for these resources, so that the hub-agent does not crash?
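For context, this is roughly the kind of knob I was hoping exists. In controller-runtime (which I believe the hub-agent is built on), recent versions let a manager be configured with a longer cache sync timeout and shutdown grace period. The sketch below is only illustrative: the values are made up, and I don't know whether the hub-agent actually exposes these options as flags.

```go
package main

import (
	"os"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/config"
)

func main() {
	// Illustrative values only (not the hub-agent's real defaults or flags).
	cacheSyncTimeout := 5 * time.Minute  // controller-runtime's default is 2 minutes
	gracefulShutdown := 60 * time.Second // default is 30s, matching "grace period of 30s" in the log

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		// Applies to every controller built through this manager, unless overridden per controller.
		Controller: config.Controller{
			CacheSyncTimeout: cacheSyncTimeout,
		},
		// How long the manager waits for runnables to stop before giving up on shutdown.
		GracefulShutdownTimeout: &gracefulShutdown,
	})
	if err != nil {
		os.Exit(1)
	}

	// ... register controllers here, then run the manager as usual.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		os.Exit(1)
	}
}
```

That said, as far as I can tell the "the server was unable to return a response in the time allotted, but may still be processing the request" error comes from the API server's own request timeout rather than from the client, so I'm not sure a client-side setting alone would make the slow ClusterResourceSnapshot list succeed.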