hub-agent fails when clusterresourcesnapshot takes more than 30 seconds to respond #709
Unanswered · d4rkhunt33r asked this question in Q&A
Hi,
I am new to this project, but my company is evaluating whether we should use Azure Fleet to sync objects between Kubernetes clusters for disaster-recovery purposes.
During the evaluation we installed the hub-agent and the member-agent, but after load testing with a large number of Kubernetes objects, the hub-agent keeps crashing.
I think the issue is related to the time it takes to list the ClusterResourceSnapshot objects from the Kubernetes API server:
I0304 21:18:59.379494 1 controller/controller.go:214] "Shutdown signal received, waiting for all workers to finish" controller="cluster-resource-placement-controller-v1beta1"
I0304 21:18:59.379510 1 controller/controller.go:214] "Shutdown signal received, waiting for all workers to finish" controller="resource-change-controller"
E0304 21:18:59.380129 1 v1beta1/membercluster_controller.go:98] "failed to join" err="failed to sync namespace: failed to get namespace fleet-member-istio-replica: Timeout: failed waiting for *v1.Namespace Informer to sync" memberCluster="istio-replica"
W0304 21:19:00.579680 1 cache/reflector.go:535] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: failed to list *v1beta1.ClusterResourceSnapshot: the server was unable to return a response in the time allotted, but may still be processing the request (get clusterresourcesnapshots.placement.kubernetes-fleet.io)
I0304 21:19:00.579938 1 trace/trace.go:236] Trace[1130795302]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229 (04-Mar-2024 21:18:00.538) (total time: 60040ms):
Trace[1130795302]: ---"Objects listed" error:the server was unable to return a response in the time allotted, but may still be processing the request (get clusterresourcesnapshots.placement.kubernetes-fleet.io) 60040ms (21:19:00.579)
Trace[1130795302]: [1m0.04089546s] [1m0.04089546s] END
E0304 21:19:00.580063 1 cache/reflector.go:147] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: Failed to watch *v1beta1.ClusterResourceSnapshot: failed to list *v1beta1.ClusterResourceSnapshot: the server was unable to return a response in the time allotted, but may still be processing the request (get clusterresourcesnapshots.placement.kubernetes-fleet.io)
I0304 21:19:01.379413 1 trace/trace.go:236] Trace[1018449428]: "DeltaFIFO Pop Process" ID:ptm-device-status-api,Depth:2542,Reason:slow event handlers blocking the queue (04-Mar-2024 21:19:01.249) (total time: 130ms):
Trace[1018449428]: [130.285897ms] [130.285897ms] END
E0304 21:19:29.389490 1 hubagent/main.go:171] "problem starting manager" err="[failed to wait for clusterschedulingpolicysnapshot caches to sync: timed out waiting for cache to be synced for Kind *v1beta1.ClusterSchedulingPolicySnapshot, failed waiting for all runnables to end within grace period of 30s: context deadline exceeded]"
We have 2841 ClusterResourceSnapshot objects in the cluster, and the response to kubectl get clusterresourcesnapshot takes more than 1 minute.
Is it possible to change the amount of time the Kubernetes client library waits for these resources, so that the hub-agent does not crash?
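For context, this is roughly the kind of knob I was hoping exists. In controller-runtime (which I believe the hub-agent is built on), recent versions let a manager be configured with a longer cache sync timeout and shutdown grace period. The sketch below is only illustrative: the values are made up, and I don't know whether the hub-agent actually exposes these options as flags.

```go
package main

import (
	"os"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/config"
)

func main() {
	// Illustrative values only (not the hub-agent's real defaults or flags).
	cacheSyncTimeout := 5 * time.Minute  // controller-runtime's default is 2 minutes
	gracefulShutdown := 60 * time.Second // default is 30s, matching "grace period of 30s" in the log

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		// Applies to every controller built through this manager, unless overridden per controller.
		Controller: config.Controller{
			CacheSyncTimeout: cacheSyncTimeout,
		},
		// How long the manager waits for runnables to stop before giving up on shutdown.
		GracefulShutdownTimeout: &gracefulShutdown,
	})
	if err != nil {
		os.Exit(1)
	}

	// ... register controllers here, then run the manager as usual.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		os.Exit(1)
	}
}
```

That said, as far as I can tell the "the server was unable to return a response in the time allotted, but may still be processing the request" error comes from the API server's own request timeout rather than from the client, so I'm not sure a client-side setting alone would make the slow ClusterResourceSnapshot list succeed.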