Running into multiple "client rate limiter Wait returned an error" issues in keda-operator. We have around 50-100 ScaledObjects across various EKS clusters, and this problem appeared after we upgraded KEDA from 2.13.0 to 2.15.1.
The number of ScaledObjects varies per cluster, and, as suggested in previous threads, we had already increased the QPS values, but since the upgrade we are seeing these errors frequently.
Our K8s client config is as follows:
kube-api-qps: 35
kube-api-burst: 70
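For completeness, here is a minimal sketch of how these limits are passed to the operator. The flag wiring below is an assumption based on our values, not copied from the cluster; verify the exact flag names against your KEDA deployment or Helm chart.

```yaml
# Sketch of the keda-operator container args carrying the client limits above.
# Flag names assumed; check them against your KEDA deployment/Helm chart.
spec:
  template:
    spec:
      containers:
        - name: keda-operator
          args:
            - --kube-api-qps=35    # sustained client-go QPS to the API server
            - --kube-api-burst=70  # short-term burst allowance
```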
I want to understand why KEDA sends so many concurrent requests to the Kubernetes API server. Since we do not have a very large number of ScaledObjects and the default polling_interval of 30s is used, why is the KEDA client rate-limiting its requests? I need to understand whether KEDA's retries are causing this or whether it is something else.
Expected Behavior
We should not run into client rate limit issues.
Actual Behavior
We are getting error messages in keda-operator logs for client rate limit issues.
Steps to Reproduce the Problem
Observed in clusters with around 50-100 ScaledObjects, KEDA version 2.15.1, and EKS version 1.31.
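A representative (sanitized) ScaledObject looks roughly like the following. The query, thresholds, and replica bounds are illustrative stand-ins, not copied verbatim from the cluster; only the Prometheus endpoint and trigger types match what appears in the logs below.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: internal-app-1
  namespace: internal-namespace
spec:
  scaleTargetRef:
    name: internal-app-1            # Deployment being scaled
  pollingInterval: 30               # default; we do not override it
  minReplicaCount: 2                # illustrative bounds
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://vmselect.internal-endpoint.com:8481/select/0/prometheus
        # simplified stand-in for the real per-service query
        query: sum(rate(envoy_http_downstream_rq_total{service="internal-app-1"}[1m]))
        threshold: "100"            # illustrative threshold
    - type: cpu
      metricType: Utilization
      metadata:
        value: "60"                 # illustrative CPU target
    - type: memory
      metricType: Utilization
      metadata:
        value: "70"                 # illustrative memory target
```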
Logs from KEDA operator
2024-11-25T11:24:51Z ERROR prometheus_scaler error executing prometheus query {"type": "ScaledObject", "namespace": "internal-namespace", "name": "internal-app-1", "error": "Get \"http://vmselect.internal-endpoint.com:8481/select/0/prometheus/api/v1/query?query=sum%28sum%28rate%28envoy_http_downstream_rq_total%7Benvoy_http_conn_manager_prefix%3D~%22ingress_http%7Cingress_https%22%2Csource_cluster%3D%22eks-cluster-1%22%2C+namespace%3D%22internal-namespace%22%2C+service%3D%22internal-app-1%22%2C+pod%3D~%27internal-app-1.%2A%27%7D%5B1m%5D%29%2A60%29+by+%28pod%2C+namespace%29+%2A+ignoring%28pod%2C+namespace%29+group_left%28%29+max%281+%2B+max%28request_buffer_per_datacenter%7Bdatacenter%3D%22sgp%22%7D+or+request_buffer_per_service%7Bdatacenter%3D%22sgp%22%2C+exported_namespace%3D%22internal-namespace%22%2C+exported_service%3D%22internal-app-1%22%7D%29%29+or+sum%28rate%28envoy_http_downstream_rq_total%7Benvoy_http_conn_manager_prefix%3D~%22ingress_http%7Cingress_https%22%2Csource_cluster%3D%22eks-cluster-1%22%2C+namespace%3D%22internal-namespace%22%2C+service%3D%22internal-app-1%22%2C+pod%3D~%27internal-app-1.%2A%27%7D%5B1m%5D%29%2A60%29+by+%28pod%2C+namespace%29%29&time=2024-11-25T11:24:51Z\": context canceled"}
2024-11-25T11:24:51Z ERROR scale_handler error getting scale decision {"scaledObject.Namespace": "internal-namespace", "scaledObject.Name": "internal-app-1", "scaler": "prometheusScaler", "error": "scaler with id 0 not found, len = 0, cache has been probably already invalidated"}
2024-11-25T11:24:51Z ERROR scaleexecutor failed to patch Objects {"scaledobject.Name": "internal-app-1", "scaledObject.Namespace": "internal-namespace", "scaleTarget.Name": "internal-app-1", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:51Z ERROR scaleexecutor error setting ready condition {"scaledobject.Name": "internal-app-1", "scaledObject.Namespace": "internal-namespace", "scaleTarget.Name": "internal-app-1", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:51Z ERROR scaleexecutor failed to patch Objects {"scaledobject.Name": "internal-app-1", "scaledObject.Namespace": "internal-namespace", "scaleTarget.Name": "internal-app-1", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:51Z ERROR scaleexecutor Error setting active condition when triggers are not active {"scaledobject.Name": "internal-app-1", "scaledObject.Namespace": "internal-namespace", "scaleTarget.Name": "internal-app-1", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:51Z ERROR scale_handler error getting metric spec for the scaler {"scaledObject.Namespace": "internal-namespace", "scaledObject.Name": "internal-app-1", "scaler": "prometheusScaler", "error": "scaler with id 0 not found. Len = 0"}
2024-11-25T11:24:51Z ERROR scale_handler error getting metric spec for the scaler {"scaledObject.Namespace": "internal-namespace", "scaledObject.Name": "internal-app-1", "scaler": "cpuMemoryScaler", "error": "scaler with id 1 not found. Len = 0"}
2024-11-25T11:24:52Z ERROR prometheus_scaler error executing prometheus query {"type": "ScaledObject", "namespace": "internal-namespace-2", "name": "internal-app-2", "error": "Get \"http://vmselect.internal-endpoint.com:8481/select/0/prometheus/api/v1/query?query=%28sum%28sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_2_%28%28default%29%7C%28headless%29%29_internal-namespace-2.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+%2B+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_2_%28%28default%29%7C%28headless%29%29_internal-namespace-2.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_2_%28%28default%29%7C%28headless%29%29_internal-namespace-2.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_2_%28%28default%29%7C%28headless%29%29_internal-namespace-2.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29%29%29+%2A+max%281%2B+max%28request_buffer_per_datacenter%7Bdatacenter%3D%22sgp%22%7D+or+request_buffer_per_service%7Bdatacenter%3D%22sgp%22%2C+exported_namespace%3D%22internal-namespace-2%22%2C+exported_service%3D%22internal-app-2%22%7D%29%29+by+%28namespace%2C+service%29&time=2024-11-25T11:24:51Z\": context canceled"}
2024-11-25T11:24:52Z ERROR scale_handler error getting scale decision {"scaledObject.Namespace": "internal-namespace-2", "scaledObject.Name": "internal-app-2", "scaler": "prometheusScaler", "error": "scaler with id 2 not found, len = 0, cache has been probably already invalidated"}
2024-11-25T11:24:52Z ERROR scaleexecutor failed to patch Objects {"scaledobject.Name": "internal-app-2", "scaledObject.Namespace": "internal-namespace-2", "scaleTarget.Name": "internal-app-2", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:52Z ERROR scaleexecutor error setting ready condition {"scaledobject.Name": "internal-app-2", "scaledObject.Namespace": "internal-namespace-2", "scaleTarget.Name": "internal-app-2", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:52Z ERROR scaleexecutor failed to patch Objects {"scaledobject.Name": "internal-app-2", "scaledObject.Namespace": "internal-namespace-2", "scaleTarget.Name": "internal-app-2", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:52Z ERROR scaleexecutor Error setting active condition when triggers are not active {"scaledobject.Name": "internal-app-2", "scaledObject.Namespace": "internal-namespace-2", "scaleTarget.Name": "internal-app-2", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:52Z ERROR scale_handler error getting metric spec for the scaler {"scaledObject.Namespace": "internal-namespace-2", "scaledObject.Name": "internal-app-2", "scaler": "cpuMemoryScaler", "error": "scaler with id 0 not found. Len = 0"}
2024-11-25T11:24:52Z ERROR scale_handler error getting metric spec for the scaler {"scaledObject.Namespace": "internal-namespace-2", "scaledObject.Name": "internal-app-2", "scaler": "cpuMemoryScaler", "error": "scaler with id 1 not found. Len = 0"}
2024-11-25T11:24:53Z ERROR prometheus_scaler error executing prometheus query {"type": "ScaledObject", "namespace": "internal-namespace-3", "name": "internal-app-3", "error": "Get \"http://vmselect.internal-endpoint.com:8481/select/0/prometheus/api/v1/query?query=%28sum%28sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_3_%28%28default%29%7C%28headless%29%29_internal-namespace-3.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+%2B+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_3_%28%28default%29%7C%28headless%29%29_internal-namespace-3.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_3_%28%28default%29%7C%28headless%29%29_internal-namespace-3.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_3_%28%28default%29%7C%28headless%29%29_internal-namespace-3.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29%29%29+%2A+max%281%2B+max%28request_buffer_per_datacenter%7Bdatacenter%3D%22sgp%22%7D+or+request_buffer_per_service%7Bdatacenter%3D%22sgp%22%2C+exported_namespace%3D%22internal-namespace-3%22%2C+exported_service%3D%22internal-app-3%22%7D%29%29+by+%28namespace%2C+service%29&time=2024-11-25T11:24:53Z\": context canceled"}
2024-11-25T11:24:53Z ERROR scale_handler error getting scale decision {"scaledObject.Namespace": "internal-namespace-3", "scaledObject.Name": "internal-app-3", "scaler": "prometheusScaler", "error": "scaler with id 2 not found, len = 0, cache has been probably already invalidated"}
2024-11-25T11:24:53Z ERROR scaleexecutor failed to patch Objects {"scaledobject.Name": "internal-app-3", "scaledObject.Namespace": "internal-namespace-3", "scaleTarget.Name": "internal-app-3", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:53Z ERROR scaleexecutor error setting ready condition {"scaledobject.Name": "internal-app-3", "scaledObject.Namespace": "internal-namespace-3", "scaleTarget.Name": "internal-app-3", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:53Z ERROR scaleexecutor failed to patch Objects {"scaledobject.Name": "internal-app-3", "scaledObject.Namespace": "internal-namespace-3", "scaleTarget.Name": "internal-app-3", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:53Z ERROR scaleexecutor Error setting active condition when triggers are not active {"scaledobject.Name": "internal-app-3", "scaledObject.Namespace": "internal-namespace-3", "scaleTarget.Name": "internal-app-3", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:53Z ERROR scale_handler error getting metric spec for the scaler {"scaledObject.Namespace": "internal-namespace-3", "scaledObject.Name": "internal-app-3", "scaler": "cpuMemoryScaler", "error": "scaler with id 0 not found. Len = 0"}
2024-11-25T11:24:53Z ERROR scale_handler error getting metric spec for the scaler {"scaledObject.Namespace": "internal-namespace-3", "scaledObject.Name": "internal-app-3", "scaler": "cpuMemoryScaler", "error": "scaler with id 1 not found. Len = 0"}
2024-11-25T11:24:54Z ERROR prometheus_scaler error executing prometheus query {"type": "ScaledObject", "namespace": "internal-namespace-4", "name": "internal-app-4", "error": "Get \"http://vmselect.internal-endpoint.com:8481/select/0/prometheus/api/v1/query?query=%28sum%28sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_4_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+%2B+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_4_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_4_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_4_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29%29%29+%2A+max%281%2B+max%28request_buffer_per_datacenter%7Bdatacenter%3D%22sgp%22%7D+or+request_buffer_per_service%7Bdatacenter%3D%22sgp%22%2C+exported_namespace%3D%22internal-namespace-4%22%2C+exported_service%3D%22internal-app-4%22%7D%29%29+by+%28namespace%2C+service%29&time=2024-11-25T11:24:53Z\": context canceled"}
2024-11-25T11:24:54Z ERROR scale_handler error getting scale decision {"scaledObject.Namespace": "internal-namespace-4", "scaledObject.Name": "internal-app-4", "scaler": "prometheusScaler", "error": "scaler with id 2 not found, len = 0, cache has been probably already invalidated"}
2024-11-25T11:24:54Z ERROR scaleexecutor failed to patch Objects {"scaledobject.Name": "internal-app-4", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z ERROR scaleexecutor error setting ready condition {"scaledobject.Name": "internal-app-4", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z ERROR scaleexecutor failed to patch Objects {"scaledobject.Name": "internal-app-4", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z ERROR scaleexecutor Error setting active condition when triggers are not active {"scaledobject.Name": "internal-app-4", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z ERROR scale_handler error getting metric spec for the scaler {"scaledObject.Namespace": "internal-namespace-4", "scaledObject.Name": "internal-app-4", "scaler": "cpuMemoryScaler", "error": "scaler with id 0 not found. Len = 0"}
2024-11-25T11:24:54Z ERROR scale_handler error getting metric spec for the scaler {"scaledObject.Namespace": "internal-namespace-4", "scaledObject.Name": "internal-app-4", "scaler": "cpuMemoryScaler", "error": "scaler with id 1 not found. Len = 0"}
2024-11-25T11:24:54Z ERROR prometheus_scaler error executing prometheus query {"type": "ScaledObject", "namespace": "internal-namespace-4", "name": "internal-app-4-p0", "error": "Get \"http://vmselect.internal-endpoint.com:8481/select/0/prometheus/api/v1/query?query=%28sum%28sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_4_p0_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+%2B+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_4_p0_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_4_p0_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_4_p0_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29%29%29+%2A+max%281%2B+max%28request_buffer_per_datacenter%7Bdatacenter%3D%22sgp%22%7D+or+request_buffer_per_service%7Bdatacenter%3D%22sgp%22%2C+exported_namespace%3D%22internal-namespace-4%22%2C+exported_service%3D%22internal-app-4-p0%22%7D%29%29+by+%28namespace%2C+service%29&time=2024-11-25T11:24:54Z\": context canceled"}
2024-11-25T11:24:54Z ERROR scale_handler error getting scale decision {"scaledObject.Namespace": "internal-namespace-4", "scaledObject.Name": "internal-app-4-p0", "scaler": "prometheusScaler", "error": "scaler with id 2 not found, len = 0, cache has been probably already invalidated"}
2024-11-25T11:24:54Z ERROR scaleexecutor failed to patch Objects {"scaledobject.Name": "internal-app-4-p0", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4-p0", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z ERROR scaleexecutor error setting ready condition {"scaledobject.Name": "internal-app-4-p0", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4-p0", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z ERROR scaleexecutor failed to patch Objects {"scaledobject.Name": "internal-app-4-p0", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4-p0", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z ERROR scaleexecutor Error setting active condition when triggers are not active {"scaledobject.Name": "internal-app-4-p0", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4-p0", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z ERROR scale_handler error getting metric spec for the scaler {"scaledObject.Namespace": "internal-namespace-4", "scaledObject.Name": "internal-app-4-p0", "scaler": "cpuMemoryScaler", "error": "scaler with id 0 not found. Len = 0"}
2024-11-25T11:24:54Z ERROR scale_handler error getting metric spec for the scaler {"scaledObject.Namespace": "internal-namespace-4", "scaledObject.Name": "internal-app-4-p0", "scaler": "cpuMemoryScaler", "error": "scaler with id 1 not found. Len = 0"}
KEDA Version
2.15.1
Kubernetes Version
1.31
Platform
Amazon Web Services
Scaler Details
CpuMemoryScaler & PrometheusScaler
Anything else?
No response
Hello
I see multiple scaler issues in your logs. When there is a scaling error, KEDA needs to register the status in the k8s API, as that is where the state is stored. Since there are Prometheus timeouts, that could be triggering the rate limiter. Could you share one of the ScaledObjects that you use?
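For context, the failed checks in your logs (setting the Ready/Active conditions, patching objects) correspond to KEDA writing back something like the following to the ScaledObject status on every reconcile, and each of those writes goes through the same rate-limited client. The shape below is illustrative only; the reason and message strings are assumptions, not taken from your cluster.

```yaml
# Illustrative shape of the status conditions KEDA patches on each reconcile.
status:
  conditions:
    - type: Ready
      status: "False"
      reason: ScaledObjectCheckFailed   # assumed reason string
      message: scaler cache has been invalidated
    - type: Active
      status: "False"
      reason: ScalerNotActive           # assumed reason string
```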