KEDA : Client rate limit issues #6359

Open
Sathyam-Hotstar opened this issue Nov 25, 2024 · 1 comment
Labels
bug Something isn't working

Comments


Sathyam-Hotstar commented Nov 25, 2024

Report

We are running into many "client rate limiter Wait returned an error" errors in keda-operator. We have around 50-100 ScaledObjects across various EKS clusters, and the problem started after we upgraded KEDA from 2.13.0 to 2.15.1.

The number of ScaledObjects varies across our EKS clusters. Following earlier threads, we had already increased the QPS values, but since the upgrade we are getting frequent errors.

Our K8s client config is as follows:

  • kube-api-qps: 35
  • kube-api-burst: 70
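
For context on what these two settings bound: client-go throttles outgoing requests with a token bucket, where the burst value is the bucket size and the QPS value is the refill rate. The sketch below is a simplified model of that idea, not KEDA's or client-go's code. The function name `queue_delay` and the batch sizes are hypothetical, chosen only to show the arithmetic.

```python
def queue_delay(position: int, qps: float, burst: int) -> float:
    """Seconds the request at 1-based `position` waits when a batch is
    submitted all at once to a full token bucket.

    The first `burst` requests use stored tokens and go out immediately.
    Each later request needs one freshly refilled token, and tokens are
    refilled at `qps` per second.
    """
    if position <= burst:
        return 0.0
    return (position - burst) / qps


# Values from the report: kube-api-qps=35, kube-api-burst=70.
QPS, BURST = 35, 70

# Hypothetical batch sizes, for illustration only.
for batch in (70, 105, 400):
    delay = queue_delay(batch, QPS, BURST)
    print(f"{batch} simultaneous requests: last one waits {delay:.2f} s")
```

In this model, anything beyond 70 simultaneous requests queues on the client side, and the queueing time grows linearly with the size of the batch.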

I want to understand why KEDA sends so many concurrent requests to the Kubernetes API server. We do not have a very large number of ScaledObjects and we use the default polling interval of 30 seconds, so why is the KEDA client rate-limiting its requests? Are KEDA retries causing this, or is it something else?
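
One way to frame this question is a back-of-the-envelope estimate of the average request rate. The sketch below assumes, purely for illustration, that each ScaledObject issues a fixed number of Kubernetes API calls once per polling interval. The parameter `calls_per_poll` is a hypothetical figure, not a measured KEDA value.

```python
def steady_rate(scaled_objects: int, calls_per_poll: int, polling_interval_s: float) -> float:
    """Average Kubernetes API requests per second if every ScaledObject
    issues `calls_per_poll` requests once per polling interval."""
    return scaled_objects * calls_per_poll / polling_interval_s


# 100 ScaledObjects and the 30 s polling interval come from the report.
# The per-poll call counts are hypothetical.
for calls in (1, 4, 10):
    rate = steady_rate(100, calls, 30)
    print(f"{calls} calls per poll: {rate:.2f} req/s (configured limit: 35 qps)")
```

Under these assumptions the average stays below 35 requests per second, which would point toward bursts (many objects polled or failing at the same moment) rather than the average rate.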

Expected Behavior

We should not run into client rate limit issues.

Actual Behavior

We are getting error messages in keda-operator logs for client rate limit issues.

Steps to Reproduce the Problem

Observed in a cluster with around 50-100 ScaledObjects, running KEDA 2.15.1 on EKS 1.31.

Logs from KEDA operator

2024-11-25T11:24:51Z	ERROR	prometheus_scaler	error executing prometheus query	{"type": "ScaledObject", "namespace": "internal-namespace", "name": "internal-app-1", "error": "Get \"http://vmselect.internal-endpoint.com:8481/select/0/prometheus/api/v1/query?query=sum%28sum%28rate%28envoy_http_downstream_rq_total%7Benvoy_http_conn_manager_prefix%3D~%22ingress_http%7Cingress_https%22%2Csource_cluster%3D%22eks-cluster-1%22%2C+namespace%3D%22internal-namespace%22%2C+service%3D%22internal-app-1%22%2C+pod%3D~%27internal-app-1.%2A%27%7D%5B1m%5D%29%2A60%29+by+%28pod%2C+namespace%29+%2A+ignoring%28pod%2C+namespace%29+group_left%28%29+max%281+%2B+max%28request_buffer_per_datacenter%7Bdatacenter%3D%22sgp%22%7D+or+request_buffer_per_service%7Bdatacenter%3D%22sgp%22%2C+exported_namespace%3D%22internal-namespace%22%2C+exported_service%3D%22internal-app-1%22%7D%29%29+or+sum%28rate%28envoy_http_downstream_rq_total%7Benvoy_http_conn_manager_prefix%3D~%22ingress_http%7Cingress_https%22%2Csource_cluster%3D%22eks-cluster-1%22%2C+namespace%3D%22internal-namespace%22%2C+service%3D%22internal-app-1%22%2C+pod%3D~%27internal-app-1.%2A%27%7D%5B1m%5D%29%2A60%29+by+%28pod%2C+namespace%29%29&time=2024-11-25T11:24:51Z\": context canceled"}
2024-11-25T11:24:51Z	ERROR	scale_handler	error getting scale decision	{"scaledObject.Namespace": "internal-namespace", "scaledObject.Name": "internal-app-1", "scaler": "prometheusScaler", "error": "scaler with id 0 not found, len = 0, cache has been probably already invalidated"}
2024-11-25T11:24:51Z	ERROR	scaleexecutor	failed to patch Objects	{"scaledobject.Name": "internal-app-1", "scaledObject.Namespace": "internal-namespace", "scaleTarget.Name": "internal-app-1", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:51Z	ERROR	scaleexecutor	error setting ready condition	{"scaledobject.Name": "internal-app-1", "scaledObject.Namespace": "internal-namespace", "scaleTarget.Name": "internal-app-1", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:51Z	ERROR	scaleexecutor	failed to patch Objects	{"scaledobject.Name": "internal-app-1", "scaledObject.Namespace": "internal-namespace", "scaleTarget.Name": "internal-app-1", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:51Z	ERROR	scaleexecutor	Error setting active condition when triggers are not active	{"scaledobject.Name": "internal-app-1", "scaledObject.Namespace": "internal-namespace", "scaleTarget.Name": "internal-app-1", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:51Z	ERROR	scale_handler	error getting metric spec for the scaler	{"scaledObject.Namespace": "internal-namespace", "scaledObject.Name": "internal-app-1", "scaler": "prometheusScaler", "error": "scaler with id 0 not found. Len = 0"}
2024-11-25T11:24:51Z	ERROR	scale_handler	error getting metric spec for the scaler	{"scaledObject.Namespace": "internal-namespace", "scaledObject.Name": "internal-app-1", "scaler": "cpuMemoryScaler", "error": "scaler with id 1 not found. Len = 0"}
2024-11-25T11:24:52Z	ERROR	prometheus_scaler	error executing prometheus query	{"type": "ScaledObject", "namespace": "internal-namespace-2", "name": "internal-app-2", "error": "Get \"http://vmselect.internal-endpoint.com:8481/select/0/prometheus/api/v1/query?query=%28sum%28sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_2_%28%28default%29%7C%28headless%29%29_internal-namespace-2.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+%2B+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_2_%28%28default%29%7C%28headless%29%29_internal-namespace-2.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_2_%28%28default%29%7C%28headless%29%29_internal-namespace-2.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_2_%28%28default%29%7C%28headless%29%29_internal-namespace-2.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29%29%29+%2A+max%281%2B+max%28request_buffer_per_datacenter%7Bdatacenter%3D%22sgp%22%7D+or+request_buffer_per_service%7Bdatacenter%3D%22sgp%22%2C+exported_namespace%3D%22internal-namespace-2%22%2C+exported_service%3D%22internal-app-2%22%7D%29%29+by+%28namespace%2C+service%29&time=2024-11-25T11:24:51Z\": context canceled"}
2024-11-25T11:24:52Z	ERROR	scale_handler	error getting scale decision	{"scaledObject.Namespace": "internal-namespace-2", "scaledObject.Name": "internal-app-2", "scaler": "prometheusScaler", "error": "scaler with id 2 not found, len = 0, cache has been probably already invalidated"}
2024-11-25T11:24:52Z	ERROR	scaleexecutor	failed to patch Objects	{"scaledobject.Name": "internal-app-2", "scaledObject.Namespace": "internal-namespace-2", "scaleTarget.Name": "internal-app-2", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:52Z	ERROR	scaleexecutor	error setting ready condition	{"scaledobject.Name": "internal-app-2", "scaledObject.Namespace": "internal-namespace-2", "scaleTarget.Name": "internal-app-2", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:52Z	ERROR	scaleexecutor	failed to patch Objects	{"scaledobject.Name": "internal-app-2", "scaledObject.Namespace": "internal-namespace-2", "scaleTarget.Name": "internal-app-2", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:52Z	ERROR	scaleexecutor	Error setting active condition when triggers are not active	{"scaledobject.Name": "internal-app-2", "scaledObject.Namespace": "internal-namespace-2", "scaleTarget.Name": "internal-app-2", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:52Z	ERROR	scale_handler	error getting metric spec for the scaler	{"scaledObject.Namespace": "internal-namespace-2", "scaledObject.Name": "internal-app-2", "scaler": "cpuMemoryScaler", "error": "scaler with id 0 not found. Len = 0"}
2024-11-25T11:24:52Z	ERROR	scale_handler	error getting metric spec for the scaler	{"scaledObject.Namespace": "internal-namespace-2", "scaledObject.Name": "internal-app-2", "scaler": "cpuMemoryScaler", "error": "scaler with id 1 not found. Len = 0"}
2024-11-25T11:24:53Z	ERROR	prometheus_scaler	error executing prometheus query	{"type": "ScaledObject", "namespace": "internal-namespace-3", "name": "internal-app-3", "error": "Get \"http://vmselect.internal-endpoint.com:8481/select/0/prometheus/api/v1/query?query=%28sum%28sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_3_%28%28default%29%7C%28headless%29%29_internal-namespace-3.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+%2B+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_3_%28%28default%29%7C%28headless%29%29_internal-namespace-3.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_3_%28%28default%29%7C%28headless%29%29_internal-namespace-3.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_3_%28%28default%29%7C%28headless%29%29_internal-namespace-3.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29%29%29+%2A+max%281%2B+max%28request_buffer_per_datacenter%7Bdatacenter%3D%22sgp%22%7D+or+request_buffer_per_service%7Bdatacenter%3D%22sgp%22%2C+exported_namespace%3D%22internal-namespace-3%22%2C+exported_service%3D%22internal-app-3%22%7D%29%29+by+%28namespace%2C+service%29&time=2024-11-25T11:24:53Z\": context canceled"}
2024-11-25T11:24:53Z	ERROR	scale_handler	error getting scale decision	{"scaledObject.Namespace": "internal-namespace-3", "scaledObject.Name": "internal-app-3", "scaler": "prometheusScaler", "error": "scaler with id 2 not found, len = 0, cache has been probably already invalidated"}
2024-11-25T11:24:53Z	ERROR	scaleexecutor	failed to patch Objects	{"scaledobject.Name": "internal-app-3", "scaledObject.Namespace": "internal-namespace-3", "scaleTarget.Name": "internal-app-3", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:53Z	ERROR	scaleexecutor	error setting ready condition	{"scaledobject.Name": "internal-app-3", "scaledObject.Namespace": "internal-namespace-3", "scaleTarget.Name": "internal-app-3", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:53Z	ERROR	scaleexecutor	failed to patch Objects	{"scaledobject.Name": "internal-app-3", "scaledObject.Namespace": "internal-namespace-3", "scaleTarget.Name": "internal-app-3", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:53Z	ERROR	scaleexecutor	Error setting active condition when triggers are not active	{"scaledobject.Name": "internal-app-3", "scaledObject.Namespace": "internal-namespace-3", "scaleTarget.Name": "internal-app-3", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:53Z	ERROR	scale_handler	error getting metric spec for the scaler	{"scaledObject.Namespace": "internal-namespace-3", "scaledObject.Name": "internal-app-3", "scaler": "cpuMemoryScaler", "error": "scaler with id 0 not found. Len = 0"}
2024-11-25T11:24:53Z	ERROR	scale_handler	error getting metric spec for the scaler	{"scaledObject.Namespace": "internal-namespace-3", "scaledObject.Name": "internal-app-3", "scaler": "cpuMemoryScaler", "error": "scaler with id 1 not found. Len = 0"}
2024-11-25T11:24:54Z	ERROR	prometheus_scaler	error executing prometheus query	{"type": "ScaledObject", "namespace": "internal-namespace-4", "name": "internal-app-4", "error": "Get \"http://vmselect.internal-endpoint.com:8481/select/0/prometheus/api/v1/query?query=%28sum%28sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_4_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+%2B+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_4_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_4_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_4_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29%29%29+%2A+max%281%2B+max%28request_buffer_per_datacenter%7Bdatacenter%3D%22sgp%22%7D+or+request_buffer_per_service%7Bdatacenter%3D%22sgp%22%2C+exported_namespace%3D%22internal-namespace-4%22%2C+exported_service%3D%22internal-app-4%22%7D%29%29+by+%28namespace%2C+service%29&time=2024-11-25T11:24:53Z\": context canceled"}
2024-11-25T11:24:54Z	ERROR	scale_handler	error getting scale decision	{"scaledObject.Namespace": "internal-namespace-4", "scaledObject.Name": "internal-app-4", "scaler": "prometheusScaler", "error": "scaler with id 2 not found, len = 0, cache has been probably already invalidated"}
2024-11-25T11:24:54Z	ERROR	scaleexecutor	failed to patch Objects	{"scaledobject.Name": "internal-app-4", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z	ERROR	scaleexecutor	error setting ready condition	{"scaledobject.Name": "internal-app-4", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z	ERROR	scaleexecutor	failed to patch Objects	{"scaledobject.Name": "internal-app-4", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z	ERROR	scaleexecutor	Error setting active condition when triggers are not active	{"scaledobject.Name": "internal-app-4", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z	ERROR	scale_handler	error getting metric spec for the scaler	{"scaledObject.Namespace": "internal-namespace-4", "scaledObject.Name": "internal-app-4", "scaler": "cpuMemoryScaler", "error": "scaler with id 0 not found. Len = 0"}
2024-11-25T11:24:54Z	ERROR	scale_handler	error getting metric spec for the scaler	{"scaledObject.Namespace": "internal-namespace-4", "scaledObject.Name": "internal-app-4", "scaler": "cpuMemoryScaler", "error": "scaler with id 1 not found. Len = 0"}
2024-11-25T11:24:54Z	ERROR	prometheus_scaler	error executing prometheus query	{"type": "ScaledObject", "namespace": "internal-namespace-4", "name": "internal-app-4-p0", "error": "Get \"http://vmselect.internal-endpoint.com:8481/select/0/prometheus/api/v1/query?query=%28sum%28sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_4_p0_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+%2B+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_4_p0_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_4_p0_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_4_p0_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29%29%29+%2A+max%281%2B+max%28request_buffer_per_datacenter%7Bdatacenter%3D%22sgp%22%7D+or+request_buffer_per_service%7Bdatacenter%3D%22sgp%22%2C+exported_namespace%3D%22internal-namespace-4%22%2C+exported_service%3D%22internal-app-4-p0%22%7D%29%29+by+%28namespace%2C+service%29&time=2024-11-25T11:24:54Z\": context canceled"}
2024-11-25T11:24:54Z	ERROR	scale_handler	error getting scale decision	{"scaledObject.Namespace": "internal-namespace-4", "scaledObject.Name": "internal-app-4-p0", "scaler": "prometheusScaler", "error": "scaler with id 2 not found, len = 0, cache has been probably already invalidated"}
2024-11-25T11:24:54Z	ERROR	scaleexecutor	failed to patch Objects	{"scaledobject.Name": "internal-app-4-p0", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4-p0", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z	ERROR	scaleexecutor	error setting ready condition	{"scaledobject.Name": "internal-app-4-p0", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4-p0", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z	ERROR	scaleexecutor	failed to patch Objects	{"scaledobject.Name": "internal-app-4-p0", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4-p0", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z	ERROR	scaleexecutor	Error setting active condition when triggers are not active	{"scaledobject.Name": "internal-app-4-p0", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4-p0", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z	ERROR	scale_handler	error getting metric spec for the scaler	{"scaledObject.Namespace": "internal-namespace-4", "scaledObject.Name": "internal-app-4-p0", "scaler": "cpuMemoryScaler", "error": "scaler with id 0 not found. Len = 0"}
2024-11-25T11:24:54Z	ERROR	scale_handler	error getting metric spec for the scaler	{"scaledObject.Namespace": "internal-namespace-4", "scaledObject.Name": "internal-app-4-p0", "scaler": "cpuMemoryScaler", "error": "scaler with id 1 not found. Len = 0"}
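
To see at a glance which components fail and how, the excerpt above can be tallied by logger and error kind. This is a minimal sketch that assumes each line has five tab-separated fields (timestamp, level, logger, message, JSON payload), as the excerpt appears to. The helper names `tally` and `make_line` are hypothetical, and the sample lines are abridged copies of lines from the log above.

```python
import json
from collections import Counter


def tally(lines):
    """Count log lines by (logger, error kind).

    The error kind is "context canceled" when the payload's "error" field
    ends with that text, and "other" otherwise.
    """
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t", 4)
        if len(parts) != 5:
            continue  # not in the assumed five-field layout
        _timestamp, _level, logger, _message, payload = parts
        try:
            fields = json.loads(payload)
        except json.JSONDecodeError:
            continue  # payload is not valid JSON
        error = fields.get("error", "") if isinstance(fields, dict) else ""
        kind = "context canceled" if error.endswith("context canceled") else "other"
        counts[(logger, kind)] += 1
    return counts


def make_line(timestamp, logger, message, payload):
    """Join fields with tabs, mimicking the layout of the excerpt."""
    return "\t".join([timestamp, "ERROR", logger, message, payload])


# Abridged sample lines (the Prometheus query string is omitted).
sample = [
    make_line("2024-11-25T11:24:51Z", "prometheus_scaler", "error executing prometheus query",
              r'{"name": "internal-app-1", "error": "Get \"http://vmselect.internal-endpoint.com:8481/select/0/prometheus/api/v1/query\": context canceled"}'),
    make_line("2024-11-25T11:24:51Z", "scale_handler", "error getting scale decision",
              r'{"scaler": "prometheusScaler", "error": "scaler with id 0 not found, len = 0, cache has been probably already invalidated"}'),
    make_line("2024-11-25T11:24:51Z", "scaleexecutor", "failed to patch Objects",
              r'{"scaleTarget.Name": "internal-app-1", "error": "client rate limiter Wait returned an error: context canceled"}'),
    make_line("2024-11-25T11:24:51Z", "scaleexecutor", "error setting ready condition",
              r'{"scaleTarget.Name": "internal-app-1", "error": "client rate limiter Wait returned an error: context canceled"}'),
]

counts = tally(sample)
for (logger, kind), n in sorted(counts.items()):
    print(f"{logger:20s} {kind:18s} {n}")
```

On this sample, both the Prometheus query error and the rate limiter errors end in "context canceled", while the scale_handler error is of a different kind.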

KEDA Version

2.15.1

Kubernetes Version

1.31

Platform

Amazon Web Services

Scaler Details

CpuMemoryScaler & PrometheusScaler

Anything else?

No response

Sathyam-Hotstar added the bug label Nov 25, 2024
@JorTurFer
Member

Hello,
I see multiple scaler errors in your logs. When a scaling error occurs, KEDA has to record the status in the Kubernetes API, because that is where it stores its state. Since there are Prometheus timeouts, those extra status writes could be what triggers the rate limiter. Could you share one of the ScaledObjects that you use?
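
As background on the message itself: in client-go, "client rate limiter Wait returned an error" wraps whatever error the limiter's Wait call returns, and a Wait whose context is canceled returns "context canceled". The sketch below is a simplified, deterministic model of that behavior using a simulated clock. It is not KEDA's or client-go's implementation, and `TokenBucket`, `wait`, `arrival`, and `cancel_at` are hypothetical names.

```python
class TokenBucket:
    """Token bucket driven by a simulated clock (seconds as floats)."""

    def __init__(self, qps: float, burst: int):
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # bucket starts full
        self.now = 0.0

    def _advance(self, t: float) -> None:
        # Refill for the elapsed time, capped at the bucket size.
        self.tokens = min(self.burst, self.tokens + (t - self.now) * self.qps)
        self.now = t

    def wait(self, arrival: float, cancel_at: float) -> str:
        """Return "ok" if a token is obtained before `cancel_at`,
        otherwise "context canceled"."""
        self._advance(max(arrival, self.now))
        if cancel_at <= self.now:
            return "context canceled"  # context was already canceled
        if self.tokens >= 1:
            self.tokens -= 1
            return "ok"
        ready = self.now + (1 - self.tokens) / self.qps
        if ready > cancel_at:
            return "context canceled"  # canceled while queued, no token used
        # Wait until the next token is available, then spend it.
        self.now = ready
        self.tokens = 0.0
        return "ok"


# Case 1: 200 requests at t=0 whose context is canceled at t=1.5 s,
# with the limits from the report (qps=35, burst=70).
limiter = TokenBucket(qps=35, burst=70)
results = [limiter.wait(arrival=0.0, cancel_at=1.5) for _ in range(200)]
print("ok:", results.count("ok"), "canceled:", results.count("context canceled"))

# Case 2: a single request on a full bucket whose context was canceled earlier.
idle = TokenBucket(qps=35, burst=70)
late = idle.wait(arrival=2.0, cancel_at=1.0)
print("already-canceled context:", late)
```

In this model a request fails in one of two ways: the bucket is drained and cancellation arrives before a token does, or the context was already canceled before Wait was called. The second case does not require the limiter to be saturated at all.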

Projects
Status: To Triage
Development

No branches or pull requests

2 participants