Describe what happened:
The HPA value using a custom metric was unexpectedly twice the expected value
Describe what you expected:
The HPA value correctly matches the query and the value stored in the datadog-custom-metrics configmap
Steps to reproduce the issue:
Deploy the Datadog Helm chart v3.59.0 with custom metrics enabled but without clusterAgent.metricsProvider.useDatadogMetrics set
Create two HPAs that use an external metric, with any metric name, as long as the name is the same in both HPAs
Observe that the HPA value is double what it should be; adding a third HPA triples the original value
I traced this down in detail within the Datadog codebase and found that everything is working correctly up until the moment that the metric is queried by Kubernetes itself. Here is an example of the response to the custom metric call:
Kubernetes appears to interpret this as an addition and adds them up instead of deduping the results here, resulting in a metric of 202m instead of the correct 101m:
> kubectl get hpa
NAME                 REFERENCE                              TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
example-hpa          Deployment/example-deployment          202m/1 (avg)   1         5         1          79m
example-second-hpa   Deployment/example-second-deployment   202m/1 (avg)   1         5         1          79m
Deleting the second HPA results in correct behaviour:
> kubectl get hpa
NAME          REFERENCE                       TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
example-hpa   Deployment/example-deployment   101m/1 (avg)   1         5         1          105m
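To illustrate the arithmetic, here is a minimal Python sketch of how summing one returned item per HPA would turn 101m into 202m (or 303m with a third HPA). The response shape, the field names `metricName` and `value`, and the metric name `example.metric` are assumptions made for illustration; they are not copied from the actual provider response.

```python
from fractions import Fraction


def parse_quantity(quantity: str) -> Fraction:
    """Parse a Kubernetes-style quantity such as '101m' or '2' into a Fraction."""
    if quantity.endswith("m"):
        return Fraction(int(quantity[:-1]), 1000)
    return Fraction(quantity)


def format_milli(value: Fraction) -> str:
    """Format a value in milli-units, the way `kubectl get hpa` shows 101m."""
    return f"{int(value * 1000)}m"


def hypothetical_response(hpa_count: int, value: str = "101m"):
    """Assumed response: one identical item per HPA that references the metric."""
    return [{"metricName": "example.metric", "value": value} for _ in range(hpa_count)]


def summed_value(items) -> Fraction:
    """Add up every returned item, which matches the doubling observed above."""
    return sum((parse_quantity(item["value"]) for item in items), Fraction(0))


if __name__ == "__main__":
    for count in (1, 2, 3):
        print(count, format_milli(summed_value(hypothetical_response(count))))
```

With one HPA the sum equals the stored value; each additional HPA that references the same metric name adds another copy of it.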
I'm not sure whether this is technically a bug in Kubernetes itself, but it's certainly something that can be worked around in the Datadog custom metric provider. However, fixing it could have unintended consequences for users who accidentally rely on this behaviour, so I'm not sure what the correct approach is.
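As a rough idea of what such a workaround might look like, the sketch below drops repeated entries before they are returned. It is written in Python purely for illustration and is not the Datadog provider's code; the item shape and the field names `metricName` and `metricLabels` are assumptions.

```python
def dedupe_items(items):
    """Keep the first item for each (metric name, labels) pair, in original order."""
    seen = set()
    unique = []
    for item in items:
        # Labels are sorted so that the key does not depend on dict ordering.
        labels = tuple(sorted(item.get("metricLabels", {}).items()))
        key = (item["metricName"], labels)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


if __name__ == "__main__":
    # Hypothetical response with one identical item per HPA.
    response = [
        {"metricName": "example.metric", "metricLabels": {}, "value": "101m"},
        {"metricName": "example.metric", "metricLabels": {}, "value": "101m"},
    ]
    print(dedupe_items(response))
```

Keying on the metric name and labels means two HPAs that query the same metric with different label selectors would still get separate entries. Whether that is the right key for the real provider is an open question.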
I discovered this issue while migrating to the DatadogMetric CRD (aka clusterAgent.metricsProvider.useDatadogMetrics) and was having difficulty determining why I was seeing different results for what should be an identical query to Datadog.
Agent Environment
Kubernetes v1.29.2
Datadog Helm chart v3.59.0
Datadog cluster agent v7.52.0
Additional environment details (Operating System, Cloud provider, etc):
Kubernetes: Azure (AKS)
OS: Azure Linux (formerly CBL-Mariner)