[BUG] Custom metrics values add up if multiple HPAs use the same metric name #24838

Open
MattJeanes opened this issue Apr 18, 2024 · 0 comments

Agent Environment
Kubernetes v1.29.2
Datadog Helm chart v3.59.0
Datadog cluster agent v7.52.0

Describe what happened:

The HPA value for a custom metric was unexpectedly double the expected value.

Describe what you expected:

The HPA value matches the query result and the value stored in the datadog-custom-metrics ConfigMap.

Steps to reproduce the issue:

  • Deploy the Datadog Helm chart v3.59.0 with custom metrics enabled and clusterAgent.metricsProvider.useDatadogMetrics not set
  • Create two HPAs that each use an external metric, with the same metric name in both HPAs (any name will do)
  • Observe that the HPA value is double what it should be; adding a third HPA triples the original value

I traced this through the Datadog codebase in detail and found that everything works correctly up until the moment Kubernetes itself queries the metric. Here is an example of the response to the external metrics API call:

// kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/example.service.requests"
{
    "kind": "ExternalMetricValueList",
    "apiVersion": "external.metrics.k8s.io/v1beta1",
    "metadata": {},
    "items": [
        {
            "metricName": "Example.Service.Requests",
            "metricLabels": {
                "datacenter": "mj015"
            },
            "timestamp": "2024-04-18T21:54:58Z",
            "value": "100000002n"
        },
        {
            "metricName": "Example.Service.Requests",
            "metricLabels": {
                "datacenter": "mj015"
            },
            "timestamp": "2024-04-18T21:54:58Z",
            "value": "100000002n"
        }
    ]
}

Kubernetes appears to sum the returned items instead of deduplicating them, resulting in a metric of 202m instead of the correct 101m:

> kubectl get hpa
NAME                 REFERENCE                              TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
example-hpa          Deployment/example-deployment          202m/1 (avg)   1         5         1          79m
example-second-hpa   Deployment/example-second-deployment   202m/1 (avg)   1         5         1          79m
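
A rough sketch of the arithmetic in Python, assuming the HPA controller converts each returned item to milli-units (rounding up) and then sums the items. That assumption is consistent with the numbers above but is not confirmed in this issue; `milli_value` and `hpa_usage` are hypothetical helper names, and the suffix table covers only a small subset of Kubernetes quantity suffixes.

```python
from decimal import Decimal, ROUND_CEILING

# Assumed subset of Kubernetes quantity suffixes, as powers of ten.
_SUFFIX_EXPONENT = {"n": -9, "u": -6, "m": -3, "": 0}


def milli_value(quantity: str) -> int:
    """Convert a quantity string such as '100000002n' to milli-units, rounding up."""
    suffix = quantity[-1] if quantity[-1].isalpha() else ""
    number = quantity[: len(quantity) - len(suffix)]
    value = Decimal(number).scaleb(_SUFFIX_EXPONENT[suffix])
    return int((value * 1000).to_integral_value(rounding=ROUND_CEILING))


def hpa_usage(items: list) -> int:
    """Sum the milli-values of every item in an ExternalMetricValueList (assumed behaviour)."""
    return sum(milli_value(item["value"]) for item in items)


item = {
    "metricName": "Example.Service.Requests",
    "metricLabels": {"datacenter": "mj015"},
    "value": "100000002n",
}

print(hpa_usage([item]))        # one item in the response
print(hpa_usage([item, item]))  # two identical items in the response
```

Under these assumptions a single item comes out as 101 milli-units and two identical items as 202, which matches the TARGETS column with one and two HPAs.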

Deleting the second HPA results in correct behaviour:

// kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/example.service.requests"
{
    "kind": "ExternalMetricValueList",
    "apiVersion": "external.metrics.k8s.io/v1beta1",
    "metadata": {},
    "items": [
        {
            "metricName": "Example.Service.Requests",
            "metricLabels": {
                "datacenter": "mj015"
            },
            "timestamp": "2024-04-18T22:08:01Z",
            "value": "100000002n"
        }
    ]
}

> kubectl get hpa
NAME          REFERENCE                       TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
example-hpa   Deployment/example-deployment   101m/1 (avg)   1         5         1          105m

I'm not sure if this is technically a bug in Kubernetes itself, but it's certainly something that can be worked around in the Datadog custom metric provider. However, fixing this could have unintended consequences for users who accidentally rely on this behaviour, so I'm not sure what the correct approach is.
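
For illustration only, one possible shape of such a workaround is to drop items that share a metric name and label set before the response is returned. The Python sketch below is not Datadog's code: `dedupe_items` is a hypothetical helper, and treating name plus labels as the identity of an item is an assumption.

```python
import json


def dedupe_items(items: list) -> list:
    """Keep the first item for each (metricName, metricLabels) pair, preserving order."""
    seen = set()
    unique = []
    for item in items:
        # Serialise labels with sorted keys so that key order does not affect equality.
        labels = json.dumps(item.get("metricLabels", {}), sort_keys=True)
        key = (item["metricName"], labels)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


duplicated = [
    {"metricName": "Example.Service.Requests",
     "metricLabels": {"datacenter": "mj015"},
     "value": "100000002n"},
    {"metricName": "Example.Service.Requests",
     "metricLabels": {"datacenter": "mj015"},
     "value": "100000002n"},
]

print(len(dedupe_items(duplicated)))
```

Items with different labels are kept as separate entries, so only exact name-and-label duplicates collapse. Anyone who currently relies on the summed value would see their HPA targets change under a fix like this.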

I discovered this issue while migrating to the DatadogMetric CRD (aka clusterAgent.metricsProvider.useDatadogMetrics) and was having difficulty determining why I was seeing different results for what should be an identical query to Datadog.

Additional environment details (Operating System, Cloud provider, etc):

Kubernetes: Azure (AKS)
OS: Azure Linux (formerly CBL-Mariner)
