[BUG] Custom metrics values add up if multiple HPAs use the same metric name #24838

MattJeanes · 2024-04-18T22:11:39Z

Agent Environment
Kubernetes v1.29.2
Datadog Helm chart v3.59.0
Datadog cluster agent v7.52.0

Describe what happened:

The value reported by an HPA using a custom metric was unexpectedly twice the expected value.

Describe what you expected:

The HPA value should match the query result and the value stored in the datadog-custom-metrics configmap.

Steps to reproduce the issue:

Deploy the Datadog Helm chart v3.59.0 with custom metrics enabled but without clusterAgent.metricsProvider.useDatadogMetrics set.
Create two HPAs that use an external metric. Any metric name works, as long as it is the same in both HPAs.
Observe that the HPA value is double what it should be. Adding a third HPA triples the original value.
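
For anyone reproducing this, a minimal Python sketch that generates two such HPAs might look like the following. The HPA and deployment names, the replica bounds, the `default` namespace, and the `datacenter` label come from the output later in this report. The use of `autoscaling/v2`, the `AverageValue` target of `1`, and the exact metric name casing are assumptions, and `external_metric_hpa` is a hypothetical helper.

```python
import json


def external_metric_hpa(name: str, deployment: str) -> dict:
    """Build an HPA manifest that scales on one external metric.

    Assumption: autoscaling/v2 with an AverageValue target of 1,
    inferred from the "(avg)" targets shown by `kubectl get hpa`.
    """
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name, "namespace": "default"},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": 1,
            "maxReplicas": 5,
            "metrics": [
                {
                    "type": "External",
                    "external": {
                        # Same metric name and labels in every HPA.
                        "metric": {
                            "name": "Example.Service.Requests",
                            "selector": {"matchLabels": {"datacenter": "mj015"}},
                        },
                        "target": {"type": "AverageValue", "averageValue": "1"},
                    },
                }
            ],
        },
    }


hpas = [
    external_metric_hpa("example-hpa", "example-deployment"),
    external_metric_hpa("example-second-hpa", "example-second-deployment"),
]

# Wrap both manifests in a v1 List so they can be applied in one go.
manifest = {"apiVersion": "v1", "kind": "List", "items": hpas}
print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped into `kubectl apply -f -`, assuming the two target deployments already exist.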

I traced this down in detail within the Datadog codebase and found that everything is working correctly up until the moment that the metric is queried by Kubernetes itself. Here is an example of the response to the custom metric call:

// kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/example.service.requests"
{
    "kind": "ExternalMetricValueList",
    "apiVersion": "external.metrics.k8s.io/v1beta1",
    "metadata": {},
    "items": [
        {
            "metricName": "Example.Service.Requests",
            "metricLabels": {
                "datacenter": "mj015"
            },
            "timestamp": "2024-04-18T21:54:58Z",
            "value": "100000002n"
        },
        {
            "metricName": "Example.Service.Requests",
            "metricLabels": {
                "datacenter": "mj015"
            },
            "timestamp": "2024-04-18T21:54:58Z",
            "value": "100000002n"
        }
    ]
}

Kubernetes appears to sum the duplicate items instead of deduplicating them, resulting in a metric value of 202m instead of the correct 101m:

> kubectl get hpa
NAME                 REFERENCE                              TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
example-hpa          Deployment/example-deployment          202m/1 (avg)   1         5         1          79m
example-second-hpa   Deployment/example-second-deployment   202m/1 (avg)   1         5         1          79m
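
The doubling is consistent with each returned item being rounded up to milli-units and the results then being summed. The Python sketch below reproduces that arithmetic. The helper names are hypothetical, and the round-then-sum order is an assumption inferred from the numbers in this report, not taken from the Kubernetes source.

```python
def nano_to_milli(quantity: str) -> int:
    """Convert a nano-suffixed quantity such as '100000002n' to
    milli-units, rounding up (assumed rounding behaviour)."""
    if not quantity.endswith("n"):
        raise ValueError(f"expected a nano-suffixed quantity, got {quantity!r}")
    nanos = int(quantity[:-1])
    # Ceiling division: 1 milli-unit = 1,000,000 nano-units.
    return -(-nanos // 1_000_000)


def summed_milli_value(items: list[dict]) -> int:
    """Sum the milli-values of every item in an ExternalMetricValueList."""
    return sum(nano_to_milli(item["value"]) for item in items)


one_hpa = [{"value": "100000002n"}]
two_hpas = [{"value": "100000002n"}, {"value": "100000002n"}]

print(summed_milli_value(one_hpa))   # 101
print(summed_milli_value(two_hpas))  # 202
```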

Deleting the second HPA results in correct behaviour:

// kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/example.service.requests"
{
    "kind": "ExternalMetricValueList",
    "apiVersion": "external.metrics.k8s.io/v1beta1",
    "metadata": {},
    "items": [
        {
            "metricName": "Example.Service.Requests",
            "metricLabels": {
                "datacenter": "mj015"
            },
            "timestamp": "2024-04-18T22:08:01Z",
            "value": "100000002n"
        }
    ]
}

> kubectl get hpa
NAME          REFERENCE                       TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
example-hpa   Deployment/example-deployment   101m/1 (avg)   1         5         1          105m

I'm not sure whether this is technically a bug in Kubernetes itself, but it can certainly be worked around in the Datadog custom metrics provider. However, fixing it could have unintended consequences for users who accidentally rely on this behaviour, so I'm not sure what the correct approach is.
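
To illustrate what a provider-side workaround could look like, here is a minimal Python sketch that collapses items sharing the same metric name and labels before the list is returned. `dedupe_items` is a hypothetical helper and not part of the Datadog cluster agent. Keeping the first occurrence of each duplicate is an assumption.

```python
def dedupe_items(items: list[dict]) -> list[dict]:
    """Drop items whose metric name and labels were already seen,
    keeping the first occurrence and preserving order."""
    seen = set()
    unique = []
    for item in items:
        labels = item.get("metricLabels") or {}
        key = (item["metricName"], tuple(sorted(labels.items())))
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


# The duplicated response from this report, reduced to the relevant fields.
response_items = [
    {
        "metricName": "Example.Service.Requests",
        "metricLabels": {"datacenter": "mj015"},
        "value": "100000002n",
    },
    {
        "metricName": "Example.Service.Requests",
        "metricLabels": {"datacenter": "mj015"},
        "value": "100000002n",
    },
]

print(len(dedupe_items(response_items)))  # 1
```

Items that differ in any label are kept as separate entries, so queries that legitimately return several series would still be summed as before.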

I discovered this issue while migrating to the DatadogMetric CRD (aka clusterAgent.metricsProvider.useDatadogMetrics) and was having difficulty determining why I was seeing different results for what should be an identical query to Datadog.

Additional environment details (Operating System, Cloud provider, etc):

Kubernetes: Azure (AKS)
OS: Azure Linux (formerly CBL-Mariner)

MattJeanes added the team/triage label Apr 18, 2024
