Reintroduce `gpuRequestAverage` and `gpuUsageAverage` to the Allocation API Schema #1787

thomasvn · 2022-11-04T17:59:31Z

What problem are you trying to solve?

Currently, the Allocation API (example call below) returns only `gpuCount`, `gpuHours`, `gpuCost`, and `gpuCostAdjustment` for GPUs.

To get a better understanding of how the GPUs are actually used, it would be nice to add `gpuRequestAverage` and `gpuUsageAverage` to the Allocation API Schema, similar to how `cpuCoreRequestAverage` and `cpuCoreUsageAverage` already exist in the schema.

`/model/allocation?window=yesterday&accumulate=true&shareIdle=true&reconcile=true&idleByNode=true&aggregate=node`
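For anyone who wants to check which GPU fields their deployment returns, here is a minimal Python sketch that builds the request above and picks the GPU fields out of a single allocation entry. The base URL `http://localhost:9090` and the helper names are hypothetical, and the idea that each allocation entry is a flat JSON object keyed by field name is an assumption; `gpuRequestAverage` and `gpuUsageAverage` are the fields requested in this issue and are expected to be missing today.

```python
from urllib.parse import urlencode

# GPU fields returned today, plus the two fields requested in this issue.
GPU_FIELDS = [
    "gpuCount",
    "gpuHours",
    "gpuCost",
    "gpuCostAdjustment",
    "gpuRequestAverage",  # requested, not currently returned
    "gpuUsageAverage",    # requested, not currently returned
]


def build_allocation_url(base_url: str, **params: str) -> str:
    """Build an Allocation API URL; base_url is a hypothetical endpoint."""
    return f"{base_url.rstrip('/')}/model/allocation?{urlencode(params)}"


def gpu_fields(allocation: dict) -> dict:
    """Return the GPU fields of one allocation entry (None if absent).

    Assumes the entry is a flat dict keyed by field name.
    """
    return {name: allocation.get(name) for name in GPU_FIELDS}


url = build_allocation_url(
    "http://localhost:9090",  # hypothetical, e.g. a local port-forward
    window="yesterday",
    accumulate="true",
    shareIdle="true",
    reconcile="true",
    idleByNode="true",
    aggregate="node",
)
print(url)
```

Passing a decoded allocation entry to `gpu_fields` makes the gap visible: the two requested fields come back as `None` when the API does not return them.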

Background

This feature was previously introduced in PR #944 but had to be rolled back in PR #965.

Given that Kubecost now has a backwards-compatible ETL (PR #624), this new Allocation Schema can be reintroduced.



AjayTripathy · 2022-11-08T23:13:15Z

Just as a note, those aren't standalone; they depend on a metric collector that isn't straightforward to deploy. I'm happy to bring those back, just know that getting the relevant exporters running on clusters can be a challenge. I think we can look to bring this back in an alpha state for v1.99, though, and provide a guide for spinning up the GPU usage exporters. @kaelanspatel thoughts?

kaelanspatel · 2022-11-08T23:26:57Z

Agreed on reintroducing this; it would be nice to support.

Like Ajay says, this was initially designed to use the DCGM_FI_DEV_GPU_UTIL metric from the Nvidia DCGM. Old setup document here. If we find some time, we might want to look at other metrics and/or the current state of the DCGM. The DCGM_FI_DEV_GPU_UTIL metric was moved to off by default in the DCGM, and I wonder whether an alternative for a raw usage number has appeared since.
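To see whether DCGM_FI_DEV_GPU_UTIL is being scraped at all on a cluster, a rough Python sketch like the one below can build an instant-query URL against the Prometheus HTTP API (`/api/v1/query`) using the standard PromQL function `avg_over_time`. The Prometheus address and the helper names are hypothetical, and this is only an illustration of averaging the metric over a window, not necessarily how `gpuUsageAverage` was or will be computed.

```python
from urllib.parse import urlencode


def dcgm_avg_util_query(window: str = "1d") -> str:
    """PromQL for the per-series average of GPU utilization over a window."""
    return f"avg_over_time(DCGM_FI_DEV_GPU_UTIL[{window}])"


def prometheus_query_url(prom_base: str, promql: str) -> str:
    """Build an instant-query URL; prom_base is a hypothetical address."""
    query_string = urlencode({"query": promql})
    return f"{prom_base.rstrip('/')}/api/v1/query?{query_string}"


promql = dcgm_avg_util_query("1d")
query_url = prometheus_query_url("http://localhost:9003", promql)  # hypothetical
print(promql)
print(query_url)
```

An empty result for this query would suggest that the exporter is not running or that the metric is disabled in its configuration.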

AjayTripathy · 2022-11-17T16:41:25Z

FYI, see #1805 for when we reintroduce this. We need to scrape the DCGM exporter by default for ease of setup.

Howlla · 2023-08-30T23:39:56Z

Is DCGM enabled by default now?

thomasvn · 2023-09-01T20:15:08Z

@Howlla The feature to scrape and process DCGM metrics is not currently available.

worr · 2024-03-19T12:22:57Z

Howdy! Are there any plans to reintroduce this feature in the near future? It's something I know we'd find quite useful.

chipzoller · 2024-05-09T15:57:25Z

This is planned in opencost/opencost#2731. Closing this issue as it is not Helm specific.

thomasvn added the enhancement New feature or request label Nov 4, 2022

github-actions bot added the needs-triage label Nov 4, 2022

Adam-Stack-PM added api and removed needs-triage labels Nov 4, 2022

thomasvn mentioned this issue Nov 19, 2022

Adjust bundled Prometheus to scrape for only essential metrics #1805

Merged

AjayTripathy added the v1.100-proposal label Dec 19, 2022

chipzoller removed api labels May 1, 2024

thomasvn mentioned this issue May 3, 2024

Reintroduce GPU Usage & Efficiency opencost/opencost#2731

Open

chipzoller closed this as completed May 9, 2024
