Reintroduce gpuRequestAverage and gpuUsageAverage to the Allocation API Schema #1787

Closed
thomasvn opened this issue Nov 4, 2022 · 7 comments · May be fixed by opencost/opencost#2731
Labels
enhancement New feature or request

Comments

@thomasvn
Member

thomasvn commented Nov 4, 2022

What problem are you trying to solve?

Currently, calling the Allocation API (example below) returns only these GPU fields: gpuCount, gpuHours, gpuCost, and gpuCostAdjustment.

To get a better understanding of how the GPUs are actually used, it would be nice to add gpuRequestAverage and gpuUsageAverage to the Allocation API Schema, similar to how cpuCoreRequestAverage and cpuCoreUsageAverage already exist in the schema.

/model/allocation?window=yesterday&accumulate=true&shareIdle=true&reconcile=true&idleByNode=true&aggregate=node
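
As a rough illustration of the request above, the following Python sketch builds that query and pulls the GPU fields out of a response. The base URL `http://localhost:9090`, the helper names, and the sample payload (including its `code`/`data` envelope) are assumptions made for this example and are not taken from the issue. The two proposed fields are read with a fallback of `None`, since the API does not currently return them.

```python
from urllib.parse import urlencode

# Fields the issue says are returned today.
CURRENT_GPU_FIELDS = ("gpuCount", "gpuHours", "gpuCost", "gpuCostAdjustment")
# Fields this issue asks to reintroduce; absent from current responses.
PROPOSED_GPU_FIELDS = ("gpuRequestAverage", "gpuUsageAverage")


def allocation_url(base_url, **params):
    """Build an Allocation API URL; base_url is a hypothetical endpoint."""
    return f"{base_url.rstrip('/')}/model/allocation?{urlencode(params)}"


def gpu_summary(payload):
    """Map each allocation name to its GPU fields (None when a field is missing).

    Assumes a payload shaped like {"data": [{name: {field: value}}]}.
    If a name appears in several sets, the last set wins.
    """
    rows = {}
    for allocation_set in payload.get("data", []):
        for name, alloc in (allocation_set or {}).items():
            rows[name] = {
                field: alloc.get(field)
                for field in CURRENT_GPU_FIELDS + PROPOSED_GPU_FIELDS
            }
    return rows


url = allocation_url(
    "http://localhost:9090",  # assumption: a locally forwarded endpoint
    window="yesterday",
    accumulate="true",
    shareIdle="true",
    reconcile="true",
    idleByNode="true",
    aggregate="node",
)

# Made-up sample response, for illustration only.
sample_payload = {
    "code": 200,
    "data": [
        {
            "node-a": {
                "gpuCount": 2,
                "gpuHours": 48.0,
                "gpuCost": 45.6,
                "gpuCostAdjustment": 0.0,
            }
        }
    ],
}

summary = gpu_summary(sample_payload)
```

With the sample payload, `summary["node-a"]` carries the four existing GPU fields, while `gpuRequestAverage` and `gpuUsageAverage` come back as `None`, which is the gap this issue describes.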

Background

This feature was previously introduced in PR #944, but it had to be rolled back in PR #965.

Now that Kubecost has a backwards-compatible ETL (PR #624), the new Allocation schema can be reintroduced.


@thomasvn thomasvn added the enhancement New feature or request label Nov 4, 2022
@AjayTripathy
Contributor

Just as a note, those metrics aren't standalone: they require deploying a metric collector, which isn't straightforward to run. I'm happy to bring them back, but getting the relevant exporters running on clusters can be a challenge. I think we can look to bring this back in an alpha state for v1.99 and provide a guide for spinning up the GPU usage exporters. @kaelanspatel thoughts?

@kaelanspatel
Contributor

Agreed on reintroducing this; it would be nice to support.

Like Ajay says, this was initially designed to use the DCGM_FI_DEV_GPU_UTIL metric from the NVIDIA DCGM. Old setup document here. If we find some time, we might want to look at other metrics and/or the current state of the DCGM: the DCGM_FI_DEV_GPU_UTIL metric was moved to off by default in the DCGM, and I wonder whether an alternative for a raw usage number has appeared since.
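
To make the idea of a usage average concrete, here is a minimal Python sketch that averages DCGM_FI_DEV_GPU_UTIL samples per GPU across several scrapes of Prometheus text-format output. It is not the implementation from PR #944. The function name, the sample scrape text, and the choice to report a 0 to 1 fraction are assumptions for illustration, and the parser is deliberately simplistic (it would not handle label values containing `}`).

```python
import re
from collections import defaultdict

# Matches lines such as: DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-x"} 40
SAMPLE_LINE = re.compile(
    r'^DCGM_FI_DEV_GPU_UTIL\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
# Picks out the gpu="..." label without matching longer label names.
GPU_LABEL = re.compile(r'(?:^|,)\s*gpu="([^"]*)"')


def average_gpu_utilization(scrapes):
    """Return {gpu index: mean utilization as a 0-1 fraction}.

    Each item in `scrapes` is the text of one exporter scrape.
    The metric is reported in percent, so values are divided by 100.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text in scrapes:
        for line in text.splitlines():
            match = SAMPLE_LINE.match(line.strip())
            if match is None:
                continue  # comment lines and other metrics are skipped
            gpu = GPU_LABEL.search(match.group("labels"))
            if gpu is None:
                continue  # no gpu label, nothing to key on
            totals[gpu.group(1)] += float(match.group("value")) / 100.0
            counts[gpu.group(1)] += 1
    return {gpu: totals[gpu] / counts[gpu] for gpu in totals}


# Made-up scrape text, for illustration only.
example_scrapes = [
    '# TYPE DCGM_FI_DEV_GPU_UTIL gauge\n'
    'DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-x"} 40\n'
    'DCGM_FI_DEV_GPU_UTIL{gpu="1",UUID="GPU-y"} 0\n',
    'DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-x"} 60\n',
]

averages = average_gpu_utilization(example_scrapes)
```

In practice an average like this would more likely be computed with a Prometheus query over the allocation window than by parsing scrape text by hand; the sketch only shows the arithmetic involved.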

@AjayTripathy
Contributor

FYI, see #1805 for when we reintroduce this. We need to scrape the DCGM exporter by default for ease of setup.

@Howlla

Howlla commented Aug 30, 2023

Is DCGM enabled by default now?

@thomasvn
Member Author

thomasvn commented Sep 1, 2023

@Howlla The feature to scrape and process DCGM metrics is not currently available.

@worr

worr commented Mar 19, 2024

Howdy! Are there any plans to reintroduce this feature in the near future? It's something I know we'd find quite useful.

@chipzoller
Collaborator

This is planned in opencost/opencost#2731. Closing this issue as it is not Helm specific.
