Use libdrm_amdgpu as an alternative GPU load information source #925

Open
wants to merge 31 commits into
base: master

Conversation

bsolos

@bsolos bsolos commented Feb 15, 2023

Use libdrm_amdgpu to calculate the GPU load. This should resolve #923.

The GPU load calculation method was inspired by radeontop, but no code in this PR was copied from there.

Contributor

@RPINerd RPINerd left a comment

I don't have a vega card to physically test this fork, but the general structure looks good and I didn't find anything basic like spelling errors 😃

@bsolos
Author

bsolos commented Feb 18, 2023

It works properly on my machine, and I don't have any other Vega cards to test it on. Actually, it should also work on most non-Vega AMD GPUs.

@jackun
Collaborator

jackun commented Mar 4, 2023

#703

Also you'd need drm master authorization, with secondary GPUs at least.

@evelikov
Contributor

evelikov commented Mar 4, 2023

> Also you'd need drm master authorization, with secondary GPUs at least.

This should not be needed if one uses the renderD node. It seems the card node is currently being used.

If the card node is already opened, you can pass the fd to drmGetRenderDeviceNameFromFd(). Alternatively, drmGetDevices2() gives you all devices; find the needed one by matching the card node and use its render node. This MR does something different but should give you a good starting point.
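
The card-to-render-node mapping described above can be sketched without libdrm by reading sysfs instead. This is a swapped-in technique for illustration, not what the PR or the libdrm functions named above do: it assumes the usual Linux layout in which `/sys/class/drm/<card>/device/drm/` lists the sibling `cardN` and `renderDN` entries of the same device. The function name `render_node_for_card` and the `sysfs_drm` parameter are hypothetical.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: given a primary node name such as "card0", return the
// path of the render node that belongs to the same device, if one exists.
// Assumption: <sysfs_drm>/<card>/device/drm/ contains the device's sibling
// nodes (e.g. "card0" and "renderD128"). The root is a parameter so the
// lookup can be exercised against a fake directory tree.
inline std::optional<std::string> render_node_for_card(
    const std::string& card, const fs::path& sysfs_drm = "/sys/class/drm") {
    const fs::path siblings = sysfs_drm / card / "device" / "drm";
    std::error_code ec;  // a missing directory yields an empty iteration
    for (const auto& entry : fs::directory_iterator(siblings, ec)) {
        const std::string name = entry.path().filename().string();
        if (name.rfind("renderD", 0) == 0) {  // name starts with "renderD"
            return "/dev/dri/" + name;
        }
    }
    return std::nullopt;
}
```

When an fd on the card node is already open, drmGetRenderDeviceNameFromFd() is probably the more direct route; the sysfs walk is only a dependency-free way to show the same relationship.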

@evelikov
Contributor

evelikov commented Mar 4, 2023

@bsolos overall I would encourage you to open a bug at the AMDGPU gitlab, clearly describe the issue (exact kernel, version, distro, use-case, etc) + CC Alex Deucher aka @agd5f

This is a great workaround, but the upstream gpu metrics should really be fixed.

@bsolos
Author

bsolos commented Mar 6, 2023

I didn't open an issue there because https://gitlab.freedesktop.org/drm/amd/-/issues/1932 is already open. It seemed like there was no progress in the last 10 months, so I thought that this workaround might be beneficial to MangoHud. Should I still open a new issue?

@evelikov
Contributor

evelikov commented Mar 6, 2023

One should not assume that devs don't care about issues just because there's no update. Sometimes they have higher/other priorities, sometimes issues fall through the cracks.

By opening/prodding you'll increase visibility and raise severity. If you can test kernel patches, it's more likely that devs will try to get it fixed faster. Sitting quietly does not help, I'm afraid.

@bsolos bsolos marked this pull request as draft March 6, 2023 15:02
@bsolos bsolos marked this pull request as ready for review March 6, 2023 16:00
@agd5f

agd5f commented Mar 6, 2023 via email

@bsolos
Author

bsolos commented Mar 6, 2023

> @bsolos overall I would encourage you to open a bug at the AMDGPU gitlab, clearly describe the issue (exact kernel, version, distro, use-case, etc) + CC Alex Deucher aka @agd5f
>
> This is a great workaround, but the upstream gpu metrics should really be fixed.

Seems like the load sensor isn't supported on the hardware level.

@agd5f

agd5f commented Mar 7, 2023 via email

@bsolos
Author

bsolos commented Mar 7, 2023

drmGetStats doesn't seem to work on the renderD node, and using it on the primary node always gives stats.count=0. Maybe I have to authenticate first?

@evelikov
Contributor

evelikov commented Mar 7, 2023

@agd5f perhaps an in-tree kernel AMDGPU doc outlining the preferred options and their caveats would be great. Something people can keep an eye on as things evolve: say the team introduces a new method to fetch X, or method Y has issues (such as the gfxoff issue mentioned), or approach Z might be deprecated (ETA, reason), etc.

@evelikov
Contributor

evelikov commented Mar 7, 2023

@bsolos drmGetStats is a legacy API and should not be used. As the in-kernel comment says: "getstats is defunct, just clear".

@bsolos
Author

bsolos commented Mar 7, 2023

This makes sense now. It seems like finding the correct way is much more difficult than I thought.

I use the register-polling approach because that's what radeontop does, and it works.
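
A minimal sketch of the register-polling idea, not the code in this PR: sample a status register many times and report the share of samples in which a busy bit was set. It assumes, following radeontop's convention, that bit 31 of GRBM_STATUS is the "GUI active" flag. The `RegisterReader` callback and `estimate_gpu_load_percent` are hypothetical names; the register read is injected so the arithmetic can be checked without hardware.

```cpp
#include <cstdint>
#include <functional>

// Assumption: bit 31 of GRBM_STATUS signals "GUI active" (graphics busy),
// the bit radeontop samples. Check this against the register documentation
// for the ASIC in question.
constexpr std::uint32_t kGuiActiveBit = 1u << 31;

// Hypothetical callback: returns one raw GRBM_STATUS value per call.
// With libdrm it could wrap amdgpu_read_mm_registers(); here it is injected
// so the logic runs without a GPU.
using RegisterReader = std::function<std::uint32_t()>;

// Poll the register `samples` times and return the percentage of samples
// (rounded to the nearest integer, 0..100) in which the busy bit was set.
inline int estimate_gpu_load_percent(const RegisterReader& read_status,
                                     unsigned samples) {
    if (samples == 0) {
        return 0;  // nothing sampled, report idle rather than divide by zero
    }
    unsigned busy = 0;
    for (unsigned i = 0; i < samples; ++i) {
        if (read_status() & kGuiActiveBit) {
            ++busy;
        }
    }
    return static_cast<int>((busy * 100u + samples / 2) / samples);
}
```

A real sampler would space the reads out over the reporting interval (for example with a short sleep between reads in a background thread) instead of reading back to back; the loop above only shows the counting.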

@agd5f

agd5f commented Mar 7, 2023 via email

@evelikov
Contributor

evelikov commented Mar 7, 2023

@bsolos the site/link is down; see https://patchwork.freedesktop.org/series/102175/

@agd5f does that interface provide system-wide statistics? It seems to be per-client and per-fd, whereas MangoHud exposes the total system data. Technically one could iterate over /proc/foo/fdinfo for the total, assuming they have permissions, yet MangoHud should not be run as root.
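
To make the fdinfo idea concrete, here is a rough sketch of the per-fd building block: parsing one fdinfo text for an engine's accumulated busy time and turning two readings into a utilization figure. It assumes the `drm-engine-<name>: <value> ns` key format from the kernel's DRM client usage stats documentation. The function names are hypothetical, and the walk over every process's fdinfo directory (with its permission problem) is deliberately left out.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Parse the contents of one /proc/<pid>/fdinfo/<fd> file and return the
// accumulated busy time in nanoseconds for the named engine (e.g. "gfx"),
// or nullopt if the key is absent.
// Assumption: lines look like "drm-engine-gfx:\t1500000000 ns".
inline std::optional<std::uint64_t> parse_engine_ns(
    const std::string& fdinfo_text, const std::string& engine) {
    const std::string key = "drm-engine-" + engine + ":";
    std::istringstream in(fdinfo_text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, key.size(), key) != 0) {
            continue;  // some other key
        }
        std::istringstream value(line.substr(key.size()));
        std::uint64_t ns = 0;
        if (value >> ns) {  // skips the whitespace after the colon
            return ns;
        }
    }
    return std::nullopt;
}

// Utilization over one sampling interval: busy-time delta divided by the
// wall-clock delta, clamped to 0..100.
inline double engine_busy_percent(std::uint64_t busy_before_ns,
                                  std::uint64_t busy_after_ns,
                                  std::uint64_t interval_ns) {
    if (interval_ns == 0 || busy_after_ns < busy_before_ns) {
        return 0.0;
    }
    const double pct =
        static_cast<double>(busy_after_ns - busy_before_ns) * 100.0 /
        static_cast<double>(interval_ns);
    return pct > 100.0 ? 100.0 : pct;
}
```

For a system-wide total, the per-client values would have to be summed across all processes, counting each `drm-client-id` once because the same client can show up behind several fds. Reading other users' fdinfo generally needs elevated privileges, which is the concern raised above.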

@agd5f

agd5f commented Mar 8, 2023 via email

@Umio-Yasuno

Umio-Yasuno commented Apr 3, 2023

Hello.
I am developing amdgpu_top.
amdgpu_top has a simple fdinfo parser, performance counter (GRBM, GRBM2, CP_STAT) readings, and sensor readings implemented.
Would it help you?

@Umio-Yasuno

Umio-Yasuno commented Apr 3, 2023

Successfully merging this pull request may close these issues.

GPU utilization always at 100% on Vega 3
6 participants