Use libdrm_amdgpu as an alternative GPU load information source #925

bsolos · 2023-02-15T13:08:53Z

Use libdrm_amdgpu to calculate the GPU load. This should resolve #923.

The GPU load calculation method was inspired by radeontop, but no code in this PR was copied from there.
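For context on the radeontop-style method mentioned above: radeontop estimates load by reading the GRBM_STATUS register many times per interval and counting how often its GUI_ACTIVE bit is set. The sketch below illustrates only that sampling idea; it is not the code from this PR. The bit position (31) is an assumption carried over from radeontop, and the register read is injected as a callback so the logic stands alone without libdrm.

```cpp
#include <cstdint>
#include <functional>

// Assumption (from radeontop's approach, not from this PR): bit 31 of
// GRBM_STATUS, GUI_ACTIVE, is set while the graphics engine is busy.
constexpr uint32_t kGuiActiveBit = 1u << 31;

// Hypothetical helper: take `samples` readings of GRBM_STATUS and return
// the fraction (0.0 to 1.0) of successful readings with GUI_ACTIVE set.
// `read_grbm_status` stands in for the actual register read and returns
// false on failure; `pause` runs between readings (e.g. a short sleep).
double sample_gui_busy(const std::function<bool(uint32_t &)> &read_grbm_status,
                       int samples,
                       const std::function<void()> &pause = [] {}) {
    int busy = 0;
    int valid = 0;
    for (int i = 0; i < samples; ++i) {
        uint32_t value = 0;
        if (read_grbm_status(value)) {
            ++valid;
            if (value & kGuiActiveBit)
                ++busy;
        }
        pause();
    }
    return valid > 0 ? static_cast<double>(busy) / valid : 0.0;
}
```

In a libdrm_amdgpu build, the callback could wrap amdgpu_read_mm_registers(), treating a zero return as success; a GRBM_STATUS byte offset of 0x8010 (dword offset 0x8010 / 4) is an assumption taken from radeontop and should be verified for the target ASIC. More samples per interval give a smoother reading at the cost of more register reads.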

RPINerd

I don't have a Vega card to physically test this fork, but the general structure looks good and I didn't find anything basic like spelling errors 😃

bsolos · 2023-02-18T19:26:26Z

It works properly on my machine, and I don't have any other Vega cards to test it on. Actually, it should also work on most non-Vega AMD GPUs.

jackun · 2023-03-04T19:36:55Z

#703

Also you'd need drm master authorization, with secondary GPUs at least.


evelikov · 2023-03-04T20:13:42Z

Also you'd need drm master authorization, with secondary GPUs at least.

This should not be needed if one uses the renderD node; it seems the card node is currently being used.

If the card node is already open, you can get the render node name from the fd with drmGetRenderDeviceNameFromFd(). Alternatively, drmGetDevices2() gives you all devices: find the needed one by matching the card node and use its render node. This MR does something different but should give you a good starting point.
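The drmGetDevices2() route described above could look roughly like the sketch below. It assumes libdrm's drmDevice layout (a `nodes` array indexed by DRM_NODE_PRIMARY / DRM_NODE_RENDER and an `available_nodes` bitmask); the function is templated only so the matching logic can be exercised without libdrm, and its name is hypothetical.

```cpp
#include <string>

// Given the devices libdrm enumerated, find the one whose primary (card)
// node equals `card_path` and return its render node, or "" if none.
//
// Dev is meant to be libdrm's drmDevice: `nodes` is indexed by
// DRM_NODE_PRIMARY / DRM_NODE_RENDER and `available_nodes` is a bitmask
// of (1 << DRM_NODE_*). Pass those constants as primary_idx/render_idx.
template <typename Dev>
std::string render_node_for_card(Dev **devs, int count,
                                 const std::string &card_path,
                                 int primary_idx, int render_idx) {
    for (int i = 0; i < count; ++i) {
        const Dev *d = devs[i];
        const bool has_primary = d->available_nodes & (1 << primary_idx);
        const bool has_render = d->available_nodes & (1 << render_idx);
        if (has_primary && has_render && card_path == d->nodes[primary_idx])
            return d->nodes[render_idx];
    }
    return "";
}

// Usage with libdrm (not compiled here):
//   drmDevicePtr devs[64];
//   int n = drmGetDevices2(0, devs, 64);
//   if (n > 0) {
//       std::string render = render_node_for_card(
//           devs, n, "/dev/dri/card1", DRM_NODE_PRIMARY, DRM_NODE_RENDER);
//       drmFreeDevices(devs, n);
//   }
```

Matching by path string is the simplest option; comparing the st_rdev values from stat() would also cope with symlinked paths such as /dev/dri/by-path/*.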

evelikov · 2023-03-04T20:16:51Z

@bsolos overall I would encourage you to open a bug at the AMDGPU gitlab, clearly describe the issue (exact kernel, version, distro, use-case, etc) + CC Alex Deucher aka @agd5f

This is a great workaround, but the upstream gpu metrics should really be fixed.

bsolos · 2023-03-06T14:23:21Z

I didn't open an issue there because https://gitlab.freedesktop.org/drm/amd/-/issues/1932 is already open. It seemed like there was no progress in the last 10 months, so I thought that this workaround might be beneficial to MangoHud. Should I still open a new issue?

evelikov · 2023-03-06T14:35:42Z

One should not assume that devs don't care about issues just because there's no update. Sometimes they have higher or other priorities; sometimes things fall through the cracks.

By opening or prodding an issue you'll increase its visibility and raise its severity. If you can test kernel patches, devs are more likely to get it fixed faster. Sitting quietly does not help, I'm afraid.


agd5f · 2023-03-06T17:55:18Z

Wouldn't something like https://www.kernel.org/doc/html/latest/gpu/drm-usage-stats.html make more sense than polling hardware registers? Plus, it's cross-vendor. Alex

On Mon, Mar 6, 2023 at 12:05 PM bsolos commented on this pull request, in src/amdgpu_libdrm.cpp:

@@ -51,6 +52,13 @@ static int libdrm_initialize() {
         return -1;
     }
+    char *renderD = drmGetRenderDeviceNameFromFd(fd);
+    fd = open(renderD, O_RDWR);

Sorry, I've never really worked with libdrm before. Will fix shortly
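The quoted hunk leaks both the original card fd and the string returned by drmGetRenderDeviceNameFromFd(), and does not check that string for NULL. A minimal sketch of the same step with those issues handled is below; the helper name is hypothetical, and the name lookup is passed in as a function pointer with the same signature as drmGetRenderDeviceNameFromFd() (which returns a malloc()'d path the caller must free(), or NULL), so the logic can be tried without libdrm.

```cpp
#include <cstdlib>
#include <fcntl.h>
#include <unistd.h>

// Reopen a DRM device through its render node. In real code, pass
// drmGetRenderDeviceNameFromFd as `get_render_name`.
// On success, closes `card_fd` and returns the new render-node fd.
// On failure, returns -1 and leaves `card_fd` open for the caller.
int reopen_as_render_node(int card_fd, char *(*get_render_name)(int)) {
    char *path = get_render_name(card_fd);
    if (path == nullptr)
        return -1;  // not a DRM fd, or no render node for this device
    int render_fd = open(path, O_RDWR | O_CLOEXEC);
    std::free(path);  // the returned name is heap-allocated
    if (render_fd < 0)
        return -1;
    close(card_fd);  // the card node is no longer needed
    return render_fd;
}
```

Keeping the card fd open on failure lets the caller decide whether to fall back to it or give up.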

bsolos · 2023-03-06T19:01:18Z

@bsolos overall I would encourage you to open a bug at the AMDGPU gitlab, clearly describe the issue (exact kernel, version, distro, use-case, etc) + CC Alex Deucher aka @agd5f

This is a great workaround, but the upstream gpu metrics should really be fixed.

It seems the load sensor isn't supported at the hardware level.


Rebase onto master

…config

agd5f · 2023-03-07T13:49:47Z

The other problem with polling registers is that it keeps the GPU awake, using more power: the driver has to disable gfxoff when you read back registers. Alex


bsolos · 2023-03-07T14:00:11Z

drmGetStats doesn't seem to work on the renderD node, and using it on the primary node always gives stats.count=0. Maybe I have to authenticate first?

evelikov · 2023-03-07T14:06:42Z

@agd5f perhaps an in-tree kernel AMDGPU doc outlining the preferred options and their caveats would be great. Something people can keep an eye on as things evolve: say the team introduces a new method to fetch X, method Y has issues (such as the gfxoff issue mentioned above), or approach Z might be deprecated (ETA, reason), etc.

evelikov · 2023-03-07T14:08:20Z

@bsolos drmGetStats is a legacy API and should not be used. As the in-kernel comment says: "getstats is defunct, just clear".

bsolos · 2023-03-07T14:39:48Z

This makes sense now. It seems like finding the correct way is much more difficult than I thought.

I use the register-polling approach because that's what radeontop does, and it works

agd5f · 2023-03-07T17:40:24Z

For reference on how to use the fdinfo interface see: https://www.spinics.net/lists/intel-gfx/msg294401.html
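As a rough illustration of the fdinfo interface, the sketch below reads the per-engine busy time from one /proc/&lt;pid&gt;/fdinfo/&lt;fd&gt; blob and turns two samples into a busy percentage. It assumes the "drm-engine-&lt;name&gt;: &lt;uint&gt; ns" key format from the drm-usage-stats document; the function names are hypothetical and this is not MangoHud code.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Parse "drm-engine-<name>: <uint> ns" lines from one fdinfo blob.
// Lines without the " ns" unit (e.g. drm-engine-capacity-<name>) and
// unrelated keys are skipped.
std::map<std::string, uint64_t> parse_engine_ns(const std::string &fdinfo) {
    const std::string prefix = "drm-engine-";
    std::map<std::string, uint64_t> engines;
    std::istringstream in(fdinfo);
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, prefix.size(), prefix) != 0)
            continue;
        const auto colon = line.find(':');
        if (colon == std::string::npos)
            continue;
        const std::string name = line.substr(prefix.size(), colon - prefix.size());
        std::istringstream value(line.substr(colon + 1));
        uint64_t ns = 0;
        std::string unit;
        if (value >> ns >> unit && unit == "ns")
            engines[name] = ns;
    }
    return engines;
}

// Busy percentage of one engine between two samples taken wall_ns apart.
// Busy time may exceed wall time when drm-engine-capacity-<name> is
// greater than one, so the result can go above 100%.
double engine_busy_percent(uint64_t ns_before, uint64_t ns_after, uint64_t wall_ns) {
    if (wall_ns == 0 || ns_after < ns_before)
        return 0.0;  // no elapsed time, or the fd was closed and reused
    return 100.0 * static_cast<double>(ns_after - ns_before) /
           static_cast<double>(wall_ns);
}
```

Because the counters are cumulative, a single sample says nothing about current load; the caller has to keep the previous value and the wall-clock time at which it was read.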


evelikov · 2023-03-07T21:13:11Z

@bsolos the site/link is down; see https://patchwork.freedesktop.org/series/102175/

@agd5f does that interface provide system-wide statistics? It seems to be per-client and per-fd, whereas MangoHud exposes system-wide totals. Technically one could iterate over /proc/foo/fdinfo for the total, assuming one has the permissions, yet MangoHud should not be run as root.

agd5f · 2023-03-08T14:48:29Z

Yes, it's per client. Similar to top for the CPU. Alex
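Following the per-client answer, a system-wide figure would have to be summed over clients, as suggested above. The sketch below does that under stated assumptions: it walks /proc/*/fdinfo, silently skips entries it cannot read (other users' processes when not root, which is the permission caveat raised above), and counts each (drm-pdev, drm-client-id) pair once, since the drm-usage-stats document describes drm-client-id as distinguishing duplicated and shared fds. All function names are hypothetical.

```cpp
#include <cctype>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <map>
#include <sstream>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

// Return the value of "key:<whitespace><value>" from an fdinfo blob, or "".
std::string fdinfo_field(const std::string &fdinfo, const std::string &key) {
    std::istringstream in(fdinfo);
    std::string line;
    while (std::getline(in, line)) {
        if (line.size() > key.size() && line.compare(0, key.size(), key) == 0 &&
            line[key.size()] == ':') {
            const auto start = line.find_first_not_of(" \t", key.size() + 1);
            return start == std::string::npos ? "" : line.substr(start);
        }
    }
    return "";
}

// Collect every readable fdinfo blob that belongs to a DRM client.
// Unreadable directories and vanished processes are skipped via error codes.
std::vector<std::string> read_drm_fdinfos() {
    namespace fs = std::filesystem;
    std::vector<std::string> blobs;
    std::error_code ec;
    for (fs::directory_iterator proc("/proc", ec), end; !ec && proc != end;
         proc.increment(ec)) {
        const std::string pid = proc->path().filename().string();
        if (pid.empty() || !std::isdigit(static_cast<unsigned char>(pid[0])))
            continue;
        std::error_code fd_ec;
        for (fs::directory_iterator fd(proc->path() / "fdinfo", fd_ec);
             !fd_ec && fd != end; fd.increment(fd_ec)) {
            std::ifstream file(fd->path());
            std::stringstream contents;
            contents << file.rdbuf();
            std::string blob = contents.str();
            if (blob.find("drm-client-id:") != std::string::npos)
                blobs.push_back(std::move(blob));
        }
    }
    return blobs;
}

// Sum one engine's busy time ("gfx", "compute", ...) over all clients,
// optionally only for the GPU whose drm-pdev equals `pdev`.
uint64_t total_engine_ns(const std::vector<std::string> &blobs,
                         const std::string &engine,
                         const std::string &pdev = "") {
    std::map<std::string, uint64_t> per_client;  // "pdev/client-id" -> ns
    for (const auto &blob : blobs) {
        const std::string dev = fdinfo_field(blob, "drm-pdev");
        if (!pdev.empty() && dev != pdev)
            continue;
        const std::string id = fdinfo_field(blob, "drm-client-id");
        std::istringstream value(fdinfo_field(blob, "drm-engine-" + engine));
        uint64_t ns = 0;
        if (id.empty() || !(value >> ns))
            continue;
        per_client[dev + "/" + id] = ns;  // duplicates overwrite, not add
    }
    uint64_t total = 0;
    for (const auto &entry : per_client)
        total += entry.second;
    return total;
}
```

As with any fdinfo counter, the total is cumulative; load comes from the difference between two totals divided by the elapsed time. Clients that exit between samples make the total drop, so a real implementation would need to handle that case.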


Umio-Yasuno · 2023-04-03T19:01:59Z

~~Hello.~~
~~I am developing amdgpu_top.~~
~~amdgpu_top has simple fdinfo parser and performance counters (GRBM, GRBM2, CP_STAT) readings and sensor readings implemented.~~
~~Would it help you?~~

Umio-Yasuno · 2023-04-03T22:58:34Z

~~working branch: https://github.com/Umio-Yasuno/amdgpu_top/tree/json-output~~

RPINerd approved these changes Feb 18, 2023

evelikov reviewed Mar 4, 2023 (outdated): meson.build, src/amdgpu.cpp, src/amdgpu_libdrm.cpp, src/amdgpu_libdrm.h, src/gpu.cpp

bsolos marked this pull request as draft March 6, 2023 15:02

bsolos marked this pull request as ready for review March 6, 2023 16:00

evelikov reviewed Mar 6, 2023 (outdated): src/amdgpu_libdrm.cpp