Avoid gpu wakeup in hwdec_cuda by deferring cuInit #14028

jrelvas-ipc · 2024-04-30T11:33:02Z

cuInit wakes up the nvidia dgpu on nvidia laptops. This is bad news because the wake up process
is blocking and takes a few seconds. It also needlessly increases power consumption.

Sometimes, a VO loads several hwdecs (like dmabuf_wayland). When cuda is loaded, it calls
cuInit before running all interop inits. However, the first checks in the interops do not
require cuda initialization, so we only need to call cuInit after those checks.

cuInit is handled by the new cuda_priv_init function. It ensures cuInit is only called once.

With these changes, there's no cuda initialization if no OpenGL/Vulkan backend is available. This prevents
dmabuf_wayland and other VOs which automatically load cuda from waking up the nvidia dgpu unnecessarily,
making them start faster and decreasing power consumption on laptops.

Fixes: #13668

video/out/hwdec/hwdec_cuda.c

github-actions · 2024-04-30T11:44:47Z

Download the artifacts for this pull request:

Windows

macOS

video/out/hwdec/hwdec_cuda.c

jrelvas-ipc · 2024-04-30T11:55:16Z

Fixed signing and commit name

video/out/hwdec/hwdec_cuda.c

na-na-hi · 2024-04-30T12:20:00Z

I don't like how this is implemented. The whole point is to check whether the RA is GL or PL before processing, right? Can't we do something like this?

typedef struct cuda_interop_fn {
    bool (*check)(const struct ra_hwdec *hw);
    bool (*init)(const struct ra_hwdec *hw);
};

bool check_valid = false;
for (int i = 0; interop_fns[i]; i++) {
    if (interop_fns[i]->check(hw)) {
        check_valid = true;
        break;
    }
}

if (check_valid)
    ret = CHECK_CU(cu->cuInit(0));

// Do the rest of autoprobing and init here

jrelvas-ipc · 2024-04-30T12:23:18Z

I don't like how this is implemented. The whole point is to check whether the RA is GL or PL before processing, right? Can't we do something like this?

Yeah, there's no problem with that, I considered implementing that way too, I just wasn't sure which method would be the preferred. I can make a new PR implementing that logic instead.

na-na-hi · 2024-04-30T12:26:13Z

I'd prefer this way if others think this approach is OK. No need to open another PR. You can just rebase this one.

jrelvas-ipc · 2024-04-30T12:27:09Z

...actually, it might be desirable to fallback to another interop if the init fails in one of them.

Right now, we keep trying until one of the interop inits returns true. In your implementation, if the check returns but init doesn't, then there's no fallback.

na-na-hi · 2024-04-30T12:30:34Z

You can just check if all check functions fail. Only call cuInit if fn isn't NULL after the checks, and then do the usual autoprobing.

jrelvas-ipc · 2024-04-30T12:35:59Z

So, is it fine to just return an error if the check is successful but the init isn't?

I was just worried since the current behavior is to go through all valid inits.

na-na-hi · 2024-04-30T12:38:36Z

I meant you only do early exit if none of the checks are successful. Otherwise proceed as before and try initialize ALL interops regardless.

jrelvas-ipc · 2024-04-30T12:41:33Z

Ah I see. That should work well as long as the interop init has no dependency on its check.

I guess that could easily be fixed by calling check within init itself as well.

jrelvas-ipc · 2024-04-30T12:51:17Z

Here's how it stands then:

try all checks
if no checks are successful -> exit
initialize cuda
try all interop inits (includes check)

--

I could also put it all in a single loop and initialize cuda after the first successful check.

Andarwinux · 2024-04-30T14:00:33Z

This caused crash on win32 when hwdec auto switch with gpu-context switch.

cmd:
mpv --no-config --gpu-context=d3d11 --hwdec=d3d11va,nvdec --vo=gpu-next

set gpu-context winvk

backtrace

(3e0c.4f64): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
mpv!cuda_priv_init+0xd08:
00007ff7`536eda38 498b0424        mov     rax,qword ptr [r12] ds:00000000`00000000=????????????????
0:017> g
(3e0c.4f64): Access violation - code c0000005 (!!! second chance !!!)
mpv!cuda_priv_init+0xd08:
00007ff7`536eda38 498b0424        mov     rax,qword ptr [r12] ds:00000000`00000000=????????????????
0:017> g
(3e0c.4f64): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
mpv!cuda_priv_init+0xd08:
00007ff7`536eda38 498b0424        mov     rax,qword ptr [r12] ds:00000000`00000000=????????????????
0:017> g
(3e0c.4f64): Access violation - code c0000005 (!!! second chance !!!)
mpv!cuda_priv_init+0xd08:
00007ff7`536eda38 498b0424        mov     rax,qword ptr [r12] ds:00000000`00000000=????????????????
0:017> !analyze -v
*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************


KEY_VALUES_STRING: 1

    Key  : AV.Dereference
    Value: NullPtr

    Key  : AV.Fault
    Value: Read

    Key  : Analysis.CPU.mSec
    Value: 1233

    Key  : Analysis.Elapsed.mSec
    Value: 2488

    Key  : Analysis.IO.Other.Mb
    Value: 0

    Key  : Analysis.IO.Read.Mb
    Value: 85

    Key  : Analysis.IO.Write.Mb
    Value: 84

    Key  : Analysis.Init.CPU.mSec
    Value: 1203

    Key  : Analysis.Init.Elapsed.mSec
    Value: 55657

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 260

    Key  : Failure.Bucket
    Value: NULL_POINTER_READ_c0000005_mpv.exe!cuda_priv_init

    Key  : Failure.Hash
    Value: {5878f643-fc30-6545-e7ad-7aca85b61bd1}

    Key  : Timeline.OS.Boot.DeltaSec
    Value: 199568

    Key  : Timeline.Process.Start.DeltaSec
    Value: 55

    Key  : WER.OS.Branch
    Value: rs_prerelease

    Key  : WER.OS.Version
    Value: 10.0.26040.1000

    Key  : WER.Process.Version
    Value: 2.0.0.0


NTGLOBALFLAG:  0

APPLICATION_VERIFIER_FLAGS:  0

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 00007ff7536eda38 (mpv!cuda_priv_init+0x0000000000000d08)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: 0000000000000000
Attempt to read from address 0000000000000000

FAULTING_THREAD:  00004f64

PROCESS_NAME:  mpv.exe

READ_ADDRESS:  0000000000000000 

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%p referenced memory at 0x%p. The memory could not be %s.

EXCEPTION_CODE_STR:  c0000005

EXCEPTION_PARAMETER1:  0000000000000000

EXCEPTION_PARAMETER2:  0000000000000000

FAULTING_LOCAL_VARIABLE_NAME:  p

STACK_TEXT:  
000000c3`6d7fea20 00007ff7`536ee64e     : 00000000`000000a0 000001ff`008c4440 00000000`00000000 000001ff`008c4440 : mpv!cuda_priv_init+0xd08
000000c3`6d7fea80 00007ff7`536edb1e     : 00000000`00000000 00000000`00000000 000001ff`0675d730 000001ff`0675d730 : mpv!cuda_vk_init+0x9e
000000c3`6d7feb00 00007ff7`526978cf     : 000000c3`6d7fe6e0 00000000`00000002 00002242`d90ddb82 00007ff7`565583a6 : mpv!cuda_init+0x2e
000000c3`6d7feb60 00007ff7`5269778e     : 00000000`00000000 00000000`00000000 000000c3`6d7fec18 000000c3`6c7feb80 : mpv!ra_hwdec_load_driver+0xcf
000000c3`6d7febe0 00007ff7`526974f0     : 00000000`00000000 00000000`00000000 00000000`00000010 000000c3`6d7fec60 : mpv!load_add_hwdec+0x4e
000000c3`6d7fec40 00007ff7`52695dff     : 00000000`00000008 00007ffc`8e553b15 00000000`00100008 000001fe`fdd70000 : mpv!ra_hwdec_ctx_load_fmt+0xf0
000000c3`6d7feca0 00007ff7`52c15887     : 000001ff`d7bfe760 00000000`0000022c 00000000`00000000 000000c3`6d7ff5d0 : mpv!control+0x5cf
000000c3`6d7ff5c0 00007ff7`52668d82     : 00000000`00000000 000001ff`ad8fff90 000f423f`fff0bdc0 000000ee`63d60e50 : mpv!run_control+0x47
000000c3`6d7ff630 00007ff7`525b3451     : 00000000`00000000 000001ff`bd5ff630 00000000`00000000 00000000`00000000 : mpv!mp_dispatch_queue_process+0x372
000000c3`6d7ff6c0 00007ffc`8bd4bcc0     : 00000000`00000000 000001ff`00fdb850 00000000`00000000 00000000`00000000 : mpv!vo_thread+0x1e1
000000c3`6d7ff8e0 00007ffc`8da98d17     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ucrtbase!thread_start<unsigned int (__cdecl*)(void *),1>+0x30
000000c3`6d7ff910 00007ffc`8e538260     : 00000000`00000001 000000c3`6d8ff000 00000000`00000000 00000000`00000000 : KERNEL32!BaseThreadInitThunk+0x17
000000c3`6d7ff940 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x20


FAULTING_SOURCE_LINE:  /build/packages/mpv-prefix/src/mpv-build/../../../../src_packages/mpv/video/out/hwdec/hwdec_cuda.c

FAULTING_SOURCE_FILE:  /build/packages/mpv-prefix/src/mpv-build/../../../../src_packages/mpv/video/out/hwdec/hwdec_cuda.c

FAULTING_SOURCE_LINE_NUMBER:  80

FAULTING_SOURCE_CODE:  
No source found for '/build/packages/mpv-prefix/src/mpv-build/../../../../src_packages/mpv/video/out/hwdec/hwdec_cuda.c'


SYMBOL_NAME:  mpv!cuda_priv_init+d08

MODULE_NAME: mpv

IMAGE_NAME:  mpv.exe

STACK_COMMAND:  dt ntdll!LdrpLastDllInitializer BaseDllName ; dt ntdll!LdrpFailureData ; ~17s ; .cxr ; kb

FAILURE_BUCKET_ID:  NULL_POINTER_READ_c0000005_mpv.exe!cuda_priv_init

OS_VERSION:  10.0.26040.1000

BUILDLAB_STR:  rs_prerelease

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

IMAGE_VERSION:  2.0.0.0

FAILURE_ID_HASH:  {5878f643-fc30-6545-e7ad-7aca85b61bd1}

Followup:     MachineOwner
---------

jrelvas-ipc · 2024-04-30T14:10:34Z

Hmm, I'm rebasing this PR later this day so it implements the check suggestion. It should be less bug prone, so hopefully you stop getting that access violation crash. @Andarwinux

Dudemanguy · 2024-04-30T14:15:38Z

Can't you just use ra_is_wldmabuf to check this?

Andarwinux · 2024-04-30T14:31:55Z

This also makes sense for d3d11, where users obviously don't want to accidentally wake up dGPU while playing video on laptops using iGPU via d3d11, but it should also be taken into account that mpv may change gpu-context/vo/hwdec at runtime.

jrelvas-ipc · 2024-04-30T20:23:35Z

Rebased. Initial interop checks have been split into check. CUDA initialization only happens after the first successful check.

video/out/hwdec/hwdec_cuda.c

video/out/hwdec/hwdec_cuda_gl.c

Andarwinux · 2024-05-01T08:51:34Z

../../../../src_packages/mpv/video/out/hwdec/hwdec_cuda.c:62:7: error: expected expression
   62 |     &(static cuda_interop_fn) {.check = cuda_gl_check, .init = cuda_gl_init},
      |       ^
../../../../src_packages/mpv/video/out/hwdec/hwdec_cuda.c:81:5: error: function definition is not allowed here
   81 |     {
      |     ^
../../../../src_packages/mpv/video/out/hwdec/hwdec_cuda.c:106:19: error: call to undeclared function 'backend_init'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
  106 |             ret = backend_init();
      |                   ^
3 errors generated.

video/out/hwdec/hwdec_cuda.c

video/out/hwdec/hwdec_cuda.h

jrelvas-ipc · 2024-05-01T23:53:13Z

Lesson learned: Do not forget to configure meson to use clang...

video/out/hwdec/hwdec_cuda.h

video/out/hwdec/hwdec_cuda.c

Andarwinux · 2024-05-02T10:34:42Z

hwdec=nvdec doesn't work after switching gpu-context at runtime.

mpv --no-config --vo=gpu-next --gpu-context=d3d11 --hwdec=nvdec,d3d11va --msg-level=all=debug --log-file=log.txt .\8k.mkv

set gpu-context winvk

log.txt

[ 34.178][v][vd] Looking at hwdec hevc-nvdec...
[ 34.178][v][vo/gpu-next] Loading hwdec drivers for format: 'cuda'
[ 34.178][v][vo/gpu-next] Loading hwdec driver 'cuda'
[ 34.178][e][vo/gpu-next/cuda] CUDA hwdec only works with OpenGL or Vulkan backends.
[ 34.178][v][vo/gpu-next] Loading failed.
[ 34.179][v][vd] Could not create device.
[ 34.179][v][vd] Using software decoding.

jrelvas-ipc · 2024-05-02T10:39:37Z

@Andarwinux that's weird. As far as I know, this change shouldn't be changing which contexts CUDA can run in

Is this something specific with d3d11? A d3d11 cuda interop wasn't implemented by this change and it didn't exist before either

na-na-hi · 2024-05-02T11:04:31Z

Neither of the check functions ever return true. Of course it doesn't work.

jrelvas-ipc · 2024-05-02T11:13:43Z

Oh yeah... That's certainly an issue.

Not sure how I missed that honestly.

I'm rebasing this PR into a single commit soon. I'll include the fixed checks and move cuda_interop_fn structs out of hwdec_cuda.c

jrelvas-ipc · 2024-05-02T16:29:40Z

Check functions were fixed, moved cuda_interop_fn structs into extern declarations.

`cuInit` wakes up the nvidia dgpu on nvidia laptops. This is bad news because the wake up process is blocking and takes a few seconds. It also needlessly increases power consumption. Sometimes, a VO loads several hwdecs (like `dmabuf_wayland`). When `cuda` is loaded, it calls `cuInit` before running all interop inits. However, the first checks in the interops do not require cuda initialization, so we only need to call `cuInit` after those checks. This commit splits the interop `init` function into `check` and `init`. `check` can be called without initializing the Cuda backend, so cuInit is only called *after* the first interop check. With these changes, there's no cuda initialization if no OpenGL/Vulkan backend is available. This prevents `dmabuf_wayland` and other VOs which automatically load cuda from waking up the nvidia dgpu unnecessarily, making them start faster and decreasing power consumption on laptops. Fixes: mpv-player#13668

This simplifies the code and makes it easier to read.

jrelvas-ipc · 2024-05-03T08:50:57Z

Just a small change which makes the check functions more readable :)

(Should have no behavior changes)

jrelvas-ipc · 2024-05-03T09:19:17Z

@Andarwinux let me know if your issue still happens with these new fixes

Andarwinux · 2024-05-03T11:11:29Z

@Andarwinux let me know if your issue still happens with these new fixes

It's still working perfectly, thanks!

jrelvas-ipc mentioned this pull request Apr 30, 2024

dmabuf-wayland wakes up discrete NVIDIA GPU #13668

Closed

sfan5 reviewed Apr 30, 2024

View reviewed changes

video/out/hwdec/hwdec_cuda.c Outdated Show resolved Hide resolved

kasper93 reviewed Apr 30, 2024

View reviewed changes

video/out/hwdec/hwdec_cuda.c Outdated Show resolved Hide resolved

jrelvas-ipc force-pushed the master branch from a3f13b9 to 10b6961 Compare April 30, 2024 11:54

sfan5 reviewed Apr 30, 2024

View reviewed changes

video/out/hwdec/hwdec_cuda.c Outdated Show resolved Hide resolved

jrelvas-ipc force-pushed the master branch from b80b40f to 3b77fcc Compare April 30, 2024 20:21

kasper93 reviewed Apr 30, 2024

View reviewed changes

video/out/hwdec/hwdec_cuda.c Outdated Show resolved Hide resolved

jrelvas-ipc force-pushed the master branch from 3be516f to 006227b Compare April 30, 2024 22:49

philipl reviewed Apr 30, 2024

View reviewed changes

video/out/hwdec/hwdec_cuda_gl.c Outdated Show resolved Hide resolved

kasper93 added this to the Release v0.39.0 milestone May 1, 2024

philipl requested changes May 1, 2024

View reviewed changes

video/out/hwdec/hwdec_cuda.c Outdated Show resolved Hide resolved

video/out/hwdec/hwdec_cuda.c Outdated Show resolved Hide resolved

kasper93 reviewed May 1, 2024

View reviewed changes

video/out/hwdec/hwdec_cuda.h Outdated Show resolved Hide resolved

jrelvas-ipc requested a review from philipl May 1, 2024 23:52

na-na-hi reviewed May 2, 2024

View reviewed changes

video/out/hwdec/hwdec_cuda.h Outdated Show resolved Hide resolved

sfan5 reviewed May 2, 2024

View reviewed changes

video/out/hwdec/hwdec_cuda.c Show resolved Hide resolved

jrelvas-ipc force-pushed the master branch from 91d6187 to 965c2de Compare May 2, 2024 16:27

jrelvas-ipc force-pushed the master branch 2 times, most recently from 59d1371 to e9eaac6 Compare May 2, 2024 16:32

jrelvas-ipc force-pushed the master branch from e9eaac6 to 93d7c6c Compare May 2, 2024 16:33

hwdec_cuda: reduce nesting in check functions

3053b10

This simplifies the code and makes it easier to read.

philipl approved these changes May 3, 2024

View reviewed changes

kasper93 merged commit 1759d73 into mpv-player:master May 6, 2024
15 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid gpu wakeup in hwdec_cuda by deferring cuInit #14028

Avoid gpu wakeup in hwdec_cuda by deferring cuInit #14028

jrelvas-ipc commented Apr 30, 2024

github-actions bot commented Apr 30, 2024 •

edited

jrelvas-ipc commented Apr 30, 2024

na-na-hi commented Apr 30, 2024 •

edited

jrelvas-ipc commented Apr 30, 2024

na-na-hi commented Apr 30, 2024

jrelvas-ipc commented Apr 30, 2024

na-na-hi commented Apr 30, 2024

jrelvas-ipc commented Apr 30, 2024

na-na-hi commented Apr 30, 2024 •

edited

jrelvas-ipc commented Apr 30, 2024

jrelvas-ipc commented Apr 30, 2024 •

edited

Andarwinux commented Apr 30, 2024 •

edited

jrelvas-ipc commented Apr 30, 2024

Dudemanguy commented Apr 30, 2024

Andarwinux commented Apr 30, 2024

jrelvas-ipc commented Apr 30, 2024

Andarwinux commented May 1, 2024

jrelvas-ipc commented May 1, 2024

Andarwinux commented May 2, 2024

jrelvas-ipc commented May 2, 2024 •

edited

na-na-hi commented May 2, 2024

jrelvas-ipc commented May 2, 2024

jrelvas-ipc commented May 2, 2024

jrelvas-ipc commented May 3, 2024 •

edited

jrelvas-ipc commented May 3, 2024

Andarwinux commented May 3, 2024

Avoid gpu wakeup in hwdec_cuda by deferring cuInit #14028

Avoid gpu wakeup in hwdec_cuda by deferring cuInit #14028

Conversation

jrelvas-ipc commented Apr 30, 2024

github-actions bot commented Apr 30, 2024 • edited

jrelvas-ipc commented Apr 30, 2024

na-na-hi commented Apr 30, 2024 • edited

jrelvas-ipc commented Apr 30, 2024

na-na-hi commented Apr 30, 2024

jrelvas-ipc commented Apr 30, 2024

na-na-hi commented Apr 30, 2024

jrelvas-ipc commented Apr 30, 2024

na-na-hi commented Apr 30, 2024 • edited

jrelvas-ipc commented Apr 30, 2024

jrelvas-ipc commented Apr 30, 2024 • edited

Andarwinux commented Apr 30, 2024 • edited

jrelvas-ipc commented Apr 30, 2024

Dudemanguy commented Apr 30, 2024

Andarwinux commented Apr 30, 2024

jrelvas-ipc commented Apr 30, 2024

Andarwinux commented May 1, 2024

jrelvas-ipc commented May 1, 2024

Andarwinux commented May 2, 2024

jrelvas-ipc commented May 2, 2024 • edited

na-na-hi commented May 2, 2024

jrelvas-ipc commented May 2, 2024

jrelvas-ipc commented May 2, 2024

jrelvas-ipc commented May 3, 2024 • edited

jrelvas-ipc commented May 3, 2024

Andarwinux commented May 3, 2024

github-actions bot commented Apr 30, 2024 •

edited

na-na-hi commented Apr 30, 2024 •

edited

na-na-hi commented Apr 30, 2024 •

edited

jrelvas-ipc commented Apr 30, 2024 •

edited

Andarwinux commented Apr 30, 2024 •

edited

jrelvas-ipc commented May 2, 2024 •

edited

jrelvas-ipc commented May 3, 2024 •

edited