Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid gpu wakeup in hwdec_cuda by deferring cuInit #14028

Merged
merged 2 commits into from May 6, 2024

Conversation

jrelvas-ipc
Copy link
Contributor

cuInit wakes up the nvidia dgpu on nvidia laptops. This is bad news because the wake up process
is blocking and takes a few seconds. It also needlessly increases power consumption.

Sometimes, a VO loads several hwdecs (like dmabuf_wayland). When cuda is loaded, it calls
cuInit before running all interop inits. However, the first checks in the interops do not
require cuda initialization, so we only need to call cuInit after those checks.

cuInit is handled by the new cuda_priv_init function. It ensures cuInit is only called once.

With these changes, there's no cuda initialization if no OpenGL/Vulkan backend is available. This prevents
dmabuf_wayland and other VOs which automatically load cuda from waking up the nvidia dgpu unnecessarily,
making them start faster and decreasing power consumption on laptops.

Fixes: #13668

video/out/hwdec/hwdec_cuda.c Outdated Show resolved Hide resolved
Copy link

github-actions bot commented Apr 30, 2024

Download the artifacts for this pull request:

Windows
macOS

@jrelvas-ipc
Copy link
Contributor Author

Fixed signing and commit name

video/out/hwdec/hwdec_cuda.c Outdated Show resolved Hide resolved
@na-na-hi
Copy link
Contributor

na-na-hi commented Apr 30, 2024

I don't like how this is implemented. The whole point is to check whether the RA is GL or PL before processing, right? Can't we do something like this?

typedef struct cuda_interop_fn {
    bool (*check)(const struct ra_hwdec *hw);
    bool (*init)(const struct ra_hwdec *hw);
};

bool check_valid = false;
for (int i = 0; interop_fns[i]; i++) {
    if (interop_fns[i]->check(hw)) {
        check_valid = true;
        break;
    }
}

if (check_valid)
    ret = CHECK_CU(cu->cuInit(0));

// Do the rest of autoprobing and init here

@jrelvas-ipc
Copy link
Contributor Author

I don't like how this is implemented. The whole point is to check whether the RA is GL or PL before processing, right? Can't we do something like this?

Yeah, there's no problem with that, I considered implementing that way too, I just wasn't sure which method would be the preferred. I can make a new PR implementing that logic instead.

@na-na-hi
Copy link
Contributor

I'd prefer this way if others think this approach is OK. No need to open another PR. You can just rebase this one.

@jrelvas-ipc
Copy link
Contributor Author

...actually, it might be desirable to fallback to another interop if the init fails in one of them.

Right now, we keep trying until one of the interop inits returns true. In your implementation, if the check returns but init doesn't, then there's no fallback.

@na-na-hi
Copy link
Contributor

You can just check if all check functions fail. Only call cuInit if fn isn't NULL after the checks, and then do the usual autoprobing.

@jrelvas-ipc
Copy link
Contributor Author

So, is it fine to just return an error if the check is successful but the init isn't?

I was just worried since the current behavior is to go through all valid inits.

@na-na-hi
Copy link
Contributor

na-na-hi commented Apr 30, 2024

I meant you only do early exit if none of the checks are successful. Otherwise proceed as before and try initialize ALL interops regardless.

@jrelvas-ipc
Copy link
Contributor Author

Ah I see. That should work well as long as the interop init has no dependency on its check.

I guess that could easily be fixed by calling check within init itself as well.

@jrelvas-ipc
Copy link
Contributor Author

jrelvas-ipc commented Apr 30, 2024

Here's how it stands then:

  • try all checks
  • if no checks are successful -> exit
  • initialize cuda
  • try all interop inits (includes check)

--

I could also put it all in a single loop and initialize cuda after the first successful check.

@Andarwinux
Copy link
Contributor

Andarwinux commented Apr 30, 2024

This caused crash on win32 when hwdec auto switch with gpu-context switch.

cmd:
mpv --no-config --gpu-context=d3d11 --hwdec=d3d11va,nvdec --vo=gpu-next

set gpu-context winvk

backtrace
(3e0c.4f64): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
mpv!cuda_priv_init+0xd08:
00007ff7`536eda38 498b0424        mov     rax,qword ptr [r12] ds:00000000`00000000=????????????????
0:017> g
(3e0c.4f64): Access violation - code c0000005 (!!! second chance !!!)
mpv!cuda_priv_init+0xd08:
00007ff7`536eda38 498b0424        mov     rax,qword ptr [r12] ds:00000000`00000000=????????????????
0:017> g
(3e0c.4f64): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
mpv!cuda_priv_init+0xd08:
00007ff7`536eda38 498b0424        mov     rax,qword ptr [r12] ds:00000000`00000000=????????????????
0:017> g
(3e0c.4f64): Access violation - code c0000005 (!!! second chance !!!)
mpv!cuda_priv_init+0xd08:
00007ff7`536eda38 498b0424        mov     rax,qword ptr [r12] ds:00000000`00000000=????????????????
0:017> !analyze -v
*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************


KEY_VALUES_STRING: 1

    Key  : AV.Dereference
    Value: NullPtr

    Key  : AV.Fault
    Value: Read

    Key  : Analysis.CPU.mSec
    Value: 1233

    Key  : Analysis.Elapsed.mSec
    Value: 2488

    Key  : Analysis.IO.Other.Mb
    Value: 0

    Key  : Analysis.IO.Read.Mb
    Value: 85

    Key  : Analysis.IO.Write.Mb
    Value: 84

    Key  : Analysis.Init.CPU.mSec
    Value: 1203

    Key  : Analysis.Init.Elapsed.mSec
    Value: 55657

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 260

    Key  : Failure.Bucket
    Value: NULL_POINTER_READ_c0000005_mpv.exe!cuda_priv_init

    Key  : Failure.Hash
    Value: {5878f643-fc30-6545-e7ad-7aca85b61bd1}

    Key  : Timeline.OS.Boot.DeltaSec
    Value: 199568

    Key  : Timeline.Process.Start.DeltaSec
    Value: 55

    Key  : WER.OS.Branch
    Value: rs_prerelease

    Key  : WER.OS.Version
    Value: 10.0.26040.1000

    Key  : WER.Process.Version
    Value: 2.0.0.0


NTGLOBALFLAG:  0

APPLICATION_VERIFIER_FLAGS:  0

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 00007ff7536eda38 (mpv!cuda_priv_init+0x0000000000000d08)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: 0000000000000000
Attempt to read from address 0000000000000000

FAULTING_THREAD:  00004f64

PROCESS_NAME:  mpv.exe

READ_ADDRESS:  0000000000000000 

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%p referenced memory at 0x%p. The memory could not be %s.

EXCEPTION_CODE_STR:  c0000005

EXCEPTION_PARAMETER1:  0000000000000000

EXCEPTION_PARAMETER2:  0000000000000000

FAULTING_LOCAL_VARIABLE_NAME:  p

STACK_TEXT:  
000000c3`6d7fea20 00007ff7`536ee64e     : 00000000`000000a0 000001ff`008c4440 00000000`00000000 000001ff`008c4440 : mpv!cuda_priv_init+0xd08
000000c3`6d7fea80 00007ff7`536edb1e     : 00000000`00000000 00000000`00000000 000001ff`0675d730 000001ff`0675d730 : mpv!cuda_vk_init+0x9e
000000c3`6d7feb00 00007ff7`526978cf     : 000000c3`6d7fe6e0 00000000`00000002 00002242`d90ddb82 00007ff7`565583a6 : mpv!cuda_init+0x2e
000000c3`6d7feb60 00007ff7`5269778e     : 00000000`00000000 00000000`00000000 000000c3`6d7fec18 000000c3`6c7feb80 : mpv!ra_hwdec_load_driver+0xcf
000000c3`6d7febe0 00007ff7`526974f0     : 00000000`00000000 00000000`00000000 00000000`00000010 000000c3`6d7fec60 : mpv!load_add_hwdec+0x4e
000000c3`6d7fec40 00007ff7`52695dff     : 00000000`00000008 00007ffc`8e553b15 00000000`00100008 000001fe`fdd70000 : mpv!ra_hwdec_ctx_load_fmt+0xf0
000000c3`6d7feca0 00007ff7`52c15887     : 000001ff`d7bfe760 00000000`0000022c 00000000`00000000 000000c3`6d7ff5d0 : mpv!control+0x5cf
000000c3`6d7ff5c0 00007ff7`52668d82     : 00000000`00000000 000001ff`ad8fff90 000f423f`fff0bdc0 000000ee`63d60e50 : mpv!run_control+0x47
000000c3`6d7ff630 00007ff7`525b3451     : 00000000`00000000 000001ff`bd5ff630 00000000`00000000 00000000`00000000 : mpv!mp_dispatch_queue_process+0x372
000000c3`6d7ff6c0 00007ffc`8bd4bcc0     : 00000000`00000000 000001ff`00fdb850 00000000`00000000 00000000`00000000 : mpv!vo_thread+0x1e1
000000c3`6d7ff8e0 00007ffc`8da98d17     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ucrtbase!thread_start<unsigned int (__cdecl*)(void *),1>+0x30
000000c3`6d7ff910 00007ffc`8e538260     : 00000000`00000001 000000c3`6d8ff000 00000000`00000000 00000000`00000000 : KERNEL32!BaseThreadInitThunk+0x17
000000c3`6d7ff940 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x20


FAULTING_SOURCE_LINE:  /build/packages/mpv-prefix/src/mpv-build/../../../../src_packages/mpv/video/out/hwdec/hwdec_cuda.c

FAULTING_SOURCE_FILE:  /build/packages/mpv-prefix/src/mpv-build/../../../../src_packages/mpv/video/out/hwdec/hwdec_cuda.c

FAULTING_SOURCE_LINE_NUMBER:  80

FAULTING_SOURCE_CODE:  
No source found for '/build/packages/mpv-prefix/src/mpv-build/../../../../src_packages/mpv/video/out/hwdec/hwdec_cuda.c'


SYMBOL_NAME:  mpv!cuda_priv_init+d08

MODULE_NAME: mpv

IMAGE_NAME:  mpv.exe

STACK_COMMAND:  dt ntdll!LdrpLastDllInitializer BaseDllName ; dt ntdll!LdrpFailureData ; ~17s ; .cxr ; kb

FAILURE_BUCKET_ID:  NULL_POINTER_READ_c0000005_mpv.exe!cuda_priv_init

OS_VERSION:  10.0.26040.1000

BUILDLAB_STR:  rs_prerelease

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

IMAGE_VERSION:  2.0.0.0

FAILURE_ID_HASH:  {5878f643-fc30-6545-e7ad-7aca85b61bd1}

Followup:     MachineOwner
---------

@jrelvas-ipc
Copy link
Contributor Author

Hmm, I'm rebasing this PR later this day so it implements the check suggestion. It should be less bug prone, so hopefully you stop getting that access violation crash. @Andarwinux

@Dudemanguy
Copy link
Member

Can't you just use ra_is_wldmabuf to check this?

@Andarwinux
Copy link
Contributor

This also makes sense for d3d11, where users obviously don't want to accidentally wake up dGPU while playing video on laptops using iGPU via d3d11, but it should also be taken into account that mpv may change gpu-context/vo/hwdec at runtime.

@jrelvas-ipc
Copy link
Contributor Author

Rebased. Initial interop checks have been split into check. CUDA initialization only happens after the first successful check.

@Andarwinux
Copy link
Contributor

../../../../src_packages/mpv/video/out/hwdec/hwdec_cuda.c:62:7: error: expected expression
   62 |     &(static cuda_interop_fn) {.check = cuda_gl_check, .init = cuda_gl_init},
      |       ^
../../../../src_packages/mpv/video/out/hwdec/hwdec_cuda.c:81:5: error: function definition is not allowed here
   81 |     {
      |     ^
../../../../src_packages/mpv/video/out/hwdec/hwdec_cuda.c:106:19: error: call to undeclared function 'backend_init'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
  106 |             ret = backend_init();
      |                   ^
3 errors generated.

@kasper93 kasper93 added this to the Release v0.39.0 milestone May 1, 2024
video/out/hwdec/hwdec_cuda.c Outdated Show resolved Hide resolved
video/out/hwdec/hwdec_cuda.c Outdated Show resolved Hide resolved
@jrelvas-ipc jrelvas-ipc requested a review from philipl May 1, 2024 23:52
@jrelvas-ipc
Copy link
Contributor Author

Lesson learned: Do not forget to configure meson to use clang...

@Andarwinux
Copy link
Contributor

hwdec=nvdec doesn't work after switching gpu-context at runtime.

mpv --no-config --vo=gpu-next --gpu-context=d3d11 --hwdec=nvdec,d3d11va --msg-level=all=debug --log-file=log.txt .\8k.mkv

set gpu-context winvk

log.txt

[ 34.178][v][vd] Looking at hwdec hevc-nvdec...
[ 34.178][v][vo/gpu-next] Loading hwdec drivers for format: 'cuda'
[ 34.178][v][vo/gpu-next] Loading hwdec driver 'cuda'
[ 34.178][e][vo/gpu-next/cuda] CUDA hwdec only works with OpenGL or Vulkan backends.
[ 34.178][v][vo/gpu-next] Loading failed.
[ 34.179][v][vd] Could not create device.
[ 34.179][v][vd] Using software decoding.

@jrelvas-ipc
Copy link
Contributor Author

jrelvas-ipc commented May 2, 2024

@Andarwinux that's weird. As far as I know, this change shouldn't be changing which contexts CUDA can run in

Is this something specific with d3d11? A d3d11 cuda interop wasn't implemented by this change and it didn't exist before either

@na-na-hi
Copy link
Contributor

na-na-hi commented May 2, 2024

Neither of the check functions ever return true. Of course it doesn't work.

@jrelvas-ipc
Copy link
Contributor Author

Oh yeah... That's certainly an issue.

Not sure how I missed that honestly.

I'm rebasing this PR into a single commit soon. I'll include the fixed checks and move cuda_interop_fn structs out of hwdec_cuda.c

@jrelvas-ipc
Copy link
Contributor Author

Check functions were fixed, moved cuda_interop_fn structs into extern declarations.

@jrelvas-ipc jrelvas-ipc force-pushed the master branch 2 times, most recently from 59d1371 to e9eaac6 Compare May 2, 2024 16:32
`cuInit` wakes up the nvidia dgpu on nvidia laptops. This is bad news because the wake up process
is blocking and takes a few seconds. It also needlessly increases power consumption.

Sometimes, a VO loads several hwdecs (like `dmabuf_wayland`). When `cuda` is loaded, it calls
`cuInit` before running all interop inits. However, the first checks in the interops do not
require cuda initialization, so we only need to call `cuInit` after those checks.

This commit splits the interop `init` function into `check` and `init`. `check` can be called without
initializing the Cuda backend, so cuInit is only called *after* the first interop check.

With these changes, there's no cuda initialization if no OpenGL/Vulkan backend is available. This prevents
`dmabuf_wayland` and other VOs which automatically load cuda from waking up the nvidia dgpu unnecessarily,
making them start faster and decreasing power consumption on laptops.

Fixes: mpv-player#13668
This simplifies the code and makes it easier to read.
@jrelvas-ipc
Copy link
Contributor Author

jrelvas-ipc commented May 3, 2024

Just a small change which makes the check functions more readable :)

(Should have no behavior changes)

@jrelvas-ipc
Copy link
Contributor Author

@Andarwinux let me know if your issue still happens with these new fixes

@Andarwinux
Copy link
Contributor

@Andarwinux let me know if your issue still happens with these new fixes

It's still working perfectly, thanks!

@kasper93 kasper93 merged commit 1759d73 into mpv-player:master May 6, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dmabuf-wayland wakes up discrete NVIDIA GPU
8 participants