Handle_remap not found handle #22

Open
RexQian opened this issue May 25, 2022 · 0 comments

1. Issue or feature description

While using vGPU, the error "Handle_remap not found handle" occurs intermittently.

2. Steps to reproduce the issue

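The problem appears only occasionally. When it does, recreating the pod restores normal operation.
Running nvidia-smi inside the affected pod's container fails with the error shown in the log.

Running nvidia-smi on the host works normally.
Running nvidia-smi in other pods on the same host also works normally.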

3. Information to attach (optional if deemed irrelevant)

Error log

root@service416776181220773888-55d7479f64-tvg9r:/# nvidia-smi
[4pdvGPU Debug(99:140414784235264:libvgpu.c:39)]: init_dlsym

[4pdvGPU Debug(99:140414784235264:libvgpu.c:61)]: into dlsym nvmlInitWithFlags
[4pdvGPU Debug(99:140414784235264:hook.c:542)]: nvmlInitWithFlags
...
[4pdvGPU Debug(99:140414784235264:hook.c:129)]: LOADING cuEventDestroy_v2 89
[4pdvGPU Debug(99:140414784235264:hook.c:129)]: LOADING cuModuleLoadDataEx 90
[4pdvGPU Debug(99:140414784235264:hook.c:129)]: LOADING cuModuleLoadFatBinary 91
[4pdvGPU Debug(99:140414784235264:hook.c:129)]: LOADING cuModuleGetFunction 92
[4pdvGPU Info(99:140414784235264:hook.c:136)]: loaded_cuda_libraries
[4pdvGPU Debug(99:140414784235264:multiprocess_memory_limit.c:476)]: Try create shrreg
[4pdvGPU Debug(99:140414784235264:hook.c:558)]: nvmlInit_v2
[4pdvGPU Debug(99:140414784235264:hook.c:560)]: Hijacking nvmlInit_v2
[4pdvGPU Debug(99:140414784235264:hook.c:542)]: nvmlInitWithFlags
[4pdvGPU Debug(99:140414784235264:hook.c:544)]: Hijacking nvmlInitWithFlags
[4pdvGPU Debug(99:140414784235264:hook.c:472)]: nvmlDeviceGetHandleByIndex_v2 index=0
[4pdvGPU Debug(99:140414784235264:hook.c:476)]: Hijacking nvmlDeviceGetHandleByIndex_v2
[4pdvGPU Debug(99:140414784235264:nvml_entry.c:775)]: Hijacking nvmlDeviceGetUUID
[4pdvGPU Debug(99:140414784235264:hook.c:472)]: nvmlDeviceGetHandleByIndex_v2 index=0
[4pdvGPU Debug(99:140414784235264:hook.c:476)]: Hijacking nvmlDeviceGetHandleByIndex_v2
[4pdvGPU Debug(99:140414784235264:nvml_entry.c:775)]: Hijacking nvmlDeviceGetUUID
[4pdvGPU Debug(99:140414784235264:hook.c:472)]: nvmlDeviceGetHandleByIndex_v2 index=1
[4pdvGPU Debug(99:140414784235264:hook.c:476)]: Hijacking nvmlDeviceGetHandleByIndex_v2
[4pdvGPU Debug(99:140414784235264:nvml_entry.c:775)]: Hijacking nvmlDeviceGetUUID
[4pdvGPU Debug(99:140414784235264:hook.c:472)]: nvmlDeviceGetHandleByIndex_v2 index=2
[4pdvGPU Debug(99:140414784235264:hook.c:476)]: Hijacking nvmlDeviceGetHandleByIndex_v2
[4pdvGPU Debug(99:140414784235264:nvml_entry.c:775)]: Hijacking nvmlDeviceGetUUID
[4pdvGPU ERROR (pid:99 thread=140414784235264 hook.c:285)]: Handle_remap not found handle=7fb4daa19938
nvidia-smi: /home/limengxuan/work/libcuda_override/src/nvml/hook.c:285: handle_remap: Assertion `0' failed.
Aborted (core dumped)
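
Because the failure is intermittent, it may help to scan saved container logs for the failing line automatically. The following is a minimal sketch, assuming the ERROR line always has the format shown in the log above; the function name `find_handle_remap_errors` and the regular expression are illustrative and not part of the vGPU library.

```python
import re

# Matches the ERROR line format seen in the log above (assumed to be stable).
ERROR_RE = re.compile(
    r"\[4pdvGPU ERROR \(pid:(?P<pid>\d+) thread=(?P<thread>\d+) "
    r"(?P<file>[\w.]+):(?P<line>\d+)\)\]: "
    r"Handle_remap not found handle=(?P<handle>[0-9a-fA-F]+)"
)


def find_handle_remap_errors(log_text):
    """Return one dict per 'Handle_remap not found handle' line in log_text."""
    return [m.groupdict() for m in ERROR_RE.finditer(log_text)]


# Sample input copied from the log in this report.
sample = (
    "[4pdvGPU ERROR (pid:99 thread=140414784235264 hook.c:285)]: "
    "Handle_remap not found handle=7fb4daa19938"
)
errors = find_handle_remap_errors(sample)
for err in errors:
    print(err["pid"], err["file"], err["line"], err["handle"])
```

Running this over the output of nvidia-smi from each pod would show which pods hit the assertion and which handle value was not found.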

error-in-container.log

Output of nvidia-smi -a on the host
nvidia-smi-host.txt
