"Handle_remap not found handle" error occasionally occurs when using vgpu
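The problem happens only occasionally, and recreating the pod restores normal behavior. When it happens, running nvidia-smi inside the pod's container fails.
Running nvidia-smi on the host works normally, and nvidia-smi also works normally in other pods on the same host.
Error log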
root@service416776181220773888-55d7479f64-tvg9r:/# nvidia-smi
[4pdvGPU Debug(99:140414784235264:libvgpu.c:39)]: init_dlsym
[4pdvGPU Debug(99:140414784235264:libvgpu.c:61)]: into dlsym nvmlInitWithFlags
[4pdvGPU Debug(99:140414784235264:hook.c:542)]: nvmlInitWithFlags
...
[4pdvGPU Debug(99:140414784235264:hook.c:129)]: LOADING cuEventDestroy_v2 89
[4pdvGPU Debug(99:140414784235264:hook.c:129)]: LOADING cuModuleLoadDataEx 90
[4pdvGPU Debug(99:140414784235264:hook.c:129)]: LOADING cuModuleLoadFatBinary 91
[4pdvGPU Debug(99:140414784235264:hook.c:129)]: LOADING cuModuleGetFunction 92
[4pdvGPU Info(99:140414784235264:hook.c:136)]: loaded_cuda_libraries
[4pdvGPU Debug(99:140414784235264:multiprocess_memory_limit.c:476)]: Try create shrreg
[4pdvGPU Debug(99:140414784235264:hook.c:558)]: nvmlInit_v2
[4pdvGPU Debug(99:140414784235264:hook.c:560)]: Hijacking nvmlInit_v2
[4pdvGPU Debug(99:140414784235264:hook.c:542)]: nvmlInitWithFlags
[4pdvGPU Debug(99:140414784235264:hook.c:544)]: Hijacking nvmlInitWithFlags
[4pdvGPU Debug(99:140414784235264:hook.c:472)]: nvmlDeviceGetHandleByIndex_v2 index=0
[4pdvGPU Debug(99:140414784235264:hook.c:476)]: Hijacking nvmlDeviceGetHandleByIndex_v2
[4pdvGPU Debug(99:140414784235264:nvml_entry.c:775)]: Hijacking nvmlDeviceGetUUID
[4pdvGPU Debug(99:140414784235264:hook.c:472)]: nvmlDeviceGetHandleByIndex_v2 index=0
[4pdvGPU Debug(99:140414784235264:hook.c:476)]: Hijacking nvmlDeviceGetHandleByIndex_v2
[4pdvGPU Debug(99:140414784235264:nvml_entry.c:775)]: Hijacking nvmlDeviceGetUUID
[4pdvGPU Debug(99:140414784235264:hook.c:472)]: nvmlDeviceGetHandleByIndex_v2 index=1
[4pdvGPU Debug(99:140414784235264:hook.c:476)]: Hijacking nvmlDeviceGetHandleByIndex_v2
[4pdvGPU Debug(99:140414784235264:nvml_entry.c:775)]: Hijacking nvmlDeviceGetUUID
[4pdvGPU Debug(99:140414784235264:hook.c:472)]: nvmlDeviceGetHandleByIndex_v2 index=2
[4pdvGPU Debug(99:140414784235264:hook.c:476)]: Hijacking nvmlDeviceGetHandleByIndex_v2
[4pdvGPU Debug(99:140414784235264:nvml_entry.c:775)]: Hijacking nvmlDeviceGetUUID
[4pdvGPU ERROR (pid:99 thread=140414784235264 hook.c:285)]: Handle_remap not found handle=7fb4daa19938
nvidia-smi: /home/limengxuan/work/libcuda_override/src/nvml/hook.c:285: handle_remap: Assertion `0' failed.
Aborted (core dumped)
error-in-container.log
Output of nvidia-smi -a on the host: nvidia-smi-host.txt
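For anyone triaging a similar log, here is a minimal sketch of a helper that pulls out the device indices queried and the first ERROR entry. It is not part of the vgpu library: the function name `summarize_log` and the regular expressions are assumptions, written only against the line shapes visible in the excerpt above, and the log is assumed to contain one entry per line.

```python
import re

# Patterns are assumptions based only on the log lines shown in this issue.
INDEX_RE = re.compile(r"nvmlDeviceGetHandleByIndex_v2 index=(\d+)")
ERROR_RE = re.compile(
    r"\[4pdvGPU ERROR \(pid:(\d+) thread=(\d+) ([\w.]+):(\d+)\)\]: (.*)"
)
HANDLE_RE = re.compile(r"handle=([0-9a-fA-F]+)")


def summarize_log(text):
    """Return the device indices queried and the first ERROR entry, if any.

    Hypothetical helper; expects one log entry per line.
    """
    indices = [int(i) for i in INDEX_RE.findall(text)]
    error = None
    match = ERROR_RE.search(text)
    if match:
        pid, _thread, src, line, message = match.groups()
        handle = HANDLE_RE.search(message)
        error = {
            "pid": int(pid),
            "source": f"{src}:{line}",
            "message": message.strip(),
            "handle": handle.group(1) if handle else None,
        }
    return {"indices": indices, "error": error}


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
[4pdvGPU Debug(99:140414784235264:hook.c:472)]: nvmlDeviceGetHandleByIndex_v2 index=0
[4pdvGPU Debug(99:140414784235264:hook.c:476)]: Hijacking nvmlDeviceGetHandleByIndex_v2
[4pdvGPU Debug(99:140414784235264:hook.c:472)]: nvmlDeviceGetHandleByIndex_v2 index=1
[4pdvGPU Debug(99:140414784235264:hook.c:472)]: nvmlDeviceGetHandleByIndex_v2 index=2
[4pdvGPU ERROR (pid:99 thread=140414784235264 hook.c:285)]: Handle_remap not found handle=7fb4daa19938
"""

if __name__ == "__main__":
    print(summarize_log(SAMPLE))
```

Running it on a log from a failing pod and on one from a healthy pod on the same host makes it easier to see at which device index the two diverge.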