I'm running the v24.9.0 release of the NVIDIA GPU Operator and tried moving my nodes from Talos 1.8.2 to 1.9.0-alpha.2. Since then, the operator has been unable to find and validate the drivers. I previously had to patch the operator's validator logic so that it also searched under /glibc/lib, relative to driverInstallDir, but that patch no longer helps either.
These are the driverInstallDir values I have tried, with no success:
/run/nvidia/driver (the default one from Nvidia)
/usr/local
/usr/local/glibc
/usr/local/glibc/usr
From browsing the Talos filesystem, as far as I can tell, nvidia-smi and the other executables are located in /usr/local/bin, while all the libraries are now located under /usr/lib/glibc/lib64 and symlinked into a few other places as well.
As the NVIDIA components do not search glibc directories by default, I cannot see which value of driverInstallDir would currently allow them to find both the libraries and the required binaries. (For an example of the discovery logic, see the GPU Operator validator at https://github.com/NVIDIA/gpu-operator/blob/79b1240221f22bbbc60c6c4b659aace48f0b3f42/validator/find.go#L35; the discovery of the binaries is a few lines below.)
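To make the problem concrete, here is a minimal Python sketch of this kind of "search a fixed list of directories under a root" discovery. It is not the validator's actual code (which is Go): the function name `find_library`, the candidate directory list, and the use of `libnvidia-ml.so.1` as the probe file are all assumptions chosen for illustration. It only shows why a library that lives under /usr/lib/glibc/lib64 is missed unless that directory is part of the search list.

```python
import glob
import os
import tempfile

# Hypothetical candidate directories, relative to the driver root. These are
# illustrative assumptions, not the list used by the real validator.
CANDIDATE_LIB_DIRS = ["usr/lib64", "usr/lib/x86_64-linux-gnu", "lib64"]


def find_library(driver_root, pattern, candidate_dirs=CANDIDATE_LIB_DIRS):
    """Return the first file matching `pattern` in any candidate dir, or None."""
    for rel_dir in candidate_dirs:
        matches = sorted(glob.glob(os.path.join(driver_root, rel_dir, pattern)))
        if matches:
            return matches[0]
    return None


def demo():
    """Build a throwaway tree that mimics the layout described above and search it."""
    with tempfile.TemporaryDirectory() as root:
        # Place a dummy library only under usr/lib/glibc/lib64.
        lib_dir = os.path.join(root, "usr", "lib", "glibc", "lib64")
        os.makedirs(lib_dir)
        open(os.path.join(lib_dir, "libnvidia-ml.so.1"), "w").close()

        # Search with the default (assumed) list, then with the glibc dir added.
        default_hit = find_library(root, "libnvidia-ml.so.*")
        extended_hit = find_library(
            root,
            "libnvidia-ml.so.*",
            CANDIDATE_LIB_DIRS + ["usr/lib/glibc/lib64"],
        )
        return default_hit, os.path.relpath(extended_hit, root)
```

Calling `demo()` returns `None` for the default search and a path under `usr/lib/glibc/lib64` once that directory is added, which mirrors the situation here: no single root makes a fixed list like this cover both /usr/local/bin and /usr/lib/glibc/lib64.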
From the description of c7eb377, it seemed like it should "just work" now with the gpu operator. Any pointers as to what I might be doing wrong?
I now see why my earlier modifications no longer work: the libraries are no longer found in /usr/local/glibc/lib, because that folder is not symlinked like the others.