
amdgpu anyone? #8

Open
walterav1984 opened this issue Apr 27, 2022 · 5 comments
Labels
help wanted Extra attention is needed

Comments

@walterav1984

walterav1984 commented Apr 27, 2022

Found this extensive OpenCL docker topic on how to do OpenCL with amdgpu. There are, however, 'some' side notes concerning AMD GPU support: the problem is a mix of hardware diversity and deprecated driver support across different product lines over time. If you are considering adding AMDGPU support to your container approach, please continue reading. It will involve a lot of IF/ELSE conditions, but could be very useful.

First, we have the open-source kernel driver amdgpu, which ships as the default driver in most Linux distros. This driver can be extended with Mesa's open-source OpenGL support, which also ships in most Linux distros. Both are maintained and work great, from the new 5xxx/6xxx Navi/Big Navi down to Polaris RX 4xx/5xx, and even the legacy GCN 1-3 GPUs (HD 7xxx to R9 3xx). Mesa's OpenGL performance outruns AMD's own closed-source OpenGL drivers by huge margins and keeps improving, not only for newer but also for dated GPUs. Most distro kernels also ship kfd besides amdgpu; this component extends the free open-source kernel driver with compute capabilities, closed and open source. However, some PRO graphics apps like DaVinci Resolve require the closed AMDGPU proprietary OpenGL version besides OpenCL!

This closed OpenGL driver has been a hard requirement since D.R. 16.1 for working with AMD's proprietary OpenCL stack (interop). Between D.R. 16.1 and 17 this proprietary OpenGL + OpenCL interop requirement could still be worked around, but since D.R. 17 it is really required! The good news is that the closed AMDGPU OpenGL driver can co-exist on the same system with the already-installed kernel amdgpu driver and Mesa OpenGL driver, and can be manually loaded/linked with shell variables for specific programs.
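For example, a minimal wrapper in the spirit of AMD's own progl script could look like this (the library path and the GLVND vendor name are assumptions based on where the Pro packages normally install, so double-check against your extracted tree):

    #!/bin/bash
    # run-with-progl.sh -- hypothetical wrapper: run ONE program against the
    # proprietary AMDGPU-PRO OpenGL libs while the rest of the system keeps Mesa.
    # Assumes the Pro libs were extracted to /opt/amdgpu-pro/lib/x86_64-linux-gnu.
    PRO_LIBS=/opt/amdgpu-pro/lib/x86_64-linux-gnu

    # Prepend the proprietary libGL and friends only for this process tree.
    export LD_LIBRARY_PATH="${PRO_LIBS}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

    # On GLVND systems, select AMD's GLX vendor library (libGLX_amd.so).
    export __GLX_VENDOR_LIBRARY_NAME=amd

    exec "$@"

Usage would be e.g. ./run-with-progl.sh /opt/resolve/bin/resolve, leaving every other program on Mesa.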

Now the OpenCL stack on Linux for AMDGPU of course comes in many (forced) flavours again, but it ships with the same AMDGPU proprietary driver from the AMD website that also provides the proprietary OpenGL:

* AMD closed-source OpenCL ORCA/Legacy, GCN 1-4 -- less dated, 2021 (useful for D.R.)
* AMD closed-source OpenCL PAL, Vega -- very dated, 2020 (useful for D.R.)
* AMD ROCm open-source OpenCL -- dated releases worked for older GPUs in the past, and newer releases might again for newer GPUs; nothing in between...
* AMD ROCr -- partly replaces PAL (useful for D.R.)
* Mesa Clover open-source OpenCL -- dated (not useful for D.R.)
* Mesa RustiCL open-source OpenCL -- new (not yet tested)
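To check which of these stacks is actually active on a system, clinfo plus the registered ICD files are usually enough (the paths below are the common defaults, not guaranteed on every distro):

    # List the OpenCL platforms the ICD loader can see, one line each.
    clinfo -l

    # Which ICDs are registered? Each .icd file names one vendor library.
    cat /etc/OpenCL/vendors/*.icd

    # Full dump: confirm the GPU shows up under the expected platform,
    # e.g. "AMD Accelerated Parallel Processing" for the closed stacks,
    # "Clover" or "rusticl" for the Mesa ones.
    clinfo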

Around spring 2021, AMD dropped the pre-Polaris GCN 1-3 GPUs (HD 7xxx to R9 3xx) from the AMDGPU proprietary driver (on all platforms, Windows and Linux), and OpenCL support broke even earlier... The good news is that newer cards ((Big) Navi, RDNA 1/2) still profit from a similar proprietary OpenGL and OpenCL replacement/linking trick.

To summarize: for this docker+AMD DaVinci Resolve support to be useful, at least two directions are needed, past and future. First, detect whether the distro/kernel version is compatible with the AMDGPU-PRO driver, or add a workaround that forces the driver install / extracts its libs on distro/kernel versions other than the officially supported ones. Then, for AMDGPUs before Polaris (GCN 1-3, HD 7xxx to R9 3xx), filter by device ID/name and download the dated older 21.20 driver (the 21.30/40/50 releases, although still listed on AMD's own website, are broken for ORCA OpenCL), extract it, and statically link only the proprietary OpenGL together with the statically linked ORCA/Legacy OpenCL for those cards. I'm not sure whether they still work with the newest amdgpu/kfd kernel driver that ships with newer distros; that still has to be tested. Then for newer Polaris/Vega/RDNA 1/2 ((Big) Navi) cards, download the latest driver and do the same identification, but again with multiple directions: OpenCL PAL, or perhaps even ROCm for Vega (removed since 20.40/45?), then ROCr for anything after that, like RDNA 1/2. A rough detection sketch follows.
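As a first building block, a hypothetical detection step could look like this (vendor ID 1002 is AMD/ATI; mapping the reported device IDs to GPU families would still need a real lookup table):

    #!/bin/bash
    # detect-amdgpu.sh -- sketch: gather the facts needed to choose a route.

    # Distro and kernel, to match against what the AMDGPU-PRO driver supports.
    . /etc/os-release
    echo "Distro: ${ID} ${VERSION_ID}, kernel: $(uname -r)"

    # All AMD/ATI display-class devices, with numeric [vendor:device] IDs.
    lspci -nn -d 1002: | grep -Ei 'vga|display|3d'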

This is of course an extreme simplification, but it has worked in the past on separate GPUs using different distros. The first thing to try is whether I can get D.R. 17.x working in supported Ubuntu 20.04 with 21.1030 OpenGL/OpenCL within Docker. Then try a newer D.R. 18.x, or maybe a newer Ubuntu, and see if that works?

Never mind the choices in the AMD Vulkan landscape on Linux ;-) ...

EDIT: Took the AMD proprietary driver versions suggested by Thomas Debesse.

@fat-tire
Owner

Dang- that's confusing! I didn't realize the open source driver was better than the closed-source vendor one, tho-- cool beans!

Definitely see if you can get the container working w/AMD-- just rip out all the stuff having to do with nvidia from the Dockerfile and the build scripts and manually replace it with what works for you. Once you have a working version, maybe we can figure out how to auto-detect the driver running on the host to auto-build the container so that it passes through or installs whatever would need to be there.
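(FWIW, a crude way to auto-detect the vendor on the host is just to look at which kernel module is loaded -- an untested sketch, not what build.sh currently does:)

    # Which GPU kernel driver is the host running?
    if lsmod | grep -q '^nvidia'; then
        echo nvidia
    elif lsmod | grep -q '^amdgpu'; then
        echo amdgpu
    else
        echo unknown
    fi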

It sounds like there's kind of a tree (as you say, a lot of if/thens) to support older hardware... it might be easier to be "forward-looking" and get newer stuff working, as no doubt the hardware reqs will continue to advance. But the goal scenario would be an "it just works" experience where it picks up everything locally and then installs the right version whether you have AMD, NVIDIA, or something else. That's kinda what it's doing now with NVIDIA -- the build will detect and install the same driver in the container as running on the host.

Most important is to "scratch your itch" with the hardware you have-- so if you can get this working well- that would be amazing! Let us know your progress, and if anyone else with an AMD GPU wants to jump in with advice or just your experience, that would be great too.

Thx again!

@walterav1984
Author

> Dang- that's confusing! I didn't realize the open source driver was better than the closed-source vendor one, tho-- cool beans!

It's almost paradoxical: the out-of-the-box open-source amdgpu kernel driver and Mesa OpenGL support shipped with distros have been OK for almost 10 years, including video decode acceleration. But the improvement over the last 3 years is incredible, and even surpasses Windows performance in a lot of situations! OpenCL and video encode, on the contrary, are worse...

Good news, without docker yet: I did some quick-and-dirty testing yesterday of the AMD proprietary driver 21.20 (built for Ubuntu 20.04), which is the last one that supports Legacy/ORCA OpenCL for HD 7xxx / GCN 1-3, to see if it still works on Ubuntu 22.04, and it does.

Although Ubuntu 22.04 comes with a newer open-source amdgpu/kfd kernel driver and Mesa OpenGL, I was able to manually extract the older proprietary OpenGL drivers and link them for use by specific apps (I have a script).
For OpenCL, however, I just took the easy route and installed ORCA/Legacy using the driver installer, which didn't complain about the non-matching Ubuntu/kernel release version. With a special install argument it installs only OpenCL support and touches neither the kernel drivers nor OpenGL, but I'd rather have it manually extracted...
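If I recall the 21.20 installer options correctly, the OpenCL-only route and the manual-extraction route look roughly like this (the flag names and the package name should be double-checked against the tarball):

    # OpenCL only: no kernel module (DKMS) and no OpenGL parts, headless.
    ./amdgpu-pro-install --opencl=legacy --headless --no-dkms

    # Manual route instead: unpack a .deb from the tarball without installing
    # it system-wide, then point apps at the extracted libs.
    dpkg-deb -x opencl-orca-amdgpu-pro-icd_*_amd64.deb /opt/amdgpu-pro-extract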

With this combination I first re-tested where I had stopped a while ago, and saw that D.R. 17.1 (linking the proprietary OpenGL and using the installed ORCA OpenCL) worked again with 6K BRAW videos. Then I tested D.R. 17.4.6, which I couldn't get packaged as a .deb (MakeResolveDeb), so I purged 17.1 and manually installed 17.4.6 using Blackmagic's own .run file, which also ran fine. Then D.R. beta 18b1, which also started and scrolled through 6K BRAW video fine.

No extensive testing yet by far, but it may show that it's possible to cherry-pick the right components needed.

Sadly my docker skills are mostly absent; only some awful experience in the past with BTRFS auto filesystem snapshotting under docker, which ate all my HDD space like crazy...

About your docker Nvidia implementation: may the docker container adjust the host system to its needs, for instance install the missing or optimal Nvidia driver on the host and then a matching one inside its own container?

I also see you are running CentOS inside docker, so I think I first have to verify that the amdgpu open-source driver works with the cherry-picked proprietary OpenGL & OpenCL drivers on CentOS first. Which with CentOS 7 is prehistoric, and with 8 is deprecated...

Maybe first some dockerless bash scripts for an Ubuntu host only, which can determine the distro/kernel release, the amdgpu model, and the matching driver download; then expand that to docker with an Ubuntu guest, and then maybe a CentOS guest?

@fat-tire
Owner

Yeah, so the way the docker nvidia implementation works is basically like this:

  • it starts with a "base" of Centos Stream 8-- which is a successor to Centos 7/8 from RedHat (now IBM). Centos is the native OS preferred by DaVinci Resolve, and though the script originally ran with the old Centos 8, this became deprecated/EOL at the beginning of the year, so Centos Stream 8 it is! Once set up as the "base", the system gets updated to the latest existing packages and a couple of specific new packages Resolve requires are installed using the normal Stream OS package manager dnf (the successor to yum, I believe).

  • The version of Nvidia's driver running on the host is detected, and when building the image, the container-specific part is downloaded from Nvidia -- in the CentOS container -- and then installed via the .run installer. Below are the lines in the Dockerfile that handle this; the ${ENVIRONMENT_VARIABLES} are passed in as arguments from the build.sh file that kicks off the image build.

       && curl https://us.download.nvidia.com/XFree86/Linux-${ARCH}/${NVIDIA_VERSION}/NVIDIA-Linux-${ARCH}-${NVIDIA_VERSION}.run -o /tmp/NVIDIA-Linux-${ARCH}-${NVIDIA_VERSION}.run \
       && bash /tmp/NVIDIA-Linux-${ARCH}-${NVIDIA_VERSION}.run --no-kernel-module --no-kernel-module-source --run-nvidia-xconfig --no-backup --no-questions --ui=none \
       && rm -f /tmp/NVIDIA-Linux-${ARCH}-${NVIDIA_VERSION}.run \

^^^ This is the bit that would probably need to change for the AMD version. We would have to make the Dockerfile a bit more dynamic to handle the AMD case once you can get it working. But those would be the lines you'd change to get/add-in the AMD bits the container needs to run.
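A hypothetical AMD equivalent might end up looking something like the lines below -- the URL is just a placeholder (AMD's actual download location and tarball layout would have to be confirmed), ${AMDGPU_VERSION} is an assumed build argument, and since the container base is CentOS it would need the RHEL build of the driver:

       && curl <AMD-driver-tarball-url>/amdgpu-pro-${AMDGPU_VERSION}-rhel-8.4.tar.xz -o /tmp/amdgpu-pro.tar.xz \
       && tar -xf /tmp/amdgpu-pro.tar.xz -C /tmp \
       && /tmp/amdgpu-pro-${AMDGPU_VERSION}/amdgpu-pro-install --opencl=legacy --headless --no-dkms \
       && rm -rf /tmp/amdgpu-pro* \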

  • Then, a user named "resolve" is added, and a home directory created for that user-- there's no reason to run as "root" in the container. In fact, with Podman, the whole container is actually running under the host's non-root user anyway.

  • The DaVinci_Resolve_Linux.zip (previously downloaded by the user from BlackMagic) contains an installer which is run from the container's /tmp to... install the files into the container so they can be run!

  • Finally, there's a cleanup of /tmp and various installation packages that don't need to stick around... to reduce the size of the container image.

As far as the best way to proceed with the AMD -- i.e., whether you think it makes more sense to first get it all working on the host and only then transfer the useful bits to the container, or whether it makes more sense to just go for it and try to get it working all at once -- I'll leave this to your judgement. FWIW, once I could run OTHER NVIDIA-type programs on the host, I just did some reading on NVIDIA's site about running NVIDIA CUDA/OPENGL apps in containers to realize that simply downloading & running the NVIDIA installer would get what I needed. Then it was a matter of choosing the correct version of the driver based on which version was running on the host. (I didn't have to deal with anything but x86_64 because that's the only architecture supported by BlackMagic.)
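(For reference, the host-side version check on the NVIDIA side can be a one-liner; on the AMD side the mainline kernel module doesn't always expose a version string, so the Pro stack's version would probably have to be tracked some other way:)

    # NVIDIA: ask the driver for its version directly.
    nvidia-smi --query-gpu=driver_version --format=csv,noheader

    # AMD: in-kernel amdgpu version, if the module sets one (DKMS builds do;
    # mainline kernels may print nothing here).
    modinfo -F version amdgpu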

@walterav1984
Author

walterav1984 commented Apr 30, 2022

The official AMD 21.20 drivers that may work for CentOS currently target 8.4; will that match well enough with the current rolling-release-like behaviour of CentOS Stream? If so...

Then I might first try CentOS 8.4 on bare metal and use your Dockerfile as a guide to install D.R., then download the AMD driver, try to extract the proprietary-OpenGL-related .rpms to a folder, and maybe do the same for OpenCL if the installer won't work, or won't work with non-matching CentOS release/version tags. Then expand the CMD command to run resolve with some env variables pointing at the OpenGL libs. Then I might try the current CentOS 8 Stream rolling release on bare metal, and if that still works with the cherry-picked dated AMD drivers, I'll dive into the docker part. (Extraction sketch below.)
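Extracting an .rpm to a folder without installing it should be straightforward with the standard tools (the file name here is only an example):

    # Unpack a driver .rpm into a scratch tree without installing it.
    # rpm2cpio and cpio are standard tools on CentOS.
    mkdir -p /opt/amdgpu-pro-extract && cd /opt/amdgpu-pro-extract
    rpm2cpio /path/to/some-amdgpu-pro-package.rpm | cpio -idmv
    # Libraries land under ./opt/amdgpu-pro/lib64 or similar (path may vary),
    # ready to be referenced via LD_LIBRARY_PATH from the container's CMD.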

For detecting which AMDGPU models need which OpenCL library, I think at least 3 routes will do (dated 21.20, very dated 20.40, and very new LATEST); actually 2 (dated and very new) would be maintainable, but 1 route will be a user choice, since one GPU generation, Vega (GFX9), has two options (very dated 20.40 (PAL) / very new LATEST), and since I don't have that hardware I cannot choose what fits. For a simple 2-way split, the line runs between legacy hardware (HD 7xxx, RX 1xx-R3xx, GCN 1-3, with dropped driver support: ORCA OpenCL) and newer hardware (RX 4xx, RX 5xx, Vega, Navi, Big Navi, with still-current official driver support: ROCr OpenCL). Something like the sketch below.
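As a sketch of that split (the family names and the mapping are illustrative; a real implementation would derive them from PCI device IDs):

    # Hypothetical routing: GPU family -> OpenCL driver route.
    route_for_family() {
        case "$1" in
            GCN1|GCN2|GCN3) echo "21.20 + ORCA/Legacy OpenCL" ;;        # HD 7xxx..R9 3xx
            VEGA)           echo "user choice: 20.40 PAL or LATEST ROCr" ;;
            POLARIS|NAVI*)  echo "LATEST + ROCr OpenCL" ;;              # RX 4xx/5xx, RDNA 1/2
            *)              echo "unsupported -- fall back to Mesa" ;;
        esac
    }

    route_for_family VEGA   # -> user choice: 20.40 PAL or LATEST ROCr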

Don't have an ETA for testing yet, but thanks a lot already for pointing out the specifics.

@fat-tire
Owner

fat-tire commented May 2, 2022

Awesome! Please give it a shot and keep us all abreast of your progress. I'll keep this issue open for any updates, as well as background for anyone else with an AMD machine who wants to jump in!

@fat-tire added the "help wanted" (Extra attention is needed) label Sep 16, 2022