
AMD GPUs (ROCm) support #636

Closed

alissonlauffer opened this issue Oct 25, 2023 · 25 comments · Fixed by #1012

Labels
enhancement New feature or request

Comments

@alissonlauffer

It'd be a great addition if we could also run the project GPU-accelerated on AMD GPUs. Thanks!


Please reply with a 👍 if you want this feature.

@alissonlauffer added the enhancement (New feature or request) label on Oct 25, 2023
@alissonlauffer changed the title from "AMD (ROCm) GPUs support" to "AMD GPUs (ROCm) support" on Oct 26, 2023
@cromefire
Contributor

cromefire commented Nov 26, 2023

Experimental support for ROCm-enabled GPUs should now be available in #895. You'll need to build the container yourself, though.

@wwayne
Contributor

wwayne commented Jan 2, 2024

If you are looking for how to enable ROCm, please take a look here:
https://slack.tabbyml.com/elpZRnVmD6j

@nilsocket

If you are looking for how to enable ROCm, please take a look here: https://slack.tabbyml.com/elpZRnVmD6j

I'm just trying to install it.
Thanks a lot.

@cromefire
Contributor

cromefire commented Jan 2, 2024

There's also a proper Linux container you can use in the other branch, but it wasn't merged...
https://github.com/cromefire/tabby/blob/rocm-support/rocm.Dockerfile

It builds everything fresh, producing an optimized Docker image with up-to-date components (instead of the old manylinux base) and only the ROCm parts you actually need.
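
For reference, building it might look roughly like this (a sketch; it assumes the build needs the repository's submodules checked out, and the image tag is arbitrary):

git clone --recurse-submodules --branch rocm-support https://github.com/cromefire/tabby.git
cd tabby
docker build -t tabbyml/tabby-rocm -f rocm.Dockerfile .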

@nilsocket

nilsocket commented Jan 2, 2024 via email

@cromefire
Contributor

There's also a proper Linux container you can use in the other branch, but it wasn't merged...

Oh, thanks a lot.

If you don't mind, can you mention the branch name?

rocm-support, see the link above.

@nilsocket

@cromefire, I read the message in Gmail, so I missed it.

Once again, thanks.

@nilsocket

@cromefire Do any specific flags need to be added?

Unable to start it:

thread 'main' panicked at /root/workspace/crates/tabby-common/src/registry.rs:52:21:
Failed to fetch model organization <TabbyML>: error sending request for url (https://raw.githubusercontent.com/TabbyML/registry-tabby/main/models.json): error trying to connect: error:16000069:STORE routines:ossl_store_get0_loader_int:unregistered scheme:../crypto/store/store_register.c:237:scheme=file, error:80000002:system library:file_open:reason(2):../providers/implementations/storemgmt/file_store.c:267:calling stat(/usr/lib/ssl/certs), error:16000069:STORE routines:ossl_store_get0_loader_int:unregistered scheme:../crypto/store/store_register.c:237:scheme=file, error:80000002:system library:file_open:reason(2):../providers/implementations/storemgmt/file_store.c:267:calling stat(/usr/lib/ssl/certs), error:16000069:STORE routines:ossl_store_get0_loader_int:unregistered scheme:../crypto/store/store_register.c:237:scheme=file, error:80000002:system library:file_open:reason(2):../providers/implementations/storemgmt/file_store.c:267:calling stat(/usr/lib/ssl/certs), error:0A000086:SSL routines:tls_post_process_server_certificate:certificate verify failed:../ssl/statem/statem_clnt.c:1883: (unable to get local issuer certificate)

I ran it with these options:

docker run -it --device /dev/kfd --device /dev/dri/card1 --security-opt seccomp=unconfined --group-add video -p 8080:8080 -v $HOME/.tabby:/data tabby:copilot serve --model TabbyML/DeepseekCoder-6.7B --device rocm

@cromefire
Contributor

That means it's missing ca-certificates. I thought I'd fixed that, but apparently not. For the time being you can just map your system CA certs into the container; I'll have a look later, maybe the fix isn't committed yet.
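
A sketch of that workaround: bind-mount the host CA bundle over the path the error stats (assuming the host keeps its certificates under /etc/ssl/certs):

docker run -it --device /dev/kfd --device /dev/dri/card1 \
  -v /etc/ssl/certs:/usr/lib/ssl/certs:ro \
  -p 8080:8080 -v $HOME/.tabby:/data \
  tabby:copilot serve --model TabbyML/DeepseekCoder-6.7B --device rocm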

@nilsocket

nilsocket commented Jan 2, 2024 via email

@cromefire
Contributor

cromefire commented Jan 2, 2024

Should be fine now; I had fixed it but hadn't committed it.

--security-opt seccomp=unconfined --group-add video

Also, I think you shouldn't even need those.
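
Without them, the run command from above shrinks to something like:

docker run -it --device /dev/kfd --device /dev/dri/card1 \
  -p 8080:8080 -v $HOME/.tabby:/data \
  tabby:copilot serve --model TabbyML/DeepseekCoder-6.7B --device rocm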

@nilsocket

Hi @cromefire,
after pulling your latest commit, it is working fine now.

Thanks for your help.

@alsh

alsh commented Jan 13, 2024

Hello, @cromefire
I've tried building your ROCm image. Now, when I run Tabby, it doesn't seem to use GPU offloading:

curl -X 'GET' \
  'http://localhost:8080/v1/health' \
  -H 'accept: application/json'

{"model":"TabbyML/StarCoder-1B","device":"rocm","arch":"x86_64","cpu_info":"AMD Ryzen 7 3800X 8-Core Processor","cpu_count":16,"accelerators":[],"cuda_devices":[],"version":{"build_date":"2024-01-13","build_timestamp":"2024-01-13T22:38:55.449357661Z","git_sha":"fd0891bd6571e74495c85657b584d7e236d59bd3","git_describe":"fd0891b-dirty"}}

With "accelerators" returning empty list. While inside the docker container, rocminfo recognizes the GPU. What could be wrong?

@cromefire
Contributor

Did you run it with --device rocm?

@alsh

alsh commented Jan 14, 2024

@cromefire Yes, it was run with --device rocm.
I used the exact command from the README file in your fork:

docker run -it \
  --device /dev/dri --device /dev/kfd \
  -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby-rocm \
  serve --model TabbyML/StarCoder-1B --device rocm

The only difference was the image name, which I set differently when building the image.

@cromefire
Contributor

cromefire commented Jan 14, 2024

Can you share the output from rocminfo? There's a regex there that has to match for the device to show up (it'll also work even if it doesn't show up, but the output might give a clue).

@alsh

alsh commented Jan 14, 2024

Can you share the output from rocminfo? There's a regex there that has to match for the device to show up (it'll also work even if it doesn't show up, but the output might give a clue).

Here https://gist.github.com/alsh/97c9ad94274abdf2b41a91857f84781e

@cromefire
Contributor

cromefire commented Jan 14, 2024

Well, there's the problem: your GPU isn't officially supported by ROCm. You can override it to look like a gfx1030 via the override variable (which makes it look like a 6900 XT; the info on that is in the FAQ). I'll also have a look at whether I can just add it as a target...
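
For example, adding the override to the run command above (a sketch assuming the tabbyml/tabby-rocm image; 10.3.0 is the version string corresponding to gfx1030):

docker run -it \
  --device /dev/dri --device /dev/kfd \
  -e HSA_OVERRIDE_GFX_VERSION=10.3.0 \
  -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby-rocm \
  serve --model TabbyML/StarCoder-1B --device rocm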

@alsh

alsh commented Jan 14, 2024

Well, there's the problem: your GPU isn't officially supported by ROCm. You can override it to look like a gfx1030 via the override variable (which makes it look like a 6900 XT; the info on that is in the FAQ). I'll also have a look at whether I can just add it as a target...

I think I found an issue. The line let cmd_res = Command::new("rocminfo").output()?; tries to launch rocminfo, but it is not available on the PATH. The ROCm packages prepared by AMD don't add themselves to the PATH, and rocm.Dockerfile doesn't add ROCm to the PATH either.
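
A one-line sketch of a possible fix in rocm.Dockerfile (assuming ROCm sits under its default /opt/rocm prefix):

ENV PATH="/opt/rocm/bin:${PATH}"

Alternatively, the code could call /opt/rocm/bin/rocminfo by absolute path.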

@cromefire
Contributor

Should be easy enough to fix, but that won't fix your problem, as that line just provides the accelerators metadata, not the actual acceleration.

@alsh

alsh commented Jan 14, 2024

Should be easy enough to fix, but that won't fix your problem, as that line just provides the accelerators metadata, not the actual acceleration.

Yes, I started searching for problems because overriding HSA_OVERRIDE_GFX_VERSION did not fix my problem either.

So, another issue:

  • missing build dependency: rocm-device-libs.

Missing rocm-device-libs effectively left the llama.cpp build without ROCm support, as CMake wasn't able to find the AMDDeviceLibsConfig.cmake file (which this package provides).
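
For anyone hitting the same thing, on AMD's Debian/Ubuntu ROCm repositories this should be a single package (a sketch; the package name may differ on other distros):

apt-get install -y rocm-device-libs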

Now, I've installed rocm-device-libs, and with HSA_OVERRIDE_GFX_VERSION=10.3.0, it finally seems to work!

2024-01-14T23:10:24.274396Z  INFO tabby::serve: crates/tabby/src/serve.rs:116: Starting server, this might take a few minutes...
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm devices:
  Device 0: AMD Radeon RX 6700 XT, compute capability 10.3
2024-01-14T23:10:27.024540Z  INFO tabby::routes: crates/tabby/src/routes/mod.rs:35: Listening at 0.0.0.0:8080

@cromefire
Contributor

cromefire commented Jan 14, 2024

Fixed rocminfo in the container, and while checking out the code locally it seems like your GPU should already have been working. Did you actually check whether it uses your GPU, or did you rely on the accelerators metadata? (Because that isn't accurate at all about whether it works.)

Also, it's now possible to use:

docker build --build-arg AMDGPU_TARGETS="$(/opt/rocm/bin/offload-arch | tr '\n' ';')" -t tabby-docker-rocm -f rocm.Dockerfile .

to build an optimized image that only works on your GPU.

So, another issue:

  • missing build dependency: rocm-device-libs.

Interesting, I don't think I have that issue; I'll investigate.

Edit: Yeah, something is wrong here and it just falls back to the CPU, but I couldn't get it working by installing rocm-device-libs (it's already installed anyway). It seems like something went wrong when I reduced the size of the container; fallbacks to the CPU really suck...

@cromefire
Contributor

cromefire commented Jan 15, 2024

Okay, it should work now (at least it finds the GPU now)... and it should have a safeguard against any future issues of this kind as well, although it doesn't want to return a result... Maybe I should try ROCm 5 for now; ROCm 6 still seems very fresh, and Tabby's copy of llama.cpp is pretty old now...

@alsh

alsh commented Jan 15, 2024

Okay, it should work now (at least it finds the GPU now)... and it should have a safeguard against any future issues of this kind as well, although it doesn't want to return a result... Maybe I should try ROCm 5 for now; ROCm 6 still seems very fresh, and Tabby's copy of llama.cpp is pretty old now...

Yes, now it looks good, and I'm able to start this Docker image with GPU support.

PS:
As for AMDGPU_TARGETS: it's probably of no use, as listing officially unsupported targets will not make them work. It will still crash at runtime, like:

rocBLAS error: Cannot read /opt/rocm-6.0.0/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1031
 List of available TensileLibrary Files : 
"/opt/rocm-6.0.0/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat"
"/opt/rocm-6.0.0/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat"
"/opt/rocm-6.0.0/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat"
"/opt/rocm-6.0.0/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat"
"/opt/rocm-6.0.0/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat"
"/opt/rocm-6.0.0/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat"
"/opt/rocm-6.0.0/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat"
"/opt/rocm-6.0.0/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat"
"/opt/rocm-6.0.0/lib/rocblas/library/TensileLibrary_lazy_gfx940.dat"
"/opt/rocm-6.0.0/lib/rocblas/library/TensileLibrary_lazy_gfx941.dat"
"/opt/rocm-6.0.0/lib/rocblas/library/TensileLibrary_lazy_gfx942.dat"
Aborted (core dumped)

So HSA_OVERRIDE_GFX_VERSION seems to be the only way to go.

@cromefire
Contributor

cromefire commented Jan 15, 2024

As for AMDGPU_TARGETS: it's probably of no use, as listing officially unsupported targets will not make them work. It will still crash at runtime, like:

I know (for now), but it makes the build a lot faster, as it only builds for your GPU. Otherwise it will compile the code for all GPU targets, which takes forever. In your case you'd just put gfx1030 there. Long term, maybe we should build it from source, and then any GPU of the same family supported by LLVM should work.
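
So for this case the build command from above would become:

docker build --build-arg AMDGPU_TARGETS="gfx1030" -t tabby-docker-rocm -f rocm.Dockerfile .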
