AMD GPUs (ROCm) support #636
Comments
Experimental support for ROCm-enabled GPUs should now be available in #895. You'll need to build the container yourself, though. |
If you are looking for how to enable ROCm, please take a look here |
I'm just trying to install. |
There's also a proper Linux container you can use in the other branch, but it wasn't merged... It'll freshly build it and build an optimized docker version with the latest stuff (instead of using some old manylinux stuff) and only the ROCm parts you actually need. |
Oh, thanks a lot.
If you don't mind, can you mention the branch name?
|
@cromefire, I read the message in Gmail, so I missed it. Once again. |
@cromefire Any specific flags need to be added? Unable to start it:
thread 'main' panicked at /root/workspace/crates/tabby-common/src/registry.rs:52:21:
Failed to fetch model organization <TabbyML>: error sending request for url (https://raw.githubusercontent.com/TabbyML/registry-tabby/main/models.json): error trying to connect: error:16000069:STORE routines:ossl_store_get0_loader_int:unregistered scheme:../crypto/store/store_register.c:237:scheme=file, error:80000002:system library:file_open:reason(2):../providers/implementations/storemgmt/file_store.c:267:calling stat(/usr/lib/ssl/certs), error:16000069:STORE routines:ossl_store_get0_loader_int:unregistered scheme:../crypto/store/store_register.c:237:scheme=file, error:80000002:system library:file_open:reason(2):../providers/implementations/storemgmt/file_store.c:267:calling stat(/usr/lib/ssl/certs), error:16000069:STORE routines:ossl_store_get0_loader_int:unregistered scheme:../crypto/store/store_register.c:237:scheme=file, error:80000002:system library:file_open:reason(2):../providers/implementations/storemgmt/file_store.c:267:calling stat(/usr/lib/ssl/certs), error:0A000086:SSL routines:tls_post_process_server_certificate:certificate verify failed:../ssl/statem/statem_clnt.c:1883: (unable to get local issuer certificate)
I ran it with these options:
docker run -it --device /dev/kfd --device /dev/dri/card1 --security-opt seccomp=unconfined --group-add video -p 8080:8080 -v $HOME/.tabby:/data tabby:copilot serve --model TabbyML/DeepseekCoder-6.7B --device rocm |
That means it's missing ca-certificates. I thought I'd fixed that, but apparently not. For the time being you can just map your system CA certs into the container, and I'll have a look later; maybe it's not committed yet. |
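For anyone hitting the same error, here is a minimal sketch of what "mapping your system CA certs into the container" could look like. It assumes a Debian/Ubuntu-style host where the CA bundle lives at /etc/ssl/certs/ca-certificates.crt (other distros use other paths), and it relies on OpenSSL honoring the SSL_CERT_FILE environment variable. The helper name `ca_cert_args` is made up for illustration, and the last command only prints the resulting `docker run` line (a dry run) rather than executing it.

```shell
# Assumption: Debian/Ubuntu-style host bundle path; adjust for your distro.
CA_BUNDLE="${CA_BUNDLE:-/etc/ssl/certs/ca-certificates.crt}"

# Hypothetical helper: print the extra `docker run` arguments that mount the
# host bundle read-only and point OpenSSL at it via SSL_CERT_FILE.
ca_cert_args() {
  printf '%s\n' -v "${CA_BUNDLE}:${CA_BUNDLE}:ro" -e "SSL_CERT_FILE=${CA_BUNDLE}"
}

# Dry run: print the full command instead of executing it.
# The remaining options are copied from the command quoted above.
echo docker run -it --device /dev/kfd --device /dev/dri/card1 \
  --security-opt seccomp=unconfined --group-add video \
  -p 8080:8080 -v "$HOME/.tabby:/data" \
  $(ca_cert_args) \
  tabby:copilot serve --model TabbyML/DeepseekCoder-6.7B --device rocm
```

Drop the leading `echo` to actually run it. Once the image ships its own ca-certificates, the extra mount should no longer be necessary.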
Sure, thanks for that info.
|
Should be fine now, I had it fixed, but not committed.
Also I think you shouldn't even need those. |
Hi, @cromefire Thanks for your help. |
Hello, @cromefire
With "accelerators" returning empty list. While inside the docker container, rocminfo recognizes the GPU. What could be wrong? |
Did you run it with |
@cromefire Yes, it was run with
The only difference was the image name, which I set to be somewhat different when building the image. |
Can you share the output from rocminfo? There's a regex there that has to match for the GPU to show up in the list (it'll also work if it doesn't show up, but it might give a clue). |
Here https://gist.github.com/alsh/97c9ad94274abdf2b41a91857f84781e |
Well, your GPU isn't officially supported by ROCm; that's the problem. You can override it to look like a gfx1030 via the override variable (this makes it look like a 6900XT; the info on that is in the FAQ). I'll also have a look at whether I can maybe just add it as a target... |
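A rough sketch of the override, assuming the variable in question is HSA_OVERRIDE_GFX_VERSION and that gfx1030 corresponds to the value 10.3.0. The helper `gfx_to_hsa_version` and the default image name are hypothetical, and the helper only handles target names made of decimal digits (major, then one digit each for minor and stepping). The final line prints the `docker run` command as a dry run instead of executing it.

```shell
# Hypothetical helper: turn a target name such as gfx1030 into the dotted
# form used by HSA_OVERRIDE_GFX_VERSION (10.3.0). Decimal-only names only.
gfx_to_hsa_version() {
  digits="${1#gfx}"
  case "$digits" in
    ''|*[!0-9]*) echo "unsupported target: $1" >&2; return 1 ;;
  esac
  step="${digits#"${digits%?}"}"   # last digit: stepping
  rest="${digits%?}"
  minor="${rest#"${rest%?}"}"      # second-to-last digit: minor
  major="${rest%?}"                # everything before that: major
  [ -n "$major" ] || return 1
  echo "${major}.${minor}.${step}"
}

# Assumption: image name; use whatever tag you built the image with.
IMAGE="${IMAGE:-tabby-docker-rocm}"

# Dry run: print the command with the override passed into the container.
echo docker run -it --device /dev/kfd --device /dev/dri \
  -e "HSA_OVERRIDE_GFX_VERSION=$(gfx_to_hsa_version gfx1030)" \
  -p 8080:8080 -v "$HOME/.tabby:/data" \
  "$IMAGE" serve --model TabbyML/DeepseekCoder-6.7B --device rocm
```

Whether the override actually works depends on how close your GPU is to the target it pretends to be, so treat this as something to try rather than a guaranteed fix.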
I think I found an issue. This line |
Should be easy enough to fix, but that won't fix your problem, as that line just provides the |
Yes, I started to search for problems because overriding HSA_OVERRIDE_GFX_VERSION did not fix my problem too. So, another issue:
Missing Now, I've installed
|
Fixed Also it's now possible to use: docker build --build-arg AMDGPU_TARGETS="$(/opt/rocm/bin/offload-arch | tr '\n' ';')" -t tabby-docker-rocm -f rocm.Dockerfile . to build an optimized image that only works on your GPU.
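To illustrate what the `tr '\n' ';'` part of that build command does: it joins a one-target-per-line list into the semicolon-separated form passed as AMDGPU_TARGETS. The sketch below uses made-up sample input in place of real `/opt/rocm/bin/offload-arch` output, and `join_targets` is just an illustrative name.

```shell
# Join one-target-per-line input into the semicolon-separated form
# passed as AMDGPU_TARGETS in the build command above.
join_targets() {
  tr '\n' ';'
}

# Sample input standing in for /opt/rocm/bin/offload-arch output
# (the two targets are illustrative, not taken from a real machine).
printf 'gfx1030\ngfx900\n' | join_targets   # prints gfx1030;gfx900;
echo
```

Note that the result keeps a trailing `;` because the last line also ends in a newline.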
Interesting, I don't have that issue I think, I'll investigate. Edit: Yeah something is wrong here and it just falls back on the CPU, but I couldn't get it working by installing |
Okay it should work now (at least it finds the GPU now)... and should have a safeguard against any future issues of this kind as well, although it doesn't want to return a result... maybe I should try ROCm 5 for now, ROCm 6 is still very very fresh it seems and the llama.cpp copy of tabby is pretty old now... |
Yes, now it looks good, and I'm able to start this docker image with GPU support. PS:
So HSA_OVERRIDE_GFX_VERSION seems to be the only way to go. |
I know (for now), but it makes the build a lot faster, as it only builds for your GPU. Otherwise it will compile the code for all GPU targets, which takes forever. In your case you'd just put |
It'd be a great addition if we could also run the project GPU-accelerated on AMD GPUs. Thanks!
Please reply with a 👍 if you want this feature.