feat: Add support Intel #895
Conversation
Okay, got the 501 under control by adding options for OneAPI and ROCm to it; now I just gotta test it (it runs at least, but I'm not sure whether it's using the GPU), and I somehow need to get the ROCm build under control.
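Purely as a sketch of what "options for OneAPI and ROCm" might look like on the build side: a minimal `build.rs` that toggles llama.cpp CMake flags based on Cargo features. The feature names (`oneapi`, `rocm`), the CMake option names, and the use of the `cmake` crate are assumptions for illustration, not necessarily what this PR does.

```rust
// build.rs — hypothetical sketch, not the actual Tabby build script.
// Assumes the `cmake` build-dependency and Cargo features named `oneapi` and `rocm`.
use std::env;

fn main() {
    let mut config = cmake::Config::new("llama.cpp");

    // Cargo exposes enabled features to build scripts as CARGO_FEATURE_* env vars.
    if env::var("CARGO_FEATURE_ONEAPI").is_ok() {
        // CMake option name is an assumption; llama.cpp's backend flags have changed over time.
        config.define("LLAMA_SYCL", "ON");
    } else if env::var("CARGO_FEATURE_ROCM").is_ok() {
        config.define("LLAMA_HIPBLAS", "ON");
    }

    // Build llama.cpp and tell rustc where to find the resulting static library.
    let dst = config.build();
    println!("cargo:rustc-link-search=native={}", dst.join("lib").display());
    println!("cargo:rustc-link-lib=static=llama");
}
```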
So in theory the Intel container should work now... but for some reason it just doesn't want to use the GPU...
Is there something like the NVIDIA Container Toolkit that needs to be installed for oneAPI?
Nope, you just pass the device through.
So the ROCm image is now definitely working. Edit: The big model works as well, though not as snappy (although the difference isn't too bad? That all definitely needs more investigation). If it works, it works, but especially when switching models the GPU kind of hangs and I need to reboot. It's also always reported at 100% usage.
Great! You might consider extracting ROCm as an individual PR for review to get it checked in.
Yeah, I'll see tomorrow whether I can get a handle on oneAPI or whether I'll postpone that and extract ROCm, but it's 4:30 AM for me, so I really need to do that tomorrow (technically later today). Also, I really hate C and its library linking nonsense... that cost me so much time with this...
Also, note to my future self: I need to figure out what's happening with the cuda_devices list and the Frontend, and match that for ROCm and oneAPI if possible.
@wsxiaoys it would also be great if the IntelliJ extension were available for RustRover; then I could write Tabby code using Tabby. It's probably just a setting or so.
The AMD stuff is "moved" to #902, because it already works pretty okay. Regarding the Intel stuff, I'm slowly going insane, as I already had it "working", but it just doesn't want to actually offload anything to the GPU and most of the time doesn't reference SYCL at all. @wsxiaoys, could we get llama.cpp as a shared library or so? That sounds way easier.
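For context on the shared-library idea: a build script that links against a prebuilt libllama instead of compiling llama.cpp in-tree could look roughly like the sketch below. The `LLAMA_LIB_DIR` variable and the library name are assumptions for illustration, not an existing Tabby or llama.cpp convention.

```rust
// build.rs — hypothetical sketch of linking a prebuilt shared llama.cpp library.
// LLAMA_LIB_DIR is an assumed env var pointing at the directory containing libllama.so.
use std::env;

fn main() {
    let lib_dir = env::var("LLAMA_LIB_DIR")
        .expect("set LLAMA_LIB_DIR to the directory containing libllama.so");

    // Tell rustc where to find the shared library and link it dynamically.
    println!("cargo:rustc-link-search=native={lib_dir}");
    println!("cargo:rustc-link-lib=dylib=llama");

    // Re-run the build script if the variable changes.
    println!("cargo:rerun-if-env-changed=LLAMA_LIB_DIR");
}
```

The appeal is that the GPU backend (SYCL, HIP, CUDA, ...) would then be baked into that prebuilt library rather than into Tabby's own build.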
To confirm: you've been able to make llama.cpp itself work on Intel Arc, but not for Tabby, correct?
Hi, @cromefire
llama.cpp currently does not use SYCL, and the OpenCL implementation uses the CPU for most of the processing. I took a run at this a while back and got it working, but insanely slowly. I have since found out this is a known issue with llama.cpp. There is currently a PR to get SYCL working correctly, and there is also a Vulkan support PR. Unfortunately, without one of these being merged into llama.cpp, Intel dGPUs are going to be very slow.
Well, that explains it... I'll update and wait then... Vulkan of course also sounds awesome, if it's pretty close to CUDA/HIP/SYCL, because then it seems like it should be the standard backend for something like TabbyML, since it'd run everywhere.
It would be great! I am hoping one or the other gets merged soon. BTW, here are some things I put together to test llama.cpp on Arc. The logs show the current speeds I am getting.
I do think both would be good: Vulkan makes a nice and easy default backend, but SYCL might be faster. Vulkan is BTW also an easy solution for AMD on Windows.
Have you tried SYCL vs. Vulkan vs. OpenCL by any chance (if they actually already run)? Because it sounds like OpenCL is pretty useless right now. Also, how did you test; are there any benchmarks available? It would be cool for users, even of something higher-level like Tabby, to know what works best.
I have not, but I hope to try the Vulkan fork this week. It seems like that branch is more complete than the SYCL one.
Be sure to report back (also how you did the tests); I'd really like to test Vulkan vs. ROCm on AMD as well (as ROCm doesn't work on Windows yet).
# Conflicts:
#	crates/llama-cpp-bindings/Cargo.toml
#	crates/llama-cpp-bindings/build.rs
#	crates/tabby/Cargo.toml
#	crates/tabby/src/main.rs
Closing, as Vulkan support will be released in 0.10.
Realized via Intel MKL (oneAPI)
Fixes #631 (Intel)
Depends on #902