feat: Add Support for Devices other than CUDA in Telemetry and Web UI #902
Okay, the GPU name is in the UI now as well. Currently it is more of an ID, but I've opened an issue with the library to get the marketing (user-recognizable) name: PTFOPlayer/rocm_smi_lib_rs#2
The documentation is there now as well; it should be okay for a start.
Thanks for the PR.
First round of comments, focusing on the source code.
/// Universally unique ID of the accelerator, if available
pub uuid: Option<String>,
The biggest issue now is whether we can prevent this field from being part of the telemetry. If not, it definitely has to be removed, as it is almost equivalent to a serial number. It's nice for tracking a GPU among the accelerators, but it's a huge issue if that data leaves your control.
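One way to keep the UUID out of telemetry is to give telemetry its own type that has no such field, so the value cannot leak by accident. The following is only a minimal sketch of that idea, not the code in this PR: apart from `uuid`, every name here (`Accelerator`, `display_name`, `AcceleratorTelemetry`) is hypothetical and chosen for illustration.

```rust
/// Hypothetical accelerator description kept locally (e.g. for the web UI).
/// Only the `uuid` field is taken from the reviewed snippet; the rest is assumed.
#[derive(Debug, Clone)]
pub struct Accelerator {
    /// Universally unique ID of the accelerator, if available
    pub uuid: Option<String>,
    /// Assumed field: a human-readable device name
    pub display_name: String,
}

/// Hypothetical telemetry payload. It deliberately has no `uuid` field,
/// so the identifier cannot be serialized or sent along with it.
#[derive(Debug, Clone, PartialEq)]
pub struct AcceleratorTelemetry {
    pub display_name: String,
}

impl From<&Accelerator> for AcceleratorTelemetry {
    /// Copies only the non-identifying fields into the telemetry view.
    fn from(accelerator: &Accelerator) -> Self {
        Self {
            display_name: accelerator.display_name.clone(),
        }
    }
}
```

With a split like this, the local UI can keep using `Accelerator` (including `uuid`) while anything that leaves the machine is built from `AcceleratorTelemetry::from(&accelerator)`.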
As for the why I use |
Hey, considering the increasing size of this pull request, I have extracted #913 to focus on basic ROCm support as a preliminary step.
The other PR didn't add any docs. I tried the command from the documentation of this PR but couldn't get it to work; is there a trick I'm missing?
The other PR didn't include the Docker image; you'll have to build it yourself.
Sadly, I am unable to run this Docker image:
my docker-compose.yml:
You'll probably need to remove your iGPU from the container, since ROCm sometimes has problems with it. Alternatively, I think the llama.cpp docs also mention an environment variable that makes it use only a specific GPU.
# Conflicts: # crates/llama-cpp-bindings/build.rs # ee/tabby-ui/app/(dashboard)/cluster/components/worker-card.tsx # ee/tabby-ui/app/(dashboard)/page.tsx # ee/tabby-ui/lib/gql/generates/gql.ts # ee/tabby-ui/lib/gql/generates/graphql.ts # ee/tabby-ui/lib/gql/request-documents.tsx # ee/tabby-ui/lib/hooks/use-workers.ts # ee/tabby-webserver/src/api.rs # ee/tabby-webserver/src/lib.rs # ee/tabby-webserver/src/service/worker.rs # ee/tabby-webserver/ui/404.html # ee/tabby-webserver/ui/_next/static/KwTUFhDElVXkDE3jiCtD0/_buildManifest.js # ee/tabby-webserver/ui/_next/static/KwTUFhDElVXkDE3jiCtD0/_ssgManifest.js # ee/tabby-webserver/ui/_next/static/b_TXnwhyGI26zeMbbpjKm/_buildManifest.js # ee/tabby-webserver/ui/_next/static/b_TXnwhyGI26zeMbbpjKm/_ssgManifest.js # ee/tabby-webserver/ui/_next/static/chunks/app/playground/page-178c9cbcb60d3ef7.js # ee/tabby-webserver/ui/_next/static/yoSiYgCA8KR8UtHATY8_5/_buildManifest.js # ee/tabby-webserver/ui/_next/static/yoSiYgCA8KR8UtHATY8_5/_ssgManifest.js # ee/tabby-webserver/ui/index.html # ee/tabby-webserver/ui/index.txt # ee/tabby-webserver/ui/playground.html # ee/tabby-webserver/ui/playground.txt # ee/tabby-webserver/ui/swagger.html # ee/tabby-webserver/ui/swagger.txt # website/docs/extensions/troubleshooting.md # website/docs/installation/docker-compose.mdx # website/docs/installation/docker.mdx
Updating this from main is starting to get complicated, because there are conflicts in a lot of places, so it would be great to get this merged soon. I'm trying to fix it up now, but I'm not sure how long I can keep doing this; it's quite time-consuming.
Thank you! I can verify that pinning devices works perfectly now (1bc6f48):
# Conflicts: # ee/tabby-ui/app/(dashboard)/cluster/components/cluster.tsx
Also, the llama.cpp fork should probably be updated sooner rather than later, as there are some fixes for AMD GPU performance in there.
Hey @cromefire - I would suggest first having a PR that checks in the Docker image / binary build; then we can review how to integrate ROCm device information into the dashboard.
Sure. I'll extract that. The changes to support more accelerators are the hardest to maintain though. PS: Also keep in mind that it won't only be ROCm device info soon, but probably oneAPI and/or Vulkan as well, which is why I made this be a more generic |
* Added rocm builds and documentation
* Pulled build improvements from #902
* Fixed build container for rocm build
* Install git in rocm container
* Fixed github step
* Try to fix if statement
* Added more generic dependency installation
* upgraded rustup action
* Update sccache
* Try pytorch manylinux image
* Switched location for toolchain parameter
* Downgraded to deprecated action again
* Readded set default step
* Install minimal rocm on the fly
* fixed typo in binary name
* Downgraded checkout action
* Use curl to download
* Add -y flag to yum
* Also install rocblas
* Update release.yml
* Update release.yml
* Update prepare_build_environment.sh
* Update prepare_build_environment.sh
* Update build.rs
* Update build.rs
* Update README.md
* Update website/docs/faq.mdx
* Update index.md
* Update and rename docker-cuda.yml to docker.yml
* Delete .github/workflows/docker-rocm.yml
* Delete rocm.Dockerfile
* Rename cuda.Dockerfile to Dockerfile
* Update docker.yml
* Update website/docs/installation/docker.mdx
* Update website/docs/installation/docker-compose.mdx
* Update docker-compose.mdx
* Update docker-compose.mdx
* Update docker.mdx
* Update docker.mdx
* Update website/docs/faq.mdx
---------
Co-authored-by: Meng Zhang <[email protected]>
# Conflicts: # .github/workflows/release.yml # Cargo.lock # crates/llama-cpp-bindings/src/llama.rs # crates/tabby/src/worker.rs # ee/tabby-webserver/src/api.rs # ee/tabby-webserver/src/lib.rs # ee/tabby-webserver/ui/404.html # ee/tabby-webserver/ui/_next/static/Yg0h7wsN0w8kbqBtlggWe/_buildManifest.js # ee/tabby-webserver/ui/_next/static/Yg0h7wsN0w8kbqBtlggWe/_ssgManifest.js # ee/tabby-webserver/ui/_next/static/b_TXnwhyGI26zeMbbpjKm/_buildManifest.js # ee/tabby-webserver/ui/_next/static/b_TXnwhyGI26zeMbbpjKm/_ssgManifest.js # ee/tabby-webserver/ui/_next/static/chunks/787-9e31941ac5498659.js # ee/tabby-webserver/ui/_next/static/chunks/app/(dashboard)/page-178d78b0cb76c3d8.js # ee/tabby-webserver/ui/_next/static/chunks/app/(dashboard)/team/page-ec2e0dfcc033c8d6.js # ee/tabby-webserver/ui/_next/static/chunks/app/auth/signin/page-ba1372efadfb09ae.js # ee/tabby-webserver/ui/_next/static/chunks/app/auth/signup/page-259c5cda08cde501.js # ee/tabby-webserver/ui/_next/static/chunks/app/layout-da8c0f298b52b0a9.js # ee/tabby-webserver/ui/_next/static/chunks/app/playground/page-33c06e2d963b9bd9.js # ee/tabby-webserver/ui/_next/static/f19OhW5W2SP9IXexWzvSm/_buildManifest.js # ee/tabby-webserver/ui/_next/static/f19OhW5W2SP9IXexWzvSm/_ssgManifest.js # ee/tabby-webserver/ui/api.html # ee/tabby-webserver/ui/api.txt # ee/tabby-webserver/ui/auth/signin.html # ee/tabby-webserver/ui/auth/signin.txt # ee/tabby-webserver/ui/auth/signup.html # ee/tabby-webserver/ui/auth/signup.txt # ee/tabby-webserver/ui/cluster.html # ee/tabby-webserver/ui/cluster.txt # ee/tabby-webserver/ui/index.html # ee/tabby-webserver/ui/index.txt # ee/tabby-webserver/ui/playground.html # ee/tabby-webserver/ui/playground.txt # ee/tabby-webserver/ui/team.html # ee/tabby-webserver/ui/team.txt # website/docs/faq.mdx
I just resolved the conflicts again. Another friendly reminder: the longer these changes lie around, the more time they take up, because I constantly have to resolve conflicts.
# Conflicts: # .github/workflows/release.yml # ee/tabby-webserver/graphql/schema.graphql # ee/tabby-webserver/src/lib.rs # ee/tabby-webserver/ui/404.html # ee/tabby-webserver/ui/_next/static/DR2uz83UxeFune2UqDmYc/_buildManifest.js # ee/tabby-webserver/ui/_next/static/DR2uz83UxeFune2UqDmYc/_ssgManifest.js # ee/tabby-webserver/ui/_next/static/IyjC1gd1mIwIqFgfamAQE/_buildManifest.js # ee/tabby-webserver/ui/_next/static/IyjC1gd1mIwIqFgfamAQE/_ssgManifest.js # ee/tabby-webserver/ui/_next/static/chunks/320-558d0cf728abd693.js # ee/tabby-webserver/ui/_next/static/chunks/358-42f7d20a1d08acdf.js # ee/tabby-webserver/ui/_next/static/chunks/787-7044efff94e13a32.js # ee/tabby-webserver/ui/_next/static/chunks/936-c5db2bc813a2dffc.js # ee/tabby-webserver/ui/_next/static/chunks/app/(dashboard)/page-dcf4098462fb2820.js # ee/tabby-webserver/ui/_next/static/chunks/app/(dashboard)/team/page-9f81e0759cc05ccf.js # ee/tabby-webserver/ui/_next/static/chunks/app/auth/signup/page-f91f45006a9deb29.js # ee/tabby-webserver/ui/_next/static/chunks/app/layout-6661c62a6a0742fa.js # ee/tabby-webserver/ui/_next/static/chunks/webpack-1456af27b04dd17f.js # ee/tabby-webserver/ui/_next/static/chunks/webpack-3e396986a653b40a.js # ee/tabby-webserver/ui/_next/static/chunks/webpack-543c81bccbb8217e.js # ee/tabby-webserver/ui/_next/static/css/40752a0bacee63c0.css # ee/tabby-webserver/ui/_next/static/css/724bf7d939e05f2e.css # ee/tabby-webserver/ui/_next/static/css/98dd5f72c9a55366.css # ee/tabby-webserver/ui/_next/static/f19OhW5W2SP9IXexWzvSm/_buildManifest.js # ee/tabby-webserver/ui/_next/static/f19OhW5W2SP9IXexWzvSm/_ssgManifest.js # ee/tabby-webserver/ui/api.html # ee/tabby-webserver/ui/api.txt # ee/tabby-webserver/ui/auth/signin.html # ee/tabby-webserver/ui/auth/signin.txt # ee/tabby-webserver/ui/auth/signup.html # ee/tabby-webserver/ui/auth/signup.txt # ee/tabby-webserver/ui/cluster.html # ee/tabby-webserver/ui/cluster.txt # ee/tabby-webserver/ui/index.html # ee/tabby-webserver/ui/index.txt # 
ee/tabby-webserver/ui/playground.html # ee/tabby-webserver/ui/playground.txt # ee/tabby-webserver/ui/team.html # ee/tabby-webserver/ui/team.txt # website/docs/faq.mdx
Super nice! When will the PR be merged?
# Conflicts: # crates/tabby/src/main.rs
Realized via AMD hipBLAS (ROCm/HIP)
Fixes #636