vllm inference plugin #2967
base: master
Conversation
Signed-off-by: Daniel Sola <[email protected]>
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##           master    #2967      +/-   ##
==========================================
+ Coverage   75.71%   76.45%   +0.73%
==========================================
  Files         214      200      -14
  Lines       21598    20922     -676
  Branches     2693     2694       +1
==========================================
- Hits        16352    15995     -357
+ Misses       4489     4202     -287
+ Partials      757      725      -32
```

☔ View full report in Codecov by Sentry.
This is huge!
plugins/flytekit-inference/flytekitplugins/inference/vllm/serve.py
Signed-off-by: Daniel Sola <[email protected]>
```python
        mem: str = "10Gi",
    ):
        """
        Initialize NIM class for managing a Kubernetes pod template.
```
Suggested change:

```diff
-        Initialize NIM class for managing a Kubernetes pod template.
+        Initialize VLLM class for managing a Kubernetes pod template.
```
lovely!

This would work really well with actors on Union
Why are the changes needed?

A vLLM addition to the existing flytekitplugins-inference plugin, which already supports NIM and Ollama.

What changes were proposed in this pull request?

A vLLM plugin that lets you easily create a pod template to serve a vLLM model in an init container for a Flyte task. The user passes a Hugging Face secret name and the Hugging Face model they want to serve.
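For readers skimming the description, a usage sketch may help. This is a minimal sketch assuming the plugin mirrors the existing NIM/Ollama pattern in flytekitplugins-inference: the class name `VLLM` comes from the review suggestion above and `mem="10Gi"` from the diff, but the keyword names `hf_secret_key` and `model`, the example model, and the `pod_template` attribute are illustrative assumptions, not confirmed API.

```python
from flytekit import Secret, task
from flytekitplugins.inference import VLLM  # class name per the review suggestion above

# Hypothetical constructor call: only the Hugging Face secret, the model,
# and the mem="10Gi" default are grounded in this PR; kwarg names are assumed.
vllm_server = VLLM(
    hf_secret_key="hf-token",    # name of the Hugging Face token secret (assumed kwarg)
    model="google/gemma-2b-it",  # Hugging Face model to serve (example value)
    mem="10Gi",                  # memory default visible in the diff above
)

@task(
    pod_template=vllm_server.pod_template,  # pod template built by the plugin (assumed attribute)
    secret_requests=[Secret(key="hf-token")],
)
def ask(prompt: str) -> str:
    # The task body would query the vLLM server started in the init container,
    # e.g. via its OpenAI-compatible HTTP endpoint on localhost.
    ...
```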
How was this patch tested?
Unit tests and running a remote workflow from the README.