Support vllm quantization #7297
base: main
Conversation
```diff
@@ -27,14 +27,31 @@ def __init__(self, status_code, message):

 # check if vllm is installed
-def validate_environment(model: str):
+def validate_environment(model: str, optional_params: Union[dict, None]):
```
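For context, a minimal sketch of the idea behind this change, not the PR's actual implementation: the default values shown (`quantization`, `dtype`, `trust_remote_code`) and the error handling are placeholders, but merging vllm's defaults with `optional_params` and returning three values matches the diff above.

```python
from typing import Any, Optional, Tuple


def validate_environment(model: str, optional_params: Optional[dict]) -> Tuple[Any, Any, dict]:
    """Check that the vllm SDK is installed, then build the engine with
    vllm's defaults merged with any caller-supplied overrides."""
    try:
        from vllm import LLM, SamplingParams
    except ImportError as e:
        raise ImportError("vllm is not installed. Run `pip install vllm` to use this provider.") from e

    # Hypothetical defaults for illustration; the PR's actual default set may differ.
    default_params = {
        "quantization": None,  # e.g. "awq", "gptq"
        "dtype": "auto",
        "trust_remote_code": False,
    }
    # Caller-supplied optional_params win over the defaults.
    default_params.update(optional_params or {})

    llm = LLM(model=model, **default_params)
    return llm, SamplingParams, default_params
```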
this should be implemented in transformation.py -
class VLLMConfig(HostedVLLMChatConfig):
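A rough sketch of the shape the reviewer is suggesting. The import path for `HostedVLLMChatConfig` and the method names (`get_default_vllm_params`, `map_vllm_params`) are assumptions introduced here for illustration; litellm's real config base class may expose different hooks.

```python
# Sketch only: import path and helper names are assumptions, not litellm's actual API.
from litellm.llms.hosted_vllm.chat.transformation import HostedVLLMChatConfig


class VLLMConfig(HostedVLLMChatConfig):
    """Provider config for the local vllm integration, holding engine defaults
    such as quantization so handler.py does not need to hard-code them."""

    @staticmethod
    def get_default_vllm_params() -> dict:
        # Hypothetical defaults mirroring what the PR merges in handler.py.
        return {
            "quantization": None,  # e.g. "awq", "gptq"
            "dtype": "auto",
        }

    def map_vllm_params(self, optional_params: dict) -> dict:
        # Caller-supplied values override the defaults.
        params = self.get_default_vllm_params()
        params.update(optional_params or {})
        return params
```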
```diff
@@ -142,7 +159,7 @@ def batch_completions(
     )
     """
     try:
-        llm, SamplingParams = validate_environment(model=model)
+        llm, SamplingParams, optional_params = validate_environment(model=model, optional_params=optional_params)
```
can you add a mock test w/ screenshot of this working?
similar to this -
async def test_azure_ai_with_image_url():
ideally this test would not add the vllm sdk as a dep on the ci/cd pipeline (so maybe use a magicmock object here)
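Something along these lines could satisfy that request without adding the vllm SDK to CI. The module path `litellm.llms.vllm.completion.handler` and the exact return shape of `validate_environment` are assumptions based on the files touched in this PR.

```python
import sys
from unittest.mock import MagicMock, patch


def test_vllm_quantization_forwarded_mock():
    # Stand-in for the vllm SDK so the test never imports the real package,
    # keeping vllm out of the CI/CD dependency set as requested.
    fake_vllm = MagicMock()

    with patch.dict(sys.modules, {"vllm": fake_vllm}):
        # Assumed module path, based on vllm/completion/handler.py in this PR.
        from litellm.llms.vllm.completion.handler import validate_environment

        llm, SamplingParams, params = validate_environment(
            model="TheBloke/Llama-2-7B-AWQ",
            optional_params={"quantization": "awq"},
        )

        # The quantization setting should survive the merge with the defaults...
        assert params["quantization"] == "awq"
        # ...and the engine should have been constructed through the mocked SDK.
        fake_vllm.LLM.assert_called_once()
```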
Hi @krrishdholakia, thank you for your comments. I have moved the default parameters to transformation.py as you suggested.
Title
I have implemented a feature for loading quantized models with vllm. The current version does not support quantized models with vllm.
Relevant issues
There is no relevant issue.
Type
🆕 New Feature
Changes
I changed `validate_environment` in `vllm/completion/handler.py` to support loading quantized versions of models. This is done by providing several of vllm's default parameters and updating them with the parameters from `optional_params`.
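For illustration, a hedged usage sketch of how a caller might load an AWQ-quantized model through this change. It assumes the `vllm/` model prefix routes to the local vllm provider and that extra kwargs such as `quantization` end up in `optional_params`; the model name is only an example.

```python
import litellm

# Assumed usage: `quantization` is forwarded to vllm.LLM via optional_params.
response = litellm.completion(
    model="vllm/TheBloke/Llama-2-7B-AWQ",  # example AWQ-quantized model
    messages=[{"role": "user", "content": "Hello!"}],
    quantization="awq",
)
print(response.choices[0].message.content)
```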
[REQUIRED] Testing - Attach a screenshot of any new tests passing locally
If UI changes, send a screenshot/GIF of working UI fixes