Support vllm quantization #7297
base: main
Conversation
```diff
@@ -27,14 +27,31 @@ def __init__(self, status_code, message):

 # check if vllm is installed
-def validate_environment(model: str):
+def validate_environment(model: str, optional_params: Union[dict, None]):
```
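For context, a minimal sketch of the idea behind this change, not the PR's actual implementation: the default values shown (`quantization`, `dtype`, `trust_remote_code`) and the error handling are placeholders, but merging vllm's defaults with `optional_params` and returning three values matches the diff above.

```python
from typing import Any, Optional, Tuple


def validate_environment(model: str, optional_params: Optional[dict]) -> Tuple[Any, Any, dict]:
    """Check that the vllm SDK is installed, then build the engine with
    vllm's defaults merged with any caller-supplied overrides."""
    try:
        from vllm import LLM, SamplingParams
    except ImportError as e:
        raise ImportError("vllm is not installed. Run `pip install vllm` to use this provider.") from e

    # Hypothetical defaults for illustration; the PR's actual default set may differ.
    default_params = {
        "quantization": None,  # e.g. "awq", "gptq"
        "dtype": "auto",
        "trust_remote_code": False,
    }
    # Caller-supplied optional_params win over the defaults.
    default_params.update(optional_params or {})

    llm = LLM(model=model, **default_params)
    return llm, SamplingParams, default_params
```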
this should be implemented in transformation.py -
class VLLMConfig(HostedVLLMChatConfig):
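A rough sketch of the shape the reviewer is suggesting. The import path for `HostedVLLMChatConfig` and the method names (`get_default_vllm_params`, `map_vllm_params`) are assumptions introduced here for illustration; litellm's real config base class may expose different hooks.

```python
# Sketch only: import path and helper names are assumptions, not litellm's actual API.
from litellm.llms.hosted_vllm.chat.transformation import HostedVLLMChatConfig


class VLLMConfig(HostedVLLMChatConfig):
    """Provider config for the local vllm integration, holding engine defaults
    such as quantization so handler.py does not need to hard-code them."""

    @staticmethod
    def get_default_vllm_params() -> dict:
        # Hypothetical defaults mirroring what the PR merges in handler.py.
        return {
            "quantization": None,  # e.g. "awq", "gptq"
            "dtype": "auto",
        }

    def map_vllm_params(self, optional_params: dict) -> dict:
        # Caller-supplied values override the defaults.
        params = self.get_default_vllm_params()
        params.update(optional_params or {})
        return params
```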
```diff
@@ -142,7 +159,7 @@ def batch_completions(
     )
     """
     try:
-        llm, SamplingParams = validate_environment(model=model)
+        llm, SamplingParams, optional_params = validate_environment(model=model, optional_params=optional_params)
```
can you add a mock test w/ screenshot of this working?
similar to this -
async def test_azure_ai_with_image_url():
ideally this test would not add the vllm sdk as a dep on the ci/cd pipeline (so maybe use a magicmock object here)
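Something along these lines could satisfy that request without adding the vllm SDK to CI. The module path `litellm.llms.vllm.completion.handler` and the exact return shape of `validate_environment` are assumptions based on the files touched in this PR.

```python
import sys
from unittest.mock import MagicMock, patch


def test_vllm_quantization_forwarded_mock():
    # Stand-in for the vllm SDK so the test never imports the real package,
    # keeping vllm out of the CI/CD dependency set as requested.
    fake_vllm = MagicMock()

    with patch.dict(sys.modules, {"vllm": fake_vllm}):
        # Assumed module path, based on vllm/completion/handler.py in this PR.
        from litellm.llms.vllm.completion.handler import validate_environment

        llm, SamplingParams, params = validate_environment(
            model="TheBloke/Llama-2-7B-AWQ",
            optional_params={"quantization": "awq"},
        )

        # The quantization setting should survive the merge with the defaults...
        assert params["quantization"] == "awq"
        # ...and the engine should have been constructed through the mocked SDK.
        fake_vllm.LLM.assert_called_once()
```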
Hi @krrishdholakia, thank you for your comments. I have moved the default parameters to transformation.py as you suggested.
Title
I have implemented a feature for loading quantized models with vllm. The current version does not support quantized models with vllm.
Relevant issues
There is no relevant issue.
Type
🆕 New Feature
Changes
I changed `validate_environment` in `vllm/completion/handler.py` to support loading quantized versions of models. This is done by providing several of vllm's default parameters and updating them with the parameters from `optional_params`.
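For illustration, a hedged usage sketch of how a caller might load an AWQ-quantized model through this change. It assumes the `vllm/` model prefix routes to the local vllm provider and that extra kwargs such as `quantization` end up in `optional_params`; the model name is only an example.

```python
import litellm

# Assumed usage: `quantization` is forwarded to vllm.LLM via optional_params.
response = litellm.completion(
    model="vllm/TheBloke/Llama-2-7B-AWQ",  # example AWQ-quantized model
    messages=[{"role": "user", "content": "Hello!"}],
    quantization="awq",
)
print(response.choices[0].message.content)
```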
[REQUIRED] Testing - Attach a screenshot of any new tests passing locally
If UI changes, send a screenshot/GIF of working UI fixes