vllm openai compatible endpoint streaming support #2032

Open
victorserbu2709 opened this issue Nov 13, 2024 · 2 comments
victorserbu2709 commented Nov 13, 2024

Is your feature request related to a problem? Please describe.
I would like to use the vLLM server with streaming support. Its tool-calling output is said to be OpenAI compatible, see: vercel/ai#2231
https://docs.vllm.ai/en/v0.6.3/serving/openai_compatible_server.html#tool-calling-in-the-chat-completion-api
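
(For reference, per the linked docs, streaming tool calls require the server to be started with tool-call parsing enabled, along these lines; the model name and parser choice here are assumptions for a Llama 3.x model:)

    vllm serve meta-llama/Llama-3.2-90B-Vision-Instruct \
        --enable-auto-tool-choice \
        --tool-call-parser llama3_json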

Describe the solution you'd like
Streaming should work when Letta is pointed at a vLLM OpenAI-compatible endpoint.

Describe alternatives you've considered
Letta configured with the vllm endpoint type works, but that path doesn't support streaming. I tried configuring Letta with the openai endpoint type:

               "OPENAI_API_BASE":"https://vllm/v1/",
                "OPENAI_API_KEY": "sk-test",
                "LOG_LEVEL": "DEBUG"

and commented out "strict": True in convert_to_structured_output:

def convert_to_structured_output(openai_function: dict) -> dict:
    """Convert function call objects to structured output objects

    See: https://platform.openai.com/docs/guides/structured-outputs/supported-schemas
    """
    description = openai_function["description"] if "description" in openai_function else ""

    structured_output = {
        "name": openai_function["name"],
        "description": description,
        # "strict": True,
        "parameters": {"type": "object", "properties": {}, "additionalProperties": False, "required": []},
    }

but then I receive:

messages = [Message(id='message-56f62ff2-55cb-4d77-8efe-abe5644e0286', role=<MessageRole.user: 'user'>, text='{\n  "type": "user_message",\n  "message": "test",\n  "time": "2024-11-13 01:36:41 PM UTC+0000"\n}', user_id='user-00000000-0000-4000-8000-000000000000', agent_id='agent-4eed09c6-7911-45be-840a-73d4b7cd696b', model=None, name='human', created_at=datetime.datetime(2024, 11, 13, 13, 36, 41, 406980, tzinfo=datetime.timezone.utc), tool_calls=None, tool_call_id=None)]
error = 1 validation error for ChatCompletionChunkResponse
choices.0.delta.tool_calls.0.function.arguments
  Field required [type=missing, input_value={'name': 'conversation_search'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
step() failed with an unrecognized exception: '1 validation error for ChatCompletionChunkResponse
choices.0.delta.tool_calls.0.function.arguments
  Field required [type=missing, input_value={'name': 'conversation_search'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing'
Letta.letta.server.server - ERROR - Error in server._step: 1 validation error for ChatCompletionChunkResponse
choices.0.delta.tool_calls.0.function.arguments
  Field required [type=missing, input_value={'name': 'conversation_search'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
Traceback (most recent call last):
  File "/root/stash/git/letta/letta/server/server.py", line 448, in _step
    usage_stats = letta_agent.step(
                  ^^^^^^^^^^^^^^^^^
  File "/root/stash/git/letta/letta/agent.py", line 825, in step
    step_response = self.inner_step(
                    ^^^^^^^^^^^^^^^^
  File "/root/stash/git/letta/letta/agent.py", line 1034, in inner_step
    raise e
  File "/root/stash/git/letta/letta/agent.py", line 950, in inner_step
    response = self._get_ai_reply(
               ^^^^^^^^^^^^^^^^^^^
  File "/root/stash/git/letta/letta/agent.py", line 568, in _get_ai_reply
    raise e
  File "/root/stash/git/letta/letta/agent.py", line 531, in _get_ai_reply
    response = create(
               ^^^^^^^
  File "/root/stash/git/letta/letta/llm_api/llm_api_tools.py", line 97, in wrapper
    raise e
  File "/root/stash/git/letta/letta/llm_api/llm_api_tools.py", line 66, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/stash/git/letta/letta/llm_api/llm_api_tools.py", line 148, in create
    response = openai_chat_completions_process_stream(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/stash/git/letta/letta/llm_api/openai.py", line 354, in openai_chat_completions_process_stream
    raise e
  File "/root/stash/git/letta/letta/llm_api/openai.py", line 247, in openai_chat_completions_process_stream
    for chunk_idx, chat_completion_chunk in enumerate(
  File "/root/stash/git/letta/letta/llm_api/openai.py", line 455, in _sse_post
    raise e
  File "/root/stash/git/letta/letta/llm_api/openai.py", line 420, in _sse_post
    chunk_object = ChatCompletionChunkResponse(**chunk_data)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/stash/git/letta/.venv/lib64/python3.12/site-packages/pydantic/main.py", line 212, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for ChatCompletionChunkResponse
choices.0.delta.tool_calls.0.function.arguments
  Field required [type=missing, input_value={'name': 'conversation_search'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
None
/root/stash/git/letta/letta/server/rest_api/utils.py:64: UserWarning: Error getting usage data: 1 validation error for ChatCompletionChunkResponse
choices.0.delta.tool_calls.0.function.arguments
  Field required [type=missing, input_value={'name': 'conversation_search'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
  warnings.warn(f"Error getting usage data: {e}")

This is the response from vLLM:
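
(The raw stream can be reproduced with a direct streaming request along these lines; the tools payload is abbreviated and the test message is illustrative:)

    curl -N https://vllm/v1/chat/completions \
        -H "Authorization: Bearer sk-test" \
        -H "Content-Type: application/json" \
        -d '{"model": "Llama3.2 90B", "stream": true, "tool_choice": "auto", "tools": [...], "messages": [{"role": "user", "content": "test"}]}'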

HTTP/1.1 200 OK
date: Wed, 13 Nov 2024 11:11:20 GMT
server: uvicorn
content-type: text/event-stream; charset=utf-8
connection: close
transfer-encoding: chunked

data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"id":"chatcmpl-tool-9c8e90c7e0bd46b1bc0e0581480a634d","typ
e":"function","index":0,"function":{"name":"send_message"}}]},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"message\": \""}}]},"
logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"Hello"}}]},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":","}}]},"logprobs":null,
"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":" Chad"}}]},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"!"}}]},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":" It\""}}]},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":", \"inner_thoughts\": \""}}]},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"Normal"}}]},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":" greeting"}}]},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"."}}]},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":" Eng"}}]},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"aging"}}]},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":".\"}"}}]},"logprobs":nu
ll,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":""}}]},"logprobs":null,"
finish_reason":"tool_calls","stop_reason":128008}]}
data: [DONE]

If I keep Letta configured with the vLLM endpoint as the openai endpoint type and disable streaming, it works.
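
From the stream above, the failing chunk is the second data: event, where delta.tool_calls[0].function carries only the tool name and no arguments key; ChatCompletionChunkResponse apparently marks arguments as required, so validation fails before any of the arguments chunks arrive. A minimal sketch of a tolerant model, assuming Letta's internal schema looks roughly like this (class and field names are illustrative, not Letta's actual code):

from typing import Optional

from pydantic import BaseModel

class FunctionCallDelta(BaseModel):
    name: Optional[str] = None
    # vLLM streams the tool name and its arguments in separate chunks, so the
    # first tool-call chunk carries no "arguments" key at all; making the field
    # optional lets that chunk validate instead of failing with "Field required"
    arguments: Optional[str] = None

# with this change the offending payload validates:
assert FunctionCallDelta.model_validate({"name": "conversation_search"}).arguments is None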

@sarahwooders (Collaborator) commented
Hi - currently our streaming support is only tested with OpenAI. We will look into this issue for vLLM support!


This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label Dec 15, 2024