vllm openai compatible endpoint streaming support #2032

Open
victorserbu2709 opened this issue Nov 13, 2024 · 2 comments
victorserbu2709 commented Nov 13, 2024

Is your feature request related to a problem? Please describe.
I would like to use the vLLM server with streaming support. Its tool-calling output is said to be OpenAI compatible, see: vercel/ai#2231
https://docs.vllm.ai/en/v0.6.3/serving/openai_compatible_server.html#tool-calling-in-the-chat-completion-api
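
(For reference, per the linked docs, streaming tool calls require the server to be started with tool-call parsing enabled, along these lines; the model name and parser choice here are assumptions for a Llama 3.x model:)

    vllm serve meta-llama/Llama-3.2-90B-Vision-Instruct \
        --enable-auto-tool-choice \
        --tool-call-parser llama3_json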

Describe the solution you'd like
Streaming should work when Letta is pointed at a vLLM OpenAI-compatible endpoint.

Describe alternatives you've considered
Letta configured with the vllm endpoint type works, but that path doesn't support streaming. I tried configuring Letta with the openai endpoint type:

               "OPENAI_API_BASE":"https://vllm/v1/",
                "OPENAI_API_KEY": "sk-test",
                "LOG_LEVEL": "DEBUG"

and commented out "strict": True in convert_to_structured_output:

def convert_to_structured_output(openai_function: dict) -> dict:
    """Convert function call objects to structured output objects

    See: https://platform.openai.com/docs/guides/structured-outputs/supported-schemas
    """
    description = openai_function["description"] if "description" in openai_function else ""

    structured_output = {
        "name": openai_function["name"],
        "description": description,
        # "strict": True,
        "parameters": {"type": "object", "properties": {}, "additionalProperties": False, "required": []},
    }

but then I receive:

messages = [Message(id='message-56f62ff2-55cb-4d77-8efe-abe5644e0286', role=<MessageRole.user: 'user'>, text='{\n  "type": "user_message",\n  "message": "test",\n  "time": "2024-11-13 01:36:41 PM UTC+0000"\n}', user_id='user-00000000-0000-4000-8000-000000000000', agent_id='agent-4eed09c6-7911-45be-840a-73d4b7cd696b', model=None, name='human', created_at=datetime.datetime(2024, 11, 13, 13, 36, 41, 406980, tzinfo=datetime.timezone.utc), tool_calls=None, tool_call_id=None)]
error = 1 validation error for ChatCompletionChunkResponse
choices.0.delta.tool_calls.0.function.arguments
  Field required [type=missing, input_value={'name': 'conversation_search'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
step() failed with an unrecognized exception: '1 validation error for ChatCompletionChunkResponse
choices.0.delta.tool_calls.0.function.arguments
  Field required [type=missing, input_value={'name': 'conversation_search'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing'
Letta.letta.server.server - ERROR - Error in server._step: 1 validation error for ChatCompletionChunkResponse
choices.0.delta.tool_calls.0.function.arguments
  Field required [type=missing, input_value={'name': 'conversation_search'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
Traceback (most recent call last):
  File "/root/stash/git/letta/letta/server/server.py", line 448, in _step
    usage_stats = letta_agent.step(
                  ^^^^^^^^^^^^^^^^^
  File "/root/stash/git/letta/letta/agent.py", line 825, in step
    step_response = self.inner_step(
                    ^^^^^^^^^^^^^^^^
  File "/root/stash/git/letta/letta/agent.py", line 1034, in inner_step
    raise e
  File "/root/stash/git/letta/letta/agent.py", line 950, in inner_step
    response = self._get_ai_reply(
               ^^^^^^^^^^^^^^^^^^^
  File "/root/stash/git/letta/letta/agent.py", line 568, in _get_ai_reply
    raise e
  File "/root/stash/git/letta/letta/agent.py", line 531, in _get_ai_reply
    response = create(
               ^^^^^^^
  File "/root/stash/git/letta/letta/llm_api/llm_api_tools.py", line 97, in wrapper
    raise e
  File "/root/stash/git/letta/letta/llm_api/llm_api_tools.py", line 66, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/stash/git/letta/letta/llm_api/llm_api_tools.py", line 148, in create
    response = openai_chat_completions_process_stream(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/stash/git/letta/letta/llm_api/openai.py", line 354, in openai_chat_completions_process_stream
    raise e
  File "/root/stash/git/letta/letta/llm_api/openai.py", line 247, in openai_chat_completions_process_stream
    for chunk_idx, chat_completion_chunk in enumerate(
  File "/root/stash/git/letta/letta/llm_api/openai.py", line 455, in _sse_post
    raise e
  File "/root/stash/git/letta/letta/llm_api/openai.py", line 420, in _sse_post
    chunk_object = ChatCompletionChunkResponse(**chunk_data)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/stash/git/letta/.venv/lib64/python3.12/site-packages/pydantic/main.py", line 212, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for ChatCompletionChunkResponse
choices.0.delta.tool_calls.0.function.arguments
  Field required [type=missing, input_value={'name': 'conversation_search'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
None
/root/stash/git/letta/letta/server/rest_api/utils.py:64: UserWarning: Error getting usage data: 1 validation error for ChatCompletionChunkResponse
choices.0.delta.tool_calls.0.function.arguments
  Field required [type=missing, input_value={'name': 'conversation_search'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
  warnings.warn(f"Error getting usage data: {e}")

This is the response from vLLM:
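
(The raw stream can be reproduced with a direct streaming request along these lines; the tools payload is abbreviated and the test message is illustrative:)

    curl -N https://vllm/v1/chat/completions \
        -H "Authorization: Bearer sk-test" \
        -H "Content-Type: application/json" \
        -d '{"model": "Llama3.2 90B", "stream": true, "tool_choice": "auto", "tools": [...], "messages": [{"role": "user", "content": "test"}]}'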

HTTP/1.1 200 OK
date: Wed, 13 Nov 2024 11:11:20 GMT
server: uvicorn
content-type: text/event-stream; charset=utf-8
connection: close
transfer-encoding: chunked

data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"id":"chatcmpl-tool-9c8e90c7e0bd46b1bc0e0581480a634d","typ
e":"function","index":0,"function":{"name":"send_message"}}]},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"message\": \""}}]},"
logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"Hello"}}]},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":","}}]},"logprobs":null,
"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":" Chad"}}]},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"!"}}]},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":" It\""}}]},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":", \"inner_thoughts\": \""}}]},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"Normal"}}]},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":" greeting"}}]},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"."}}]},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":" Eng"}}]},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"aging"}}]},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":".\"}"}}]},"logprobs":nu
ll,"finish_reason":null}]}
data: {"id":"chatcmpl-30252a924b2b4470984d162f9d2a836f","object":"chat.completion.chunk","created":1731496280,"model":"Llama3.2 90B","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":""}}]},"logprobs":null,"
finish_reason":"tool_calls","stop_reason":128008}]}
data: [DONE]

If I keep Letta configured with the vLLM endpoint as the openai endpoint type and disable streaming, it works.
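
From the stream above, the failing chunk is the second data: event, where delta.tool_calls[0].function carries only the tool name and no arguments key; ChatCompletionChunkResponse apparently marks arguments as required, so validation fails before any of the arguments chunks arrive. A minimal sketch of a tolerant model, assuming Letta's internal schema looks roughly like this (class and field names are illustrative, not Letta's actual code):

from typing import Optional

from pydantic import BaseModel

class FunctionCallDelta(BaseModel):
    name: Optional[str] = None
    # vLLM streams the tool name and its arguments in separate chunks, so the
    # first tool-call chunk carries no "arguments" key at all; making the field
    # optional lets that chunk validate instead of failing with "Field required"
    arguments: Optional[str] = None

# with this change the offending payload validates:
assert FunctionCallDelta.model_validate({"name": "conversation_search"}).arguments is None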

@sarahwooders (Collaborator) commented
Hi - currently our streaming support is only tested with OpenAI. We will look into this issue for vLLM support!


This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label Dec 15, 2024