Letta does not see the ollama server (API call got non-200 response code) #2282

Open
hherpa opened this issue Dec 19, 2024 · 0 comments

hherpa commented Dec 19, 2024

Bug description

  • Letta does not see the ollama server. It does not seem to be an ollama problem, since everything works with llama_index and langchain (a minimal sanity check against the raw ollama API is sketched below).
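
A minimal sanity check against the raw ollama API, assuming the default local endpoint and the model configured below, would look roughly like this:

import requests

# Ask the ollama server for a short completion directly over its REST API.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "qwen2.5:0.5b", "prompt": "hello", "stream": False},
)
print(resp.status_code)                # 200 if the server and model are healthy
print(resp.json().get("response"))     # the generated text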

Install

pip install letta

Agent setup

from letta import create_client, LLMConfig, EmbeddingConfig

client = create_client()

agent_state = client.create_agent(
    llm_config=LLMConfig(
        model="qwen2.5:0.5b",
        model_endpoint_type="ollama",
        model_endpoint="http://localhost:11434",
        context_window=128000
    ), 
    embedding_config=EmbeddingConfig(
        embedding_endpoint_type="ollama",
        embedding_endpoint=None,
        embedding_model="all-minilm",
        embedding_dim=1536,
        embedding_chunk_size=300
    )
)

Launch ollama

(screenshot of the ollama server being launched)
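
Assuming the default port and that the models referenced above have been pulled, the running server can also be checked programmatically, roughly like this:

import requests

# List the models the local ollama server has available (default port 11434).
tags = requests.get("http://localhost:11434/api/tags").json()
for m in tags.get("models", []):
    print(m["name"])   # should include qwen2.5:0.5b and all-minilm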

Launch agent

response = client.send_message(
  agent_id=agent_state.id, 
  role="user", 
  message="hello"
)
print("Usage", response.usage)
print("Agent messages", response.messages)

Response

Letta.letta.server.server - ERROR - Error in server._step: API call got non-200 response code (code=500, msg={"error":"llama runner process has terminated: exit status 2"}) for address: http://localhost:11434/api/generate. Make sure that the ollama API server is running and reachable at http://localhost:11434/api/generate.
Traceback (most recent call last):
  File "C:\Users\akidra\AppData\Roaming\Python\Python311\site-packages\letta\server\server.py", line 450, in _step
    usage_stats = letta_agent.step(
                  ^^^^^^^^^^^^^^^^^
  File "C:\Users\akidra\AppData\Roaming\Python\Python311\site-packages\letta\agent.py", line 910, in step
    step_response = self.inner_step(
                    ^^^^^^^^^^^^^^^^
  File "C:\Users\akidra\AppData\Roaming\Python\Python311\site-packages\letta\agent.py", line 1111, in inner_step
    raise e
  File "C:\Users\akidra\AppData\Roaming\Python\Python311\site-packages\letta\agent.py", line 1026, in inner_step
    response = self._get_ai_reply(
               ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\akidra\AppData\Roaming\Python\Python311\site-packages\letta\agent.py", line 650, in _get_ai_reply
    raise e
  File "C:\Users\akidra\AppData\Roaming\Python\Python311\site-packages\letta\agent.py", line 613, in _get_ai_reply
    response = create(
               ^^^^^^^
  File "C:\Users\akidra\AppData\Roaming\Python\Python311\site-packages\letta\llm_api\llm_api_tools.py", line 100, in wrapper
    raise e
  File "C:\Users\akidra\AppData\Roaming\Python\Python311\site-packages\letta\llm_api\llm_api_tools.py", line 69, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\akidra\AppData\Roaming\Python\Python311\site-packages\letta\llm_api\llm_api_tools.py", line 389, in create
    return get_chat_completion(
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\akidra\AppData\Roaming\Python\Python311\site-packages\letta\local_llm\chat_completion_proxy.py", line 167, in get_chat_completion
    result, usage = get_ollama_completion(endpoint, auth_type, auth_key, model, prompt, context_window)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\akidra\AppData\Roaming\Python\Python311\site-packages\letta\local_llm\ollama\api.py", line 68, in get_ollama_completion
    raise Exception(
Exception: API call got non-200 response code (code=500, msg={"error":"llama runner process has terminated: exit status 2"}) for address: http://localhost:11434/api/generate. Make sure that the ollama API server is running and reachable at http://localhost:11434/api/generate.
None
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[50], line 23
      6 agent_state = client.create_agent(
      7     llm_config=LLMConfig(
      8         model="qwen2.5:0.5b",
   (...)
     19     )
     20 )
     22 # Message an agent
---> 23 response = client.send_message(
     24   agent_id=agent_state.id, 
     25   role="user", 
     26   message="hello"
     27 )
     28 print("Usage", response.usage)
     29 print("Agent messages", response.messages)

File ~\AppData\Roaming\Python\Python311\site-packages\letta\client\client.py:2488, in LocalClient.send_message(self, message, role, name, agent_id, agent_name, stream_steps, stream_tokens)
   2485     raise NotImplementedError
   2486 self.interface.clear()
-> 2488 usage = self.server.send_messages(
   2489     actor=self.user,
   2490     agent_id=agent_id,
   2491     messages=[MessageCreate(role=MessageRole(role), text=message, name=name)],
   2492 )
   2494 ## TODO: need to make sure date/timestamp is propely passed
   2495 ## TODO: update self.interface.to_list() to return actual Message objects
   2496 ##       here, the message objects will have faulty created_by timestamps
   (...)
   2504 
   2505 # format messages
   2506 messages = self.interface.to_list()

File ~\AppData\Roaming\Python\Python311\site-packages\letta\server\server.py:761, in SyncServer.send_messages(self, actor, agent_id, messages, wrap_user_message, wrap_system_message, interface)
    758     raise ValueError(f"All messages must be of type Message or MessageCreate, got {[type(message) for message in messages]}")
    760 # Run the agent state forward
--> 761 return self._step(actor=actor, agent_id=agent_id, input_messages=message_objects, interface=interface)

File ~\AppData\Roaming\Python\Python311\site-packages\letta\server\server.py:450, in SyncServer._step(self, actor, agent_id, input_messages, interface)
    447 token_streaming = letta_agent.interface.streaming_mode if hasattr(letta_agent.interface, "streaming_mode") else False
    449 logger.debug(f"Starting agent step")
--> 450 usage_stats = letta_agent.step(
    451     messages=input_messages,
    452     chaining=self.chaining,
    453     max_chaining_steps=self.max_chaining_steps,
    454     stream=token_streaming,
    455     skip_verify=True,
    456 )
    458 # save agent after step
    459 save_agent(letta_agent)

File ~\AppData\Roaming\Python\Python311\site-packages\letta\agent.py:910, in Agent.step(self, messages, chaining, max_chaining_steps, **kwargs)
    908 kwargs["first_message"] = False
    909 kwargs["step_count"] = step_count
--> 910 step_response = self.inner_step(
    911     messages=next_input_message,
    912     **kwargs,
    913 )
    914 heartbeat_request = step_response.heartbeat_request
    915 function_failed = step_response.function_failed

File ~\AppData\Roaming\Python\Python311\site-packages\letta\agent.py:1111, in Agent.inner_step(self, messages, first_message, first_message_retry_limit, skip_verify, stream, step_count)
   1109 else:
   1110     printd(f"step() failed with an unrecognized exception: '{str(e)}'")
-> 1111     raise e

File ~\AppData\Roaming\Python\Python311\site-packages\letta\agent.py:1026, in Agent.inner_step(self, messages, first_message, first_message_retry_limit, skip_verify, stream, step_count)
   1023             raise Exception(f"Hit first message retry limit ({first_message_retry_limit})")
   1025 else:
-> 1026     response = self._get_ai_reply(
   1027         message_sequence=input_message_sequence,
   1028         first_message=first_message,
   1029         stream=stream,
   1030         step_count=step_count,
   1031     )
   1033 # Step 3: check if LLM wanted to call a function
   1034 # (if yes) Step 4: call the function
   1035 # (if yes) Step 5: send the info on the function call and function response to LLM
   1036 response_message = response.choices[0].message

File ~\AppData\Roaming\Python\Python311\site-packages\letta\agent.py:650, in Agent._get_ai_reply(self, message_sequence, function_call, first_message, stream, empty_response_retry_limit, backoff_factor, max_delay, step_count)
    646             time.sleep(delay)
    648     except Exception as e:
    649         # For non-retryable errors, exit immediately
--> 650         raise e
    652 raise Exception("Retries exhausted and no valid response received.")

File ~\AppData\Roaming\Python\Python311\site-packages\letta\agent.py:613, in Agent._get_ai_reply(self, message_sequence, function_call, first_message, stream, empty_response_retry_limit, backoff_factor, max_delay, step_count)
    611 for attempt in range(1, empty_response_retry_limit + 1):
    612     try:
--> 613         response = create(
    614             llm_config=self.agent_state.llm_config,
    615             messages=message_sequence,
    616             user_id=self.agent_state.created_by_id,
    617             functions=allowed_functions,
    618             # functions_python=self.functions_python, do we need this?
    619             function_call=function_call,
    620             first_message=first_message,
    621             force_tool_call=force_tool_call,
    622             stream=stream,
    623             stream_interface=self.interface,
    624         )
    626         # These bottom two are retryable
    627         if len(response.choices) == 0 or response.choices[0] is None:

File ~\AppData\Roaming\Python\Python311\site-packages\letta\llm_api\llm_api_tools.py:100, in retry_with_exponential_backoff.<locals>.wrapper(*args, **kwargs)
     98 # Raise exceptions for any errors not specified
     99 except Exception as e:
--> 100     raise e

File ~\AppData\Roaming\Python\Python311\site-packages\letta\llm_api\llm_api_tools.py:69, in retry_with_exponential_backoff.<locals>.wrapper(*args, **kwargs)
     67 while True:
     68     try:
---> 69         return func(*args, **kwargs)
     71     except requests.exceptions.HTTPError as http_err:
     73         if not hasattr(http_err, "response") or not http_err.response:

File ~\AppData\Roaming\Python\Python311\site-packages\letta\llm_api\llm_api_tools.py:389, in create(llm_config, messages, user_id, functions, functions_python, function_call, first_message, force_tool_call, use_tool_naming, stream, stream_interface, max_tokens, model_settings)
    387 if stream:
    388     raise NotImplementedError(f"Streaming not yet implemented for {llm_config.model_endpoint_type}")
--> 389 return get_chat_completion(
    390     model=llm_config.model,
    391     messages=messages,
    392     functions=functions,
    393     functions_python=functions_python,
    394     function_call=function_call,
    395     context_window=llm_config.context_window,
    396     endpoint=llm_config.model_endpoint,
    397     endpoint_type=llm_config.model_endpoint_type,
    398     wrapper=llm_config.model_wrapper,
    399     user=str(user_id),
    400     # hint
    401     first_message=first_message,
    402     # auth-related
    403     auth_type=model_settings.openllm_auth_type,
    404     auth_key=model_settings.openllm_api_key,
    405 )

File ~\AppData\Roaming\Python\Python311\site-packages\letta\local_llm\chat_completion_proxy.py:167, in get_chat_completion(model, messages, functions, functions_python, function_call, context_window, user, wrapper, endpoint, endpoint_type, function_correction, first_message, auth_type, auth_key)
    165     result, usage = get_koboldcpp_completion(endpoint, auth_type, auth_key, prompt, context_window, grammar=grammar)
    166 elif endpoint_type == "ollama":
--> 167     result, usage = get_ollama_completion(endpoint, auth_type, auth_key, model, prompt, context_window)
    168 elif endpoint_type == "vllm":
    169     result, usage = get_vllm_completion(endpoint, auth_type, auth_key, model, prompt, context_window, user)

File ~\AppData\Roaming\Python\Python311\site-packages\letta\local_llm\ollama\api.py:68, in get_ollama_completion(endpoint, auth_type, auth_key, model, prompt, context_window, grammar)
     66         result = result_full["response"]
     67     else:
---> 68         raise Exception(
     69             f"API call got non-200 response code (code={response.status_code}, msg={response.text}) for address: {URI}."
     70             + f" Make sure that the ollama API server is running and reachable at {URI}."
     71         )
     73 except:
     74     # TODO handle gracefully
     75     raise

Exception: API call got non-200 response code (code=500, msg={"error":"llama runner process has terminated: exit status 2"}) for address: http://localhost:11434/api/generate. Make sure that the ollama API server is running and reachable at http://localhost:11434/api/generate.
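
The 500 comes back from ollama itself ("llama runner process has terminated"), so the failing request can in principle be replayed without Letta. A rough sketch, assuming Letta forwards the configured context window to ollama as the num_ctx option (this forwarding is an assumption, not confirmed from the traceback):

import requests

# Hypothetical replay of the request Letta sends to /api/generate, with the
# same model and a num_ctx matching the configured context_window.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5:0.5b",
        "prompt": "hello",
        "stream": False,
        "options": {"num_ctx": 128000},
    },
)
print(resp.status_code, resp.text)  # a 500 here would point to an ollama-side crash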