Add robust token counter with 0 default on failure for ollama_chat #7380

Open
wants to merge 1 commit into base: main

Conversation

TensorTemplar commented Dec 23, 2024

Title

This is a fix for the latest ollama, which now implements function calling properly and returns a JSON object in messages. The current litellm code passes that object to the token counter, which tries to concatenate it with text and fails.

I could not find where this code is tested, so please advise on that if needed.

15:44:54 - LiteLLM Router:INFO: router.py:968 - litellm.acompletion(model=ollama_chat/llama3.1:70b-instruct-q4_K_M) Exception litellm.APIConnectionError: can only concatenate str (not "dict") to str
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/litellm/main.py", line 485, in acompletion
    response = await init_response
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/litellm/llms/ollama_chat.py", line 595, in ollama_acompletion
    raise e  # don't use verbose_logger.exception, if exception is raised
    ^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/litellm/llms/ollama_chat.py", line 577, in ollama_acompletion
    prompt_tokens = response_json.get("prompt_eval_count", litellm.token_counter(messages=data["messages"]))  # type: ignore
                                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/litellm/utils.py", line 1566, in token_counter
    text += function_arguments
TypeError: can only concatenate str (not "dict") to str
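For context, the failure can be reproduced in isolation: when a tool-call message carries `function.arguments` as a dict instead of a JSON string, the string concatenation in the token counter raises the same TypeError. A minimal sketch (the message shape and tool name below are illustrative, not litellm's exact internals):

```python
# Minimal repro of the TypeError above (hypothetical message shape,
# mimicking how newer ollama returns tool-call arguments as a dict).
message = {
    "role": "assistant",
    "tool_calls": [
        {
            "function": {
                "name": "get_weather",            # illustrative tool name
                "arguments": {"city": "Berlin"},  # dict, not a JSON string
            }
        }
    ],
}

text = ""
for tool_call in message["tool_calls"]:
    function_arguments = tool_call["function"]["arguments"]
    text += function_arguments  # TypeError: can only concatenate str (not "dict") to str
```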

Instead of fixing this in token_counter, this PR:

  1. wraps the token_counter call in exception handling and defaults to 0 tokens, so the response can still be returned
  2. dumps the JSON object to a string so it can be passed to the token counter safely

Since returning 0 tokens in the extreme failure scenario (both ollama and token_counter failing) is potentially costly for customers, let me know if you would rather have that case hard-fail, as it did previously.
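A minimal sketch of the intended behavior, assuming a helper along these lines (names and structure are illustrative, not the exact diff):

```python
import json

import litellm


def count_prompt_tokens(response_json: dict, data: dict) -> int:
    """Illustrative fallback: use ollama's prompt_eval_count if present,
    otherwise count tokens over messages with structured content dumped
    to JSON strings; return 0 if counting still fails."""
    if "prompt_eval_count" in response_json:
        return response_json["prompt_eval_count"]
    try:
        safe_messages = []
        for msg in data["messages"]:
            content = msg.get("content")
            if isinstance(content, (dict, list)):
                # dump structured content (e.g. tool-call JSON) to a plain string
                msg = {**msg, "content": json.dumps(content)}
            safe_messages.append(msg)
        return litellm.token_counter(messages=safe_messages)
    except Exception:
        # extreme failure case: both ollama and token_counter failed
        return 0
```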

Relevant issues

#6958 (should be fixed in this PR)
#7094 (not addressed here explicitly; it is unclear how streaming JSON objects should behave)

Type

🐛 Bug Fix

Changes

[REQUIRED] Testing - Attach a screenshot of any new tests passing locally

If UI changes, send a screenshot/GIF of working UI fixes

  1. Do a tool call to an ollama model that supports function calling, e.g. llama3.1:70b-instruct-q4_K_M
  2. Observe a correct response
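For reference, a tool call through litellm against an ollama_chat model can be issued like this (tool schema and prompt are placeholders; a running ollama server is assumed):

```python
import litellm

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # placeholder tool
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = litellm.completion(
    model="ollama_chat/llama3.1:70b-instruct-q4_K_M",
    messages=[{"role": "user", "content": "What is the weather in Berlin?"}],
    tools=tools,
)

# With the fix, usage is populated (or 0 in the total-failure case) instead of raising
print(response.choices[0].message)
print(response.usage)
```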
