[BUG: v3 Tokenizer ] #43

Open
nirual81 opened this issue Aug 31, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@nirual81

Python -VV

...

Pip Freeze

annotated-types==0.7.0
attrs==24.2.0
certifi==2024.7.4
charset-normalizer==3.3.2
idna==3.8
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
mistral_common==1.3.4
pydantic==2.6.1
pydantic_core==2.16.2
referencing==0.35.1
regex==2024.7.24
requests==2.32.3
rpds-py==0.20.0
sentencepiece==0.2.0
tiktoken==0.7.0
typing_extensions==4.12.2
urllib3==2.2.2

Reproduction Steps

I've copied the example from: https://github.com/mistralai/mistral-common/blob/main/examples/tokenizer.ipynb
But the tokens aren't properly formatted.

The tokenized text looks like this:
[AVAILABLE_TOOLS]▁[{"type":▁"function",▁"function":▁{"name":▁"get_current_weather",▁"description":▁"Get▁the▁current▁weather",▁"parameters":▁{"type":▁"object",▁"properties":▁{"location":▁{"type":▁"string",▁"description":▁"The▁city▁and▁state,▁e.g.▁San▁Francisco,▁CA"},▁"format":▁{"type":▁"string",▁"enum":▁["celsius",▁"fahrenheit"],▁"description":▁"The▁temperature▁unit▁to▁use.▁Infer▁this▁from▁the▁users▁location."}},▁"required":▁["location",▁"format"]}}}][/AVAILABLE_TOOLS][INST]▁What's▁the▁weather▁like▁today▁in▁Paris[/INST]

At first I thought this was intended, but with the second message the LLM returns messages like this:
▁The▁current▁weather▁in▁Paris▁is▁72°C.

Expected Behavior

The tokens should be properly formatted, with spaces instead of the underline-like ▁ characters in most places:

[INST] What's the weather like today in Paris[/INST][TOOL_CALLS] [{"name": "get_current_weather", "arguments": {"location": "Paris, France", "format": "celsius"}, "id": "VvvODy9mT"}][TOOL_RESULTS] {"call_id": "VvvODy9mT", "content": 22}[/TOOL_RESULTS] The current temperature in Paris, France is 22 degrees Celsius.[AVAILABLE_TOOLS] [{"type": "function", "function": {"name": "get_current_weather", "description": "Get the current weather", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}, "format": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The temperature unit to use. Infer this from the users location."}}, "required": ["location", "format"]}}}][/AVAILABLE_TOOLS][INST] What's the weather like today in San Francisco[/INST][TOOL_CALLS] [{"name": "get_current_weather", "arguments": {"location": "San Francisco", "format": "celsius"}, "id": "fAnpW3TEV"}][TOOL_RESULTS] {"call_id": "fAnpW3TEV", "content": 20}[/TOOL_RESULTS]
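
To compare the two strings above more easily, here is a minimal sketch that swaps the ▁ character (U+2581) for a plain space. It assumes the ▁ in the output is the marker that SentencePiece-style tokenizers use to display a space in their piece representation; the helper name `readable` is hypothetical and not part of mistral_common.

```python
# Hypothetical helper for inspecting debug output; not part of mistral_common.
# Assumption: "▁" (U+2581) is the SentencePiece whitespace meta symbol,
# i.e. a display marker for a space rather than a literal underscore.
SP_SPACE = "\u2581"


def readable(debug_text: str) -> str:
    """Return the debug text with every U+2581 marker replaced by a space."""
    return debug_text.replace(SP_SPACE, " ")


sample = "[INST]\u2581What's\u2581the\u2581weather\u2581like\u2581today\u2581in\u2581Paris[/INST]"
print(readable(sample))
# prints: [INST] What's the weather like today in Paris[/INST]
```

If that assumption holds, the observed string and the expected string would differ only in how the space is displayed, not in the underlying tokens.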

Additional Context

I am using the mistral-nemo model.

Suggested Solutions

No response

@nirual81 nirual81 added the bug Something isn't working label Aug 31, 2024
@patrickvonplaten
Contributor

Sorry, I don't fully understand the problem here.
