Python -VV
Pip Freeze
Reproduction Steps
Output:
�
Expected Behavior
Looking at the tokenizer.json on the Hugging Face Hub, quite a lot of tokens decode to unknown that probably shouldn't. Is this expected, or am I making a mistake during decoding?
Additional Context
There are 606 tokens that are output as UNK:
The issue was originally found when using PreTrainedTokenizerFast.
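One possible way to narrow this down is to check whether the affected tokens are byte-level fragments rather than true unknowns. This is an assumption and is not confirmed anywhere in this issue: if a token holds only part of a multi-byte UTF-8 sequence, decoding it on its own yields the replacement character (U+FFFD, shown as �). The sketch below uses only the Python standard library and a hypothetical toy vocabulary (`toy_vocab`), not the real tokenizer.json.

```python
def decodes_to_replacement(token_bytes: bytes) -> bool:
    """Return True if decoding these bytes in isolation produces U+FFFD."""
    return "\ufffd" in token_bytes.decode("utf-8", errors="replace")


# Hypothetical toy vocabulary (token id -> raw bytes), for illustration only.
toy_vocab = {
    0: b"hello",                  # plain ASCII, decodes cleanly
    1: "é".encode("utf-8"),       # complete 2-byte sequence, decodes cleanly
    2: "é".encode("utf-8")[:1],   # lead byte only, incomplete on its own
    3: "€".encode("utf-8")[1:],   # continuation bytes only, invalid on their own
}

# Collect the ids whose bytes cannot be decoded without a replacement character.
bad_ids = sorted(i for i, b in toy_vocab.items() if decodes_to_replacement(b))
print(len(bad_ids), bad_ids)
```

If the tokens reported here fall into this category, they would decode correctly only when combined with their neighbouring tokens, so decoding each id individually would overstate the number of unknowns.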
Suggested Solutions
No response