Issues with Gemma 2b v1.1, all quants. #92

Open

cooperll opened this issue Jun 3, 2024 · 0 comments

cooperll commented Jun 3, 2024

There seems to be an issue with the Gemma 2b v1.0 and v1.1 models. Even on a branch of this repository with an up-to-date llama.cpp, I still get superfluous tokens and poor-quality responses no matter which chat format I use.

I've tried all sorts of chat templates.
The one from the HF page:

<bos><start_of_turn>user
Write a hello world program<end_of_turn>
<start_of_turn>model

The one used by LM Studio:

<start_of_turn>user
USER_PROMPT_HERE<end_of_turn>
<start_of_turn>model

No chat template at all:

USER_PROMPT_HERE

And many more.
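For reference, this is roughly how the HF template gets fed through llama-cpp-python; the GGUF filename and sampling parameters below are placeholders rather than the exact values I used:

from llama_cpp import Llama

# Placeholder model path and context size, not the exact setup from this report.
llm = Llama(model_path="gemma-1.1-2b-it.Q4_K_M.gguf", n_ctx=2048)

# The HF-documented Gemma turn format. Note that the literal "<bos>" here may
# duplicate the BOS token the loader already prepends during tokenization.
prompt = (
    "<bos><start_of_turn>user\n"
    "Write a hello world program<end_of_turn>\n"
    "<start_of_turn>model\n"
)

# Stop on Gemma's turn/end markers so any trailing <end_of_turn>/<eos> tokens
# are not echoed into the completion text.
out = llm(prompt, max_tokens=256, stop=["<end_of_turn>", "<eos>"])
print(out["choices"][0]["text"])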

In the case of no chat template, I see <eos> printed, and that's all.

In the other cases, I sometimes see a reasonable generation, but then I also see extra strings printed at the end, such as:
</start_of_turn><eos>

The issue does not appear to be in llama.cpp itself: (1) I've tested with llama-cpp-python and with llama.cpp directly, and (2) LM Studio does not have these issues, even with the same weights. I also have no problems with this repository when using Mistral 7b v0.2.

Has anyone else seen this happen? Is there something I'm missing?
