Phi-3 deployment issue #1139
Labels: huggingchat (for issues related to HuggingChat specifically), models (related to model performance/reliability)
Comments
For reference, I am starting the TGI server with the following:

```bash
model=microsoft/Phi-3-mini-4k-instruct
volume=$PWD/data

docker run --gpus all \
  --shm-size 1g \
  -p 8080:80 \
  -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id $model \
  --trust-remote-code
```
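To confirm the container is up before sending prompts, a quick liveness check can help (a minimal sketch, assuming TGI's `/health` route on the mapped port):

```python
import requests

# Liveness check against the container started above; /health returns
# 200 once the model has finished loading.
resp = requests.get("http://127.0.0.1:8080/health")
print(resp.status_code)
```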
@nsarrazin, could you please take a look at this?
Hi! Thanks for digging into this. I'll report it internally and come back to you!
nsarrazin added the models and huggingchat labels on May 27, 2024.
Good afternoon everyone!

We know that Phi-3-mini-4k-instruct has been suffering from gibberish outputs when used with HuggingChat, and I think I have finally been able to track down where the issue is coming from.

If I run the Python request script against the hosted endpoint, you will see that gibberish is generated.
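The script itself was not captured in this transcript, but a minimal sketch of such a request, assuming the standard `requests` pattern against the Inference API / TGI-style JSON payload, might look like this (the token and prompt are placeholders):

```python
import requests

# Hypothetical reconstruction -- the original script was not captured in
# this transcript. Swap API_URL between the hosted endpoint and a local
# TGI instance to compare outputs.
API_URL = "https://api-inference.huggingface.co/models/microsoft/Phi-3-mini-4k-instruct"
# API_URL = "http://127.0.0.1:8080"  # point at the local TGI instance instead

headers = {"Authorization": "Bearer hf_xxx"}  # only required for the hosted API

payload = {
    # Prompt is a placeholder formatted with Phi-3's chat markers.
    "inputs": "<|user|>\nWhat is the capital of France?<|end|>\n<|assistant|>\n",
    "parameters": {"max_new_tokens": 64},
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())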
However, if I deploy a local instance of TGI, change the script to
API_URL = "http://127.0.0.1:8080"
and run the very same script, the generation starts to make sense.

My suspicion is that the model deployed at
https://api-inference.huggingface.co/models/microsoft/Phi-3-mini-4k-instruct
, which is what HuggingChat consumes, uses an older version of the code/tokenizer configuration. It was added on release day, and we made some updates after that.

Another possibility could be an issue with a previous version of flash-attn (if it is being used) misbehaving around sliding_window; I remember some older versions had a problem where the window was not being accurately computed.

Could you please re-deploy the model or take a look at it?
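If it helps narrow this down, one way to check whether a deployment is pinned to an older revision is to compare tokenizer configurations across the repo's commit history (a minimal sketch using huggingface_hub's `list_repo_commits`; the older commit hash must be filled in from the listing):

```python
from huggingface_hub import HfApi
from transformers import AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"

# List the repo's commit history to locate the release-day revision.
for commit in HfApi().list_repo_commits(model_id):
    print(commit.commit_id[:8], commit.created_at, commit.title)

# Compare the current chat template against a chosen older commit
# (fill in a real hash from the listing above):
head = AutoTokenizer.from_pretrained(model_id)
# old = AutoTokenizer.from_pretrained(model_id, revision="<older-commit-hash>")
# print(head.chat_template == old.chat_template)
```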
Thanks for your attention and best regards,
Gustavo.