[bug] tgi-1.1.0 - Please install EETQ from https://github.com/NetEase-FuXi/EETQ #3377

Open
1 of 6 tasks
Daan-Grashoff opened this issue Oct 4, 2023 · 6 comments

Comments

@Daan-Grashoff

Checklist

Concise Description:
Deploying a model with HF_MODEL_QUANTIZE set to eetq fails with the error "ImportError: Please install EETQ from https://github.com/NetEase-FuXi/EETQ".

DLC image/dockerfile:

  • v1.0-hf-tgi-1.1.0-pt-2.0.1-inf-gpu-py39
  • 763104351884.dkr.ecr.us-east-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04-v1.0

Current behavior:
Deployment fails with: ImportError: Please install EETQ from https://github.com/NetEase-FuXi/EETQ

Expected behavior:
The model should deploy and run without any issues.

Additional context:
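A quick way to narrow this down is to check whether the EETQ package is present in the image's Python environment at all. The snippet below is a minimal sketch, not something taken from the image or from TGI's docs: it assumes the package is exposed as a top-level module named `EETQ` (my assumption, based on the repository name), and `has_module` is a hypothetical helper. It only checks that the module can be located, so a build that is installed but fails to load its compiled kernels would still report True.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Assumption: the EETQ kernels are installed as a top-level module named "EETQ".
    print("EETQ importable:", has_module("EETQ"))
```

Running this with the Python interpreter inside the container should show whether the image ships the package or whether it is simply missing from that build.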

@chintanckg

I am facing the same issue!

@Daan-Grashoff
Author

@chintanckg Which image are you using?
763104351884.dkr.ecr.us-east-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04
or
763104351884.dkr.ecr.us-east-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04-v1.0

@chintanckg

I am not sure how to get the exact image version, please help me with it.

@chintanckg

I used the :latest tag and all is sorted now.

@Daan-Grashoff
Author

Can you share your code?

@chintanckg

chintanckg commented Oct 13, 2023

model=  # set this to a local model path or a Hugging Face Hub model ID

volume=$PWD

docker run --gpus all --shm-size 24g -p 8080:80 -v "$volume:/data" \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id "$model" --max-total-tokens 5024 --max-input-length 4096 \
  --num-shard 4 --max-concurrent-requests 128 --quantize eetq
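Once the container is up, a short request against the server's `/generate` route confirms that the quantized model actually loaded. This is a rough sketch under a few assumptions: the server is reachable at `http://localhost:8080` (from the `-p 8080:80` mapping above), and `build_generate_request` and `generate` are hypothetical helper names, not part of TGI.

```python
import json
import urllib.request


def build_generate_request(prompt, max_new_tokens=20, base_url="http://localhost:8080"):
    """Build a POST request for the /generate route without sending it."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["generated_text"]
```

Calling `generate("Hello")` should return a short completion if the deployment is healthy. If the EETQ import failed, the container normally exits during startup and the connection is refused instead.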
