How to use quantized model on inference #4

Open
yachty66 opened this issue Aug 22, 2023 · 1 comment

Comments

yachty66 commented Aug 22, 2023

I have successfully quantized the facebook/opt-125m model using the opt.py script with the following command:

CUDA_VISIBLE_DEVICES=0 python opt.py facebook/opt-125m c4 --wbits 4 --quant ldlq --incoh_processing --save quantized_model

This command saves the quantized model to a file named quantized_model. My question is: to run the quantized model for inference, should I replace the original weights from https://huggingface.co/facebook/opt-125m/tree/main with the weights from quantized_model?

yachty66 changed the title from "How to use quantized model for inference" to "How to use quantized model on inference" on Aug 22, 2023

jerry-chee (Collaborator) commented

Can you share an example of how you plan to run inference with the model? If you're using the scripts in this repo to evaluate perplexity / zero-shot accuracy, then you just need to pass the saved file with the --load argument.

If you're using the Hugging Face from_pretrained() function, what I've done is put the saved model and the config in the same folder and pass that folder to from_pretrained(). You can copy the config from Hugging Face, for example from https://huggingface.co/facebook/opt-125m/tree/main. You'll also need to rename the saved model to one of the file names that from_pretrained() looks for, such as pytorch_model.bin.

Sorry, this is a bit hacky. We're working on releasing model checkpoints and a better guide.
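
The from_pretrained() steps described above can be sketched roughly as follows. This is a minimal illustration, not the repo's official workflow. The helper names prepare_model_dir and load_model, and the folder name used in the usage note, are hypothetical. It also assumes that the file written by --save is a PyTorch state dict whose keys match the OPT architecture, and that config.json has already been downloaded from the Hugging Face model page.

```python
import shutil
from pathlib import Path


def prepare_model_dir(saved_model, config_json, out_dir):
    """Place the saved checkpoint and the config in one folder.

    The checkpoint is copied under the name pytorch_model.bin, which is
    one of the file names from_pretrained() looks for.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(saved_model, out / "pytorch_model.bin")
    shutil.copyfile(config_json, out / "config.json")
    return out


def load_model(model_dir):
    """Load the model from the prepared folder (requires transformers and torch)."""
    # Imported here so that preparing the folder does not need transformers.
    from transformers import AutoModelForCausalLM

    # Assumption: the checkpoint is a state dict compatible with the config.
    return AutoModelForCausalLM.from_pretrained(str(model_dir))
```

For the command in this thread, usage might look like `model_dir = prepare_model_dir("quantized_model", "config.json", "opt-125m-quantized")` followed by `model = load_model(model_dir)`. The tokenizer is not part of the saved file, so it would presumably still be loaded from facebook/opt-125m.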
