[BUG] Following the quant_with_alpaca.py example but keep getting "You shouldn't move a model that is dispatched using accelerate hooks." and the model is never saved. #670

Open
murtaza-nasir opened this issue May 13, 2024 · 0 comments
Labels
bug Something isn't working

murtaza-nasir commented May 13, 2024

Describe the bug
I am using the quant_with_alpaca.py script to quantize MaziyarPanahi/Llama-3-70B-Instruct-32k-v0.1, with the following command:

python quant_with_alpaca.py \
--pretrained_model_dir "/home/murtaza/work/ml/text-generation-webui/models/MaziyarPanahi_Llama-3-70B-Instruct-32k-v0.1" \
--quantized_model_dir "/home/murtaza/work/ml/text-generation-webui/models/MurtazaNasir_Llama-3-70B-Instruct-32k-v0.1-GPTQ" \
--per_gpu_max_memory 6 \
--cpu_max_memory 200 \
--quant_batch_size 16 \
--bits 4 --use_triton --save_and_reload
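For reference, my understanding (an assumption from reading the example script, not something I have verified in the source) is that --per_gpu_max_memory and --cpu_max_memory get turned into the max_memory mapping that accelerate uses to dispatch the model, roughly like this:

```python
def build_max_memory(per_gpu_gib: int, cpu_gib: int, n_gpus: int) -> dict:
    """Sketch of how quant_with_alpaca.py appears to build accelerate's
    max_memory mapping (an assumption on my part; budgets are in GiB)."""
    max_memory = {i: f"{per_gpu_gib}GIB" for i in range(n_gpus)}
    max_memory["cpu"] = f"{cpu_gib}GIB"
    return max_memory

# With my flags (--per_gpu_max_memory 6, --cpu_max_memory 200) on 4 GPUs:
# {0: "6GIB", 1: "6GIB", 2: "6GIB", 3: "6GIB", "cpu": "200GIB"}
print(build_max_memory(6, 200, 4))
```

So each 3090 should be capped at 6 GiB, with up to 200 GiB offloaded to CPU RAM.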

I have tried running the above without --save_and_reload; the script quantizes the model and then runs inference, which seems fine, but the model never gets saved anywhere. With the --save_and_reload switch, I get this output:

INFO - Model packed.
2024-05-13 03:45:45 INFO [auto_gptq.modeling._utils] Model packed.
WARNING - using autotune_warmup will move model to GPU, make sure you have enough VRAM to load the whole model.
2024-05-13 03:45:45 WARNING [auto_gptq.modeling._utils] using autotune_warmup will move model to GPU, make sure you have enough VRAM to load the whole model.
2024-05-13 03:45:45 WARNING [accelerate.big_modeling] You shouldn't move a model that is dispatched using accelerate hooks.

After this it crashes because of a CUDA OOM error.

When run without the --save_and_reload switch, the script tests the quant with 4 instructions and then exits without any error (although the inference speed was painfully slow).
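As a possible workaround (a sketch I have not tested; the paths, group_size, calibration sample, and the ensure_out_dir helper below are placeholders of my own, not taken from the example script), calling model.save_quantized() immediately after quantize() and skipping the Triton warmup / --save_and_reload path should write the checkpoint out before anything tries to move the accelerate-dispatched model:

```python
import os

def ensure_out_dir(path: str) -> str:
    """Create the output directory for the quantized checkpoint
    (hypothetical helper, not part of auto_gptq)."""
    os.makedirs(path, exist_ok=True)
    return path

def quantize_and_save(pretrained_dir: str, quantized_dir: str) -> None:
    """Sketch only: requires auto_gptq, transformers, enough GPU/CPU memory,
    and real alpaca calibration data to actually run."""
    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

    tok = AutoTokenizer.from_pretrained(pretrained_dir)
    # Placeholder calibration example -- substitute the real alpaca samples.
    examples = [tok("auto-gptq is an easy-to-use quantization package.",
                    return_tensors="pt")]

    model = AutoGPTQForCausalLM.from_pretrained(
        pretrained_dir,
        BaseQuantizeConfig(bits=4, group_size=128, damp_percent=0.1),
    )
    model.quantize(examples)
    # Save right away: no warmup_triton, no reload, so nothing ever calls
    # .to() on the accelerate-dispatched model.
    model.save_quantized(ensure_out_dir(quantized_dir), use_safetensors=True)
```

That would at least get a checkpoint on disk, even if the triton warmup path remains broken.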

Hardware details
I have an EPYC 7532 processor with 256 GB of RAM and 4× RTX 3090s.

Software version
Ubuntu 22.04.4 LTS (6.5.0-28-generic #29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 4 14:39:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux)
Python 3.10.14
auto_gptq Version: 0.8.0.dev0+cu121
Torch 2.3.0+cu121
Transformers 4.40.2
Accelerate 0.30.1

To Reproduce

  1. Clone repository.
  2. Build.
  3. Go to quantization directory.
  4. Run above command.

The only change I made was to one of the files, adding the damp 0.1 argument for quantization.

Expected behavior
I was hoping to get a GPTQ quant of the above model.

@murtaza-nasir murtaza-nasir added the bug Something isn't working label May 13, 2024