Describe the bug
root@ac6edc15b00f:/workspace/quantization# python test_gptq.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 53%|██████████████████████████████████████████████▉ | 16/30 [09:03<07:55, 34.00s/it]
Traceback (most recent call last):
File "/workspace//quantization/test_gptq.py", line 27, in <module>
model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
File "/opt/conda/lib/python3.10/site-packages/auto_gptq/modeling/auto.py", line 76, in from_pretrained
return GPTQ_CAUSAL_LM_MODEL_MAP[model_type].from_pretrained(
File "/opt/conda/lib/python3.10/site-packages/auto_gptq/modeling/_base.py", line 787, in from_pretrained
model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path, **merged_kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
return model_class.from_pretrained(
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3677, in from_pretrained
) = cls._load_pretrained_model(
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4084, in _load_pretrained_model
state_dict = load_state_dict(shard_file, is_quantized=is_quantized)
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 507, in load_state_dict
with safe_open(checkpoint_file, framework="pt") as f:
safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer
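A `MetadataIncompleteBuffer` error from the safetensors loader typically means the shard file itself is truncated or corrupted (for example, an interrupted download), independent of GPU memory. A minimal diagnostic sketch, using only the published safetensors on-disk layout (an 8-byte little-endian header length, then that many bytes of JSON metadata, then the raw tensor data), can identify the bad shard without loading any tensors; the `shard_dir` path below is a placeholder for the local model directory:

```python
import glob
import json
import os
import struct


def check_shard(path):
    """Cheap integrity check for a .safetensors file without loading tensors.

    Layout: 8-byte little-endian header length N, then N bytes of JSON
    metadata, then raw tensor data. A truncated download usually fails
    one of these checks (the Rust loader reports it as
    MetadataIncompleteBuffer)."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return False, "file shorter than the 8-byte length prefix"
        (header_len,) = struct.unpack("<Q", prefix)
        if size < 8 + header_len:
            return False, f"header claims {header_len} bytes, only {size - 8} present"
        header = json.loads(f.read(header_len))
    # Each tensor entry records [begin, end) byte offsets into the data section.
    data_len = size - 8 - header_len
    for name, info in header.items():
        if name == "__metadata__":
            continue
        begin, end = info["data_offsets"]
        if end > data_len:
            return False, f"tensor {name!r} ends at byte {end}, data section has {data_len}"
    return True, "ok"


if __name__ == "__main__":
    # Placeholder path: point this at the downloaded checkpoint directory.
    shard_dir = "/workspace/quantization/model"
    for path in sorted(glob.glob(os.path.join(shard_dir, "*.safetensors"))):
        ok, msg = check_shard(path)
        print(("OK " if ok else "BAD"), path, msg)
```

Any shard reported as `BAD` can then be re-downloaded on its own rather than re-fetching the whole checkpoint.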
Hardware details
A800
root@ac6edc15b00f:/workspace/code/qwen/quantization2# nvidia-smi
Tue Apr 30 06:00:36 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A800 80GB PCIe Off | 00000000:17:00.0 Off | 0 |
| N/A 38C P0 65W / 300W | 64181MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A800 80GB PCIe Off | 00000000:31:00.0 Off | 0 |
| N/A 36C P0 65W / 300W | 62285MiB / 81920MiB | 0% Default |
| | | Disabled |
Software version
Linux ac6edc15b00f 5.4.0-177-generic #197-Ubuntu SMP Thu Mar 28 22:45:47 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Python 3.10.13
root@ac6edc15b00f:/workspace/quantization2# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
torch Version: 2.2.1
accelerate Version: 0.29.3
transformers Version: 4.40.1
To Reproduce
Expected behavior
Screenshots
Additional context
I had already downloaded the llama3 instruct model and was trying to quantize it. Is this because there isn't enough GPU memory?