It is recommended to set the tokenizer's vocab_size to a multiple of 32 (and to adjust the dimensions of the embedding and the final lm_head, i.e. the language_model.output, accordingly). Otherwise, after quantization with bitsandbytes (bnb), the model may encounter errors when computing gradients (backward) on GPUs with compute capability 8.0 or higher (Ampere and newer). This is because bnb pads tensor shapes up to the nearest multiple of 32, which leads to shape mismatches.
As mentioned in #129
https://github.com/TimDettmers/bitsandbytes/blob/main/bitsandbytes/functional.py#L508-L512
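The alignment described above can be sketched as follows — a minimal example of rounding a vocab size up to the nearest multiple of 32, the same padding rule bnb applies internally. The `pad_vocab_size` helper name is illustrative, not part of any library:

```python
def pad_vocab_size(vocab_size: int, multiple: int = 32) -> int:
    """Round vocab_size up to the nearest multiple (matching bnb's shape padding)."""
    return ((vocab_size + multiple - 1) // multiple) * multiple

# A vocab of 32011 is padded up; 32032 is already aligned.
print(pad_vocab_size(32011))  # 32032
print(pad_vocab_size(32032))  # 32032
```

With Hugging Face transformers, the embedding and lm_head can be resized in one step via `model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=32)`, which avoids the mismatch without retraining the tokenizer.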