
[REQUEST] Convert.py: Option to skip measurement when setting 8.0/8.0 #673

Open · 3 tasks done
Originalimoc opened this issue Nov 13, 2024 · 3 comments

Originalimoc commented Nov 13, 2024

Problem

convert.py still runs the measurement pass even when the target is set to 8.0 bpw.

Solution

Skip the measurement pass, or generate a dummy measurement file.

Alternatives

No response

Explanation

What's the point of measurement if 8.0 is used on all layers anyway? Or is there some acceptable-loss threshold that causes lower bitrates, e.g. 5–6 bpw, to be used on some layers even when 8 is set?

Examples

No response

Additional context

No response

Acknowledgements

  • I have looked for similar requests before submitting this one.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will make my requests politely.
Originalimoc changed the title from "[REQUEST] Convert.py: Skip measurement when setting 8.0/8.0" to "[REQUEST] Convert.py: Option to skip measurement when setting 8.0/8.0" on Nov 18, 2024
pchristidis commented

If you want to work around this, just grab a random measurement.json from huggingface for the same base model.

Bartowski does quants for most of the big models and usually puts the measurement.json in the main branch, e.g.:

https://huggingface.co/bartowski/Qwen2.5-Coder-32B-Instruct-exl2/tree/main

Then just pass `-m /path/to/measurement.json` to the conversion script when you're doing 8 bpw.
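
For anyone following along, this is roughly what that workflow looks like. The `-m` and `-b` flags are the ones mentioned in this thread; the other flags (`-i`, `-o`, `-cf`), the paths, and the measurement.json filename/URL below are assumptions, so double-check them against the exllamav2 docs and the repo's file list:

```sh
# Download a pre-made measurement file for the same base model
# (filename and resolve URL are assumed; check the repo's file list)
wget https://huggingface.co/bartowski/Qwen2.5-Coder-32B-Instruct-exl2/resolve/main/measurement.json

# Reuse it so the measurement pass is skipped entirely
python convert.py \
    -i /path/to/Qwen2.5-Coder-32B-Instruct \
    -o /path/to/workdir \
    -cf /path/to/output-8.0bpw \
    -b 8.0 \
    -m measurement.json
```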

necrogay commented Jan 9, 2025

What concerns me the most is the lack of an option to manually override the optimization process. The system decides on its own which layers to quantize and to what degree, sometimes doing so in situations where it’s not entirely appropriate. For instance, I want to set a maximum quantization level of 8+ by specifying the parameters -b 8 [9,10,16,255]. However, this doesn’t seem to matter, as the system still arbitrarily quantizes many layers to 4, 5, or 6 bpw.

What’s more frustrating is that every time I run the process, it selects layers for quantization in a random order. For example, in one run, it might choose layers 3, 5, and 39, but after restarting with the same parameters, it could switch to layers 4, 9, and 28, and so on.

It would be great to have an option to explicitly specify which layers should not be optimized and should instead be quantized with the maximum value. Additionally, it would be useful to define specific quantization ranges for particular layers. For instance, having an additional configuration file where such quantization ranges could be defined would make the process much more convenient and flexible.
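
Purely as an illustration of what is being asked for here (nothing like this exists in convert.py today; every key below is hypothetical), such an override file might look like:

```yaml
# Hypothetical per-layer override file -- not a real convert.py feature
pin_layers:            # layers that must always get the maximum setting
  - 3
  - 5
  - 39
layer_ranges:          # per-layer bpw ranges the optimizer must respect
  "0-15":  { min_bpw: 6.0, max_bpw: 8.0 }
  "16-63": { min_bpw: 4.0, max_bpw: 8.0 }
```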

turboderp (Member) commented

Part of this is because 8 bpw requires some layers to use less than the maximum bitrate. The bitrate specified is the actual number of bits per weight including overhead. With that overhead, the actual maximum is about 8.05 bpw (it varies a bit depending on tensor shapes).
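
To put rough numbers on that (illustrative only, using the ~8.05 bpw ceiling mentioned above): for a model with $N$ weights,

$$8.00 \cdot N \;\;\text{(requested budget)} \;<\; 8.05 \cdot N \;\;\text{(every layer at the max setting)},$$

so roughly $0.05 \cdot N$ bits have to be clawed back by dropping a handful of layers to the next setting down.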

I just checked and there was a slight inaccuracy in the optimizer which made it ever so slightly undershoot the target bitrate if the last annealing step left a tiny bit of the cost budget unused. This shouldn't happen, so I fixed it in the latest commit to the dev branch.

With that, you should be able to set a target bitrate of e.g. 9 and always get the largest setting for each layer.

Note that it's highly unlikely to make any practical difference since the reason this happens in the first place is that the measured difference between the highest and next-highest setting for a given layer is below the noise floor.

I might add a shortcut to skip measurement and simply use the max bitrate as an option, but I'm also looking at completely reworking the quantization scheme anyway.
