ggml : fix quants nans when all the group weights are very close to zero #7313

slaren · 2024-05-15T21:34:23Z

When the group abs max value is very close to zero but not exactly zero, computing the scale may still end in a division by zero, which produces a nan scale. To avoid this, we check the max value against an epsilon instead of zero. With the IQ quants, this could also result in an "Oops: found point %u not on grid" error.
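To make the failure mode concrete, here is a minimal sketch, not the ggml code itself. The helper names `group_amax`, `group_iscale_unguarded` and `group_iscale_guarded` are hypothetical, and the epsilon is passed in as a parameter. Only the idea of comparing the group abs max against an epsilon instead of zero comes from this PR.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: largest absolute value in a group of n weights. */
static float group_amax(const float *x, size_t n) {
    float amax = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float ax = fabsf(x[i]);
        if (ax > amax) amax = ax;
    }
    return amax;
}

/* Checks only for exact zero: a tiny nonzero amax still reaches the division. */
static float group_iscale_unguarded(const float *x, size_t n, int nmax) {
    float amax = group_amax(x, n);
    if (amax == 0.0f) return 0.0f;
    return (float)nmax / amax;   /* overflows to +inf when amax is tiny */
}

/* Treats any group whose abs max is below eps as all zeros. */
static float group_iscale_guarded(const float *x, size_t n, int nmax, float eps) {
    float amax = group_amax(x, n);
    if (amax < eps) return 0.0f;
    return (float)nmax / amax;
}
```

For a group such as {0, 1e-40, -1e-40, 0}, the unguarded version returns +inf, and multiplying that inverse scale by a zero weight gives nan. The guarded version returns 0, which the caller can treat as an all-zero block.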

While doing this, I noticed that make_qx_quants already had a similar check with an epsilon of 1e-30. However, values this small can still result in nan, so I bumped it to 1e-20 and extended the check to all the cases that I could find. I used the commented code in test-backend-ops to find these cases. It is possible that an even higher epsilon may be necessary.
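One way a group that passes a 1e-30 check can still produce nan is underflow in intermediate products. The sketch below is an assumption about the mechanism, not something stated in this PR. It uses a hypothetical `weighted_scale` helper that weights each value by its square, so for values around 1e-25 every weight underflows to zero in single precision and the final ratio becomes 0/0.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical weighted least-squares scale for fixed integer levels q[i]:
 *   scale = sum(w*x*q) / sum(w*q*q), with w = x*x.
 * Returns 0 when the group abs max is below eps. */
static float weighted_scale(const float *x, const int *q, size_t n, float eps) {
    float amax = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float ax = fabsf(x[i]);
        if (ax > amax) amax = ax;
    }
    if (amax < eps) return 0.0f;

    float sumlx = 0.0f, suml2 = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float w = x[i] * x[i];          /* underflows to 0 for |x| around 1e-25 */
        sumlx += w * x[i] * (float)q[i];
        suml2 += w * (float)q[i] * (float)q[i];
    }
    return sumlx / suml2;               /* 0/0 = nan if every weight underflowed */
}
```

In this sketch, a group with abs max 1e-25 gets past an epsilon of 1e-30 and returns nan, while an epsilon of 1e-20 rejects it before any arithmetic happens.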

I don't expect this to result in lower precision in the quants since the epsilon is so small, but it may be worth checking.

Fixes #7311.

JohannesGaessler

There is no way the change in threshold has any significant effects on the results. Even a threshold of $10^{-10}$ should still be fine.

slaren · 2024-05-15T22:05:39Z

I tried progressively higher values and found that some quants still fail with 1e-8, so I increased the eps to 1e-7.

JohannesGaessler · 2024-05-15T22:53:15Z

While increasing GROUP_MAX_EPS by factors of 10, I could go as high as $10^{-4}$ before the results started to change for LLaMA 3 8b q6_K.

JohannesGaessler · 2024-05-16T10:08:53Z

To clarify, I'm not sure how generalizable my results are to other models; I think the model for which the fix is needed at least should also be checked since that particular model seems to have some blocks with only very small values.

slaren · 2024-05-17T23:24:23Z

I have tried to find the lowest possible eps for the quants that need an eps higher than 1e-15, so that only the quants that actually require it use the higher eps. This reduces the risk of introducing errors. Mostly that's the IQ quants.
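A per-quant epsilon could be organized roughly as follows. This is an illustrative sketch only: the `quant_kind` enum, the `group_max_eps` lookup and the `GROUP_MAX_EPS_IQ_EXAMPLE` name are hypothetical, and the IQ value is a placeholder rather than the value that was merged. Only the `GROUP_MAX_EPS` name and the 1e-15 baseline are taken from this discussion.

```c
/* Hypothetical quant families; not ggml's type enum. */
enum quant_kind { QUANT_KIND_DEFAULT, QUANT_KIND_IQ };

/* Baseline epsilon mentioned in the discussion. */
#define GROUP_MAX_EPS 1e-15f
/* Placeholder for illustration only; not necessarily what was merged. */
#define GROUP_MAX_EPS_IQ_EXAMPLE 1e-7f

/* Pick the epsilon for a quant family, so that only the families that
 * need a higher threshold pay for it. */
static float group_max_eps(enum quant_kind kind) {
    switch (kind) {
        case QUANT_KIND_IQ: return GROUP_MAX_EPS_IQ_EXAMPLE;
        default:            return GROUP_MAX_EPS;
    }
}

/* Returns 1 if the group should be treated as all zeros for this quant kind. */
static int group_is_effectively_zero(float amax, enum quant_kind kind) {
    return amax < group_max_eps(kind);
}
```

With these placeholder values, a group whose abs max is 1e-10 would be zeroed for the IQ family but quantized normally everywhere else.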

I don't really like this solution. I think the best way to handle this would be to check for zero before doing the division, but that would require deeper changes. The code is not very easy to follow, and I don't want to risk introducing bugs that may cause models with bad quants to be distributed.
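For reference, the alternative of checking at the division site might look like the sketch below. It is not what this PR implements, and `safe_div_scale` is a hypothetical helper, not a ggml function. It also rejects non-finite quotients, which goes slightly beyond a plain zero check.

```c
#include <math.h>

/* Hypothetical helper: returns num/den, or 0 when den is zero or the
 * quotient is not finite, so nan/inf never reach the stored scale. */
static float safe_div_scale(float num, float den) {
    if (den == 0.0f) return 0.0f;
    float s = num / den;
    return isfinite(s) ? s : 0.0f;
}
```

Every division that feeds a scale would have to go through such a helper, which is the kind of deeper change the comment above refers to.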

ggml-ci

ggml : fix quants nans when all the group weights are very close to zero

6fa6a9a

JohannesGaessler approved these changes May 15, 2024

increase eps to 1e-7

f59edee

slaren force-pushed the sl/fix-quant-near-zero branch from 0e1c4f6 to f59edee May 15, 2024 22:18

mofosyne added the bugfix and Review Complexity : Medium labels May 16, 2024

ggerganov approved these changes May 16, 2024

slaren force-pushed the sl/fix-quant-near-zero branch from 6b41894 to 61e8a0a May 17, 2024 23:26

use higher eps only for the quants that need it

f07e570

ggml-ci

slaren force-pushed the sl/fix-quant-near-zero branch from 61e8a0a to f07e570 May 17, 2024 23:29

slaren merged commit 0583484 into master May 18, 2024
68 of 73 checks passed

slaren deleted the sl/fix-quant-near-zero branch May 18, 2024 00:39

Nexesenex pushed a commit to Nexesenex/kobold.cpp that referenced this pull request May 18, 2024

ggml : fix quants nans when all the group weights are very close to z…

53a1b30

…ero (ggerganov#7313)
