Supporting group-wise quantization and sub1 packing #4

NicoNico6 · 2023-11-05T11:39:31Z

Dear Authors,

Sorry for the intrusion once more.

To the best of my understanding, the original GPTQ algorithm supports group-wise quantization with a range of group sizes, such as -1, 128, and 64. From my reading of the code, and assuming my interpretation is correct, batch_GPTQ inherently supports various group sizes, but the add_expert function in the Sub1CheckpointManager class and the make function in Sub1Linear appear to support only row-wise quantization by default, which corresponds to a group size of -1. Consequently, only the row-wise min_max variable is preserved for the subsequent packing operations.

Would it be feasible to apply the LZW algorithm to tensors that have undergone group-wise quantization (for instance, groupsize=128 with ternary weights) and to design the sub1 packing process accordingly?
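
To make the question concrete, here is a minimal sketch of the difference I mean. It is not taken from the repository: the function names `minmax_per_group` and `quantize_ternary` are hypothetical, and the uniform three-level grid is an assumption for illustration (the actual ternary grid in the code may differ). The point is only that row-wise quantization keeps one (min, max) pair per row, while group-wise quantization keeps one pair per group of columns in each row, so the packing step would need to store and index the extra pairs.

```python
def minmax_per_group(weights, groupsize=-1):
    """Return, for each row, a list of (min, max) pairs, one per column group.

    groupsize=-1 means a single group spanning the whole row (row-wise).
    """
    stats = []
    for row in weights:
        size = len(row) if groupsize == -1 else groupsize
        groups = [row[i:i + size] for i in range(0, len(row), size)]
        stats.append([(min(g), max(g)) for g in groups])
    return stats


def quantize_ternary(weights, groupsize=-1):
    """Map each weight to a level in {0, 1, 2} using its own group's (min, max).

    Assumption: a uniform grid with two steps between min and max.
    """
    stats = minmax_per_group(weights, groupsize)
    quantized = []
    for row, row_stats in zip(weights, stats):
        size = len(row) if groupsize == -1 else groupsize
        q_row = []
        for j, w in enumerate(row):
            lo, hi = row_stats[j // size]
            scale = (hi - lo) / 2 or 1.0  # avoid division by zero for constant groups
            q_row.append(min(2, max(0, round((w - lo) / scale))))
        quantized.append(q_row)
    return quantized, stats


weights = [[-1.0, 0.0, 1.0, 4.0],
           [0.0, 2.0, 4.0, 8.0]]

_, row_stats = quantize_ternary(weights, groupsize=-1)
_, group_stats = quantize_ternary(weights, groupsize=2)
print(len(row_stats[0]), len(group_stats[0]))  # (min, max) pairs stored per row
```

With groupsize=-1 each row carries a single (min, max) pair, whereas with groupsize=2 on a four-column row it carries two, which is the extra metadata that add_expert and make would have to preserve for packing.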
