[DEPRECATION] Discussion on Fused attention and QiGEN #655
1. Anyone still using these? Maybe. No.
2. Is fused attention actually faster than Marlin, even when working properly? No; I think it's more of a legacy feature, and it would be good to get rid of it.
3. Is SYCL ([SYCL] Intel SYCL runtime support for AutoGPTQ #638) a candidate to replace qigen? Intel staff are willing to actively support this in AutoGPTQ. qigen is a kernel that makes inference possible on the CPU, so if SYCL can run inference on the CPU, removing qigen seems like a good idea.
4. With vLLM's new Marlin kernel, which will support almost all group sizes with act-order, do we even need fused attention?

Additionally, it seems like a good idea to remove triton v1 and replace it with triton v2, since triton v2 supports all of v1's features and is faster.
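To make the replacement argument concrete, here is a minimal sketch of the kind of kernel dispatch the cleanup above implies: Marlin for 4-bit CUDA, triton v2 elsewhere on GPU, and QBits/SYCL on CPU in place of qigen. The function name, signature, and backend strings are illustrative assumptions, not AutoGPTQ's real API.

```python
# Hypothetical post-cleanup kernel dispatch (names are assumptions,
# not AutoGPTQ's actual API).
def select_backend(device: str, bits: int, group_size: int, desc_act: bool) -> str:
    """Pick a quantized-linear kernel for a device and quant config."""
    if device == "cuda":
        # Marlin handles almost all group sizes and act-order (desc_act)
        # configs, so fused attention and triton v1 become redundant.
        if bits == 4:
            return "marlin"
        # triton v2 supersedes v1: same feature set, faster.
        return "triton_v2"
    # On CPU, QBits (via SYCL/ITREX) would take over qigen's role.
    return "qbits"
```

For example, `select_backend("cuda", 4, 128, True)` would return `"marlin"`, while a CPU config would fall through to `"qbits"`.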
QBits (Intel, PenghuiCheng, #660) is another qigen alternative, actively supported by Intel.
Hi @Qubitium, we greatly appreciate your interest in QBits. For a comprehensive introduction to QBits, please refer to the RFC. It's worth noting that QBits is still under active development, and we're committed to continuous improvement in both performance and features. Performance enhancements:
Feature enhancements:
Regarding PR #660 replacing qigen: I wonder whether PR #660 can totally replace qigen? If it can't, what other efforts should we take?
I think the current QBits can replace everything in qigen except the 2- and 3-bit parts.
Hi, ITREX will release its next version in late May, which supports 2/3-bit linear layers.
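The replacement question above reduces to a coverage check: QBits can retire qigen only once its supported bit-widths are a superset of qigen's. A minimal sketch of that check, with the support sets hedged as assumptions (the thread only states that 2/3-bit is missing today and coming in the next ITREX release; the exact widths are not specified):

```python
# Assumed support matrix (illustrative; exact bit-widths are not
# stated in the thread, only the 2/3-bit gap and its upcoming fix).
SUPPORTED_BITS = {
    "qigen": {2, 3, 4},
    "qbits_current": {4, 8},
    "qbits_next": {2, 3, 4, 8},  # after the late-May ITREX release
}

def can_replace(old: str, new: str) -> bool:
    """True if every bit-width the old kernel serves is covered by the new one."""
    return SUPPORTED_BITS[old] <= SUPPORTED_BITS[new]
```

Under these assumptions, `can_replace("qigen", "qbits_current")` is `False` because of the 2/3-bit gap, and `can_replace("qigen", "qbits_next")` is `True`.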
@PanQiWei @LaaZa @fxmarty @qwopqwop200
I want to start a discussion on a major refactor, or more precisely, hacking off unsupported or flat-out broken features in the current tree.
Questions:
With vLLM's new Marlin kernel, which will support almost all group sizes with act-order, do we even need fused attention? #653
EDIT: added triton v1/v2 to the discussion