[WebGPU EP] Support GroupQueryAttention #22658
Conversation
You can commit the suggested changes from lintrunner.
514217f to d49ecb4
You can commit the suggested changes from lintrunner.
You can commit the suggested changes from lintrunner.
…sajandhy/webgpu-ep-gqa-new
… shader code and added to hint.
…A or not." This reverts commit e448b1a.
OK to merge; the only open issue is the shared buffer, which we can fix in a new PR.
…QA or not." This reverts commit 60af2f5.
You can commit the suggested changes from lintrunner.
Fixed lint errors and coding guidelines
Works for dynamic kv_cache; a fix for static kv_cache comes in a new PR.
Description
Support the GroupQueryAttention operator in the native WebGPU EP.
Motivation and Context
This is required for running inference with some LLMs.
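For context on what the operator computes: in grouped-query attention, several query heads share a single key/value head, which shrinks the KV cache that LLM inference has to keep around. The sketch below is a minimal numpy illustration of those semantics only; the shapes and parameter names are illustrative assumptions, not the ONNX Runtime contract, and it has none of the WebGPU shader, masking, or KV-cache logic this PR actually implements.

```python
import numpy as np

def group_query_attention(q, k, v, num_heads, kv_num_heads):
    """Illustrative GQA: each group of query heads shares one KV head.

    q: (batch, seq, num_heads, head_dim)
    k, v: (batch, seq, kv_num_heads, head_dim)
    """
    group = num_heads // kv_num_heads
    d = q.shape[-1]
    # Repeat each KV head so every query head has a matching KV head.
    k = np.repeat(k, group, axis=2)
    v = np.repeat(v, group, axis=2)
    # Move heads in front of the sequence axis: (batch, heads, seq, head_dim).
    q = q.transpose(0, 2, 1, 3)
    k = k.transpose(0, 2, 1, 3)
    v = v.transpose(0, 2, 1, 3)
    # Scaled dot-product attention with a numerically stable softmax.
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    out = w @ v
    # Back to (batch, seq, num_heads, head_dim).
    return out.transpose(0, 2, 1, 3)
```

With `num_heads=4` and `kv_num_heads=2`, query heads 0–1 attend against KV head 0 and query heads 2–3 against KV head 1, which is the memory saving that makes the op attractive for LLM inference.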