
[WebGPU EP] Support GroupQueryAttention #22658

Merged — 52 commits merged into main, Dec 2, 2024
Conversation

@satyajandhyala (Contributor) commented Oct 30, 2024

Description

Support the GroupQueryAttention operator in the native WebGPU EP.

Motivation and Context

This is required for running inference with some LLMs.
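As background on what the operator computes (this is a hedged NumPy sketch of grouped-query attention in general, not the PR's WGSL implementation, and the function name and layout are illustrative assumptions): in GQA, `num_heads` query heads share a smaller set of `kv_num_heads` key/value heads, which shrinks the KV cache that LLM decoding must keep resident.

```python
import numpy as np

def group_query_attention(q, k, v, num_heads, kv_num_heads):
    # q: (seq_len, num_heads * head_dim); k, v: (kv_len, kv_num_heads * head_dim)
    # Each group of (num_heads // kv_num_heads) query heads shares one KV head.
    seq_len, hidden = q.shape
    head_dim = hidden // num_heads
    group = num_heads // kv_num_heads

    q = q.reshape(seq_len, num_heads, head_dim)
    k = k.reshape(-1, kv_num_heads, head_dim)
    v = v.reshape(-1, kv_num_heads, head_dim)

    out = np.empty_like(q)
    for h in range(num_heads):
        kv_h = h // group  # the shared KV head for this query head
        scores = q[:, h] @ k[:, kv_h].T / np.sqrt(head_dim)
        # softmax over key positions
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[:, h] = w @ v[:, kv_h]
    return out.reshape(seq_len, hidden)
```

With `kv_num_heads == num_heads` this reduces to ordinary multi-head attention; with `kv_num_heads == 1` it is multi-query attention.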

@github-actions bot left a comment

You can commit the suggested changes from lintrunner.

Review thread on onnxruntime/contrib_ops/webgpu/bert/attention.cc (outdated, resolved)
@satyajandhyala satyajandhyala marked this pull request as ready for review November 1, 2024 19:28
@satyajandhyala satyajandhyala force-pushed the sajandhy/webgpu-ep-gqa-new branch from 514217f to d49ecb4 Compare November 4, 2024 20:20
@guschmue guschmue added the ep:WebGPU ort-web webgpu provider label Nov 6, 2024
@github-actions bot left a comment

You can commit the suggested changes from lintrunner.

3 review threads on onnxruntime/contrib_ops/webgpu/bert/attention.cc (outdated, resolved)
@github-actions bot left a comment

You can commit the suggested changes from lintrunner.

Review thread on onnxruntime/contrib_ops/webgpu/bert/attention.cc (outdated, resolved)
@guschmue (Contributor) previously approved these changes Nov 25, 2024

ok to merge; the only open issue is the shared buffer, which we can fix in a new PR.

@github-actions bot left a comment

You can commit the suggested changes from lintrunner.

2 review threads on onnxruntime/contrib_ops/webgpu/bert/attention_common.h (outdated, resolved)
@satyajandhyala satyajandhyala dismissed github-actions[bot]’s stale review November 28, 2024 04:00

Fixed lint errors and coding guidelines

@guschmue guschmue merged commit e8bf46a into main Dec 2, 2024
95 checks passed
@guschmue guschmue deleted the sajandhy/webgpu-ep-gqa-new branch December 2, 2024 20:40
@guschmue (Contributor) commented Dec 2, 2024

Works for dynamic kv_cache; a fix for static kv_cache will come in a new PR.
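For context on the dynamic vs. static kv_cache distinction mentioned above (a hedged illustration under assumed semantics, not ONNX Runtime code: "dynamic" is taken to mean a cache buffer that grows with the decoded sequence, "static" a preallocated max-length buffer updated in place; the function names are hypothetical):

```python
import numpy as np

def append_dynamic(cache, new_kv):
    # Dynamic cache: reallocate and grow on every decoded token.
    return np.concatenate([cache, new_kv], axis=0) if cache.size else new_kv

def append_static(cache, length, new_kv):
    # Static cache: write into a preallocated max-length buffer in place,
    # tracking the current valid length separately.
    n = new_kv.shape[0]
    cache[length:length + n] = new_kv
    return length + n
```

A static cache keeps buffer shapes fixed across decode steps, which suits GPU execution providers that prefer stable bindings, at the cost of reserving the maximum sequence length up front.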

ankitm3k pushed 3 commits to intel/onnxruntime that referenced this pull request Dec 11, 2024
tarekziade pushed a commit to tarekziade/onnxruntime that referenced this pull request Jan 10, 2025
Labels
ep:WebGPU ort-web webgpu provider
Projects
None yet
3 participants