
[WebGPU EP] Support GroupQueryAttention #22658

Merged — 52 commits merged into main, Dec 2, 2024
Conversation

@satyajandhyala (Contributor) commented Oct 30, 2024

Description

Support the GroupQueryAttention operator in the native WebGPU EP.

Motivation and Context

This is required for running inference with some LLMs.
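As background on what the operator computes (this is a hedged NumPy sketch of grouped-query attention in general, not the PR's WGSL implementation, and the function name and layout are illustrative assumptions): in GQA, `num_heads` query heads share a smaller set of `kv_num_heads` key/value heads, which shrinks the KV cache that LLM decoding must keep resident.

```python
import numpy as np

def group_query_attention(q, k, v, num_heads, kv_num_heads):
    # q: (seq_len, num_heads * head_dim); k, v: (kv_len, kv_num_heads * head_dim)
    # Each group of (num_heads // kv_num_heads) query heads shares one KV head.
    seq_len, hidden = q.shape
    head_dim = hidden // num_heads
    group = num_heads // kv_num_heads

    q = q.reshape(seq_len, num_heads, head_dim)
    k = k.reshape(-1, kv_num_heads, head_dim)
    v = v.reshape(-1, kv_num_heads, head_dim)

    out = np.empty_like(q)
    for h in range(num_heads):
        kv_h = h // group  # the shared KV head for this query head
        scores = q[:, h] @ k[:, kv_h].T / np.sqrt(head_dim)
        # softmax over key positions
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[:, h] = w @ v[:, kv_h]
    return out.reshape(seq_len, hidden)
```

With `kv_num_heads == num_heads` this reduces to ordinary multi-head attention; with `kv_num_heads == 1` it is multi-query attention.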

@github-actions bot left a comment

You can commit the suggested changes from lintrunner.

Review thread on onnxruntime/contrib_ops/webgpu/bert/attention.cc (outdated, resolved)
@satyajandhyala satyajandhyala marked this pull request as ready for review November 1, 2024 19:28
@satyajandhyala satyajandhyala force-pushed the sajandhy/webgpu-ep-gqa-new branch from 514217f to d49ecb4 Compare November 4, 2024 20:20
@guschmue guschmue added the ep:WebGPU ort-web webgpu provider label Nov 6, 2024
@github-actions bot left a comment

You can commit the suggested changes from lintrunner.

3 review threads on onnxruntime/contrib_ops/webgpu/bert/attention.cc (outdated, resolved)
@github-actions bot left a comment

You can commit the suggested changes from lintrunner.

Review thread on onnxruntime/contrib_ops/webgpu/bert/attention.cc (outdated, resolved)
@guschmue (Contributor) previously approved these changes Nov 25, 2024

ok to merge; the only open issue is the shared buffer, which we can fix in a new PR.

@github-actions bot left a comment

You can commit the suggested changes from lintrunner.

2 review threads on onnxruntime/contrib_ops/webgpu/bert/attention_common.h (outdated, resolved)
@satyajandhyala satyajandhyala dismissed github-actions[bot]’s stale review November 28, 2024 04:00

Fixed lint errors and coding guidelines

@guschmue guschmue merged commit e8bf46a into main Dec 2, 2024
95 checks passed
@guschmue guschmue deleted the sajandhy/webgpu-ep-gqa-new branch December 2, 2024 20:40
@guschmue (Contributor) commented Dec 2, 2024

Works for dynamic kv_cache; a fix for static kv_cache will come in a new PR.
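For context on the dynamic vs. static kv_cache distinction mentioned above (a hedged illustration under assumed semantics, not ONNX Runtime code: "dynamic" is taken to mean a cache buffer that grows with the decoded sequence, "static" a preallocated max-length buffer updated in place; the function names are hypothetical):

```python
import numpy as np

def append_dynamic(cache, new_kv):
    # Dynamic cache: reallocate and grow on every decoded token.
    return np.concatenate([cache, new_kv], axis=0) if cache.size else new_kv

def append_static(cache, length, new_kv):
    # Static cache: write into a preallocated max-length buffer in place,
    # tracking the current valid length separately.
    n = new_kv.shape[0]
    cache[length:length + n] = new_kv
    return length + n
```

A static cache keeps buffer shapes fixed across decode steps, which suits GPU execution providers that prefer stable bindings, at the cost of reserving the maximum sequence length up front.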

ankitm3k pushed 3 commits to intel/onnxruntime that referenced this pull request Dec 11, 2024
tarekziade pushed a commit to tarekziade/onnxruntime that referenced this pull request Jan 10, 2025
Labels
ep:WebGPU ort-web webgpu provider
Projects
None yet
3 participants