The current version of Flash Attention does not support the ALiBi positional embedding. This PR adds support for the Triton-based Flash Attention, which does allow a bias to be added to the self-attention calculation.

This can be enabled simply by setting `"pos_emb": "alibi"` and adding `"flash"` to the attention config in your model's configuration, as sketched below.
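For illustration, here is a minimal sketch of how the two relevant settings might look in a model config file. The nested-list shape of `attention_config` (attention type plus the number of layers it covers, here 12) is an assumption based on the usual format of existing configs, not something defined by this PR; only `"pos_emb": "alibi"` and the `"flash"` attention type come from this change.

```json
{
  "pos_emb": "alibi",
  "attention_config": [[["flash"], 12]]
}
```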
To support this feature we bumped the listed Triton dependency to `triton==2.0.0.dev20221202`. In the course of implementing it we also discovered that SparseAttention became broken at some point in the past; see #863.