DeepSpeed Sparse Attention is Broken #863
Is this a “real” issue or can we just change the required version to obtain support?
I tried a range of versions (including a handful of easy changes to the code) and nothing worked right away. With an updated Triton version it probably wouldn't take much effort to fix, but this came up at the tail end of adding support for the new Triton Flash Attention, so @Quentin-Anthony advised splitting it out as its own issue. To be clear, the issue isn't introduced by Triton Flash Attention -- DeepSpeed updated without us, and now just bumping up the version isn't quite enough to put things right.
SparseAttention relies on Triton for specific kernels. GPT-NeoX currently pins `triton==0.4.2` as a dependency, which is behind the DeepSpeed version of `1.0.0`. Both are far behind the version of Triton that we would like to use, `2.0.0.dev20221202`, which is required for new Triton features. Current NeoX and DeepSpeed code cannot use Sparse Attention with any of these versions.
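As a quick sanity check (a sketch, not part of the repository), one way to see which of the conflicting Triton versions is actually installed before enabling sparse attention is to query package metadata; the version strings below are the ones mentioned in this issue:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_triton_version():
    """Return the installed Triton version string, or None if Triton is absent."""
    try:
        return version("triton")
    except PackageNotFoundError:
        return None

# Versions discussed in this issue:
NEOX_PIN = "0.4.2"             # what GPT-NeoX currently pins
DEEPSPEED_PIN = "1.0.0"        # what DeepSpeed expects
DESIRED = "2.0.0.dev20221202"  # needed for new Triton features

v = installed_triton_version()
print(f"installed triton: {v}")
```

This only reports the installed version; as noted above, none of the three versions currently works with Sparse Attention in both NeoX and DeepSpeed, so reconciling the pins requires code changes, not just a version bump.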