I'm testing a fused attention kernel that will potentially support fp8. The code works in fp16. However, when I enable fp8 and try it on an NVIDIA H100 with the latest Triton built from source (main branch), I get an obscure compile error:
python: /home/ccyang/.triton/llvm/llvm-ed4e505c-ubuntu-x64/include/llvm/Support/Casting.h:572: decltype(auto) llvm::cast(From&) [with To = mlir::IntegerAttr; From = mlir::Attribute]: Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
Aborted (core dumped)
With some testing I found that if I remove the boundary-check and padding usage defined in load_fn, the compile error goes away. But I think those are needed for correctness. I narrowed it down to a minimal reproducible example (see below), where I removed most of the logic and kept only the block pointers, the load, tl.dot, and the store back.
If you toggle between these two lines, you can either reproduce the compile error or get the code to compile. This suggests there is a bug, or some limitation, in fp8 block-pointer boundary checks and the padding option. If it is a limitation, could you suggest a workaround? Thanks.
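For reference, the semantics the kernel relies on from boundary_check/padding_option — loading a fixed-size block and zero-filling any out-of-bounds elements — can be sketched host-side. This is a plain-Python analogue for illustration only; padded_load, buf, and the 1-D layout are hypothetical and not taken from the actual kernel:

```python
# Host-side analogue of tl.load on a block pointer with
# boundary_check=(0,) and padding_option="zero": read a BLOCK-sized
# tile starting at `start`, substituting 0.0 for positions that fall
# outside the buffer. (Illustrative only; not the Triton kernel.)

def padded_load(buf, start, block):
    """Return `block` elements of `buf` beginning at `start`,
    zero-padding any out-of-bounds position."""
    n = len(buf)
    return [buf[i] if 0 <= i < n else 0.0
            for i in range(start, start + block)]

print(padded_load([1.0, 2.0, 3.0], 0, 4))  # [1.0, 2.0, 3.0, 0.0]
print(padded_load([1.0, 2.0, 3.0], 2, 2))  # [3.0, 0.0]
```

Dropping the boundary check is only safe when the tensor dimensions are exact multiples of the block sizes, which is why removing it merely masks the fp8 compile issue.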
When running with pytest, a call stack is dumped after the abort:
Current thread 0x00007f4c8a158740 (most recent call first):
File "/home/ccyang/github.com/triton/python/triton/backends/nvidia/compiler.py", line 212 in make_llir
File "/home/ccyang/github.com/triton/python/triton/backends/nvidia/compiler.py", line 302 in <lambda>
File "/home/ccyang/github.com/triton/python/triton/compiler/compiler.py", line 282 in compile
File "/home/ccyang/github.com/triton/python/triton/runtime/jit.py", line 662 in run
File "/home/ccyang/github.com/triton/python/triton/runtime/autotuner.py", line 174 in run
File "/home/ccyang/github.com/triton/python/triton/runtime/jit.py", line 345 in <lambda>
File "/home/ccyang/github.com/cyang49/foundation-model-stack/fms/triton/bug_reproducer.py", line 251 in forward