
Error on test 03 #3805

Open
hforoughmand opened this issue Apr 30, 2024 · 1 comment
@hforoughmand
When I run tutorial 03 (https://triton-lang.org/main/getting-started/tutorials/03-matrix-multiplication.html#sphx-glr-getting-started-tutorials-03-matrix-multiplication-py) on a V100, I get the following error.

triton_output_with_fp16_inputs=tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float16)
torch_output_with_fp16_inputs=tensor([[  1.1045, -36.9688,  31.4688,  ..., -11.3906,  24.4531, -32.3438],
        [  6.3516, -19.6094,  34.0938,  ...,  -5.8906,   5.2812,   6.8828],
        [-32.0625,   5.9531,  15.3984,  ..., -21.4062, -23.9844, -10.1328],
        ...,
        [ -5.7070,   7.4492,   8.2656,  ..., -10.6953, -40.0000,  17.7500],
        [ 25.5000,  24.3438,  -8.4609,  ..., -18.9375,  32.5312, -29.9219],
        [ -5.3477,   4.9805,  11.8828,  ...,   5.5859,   6.4023, -17.3125]],
       device='cuda:0', dtype=torch.float16)
❌ Triton and Torch differ
loc("03-matrix-multiplication.py":261:35): error:  size mismatch when packing elements for LLVM struct expected 32 but got 64
python: /root/.triton/llvm/llvm-5e5a22ca-centos-x64/include/llvm/ADT/ArrayRef.h:257: const T& llvm::ArrayRef<T>::operator[](size_t) const [with T = mlir::Type; size_t = long unsigned int]: Assertion `Index < Length && "Invalid index!"' failed.

Is this a problem with my installation or a known bug? My CUDA version is 11.8, my Python version is 3.8.18, and my PyTorch version is 2.3.0+cu118.

@fkouteib
Contributor

Hey @hforoughmand, if you want to run the version of the tutorial at the tip of the main branch (which is what is published on the Triton website), I recommend building and installing Triton from source on the main branch, or installing a nightly build (v3.0-*) per the README instructions.

If you want to run a Triton stable release (2.x) installed from PyPI (or installed implicitly alongside a PyTorch stable release), then I recommend running the version of the tutorial code from the corresponding release branch, which may differ from the website.

For matmul on V100 specifically, you may want to review the open issues referencing V100. I think there may be a known FP16 regression on that GPU; see #3478.
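One quick way to confirm which PyTorch and Triton builds are actually installed, so you can match the tutorial code to the right release branch — a minimal sketch; the `same_minor` helper is just an illustration, not part of Triton:

```python
from importlib.metadata import PackageNotFoundError, version


def same_minor(a: str, b: str) -> bool:
    """Check whether two version strings share the same major.minor prefix."""
    return a.split(".")[:2] == b.split(".")[:2]


# Print the installed versions of the relevant packages, if present.
for pkg in ("torch", "triton"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```

If `triton` reports a stable 2.x version while you are running the tutorial from the website (which tracks main), that mismatch alone can explain errors like the one above.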
