
Error on test 03 #3805

Open
hforoughmand opened this issue Apr 30, 2024 · 1 comment
@hforoughmand
When I run tutorial 03 (https://triton-lang.org/main/getting-started/tutorials/03-matrix-multiplication.html#sphx-glr-getting-started-tutorials-03-matrix-multiplication-py) on a V100, I get the following error.

triton_output_with_fp16_inputs=tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float16)
torch_output_with_fp16_inputs=tensor([[  1.1045, -36.9688,  31.4688,  ..., -11.3906,  24.4531, -32.3438],
        [  6.3516, -19.6094,  34.0938,  ...,  -5.8906,   5.2812,   6.8828],
        [-32.0625,   5.9531,  15.3984,  ..., -21.4062, -23.9844, -10.1328],
        ...,
        [ -5.7070,   7.4492,   8.2656,  ..., -10.6953, -40.0000,  17.7500],
        [ 25.5000,  24.3438,  -8.4609,  ..., -18.9375,  32.5312, -29.9219],
        [ -5.3477,   4.9805,  11.8828,  ...,   5.5859,   6.4023, -17.3125]],
       device='cuda:0', dtype=torch.float16)
❌ Triton and Torch differ
loc("03-matrix-multiplication.py":261:35): error:  size mismatch when packing elements for LLVM struct expected 32 but got 64
python: /root/.triton/llvm/llvm-5e5a22ca-centos-x64/include/llvm/ADT/ArrayRef.h:257: const T& llvm::ArrayRef<T>::operator[](size_t) const [with T = mlir::Type; size_t = long unsigned int]: Assertion `Index < Length && "Invalid index!"' failed.

Is this a problem with my installation or a known bug? My CUDA version is 11.8, my Python version is 3.8.18, and my PyTorch version is 2.3.0+cu118.

@fkouteib
Contributor

Hey @hforoughmand, if you want to run the version of the tutorial at the tip of the main branch (which is what is published on the Triton website), I recommend building and installing Triton from source on the main branch, or installing a nightly build (v3.0-*) per the README instructions.

If you want to run a Triton stable release (2.x) installed from PyPI (or installed implicitly alongside a PyTorch stable release), then I recommend running the version of the tutorial code from the corresponding release branch, which may differ from the website.

For matmul on V100 specifically, you may want to review the open issues referencing V100. I think there may be a known FP16 regression on that GPU; see #3478.
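One quick way to confirm which PyTorch and Triton builds are actually installed, so you can match the tutorial code to the right release branch — a minimal sketch; the `same_minor` helper is just an illustration, not part of Triton:

```python
from importlib.metadata import PackageNotFoundError, version


def same_minor(a: str, b: str) -> bool:
    """Check whether two version strings share the same major.minor prefix."""
    return a.split(".")[:2] == b.split(".")[:2]


# Print the installed versions of the relevant packages, if present.
for pkg in ("torch", "triton"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```

If `triton` reports a stable 2.x version while you are running the tutorial from the website (which tracks main), that mismatch alone can explain errors like the one above.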
