
[QUESTION] Why Warp's tile-based matmul is much slower than torch's one? #461

Open
chaoming0625 opened this issue Jan 27, 2025 · 1 comment
Labels
question The issue author requires information

Comments

@chaoming0625
I have tried Warp 1.6.0 and benchmarked its tile-based matmul against torch's matmul. Warp appears to be much slower. I am wondering why, and whether this performance gap can be closed in a future release?

| TILE_M | TILE_N | TILE_K | BLOCK | Warp Time | Torch Time | Relative |
|--------|--------|--------|-------|-----------|------------|----------|
| 64     | 64     | 64     | 256   | 981.684936  | 363.559419 | 2.700 |
| 64     | 64     | 64     | 512   | 1121.447108 | 363.559419 | 3.085 |
| 64     | 64     | 64     | 1024  | 1146.522702 | 363.559419 | 3.154 |
| 64     | 64     | 128    | 256   | 1436.224992 | 363.559419 | 3.950 |
| 64     | 64     | 128    | 512   | 1050.912843 | 363.559419 | 2.891 |
| 64     | 64     | 128    | 1024  | 1039.730605 | 363.559419 | 2.860 |
| 64     | 128    | 64     | 256   | 1321.610127 | 363.559419 | 3.635 |
| 64     | 128    | 64     | 512   | 1231.751565 | 363.559419 | 3.388 |
| 64     | 128    | 64     | 1024  | 1123.240676 | 363.559419 | 3.090 |

Thanks.

@chaoming0625 chaoming0625 added the question The issue author requires information label Jan 27, 2025
@shi-eric
Contributor

Hi @chaoming0625, there are various improvements on the way to close the performance gap between cuBLAS and cuBLASDx. Could you please share complete details about your benchmark so that we can understand this comparison better?

  • GPU
  • Memory clock, SM clock
  • Data type
  • Matrix size
  • Benchmark script
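
A complete benchmark script would also pin down the timing methodology itself, which can skew comparisons like the table above. Below is a minimal CPU-only sketch of a sound harness (warm-up runs, then the median of repeated timed runs), with NumPy matmuls standing in for the Warp kernel and torch baseline; the `bench` helper and all names are hypothetical, and a real GPU benchmark would additionally need device synchronization (e.g. `wp.synchronize()` / `torch.cuda.synchronize()`) before stopping each timer:

```python
# Hypothetical sketch of a matmul benchmark harness: warm up first, then
# report the median wall-clock time over several iterations. NumPy stands
# in for the Warp tile kernel and the torch baseline.
import time
import numpy as np

def bench(fn, warmup=3, iters=10):
    """Return the median wall-clock time of fn() in microseconds."""
    for _ in range(warmup):       # discard cold-start runs (JIT, caches)
        fn()
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()                      # on a GPU, synchronize before reading the clock
        times.append((time.perf_counter() - t0) * 1e6)
    return float(np.median(times))

M = N = K = 256
a = np.random.rand(M, K).astype(np.float32)
b = np.random.rand(K, N).astype(np.float32)

warp_us = bench(lambda: a @ b)    # stand-in for the Warp tile kernel
torch_us = bench(lambda: a @ b)   # stand-in for the torch baseline
print(f"{'Warp Time':<14}{'Torch Time':<14}{'Relative':<10}")
print(f"{warp_us:<14.3f}{torch_us:<14.3f}{warp_us / torch_us:<10.4f}")
```

Reporting the median (rather than a single run) reduces noise from clock jitter and background load, and separating warm-up from measurement avoids charging one-time JIT compilation to the kernel.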
