[Inductor cutlass backend] Enabled nonzero workspace and Cutlass StreamK #125406
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125406
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 1 Unrelated Failure as of commit 0c5f2bb with merge base bfd5bb0:
NEW FAILURE - The following job has failed.
BROKEN TRUNK - The following job failed but was present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Enable nonzero workspace and Cutlass StreamK for Inductor Cutlass GEMM ops. This is a simpler rewrite of my original version of pytorch#119005 using @peterbell10's workspace allocation mechanism from pytorch#117992

Test Plan:
- Additional unit test in test_cutlass_backend.py which specifically tests StreamK GEMM with a workspace requirement
- CI

ghstack-source-id: 24d06299f90a1e31af6b097316b76689e4944df2
Pull Request resolved: pytorch#125406
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
The currently failing test appears flaky, judging from hud.pytorch.org; I could find the same error reported sporadically on trunk.
@pytorchbot merge --ignore-current
Merge started. Your change will be merged while ignoring the following 2 checks: pull / linux-focal-cuda12.1-py3.10-gcc9 / test (default, 3, 5, linux.4xlarge.nvidia.gpu), inductor / rocm6.1-py3.8-inductor / test (inductor, 1, 1, linux.rocm.gpu.2). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…124928) This diff makes sure that a custom exception is thrown when no valid choices remain during autotuning. This allows gracefully falling back to a default choice, even if that default choice was not passed to autotune_select_algorithm. Additionally, this diff handles RuntimeErrors during autotuning gracefully: the corresponding choice is ignored, but a problematic choice no longer causes compilation of the entire model to fail. (An error is logged, though.)

Test Plan: CI

Pull Request resolved: #124928
Approved by: https://github.com/int3
ghstack dependencies: #125406
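The fallback behavior described above can be sketched roughly as follows. This is an illustrative sketch with hypothetical names (`NoValidChoicesError`, `autotune`, `select_algorithm`), not Inductor's actual internals:

```python
# Hypothetical sketch of the autotuning-fallback pattern described in the
# commit message above; names are illustrative, not Inductor's real API.

class NoValidChoicesError(Exception):
    """Raised when every autotuning candidate failed to run."""

def autotune(choices, benchmark):
    timings = {}
    for choice in choices:
        try:
            timings[choice] = benchmark(choice)
        except RuntimeError as e:
            # A problematic choice is skipped (and logged) rather than
            # failing compilation of the entire model.
            print(f"warning: skipping {choice!r}: {e}")
    if not timings:
        raise NoValidChoicesError("no valid choices remain after autotuning")
    # Pick the fastest surviving choice.
    return min(timings, key=timings.get)

def select_algorithm(choices, benchmark, default):
    try:
        return autotune(choices, benchmark)
    except NoValidChoicesError:
        # Graceful fallback to a default choice, even if the default was
        # not among the autotuned candidates.
        return default
```

With this shape, a single failing candidate only costs a warning, and a total wipeout of candidates still produces a usable (if untuned) kernel choice.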
Stack from ghstack (oldest at bottom):
Enable nonzero workspace and Cutlass StreamK for Inductor Cutlass GEMM ops.
This is a simpler rewrite of my original version of #119005 using @peterbell10's workspace allocation mechanism from #117992.
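For readers unfamiliar with the "nonzero workspace" requirement: StreamK-style GEMM kernels can need scratch memory for partial-tile accumulators, so the caller must query the required size and allocate a buffer before launch. The sketch below illustrates that pattern with entirely hypothetical names and a made-up size formula; it is not the CUTLASS or Inductor API:

```python
# Hypothetical sketch of the query-then-allocate workspace pattern;
# class, method names, and the size formula are illustrative only.

class StreamKGemm:
    def __init__(self, m, n, k, partials=4):
        self.m, self.n, self.k = m, n, k
        self.partials = partials  # assumed number of partial accumulators

    def workspace_size(self):
        # Assumed formula: one float32 (4-byte) output tile per partial.
        return self.partials * self.m * self.n * 4

    def run(self, a, b, workspace):
        # The caller-provided scratch buffer must be large enough.
        assert len(workspace) >= self.workspace_size(), "workspace too small"
        # Stand-in for the actual kernel launch: plain matrix multiply.
        return [[sum(a[i][t] * b[t][j] for t in range(self.k))
                 for j in range(self.n)] for i in range(self.m)]

kernel = StreamKGemm(m=2, n=2, k=3)
ws = bytearray(kernel.workspace_size())  # caller allocates the workspace
out = kernel.run([[1, 2, 3], [4, 5, 6]], [[1, 0], [0, 1], [1, 1]], ws)
```

The point of the pattern is that allocation is lifted out of the kernel and into the caller, which is what lets Inductor's workspace allocation mechanism own the buffer's lifetime.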
Test Plan:
- Additional unit test in test_cutlass_backend.py which specifically tests StreamK GEMM with a workspace requirement
- CI
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @muchulee8 @ColinPeppler @amjames @desertfire @chauhang