[Inductor cutlass backend] Enabled nonzero workspace and Cutlass StreamK #125406
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125406
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 1 Unrelated Failure as of commit 0c5f2bb with merge base bfd5bb0:
NEW FAILURE - The following job has failed.
BROKEN TRUNK - The following job failed but was present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Enable nonzero workspace and Cutlass StreamK for Inductor Cutlass GEMM ops. This is a simpler rewrite of my original version of pytorch#119005 using @peterbell10's workspace allocation mechanism from pytorch#117992

Test Plan:
- Additional unit test in test_cutlass_backend.py which specifically tests StreamK GEMM with a workspace requirement
- CI

ghstack-source-id: 24d06299f90a1e31af6b097316b76689e4944df2
Pull Request resolved: pytorch#125406
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
The currently failing test appears flaky, judging from hud.pytorch.org; I could find the same error reported sporadically on trunk.
@pytorchbot merge --ignore-current
Merge started. Your change will be merged while ignoring the following 2 checks: pull / linux-focal-cuda12.1-py3.10-gcc9 / test (default, 3, 5, linux.4xlarge.nvidia.gpu), inductor / rocm6.1-py3.8-inductor / test (inductor, 1, 1, linux.rocm.gpu.2). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…124928) This diff makes sure that a custom exception is thrown when no valid choices remain during autotuning. This allows gracefully falling back to a default choice, even if that default choice was not passed to autotune_select_algorithm. Additionally, this diff handles RuntimeErrors during autotuning gracefully: the corresponding choice is ignored, but a problematic choice no longer causes compilation of the entire model to fail. (An error is logged, though.)

Test Plan: CI

Pull Request resolved: #124928
Approved by: https://github.com/int3
ghstack dependencies: #125406
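The fallback behavior described above can be sketched roughly as follows. This is an illustrative sketch with hypothetical names (`NoValidChoicesError`, `autotune`, `select_algorithm`), not Inductor's actual internals:

```python
# Hypothetical sketch of the autotuning-fallback pattern described in the
# commit message above; names are illustrative, not Inductor's real API.

class NoValidChoicesError(Exception):
    """Raised when every autotuning candidate failed to run."""

def autotune(choices, benchmark):
    timings = {}
    for choice in choices:
        try:
            timings[choice] = benchmark(choice)
        except RuntimeError as e:
            # A problematic choice is skipped (and logged) rather than
            # failing compilation of the entire model.
            print(f"warning: skipping {choice!r}: {e}")
    if not timings:
        raise NoValidChoicesError("no valid choices remain after autotuning")
    # Pick the fastest surviving choice.
    return min(timings, key=timings.get)

def select_algorithm(choices, benchmark, default):
    try:
        return autotune(choices, benchmark)
    except NoValidChoicesError:
        # Graceful fallback to a default choice, even if the default was
        # not among the autotuned candidates.
        return default
```

With this shape, a single failing candidate only costs a warning, and a total wipeout of candidates still produces a usable (if untuned) kernel choice.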
Stack from ghstack (oldest at bottom):
Enable nonzero workspace and Cutlass StreamK for Inductor Cutlass GEMM ops.
This is a simpler rewrite of my original version of #119005 using @peterbell10's workspace allocation mechanism from #117992.
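For readers unfamiliar with the "nonzero workspace" requirement: StreamK-style GEMM kernels can need scratch memory for partial-tile accumulators, so the caller must query the required size and allocate a buffer before launch. The sketch below illustrates that pattern with entirely hypothetical names and a made-up size formula; it is not the CUTLASS or Inductor API:

```python
# Hypothetical sketch of the query-then-allocate workspace pattern;
# class, method names, and the size formula are illustrative only.

class StreamKGemm:
    def __init__(self, m, n, k, partials=4):
        self.m, self.n, self.k = m, n, k
        self.partials = partials  # assumed number of partial accumulators

    def workspace_size(self):
        # Assumed formula: one float32 (4-byte) output tile per partial.
        return self.partials * self.m * self.n * 4

    def run(self, a, b, workspace):
        # The caller-provided scratch buffer must be large enough.
        assert len(workspace) >= self.workspace_size(), "workspace too small"
        # Stand-in for the actual kernel launch: plain matrix multiply.
        return [[sum(a[i][t] * b[t][j] for t in range(self.k))
                 for j in range(self.n)] for i in range(self.m)]

kernel = StreamKGemm(m=2, n=2, k=3)
ws = bytearray(kernel.workspace_size())  # caller allocates the workspace
out = kernel.run([[1, 2, 3], [4, 5, 6]], [[1, 0], [0, 1], [1, 1]], ws)
```

The point of the pattern is that allocation is lifted out of the kernel and into the caller, which is what lets Inductor's workspace allocation mechanism own the buffer's lifetime.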
Test Plan:
- Additional unit test in test_cutlass_backend.py which specifically tests StreamK GEMM with a workspace requirement
- CI
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @muchulee8 @ColinPeppler @amjames @desertfire @chauhang