[Inductor max autotune] Make autotune_select_algorithm more robust #124928

Closed
wants to merge 14 commits into from

kadeng commented Apr 25, 2024

Stack from ghstack (oldest at bottom):

This diff raises a dedicated exception when no valid choices remain
after autotuning. Callers can catch it and gracefully fall back to a
default choice, even if that default choice was never passed to
autotune_select_algorithm.
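A minimal sketch of the raise-then-fall-back pattern, assuming benchmark results are collected as a dict mapping choice name to time, with `float("inf")` marking a failed choice. `NoValidChoicesError`, `pick_fastest` and `select_with_fallback` are illustrative names, not necessarily what Inductor uses.

```python
import math
from typing import Dict


class NoValidChoicesError(RuntimeError):
    """Raised when autotuning leaves no usable choice (illustrative name)."""


def pick_fastest(timings: Dict[str, float]) -> str:
    # Drop choices whose benchmark failed (inf) or produced NaN.
    valid = {name: t for name, t in timings.items() if math.isfinite(t)}
    if not valid:
        raise NoValidChoicesError("no valid choices remain after autotuning")
    # Pick the choice with the smallest measured time.
    return min(valid, key=valid.__getitem__)


def select_with_fallback(timings: Dict[str, float], default: str) -> str:
    # Catch only the dedicated exception so unrelated errors still propagate,
    # and fall back to a default that was never among the autotuned choices.
    try:
        return pick_fastest(timings)
    except NoValidChoicesError:
        return default
```

For example, `select_with_fallback({"triton_a": float("inf")}, "aten_mm")` returns `"aten_mm"`, while `{"triton_a": 0.5, "triton_b": 0.2}` selects `"triton_b"`. A dedicated exception type lets the caller distinguish "nothing survived autotuning" from other failures.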

Additionally, this diff handles RuntimeErrors raised during autotuning gracefully: the problematic choice is skipped instead of causing the compilation of the entire model to fail.
(An error is still logged.)
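A sketch of the skip-and-log behavior, under the assumption that each choice is benchmarked by a zero-argument callable returning a time and that a failed choice is marked with `float("inf")` so a later finiteness filter drops it. `benchmark_choices` and `fake_broken_kernel` are hypothetical names for illustration.

```python
import logging
from typing import Callable, Dict

log = logging.getLogger(__name__)


def benchmark_choices(choices: Dict[str, Callable[[], float]]) -> Dict[str, float]:
    """Benchmark every choice; a RuntimeError disqualifies only that choice."""
    timings: Dict[str, float] = {}
    for name, bench in choices.items():
        try:
            timings[name] = bench()
        except RuntimeError:
            # Log the error (with traceback) and mark the choice as unusable
            # instead of letting the exception abort the whole compilation.
            log.exception("Autotuning choice %r failed; ignoring it", name)
            timings[name] = float("inf")
    return timings


def fake_broken_kernel() -> float:
    # Stand-in for a kernel that fails while being benchmarked.
    raise RuntimeError("simulated kernel failure")


timings = benchmark_choices({"good": lambda: 0.3, "broken": fake_broken_kernel})
```

Only `RuntimeError` is caught, so other exception types (for example a `TypeError` from a bug in the benchmarking harness itself) still surface.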

Test Plan:
CI

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

[ghstack-poisoned]
pytorch-bot commented Apr 25, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/124928

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 9464a45 with merge base 8a0529e:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

kadeng added the topic: not user facing label Apr 25, 2024
kadeng requested a review from int3 May 2, 2024 10:10

kadeng commented May 2, 2024

@int3 I noticed you wrote a diff that does something similar. I'll take a look and make sure this one is still compatible with your changes.

Comment on lines 1123 to 1126
(not isinstance(selected_time, float))
or (selected_time < 0.0)
or (not math.isfinite(selected_time))
or math.isnan(selected_time)

feels a little overkill (pretty sure we only generate inf and regular floats) but not a big deal
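The reviewed condition flags an invalid timing; its negation can be written as a small predicate. In this sketch (`is_valid_timing` is a hypothetical helper name), `math.isfinite` already returns `False` for NaN as well as for positive and negative infinity, which is why the separate `math.isnan` term is redundant, in line with the reviewer's remark.

```python
import math


def is_valid_timing(selected_time: object) -> bool:
    # Equivalent to the negation of the reviewed condition.
    return (
        isinstance(selected_time, float)  # rejects ints, None, etc.
        and math.isfinite(selected_time)  # False for +inf, -inf and NaN
        and selected_time >= 0.0  # rejects negative timings
    )
```

Because `math.isfinite` runs before the comparison, NaN never reaches `>= 0.0`.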

int3 commented May 2, 2024

Ah I didn't see you had this in the works. Yeah, this is more or less compatible with my changes. I can add my tests after you land this.

kadeng added a commit that referenced this pull request May 2, 2024

TODO:
 * Add unit test
 * Add an assertion that we use autotune_in_subproc when the CUTLASS backend is enabled
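The second TODO item could be expressed roughly as follows. This is a hypothetical sketch: it uses a `SimpleNamespace` stand-in for the config object and assumes two fields modeled on Inductor's settings, a boolean `autotune_in_subproc` and a comma-separated `max_autotune_gemm_backends` string. The function names are illustrative. A plausible motivation, not stated in the PR, is that benchmarking in a subprocess keeps a crashing CUTLASS kernel from taking down the compiling process.

```python
from types import SimpleNamespace


def uses_cutlass(config) -> bool:
    # Assumes a comma-separated backend string such as "ATEN,TRITON,CUTLASS".
    backends = {b.strip().upper() for b in config.max_autotune_gemm_backends.split(",")}
    return "CUTLASS" in backends


def check_cutlass_autotune_config(config) -> None:
    # Hypothetical guard: CUTLASS choices should only be benchmarked
    # when autotuning runs in a subprocess.
    assert not uses_cutlass(config) or config.autotune_in_subproc, (
        "autotune_in_subproc must be enabled when the CUTLASS backend is used"
    )


cfg = SimpleNamespace(max_autotune_gemm_backends="ATEN,TRITON,CUTLASS", autotune_in_subproc=True)
check_cutlass_autotune_config(cfg)  # passes: subprocess autotuning is on
```

Plain `assert` statements are stripped under `python -O`; an explicit `raise` would be needed if the check must always run.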

ghstack-source-id: b092fce1684a822311c5733c13c54e41b463c03e
Pull Request resolved: #124928
OnlyFor pushed a commit to OnlyFor/pytorch that referenced this pull request May 3, 2024
ghstack-source-id: 42d7c737a4918b2db7af54bb43b2188615f6aecb
Pull Request resolved: pytorch#124928
kadeng added a commit that referenced this pull request May 3, 2024
ghstack-source-id: d735ec8f3f9951c90061e458dba3a7839c0b6ff3
Pull Request resolved: #124928
kadeng added a commit that referenced this pull request May 3, 2024
ghstack-source-id: ebfa46c5f1bf89cffe6e05658bb7f151ac172e0a
Pull Request resolved: #124928
kadeng added a commit that referenced this pull request May 3, 2024
ghstack-source-id: b9fffcd75ecc5d58eeb87b494119efa7db859c2a
Pull Request resolved: #124928
kadeng added a commit that referenced this pull request May 3, 2024
ghstack-source-id: b9fffcd75ecc5d58eeb87b494119efa7db859c2a
Pull Request resolved: #124928
kadeng added a commit that referenced this pull request May 3, 2024
ghstack-source-id: ed4af3cb67af81ce438faf062b20c166e5625a36
Pull Request resolved: #124928
kadeng added a commit that referenced this pull request May 4, 2024
ghstack-source-id: fd9d31206fe483b2d6ed22d61390a5de08510ace
Pull Request resolved: #124928
kadeng marked this pull request as ready for review May 4, 2024 05:12
kadeng added a commit that referenced this pull request May 4, 2024
ghstack-source-id: 904251b11a684054ab1c734a307d5a0b0f910a05
Pull Request resolved: #124928

int3 left a comment

lgtm

kadeng commented May 5, 2024

@pytorchbot merge

pytorch-bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) May 5, 2024
pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

pytorchmergebot

Merge failed

Reason: 2 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

kadeng added a commit that referenced this pull request May 5, 2024
ghstack-source-id: 568b81be89e1a2436efc86e5b98497a1641268a8
Pull Request resolved: #124928
kadeng commented May 5, 2024

@pytorchbot merge

pytorchmergebot

Merge started


github-actions deleted the gh/kadeng/57/head branch June 5, 2024 01:56
3 participants