Looking at your code and comparing the generated PTX, it looks like when using a type other than https://godbolt.org/z/TGY1qs8x8 An infinite loop results in undefined behavior, so the compiler is free to throw away the kernel entirely, which is what is happening here. I don't know why it doesn't do this for For reference, the
-
So this blocks the process on the CPU and behaves as if it were a synchronous for_each. If I remove the half or replace it with float, it works as expected.
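To illustrate the point in the reply about infinite loops, here is a minimal host-side C++ sketch. It is not the kernel from this thread (that code isn't shown here), and the names `wait_for_flag` and `run_demo` are hypothetical. The assumption is that the same forward-progress rule applies: a loop that makes no observable progress may be assumed to terminate, while a loop that performs atomic operations must be kept.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical helper, not from the original post.
// Spins until another thread sets `flag`. Every iteration performs an atomic
// load, which counts as forward progress, so the compiler has to keep the loop.
int wait_for_flag(std::atomic<bool>& flag) {
    int spins = 0;
    while (!flag.load(std::memory_order_acquire)) {
        ++spins;
        std::this_thread::yield();
    }
    return spins;
}

// For contrast (left as a comment on purpose, since running it would hang or
// be optimized unpredictably): a loop on a plain, non-atomic, non-volatile
// variable with an empty body makes no observable progress. The compiler may
// assume it terminates and is then free to delete it or the code around it.
//
//   bool done = false;
//   while (!done) { }

// Hypothetical driver: a second thread sets the flag after a short delay,
// so the spin loop above is guaranteed to exit.
bool run_demo() {
    std::atomic<bool> flag{false};
    std::thread setter([&flag] {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        flag.store(true, std::memory_order_release);
    });
    wait_for_flag(flag);
    setter.join();
    return flag.load();
}
```

If the spin loop in a kernel is meant to wait on something, routing the wait through an atomic or volatile access is one way to keep the compiler from treating it as dead code. Whether that matches the intent of the original kernel is a guess.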