Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding atomix tests #308

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

adding atomix tests #308

wants to merge 3 commits into from

Conversation

leios
Copy link
Contributor

@leios leios commented Jun 13, 2022

This is a separate PR for the atomix tests from #306. Still needed tests:

  • inc
  • dec
  • and
  • or
  • xor

Known issues that need to be fixed:

  • Atomic addition does not seem to work on non-integer types for CPU
  • Atomic subtraction '-=' does not seem to work on GPUs

This was referenced Jun 14, 2022
@leios
Copy link
Contributor Author

leios commented Jun 14, 2022

#306 (comment) <--- continuing this discussion with @tkf here as it is related to getting Atomix support in KA

So my understanding is that for the short-term in KA, we need to use AtomixCUDA for CUDA operations and a newer version of UnsafeAtomics for CPU? My assumption is that this will fix the two big errors here, right?

From there, I guess it's just a matter of figuring out how to write certain tests in Atomix's non-macro interface (and macro interface, I guess).

@tkf
Copy link
Contributor

tkf commented Jun 15, 2022

My assumption is that this will fix the two big errors here, right?

What are the big errors? "Known issues that need to be fixed:" in the OP?

Atomic addition does not seem to work on non-integer types for CPU

Please see what are tested in https://github.com/JuliaConcurrent/UnsafeAtomics.jl/blob/934c34df1fd0fac0b6fa6edc07d98a05b9f98395/test/UnsafeAtomicsTests/src/test_core.jl#L9-L55. If what you want to use are not tested there, I don't think it's useful to test them here.

Atomic subtraction '-=' does not seem to work on GPUs

AtomixCUDA.jl simply calls CUDA.atomic_sub! https://github.com/JuliaConcurrent/AtomixCUDA.jl/blob/a965ec87fcd1b5250c11a4b7438cd7f947bed656/src/AtomixCUDA.jl#L39-L40

However, note that AtomixCUDA.jl completely ignores atomic ordering. Maybe the least bad approach is to dispatch to CUDA.atomic_* only with monotonic ordering.

Also, similar to the comments on the atomic ops on CPU, I think it'd be better to test that they are CI'ed on GPU first. Note that I have test working locally but I'm still waiting for buildkite to be enabled (or rather, I haven't been pinging the GPU guys enough to let them enable the build server)
JuliaConcurrent/UnsafeAtomicsLLVM.jl#3
JuliaConcurrent/AtomixCUDA.jl#1

In general, nothing special should be done in KA since atomic ops are implemented at much lower level. I feel it would be more constructive to test only that the "KA magics" do not interact badly with the atomics implementations (e.g., maybe @device_override-based implementations, once we start using them, would require a bit more caution).

@leios
Copy link
Contributor Author

leios commented Jun 15, 2022

Hey @tkf. You did not answer my questions in the previous post, I believe I may have poorly communicated my intentions here, so let me be more explicit:

Things we both agree on:

  1. Atomix can be the long-term API for KA.
  2. It's best if Atomix "just works"
  3. Whatever adding atomic support with atomix #299 did to get atomix to work in KA clearly failed, as shown by the CI here
  4. Atomic attempts 3 #306 is not a conflicting PR. It's necessary right now because Atomix is failing basic tests

Now for the questions:

How do I get -= to work on the GPU? You said the solution was to use AtomixCUDA, but also

If you want to invoke other atomic operations but prefer avoiding a PR for CUDA.jl, I feel it would be cleaner to resurrect AtomixCUDA.jl and use Atomix as the default API. This way, we can gradually drop "behind the scene hacks" like AtomixCUDA.jl and UnsafeAtomicsLLVM.jl while minimizing effects on user code.

This raises a bunch more questions than it solves. Namely:

  1. Does that mean we should be using AtomixCUDA for GPU, but UnsafeAtomicsLLVM for CPU? Do we need deviceoverrides for this?
  2. What about AMD? I guess it still uses UnsafeAtomicsLLVM as those tests are actually passing?
  3. If we are long-term relying on UnsafeAtomicsLLVM, are you saying the better PR is one for CUDA?
    The reason I asked you to comment here is because I was looking for a specific answer to these questions.

How do I get += working on floats?
Am I correct that just updating to the latest version of UnsafeAtomicsLLVM will fix this? Locally, this does not seem to work.

What do I do for the specific tests mentioned above of inc, dec, and, or, and xor?
The reason I am asking these questions is not because these tests are absolutely necessary, but because we should at least have feature parity with CUDAs API. The reason this is important is that KA has a GPU-heavy syntax. Many people will probably be starting from there. I need to know how to implement these so I can point people in the right direction for understanding how to use KA in the first place.

As direct answers, I believe it's just a modify!(...) for and, or, and xor, but what about inc and dec? In general, I find the modify function a little harder to understand than just atomic_... in the CUDA API but also recognize that Julia's Core.Intrinsics and CUDA's API are different. From the user's perspective, CUDA's is quite easy to use and understand, so I am also asking: Is there any intention to change the Atomix nonmacro API in the future to make it more straight-forward?

What does "I think it'd be better to test that they are CI'ed on GPU first" mean?

Now for some comments:

On tests

I feel it would be more constructive to test only that the "KA magics" do not interact badly with the atomics implementations (e.g., maybe @device_override-based implementations, once we start using them, would require a bit more caution).

Initially, I agreed with this statement, and was going to say that these are tests that should be in Atomix.
Now that I think more about it, I kinda disagree at a fundamental level. KA is a conglomeration of different APIs, all of which can change at any given moment in time. If we have tests to ensure that the APIs are stable here, then we don't need to go tracking down esoteric errors later.

I have personally wasted literal months of time trying to hunt down bugs in KA that would have been caught if we just had a test for API changes upstream.

More than that, the tests here are a way to show users, "Here is how you can do atomic addition. Here is how to do atomic max..." There should also be docs when these tests pass with pretty much the same functionality and showing the difference between what we currently have and CUDA. I believe this is essential for KAs wider adoption

The state of Atomix
Again, I 100% agree that Atomix should be the long-term API, but it is not properly tested or documented and it's a bit unclear how everything works together. I recognize that Atomix has better ordering than the CUDA API, but right now I am really struggling at getting it to work for any actual application. This is why I proposed (in #306 and #282) to first merge the API that is passing the tests and then revert that commit when Atomix does the same.

Overall, I am not attacking you. I am trying to fix everything that is currently broken in #299. I am also really running out of time. I have already spent months on this and know I still need to add docs once these tests pass.

@leios
Copy link
Contributor Author

leios commented Jun 15, 2022

Ok, after calling @tkf, I think we have solutions to most things:

  1. CPU tests should now pass by using UnsafeAtomics#master. I was accidentally updating UnsafeAtomicsLLVM, which is why I couldn't get the tests to pass locally.
  2. The best course of action is probably to add some device overrides to CUDA, itself for loads, stores, subtractions, etc. It's possible to use AtomixCUDA for this in KA for the short-term, but I think we can just use Atomic attempts 3 #306 for the short-term for now (even if it's unmerged, it can be used by those who need it).
  3. For other atomic operations, there are some weird unicode symbols for and, xor, or, etc, but maybe the best solution is to try @atomic xor(...) first. I'll try that in the tests.
  4. The hope is that Atomix is merged into Base Julia at some point, which will hopefully replace the CUDA API, so it's best to just standardize on it with the macro when possible. I am not sure what this means for those who want the non-macro UI because it returns the old value.
  5. The CI comment was about how CI is not set up to test GPU atomic functionality upstream.

I'll be working on what I can in the next few weeks when I have time, but since we have Atomix in KA working for most things and #306 for back-up, this PR is on a slightly lower priority.

@leios leios closed this Sep 12, 2022
@vchuravy vchuravy reopened this Sep 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants