Specialized vector rand! for many distributions #1879

Status: Open — wants to merge 15 commits into base: master

Conversation

@quildtide (Contributor) commented on Jul 25, 2024:

The much-larger sequel to #1269 and #1680

All claims were tested on Julia 1.11.0-rc1, Windows 10, AMD Ryzen 9 5900X, RTX 3080.

I originally planned to make the normal dispatch path faster for rand! across all univariate distributions, but it turns out the current dispatch path is probably about as fast as a generic rand -> rand! fallback can get. I tested a few alternatives, and all of them were similar or slower.

This dispatch path is slower than specialized vector methods because Julia's Base rand/randn/randexp functions have rand! specializations for both Xoshiro and MersenneTwister that are much faster than calling scalar rand many times. Xoshiro has a separate SIMD implementation in Random.jl, and both MersenneTwister and Xoshiro branch on vector length to choose the optimal algorithm.
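As a rough illustration (my own sketch, not code from this PR), the gap between Base's bulk generators and an elementwise loop can be measured with BenchmarkTools; exact numbers will vary by machine and RNG:

```julia
using Random, BenchmarkTools

# One scalar draw per element: misses Random.jl's bulk fast paths.
function scalar_fill!(rng, A)
    for i in eachindex(A)
        A[i] = randn(rng)  # one RNG call per element
    end
    return A
end

rng = Xoshiro(123)
A = Vector{Float64}(undef, 10^6)

@btime randn!($rng, $A)        # bulk call: hits Xoshiro's SIMD fast path
@btime scalar_fill!($rng, $A)  # typically several times slower
```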

Creating specialized rand! methods therefore yields some impressive performance gains. I plan on continuing further with this, but this pull request contains (a minimal sketch of the approach follows the list below):

Vectorized rand! specializations for:

  • Exponential (~67% faster)
  • Laplace (~185% faster)
  • Pareto (~86% faster)
  • InverseGaussian (~102% faster)
  • LogitNormal (~46% faster)
  • LogNormal (~234% faster)
  • NormalCanon (~108% faster)
  • PGeneralizedGaussian (~60% faster)
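To show the shape these specializations take, here is a minimal sketch for Exponential (hypothetical helper name, not necessarily the PR's exact code): fill the output with standard exponentials in one bulk randexp! call, then scale in place.

```julia
using Distributions, Random

# Hypothetical sketch of a vectorized sampler for Exponential(θ):
# X = θ * E with E ~ Exp(1), so one bulk randexp! call plus an in-place scale.
function exp_rand!(rng::AbstractRNG, d::Exponential, A::AbstractArray{<:AbstractFloat})
    randexp!(rng, A)   # hits Random.jl's fast Xoshiro/MersenneTwister paths
    A .*= scale(d)     # scale(d) returns θ
    return A
end

exp_rand!(Xoshiro(123), Exponential(2.0), Vector{Float64}(undef, 1024))
```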

Benchmarking was done in an ad hoc way: I grabbed 3 values from @time (along the lines of the snippet below), excluding any runs that included compilation or GC time. This is a small sample size, but the results were consistent, and I don't feel a need to prove significance when every method is consistently >40% faster.
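For reference, the kind of ad hoc measurement described above might look like this (illustrative, not the exact benchmark script):

```julia
using Distributions, Random

rng = Xoshiro(123)
x = Vector{Float64}(undef, 10^7)

# Run several times; keep only timings without compilation or GC,
# and compare before/after checking out this branch.
@time rand!(rng, LogNormal(0.0, 1.0), x)
```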

Positive side-effects of vectorized specializations

Improved GPU compatibility

I don't have any non-CUDA GPUs to test with, so testing has been CUDA-only, but these results probably hold for other GPU types too.

CUDA.jl does not like the for-loop in the generic fallback rand! dispatch path in Distributions.jl. However, specialized dispatch paths can help alleviate this issue.

These distributions now have functional out-of-the-box CUDA.jl support for rand! (a usage sketch follows this list):

  • LogitNormal
  • LogNormal
  • NormalCanon
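If these specializations behave as described, GPU usage might look like the following sketch (an untested assumption on my part, using CUDA.jl's default RNG):

```julia
using CUDA, Distributions, Random

d = LogNormal(0.0, 1.0)
x = CUDA.zeros(Float64, 10^6)

# With a broadcast-based specialization there is no scalar for-loop,
# so the fill can run on-device.
rand!(CUDA.default_rng(), d, x)
```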

These distributions would probably work with the addition of a randexp implementation in CUDA.jl:

  • Exponential
  • Pareto
  • Laplace

rand!(::PGeneralizedGaussian) depends on rand(::Gamma), and CUDA.jl does not like functions that call rand(::Gamma).

I'm not sure why rand(::InverseGaussian) isn't working. I'll have to revisit that later.

Improved Dagger.jl compatibility

The current default fallback rand! methods perform so poorly on Dagger.jl that they cannot even really be benchmarked. This is mostly due to issues on the Dagger.jl side that are currently being addressed. I only did a tiny bit of testing with these new methods, but they do perform better than the generic fallbacks.

Minor adjustments to scalar rand methods:

While implementing the specialized rand! methods, I noticed opportunities for small changes to some scalar rand methods that should, in theory, yield small performance or accuracy benefits. In practice, none of these changes shows a significant observable change in speed (a before/after sketch follows this list):

  • LogitNormal uses muladd now
  • LogNormal uses muladd now
  • Normal uses muladd now
  • PGeneralizedGaussian calls rand(rng, Bool) instead of rand(rng) < 0.5 now
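As an illustration of the scalar tweaks above (hypothetical helper names, not the PR's actual methods):

```julia
using Random

# muladd(x, y, z) computes x*y + z, allowing a fused multiply-add:
rand_normal_old(rng, μ, σ) = μ + σ * randn(rng)        # separate multiply and add
rand_normal_new(rng, μ, σ) = muladd(randn(rng), σ, μ)  # single fused operation

# Sign flip for PGeneralizedGaussian: draw a Bool directly instead of
# comparing a Float64 against 0.5.
random_sign_old(rng) = rand(rng) < 0.5 ? -1 : 1
random_sign_new(rng) = rand(rng, Bool) ? -1 : 1

rand_normal_new(Xoshiro(1), 0.0, 1.0)
```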

Additional bloating for the test suite

One drawback of increased method specialization is that there is a larger surface for bugs to emerge. This is doubly so when I'm also making small adjustments to the scalar rand methods.

I compensated by adding new functionality to test_distr and test_samples (enabled by a keyword argument) that allows these functions to call the scalar rand functions multiple times. I also added test_distr entries for all of the distributions I added new methods for.

Unfortunately, I'm pretty sure a lot of the testing I added also duplicates work done by existing tests. 🤷

I didn't remove any existing tests because I was unsure exactly which parts were redundant. I think code coverage probably went up a bit as a side effect.

I also had to make minor tolerance adjustments to 2 tests.

Future Plans

A lot of distributions depend on rand(::Gamma), which is non-trivial to translate to a vectorized form. Accomplishing this would probably enable out-of-the-box GPU support and large performance gains for all dependent distributions.

Most of the distributions addressed in this pull request were picked because I thought they would be trivial or easy to implement, but there are still plenty of others remaining in this category.

@quildtide (Contributor, Author) commented:

This pull request will probably cause a repeat of #1687 on a larger scale, but I don't think it's really an issue, since it is consistent with Julia's own behavior. A weird trivia example from Random.jl:

```julia
rand(Xoshiro(123), 7) == rand(Xoshiro(123), 7)       # holds: same seed, same length
rand(Xoshiro(123), 7) != rand(Xoshiro(123), 8)[1:7]  # also holds: the bulk algorithm branches on length
```

@codecov-commenter commented on Jul 25, 2024:

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.08%. Comparing base (e1340f0) to head (0746991).
Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1879      +/-   ##
==========================================
+ Coverage   85.99%   86.08%   +0.08%     
==========================================
  Files         144      144              
  Lines        8666     8693      +27     
==========================================
+ Hits         7452     7483      +31     
+ Misses       1214     1210       -4     


@quildtide (Contributor, Author) commented:

@devmotion Could I get this reviewed if the scope of the pull request is not too wide?

Resolved review threads on: src/univariate/continuous/exponential.jl, inversegaussian.jl (2 threads), laplace.jl (2 threads), lognormal.jl, normal.jl, pgeneralizedgaussian.jl, pareto.jl, normalcanon.jl
@quildtide quildtide marked this pull request as draft August 18, 2024 00:03
@quildtide quildtide marked this pull request as ready for review August 18, 2024 00:30
@devmotion (Member) commented on Aug 26, 2024:

@quildtide can you fix the conflict? I'm busy until Wednesday this week but please ping me on Friday in case I haven't reviewed the PR by then.

@quildtide (Contributor, Author) commented: @devmotion

@devmotion (Member) left a comment:

Can you please add newline characters at the end of the newly added (and modified) pages? Thank you!

Resolved review threads on: test/testutils.jl (5 threads), test/univariate/continuous/exponential.jl, logitnormal.jl, lognormal.jl, normalcanon.jl, pareto.jl
@quildtide (Contributor, Author) commented:

@devmotion Requested changes have been made.

```diff
@@ -28,8 +28,7 @@ end
 # testing the implementation of a discrete univariate distribution
 #
 function test_distr(distr::DiscreteUnivariateDistribution, n::Int;
-    testquan::Bool=true)
-
+    testquan::Bool=true, rng::AbstractRNG=MersenneTwister(123))
```
Member review comment:
Is the rng needed?

```diff
@@ -583,6 +623,7 @@ end
 allow_test_stats(d::UnivariateDistribution) = true
 allow_test_stats(d::NoncentralBeta) = false
 allow_test_stats(::StudentizedRange) = false
+allow_test_stats(::LogitNormal) = false
```
Member review comment:
Can you explain what the problem with LogitNormal was that you had to fix here?
