Add float32 implementation of min/max/sum #39

Merged: 2 commits, Aug 25, 2019

mratsim (Owner) commented Aug 25, 2019

This creates a proper API for reduction primitives min/max/sum to address #36.

It's roughly 75x faster than the naive single-accumulator reduction on my 18-core machine (average time 0.248 ms vs 18.543 ms):

https://github.com/numforge/laser/blob/f4930cb03f9bf8ec4180f8c34d7b12552c3ebb08/benchmarks/fp_reduction_latency/reduction_max_bench.nim

Warmup: 0.9007 s, result 224 (displayed to avoid compiler optimizing warmup away)

Max reduction - prod impl - float32
Collected 1000 samples in 0.250 seconds
Average time: 0.248 ms
Stddev  time: 0.641 ms
Min     time: 0.149 ms
Max     time: 8.449 ms
Theoretical perf: 40287.484 MFLOP/s

Display sum of samples sums to make sure it's not optimized away
0.9999996423721313

Reduction - 1 accumulator - simple iter - float32
Collected 1000 samples in 18.544 seconds
Average time: 18.543 ms
Stddev  time: 0.234 ms
Min     time: 18.470 ms
Max     time: 25.110 ms
Theoretical perf: 539.277 MFLOP/s

Display sum of samples sums to make sure it's not optimized away
0.9999996423721313

Reduction - 1 accumulator - macro iter - float32
Collected 1000 samples in 18.603 seconds
Average time: 18.602 ms
Stddev  time: 0.037 ms
Min     time: 18.472 ms
Max     time: 18.687 ms
Theoretical perf: 537.569 MFLOP/s

Display sum of samples sums to make sure it's not optimized away
0.9999996423721313

Reduction - 2 accumulators - simple iter - float32
Collected 1000 samples in 10.287 seconds
Average time: 10.286 ms
Stddev  time: 0.046 ms
Min     time: 10.212 ms
Max     time: 10.451 ms
Theoretical perf: 972.164 MFLOP/s

Display sum of samples sums to make sure it's not optimized away
0.9999996423721313

Reduction - 3 accumulators - simple iter - float32
Collected 1000 samples in 7.722 seconds
Average time: 7.721 ms
Stddev  time: 0.094 ms
Min     time: 7.574 ms
Max     time: 8.015 ms
Theoretical perf: 1295.233 MFLOP/s

Display sum of samples sums to make sure it's not optimized away
0.9999996423721313

Reduction - 4 accumulators - simple iter - float32
Collected 1000 samples in 6.062 seconds
Average time: 6.061 ms
Stddev  time: 0.055 ms
Min     time: 5.965 ms
Max     time: 6.221 ms
Theoretical perf: 1649.943 MFLOP/s

Display sum of samples sums to make sure it's not optimized away
0.9999994039535522

Reduction - 5 accumulators - simple iter - float32
Collected 1000 samples in 5.506 seconds
Average time: 5.505 ms
Stddev  time: 0.058 ms
Min     time: 5.395 ms
Max     time: 5.796 ms
Theoretical perf: 1816.395 MFLOP/s

Display sum of samples sums to make sure it's not optimized away
0.9999996423721313
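
The "1 accumulator" to "5 accumulators" rows above compare how many independent running results the loop keeps. As a rough illustration of that idea only, here is a minimal sketch in C (not the Nim code from this PR; the names `max_f32`, `max_1acc` and `max_4acc` are hypothetical, and NaN handling is ignored). With one accumulator every comparison has to wait for the previous one, while several accumulators give the CPU independent dependency chains it can overlap.

```c
#include <stddef.h>
#include <math.h>   /* only for the INFINITY macro */

/* Hypothetical helper: plain max of two floats (NaN inputs not handled). */
static inline float max_f32(float a, float b) {
    return a > b ? a : b;
}

/* One accumulator: each step depends on the result of the previous step. */
float max_1acc(const float *x, size_t n) {
    float m = -INFINITY;
    for (size_t i = 0; i < n; ++i)
        m = max_f32(m, x[i]);
    return m;
}

/* Four accumulators: four independent dependency chains per iteration,
   merged once at the end. */
float max_4acc(const float *x, size_t n) {
    float m0 = -INFINITY, m1 = -INFINITY, m2 = -INFINITY, m3 = -INFINITY;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        m0 = max_f32(m0, x[i]);
        m1 = max_f32(m1, x[i + 1]);
        m2 = max_f32(m2, x[i + 2]);
        m3 = max_f32(m3, x[i + 3]);
    }
    /* Remainder: fewer than 4 elements left. */
    for (; i < n; ++i)
        m0 = max_f32(m0, x[i]);
    return max_f32(max_f32(m0, m1), max_f32(m2, m3));
}
```

For min and max, splitting the work this way returns exactly the same value as the single-accumulator loop. For sum it changes the order of the additions, so the float32 result can differ in the last bits because floating-point addition is not associative.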

mratsim merged commit 2f619fd into master Aug 25, 2019