-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Significantly slower broadcasting #356
Comments
Just to clarify: the issue disappears when working with normal arrays, as in julia> as = rand(10_000);
julia> bs = similar(as);
julia> @benchmark (@. $bs = $as / (1.0 + $as^2))
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
Range (min … max): 4.668 μs … 20.108 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 6.278 μs ┊ GC (median): 0.00%
Time (mean ± σ): 6.231 μs ± 1.076 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▇ ▁ ▅▁▇▄█▄▇▃▆▁▂▁▁ ▂
██████▃██▁███████████████▇▇▇▆▆▆▇▇▇▇▇▇▇▇▇▇▇▅▆▆▇▆▄▅▆▆▆▅▆▅▅▄▆ █
4.67 μs Histogram: log(frequency) by time 10.8 μs <
Memory estimate: 0 bytes, allocs estimate: 0. It also disappears if, instead of broadcasting operations, I use explicit loops: julia> function f!(ys, xs)
for i ∈ eachindex(xs)
@inbounds ys[i] = xs[i] / (first(xs[i])^2 + last(xs[i])^2)
end
ys
end
f! (generic function with 1 method)
julia> @benchmark f!($ys, $xs)
BenchmarkTools.Trial: 10000 samples with 3 evaluations.
Range (min … max): 8.789 μs … 39.339 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 9.053 μs ┊ GC (median): 0.00%
Time (mean ± σ): 9.876 μs ± 2.016 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▇▅▅▄▅▅▃▄ ▂ ▃ ▂ ▂ ▂ ▁▄ ▃ ▄ ▂
█████████▅█▃██▃▁█▁█▁▇▆▁█▃█▁▃█▁▁█▁▄██▅█▆█▆▆▅█▅▄▄▃▆▄▅▃▃▁▁▄▄▄ █
8.79 μs Histogram: log(frequency) by time 16.5 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark f!($oys, $oxs)
BenchmarkTools.Trial: 10000 samples with 3 evaluations.
Range (min … max): 8.808 μs … 44.438 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 9.446 μs ┊ GC (median): 0.00%
Time (mean ± σ): 10.056 μs ± 1.749 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ▆▂▇ ▇ ▅ ▇ ▇▂ ▄ ▃ ▃ ▁ ▁ ▅▂ ▄ ▁ ▂
█▅███▅█▆██▆█▄██▃█▁▄█▃▁█▁▁█▁▁█▇▁▁█▃▁▃█▃▁█▇▁▁▁█▁▅▅██▆▆▅█▅▅▆▆█ █
8.81 μs Histogram: log(frequency) by time 14.5 μs <
Memory estimate: 0 bytes, allocs estimate: 0. |
I think the issue is that |
Thank you @jishnub , thank you for your reply. Looking at the source code, I saw a link to a similar issue reported here: I am not entirely sure the situation is identical... |
I have been testing the speed of broadcast operations with OffsetArrays of StaticArrays, and it looks like there is a significant penalty in time. In fact, on my laptop I see
When using OffsetArrays instead these are the results:
The$> 5 \times$ penalty is even larger than the SIMD vector size of my laptop (4 for
Float64
arrays).Any help or clarification is really appreciated!
The text was updated successfully, but these errors were encountered: