You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
It's me again. That guy who keeps finding bugs when using sum. In Polyester, in KernelAbstractions, and now in OffsetArrays. Sigh...
Here is a case which slows down a loop by 25x(!) when using a sum over an OffsetArray. Expanding the sum into a for loop restores the normal performance when using Array.
## Set up
# Cartesian step
@inline δ(i, I::CartesianIndex{m}) where{m} = CartesianIndex(ntuple(j -> j==i ? 1 : 0, m))
# Finite difference
@inline ∂(a,I::CartesianIndex{m},u::AbstractArray{T,n}) where {T,n,m} = @inbounds u[I+δ(a,I),a]-u[I,a]
# Divergence using sum function
@fastmath @inline div_sum(I::CartesianIndex{m},u) where {m} = sum(@inbounds ∂(i,I,u) for i ∈ 1:m)
# Divergence expanded using for loop
@fastmath @inline function div_expanded(I::CartesianIndex{m},u) where {m}
init=zero(eltype(u))
for i in 1:m
init += @inbounds ∂(i,I,u)
end
return init
end
# loop over the range R
loop!(div,u,f,R) = @inbounds @simd for I ∈ R
div[I] = f(I,u)
end
## Test case
# Benchmark with Arrays
N = (2^10+2,2^10+2) # size of the array including halo
R = CartesianIndices(map(n->n-2,N)).+CartesianIndex(map(n->1,N)) # range inside the halo
div = zeros(Float32,N);
u = rand(Float32,N...,length(N));
using BenchmarkTools
@btime loop!($div,$u,$div_sum,$R) # 188.200 μs (0 allocations: 0 bytes)
@btime loop!($div,$u,$div_expanded,$R) # 193.400 μs (0 allocations: 0 bytes)
# Benchmark with OA
using OffsetArrays: Origin
divOA = Origin(0)(div);
uOA = Origin((zeros(Int,length(N))...,1))(u);
ROA = CartesianIndices(map(n->n-2,N))
@btime loop!($divOA,$uOA,$div_expanded,$ROA) # 200.000 μs (0 allocations: 0 bytes)
@btime loop!($divOA,$uOA,$div_sum,$ROA) # 4.960 ms (0 allocations: 0 bytes) WTH!?!
The text was updated successfully, but these errors were encountered:
julia>@btimemapfoldl(identity, +, (∂(1, I, $uOA) for I inCartesianIndices((1:10, 1:1))));
54.481 ns (0 allocations:0 bytes)
julia>@btimemapfoldl(identity, +, (∂(1, I, $u) for I inCartesianIndices((1:10, 1:1))));
19.071 ns (0 allocations:0 bytes)
It seems the iteration is slow
julia>@btime (u ->iterate(∂(1, I, u) for I inCartesianIndices((1:10, 1:1))))($uOA);
12.965 ns (0 allocations:0 bytes)
julia>@btime (u ->iterate(∂(1, I, u) for I inCartesianIndices((1:10, 1:1))))($u);
3.365 ns (0 allocations:0 bytes)
Oddly,
julia>@btime∂(1, CartesianIndex(1,1), $uOA);
3.381 ns (0 allocations:0 bytes)
julia>@btime (I ->∂(1, I, $uOA))(CartesianIndex(1,1));
7.636 ns (0 allocations:0 bytes)
It's me again. That guy who keeps finding bugs when using
sum
. In Polyester, in KernelAbstractions, and now in OffsetArrays. Sigh...Here is a case which slows down a loop by 25x(!) when using a
sum
over an OffsetArray. Expanding thesum
into afor
loop restores the normal performance when usingArray
.The text was updated successfully, but these errors were encountered: