Boilerplate metrics #235
Comments
I'll try to respond in more detail later today when I have time, but for now I'd say the biggest issue is that we are missing basic tools for memory management that would allow us to generically write even simple methods.
I'm going to start with a simple method that should help illustrate some of the problems I've run into.

using BenchmarkTools

# Walk through nested wrappers until parent(A) returns an object of the same type.
memory_reference(A) = _memory_reference(A, parent(A))
_memory_reference(::A, b::A) where {A} = b
_memory_reference(::A, b::B) where {A,B} = memory_reference(b)

@inline function fast_copy(src::AbstractArray{T,3}) where {T}
    buf = memory_reference(src)
    stride_1, stride_2, stride_3 = strides(src)
    size_1, size_2, size_3 = size(src)
    dest = Array{T}(undef, size(src))
    dest_index = 1
    i3 = 0
    @inbounds while i3 < (size_3 * stride_3)
        i2 = 0
        while i2 < (size_2 * stride_2)
            i1 = 0
            while i1 < (size_1 * stride_1)
                # i1 + i2 + i3 is the zero-based offset into the underlying buffer
                dest[dest_index] = buf[i1 + i2 + i3 + 1]
                dest_index += 1
                i1 += stride_1
            end
            i2 += stride_2
        end
        i3 += stride_3
    end
    return dest
end

x = PermutedDimsArray(rand(3, 3, 3), (3, 1, 2));
@btime copy($x);      # 99.267 ns (1 allocation: 272 bytes)
@btime fast_copy($x); # 48.823 ns (1 allocation: 272 bytes)

Although
We really need all of these components in place to provide concise solutions to most problems.
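To make the boilerplate concern concrete, here is a minimal sketch of how the three-dimensional loop above might be generalized to `N` dimensions. The name `strided_copy` is hypothetical and not part of any package; `memory_reference` is repeated from the comment above so the block is self-contained. Like the original, it assumes one-based axes, positive strides, and that the first element of `src` sits at `buf[1]` (so offset views are not handled). It is not benchmarked and makes no performance claim.

```julia
# Repeated from the comment above: unwrap nested wrappers down to the root array.
memory_reference(A) = _memory_reference(A, parent(A))
_memory_reference(::A, b::A) where {A} = b
_memory_reference(::A, b::B) where {A,B} = memory_reference(b)

# Hypothetical N-dimensional variant of fast_copy.
# Assumes one-based axes and that src's first element is buf[1].
function strided_copy(src::AbstractArray{T,N}) where {T,N}
    buf = memory_reference(src)
    st = strides(src)
    dest = Array{T}(undef, size(src))
    @inbounds for (k, I) in enumerate(CartesianIndices(src))
        # Convert the Cartesian index into a one-based offset in the buffer.
        offset = 1
        for d in 1:N
            offset += (I[d] - 1) * st[d]
        end
        dest[k] = buf[offset]
    end
    return dest
end

x = PermutedDimsArray(reshape(collect(1.0:24.0), 2, 3, 4), (3, 1, 2))
strided_copy(x) == copy(x)
```

Even this small generalization has to restate the unwrapping helper and the stride arithmetic by hand, which is the kind of repetition the missing tools would remove.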
I've started putting together a more formal testing ground for some related ideas I've been toying with here, in case anyone is interested (well, more formal than intangible ideas).
As I understand it, the main goal of this is to easily express algorithms that can use either standard dispatch or generated functions, depending on what's known statically. Writing very general high-performance code for a given situation is always possible; the problem is that it can sometimes require lots of boilerplate.
Would it be helpful to make this explicit? A given proposal could be weighed based on its effect on reducing boilerplate for high-performance code that works for both static and dynamic cases. Maybe there could even be a small collection of running examples. Complete code isn't necessary for this; what matters is the dispatch patterns and efficiently (in terms of human time) passing the right information to generated functions.
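As one possible running example, the sketch below shows the kind of dispatch pattern in question: a trait reports whether a length is known statically, and the entry point routes to either an ordinary loop or a generated function. All names here (`StaticLength`, `DynamicLength`, `known_length`, `total`) are hypothetical and chosen only for illustration; they are not an existing API.

```julia
# Hypothetical traits describing what is known about a collection's length.
struct StaticLength{N} end   # length N is encoded in the type
struct DynamicLength end     # length is only known at run time

known_length(::Type{<:NTuple{N,Any}}) where {N} = StaticLength{N}()
known_length(::Type{<:AbstractVector}) = DynamicLength()

# Single entry point: pick the implementation from static information.
total(x) = _total(known_length(typeof(x)), x)

# Dynamic case: plain loop over the elements.
function _total(::DynamicLength, x)
    s = zero(eltype(x))
    for v in x
        s += v
    end
    return s
end

# Static case: build the unrolled expression x[1] + x[2] + ... + x[N].
@generated function _total(::StaticLength{N}, x) where {N}
    N == 0 && return :(throw(ArgumentError("empty collection")))
    ex = :(x[1])
    for i in 2:N
        ex = :($ex + x[$i])
    end
    return ex
end

total((1, 2, 3))    # uses the generated method
total([1.5, 2.5])   # uses the loop
```

The boilerplate here is the trait types, the `known_length` methods, and the two-argument helper; a proposal could be judged by how much of that it lets a user omit.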