Inference failure when inlining #1602
Maybe even more interesting, removing

Base.Broadcast.broadcasted(
    fs::AbstractFieldStyle,
    ::Type{T},
    args...,
) where {T} = Base.Broadcast.broadcasted(fs, (x...) -> T(x...), args...)

entirely not only fixes the issue, but the function is 20% faster and seems to have less noise. I'll see if this can be removed. It was added in a very large PR (#767), so it's possible that it was a kluge to get things working. |
Let's add an allocation unit test to reproduce this. |
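A minimal sketch of what such an allocation test could look like, using only the `Test` standard library. `Wrapper`, `unwrap`, `add_wrapped!`, and `allocations` are hypothetical stand-ins chosen for illustration; they are not ClimaCore or Thermodynamics code, and the real test would call the actual kernel instead.

```julia
using Test

# Hypothetical stand-in for a parametric struct constructed inside a broadcast.
struct Wrapper{T}
    x::T
end

# Hypothetical accessor, used instead of getproperty.(…, :x) to keep the sketch simple.
unwrap(w::Wrapper) = w.x

# Hypothetical kernel: broadcasts a constructor, as in the pattern discussed above.
function add_wrapped!(x)
    x .= x .+ unwrap.(Wrapper.(x))
    return nothing
end

# Measure allocations inside a function, after a warm-up call,
# so that compilation is not counted.
function allocations(f!, x)
    f!(x)
    return @allocated f!(x)
end

a = [1.0, 2.0, 3.0]
@test allocations(add_wrapped!, a) == 0
```

Measuring inside a function after a warm-up call avoids counting compilation and global-scope overhead, which would otherwise make the test noisy.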
Can you clarify? Are you saying that there is a problem when an |
It looks like this is supposed to handle constructions like Something like this should not be called during time-stepping. It might be even better to avoid this kind of pattern altogether, if possible. Not sure what the use case is. |
It should no longer be required, now that JuliaGPU/CUDA.jl#2000 is in CUDA.jl |
Yes, when we inline Thermodynamic's
Yes but, interestingly, this

function Base.Broadcast.broadcasted(
    fs::Base.BroadcastStyle,
    ::Type{T},
    args...,
) where {T}
    println("Boom! $T, $args")
    type_cast(x...) = T(x...)
    Base.Broadcast.broadcasted(fs, type_cast, args...)
end
# Base's definition (FYI):
# function Base.Broadcast.broadcasted(
#     fs::Base.BroadcastStyle,
#     f::T,
#     args...,
# ) where {T}
#     Base.Broadcast.Broadcasted(fs, f, args...)
# end
struct Foo{T}
    x::T
end
struct Bar{T}
    x::T
end
struct Baz{T}
    x::T
end
function Bar(x::Int)
    print("wow ")
    Bar{Float64}(x)
end
function get_Baz(x::Int)
    print("nice ")
    Baz{Float64}(x)
end
function main_foo!(x)
    x .= x .+ getproperty.(Foo.(x), :x)
    return nothing
end
function main_bar!(x)
    x .= x .+ getproperty.(Bar.(x), :x) # this Bar is a method, not the type!
    return nothing
end
function main_baz!(x)
    x .= x .+ getproperty.(get_Baz.(x), :x) # broadcasted does not catch it if there is no method
    return nothing
end
a = [1, 2, 3]
main_foo!(a)
main_bar!(a)
main_baz!(a)
Agreed, I think we can remove it; we just need to fix the resulting issues. At the moment, the failure is (from this build):
ERROR: LoadError: MethodError: adapt_structure(::CUDA.KernelAdaptor, ::Base.Broadcast.Broadcasted{ClimaCore.Fields.FieldStyle{ClimaCore.DataLayouts.VIJFHStyle{4, CUDA.CuArray{Float32, N, CUDA.Mem.DeviceBuffer} where N}}, ClimaCore.Operators.CenterPlaceholderSpace, Type{ClimaCore.Geometry.Contravariant3Vector}, Tuple{ClimaCore.Fields.Field{ClimaCore.DataLayouts.VIJFH{ClimaCore.Geometry.Covariant12Vector{Float32}, 4, SubArray{Float32, 5, CUDA.CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}}, ClimaCore.Operators.PlaceholderSpace}, ClimaCore.Fields.Field{ClimaCore.DataLayouts.VIJFH{ClimaCore.Geometry.LocalGeometry{(1, 2, 3), ClimaCore.Geometry.LatLongZPoint{Float32}, Float32, StaticArraysCore.SMatrix{3, 3, Float32, 9}}, 4, CUDA.CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}}, ClimaCore.Operators.PlaceholderSpace}}}) is ambiguous.
|
| Candidates:
| adapt_structure(to::CUDA.KernelAdaptor, bc::Base.Broadcast.Broadcasted{Style, <:Any, Type{T}}) where {Style, T}
| @ CUDA /central/scratch/esm/slurm-buildkite/climaatmos-ci/depot/default/packages/CUDA/rXson/src/compiler/execution.jl:174
| adapt_structure(to, bc::Base.Broadcast.Broadcasted{Style}) where Style<:ClimaCore.Fields.AbstractFieldStyle
| @ ClimaCore.Fields /central/scratch/esm/slurm-buildkite/climaatmos-ci/depot/default/packages/ClimaCore/SnmZh/src/Fields/broadcast.jl:38
|
| Possible fix, define
| adapt_structure(::CUDA.KernelAdaptor, ::Base.Broadcast.Broadcasted{…} where Axes) where {…}
|
| Stacktrace:
| [1] adapt(to::CUDA.KernelAdaptor, x::Base.Broadcast.Broadcasted{…})
| @ Adapt /central/scratch/esm/slurm-buildkite/climaatmos-ci/depot/default/packages/Adapt/cdaEv/src/Adapt.jl:40
| [2] (::Base.Fix1{…})(y::Base.Broadcast.Broadcasted{…})
| @ Base ./operators.jl:1118
| [3] map(f::Base.Fix1{…}, t::Tuple{…})
| @ Base ./tuple.jl:292
| [4] adapt_structure(to::CUDA.KernelAdaptor, xs::Tuple{Base.Broadcast.Broadcasted{…}, Base.Broadcast.Broadcasted{…}})
| @ Adapt /central/scratch/esm/slurm-buildkite/climaatmos-ci/depot/default/packages/Adapt/cdaEv/src/base.jl:3
| [5] adapt(to::CUDA.KernelAdaptor, x::Tuple{Base.Broadcast.Broadcasted{…}, Base.Broadcast.Broadcasted{…}})
| @ Adapt /central/scratch/esm/slurm-buildkite/climaatmos-ci/depot/default/packages/Adapt/cdaEv/src/Adapt.jl:40
| [6] adapt_structure(to::CUDA.KernelAdaptor, sbc::ClimaCore.Operators.StencilBroadcasted{…})
| @ ClimaCore.Operators /central/scratch/esm/slurm-buildkite/climaatmos-ci/depot/default/packages/ClimaCore/SnmZh/src/Operators/finitedifference.jl:238
| [7] adapt(to::CUDA.KernelAdaptor, x::ClimaCore.Operators.StencilBroadcasted{…})
| @ Adapt /central/scratch/esm/slurm-buildkite/climaatmos-ci/depot/default/packages/Adapt/cdaEv/src/Adapt.jl:40
| [8] cudaconvert(arg::ClimaCore.Operators.StencilBroadcasted{…})
| @ CUDA /central/scratch/esm/slurm-buildkite/climaatmos-ci/depot/default/packages/CUDA/rXson/src/compiler/execution.jl:188
| [9] map (repeats 2 times)
| @ ./tuple.jl:294 [inlined]
| [10] macro expansion
| @ /central/scratch/esm/slurm-buildkite/climaatmos-ci/depot/default/packages/CUDA/rXson/src/compiler/execution.jl:102 [inlined]
| [11] copyto!(out::ClimaCore.Fields.Field{…}, bc::ClimaCore.Operators.StencilBroadcasted{…})
| @ ClimaCore.Operators /central/scratch/esm/slurm-buildkite/climaatmos-ci/depot/default/packages/ClimaCore/SnmZh/src/Operators/finitedifference.jl:3366
| [12] materialize!(dest::ClimaCore.Fields.Field{…}, opbc::ClimaCore.Operators.StencilBroadcasted{…})
| @ ClimaCore.Operators /central/scratch/esm/slurm-buildkite/climaatmos-ci/depot/default/packages/ClimaCore/SnmZh/src/Operators/common.jl:61
| [13] set_ᶠuₕ³!(ᶠuₕ³::ClimaCore.Fields.Field{…}, Y::ClimaCore.Fields.FieldVector{…})
| @ ClimaAtmos /central/scratch/esm/slurm-buildkite/climaatmos-ci/16362/climaatmos-ci/src/cache/precomputed_quantities.jl:138
| [14] macro expansion
| @ /central/scratch/esm/slurm-buildkite/climaatmos-ci/16362/climaatmos-ci/src/cache/precomputed_quantities.jl:424 [inlined]
| [15] set_precomputed_quantities!(Y::ClimaCore.Fields.FieldVector{…}, p::@NamedTuple{…}, t::Float32) |
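The shape of the ambiguity above can be reproduced in isolation: one method is more specific in the adaptor and in the function slot (a `Type{T}`), the other is more specific only in the style. The sketch below uses hypothetical stand-in types (`DemoKernelAdaptor`, `DemoBroadcasted`, `demo_adapt`, and so on); none of them are the CUDA.jl or ClimaCore.jl definitions, and the third method is only one possible way to resolve such an ambiguity, not necessarily the fix that should be adopted here.

```julia
using Test

# Hypothetical stand-ins for the style hierarchy, the adaptor, and the broadcast object.
abstract type AbstractDemoStyle end
struct DemoFieldStyle <: AbstractDemoStyle end
struct DemoKernelAdaptor end

struct DemoBroadcasted{Style, F}
    style::Style
    f::F
end

# "Package A": specific in the adaptor and in the function slot (a type).
demo_adapt(::DemoKernelAdaptor, bc::DemoBroadcasted{Style, Type{T}}) where {Style, T} =
    :package_a

# "Package B": specific in the style only, generic in the adaptor.
demo_adapt(to, bc::DemoBroadcasted{Style}) where {Style <: AbstractDemoStyle} =
    :package_b

# A broadcast object whose function slot is a type, as when broadcasting a constructor.
bc = DemoBroadcasted{DemoFieldStyle, Type{Float64}}(DemoFieldStyle(), Float64)

# Neither method is more specific than the other, so dispatch fails.
@test_throws MethodError demo_adapt(DemoKernelAdaptor(), bc)

# A method that is more specific than both candidates removes the ambiguity.
demo_adapt(
    ::DemoKernelAdaptor,
    bc::DemoBroadcasted{Style, Type{T}},
) where {Style <: AbstractDemoStyle, T} = :resolved

@test demo_adapt(DemoKernelAdaptor(), bc) == :resolved
```

Removing the `::Type{T}` overload of `broadcasted` changes which broadcast objects carry a `Type{T}` in the function slot, which is presumably why this ambiguity only surfaces once the overload is gone.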
Interestingly, I've found another issue, from analyzing and simplifying this issue. |
I need to dig a bit more into why this happens; however, what I've seen is that if `PhasePartition` (in Thermodynamics.jl) in this reproducer is inlined, then the `broadcasted` anonymous function in `ClimaCore.jl/Fields/broadcast.jl` is not inferred. Here's the reproducer:

This function is ~100x slower as a result. cc @glwagner and @dennisYatunin (just FYI). It's possible that we're hitting a heuristic (`PhasePartition` calls `PhasePartition_equil` with a handful of arguments), or maybe it's the use of `phase_type`, which is somewhat of an edge case in Thermodynamics.

Whatever the issue is, this is a useful reproducer because one other thing I noticed is that `@. ᶜS_ρq_tot = ρ * TD.PhasePartition(thermo_params, ᶜts).tot` allocates, making it slower. It's important that we track and document these issues, and spread best-practice patterns to avoid shortcomings with our broadcast machinery, or improve its robustness.

In addition, JET seems to miss this inference failure; however, it is caught in the flame graph (and the timing difference is observed through BenchmarkTools).
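Independently of JET, an inference failure of this general kind can be pinned down in a unit test with `Test.@inferred` or inspected with `Base.return_types`. The sketch below is not the ClimaCore reproducer: `Celsius`, `construct_stable`, and `construct_unstable` are hypothetical names that only illustrate the difference between a type carried in the signature as `Type{T}` and a type that the compiler sees as a plain `DataType` value.

```julia
using Test

# Hypothetical parametric struct, standing in for a type constructed inside a broadcast.
struct Celsius{T}
    value::T
end

# The type is part of the method signature, so the return type can be inferred.
construct_stable(::Type{T}, x) where {T} = T(x)

# The type is read from a container as a DataType value, so inference
# can only conclude that the result is Any.
construct_unstable(types::Vector{DataType}, x) = types[1](x)

# @inferred passes silently when the inferred and actual return types agree...
@inferred construct_stable(Celsius{Float64}, 1.0)

# ...and throws an ErrorException when they do not.
@test_throws ErrorException @inferred construct_unstable(DataType[Celsius{Float64}], 1.0)

# The inferred return types can also be inspected directly.
stable_types = Base.return_types(construct_stable, (Type{Celsius{Float64}}, Float64))
unstable_types = Base.return_types(construct_unstable, (Vector{DataType}, Float64))
@test stable_types[1] == Celsius{Float64}
@test unstable_types[1] == Any
```

A check like this only covers the call it wraps, so for a broadcast expression it would need to target the inner function that actually fails to infer.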