Useful packages for (sparse) differentiation? #229
@gdalle We just need to update
Sounds good! We're currently in the middle of a big refactor for SparseConnectivityTracer.jl, so it will have to wait a few days.
The refactor has been merged and SparseConnectivityTracer PRs adding
@adrhill @gdalle
MWE:

using SparseConnectivityTracer

y = zeros(3)
x = rand(3)
function J(y, x)
    y[1] = x[1]^2
    y[2] = 2 * x[1] * x[2]^2
    y[3] = sin(x[3])
    return y
end
S = jacobian_pattern(J, y, x)
using LinearAlgebra

x = rand(3)
y = rand(2)
f(x) = 2 * x[1] * x[2]^2
c(x) = [x[3]^3; x[3]^2]
S = hessian_pattern(f, x)
ℓ(x) = f(x) + dot(c(x), y)
S = hessian_pattern(ℓ, x)

ERROR: MethodError: no method matching conj(::SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}})
Closest candidates are:
conj(::Missing)
@ Base missing.jl:101
conj(::Complex)
@ Base complex.jl:282
conj(::SymTridiagonal)
@ LinearAlgebra ~/julia/julia-1.10.3/share/julia/stdlib/v1.10/LinearAlgebra/src/tridiag.jl:160
...
Stacktrace:
[1] dot(x::SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}}, y::Float64)
@ LinearAlgebra ~/julia/julia-1.10.3/share/julia/stdlib/v1.10/LinearAlgebra/src/generic.jl:876
[2] dot(x::Vector{SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}}}, y::Vector{Float64})
@ LinearAlgebra ~/julia/julia-1.10.3/share/julia/stdlib/v1.10/LinearAlgebra/src/generic.jl:886
[3] ℓ(x::Vector{SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}}})
@ Main ./REPL[41]:1
[4] trace_function(::Type{SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{…}}}}, f::typeof(ℓ), x::Vector{Float64})
@ SparseConnectivityTracer ~/.julia/packages/SparseConnectivityTracer/JvHcU/src/pattern.jl:32
[5] hessian_pattern(f::Function, x::Vector{Float64}, ::Type{BitSet}, ::Type{Set{Tuple{Int64, Int64}}})
@ SparseConnectivityTracer ~/.julia/packages/SparseConnectivityTracer/JvHcU/src/pattern.jl:326
[6] top-level scope
@ REPL[42]:1
Some type information was truncated. Use `show(err)` to see complete types.

If I also preallocate the storage of the constraints, as in the following function, I get an error:

function compute_hessian_sparsity(f, nvar, c!, ncon)
    if ncon == 0
        x0 = rand(nvar)
        S = SparseConnectivityTracer.hessian_pattern(f, x0)
        return S
    else
        x0 = rand(nvar)
        y0 = rand(ncon)
        cx = similar(y0)
        ℓ(x) = f(x) + sum(c!(cx, x)[i] * y0[i] for i = 1:ncon)
        S = SparseConnectivityTracer.hessian_pattern(ℓ, x0)
        return S
    end
end

Basic Jacobian derivative with backend=ADNLPModels.SparseADJacobian and T=Float64: Error During Test at /home/alexis/Bureau/git/ADNLPModels.jl/test/sparse_jacobian.jl:6
Got exception outside of a @test
TypeError: in typeassert, expected Float64, got a value of type SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}}
Stacktrace:
[1] setindex!(A::Vector{Float64}, x::SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}}, i1::Int64)
@ Base ./array.jl:1021
[2] (::var"#c!#21")(cx::Vector{Float64}, x::Vector{SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}}})
@ Main ~/Bureau/git/ADNLPModels.jl/test/sparse_jacobian.jl:10
[3] (::ADNLPModels.var"#35#37"{Vector{SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}}}, var"#c!#21", Vector{Float64}, Vector{Float64}})(i::Int64)

Maybe the best solution is to not provide
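For reference, the `TypeError` above arises because the preallocated `cx::Vector{Float64}` cannot store `HessianTracer` values during sparsity detection. Below is a minimal sketch of one possible workaround that allocates the constraint output from the element type of the traced input; the helper name is hypothetical, this is not code from either package, and it assumes `c!` returns `cx` as in the snippet above.

import SparseConnectivityTracer

function lagrangian_hessian_sparsity(f, nvar, c!, ncon)  # hypothetical helper
    x0 = rand(nvar)
    y0 = rand(ncon)
    function ℓ(x)
        # Allocate cx with the same element type as x, so that tracer values
        # can be written into it during sparsity detection.
        cx = similar(x, ncon)
        return f(x) + sum(c!(cx, x)[i] * y0[i] for i = 1:ncon)
    end
    return SparseConnectivityTracer.hessian_pattern(ℓ, x0)
end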
Hey @gdalle @adrhill! I love the idea, and I agree with @amontoison that the sparse Jacobian/Hessian is a good starting point. For DifferentiationInterface.jl, the best would be if you could provide an example that does this:

T = Float64
n = 2
x0 = rand(T, n)
f(x) = sum(x) # or something
function c!(cx, x)
    cx[1] = x[1]^2 + x[2]^2 + x[1] * x[2]
    return cx
end

# a) compute gradient in-place
gx = similar(x0)
# ... ∇f(x)

# b) Hessian-vector product in-place
v = ones(2)
# ... ∇²f(x) * v

# c) Jacobian-vector product
# ∇c!(cx, x) * v

# d) transpose Jacobian-vector product
# ∇c!(cx, x)ᵀ * v

# e) Lagrangian Hessian-vector product
# ... (∇²f(x) + Σᵢ yᵢ ∇²cᵢ!(cx, x)) * v

Usually, we store everything needed to compute the derivatives in a backend that is initialized with the model.
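As an aside, a minimal sketch of what "storing everything in a backend initialized with the model" could look like on top of DifferentiationInterface's preparation mechanism; the struct name, fields, and `grad!` helper are hypothetical and only meant to illustrate the idea, reusing the `prepare_gradient`/`gradient!` calls shown later in this thread.

using DifferentiationInterface
import ForwardDiff

# Hypothetical container: preparation is done once when the model is built,
# then reused for every subsequent derivative evaluation.
struct GradientBackend{B, E, V}
    backend::B
    extras::E
    gx::V  # preallocated gradient storage
end

function GradientBackend(f, x0; backend = AutoForwardDiff())
    extras = prepare_gradient(f, backend, x0)
    return GradientBackend(backend, extras, similar(x0))
end

grad!(b::GradientBackend, f, x) = gradient!(f, b.gx, b.backend, x, b.extras)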
Here are some answers to your questions:
The rationale behind it is that
At the moment we do not exploit the symmetry of the Hessian. Internally, the Hessian pattern is represented as a set of index tuples (a `Set{Tuple{Int64, Int64}}`, as visible in the error above).
Essentially we need to add some array rules to SparseConnectivityTracer, and
For SparseConnectivityTracer you can always detect the sparsity pattern of
For DifferentiationInterface, @timholy asked a similar question a week ago, and he had some suggestions:
Feel free to weigh in on those discussions.
Don't provide
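For illustration only (not the package's internal code): a minimal sketch of how a Hessian sparsity pattern stored as a set of `(i, j)` index tuples can be materialized as a symmetric sparse Boolean matrix, using only the standard library SparseArrays. The pattern values here are made up.

using SparseArrays

# Hypothetical pattern: nonzero second derivatives at (1, 1), (1, 2) and (3, 3).
pattern = Set([(1, 1), (1, 2), (3, 3)])
n = 3

# Collect both (i, j) and (j, i) so the resulting matrix is symmetric,
# even if the stored pattern only records one triangle.
is = Int[]
js = Int[]
for (i, j) in pattern
    push!(is, i); push!(js, j)
    if i != j
        push!(is, j); push!(js, i)
    end
end
H = sparse(is, js, fill(true, length(is)), n, n)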
@tmigot here's an example doing everything you asked. The mantra of DifferentiationInterface is "prepare once, reuse multiple times", so it fits well with an optimization point of view.

using DifferentiationInterface
import Enzyme, ForwardDiff, Zygote

forward_backend = AutoForwardDiff()
reverse_backend = AutoEnzyme(Enzyme.Reverse) # supports mutation
second_order_backend = SecondOrder(AutoForwardDiff(), AutoZygote())

T = Float64
n = 2 # variables
m = 1 # constraints
x0 = rand(T, n)
v0 = ones(T, n) # seed vector in the input space
obj(x) = sum(x)
function cons!(c, x)
    c[1] = sum(abs2, x) - one(eltype(x))
    return nothing
end

# a) gradient
extras_a = prepare_gradient(obj, reverse_backend, x0)
g0 = similar(x0)
gradient!(obj, g0, reverse_backend, x0, extras_a)

# b) Hessian-vector product
extras_b = prepare_hvp(obj, second_order_backend, x0, v0)
p0 = similar(x0)
hvp!(obj, p0, second_order_backend, x0, v0, extras_b)

# c) Jacobian-vector product
c0 = ones(T, m)
extras_c = prepare_pushforward(cons!, c0, forward_backend, x0, v0)
p0 = similar(c0)
pushforward!(cons!, c0, p0, forward_backend, x0, v0, extras_c)

# d) transpose Jacobian-vector product
w0 = ones(T, m) # seed vector in the output space
extras_d = prepare_pullback(cons!, c0, reverse_backend, x0, w0)
p0 = similar(x0)
pullback!(cons!, c0, p0, reverse_backend, x0, w0, extras_d)

# e) Lagrangian Hessian-vector product
# not sure what you mean by this one, can either be obtained by summing the HVPs
# of `f` and `ci` or by taking the HVP of the Lagrangian as a whole

I think, as above, the main point of discussion will be handling the Lagrangian aspect.
@amontoison as of v0.5.6 (released this morning), DifferentiationInterface supports vector-mode pushforwards, pullbacks and HVPs, which makes Jacobians and Hessians much faster. It only works with ForwardDiff for now, but I want to add Enzyme soon.
@gdalle We can have a meeting next week, if you are available, to talk about that.
Unfortunately I'll be at JuliaCon all week! The week after?
Yes, perfect.
Hey there friends of JSO!
@adrhill and I have been hard at work on two autodiff toolkits which you might find useful: SparseConnectivityTracer.jl and DifferentiationInterface.jl.
We would be happy to discuss potential uses, and integration with your ecosystem. We're having similar talks with Optimization.jl and the SciML organization at large.
Ping @amontoison @abelsiqueira since we've already chatted a few times :)