AD error with rat42 #338

Open
dpo opened this issue Aug 18, 2024 · 11 comments

@dpo
Member

dpo commented Aug 18, 2024

Even though the AD backend for the residual Jacobian is defined, it returns an empty Jacobian:

julia> rat42_nls = rat42(; use_nls = true)                                                                                                                        
ADNLSModel - Nonlinear least-squares model with automatic differentiation backend ADModelBackend{                                                                 
  ForwardDiffADGradient,                                                                                                                                          
  ForwardDiffADHvprod,                                                                                                                                            
  EmptyADbackend,                                                                                                                                                 
  EmptyADbackend,                                                                                                                                                 
  EmptyADbackend,                                                                                                                                                 
  SparseADHessian,                                                                                                                                                
  EmptyADbackend,                                                                                                                                                 
  ForwardDiffADHvprod,                                                                                                                                            
  ForwardDiffADJprod,                                                                                                                                             
  ForwardDiffADJtprod,                                                                                                                                            
  SparseADJacobian,                                                                                                                                               
  SparseADHessian,                                                                                                                                                
}         

julia> jac_structure(rat42_nls)                                                                                                                                   
(Int64[], Int64[]) 

julia> x = [99.94710393753867, 1.6483148108416925, -12.429444473714828]

julia> jac(rat42_nls, x)                                                                                                                                          
0×3 SparseMatrixCSC{Float64, Int64} with 0 stored entries  

There is also an error with the gradient of the NLP model:

julia> rat42_model = rat42()
ADNLPModel - Model with automatic differentiation backend ADModelBackend{
  ForwardDiffADGradient,
  ForwardDiffADHvprod,
  EmptyADbackend,
  EmptyADbackend,
  EmptyADbackend,
  SparseADHessian,
  EmptyADbackend,
}

julia> obj(rat42_model, x)
9111.7101

julia> grad(rat42_model, x)
3-element Vector{Float64}:
 NaN
 NaN
 NaN

@amontoison @tmigot Any ideas?

@amontoison
Member

amontoison commented Aug 18, 2024

@dpo rat42 is an unconstrained optimization problem, so we have an empty backend for the constraint Jacobian in rat42_nls.
What you want is jac_structure_residual(rat42_nls).
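
For example (an untested sketch using the NLPModels residual API; x is the point from your transcript):

using NLPModels

rows, cols = jac_structure_residual(rat42_nls)  # sparsity pattern of the 9×3 residual Jacobian
vals = jac_coord_residual(rat42_nls, x)         # nonzero values at x
J = jac_residual(rat42_nls, x)                  # assembled sparse residual Jacobian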

I think we should better indicate which backend is associated with what when we display an ADNLPModel / ADNLSModel.

For the gradient of rat42_model, it seems to be a bug in ForwardDiff.jl; I will investigate.
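
A quick way to check whether ForwardDiff itself is at fault (untested sketch; assumes the raw objective is stored as rat42_model.f) is to bypass the ADNLPModels backend entirely:

using ForwardDiff

# If this also returns NaNs, the problem is in differentiating the objective itself,
# not in the ForwardDiffADGradient backend.
ForwardDiff.gradient(rat42_model.f, x)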

@dpo
Member Author

dpo commented Aug 18, 2024

Sorry, you’re right. However:

julia> jac_residual(rat42_nls, x)
9×3 SparseMatrixCSC{Float64, Int64} with 27 stored entries:
  -5.03261e-50     5.02995e-48    -4.52696e-47
  -5.14752e-77     5.1448e-75     -7.20271e-74
  -8.42023e-115    8.41577e-113   -1.76731e-111
  -1.37737e-152    1.37664e-150   -3.85459e-149
  -3.68555e-228   -0.0            -0.0
 NaN             NaN             NaN
 NaN             NaN             NaN
 NaN             NaN             NaN
 NaN             NaN             NaN

@amontoison
Member

I suspect that it's related to this vector of Rational:
https://github.com/JuliaSmoothOptimizers/OptimizationProblems.jl/blob/main/src/ADNLPProblems/rat42.jl#L9

@dpo
Member Author

dpo commented Aug 18, 2024

Me too. That's very weird modeling. For info, with JuMP:

julia> model = MathOptNLPModel(rat42())

julia> obj(model, x)
9111.7101


julia> grad(model, x)
3-element Vector{Float64}:
 -4.494124501515864e-49
  4.491747286612452e-47
 -4.042572557951207e-46

The JuMP/NLS version is not in the repository, but here it is (with y the 9×2 rat42 data matrix):

julia> model = Model()

julia> @variable(model, x[j = 1:3])

julia> set_start_value.(x, [100, 1, 0.1])

julia> for i = 1:9
         @eval @NLexpression(model, $(Symbol("F$i")),  y[$i, 1] - x[1] / (1 + exp(x[2] - x[3] * y[$i, 2])))
       end

julia> rat42_nls = MathOptNLSModel(model, [F1, F2, F3, F4, F5, F6, F7, F8, F9])

julia> xx = [99.94710393753867, 1.6483148108416925, -12.429444473714828]

julia> obj(rat42_nls, xx)
9111.7101

julia> jac_residual(rat42_nls, xx)
9×3 SparseMatrixCSC{Float64, Int64} with 27 stored entries:
 -5.03261e-50   5.02995e-48   -4.52696e-47
 -5.14752e-77   5.1448e-75    -7.20271e-74
 -8.42023e-115  8.41577e-113  -1.76731e-111
 -1.37737e-152  1.37664e-150  -3.85459e-149
 -3.68555e-228  0.0            0.0
  0.0           0.0            0.0
  0.0           0.0            0.0
  0.0           0.0            0.0
  0.0           0.0            0.0

@dpo
Member Author

dpo commented Aug 18, 2024

Actually, converting the array y to type T produces the same errors.

@dpo
Member Author

dpo commented Aug 18, 2024

ReverseDiff is only marginally better:

julia> grad(model, xx)
3-element Vector{Float64}:
  -4.494124501515864e-49
 NaN
 NaN

@dpo
Member Author

dpo commented Aug 18, 2024

And, finally,

julia> model = rat42(; gradient_backend = ADNLPModels.ZygoteADGradient)

julia> grad(model, xx)
3-element Vector{Float64}:
  -4.494124501515864e-49
 NaN
 NaN

@dpo
Member Author

dpo commented Aug 18, 2024

I suppose the problem here is simply that we need problem scaling. The term exp(1.6 + 12 * 79) overflows. JuMP must be scaling the problem.
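
A minimal scalar illustration of the mechanism (just a sketch with the same shape as one residual term, not the actual rat42 code):

using ForwardDiff

# Same shape as a rat42 residual term: x[1] / (1 + exp(x[2] - x[3] * t)).
# For Float64, exp overflows to Inf once its argument exceeds about 709.
f(t) = 100 / (1 + exp(t))

f(1000.0)                          # the value itself is fine: 100 / Inf == 0.0
ForwardDiff.derivative(f, 1000.0)  # the dual part hits Inf, an Inf gets multiplied by 0, and the result is NaN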

@amontoison
Member

amontoison commented Aug 19, 2024

I am not aware of any scaling in JuMP. They probably just do something smart if they encounter a division by Inf.

@dpo
Member Author

dpo commented Aug 19, 2024

Why would they have different arithmetic than Julia itself?

@amontoison
Member

Because they compute the derivative with a reverse pass directly on their expression tree.
They can check whether or not to propagate a NaN or an Inf.
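
Illustratively (a sketch of the idea only, not what JuMP actually does):

# Guarded chain-rule accumulation: when the accumulated adjoint is exactly zero,
# skip the multiplication by the (possibly infinite) local partial, which would
# otherwise turn 0 into NaN.
safe_accumulate(adjoint, partial) = iszero(adjoint) ? zero(adjoint) : adjoint * partial

safe_accumulate(-0.0, Inf)  # gives 0.0, whereas -0.0 * Inf gives NaN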
