[MadNLPGPU] Upgrade CUDSS -- support iterative refinement and hybrid mode #329
cc @frapac
I tested on my cluster and a few tests are failing.
Error During Test at /home/montalex/.julia/packages/MadNLPTests/kA3ek/src/MadNLPTests.jl:139
Got exception outside of a @test
type CuSparseMatrixCSC has no field nzval
Stacktrace:
[1] getproperty
@ ./Base.jl:37 [inlined]
[2] initialize!(kkt::MadNLP.SparseCondensedKKTSystem{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, CUDA.CUSPARSE.CuSparseMatrixCSC{Float64, Int32}, MadNLP.ExactHessian{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}, LapackGPUSolver{Float64}, CuArray{Int64, 1, CUDA.DeviceMemory}, CuArray{Int32, 1, CUDA.DeviceMemory}, CuArray{Tuple{Int32, Int32}, 1, CUDA.DeviceMemory}, CuArray{Tuple{Int32, Tuple{Int64, Int64, Int64}}, 1, CUDA.DeviceMemory}, @NamedTuple{jptrptr::CuArray{Int64, 1, CUDA.DeviceMemory}, hess_com_ptr::CuArray{Tuple{Int64, Int64}, 1, CUDA.DeviceMemory}, hess_com_ptrptr::CuArray{Int64, 1, CUDA.DeviceMemory}, jt_csc_ptr::CuArray{Tuple{Int64, Int64}, 1, CUDA.DeviceMemory}, jt_csc_ptrptr::CuArray{Int64, 1, CUDA.DeviceMemory}, diag_map_to::CuArray{Int32, 1, CUDA.DeviceMemory}, diag_map_fr::CuArray{Int32, 1, CUDA.DeviceMemory}}})
@ MadNLP ~/.julia/packages/MadNLP/u0fX5/src/KKT/sparse.jl:426
[3] initialize!(solver::MadNLPSolver{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, CuArray{Int64, 1, CUDA.DeviceMemory}, MadNLP.SparseCondensedKKTSystem{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, CUDA.CUSPARSE.CuSparseMatrixCSC{Float64, Int32}, MadNLP.ExactHessian{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}, LapackGPUSolver{Float64}, CuArray{Int64, 1, CUDA.DeviceMemory}, CuArray{Int32, 1, CUDA.DeviceMemory}, CuArray{Tuple{Int32, Int32}, 1, CUDA.DeviceMemory}, CuArray{Tuple{Int32, Tuple{Int64, Int64, Int64}}, 1, CUDA.DeviceMemory}, @NamedTuple{jptrptr::CuArray{Int64, 1, CUDA.DeviceMemory}, hess_com_ptr::CuArray{Tuple{Int64, Int64}, 1, CUDA.DeviceMemory}, hess_com_ptrptr::CuArray{Int64, 1, CUDA.DeviceMemory}, jt_csc_ptr::CuArray{Tuple{Int64, Int64}, 1, CUDA.DeviceMemory}, jt_csc_ptrptr::CuArray{Int64, 1, CUDA.DeviceMemory}, diag_map_to::CuArray{Int32, 1, CUDA.DeviceMemory}, diag_map_fr::CuArray{Int32, 1, CUDA.DeviceMemory}}}, MadNLPTests.SparseWrapperModel{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, Float64, Vector{Int64}, Vector{Float64}, NLPModelsJuMP.MathOptNLPModel}, MadNLP.SparseCallback{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, CuArray{Int64, 1, CUDA.DeviceMemory}, MadNLPTests.SparseWrapperModel{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, Float64, Vector{Int64}, Vector{Float64}, NLPModelsJuMP.MathOptNLPModel}, MadNLP.RelaxBound, MadNLP.RelaxEquality}, MadNLP.RichardsonIterator{Float64, MadNLP.SparseCondensedKKTSystem{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, CUDA.CUSPARSE.CuSparseMatrixCSC{Float64, Int32}, MadNLP.ExactHessian{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}, LapackGPUSolver{Float64}, CuArray{Int64, 1, CUDA.DeviceMemory}, CuArray{Int32, 1, CUDA.DeviceMemory}, CuArray{Tuple{Int32, Int32}, 1, CUDA.DeviceMemory}, CuArray{Tuple{Int32, Tuple{Int64, Int64, Int64}}, 1, CUDA.DeviceMemory}, @NamedTuple{jptrptr::CuArray{Int64, 1, CUDA.DeviceMemory}, hess_com_ptr::CuArray{Tuple{Int64, Int64}, 1, CUDA.DeviceMemory}, 
hess_com_ptrptr::CuArray{Int64, 1, CUDA.DeviceMemory}, jt_csc_ptr::CuArray{Tuple{Int64, Int64}, 1, CUDA.DeviceMemory}, jt_csc_ptrptr::CuArray{Int64, 1, CUDA.DeviceMemory}, diag_map_to::CuArray{Int32, 1, CUDA.DeviceMemory}, diag_map_fr::CuArray{Int32, 1, CUDA.DeviceMemory}}}}, MadNLP.InertiaBased, MadNLP.UnreducedKKTVector{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, CuArray{Int64, 1, CUDA.DeviceMemory}}})
@ MadNLP ~/.julia/packages/MadNLP/u0fX5/src/IPM/solver.jl:60
[4] solve!(nlp::MadNLPTests.SparseWrapperModel{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, Float64, Vector{Int64}, Vector{Float64}, NLPModelsJuMP.MathOptNLPModel}, solver::MadNLPSolver{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, CuArray{Int64, 1, CUDA.DeviceMemory}, MadNLP.SparseCondensedKKTSystem{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, CUDA.CUSPARSE.CuSparseMatrixCSC{Float64, Int32}, MadNLP.ExactHessian{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}, LapackGPUSolver{Float64}, CuArray{Int64, 1, CUDA.DeviceMemory}, CuArray{Int32, 1, CUDA.DeviceMemory}, CuArray{Tuple{Int32, Int32}, 1, CUDA.DeviceMemory}, CuArray{Tuple{Int32, Tuple{Int64, Int64, Int64}}, 1, CUDA.DeviceMemory}, @NamedTuple{jptrptr::CuArray{Int64, 1, CUDA.DeviceMemory}, hess_com_ptr::CuArray{Tuple{Int64, Int64}, 1, CUDA.DeviceMemory}, hess_com_ptrptr::CuArray{Int64, 1, CUDA.DeviceMemory}, jt_csc_ptr::CuArray{Tuple{Int64, Int64}, 1, CUDA.DeviceMemory}, jt_csc_ptrptr::CuArray{Int64, 1, CUDA.DeviceMemory}, diag_map_to::CuArray{Int32, 1, CUDA.DeviceMemory}, diag_map_fr::CuArray{Int32, 1, CUDA.DeviceMemory}}}, MadNLPTests.SparseWrapperModel{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, Float64, Vector{Int64}, Vector{Float64}, NLPModelsJuMP.MathOptNLPModel}, MadNLP.SparseCallback{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, CuArray{Int64, 1, CUDA.DeviceMemory}, MadNLPTests.SparseWrapperModel{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, Float64, Vector{Int64}, Vector{Float64}, NLPModelsJuMP.MathOptNLPModel}, MadNLP.RelaxBound, MadNLP.RelaxEquality}, MadNLP.RichardsonIterator{Float64, MadNLP.SparseCondensedKKTSystem{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, CUDA.CUSPARSE.CuSparseMatrixCSC{Float64, Int32}, MadNLP.ExactHessian{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}, LapackGPUSolver{Float64}, CuArray{Int64, 1, CUDA.DeviceMemory}, CuArray{Int32, 1, CUDA.DeviceMemory}, CuArray{Tuple{Int32, Int32}, 1, CUDA.DeviceMemory}, CuArray{Tuple{Int32, Tuple{Int64, Int64, 
Int64}}, 1, CUDA.DeviceMemory}, @NamedTuple{jptrptr::CuArray{Int64, 1, CUDA.DeviceMemory}, hess_com_ptr::CuArray{Tuple{Int64, Int64}, 1, CUDA.DeviceMemory}, hess_com_ptrptr::CuArray{Int64, 1, CUDA.DeviceMemory}, jt_csc_ptr::CuArray{Tuple{Int64, Int64}, 1, CUDA.DeviceMemory}, jt_csc_ptrptr::CuArray{Int64, 1, CUDA.DeviceMemory}, diag_map_to::CuArray{Int32, 1, CUDA.DeviceMemory}, diag_map_fr::CuArray{Int32, 1, CUDA.DeviceMemory}}}}, MadNLP.InertiaBased, MadNLP.UnreducedKKTVector{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, CuArray{Int64, 1, CUDA.DeviceMemory}}}, stats::MadNLP.MadNLPExecutionStats{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}; x::Nothing, y::Nothing, zl::Nothing, zu::Nothing, kwargs::@Kwargs{})
@ MadNLP ~/.julia/packages/MadNLP/u0fX5/src/IPM/solver.jl:159
[5] solve!
@ ~/.julia/packages/MadNLP/u0fX5/src/IPM/solver.jl:128 [inlined]
[6] solve!
@ ~/.julia/packages/MadNLP/u0fX5/src/IPM/solver.jl:14 [inlined]
[7] solve!(solver::MadNLPSolver{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, CuArray{Int64, 1, CUDA.DeviceMemory}, MadNLP.SparseCondensedKKTSystem{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, CUDA.CUSPARSE.CuSparseMatrixCSC{Float64, Int32}, MadNLP.ExactHessian{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}, LapackGPUSolver{Float64}, CuArray{Int64, 1, CUDA.DeviceMemory}, CuArray{Int32, 1, CUDA.DeviceMemory}, CuArray{Tuple{Int32, Int32}, 1, CUDA.DeviceMemory}, CuArray{Tuple{Int32, Tuple{Int64, Int64, Int64}}, 1, CUDA.DeviceMemory}, @NamedTuple{jptrptr::CuArray{Int64, 1, CUDA.DeviceMemory}, hess_com_ptr::CuArray{Tuple{Int64, Int64}, 1, CUDA.DeviceMemory}, hess_com_ptrptr::CuArray{Int64, 1, CUDA.DeviceMemory}, jt_csc_ptr::CuArray{Tuple{Int64, Int64}, 1, CUDA.DeviceMemory}, jt_csc_ptrptr::CuArray{Int64, 1, CUDA.DeviceMemory}, diag_map_to::CuArray{Int32, 1, CUDA.DeviceMemory}, diag_map_fr::CuArray{Int32, 1, CUDA.DeviceMemory}}}, MadNLPTests.SparseWrapperModel{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, Float64, Vector{Int64}, Vector{Float64}, NLPModelsJuMP.MathOptNLPModel}, MadNLP.SparseCallback{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, CuArray{Int64, 1, CUDA.DeviceMemory}, MadNLPTests.SparseWrapperModel{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, Float64, Vector{Int64}, Vector{Float64}, NLPModelsJuMP.MathOptNLPModel}, MadNLP.RelaxBound, MadNLP.RelaxEquality}, MadNLP.RichardsonIterator{Float64, MadNLP.SparseCondensedKKTSystem{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, CUDA.CUSPARSE.CuSparseMatrixCSC{Float64, Int32}, MadNLP.ExactHessian{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}}, LapackGPUSolver{Float64}, CuArray{Int64, 1, CUDA.DeviceMemory}, CuArray{Int32, 1, CUDA.DeviceMemory}, CuArray{Tuple{Int32, Int32}, 1, CUDA.DeviceMemory}, CuArray{Tuple{Int32, Tuple{Int64, Int64, Int64}}, 1, CUDA.DeviceMemory}, @NamedTuple{jptrptr::CuArray{Int64, 1, CUDA.DeviceMemory}, hess_com_ptr::CuArray{Tuple{Int64, Int64}, 1, CUDA.DeviceMemory}, 
hess_com_ptrptr::CuArray{Int64, 1, CUDA.DeviceMemory}, jt_csc_ptr::CuArray{Tuple{Int64, Int64}, 1, CUDA.DeviceMemory}, jt_csc_ptrptr::CuArray{Int64, 1, CUDA.DeviceMemory}, diag_map_to::CuArray{Int32, 1, CUDA.DeviceMemory}, diag_map_fr::CuArray{Int32, 1, CUDA.DeviceMemory}}}}, MadNLP.InertiaBased, MadNLP.UnreducedKKTVector{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, CuArray{Int64, 1, CUDA.DeviceMemory}}})
@ MadNLP ~/.julia/packages/MadNLP/u0fX5/src/IPM/solver.jl:17
[8] madnlp(model::MadNLPTests.SparseWrapperModel{Float64, CuArray{Float64, 1, CUDA.DeviceMemory}, Float64, Vector{Int64}, Vector{Float64}, NLPModelsJuMP.MathOptNLPModel}; kwargs::@Kwargs{print_level::MadNLP.LogLevels, linear_solver::UnionAll, lapack_algorithm::MadNLP.LinearFactorization})
@ MadNLP ~/.julia/packages/MadNLP/u0fX5/src/IPM/solver.jl:11
[9] macro expansion
@ ~/.julia/packages/MadNLPTests/kA3ek/src/MadNLPTests.jl:149 [inlined]
[10] macro expansion
@ ~/Applications/julia/julia-1.10.3/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
[11] unbounded(optimizer_constructor::var"#11#22"; Arr::Type)
@ MadNLPTests ~/.julia/packages/MadNLPTests/kA3ek/src/MadNLPTests.jl:140
[12] macro expansion
@ ~/.julia/packages/MadNLPTests/kA3ek/src/MadNLPTests.jl:115 [inlined]
[13] macro expansion
@ ~/Applications/julia/julia-1.10.3/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
[14] test_madnlp(name::String, optimizer_constructor::Function, exclude::Vector{String}; Arr::Type)
@ MadNLPTests ~/.julia/packages/MadNLPTests/kA3ek/src/MadNLPTests.jl:114
[15] macro expansion
@ ~/Argonne/MadNLP.jl/lib/MadNLPGPU/test/madnlpgpu_test.jl:106 [inlined]
[16] macro expansion
@ ~/Applications/julia/julia-1.10.3/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
[17] top-level scope
@ ~/Argonne/MadNLP.jl/lib/MadNLPGPU/test/madnlpgpu_test.jl:102
[18] include(fname::String)
@ Base.MainInclude ./client.jl:489
[19] macro expansion
@ ~/Argonne/MadNLP.jl/lib/MadNLPGPU/test/runtests.jl:7 [inlined]
[20] macro expansion
@ ~/Applications/julia/julia-1.10.3/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
[21] top-level scope
@ ~/Argonne/MadNLP.jl/lib/MadNLPGPU/test/runtests.jl:7
[22] include(fname::String)
@ Base.MainInclude ./client.jl:489
[23] top-level scope
@ none:6
[24] eval
@ ./boot.jl:385 [inlined]
[25] exec_options(opts::Base.JLOptions)
@ Base ./client.jl:291
[26] _start()
@ Base ./client.jl:552
Test Summary:                                                | Pass  Error  Total     Time
MadNLPGPU test                                               |  170     51    221  5m20.0s
  MadNLPGPU test                                             |    2     51     53  2m42.3s
    CUDSS                                                    |           5      5  1m19.8s
      infeasible                                             |           1      1  1m08.2s
      unbounded                                              |           1      1     0.6s
      lootsma                                                |           1      1     2.1s
      eigmina                                                |           1      1     2.1s
      lp_examodels_issue75                                   |           1      1     0.4s
    CUDSS-AMD                                                |           5      5     5.5s
      infeasible                                             |           1      1     0.5s
      unbounded                                              |           1      1     0.7s
      lootsma                                                |           1      1     0.3s
      eigmina                                                |           1      1     0.4s
      lp_examodels_issue75                                   |           1      1     0.3s
    CUDSS-METIS                                              |           5      5     5.3s
      infeasible                                             |           1      1     0.4s
      unbounded                                              |           1      1     0.3s
      lootsma                                                |           1      1     0.3s
      eigmina                                                |           1      1     0.4s
      lp_examodels_issue75                                   |           1      1     0.3s
    CUDSS-HYBRID                                             |           5      5     5.3s
      infeasible                                             |           1      1     0.6s
      unbounded                                              |           1      1     0.3s
      lootsma                                                |           1      1     0.4s
      eigmina                                                |           1      1     0.4s
      lp_examodels_issue75                                   |           1      1     0.3s
    CUSOLVERRF                                               |           5      5     9.7s
      infeasible                                             |           1      1     4.9s
      unbounded                                              |           1      1     0.4s
      lootsma                                                |           1      1     0.3s
      eigmina                                                |           1      1     0.4s
      lp_examodels_issue75                                   |           1      1     0.3s
    CUSOLVER-CHOLESKY                                        |           5      5    10.1s
      infeasible                                             |           1      1     5.5s
      unbounded                                              |           1      1     0.3s
      lootsma                                                |           1      1     0.3s
      eigmina                                                |           1      1     0.4s
      lp_examodels_issue75                                   |           1      1     0.3s
    GLU                                                      |           5      5     8.6s
      infeasible                                             |           1      1     3.9s
      unbounded                                              |           1      1     0.3s
      lootsma                                                |           1      1     0.3s
      eigmina                                                |           1      1     0.4s
      lp_examodels_issue75                                   |           1      1     0.3s
    LapackGPU-BUNCHKAUFMAN                                   |           5      5     8.8s
      infeasible                                             |           1      1     4.0s
      unbounded                                              |           1      1     0.3s
      lootsma                                                |           1      1     0.3s
      eigmina                                                |           1      1     0.4s
      lp_examodels_issue75                                   |           1      1     0.3s
    LapackGPU-LU                                             |           5      5     5.0s
      infeasible                                             |           1      1     0.3s
      unbounded                                              |           1      1     0.3s
      lootsma                                                |           1      1     0.4s
      eigmina                                                |           1      1     0.4s
      lp_examodels_issue75                                   |           1      1     0.3s
    LapackGPU-QR                                             |           5      5     5.0s
      infeasible                                             |           1      1     0.3s
      unbounded                                              |           1      1     0.4s
      lootsma                                                |           1      1     0.3s
      eigmina                                                |           1      1     0.4s
      lp_examodels_issue75                                   |           1      1     0.3s
    LapackGPU-CHOLESKY                                       |           1      1     2.8s
      unbounded                                              |           1      1     2.8s
  MadNLPGPU (MadNLP.DenseKKTSystem)                          |   60            60  1m44.8s
  MadNLPGPU (MadNLP.DenseCondensedKKTSystem)                 |   60            60    33.8s
  MadNLP: MadNLP.BFGS + MadNLP.DenseKKTSystem                |   12            12     4.1s
  MadNLP: MadNLP.BFGS + MadNLP.DenseCondensedKKTSystem       |   12            12     3.8s
  MadNLP: MadNLP.DampedBFGS + MadNLP.DenseKKTSystem          |   12            12     3.5s
  MadNLP: MadNLP.DampedBFGS + MadNLP.DenseCondensedKKTSystem |   12            12     3.8s
ERROR: LoadError: Some tests did not pass: 170 passed, 0 failed, 51 errored, 0 broken.
in expression starting at /home/montalex/Argonne/MadNLP.jl/lib/MadNLPGPU/test/runtests.jl:6
ERROR: Package MadNLPGPU errored during testing
@amontoison Tests for MadNLPGPU are passing locally on my machine. I am not sure I understand why moonshot cannot find CUDSS.jl 0.3.1 there.
I restarted the tests.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##           master     #329   +/-   ##
=======================================
  Coverage   70.17%   70.17%
=======================================
  Files          45       45
  Lines        3943     3943
=======================================
  Hits         2767     2767
  Misses       1176     1176
Thank you, Alexis!