
Tpetra LAI #1086 (Draft)

@klevzoff wants to merge 2 commits into develop from feature/klevzoff/tpetra-interface
Conversation

@klevzoff (Contributor) commented Jul 30, 2020

This PR adds support for a Tpetra-based LAI. It provides functionality similar to the other existing LA interfaces. It is not enabled by default: the Epetra-based Trilinos interface remains the default. The plan is to get it used and tested, and eventually to replace the old interface.

It uses the OpenMP and/or GPU acceleration available (via Kokkos) in the following Trilinos packages (a minimal sketch of how they compose in a solve follows the list):

  • Tpetra (matrices/vectors)
  • Belos (Krylov solvers)
  • Ifpack2 (smoothers)
  • MueLu (multigrid)
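For orientation, here is a minimal sketch (not code from this PR) of how these packages compose: Belos drives the Krylov iteration on Tpetra objects, with a MueLu hierarchy attached as the preconditioner. The template parameters, the "CG" solver choice, and the default parameter lists are assumptions modeled on standard Trilinos examples.

```cpp
#include <Tpetra_CrsMatrix.hpp>
#include <Tpetra_MultiVector.hpp>
#include <BelosLinearProblem.hpp>
#include <BelosSolverFactory.hpp>
#include <BelosTpetraAdapter.hpp>
#include <MueLu_CreateTpetraPreconditioner.hpp>

using SC = double; using LO = int; using GO = long long;
using MV = Tpetra::MultiVector< SC, LO, GO >;
using OP = Tpetra::Operator< SC, LO, GO >;

void solve( Teuchos::RCP< Tpetra::CrsMatrix< SC, LO, GO > > const & A,
            Teuchos::RCP< MV > const & x,
            Teuchos::RCP< MV > const & b )
{
  // Build an algebraic multigrid preconditioner (MueLu) from the assembled matrix.
  Teuchos::ParameterList mueluParams; // default V-cycle settings
  Teuchos::RCP< OP > M =
    MueLu::CreateTpetraPreconditioner( Teuchos::rcp_implicit_cast< OP >( A ), mueluParams );

  // Wrap the system in a Belos linear problem and attach the preconditioner.
  auto problem = Teuchos::rcp( new Belos::LinearProblem< SC, MV, OP >( A, x, b ) );
  problem->setRightPrec( M );
  problem->setProblem();

  // Create a Krylov solver through the Belos factory and iterate.
  Belos::SolverFactory< SC, MV, OP > factory;
  auto solver = factory.create( "CG", Teuchos::parameterList() );
  solver->setProblem( problem );
  solver->solve();
}
```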

There is still a host-based matrix copy involved: avoiding that copy is a much more complex issue and is not addressed here. For vectors, the copy from LvArray::ArrayView occurs on device; however, another host copy gets allocated to support Tpetra's dual-view semantics. These copies cannot currently be eliminated due to a fundamental issue: Tpetra requires all of its device allocations to be in Unified Memory (most solvers, in particular Krylov methods and preconditioners, seem to work fine when given non-UM vector views, but the direct solver KLU2 fails). Issues in their tracker dating back 4 years suggest that relaxing this requirement will not be easy for them any time soon.
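To make the dual-view point concrete, here is a hedged illustration (not GEOSX code) of the mechanics: a Kokkos::DualView owns both a device allocation and a host mirror, so even when a vector is filled on device, syncing the host side implies the extra host copy described above. The fill loop below stands in for the copy from LvArray::ArrayView.

```cpp
#include <Kokkos_Core.hpp>
#include <Kokkos_DualView.hpp>

// devSrc is assumed to point at device-resident data (standing in for an
// LvArray::ArrayView that lives on device).
void fillFromDevice( double const * const devSrc, int const n )
{
  Kokkos::DualView< double * > values( "values", n ); // allocates device AND host copies

  // Fill the device side directly from device data - no host involvement yet.
  auto dev = values.view_device();
  Kokkos::parallel_for( n, KOKKOS_LAMBDA ( int const i ) { dev( i ) = devSrc[ i ]; } );

  // Dual-view bookkeeping: mark the device side modified, then sync the host mirror.
  // This sync is the extra host copy that cannot currently be avoided.
  values.modify_device();
  values.sync_host();
}
```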

More specifically:

  • new classes TpetraVector, TpetraMatrix, TrilinosTpetraPreconditioner, TrilinosTpetraSolver and TrilinosTpetraInterface, with capabilities identical to the existing Trilinos/Hypre/Petsc classes
  • added a "direct" option for the preconditionerType input; it was mainly useful for troubleshooting BlockPreconditioner (it eliminates the possibility of sub-block solvers underperforming), and I decided to leave it in
  • improved/refactored the unit test for the vector classes, with a proper test for each of the main API calls (a sketch of the flavor of such a test follows this list)
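For flavor, here is a rough sketch of one such per-API check, written against raw Tpetra rather than the TpetraVector wrapper (whose method names are not reproduced here); the map size and tolerance are arbitrary choices for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <Tpetra_Core.hpp>
#include <Tpetra_Map.hpp>
#include <Tpetra_Vector.hpp>

void testDot()
{
  auto comm = Tpetra::getDefaultComm();
  // 100 global rows, index base 0, distributed across all ranks.
  auto map = Teuchos::rcp( new Tpetra::Map< int, long long >( 100, 0, comm ) );

  Tpetra::Vector< double, int, long long > x( map ), y( map );
  x.putScalar( 2.0 );
  y.putScalar( 3.0 );

  // Global dot product: 100 entries of 2*3 each -> 600.
  assert( std::abs( x.dot( y ) - 600.0 ) < 1e-12 );
}
```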

The new interface:

  • passes all unit tests, both without and with CUDA (though we don't run the latter on Travis)
  • passes almost all integrated tests without CUDA, with a few exceptions listed below
  • passes almost all single-rank integrated tests with CUDA (unfortunately, I can't run multi-rank tests with CUDA reliably)

Integrated tests that fail:

| Test | Status | Reason |
| --- | --- | --- |
| Hydrofracture_KGD_NodeBased_C3D6 (all) | FAIL_RUN | Only works with Epetra |
| KGD_ZeroToughness (all) | FAIL_RUN | Only works with Epetra |
| KGD_ZeroViscosity (all) | FAIL_RUN | Only works with Epetra |
| Sneddon (CUDA only) | FAIL_RUN | Breakdown in numerical factorization in KLU2, probably related to #1083 |
| beamBending (CUDA only) | FAIL_RUN | `Kokkos::Impl::ParallelReduce< Cuda >` requested too much L0 scratch memory during MueLu prolongation construction; a new failure (did not fail earlier, including after the post-13.0 Trilinos update) |

I also need someone to test the PR (including the TPL build) on Lassen for possible build issues, configuring with -D GEOSX_LA_INTERFACE=TrilinosTpetra. Other than that, this is ready to merge; a rebaseline is not required.

Related to:
GEOS-DEV/thirdPartyLibs#120
GEOS-DEV/LvArray#189
GEOS-DEV/GEOSX_PTP#20
https://github.com/GEOSX/integratedTests/pull/84

@andrea-franceschini (Contributor) commented:

On my Ubuntu machine, I am able to run both of the ContactMechanics_SimpleCubes_0[8,9] integratedTests, and they pass their checks.
Would it be possible to have more details on the error that you see?

BTW, great PR @klevzoff! Thanks!

@klevzoff (Contributor, Author) commented:

> Would it be possible to have more details on the error that you see?

Are you sure you compiled with GEOSX_LA_INTERFACE=TrilinosTpetra? Anyway, here's what I get after a few timesteps have already completed:

Received signal 6: Aborted

** StackTrace of 24 frames **
Frame 1: LvArray::system::stackTraceHandler(int, bool)
Frame 2: 
Frame 3: gsignal
Frame 4: abort
Frame 5: 
Frame 6: 
Frame 7: 
Frame 8: 
Frame 9: _ZN6Tpetra11Distributor7doPostsIN6Kokkos4ViewIPKdJNS2_11LayoutRightENS2_6DeviceINS2_6OpenMPENS2_9HostSpaceEEENS2_12MemoryTraitsILj0EEEEEENS3_IPdJS6_SA_EEEEENSt9enable_ifIXaasr6Kokkos4Impl7is_viewIT_EE5valuesr6Kokkos4Impl7is_viewIT0_EE5valueEvE4typeERKSH_mRKSI_
Frame 10: _ZN6Tpetra11Distributor15doPostsAndWaitsIN6Kokkos4ViewIPKdJNS2_11LayoutRightENS2_6DeviceINS2_6OpenMPENS2_9HostSpaceEEENS2_12MemoryTraitsILj0EEEEEENS3_IPdJS6_SA_EEEEENSt9enable_ifIXaasr6Kokkos4Impl7is_viewIT_EE5valuesr6Kokkos4Impl7is_viewIT0_EE5valueEvE4typeERKSH_mRKSI_
Frame 11: Tpetra::DistObject<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::doTransferNew(Tpetra::SrcDistObject const&, Tpetra::CombineMode, unsigned long, Kokkos::DualView<int const*, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, void, void> const&, Kokkos::DualView<int const*, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, void, void> const&, Kokkos::DualView<int const*, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, void, void> const&, Kokkos::DualView<int const*, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, void, void> const&, Tpetra::Distributor&, Tpetra::DistObject<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::ReverseOption, bool, bool)
Frame 12: Tpetra::DistObject<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::doTransfer(Tpetra::SrcDistObject const&, Tpetra::Details::Transfer<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, char const*, Tpetra::DistObject<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::ReverseOption, Tpetra::CombineMode, bool)
Frame 13: Tpetra::DistObject<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >::doImport(Tpetra::SrcDistObject const&, Tpetra::Import<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const&, Tpetra::CombineMode, bool)
Frame 14: Amesos2::MultiVecAdapter<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >::put1dData(Teuchos::ArrayView<double const> const&, unsigned long, Teuchos::Ptr<Tpetra::Map<int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const>, Amesos2::EDistribution)
Frame 15: Amesos2::Util::put_1d_data_helper<Amesos2::MultiVecAdapter<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >, double>::do_put(Teuchos::Ptr<Amesos2::MultiVecAdapter<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > > > const&, Teuchos::ArrayView<double> const&, unsigned long, Amesos2::EDistribution, long long)
Frame 16: Amesos2::KLU2<Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >, Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >::solve_impl(Teuchos::Ptr<Amesos2::MultiVecAdapter<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > > >, Teuchos::Ptr<Amesos2::MultiVecAdapter<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > > const>) const
Frame 17: Amesos2::SolverCore<Amesos2::KLU2, Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >, Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >::solve(Teuchos::Ptr<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >, Teuchos::Ptr<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > const>) const
Frame 18: Amesos2::SolverCore<Amesos2::KLU2, Tpetra::CrsMatrix<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> >, Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace> > >::solve()
Frame 19: geosx::TrilinosTpetraSolver::solve_direct(geosx::TpetraMatrix&, geosx::TpetraVector&, geosx::TpetraVector&)
Frame 20: geosx::TrilinosTpetraSolver::solve(geosx::TpetraMatrix&, geosx::TpetraVector&, geosx::TpetraVector&, geosx::DofManager const*)
Frame 21: geosx::SolverBase::SolveSystem(geosx::DofManager const&, geosx::TpetraMatrix&, geosx::TpetraVector&, geosx::TpetraVector&)
Frame 22: geosx::LagrangianContactSolver::SolveSystem(geosx::DofManager const&, geosx::TpetraMatrix&, geosx::TpetraVector&, geosx::TpetraVector&)
Frame 23: geosx::LagrangianContactSolver::NonlinearImplicitStep(double const&, double const&, int, geosx::DomainPartition&)
Frame 24: geosx::LagrangianContactSolver::SolverStep(double const&, double const&, int, geosx::DomainPartition&)

And the full log: ContactMechanics_SimpleCubes_08.data.txt

I will run it in a debug build later when I get a chance and hopefully find out what that error actually means.

@andrea-franceschini (Contributor) commented:

Yes, I did a clean build with -DGEOSX_LA_INTERFACE=TrilinosTpetra.

@klevzoff force-pushed the feature/klevzoff/tpetra-interface branch from a680573 to 621c345 on August 11, 2020
@klevzoff (Contributor, Author) commented:

> On my Ubuntu machine, I am able to run both of the ContactMechanics_SimpleCubes_0[8,9] integratedTests, and they pass their checks.

It turns out they pass for me too if I run each test separately, but fail when I run the whole test suite (and oversubscribe my cores). I'm not going to look into it any further; it's just my setup that makes it fail.

@klevzoff force-pushed the feature/klevzoff/tpetra-interface branch from 621c345 to 37944cd on August 21, 2020
@klevzoff (Contributor, Author) commented:

I think this is as ready as it's going to be. Of the three failing tests that remain, two aren't expected to pass and one only has tiny diffs (see the updated PR description). I've also updated Trilinos to 13.0.0, released a couple of weeks ago, and it seems to be running fine.

Before this PR gets merged, I need someone to build the updated TPLs on LC (and preferably test the build of the GEOSX PR as well) and push updated host-configs.

@klevzoff force-pushed the feature/klevzoff/tpetra-interface branch 2 times, most recently from dab4dcc to 6b68f4c on September 7, 2020
@andrea-franceschini (Contributor) left a review:

Well done! Thanks for removing some add/insert/set combinations that are no longer used.

{
  // Default options only for the moment
  GEOSX_LAI_CHECK_ERROR( PCSetType( precond, PCGAMG ) );
  GEOSX_UNUSED_VAR( params )

  // TODO: need someone familiar with PETSc to take a look at this
#if 0
  GEOSX_LAI_CHECK_ERROR( PCSetType( precond, PCHMG ) );
  GEOSX_LAI_CHECK_ERROR( PCHMGSetInnerPCType( precond, PCGAMG ) );
@andrea-franceschini (Contributor) commented:

Not related to this PR ... but are we still planning to improve this section?

@klevzoff (Contributor, Author) replied:

It'd be great, but I don't have a plan. @castelletto1 has spent quite some time trying to make it work, and it looks like it's not at all straightforward.
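For reference, a rough sketch of what an enabled version of that disabled block might look like, using only calls documented in the PETSc manual. Whether this configuration actually converges for our systems is exactly the open question, and the PCHMGSetReuseInterpolation call is an assumption about a potentially useful option, not something this PR sets.

```cpp
// Hypothetical PCHMG setup (untested here); mirrors the #if 0 block above.
GEOSX_LAI_CHECK_ERROR( PCSetType( precond, PCHMG ) );
// Use GAMG to construct the inner multigrid hierarchy.
GEOSX_LAI_CHECK_ERROR( PCHMGSetInnerPCType( precond, PCGAMG ) );
// Assumed option: reuse interpolation between successive solves.
GEOSX_LAI_CHECK_ERROR( PCHMGSetReuseInterpolation( precond, PETSC_TRUE ) );
```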

bool const keepDiag,
real64 const diagValue )
{
// TODO: Deprecate/remove this method?
@andrea-franceschini (Contributor) commented:

Is this method still used to impose BCs?

@klevzoff (Contributor, Author) replied:

No; BCs are now imposed on the local matrix/rhs objects prior to the creation of the parallel matrix/rhs.

@andrea-franceschini (Contributor) replied:

Ok, so it can be safely removed.

@klevzoff force-pushed the feature/klevzoff/tpetra-interface branch from 599d972 to 319bd00 on September 11, 2020
@klevzoff added the `flag: requires rebaseline` label (requires a rebaseline branch in integratedTests) on Sep 11, 2020
@klevzoff added the `ci: run CUDA builds` label (allows triggering the costly CUDA jobs) on Sep 11, 2020
@klevzoff force-pushed the feature/klevzoff/tpetra-interface branch from 319bd00 to 58a47e3 on September 22, 2020
@klevzoff removed the `flag: ready for review` and `ci: run CUDA builds` labels on Mar 24, 2021
@rrsettgast marked this pull request as draft on October 18, 2022