
[Build] How to use OpenMPI and multiple GPUs to call AMGX in parallel? #324

Open
ZiwuZheng opened this issue Sep 4, 2024 · 5 comments

Comments

@ZiwuZheng

Describe the issue

I obtain correct results when calling AMGX on a single GPU to solve a system of linear equations (a Poisson equation), but when calling AMGX in parallel with OpenMPI on multiple GPUs, an error occurs. The error may come from how the matrix is uploaded with AMGX_matrix_upload_all_global, or from the parameters used when creating the configuration file or the solver. How should I set the arguments of AMGX_matrix_upload_all_global when calling AMGX with multiple GPUs? In particular, how is the ghost (halo) layer set up, and what are the requirements on the matrix when setting it up? What should I pay attention to when creating the configuration file, the solver, and the parallel environment?


Environment information:

  • OS: Ubuntu 20.04
  • Compiler: mpicxx (NVIDIA HPC SDK 23.7)
  • CMake version: 3.23
  • CUDA used for AMGX compilation: CUDA 12.2
  • MPI version: OpenMPI 4.0.3
  • AMGX version or commit hash: v2.3.0

Compilation information
mpicxx -cuda -gpu=ccall,cuda12.2 CSR_3Dplan_global.cu -L /home/zcy/software/AMGX-main-nvhpc/lib -lamgxsh -I /home/zcy/software/AMGX-main-nvhpc/include/ -L /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/comm_libs/mpi/lib/ -lmpi -lmpi_cxx -L /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/math_libs/12.2/lib64 -lcufft -I /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/math_libs/12.2/include

Issue information

at: /home/stu1/software/AMGX-main-nvhpc/src/distributed/comms_visitors3.cu:23
Stack trace:
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : ()+0x21eb697
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : amgx::ExcHalo2AsyncFunctor<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2>, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> > >::operator()(amgx::CommsMPIHostBufferStream<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&)+0
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : void amgx::CommsMPIHostBufferStream<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::do_exchange_halo<amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> > >(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Matrix<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> > const&, int)+0x206
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : void amgx::multiply<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >(amgx::Matrix<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::ViewType)+0xf45
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : void amgx::axmb<amgx::Operator<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> > >(amgx::Operator<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, int, int)+0x58
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : amgx::FGMRES_Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solve_iteration(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, bool)+0x9e8
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solve(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, bool)+0x594
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solve_no_throw(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX…
Caught amgx exception: Vector size too small: not enough space for halo elements.
Vector: {tag = 1, size = 288}
Required size: 304

Looking forward to your answer! Best wishes!

@ZiwuZheng
Author

How can I call AMGX's AMGX_matrix_upload_all_global function to solve a five-diagonal banded sparse matrix in parallel on multiple GPUs? How are halos between different GPUs handled during parallel execution?

How is the matrix constructed in amgx_mpi_poisson5pt.c? Why isn't it a five-diagonal banded sparse matrix?

@marsaev
Collaborator

marsaev commented Sep 10, 2024

Hi @ZiwuZheng, sorry for the delayed reply. Let me answer your second post first.

How can I call AMGX's AMGX_matrix_upload_all_global function to solve a five-diagonal banded sparse matrix in parallel on multiple GPUs?
How is the matrix constructed in amgx_mpi_poisson5pt.c? Why isn't it a five-diagonal banded sparse matrix?

We don't focus on stencil cases alone, but rather on having a solver capable of addressing unstructured cases too. The matrix is the same 5-pt stencil, but represented in CSR format. In order to solve your banded matrix you can convert it to CSR first and then pass it to AMGX.
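For concreteness, here is a minimal sketch of such a conversion for a 2D 5-point Poisson stencil on an nx-by-ny grid (illustrative code, not taken from this thread; the boundary handling is simplified to a Dirichlet-style stencil):

/* Illustrative sketch: build CSR arrays for a 2D 5-point Poisson stencil
   on an nx x ny grid with row-major numbering row = y*nx + x. */
#include <stdlib.h>

void build_poisson5pt_csr(int nx, int ny,
                          int **row_ptrs, int **col_idx, double **vals, int *nnz_out)
{
    int n = nx * ny;
    int *rp = (int *) malloc((n + 1) * sizeof(int));
    int *ci = (int *) malloc((size_t)5 * n * sizeof(int));      /* at most 5 nnz per row */
    double *v = (double *) malloc((size_t)5 * n * sizeof(double));
    int nnz = 0;

    for (int y = 0; y < ny; ++y)
        for (int x = 0; x < nx; ++x) {
            int row = y * nx + x;
            rp[row] = nnz;
            if (y > 0)      { ci[nnz] = row - nx; v[nnz] = -1.0; ++nnz; }  /* south  */
            if (x > 0)      { ci[nnz] = row - 1;  v[nnz] = -1.0; ++nnz; }  /* west   */
                              ci[nnz] = row;      v[nnz] =  4.0; ++nnz;    /* center */
            if (x < nx - 1) { ci[nnz] = row + 1;  v[nnz] = -1.0; ++nnz; }  /* east   */
            if (y < ny - 1) { ci[nnz] = row + nx; v[nnz] = -1.0; ++nnz; }  /* north  */
        }
    rp[n] = nnz;
    *row_ptrs = rp; *col_idx = ci; *vals = v; *nnz_out = nnz;
}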

How to handle halos between different GPUs during parallelism?

AMGX will handle this for you. There are a few functions for uploading your matrix data to the solver. For example, you can find information on the AMGX_matrix_upload_all_global function here: https://github.com/NVIDIA/AMGX/blob/main/doc/AMGX_Reference.pdf
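As a rough illustration of how that call can look per rank (a sketch only; the variable names are placeholders, and the exact signature and argument order should be checked against amgx_c.h or the Reference PDF):

/* Sketch: upload one rank's partition of a globally-numbered CSR matrix.
   Placeholder names; verify the signature of AMGX_matrix_upload_all_global
   against amgx_c.h / the Reference PDF. */
#include <stdint.h>
#include <amgx_c.h>

static void upload_partition(AMGX_matrix_handle A, AMGX_config_handle cfg,
                             int n_global, int n, int nnz,
                             const int *row_ptrs,           /* n+1 local row offsets        */
                             const int64_t *col_idx_global, /* nnz global column indices    */
                             const double *values,
                             const int *partition_vector)   /* owner rank per global row;
                                                               the bundled examples pass NULL
                                                               for a contiguous split        */
{
    int nrings = 1;
    AMGX_config_get_default_number_of_rings(cfg, &nrings);

    AMGX_matrix_upload_all_global(A, n_global, n, nnz,
                                  1, 1,             /* block_dimx, block_dimy */
                                  row_ptrs, col_idx_global, values,
                                  NULL,             /* no separate diagonal   */
                                  nrings,           /* allocated halo depth   */
                                  nrings,           /* number of import rings */
                                  partition_vector);
}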

@marsaev
Collaborator

marsaev commented Sep 10, 2024

Especially how is the ghost grid set up? What are the requirements for matrices when setting up ghost grids?

If you already have your halo information set up, you might want to look at the upload functions AMGX_matrix_comm_from_maps_one_ring and AMGX_matrix_comm_from_maps in the documentation; maybe those will be closer to the behaviour you expect compared to the upload_all_global() routine.

@ZiwuZheng
Author

Hi marsaev, thank you very much for your reply! I am currently solving a two-dimensional Poisson equation with Dirichlet and Neumann boundary conditions, which results in a five-diagonal banded sparse matrix. I have converted the matrix to CSR format and uploaded it with the AMGX_matrix_upload_all_global function. The solution was validated on a single GPU, but errors occurred when solving on multiple GPUs (as described in the first post above). After modifying the halo information of the CSR matrix, the code runs, but the results are incorrect at adjacent locations on different GPUs (as shown in the figure below, which shows the results of 4 GPUs running in parallel). How can I modify the code to obtain the correct result?
[Image: solution computed on 4 GPUs, showing incorrect values at the subdomain boundaries]

@marsaev
Collaborator

marsaev commented Sep 17, 2024

Is there a chance to obtain a tiny version of your problem, to visually inspect how the data is handled?
I.e.

I have currently converted the matrix to CSR format
After modifying the halo information of CSR matrix

Would it be possible to make the problem, let's say, 16x16 and try to solve it on 2 GPUs? That way we could take a look at what data is passed to AMGX and whether it aligns with the API expectations.
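For reference, a hypothetical sketch (placeholder names, contiguous row split, as in the bundled MPI examples) of the per-rank data layout used with upload_all_global: each rank keeps only the rows it owns with rebased local row offsets, while column indices stay in global numbering, and no manually built ghost rows are appended:

/* Hypothetical sketch: contiguous split of a global CSR matrix over nranks.
   Each rank keeps only the rows it owns; column indices stay global and may
   refer to rows owned by other ranks -- AMGX derives the halo exchange from
   that, so no extra ghost rows are added locally. */
static void take_local_rows(int rank, int nranks, int n_global,
                            const int *row_ptrs_global,  /* n_global+1 entries      */
                            int *first_row_out, int *n_out,
                            int *row_ptrs_local)         /* >= n_local+1 entries    */
{
    int rows_per_rank = n_global / nranks;
    int first_row = rank * rows_per_rank;
    int n = (rank == nranks - 1) ? n_global - first_row : rows_per_rank;

    /* Rebase local row offsets to start at 0; the matching slices of the global
       column-index and value arrays start at row_ptrs_global[first_row]. */
    for (int r = 0; r <= n; ++r)
        row_ptrs_local[r] = row_ptrs_global[first_row + r] - row_ptrs_global[first_row];

    *first_row_out = first_row;
    *n_out = n;
}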
