
[Build] How to use OpenMPI and multiple GPUs to call AMGX in parallel? #324

Open
ZiwuZheng opened this issue Sep 4, 2024 · 5 comments

Comments

@ZiwuZheng

Describe the issue

I obtain correct results when calling AMGX on a single GPU to solve a system of linear equations (a Poisson equation), but when calling AMGX in parallel with OpenMPI on multiple GPUs, an error occurs. The error may come from how the matrix is uploaded with AMGX_matrix_upload_all_global, or from the parameters used when creating the configuration file or the solver. How should I set the arguments of AMGX_matrix_upload_all_global when calling AMGX with multiple GPUs? In particular, how is the ghost (halo) layer set up, and what are the requirements on the matrix when setting it up? What should I pay attention to when creating the configuration file, the solver, and the parallel environment?


Environment information:

  • OS: Ubuntu 20.04
  • Compiler: mpicxx (NVIDIA HPC SDK 23.7)
  • CMake version: 3.23
  • CUDA used for AMGX compilation: CUDA 12.2
  • MPI version: OpenMPI 4.0.3
  • AMGX version or commit hash: v2.3.0

Compilation information
mpicxx -cuda -gpu=ccall,cuda12.2 CSR_3Dplan_global.cu -L /home/zcy/software/AMGX-main-nvhpc/lib -lamgxsh -I /home/zcy/software/AMGX-main-nvhpc/include/ -L /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/comm_libs/mpi/lib/ -lmpi -lmpi_cxx -L /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/math_libs/12.2/lib64 -lcufft -I /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/math_libs/12.2/include

Issue information

at: /home/stu1/software/AMGX-main-nvhpc/src/distributed/comms_visitors3.cu:23
Stack trace:
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : ()+0x21eb697
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : amgx::ExcHalo2AsyncFunctor<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2>, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> > >::operator()(amgx::CommsMPIHostBufferStream<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&)+0
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : void amgx::CommsMPIHostBufferStream<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::do_exchange_halo<amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> > >(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Matrix<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> > const&, int)+0x206
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : void amgx::multiply<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >(amgx::Matrix<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::ViewType)+0xf45
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : void amgx::axmb<amgx::Operator<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> > >(amgx::Operator<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, int, int)+0x58
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : amgx::FGMRES_Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solve_iteration(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, bool)+0x9e8
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solve(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, bool)+0x594
/home/stu1/software/AMGX-main-nvhpc/lib/libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solve_no_throw(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX…
Caught amgx exception: Vector size too small: not enough space for halo elements.
Vector: {tag = 1, size = 288}
Required size: 304

Looking forward to your answer! Best wishes!

@ZiwuZheng
Author

How can I call AMGX's AMGX_matrix_upload_all_global function to solve a five-diagonal banded sparse matrix in parallel on multiple GPUs? How are halos between different GPUs handled during parallel execution?

How is the matrix constructed in amgx_mpi_poisson5pt.c? Why isn't it a five-diagonal banded sparse matrix?

@marsaev
Collaborator

marsaev commented Sep 10, 2024

Hi @ZiwuZheng, sorry for the delayed reply. Let me answer your second post first.

How can I call AMGX's AMGX_matrix_upload_all_global function to solve a five-diagonal banded sparse matrix in parallel on multiple GPUs?
How is the matrix constructed in amgx_mpi_poisson5pt.c? Why isn't it a five-diagonal banded sparse matrix?

We don't focus on stencil cases alone, but rather on having a solver capable of addressing unstructured cases too. The matrix is the same 5-pt stencil, but represented in CSR format. In order to solve your banded matrix you can convert it to CSR first and then pass it to AMGX.
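For concreteness, here is a minimal sketch of such a conversion for a 2D 5-point Poisson stencil on an nx-by-ny grid (illustrative code, not taken from this thread; the boundary handling is simplified to a Dirichlet-style stencil):

/* Illustrative sketch: build CSR arrays for a 2D 5-point Poisson stencil
   on an nx x ny grid with row-major numbering row = y*nx + x. */
#include <stdlib.h>

void build_poisson5pt_csr(int nx, int ny,
                          int **row_ptrs, int **col_idx, double **vals, int *nnz_out)
{
    int n = nx * ny;
    int *rp = (int *) malloc((n + 1) * sizeof(int));
    int *ci = (int *) malloc((size_t)5 * n * sizeof(int));      /* at most 5 nnz per row */
    double *v = (double *) malloc((size_t)5 * n * sizeof(double));
    int nnz = 0;

    for (int y = 0; y < ny; ++y)
        for (int x = 0; x < nx; ++x) {
            int row = y * nx + x;
            rp[row] = nnz;
            if (y > 0)      { ci[nnz] = row - nx; v[nnz] = -1.0; ++nnz; }  /* south  */
            if (x > 0)      { ci[nnz] = row - 1;  v[nnz] = -1.0; ++nnz; }  /* west   */
                              ci[nnz] = row;      v[nnz] =  4.0; ++nnz;    /* center */
            if (x < nx - 1) { ci[nnz] = row + 1;  v[nnz] = -1.0; ++nnz; }  /* east   */
            if (y < ny - 1) { ci[nnz] = row + nx; v[nnz] = -1.0; ++nnz; }  /* north  */
        }
    rp[n] = nnz;
    *row_ptrs = rp; *col_idx = ci; *vals = v; *nnz_out = nnz;
}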

How to handle halos between different GPUs during parallelism?

AMGX will handle this for you. There are a few functions for uploading your matrix data to the solver. For example, you can find information on the AMGX_matrix_upload_all_global function here: https://github.com/NVIDIA/AMGX/blob/main/doc/AMGX_Reference.pdf
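As a rough illustration of how that call can look per rank (a sketch only; the variable names are placeholders, and the exact signature and argument order should be checked against amgx_c.h or the Reference PDF):

/* Sketch: upload one rank's partition of a globally-numbered CSR matrix.
   Placeholder names; verify the signature of AMGX_matrix_upload_all_global
   against amgx_c.h / the Reference PDF. */
#include <stdint.h>
#include <amgx_c.h>

static void upload_partition(AMGX_matrix_handle A, AMGX_config_handle cfg,
                             int n_global, int n, int nnz,
                             const int *row_ptrs,           /* n+1 local row offsets        */
                             const int64_t *col_idx_global, /* nnz global column indices    */
                             const double *values,
                             const int *partition_vector)   /* owner rank per global row;
                                                               the bundled examples pass NULL
                                                               for a contiguous split        */
{
    int nrings = 1;
    AMGX_config_get_default_number_of_rings(cfg, &nrings);

    AMGX_matrix_upload_all_global(A, n_global, n, nnz,
                                  1, 1,             /* block_dimx, block_dimy */
                                  row_ptrs, col_idx_global, values,
                                  NULL,             /* no separate diagonal   */
                                  nrings,           /* allocated halo depth   */
                                  nrings,           /* number of import rings */
                                  partition_vector);
}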

@marsaev
Collaborator

marsaev commented Sep 10, 2024

Especially how is the ghost grid set up? What are the requirements for matrices when setting up ghost grids?

If you already have your halo information set up, you might want to look at the upload functions AMGX_matrix_comm_from_maps_one_ring and AMGX_matrix_comm_from_maps in the documentation; maybe those will be closer to the behaviour you expect compared to the upload_all_global() routine.

@ZiwuZheng
Author

Hi marsaev, thank you very much for your reply! I am currently solving a two-dimensional Poisson equation with Dirichlet and Neumann boundary conditions, which results in a five-diagonal banded sparse matrix. I have converted the matrix to CSR format and uploaded it with the AMGX_matrix_upload_all_global function. The solution was validated on a single GPU, but errors occurred when solving on multiple GPUs (as described in the first post above). After modifying the halo information of the CSR matrix, the code runs, but the results are incorrect at adjacent locations on different GPUs (as shown in the figure below, which shows the results of 4 GPUs running in parallel). How can I modify the code to obtain the correct result?
[Image: solution computed on 4 GPUs, showing incorrect values at the subdomain boundaries]

@marsaev
Collaborator

marsaev commented Sep 17, 2024

Is there a chance to obtain a tiny version of your problem, to visually inspect how the data is handled?
I.e.

I have currently converted the matrix to CSR format
After modifying the halo information of CSR matrix

Would it be possible to make the problem, let's say, 16x16 and try to solve it on 2 GPUs? That way we could take a look at what data is passed to AMGX and whether it aligns with the API expectations.
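For reference, a hypothetical sketch (placeholder names, contiguous row split, as in the bundled MPI examples) of the per-rank data layout used with upload_all_global: each rank keeps only the rows it owns with rebased local row offsets, while column indices stay in global numbering, and no manually built ghost rows are appended:

/* Hypothetical sketch: contiguous split of a global CSR matrix over nranks.
   Each rank keeps only the rows it owns; column indices stay global and may
   refer to rows owned by other ranks -- AMGX derives the halo exchange from
   that, so no extra ghost rows are added locally. */
static void take_local_rows(int rank, int nranks, int n_global,
                            const int *row_ptrs_global,  /* n_global+1 entries      */
                            int *first_row_out, int *n_out,
                            int *row_ptrs_local)         /* >= n_local+1 entries    */
{
    int rows_per_rank = n_global / nranks;
    int first_row = rank * rows_per_rank;
    int n = (rank == nranks - 1) ? n_global - first_row : rows_per_rank;

    /* Rebase local row offsets to start at 0; the matching slices of the global
       column-index and value arrays start at row_ptrs_global[first_row]. */
    for (int r = 0; r <= n; ++r)
        row_ptrs_local[r] = row_ptrs_global[first_row + r] - row_ptrs_global[first_row];

    *first_row_out = first_row;
    *n_out = n;
}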
