Matthew Martineau edited this page Nov 14, 2023 · 4 revisions

Build AmgX

AmgX uses CMake, so you can follow a standard CMake build process.

A key CMake parameter is CUDA_ARCH, which takes the numerical value of the target GPU architecture (e.g., 80 for SM80). Values up to 90 are currently accepted.

An example script for building AmgX:

#!/bin/bash -ex

BUILD_TYPE=RelWithTraces

mkdir -p "$BUILD_TYPE"
cd "$BUILD_TYPE"

cmake -DCMAKE_INSTALL_PREFIX=../install/ \
   -DCMAKE_C_COMPILER=mpicc \
   -DCMAKE_CXX_COMPILER=mpic++ \
   -DCMAKE_CUDA_HOST_COMPILER=mpic++ \
   -DCUDA_ARCH=90 \
   -DCMAKE_BUILD_TYPE=$BUILD_TYPE ..

make VERBOSE=true -j
make install

Boilerplate

You can find examples of the AmgX API in the examples directory, e.g., amgx_mpi_capi.c.

MPI_Init(&argc, &argv);    // Initialise MPI if you require multi-GPU
cudaSetDevice(local_rank); // Set the CUDA device, potentially to the local rank (if 1 rank per GPU)
AMGX_SAFE_CALL(AMGX_initialize());  // Initialise AmgX
AMGX_SAFE_CALL(AMGX_register_print_callback(&print_callback)); // Pass the print callback (defined below) to AmgX

Example print callback function that ensures only one MPI process (rank 0) writes to stdout:

void print_callback(const char *msg, int length)
{
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) { printf("%s", msg); }
}

Configuration Parameters

Performance

  • "min_rows_latency_hiding": <number_of_rows>
    • declared in the outermost scope of the configuration file (i.e., in the same scope as "config_version": 2)
    • Enables latency hiding, which is turned off once the row count drops to the <number_of_rows> threshold
      • Latency hiding overlaps communication with computation; the <number_of_rows> threshold disables the feature when there is too little computation to overlap
      • A value of roughly 30,000 to 50,000 is typically reasonable, but the best setting depends on the problem and the GPU
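
To make the placement concrete, here is a minimal sketch in C that writes a configuration file with "min_rows_latency_hiding" in the outermost scope, next to "config_version": 2. The function name write_latency_hiding_config and the output path are hypothetical, and the "solver" scope is only a placeholder for your real solver settings, so treat this as an illustration rather than a recommended configuration.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the AmgX API): writes a minimal JSON
 * configuration to `path`, with "min_rows_latency_hiding" in the outermost
 * scope alongside "config_version". The "solver" scope is a placeholder
 * and should be replaced with your actual solver configuration.
 * Returns 0 on success and -1 on failure. */
int write_latency_hiding_config(const char *path, int min_rows)
{
    FILE *f = fopen(path, "w");
    if (f == NULL) { return -1; }

    int written = fprintf(f,
        "{\n"
        "    \"config_version\": 2,\n"
        "    \"min_rows_latency_hiding\": %d,\n"
        "    \"solver\": {\n"
        "        \"solver\": \"AMG\"\n"
        "    }\n"
        "}\n",
        min_rows);

    int closed = fclose(f);
    return (written < 0 || closed != 0) ? -1 : 0;
}
```

For example, write_latency_hiding_config("latency.json", 40000) produces a file that could then be loaded with AMGX_config_create_from_file. The value 40000 is simply a point inside the range suggested above and would need tuning for a given problem and GPU.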

Helpful API calls

Output matrix to file

For debugging and sharing test cases, API calls are provided to output matrices to file:

AMGX_write_system(const AMGX_matrix_handle mtx, const AMGX_vector_handle rhs, const AMGX_vector_handle sol, const char *filename)

AMGX_write_system_distributed(const AMGX_matrix_handle mtx, const AMGX_vector_handle rhs, const AMGX_vector_handle sol, const char *filename, int allocated_halo_depth, int num_partitions, const int *partition_sizes, int partition_vector_size, const int *partition_vector)

To control the output format, set the configuration parameter matrix_writer to either matrixmarket or binary. The Matrix Market format is a simple, human-readable ASCII format, useful when you want to inspect the matrix, while the binary format is better suited to cases where the matrix data is large.

To set the writer to binary, add the following to the outermost scope of the configuration file:

"matrix_writer": "binary"
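
As a rough illustration of the trade-off between the two writers, the sketch below picks a matrix_writer value from the matrix size and formats the corresponding configuration entry. Both helper names and the idea of a nonzero-count threshold are assumptions made for this example; AmgX itself does not choose the writer for you, and the cut-off is entirely up to the caller.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns the "matrix_writer" value to use.
 * `num_nonzeros` is the number of stored matrix entries and
 * `binary_threshold` is an assumed, caller-chosen cut-off at or above
 * which the ASCII output is considered too large to be practical. */
const char *choose_matrix_writer(size_t num_nonzeros, size_t binary_threshold)
{
    return (num_nonzeros >= binary_threshold) ? "binary" : "matrixmarket";
}

/* Formats the configuration entry for the outermost scope of the
 * configuration file, e.g. "matrix_writer": "binary".
 * Returns 0 on success and -1 if `out` is too small or formatting fails. */
int format_matrix_writer_entry(char *out, size_t out_size,
                               size_t num_nonzeros, size_t binary_threshold)
{
    int n = snprintf(out, out_size, "\"matrix_writer\": \"%s\"",
                     choose_matrix_writer(num_nonzeros, binary_threshold));
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

A small matrix that you want to read by eye would fall below the threshold and get the matrixmarket writer, while a large production matrix would get binary.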
