v2.3.0
Changes:
- Raised the minimum CMake version to 3.18 and switched to CMake's native CUDA language support, making it possible to compile with the NVIDIA HPC SDK
- Improved performance of compute_values_kernel by ~1.3x
- Optimised block tuning for aggressive coarsening
- Added an exact coarse solve, accessible via the default scope flag "exact_coarse_solve"
- Fixed issue where latency hiding could be enabled/disabled asymmetrically across available ranks
- Fixed bug where the SpGEMM fallback incorrectly deleted the cuSPARSE handle
- Fixed bug with use of shared memory in estimate_c_hat_kernel
Tested configurations:
- Linux x86-64:
-- Ubuntu 20.04, Ubuntu 18.04
-- gcc 7.4.0, gcc 9.3.0
-- OpenMPI 4.0.x
-- CUDA 11.0, 11.2
- Windows 10 x86-64:
-- MS Visual Studio 2019 (msvc 19.28)
-- MS MPI v10.1.2
-- CUDA 11.0
Note that while AMGX supports building on Windows, testing on Windows is very limited.