This library provides a high-performance implementation of a Discrete-time Algebraic Riccati Equation (DARE) solver. The DARE solver is used in the Adaptive Optics (AO) systems of extremely large ground-based telescopes, with instruments such as MAVIS. It gives significantly better application performance than traditional methods, but at a much higher computational cost. According to our benchmark on MAVIS specifications, the application reaches its best accuracy when the matrix dimension exceeds 20k x 20k. Our high-performance DARE solver exploits advanced linear algebra libraries associated with a runtime system, such as Chameleon or DPLASMA, to deploy the DARE algorithm onto shared-memory multicore systems equipped with multiple GPU hardware accelerators.
- CMake (>=3.19)
- CUDA (>=11.0)
- Intel MKL (>=2018)
- OpenMPI (>=4.0)
- Chameleon or DPLASMA runtime system
We provide two install scripts that download the dependencies and compile the DARE solver library.
Input files available at: https://kaust-my.sharepoint.com/:f:/g/personal/ltaiefh_kaust_edu_sa/EgaqEKCdWURBgDq3x2CNsXQBteQs4x0VIdiiHJcY8AVyVA?e=EzETNF
The usage is provided by ./ddare --help:
./ddare [options]
Options are:
--help Show this help
--n=X #samples x #layers. states (default: 500)
--ninstr=X Instrument dimension. measurements (default: 500)
--nb=X tile size. (default: 128)
--datapath=X path to the data folder
this folder must contain the following files:
At.fits
BinvRt.fits
Btinitial.fits
Rt.fits
Qt.fits
the parameters '--n' and '--ninstr' will be overwritten by the
corresponding dimensions of the files
--threads=X Number of CPU workers (default: _SC_NPROCESSORS_ONLN)
--gpus=X Number of GPU workers (default: 0)
--sync Synchronize the execution of all calls (default: async)
--nooptalgo Leverage matrix structure in the algorithm (default: optalgo)
--profile Profile the execution of all calls (default: no profile)
--check Check numerical correctness (default: no check)
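Since --datapath expects a fixed set of files, it can be convenient to verify the folder before starting a long run. The following is a minimal sketch in Python, not part of the library: the function name missing_dare_inputs is hypothetical, and the file names are copied from the help text above.

```python
from pathlib import Path

# File names copied from the --datapath help text.
REQUIRED_FILES = ("At.fits", "BinvRt.fits", "Btinitial.fits", "Rt.fits", "Qt.fits")


def missing_dare_inputs(datapath):
    """Return the required FITS files that are absent from datapath.

    Hypothetical helper: it only checks that each file exists,
    not that its contents or dimensions are valid.
    """
    folder = Path(datapath)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]
```

An empty list means all five files are present. The check does not open the files, so mismatched matrix dimensions would still only be detected by ddare itself.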
The StarPU runtime also needs to be set up. Example:
STARPU_SILENT=0 OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 STARPU_CUDA_PIPELINE=4 STARPU_NWORKER_PER_CUDA=4 STARPU_SCHED=prio STARPU_CALIBRATE=1 ./ddare --threads=31 --n=7090 --ninstr=19078 --nb=320 --ib=80 --gpus=1
where n and ninstr are the dimensions of the problem. In this case, the matrices will be generated randomly.
Or using input matrices (as FITS files):
STARPU_SILENT=1 OMP_NUM_THREADS=8 MKL_NUM_THREADS=8 STARPU_CUDA_PIPELINE=2 STARPU_NWORKER_PER_CUDA=12 STARPU_SCHED=$j STARPU_CALIBRATE=1 ./ddare --threads=4 --nb=720 --ib=180 --gpus=1 --datapath=<path/to/data>
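When these runs are scripted, the environment variables and options from the two examples can be assembled programmatically. The sketch below is one possible way to do that in Python. The helper names ddare_argv and starpu_env are hypothetical, the default values are taken from the first example, and only variables and flags that appear in the examples above are used.

```python
import os


def ddare_argv(threads, nb, ib, gpus, n=None, ninstr=None, datapath=None):
    """Build the ./ddare argument list in the style of the examples above."""
    argv = ["./ddare", f"--threads={threads}"]
    if datapath is None:
        # Random matrices: the problem dimensions must be given explicitly.
        if n is None or ninstr is None:
            raise ValueError("n and ninstr are required when datapath is not set")
        argv += [f"--n={n}", f"--ninstr={ninstr}"]
    argv += [f"--nb={nb}", f"--ib={ib}", f"--gpus={gpus}"]
    if datapath is not None:
        # With input files, the dimensions are read from the FITS files.
        argv.append(f"--datapath={datapath}")
    return argv


def starpu_env(sched="prio", cuda_pipeline=4, workers_per_cuda=4, blas_threads=1):
    """Return a copy of the current environment with the StarPU settings added."""
    env = dict(os.environ)
    env.update({
        "STARPU_SILENT": "0",
        "OMP_NUM_THREADS": str(blas_threads),
        "MKL_NUM_THREADS": str(blas_threads),
        "STARPU_CUDA_PIPELINE": str(cuda_pipeline),
        "STARPU_NWORKER_PER_CUDA": str(workers_per_cuda),
        "STARPU_SCHED": sched,
        "STARPU_CALIBRATE": "1",
    })
    return env
```

A run could then be launched with subprocess.run(ddare_argv(31, 320, 80, 1, n=7090, ninstr=19078), env=starpu_env(), check=True), which mirrors the first example. Keeping the argument list and the environment in separate functions makes it straightforward to sweep over schedulers or tile sizes.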
For more information and questions, please send an email to [email protected] and [email protected].