Single-node data parallelism in Julia with CUDA
Updated May 6, 2024 - Julia
EUMaster4HPC student challenge group 7 - EuroHPC Summit 2024 Antwerp
Advanced High Performance Computing in C with OpenMP, CUDA, MPI and NCCL. The project folder contains my final project for the special course: a Jacobi solver for the Poisson partial differential equation, implemented with OpenMP on the CPU, with CUDA on the GPU, and with CUDA, MPI and NCCL on multiple GPUs.
Blood Cell Simulation server
Default Docker image used to run experiments on csquare.run.
Distributed deep learning framework based on PyTorch, Numba, NCCL and ZeroMQ.
Script that automatically installs the NVIDIA driver and CUDA on Ubuntu.
jupyter/scipy-notebook with CUDA Toolkit, cuDNN, NCCL, and TensorRT
Multi-GPU matrix math operations library using NVIDIA NCCL.
Hands-on Labs in Parallel Computing
Blink+: Increase GPU group bandwidth by utilizing cross-tenant NVLink.
Experiments with low level communication patterns that are useful for distributed training.
Uses ncclSend and ncclRecv to implement ncclSendrecv, ncclGather, ncclScatter and ncclAlltoall.
NCCL Examples from Official NVIDIA NCCL Developer Guide.