lessw2020/QuantFour_AdamW_Cuda

QuantFour_AdamW

Triton does not expose per-thread indexing, so this implementation moved to CUDA to support parallelized binary search during quantization.
The kernels will be HIPified for AMD support.
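The README does not show the kernel itself. As a rough illustration of what a binary-search step in 4-bit quantization typically does, each value is normalized by a per-block scale and then mapped to the index of the nearest entry in a sorted 16-entry codebook. This is a minimal Python sketch, not the repository's CUDA code. The codebook `QMAP` (a plain linear map over [-1, 1]), absmax scaling, and the helper names `nearest_code`, `quantize_block` and `dequantize_block` are all hypothetical. In a CUDA kernel, each thread would run the search for one element.

```python
import bisect
from typing import List, Sequence, Tuple

# Hypothetical 16-entry (4-bit) codebook, sorted ascending: a linear map over [-1, 1].
# The paper and this repository use different, non-linear maps.
QMAP: List[float] = [-1.0 + 2.0 * i / 15 for i in range(16)]


def nearest_code(x: float, qmap: Sequence[float] = QMAP) -> int:
    """Return the index of the codebook entry closest to the normalized value x."""
    # bisect_left finds the first position whose entry is >= x (binary search).
    pos = bisect.bisect_left(qmap, x)
    if pos == 0:
        return 0
    if pos == len(qmap):
        return len(qmap) - 1
    # x lies between qmap[pos - 1] and qmap[pos]; pick the closer neighbour.
    lo, hi = qmap[pos - 1], qmap[pos]
    return pos if (hi - x) < (x - lo) else pos - 1


def quantize_block(values: Sequence[float], qmap: Sequence[float] = QMAP) -> Tuple[List[int], float]:
    """Absmax-normalize a block, then map every element to a 4-bit code."""
    scale = max((abs(v) for v in values), default=0.0)
    # In a CUDA kernel this per-element search would be spread across threads.
    codes = [nearest_code(v / scale if scale > 0 else 0.0, qmap) for v in values]
    return codes, scale


def dequantize_block(codes: Sequence[int], scale: float, qmap: Sequence[float] = QMAP) -> List[float]:
    """Map codes back to approximate floating-point values."""
    return [qmap[c] * scale for c in codes]
```

Because the codebook is sorted, each lookup costs about log2(16) = 4 comparisons instead of a 16-entry scan. That is why a per-thread binary search parallelizes well on a GPU.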

This is a productionized implementation of the paper:
"Memory Efficient Optimizers with 4-bit States"
Bingrui Li, Jianfei Chen, Jun Zhu
https://arxiv.org/abs/2309.01507
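The core idea behind 4-bit optimizer states can be sketched as follows. AdamW's first and second moments are stored as packed 4-bit codes plus per-block scales. On each step the moments are dequantized, the standard AdamW update is applied in full precision, and the new moments are requantized. This is a simplified Python sketch under stated assumptions, not this repository's kernels or the paper's exact quantizers. The block size, the two 16-level maps, the linear-scan nearest-code search, and every function name here are hypothetical. The paper's actual normalization and mapping choices differ. The unsigned map for the second moment excludes zero as an assumption of this sketch.

```python
import math
from typing import Dict, List, Tuple

BLOCK = 4  # hypothetical tiny block size, chosen for illustration only
SIGNED_MAP = [-1.0 + 2.0 * i / 15 for i in range(16)]  # hypothetical map for m (signed)
UNSIGNED_MAP = [(i + 1) / 16 for i in range(16)]       # hypothetical map for v (no zero entry)

QState = Tuple[bytes, List[float]]  # (packed 4-bit codes, per-block scales)


def pack(codes: List[int]) -> bytes:
    """Store two 4-bit codes per byte."""
    if len(codes) % 2:
        codes = codes + [0]
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))


def unpack(data: bytes, n: int) -> List[int]:
    out: List[int] = []
    for b in data:
        out.extend((b >> 4, b & 0xF))
    return out[:n]


def quantize(x: List[float], qmap: List[float]) -> QState:
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(x), BLOCK):
        block = x[start:start + BLOCK]
        s = max(abs(v) for v in block)  # per-block absmax scale
        scales.append(s)
        for v in block:
            t = v / s if s > 0 else 0.0
            # Linear scan for brevity; a GPU kernel would binary-search the sorted map.
            codes.append(min(range(16), key=lambda c: abs(qmap[c] - t)))
    return pack(codes), scales


def dequantize(packed: bytes, scales: List[float], qmap: List[float], n: int) -> List[float]:
    return [qmap[c] * scales[i // BLOCK] for i, c in enumerate(unpack(packed, n))]


def init_state(n: int) -> Dict[str, QState]:
    return {"m": quantize([0.0] * n, SIGNED_MAP), "v": quantize([0.0] * n, UNSIGNED_MAP)}


def adamw_step(params: List[float], grads: List[float], state: Dict[str, QState], step: int,
               lr: float = 1e-3, betas: Tuple[float, float] = (0.9, 0.999),
               eps: float = 1e-8, wd: float = 1e-2) -> List[float]:
    n = len(params)
    b1, b2 = betas
    m = dequantize(*state["m"], SIGNED_MAP, n)    # 4-bit -> float
    v = dequantize(*state["v"], UNSIGNED_MAP, n)
    new_params = []
    for i in range(n):
        m[i] = b1 * m[i] + (1 - b1) * grads[i]
        v[i] = b2 * v[i] + (1 - b2) * grads[i] ** 2
        m_hat = m[i] / (1 - b1 ** step)             # bias correction
        v_hat = v[i] / (1 - b2 ** step)
        p = params[i] * (1 - lr * wd)               # decoupled weight decay
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    state["m"] = quantize(m, SIGNED_MAP)            # float -> 4-bit
    state["v"] = quantize(v, UNSIGNED_MAP)
    return new_params
```

Usage: create `state = init_state(len(params))`, then call `params = adamw_step(params, grads, state, step)` with `step` starting at 1. Each moment costs half a byte per parameter plus one scale per block. By comparison, 32-bit states cost 4 bytes per parameter.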