flash-attention pre-built wheels

This repository provides pre-built wheels for flash-attention.

Since building flash-attention takes a very long time and is resource-intensive, I also build and provide combinations of CUDA and PyTorch that are not officially distributed.

The GitHub Actions workflow used to build the wheels can be found here.

The built packages are available on the release page.

Install

pip install https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.0.0/flash_attn-2.6.3+cu124torch2.5-cp312-cp312-linux_x86_64.whl
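To verify that an installed wheel works in your environment, you can run a tiny forward pass through flash-attn's flash_attn_func. The snippet below is a minimal sketch, assuming flash-attn 2.x, a CUDA-capable GPU, and fp16 inputs; the tensor sizes are illustrative only.

```python
# Minimal sanity check for an installed flash-attn wheel.
# Assumes flash-attn 2.x, a CUDA GPU, and fp16 inputs (flash-attn requires fp16/bf16).
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 128, 8, 64  # illustrative sizes
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)  # output shape: (batch, seqlen, nheads, headdim)
print("flash-attn forward OK:", out.shape)
```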

Packages

flash_attn-[FLASH_ATTN_VERSION]+cu[CUDA_VERSION]torch[TORCH_VERSION]-cp[PYTHON_VERSION]-cp[PYTHON_VERSION]-linux_x86_64.whl

# example: flash_attn=v2.6.3, CUDA=12.4.1, torch=2.5.1, Python=3.12
flash_attn-2.6.3+cu124torch2.5-cp312-cp312-linux_x86_64.whl
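The version tags in the filename must match your local Python version, your PyTorch version, and the CUDA version your PyTorch build was compiled against (torch.version.cuda). The following helper is a small sketch (not part of this repository) that prints your environment in the same cpXY / torchX.Y / cuXYZ notation, so you can pick the matching wheel from the release page.

```python
# Print the local versions in the notation used by the wheel filenames.
# Helper sketch only; not part of this repository.
import sys
import torch

py_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"  # e.g. cp312
torch_tag = "torch" + ".".join(torch.__version__.split("+")[0].split(".")[:2])  # e.g. torch2.5
cuda_tag = ("cu" + torch.version.cuda.replace(".", "")) if torch.version.cuda else "cpu"  # e.g. cu124

print(f"Look for a wheel tagged: +{cuda_tag}{torch_tag}-{py_tag}-{py_tag}-linux_x86_64")
```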

v0.0.2

Release

| Flash-Attention | Python | PyTorch | CUDA |
| --- | --- | --- | --- |
| 2.4.3, 2.5.6, 2.6.3, 2.7.0.post2 | 3.10, 3.11, 3.12 | 2.0.1, 2.1.2, 2.2.2, 2.3.1, 2.4.1, 2.5.1 | 11.8.0, 12.1.1, 12.4.1 |

v0.0.1

Release

| Flash-Attention | Python | PyTorch | CUDA |
| --- | --- | --- | --- |
| 1.0.9, 2.4.3, 2.5.6, 2.5.9, 2.6.3 | 3.10, 3.11, 3.12 | 2.0.1, 2.1.2, 2.2.2, 2.3.1, 2.4.1, 2.5.0 | 11.8.0, 12.1.1, 12.4.1 |

v0.0.0

Release

| Flash-Attention | Python | PyTorch | CUDA |
| --- | --- | --- | --- |
| 2.4.3, 2.5.6, 2.5.9, 2.6.3 | 3.11, 3.12 | 2.0.1, 2.1.2, 2.2.2, 2.3.1, 2.4.1, 2.5.0 | 11.8.0, 12.1.1, 12.4.1 |

Original

Original repository: https://github.com/Dao-AILab/flash-attention

@inproceedings{dao2022flashattention,
  title={Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness},
  author={Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2022}
}
@inproceedings{dao2023flashattention2,
  title={Flash{A}ttention-2: Faster Attention with Better Parallelism and Work Partitioning},
  author={Dao, Tri},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2024}
}
