A toolkit that enhances PyTorch with specialized functions for low-bit quantized neural networks.

GreenBitAI/bitorch-engine

BITorch Engine (BIE)

Bitorch Engine is a computation library for neural networks that extends PyTorch with specialized layers and functions for low-bit quantized neural network operations. It runs on high-performance computing platforms, including GPUs and CPUs, and is designed so that support can later be extended to emerging NPU hardware.

More about BIE

Bitorch Engine offers a suite of optimized neural network components designed to make full use of modern GPUs. These include custom CUDA kernels, quantization-aware training mechanisms, and a variety of layer types crafted to reduce computational overhead while maintaining model accuracy.

Bitorch Engine is already used in projects that push the boundaries of neural network training and inference, for instance:

  • green-bit-llm-trainer: This project uses BIE to take a new approach to Large Language Model (LLM) fine-tuning. Traditional approaches either quantize a fully trained model or add a few trainable parameters for LoRA-style fine-tuning; this project instead fine-tunes the quantized parameters of LLMs directly. This enables full-scale quantized fine-tuning of LLMs, in which training is tightly integrated with the quantization scheme from the outset.
  • green-bit-llm-inference: This project showcases BIE's support for inference with models quantized to between 4 and 2 bits without significant loss in accuracy compared with the original 32- or 16-bit models. It demonstrates BIE's ability to balance model size, computational efficiency, and accuracy, which is one of the key challenges in deploying sophisticated neural networks in resource-constrained environments.

These projects show the practical applications of Bitorch Engine and underline its flexibility and efficiency for modern AI research and development. However, keep in mind that BIE is still in an early beta stage; see our roadmap below.
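To put the 4- to 2-bit range mentioned above into perspective, the following Python sketch estimates the memory taken by a model's weights at different bit widths. The 7-billion-parameter model size is a hypothetical example, and the estimate deliberately ignores the scales, zero-points, and unquantized tensors that real quantized checkpoints also store, so actual files are somewhat larger.

```python
"""Back-of-the-envelope weight memory at different bit widths (illustration only).

Counts only the packed weights; per-group scales/zero-points and any tensors
kept in higher precision are ignored.
"""


def weight_bytes(num_params: int, bits: int) -> float:
    """Bytes needed for num_params weights stored at `bits` bits each (no overhead)."""
    return num_params * bits / 8


if __name__ == "__main__":
    num_params = 7_000_000_000  # hypothetical 7B-parameter model
    for bits in (32, 16, 4, 2):
        print(f"{bits:>2}-bit weights: {weight_bytes(num_params, bits) / 1e9:.2f} GB")
```

For this hypothetical model, the weights shrink from 14 GB at 16 bits to 1.75 GB at 2 bits, an 8× reduction before overheads.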

Roadmap

Our future goals for BITorch Engine are (not necessarily in this order):

  • Add support for (Distributed) Data Parallel training strategies (for selected layers)
  • Provide better support for Metal kernels
  • Improve our existing code, so it becomes even faster, more memory-efficient and easier to use
  • Binary pip releases which include the built extensions

We plan to release new features and improvements as they become available, which also means that breaking API changes can occur during our beta stage.

Installation

The requirements are:

  • A compiler that fully supports C++17, such as clang or gcc (gcc 9.4.0 or newer is required, but gcc 12.x is not supported yet)
  • Python 3.9 or later
  • PyTorch 1.8 or later
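Before building, you can check the Python and gcc requirements listed above with a small script like the following. It is a sketch, not part of Bitorch Engine: it assumes gcc is the compiler on your PATH (it does not check clang) and uses `gcc -dumpfullversion`, which prints the full version on GCC 7 and newer. The accepted window (at least 9.4.0, below 12) simply encodes the statements in this README.

```python
"""Minimal pre-build requirement check (a sketch, not part of bitorch-engine)."""
import shutil
import subprocess
import sys


def parse_version(text: str) -> tuple:
    """Turn '11.4.0' into (11, 4, 0); missing components count as 0."""
    parts = [int(p) for p in text.strip().split(".")[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def gcc_supported(version: str) -> bool:
    """gcc 9.4.0 or newer is required; gcc 12.x and later is not supported yet."""
    return (9, 4, 0) <= parse_version(version) < (12, 0, 0)


def python_supported(info=sys.version_info) -> bool:
    """Python 3.9 or later is required."""
    return tuple(info[:2]) >= (3, 9)


if __name__ == "__main__":
    print("Python:", "OK" if python_supported() else "too old, need 3.9+")
    gcc = shutil.which("gcc")
    if gcc is None:
        print("gcc: not found on PATH (this sketch does not check clang)")
    else:
        # `gcc -dumpfullversion` prints e.g. "11.4.0" on GCC 7 and newer
        out = subprocess.run([gcc, "-dumpfullversion"], capture_output=True, text=True).stdout
        try:
            verdict = "OK" if gcc_supported(out) else "unsupported (need >= 9.4.0 and < 12)"
        except ValueError:
            verdict = "could not parse the output (is `gcc` actually clang?)"
        print("gcc:", out.strip() or "<no output>", "->", verdict)
```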

Please check your operating system's options for the C++ compiler. For more detailed information, see the requirements for building PyTorch from source. In addition, to use the layers that are accelerated on specific hardware (such as CUDA devices or MacOS M1/M2/M3 chips), we recommend installing:

  • CUDA Toolkit 11.8 or 12.1 for CUDA accelerated layers
  • MLX for mlx-based layers on MacOS
  • CUTLASS for cutlass-based layers

Binary Release

A first experimental binary release for Linux with CUDA 12.1 is available. It only supports GPUs with a CUDA compute capability of 8.6 or higher. For MacOS or GPUs with a lower compute capability, build the package from source (further binary release options are planned). We recommend creating a conda environment to manage the installed CUDA version and other packages:

  1. Create an environment for Python 3.10 and activate it:
conda create -y --name bitorch-engine python=3.10
conda activate bitorch-engine

As an alternative, you can also store the environment in a relative path:
export BITORCH_WORKSPACE="${HOME}/bitorch-workspace"
mkdir -p "${BITORCH_WORKSPACE}" && cd "${BITORCH_WORKSPACE}"
conda create -y --prefix ./conda-env python=3.10
conda activate ./conda-env
  2. Install CUDA (if it is not already installed on the system):
conda install -y -c "nvidia/label/cuda-12.1.0" cuda-toolkit
  3. Install our customized torch, which allows gradients on INT tensors, together with Bitorch Engine using pip (these URLs are for CUDA 12.1 and Python 3.10; you can find other versions here):
pip install \
  "https://packages.greenbit.ai/whl/cu121/torch/torch-2.3.0-cp310-cp310-linux_x86_64.whl" \
  "https://packages.greenbit.ai/whl/cu121/bitorch-engine/bitorch_engine-0.2.6-cp310-cp310-linux_x86_64.whl"

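To check whether your GPU meets the binary release's requirement of compute capability 8.6 or higher, you can query it through PyTorch's `torch.cuda.get_device_capability`. The helper below is a minimal sketch (its name is hypothetical and not part of Bitorch Engine) that only runs the query if a CUDA-enabled PyTorch is importable. Because the comparison is on `(major, minor)` tuples, a compute capability of 8.0, for example, does not qualify.

```python
"""Check GPUs against the binary release's minimum compute capability (sketch)."""

MIN_CAPABILITY = (8, 6)  # requirement stated for the CUDA 12.1 binary release


def meets_binary_requirement(capability, minimum=MIN_CAPABILITY) -> bool:
    """capability is a (major, minor) tuple such as (8, 6)."""
    return tuple(capability) >= tuple(minimum)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is None or not torch.cuda.is_available():
        print("PyTorch with CUDA support is not available; cannot query the GPU")
    else:
        for idx in range(torch.cuda.device_count()):
            cap = torch.cuda.get_device_capability(idx)
            verdict = "binary release OK" if meets_binary_requirement(cap) else "build from source"
            print(f"cuda:{idx} {torch.cuda.get_device_name(idx)} "
                  f"(compute capability {cap[0]}.{cap[1]}): {verdict}")
```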
Build From Source

We provide instructions for the following options: Conda on Linux (with CUDA), Docker (with CUDA), and Conda on MacOS (with MLX).

We recommend managing your BITorch Engine installation in a conda environment (otherwise, you need to adapt or remove certain variables, e.g. CUDA_HOME). You may want to keep everything (environment, code, etc.) in one directory or use the default directory for conda environments. Where applicable, you may wish to change the CUDA version to 12.1.

Conda on Linux (with CUDA)

To use these instructions, you need to have conda and a suitable C++ compiler installed.

  1. Create an environment for Python 3.9 and activate it:
conda create -y --name bitorch-engine python=3.9
conda activate bitorch-engine
  2. Install CUDA:
conda install -y -c "nvidia/label/cuda-11.8.0" cuda-toolkit
  3. Install our customized torch, which allows gradients on INT tensors, with pip (this URL is for CUDA 11.8 and Python 3.9; you can find other versions here):
pip install "https://packages.greenbit.ai/whl/cu118/torch/torch-2.1.0-cp39-cp39-linux_x86_64.whl"
  4. To use the CUTLASS layers, also install CUTLASS 2.8.0 from source. Set CUTLASS_HOME to the directory where CUTLASS should be cloned and installed. If you have older or newer GPUs, you may need to add your CUDA compute capability to CUTLASS_NVCC_ARCHS:
export CUTLASS_HOME="/some/path"
mkdir -p "${CUTLASS_HOME}"
git clone --depth 1 --branch "v2.8.0" "https://github.com/NVIDIA/cutlass.git" --recursive "${CUTLASS_HOME}/source"
mkdir -p "${CUTLASS_HOME}/build" && mkdir -p "${CUTLASS_HOME}/install"
cd "${CUTLASS_HOME}/build"
cmake ../source -DCMAKE_INSTALL_PREFIX="${CUTLASS_HOME}/install" -DCUTLASS_ENABLE_TESTS=OFF -DCUTLASS_ENABLE_EXAMPLES=OFF -DCUTLASS_NVCC_ARCHS='75;80;86'
make -j 4
cmake --install .

If you have difficulties installing CUTLASS, check the official documentation, use the other layers without installing it, or try the Docker installation.
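If your GPU's compute capability is missing from the `CUTLASS_NVCC_ARCHS` list in the cmake command above, a small helper like the one below can merge it in. This is a sketch under assumptions: the function name is hypothetical, it only formats the value in the `'75;80;86'` style used above (capability 7.0 becomes `70`), and it does not check whether CUTLASS 2.8.0 actually supports the resulting architectures.

```python
"""Build a CUTLASS_NVCC_ARCHS value that also covers the local GPUs (sketch)."""

DEFAULT_ARCHS = ("75", "80", "86")  # value used in the cmake command above


def cutlass_archs(local_capabilities, base=DEFAULT_ARCHS) -> str:
    """Merge (major, minor) capabilities into a ';'-separated, sorted arch list."""
    archs = set(base)
    for major, minor in local_capabilities:
        archs.add(f"{major}{minor}")  # e.g. 7.0 -> "70"
    return ";".join(sorted(archs, key=int))


if __name__ == "__main__":
    try:
        import torch
        caps = [torch.cuda.get_device_capability(i)
                for i in range(torch.cuda.device_count())]
    except ImportError:
        caps = []  # no PyTorch available: fall back to the default list
    print(f"-DCUTLASS_NVCC_ARCHS='{cutlass_archs(caps)}'")
```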

As an alternative to the instructions above, you can also store the environment and clone all repositories within one "root" directory:
  1. Set the workspace directory (use an absolute path!):
export BITORCH_WORKSPACE="${HOME}/bitorch-workspace"
mkdir -p "${BITORCH_WORKSPACE}" && cd "${BITORCH_WORKSPACE}"
  2. Create an environment for Python 3.9 and activate it:
conda create -y --prefix ./conda-env python=3.9
conda activate ./conda-env
  3. Install CUDA:
conda install -y -c "nvidia/label/cuda-11.8.0" cuda-toolkit
  4. Install our customized torch, which allows gradients on INT tensors, with pip (this URL is for CUDA 11.8 and Python 3.9; you can find other versions here):
pip install "https://packages.greenbit.ai/whl/cu118/torch/torch-2.1.0-cp39-cp39-linux_x86_64.whl"
  5. To use the CUTLASS layers, also install CUTLASS 2.8.0 (if you have older or newer GPUs, you may need to add your CUDA compute capability to CUTLASS_NVCC_ARCHS):
export CUTLASS_HOME="${BITORCH_WORKSPACE}/cutlass"
mkdir -p "${CUTLASS_HOME}"
git clone --depth 1 --branch "v2.8.0" "https://github.com/NVIDIA/cutlass.git" --recursive "${CUTLASS_HOME}/source"
mkdir -p "${CUTLASS_HOME}/build" && mkdir -p "${CUTLASS_HOME}/install"
cd "${CUTLASS_HOME}/build"
cmake ../source -DCMAKE_INSTALL_PREFIX="${CUTLASS_HOME}/install" -DCUTLASS_ENABLE_TESTS=OFF -DCUTLASS_ENABLE_EXAMPLES=OFF -DCUTLASS_NVCC_ARCHS='75;80;86'
make -j 4
cmake --install .
cd "${BITORCH_WORKSPACE}"

After setting up the environment, clone the code and build it with pip (to hide the build output, remove -v):

# make sure you are in a suitable directory, e.g. your bitorch workspace
git clone --recursive https://github.com/GreenBitAI/bitorch-engine
cd bitorch-engine
# only gcc versions 9.x, 10.x, 11.x are supported
# to select the correct gcc, use:
# export CC=gcc-11 CPP=g++-11 CXX=g++-11
CPATH="${CUTLASS_HOME}/install/include" CUDA_HOME="${CONDA_PREFIX}" pip install -e . -v

Docker (with CUDA)

You can also use our prepared Dockerfile to build a docker image (which includes building the engine under /bitorch-engine):

cd docker
docker build -t bitorch/engine .
docker run -it --rm --gpus all --volume "/path/to/your/project":"/workspace" bitorch/engine:latest

Check the docker readme for options and more details.

Conda on MacOS (with MLX)

  1. We recommend creating a virtual environment and activating it. The following example uses a conda environment for Python 3.9, but virtualenv should work as well:
conda create -y --name bitorch-engine python=3.9
conda activate bitorch-engine
  2. Install our customized torch, which allows gradients on INT tensors, with pip (this URL is for MacOS with Python 3.9; you can find other versions here):
pip install "https://packages.greenbit.ai/whl/macosx/torch/torch-2.2.1-cp39-none-macosx_11_0_arm64.whl"
  3. To use OpenMP acceleration on MacOS, install OpenMP with Homebrew and configure the environment:
brew install libomp
# the libomp installation output should remind you that you need something like this:
export LDFLAGS="-L$(brew --prefix)/opt/libomp/lib"
export CPPFLAGS="-I$(brew --prefix)/opt/libomp/include"
  4. To use the MLX-accelerated MPQLinearLayer, install the mlx Python library:
# use one of the following to install with either pip or conda:
pip install mlx==0.4.0
conda install conda-forge::mlx=0.4.0

Currently, we have only tested version 0.4.0, but newer versions might also work. To train the MPQLinearLayer, you need to install our custom PyTorch version (see the steps above); without it, you need to specify requires_grad=False when initializing MPQLinearLayer.

  5. You should now be able to build with:

git clone --recursive https://github.com/GreenBitAI/bitorch-engine
cd bitorch-engine
pip install -e . -v
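A quick sanity check of the MacOS setup can help when the build or the MLX layers misbehave. The sketch below (a hypothetical helper, not part of Bitorch Engine) warns if Python is not running natively on Apple silicon, which the arm64 torch wheel above targets, and reports whether the installed mlx version differs from 0.4.0, the only version tested so far. It reads the version with `importlib.metadata`, which assumes mlx exposes Python package metadata (pip installs always do).

```python
"""Sanity check for the MacOS/MLX setup (a sketch, not part of bitorch-engine)."""
import platform
from importlib.metadata import PackageNotFoundError, version

TESTED_MLX = "0.4.0"  # the only mlx version tested according to this README


def macos_setup_warnings(system: str, machine: str, mlx_version) -> list:
    """Return human-readable warnings; an empty list means nothing looks off."""
    warnings = []
    if system != "Darwin" or machine != "arm64":
        warnings.append(f"expected MacOS on arm64, got {system}/{machine}")
    if mlx_version is None:
        warnings.append("mlx is not installed; the MLX-accelerated MPQLinearLayer is unavailable")
    elif mlx_version != TESTED_MLX:
        warnings.append(f"mlx {mlx_version} is untested (tested: {TESTED_MLX})")
    return warnings


if __name__ == "__main__":
    try:
        mlx_ver = version("mlx")
    except PackageNotFoundError:
        mlx_ver = None
    for w in macos_setup_warnings(platform.system(), platform.machine(), mlx_ver):
        print("warning:", w)
```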

Build options

Building Specific Extensions

During development, a specific C++/CUDA extension can be (re-)built by setting the environment variable BIE_BUILD_ONLY, like so:

BIE_BUILD_ONLY="bitorch_engine/layers/qlinear/binary/cpp" pip install -e . -v

The value must be a relative path to a single extension directory.

Building for a Specific CUDA Architecture

To build for a different CUDA architecture, set the environment variable BIE_CUDA_ARCH (e.g. 'sm_75', 'sm_80', or 'sm_86'):

BIE_CUDA_ARCH="sm_86" pip install -e . -v
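If you are unsure which value to use, the architecture string can be derived from the compute capability that PyTorch reports. The sketch below assumes BIE_CUDA_ARCH always follows the `sm_<major><minor>` pattern of the examples above; the helper name is hypothetical, and it does not verify that Bitorch Engine supports the resulting architecture.

```python
"""Derive a BIE_CUDA_ARCH value such as 'sm_86' from a compute capability (sketch)."""


def bie_cuda_arch(capability) -> str:
    """Format a (major, minor) compute capability as 'sm_<major><minor>'."""
    major, minor = capability
    return f"sm_{major}{minor}"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        arch = bie_cuda_arch(torch.cuda.get_device_capability(0))  # first visible GPU
        print(f'BIE_CUDA_ARCH="{arch}" pip install -e . -v')
    else:
        print("No CUDA device visible to PyTorch; choose the value manually, e.g. sm_86")
```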

Force Building CUDA Modules

If you have the CUDA development libraries installed but torch.cuda.is_available() returns False (e.g. in HPC or Docker environments), you can still build the extensions that depend on CUDA by setting BIE_FORCE_CUDA="true":

BIE_FORCE_CUDA="true" pip install -e . -v

Skip Library File Building

If you just want to avoid rebuilding any files, you can set BIE_SKIP_BUILD:

BIE_SKIP_BUILD="true" python3 -m build --no-isolation --wheel

This creates a wheel that packages the existing .so files without trying to rebuild them.
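When scripting builds, for example in CI, the environment variables above can also be set from Python. The following is a sketch: the helper names and keyword arguments are hypothetical, and only the variable names and values (`BIE_BUILD_ONLY`, `BIE_CUDA_ARCH`, `BIE_FORCE_CUDA`, `BIE_SKIP_BUILD`, `"true"`) come from this README. Its main section only prints the equivalent shell command instead of starting a build.

```python
"""Set the BIE_* build options from Python (sketch with hypothetical helper names)."""
import os
import shlex
import subprocess
import sys


def bie_build_env(build_only=None, cuda_arch=None, force_cuda=False,
                  skip_build=False, base=None) -> dict:
    """Return a copy of `base` (default: os.environ) with the requested BIE_* options."""
    env = dict(os.environ if base is None else base)
    if build_only is not None:
        if os.path.isabs(build_only):
            raise ValueError("BIE_BUILD_ONLY must be a relative path to one extension directory")
        env["BIE_BUILD_ONLY"] = build_only
    if cuda_arch is not None:
        env["BIE_CUDA_ARCH"] = cuda_arch  # e.g. "sm_86"
    if force_cuda:
        env["BIE_FORCE_CUDA"] = "true"
    if skip_build:
        env["BIE_SKIP_BUILD"] = "true"  # meant for `python3 -m build --no-isolation --wheel`
    return env


PIP_EDITABLE = [sys.executable, "-m", "pip", "install", "-e", ".", "-v"]


def run_build(env: dict, cmd=PIP_EDITABLE) -> None:
    """Run the build from the repository root; raises CalledProcessError on failure."""
    subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    overrides = bie_build_env(build_only="bitorch_engine/layers/qlinear/binary/cpp",
                              cuda_arch="sm_86", base={})
    # Dry run: print the equivalent shell command instead of building.
    prefix = " ".join(f"{k}={shlex.quote(v)}" for k, v in overrides.items())
    print(prefix, "pip install -e . -v")
    # To actually build: run_build(bie_build_env(cuda_arch="sm_86"))
```

Passing `base={}` returns only the BIE_* overrides, which is handy for printing or testing; the default copies `os.environ` so that `PATH`, `CUDA_HOME`, and similar variables still reach pip. Since `BIE_SKIP_BUILD` is meant for `python3 -m build --no-isolation --wheel`, pass that command as `cmd` when using it.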

Development

To adjust the build options or address build failures, modify the configurations in cpp_extension.py or cuda_extension.py.

Before rebuilding, you may want to clean the build output, which can help avoid errors. You may also want to install the development requirements:

python setup.py clean
# now build as usual; use ".[dev]" to include the development requirements, e.g.:
CUDA_HOME="${CONDA_PREFIX}" pip install -e ".[dev]" -v

You can run our tests with pytest:

pytest

CUDA Device Selection

To select a certain CUDA device, set the environment variable BIE_DEVICE, e.g.:

export BIE_DEVICE=1  # This selects the second CUDA device, as indexing starts from 0.
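The index is zero-based, as in PyTorch's `cuda:<index>` device strings. The hypothetical helper below illustrates that mapping and validates the index against the number of visible devices. How Bitorch Engine reads BIE_DEVICE internally is not described here, and falling back to device 0 when the variable is unset is an assumption of this sketch.

```python
"""Illustrate the zero-based BIE_DEVICE convention (hypothetical helper)."""
import os


def bie_device_string(value=None, device_count=None) -> str:
    """Map a BIE_DEVICE-style index such as "1" to a PyTorch device string "cuda:1".

    Falls back to "0" when BIE_DEVICE is unset; this default is an assumption,
    not documented bitorch-engine behaviour.
    """
    raw = os.environ.get("BIE_DEVICE", "0") if value is None else value
    index = int(raw)  # zero-based: "1" selects the second CUDA device
    if index < 0:
        raise ValueError("BIE_DEVICE must be a non-negative index")
    if device_count is not None and index >= device_count:
        raise ValueError(f"BIE_DEVICE={index}, but only {device_count} CUDA device(s) are visible")
    return f"cuda:{index}"


if __name__ == "__main__":
    # set BIE_DEVICE=1 unless it is already set (affects this process and its children only)
    os.environ.setdefault("BIE_DEVICE", "1")
    print(bie_device_string())
```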

Documentation

Check out the documentation for the API reference.

Examples

Contributors

BIE is under active development and currently maintained by contributors: Haojin Yang, Joseph Bethge, Nianhui Guo, Maximilian Schulze, Hong Guo, Paul Mattes.

Check our contributing guide to learn about how to contribute to the project.

License

Bitorch Engine is made available under the Apache 2.0 License. See the LICENSE file for details.

Citation

If you use our approach in your research, please cite our work as follows:

@article{bitorch_engine,
  title={Bitorch Engine: Streamlining AI with Open-Source Low-Bit Quantization},
  author={Yang, Haojin and Bethge, Joseph and Guo, Nianhui and Schulze, Maximilian and Guo, Hong},
  journal={https://github.com/GreenBitAI/bitorch-engine},
  year={2024}
}

References and Acknowledgements

This project builds upon or uses concepts from the following open-source projects:

We extend our heartfelt gratitude to the developers of these projects for their invaluable contributions to the open-source community. Without their exceptional work, none of this would be possible. The corresponding licenses of the reference projects can be found in the licenses directory of the source tree.

Open Source Software Acknowledgment

This project makes use of open source software (OSS) components. The original code of these components is kept under their respective licenses and copyrights. We are grateful to the open-source community for making these resources available. For specific information about each component's license, please refer to the corresponding sections within our project documentation or the direct references provided in the "References" section of this document.

We endeavor to comply with all open source licenses and their requirements, including proper acknowledgment and notice. If there are any concerns or questions regarding our license acknowledgments, please reach out to us for clarification.
