AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Updated Jun 6, 2024 - Python
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
Chess engine
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
This is the official implementation of "LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models", and it is also an efficient LLM compression tool with various advanced compression methods, supporting multiple inference backends.
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
Pruning tool to identify small subsets of network partitions that are significant from the perspective of stochastic block model inference. This method works for single-layer and multi-layer networks, as well as for restricting focus to a fixed number of communities when desired.
AIMET GitHub pages documentation
[ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
[CVPR 2023] Towards Any Structural Pruning; LLMs / SAM / Diffusion / Transformers / YOLOv8 / CNNs
Neural Network Compression Framework for enhanced OpenVINO™ inference
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support LLaMA, Llama-2, BLOOM, Vicuna, Baichuan, etc.
OpenMMLab Model Compression Toolbox and Benchmark.
Characterization study repository for pruning, a popular way to compress a DL model. This repo also investigates optimal sparse tensor layouts for pruned nets.
"Hung-yi Lee Deep Learning Tutorial" (recommended by Professor Hung-yi Lee 👍). PDF download: https://github.com/datawhalechina/leedl-tutorial/releases
Architecture for analyzing pruning methods using the PyTorch prune module
PaddleSlim is an open-source library for deep model compression and architecture search.