pruning

Official implementation of "LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models". It is also an efficient LLM compression tool that offers various advanced compression methods and supports multiple inference backends.

benchmark deployment tool evaluation pruning quantization large-language-models llm

Updated Jun 5, 2024
Python

alibaba / TinyNeuralNetwork

TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.

deep-neural-networks deep-learning pytorch pruning model-compression model-converter quantization-aware-training post-training-quantization

Updated Jun 5, 2024
Python

ragibson / ModularityPruning

Pruning tool to identify small subsets of network partitions that are significant from the perspective of stochastic block model inference. This method works for single-layer and multi-layer networks, as well as for restricting focus to a fixed number of communities when desired.

community-detection network-graph pruning stochastic-block-model multilayer-networks

Updated Jun 5, 2024
Python

quic / aimet-pages

AIMET GitHub pages documentation

open-source machine-learning opensource deep-neural-networks compression deep-learning pruning quantization auto-ml network-quantization network-compression

Updated Jun 4, 2024
HTML

ROIM1998 / APT

[ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference

pruning bert peft roberta t5 efficient-deep-learning llm llama2 llm-finetuning peft-fine-tuning-llm

Updated Jun 4, 2024
Python

VainF / Torch-Pruning

[CVPR 2023] Towards Any Structural Pruning; LLMs / SAM / Diffusion / Transformers / YOLOv8 / CNNs

pruning model-compression channel-pruning network-pruning efficient-deep-learning depgraph structural-pruning cvpr2023

Updated Jun 4, 2024
Python

openvinotoolkit / nncf

Neural Network Compression Framework for enhanced OpenVINO™ inference

nlp sparsity compression deep-learning tensorflow transformers pytorch classification pruning object-detection quantization semantic-segmentation bert hawq onnx openvino mmdetection mixed-precision-training quantization-aware-training

Updated Jun 5, 2024
Python

horseee / LLM-Pruner

[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Supports LLaMA, Llama-2, BLOOM, Vicuna, Baichuan, etc.

bloom compression pruning llama language-model vicuna baichuan pruning-algorithms llm chatglm neurips-2023 llama-2

Updated Jun 4, 2024
Python

open-mmlab / mmrazor

OpenMMLab Model Compression Toolbox and Benchmark.

detection pytorch classification segmentation pruning darts quantization nas knowledge-distillation spos autoslim

Updated Jun 3, 2024
Python

amikom-gace-research-group / characterize-pruning

Characterization study repository for pruning, a popular way to compress a DL model. This repo also investigates optimal sparse tensor layouts for pruned nets.

pruning model-compression edge-devices sparse-neural-networks characterization-study