Hasso Plattner Institute (HPI), Potsdam, Germany
Stars
Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs
A generative speech model for daily dialogue.
Repository and hands-on workshop on how to develop applications with local LLMs
A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs.
A toolkit that enhances PyTorch with specialized functions for low-bit quantized neural networks.
Run GreenBitAI's Quantized LLMs on Apple Devices with MLX
Analyze the inference of Large Language Models (LLMs): computation, storage, transmission, and the hardware roofline model, all in a user-friendly interface.
Ingest files for retrieval augmented generation (RAG) with open-source Large Language Models (LLMs), all without 3rd parties or sensitive data leaving your network.
A fast inference library for running LLMs locally on modern consumer-class GPUs
Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs
MobiLlama: Small Language Model tailored for edge devices
Strong and Open Vision Language Assistant for Mobile Devices
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Official implementation of Half-Quadratic Quantization (HQQ)
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
PB-LLM: Partially Binarized Large Language Models
[ICLR 2024] Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
A collection of phenomena observed during the scaling of big foundation models, which may develop into consensus, principles, or laws in the future.
AlpinDale / QuIP-for-Llama
Forked from Cornell-RelaxML/QuIP. Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", adapted for Llama models.
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
Awesome LLM compression research papers and tools.
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization