🏠
Working from home
  • Hasso Plattner Institute (HPI)
  • Potsdam, Germany


Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs

Python 76 6 Updated Nov 25, 2024

A generative speech model for daily dialogue.

Python 33,514 3,637 Updated Jan 7, 2025

Repository and hands-on workshop on how to develop applications with local LLMs

Jupyter Notebook 391 65 Updated Jul 3, 2024

A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs.

Python 78 8 Updated Jan 9, 2025

A toolkit that enhances PyTorch with specialized functions for low-bit quantized neural networks.

Python 28 5 Updated Jun 25, 2024

Run GreenBitAI's Quantized LLMs on Apple Devices with MLX

Python 15 3 Updated Jan 8, 2025

Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

Python 367 44 Updated Sep 11, 2024

Ingest files for retrieval augmented generation (RAG) with open-source Large Language Models (LLMs), all without 3rd parties or sensitive data leaving your network.

Python 565 66 Updated Aug 12, 2024

Brief notes on the papers read each day

206 9 Updated Jan 5, 2025

A fast inference library for running LLMs locally on modern consumer-class GPUs

Python 3,828 290 Updated Jan 9, 2025

Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs

Python 110 5 Updated Jan 11, 2024

MobiLlama: Small Language Model tailored for edge devices

Python 616 48 Updated Mar 3, 2024

Strong and Open Vision Language Assistant for Mobile Devices

Python 1,098 71 Updated Apr 15, 2024

[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Python 261 25 Updated Oct 10, 2024

Official implementation of Half-Quadratic Quantization (HQQ)

Python 729 73 Updated Jan 7, 2025

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

Python 5,747 522 Updated Dec 14, 2024

Python 124 14 Updated Jan 22, 2024

Python 1 Updated Nov 21, 2023

Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".

Python 263 22 Updated Nov 3, 2023

PB-LLM: Partially Binarized Large Language Models

Python 149 10 Updated Nov 20, 2023

[ICLR 2024] Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation

Python 149 17 Updated Mar 1, 2024

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python 6,738 373 Updated Jul 11, 2024

A collection of phenomena observed during the scaling of big foundation models, which may be developed into consensus, principles, or laws in the future

275 19 Updated Aug 13, 2023

Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" adapted for Llama models

Python 36 3 Updated Aug 4, 2023

Inference code for Llama models

Python 57,148 9,650 Updated Aug 18, 2024

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Python 37,613 4,637 Updated Jan 9, 2025

Awesome LLM compression research papers and tools.

1,304 86 Updated Jan 3, 2025

[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization

Python 667 43 Updated Aug 13, 2024