Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
bloom falcon moe gemma mistral mixture-of-experts model-quantization multi-gpu-inference m2m100 llamacpp llm-inference internlm llama2 qwen baichuan2 mixtral phi-2 deepseek minicpm
Updated Mar 15, 2024 - C++