llm-serving

Here are 49 public repositories matching this topic...

ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Updated May 13, 2024
Python

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

amd cuda inference pytorch transformer llama gpt rocm model-serving mlops llm inferentia llmops llm-serving trainium

Updated May 13, 2024
Python

bentoml / OpenLLM

Run any open-source LLM, such as Llama 2 or Mistral, as an OpenAI-compatible API endpoint in the cloud.

Updated May 13, 2024
Python

bentoml / BentoML

The most flexible way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Inference Graph/Pipelines, Compound AI systems, Multi-Modal, RAG as a Service, and more!

python machine-learning deep-learning model-serving multimodal mlops ml-engineering llm generative-ai llmops llm-serving model-inference-service llm-inference inference-platform

Updated May 13, 2024
Python

liguodongiot / llm-action

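This project aims to share the technical principles behind large language models and practical experience in working with them.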

llm llmops llm-serving llm-training llm-inference

Updated May 10, 2024
HTML

skypilot-org / skypilot

SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.

Updated May 13, 2024
Python

SuperDuperDB / superduperdb

🔮 SuperDuperDB: Bring AI to your database! Build, deploy, and manage any AI application directly with your existing data infrastructure, without moving your data. Includes streaming inference, scalable model training, and vector search.

Updated May 13, 2024
Python

microsoft / aici

AICI: Prompts as (Wasm) Programs

rust ai wasm inference transformer language-model model-serving wasmtime llm llmops llm-serving llm-inference llm-framework

Updated May 10, 2024
Rust

predibase / lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

transformers pytorch llama gpt lora model-serving fine-tuning llm llmops llm-serving llm-inference

Updated May 11, 2024
Python

ray-project / ray-llm

RayLLM - LLMs on Ray

distributed-systems transformers ray serving large-language-models llm llmops llm-serving llm-inference

Updated May 6, 2024
Python

mosecorg / mosec

A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

python rust machine-learning deep-learning mxnet tensorflow gpu cv pytorch tts hacktoberfest model-serving nerual-network machine-learning-platform jax mlops llm llm-serving

Updated May 11, 2024
Python

hpcaitech / SwiftInfer

Efficient AI Inference & Serving

deep-learning inference artificial-intelligence llama gpt llm-serving llm-inference llama2

Updated Jan 8, 2024
Python

alibaba / rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

inference llama gpt model-serving llm llmops llm-serving

Updated May 9, 2024
C++

rohan-paul / LLM-FineTuning-Large-Language-Models

LLM (Large Language Model) FineTuning

pytorch gpt-3 large-language-models llm llm-serving gpt3-turbo llm-training llm-inference open-source-llm llama2 llm-finetuning mistral-7b

Updated May 3, 2024
Jupyter Notebook

ray-project / ray-educational-materials

This is a suite of hands-on training materials that show how to scale CV, NLP, and time-series forecasting workloads with Ray.

deep-learning ray distributed-machine-learning ray-tune ray-train ray-distributed llm generative-ai ray-serve ray-data llm-serving llm-inference

Updated Feb 13, 2024
Jupyter Notebook

substratusai / runbooks

Finetune LLMs on K8s by using Runbooks

kubernetes kubernetes-operator mlops ml-platform llmops llm-serving llm-training llm-inference

Updated Nov 21, 2023
Go

chenhunghan / ialacol

🪶 Lightweight OpenAI drop-in replacement for Kubernetes

python kubernetes ai gpu helm cuda openai cloudnative llm langchain llm-serving llamacpp ggml gptq llm-inference

Updated Feb 5, 2024
Python

slai-labs / get-beam

Run GPU inference and training jobs on serverless infrastructure that scales with you.

python data-science machine-learning deep-learning serverless hpc distributed-computing artificial-intelligence cloud-computing gpu-acceleration serverless-architectures gpu-computing cost-optimization ml-infrastructure mlops llm-serving llm-training

Updated Apr 30, 2024
Shell

HPMLL / BurstGPT

A GPT-3.5 & GPT-4 Workload Trace to Optimize LLM Serving Systems

dataset mlsys llm llm-serving

Updated Apr 28, 2024

galeselee / Awesome_LLM_System-PaperList

Since the emergence of ChatGPT in 2022, accelerating large language models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on inference acceleration; related work will be added gradually. Contributions are welcome!

system papers paperlist llm-serving llm-inference

Updated May 10, 2024
