Skip to content
@bentoml

BentoML

The easiest way to build fast and reliable AI serving systems

Welcome to BentoML 👋 Twitter Follow Slack

github banner

What's cooking? 👩‍🍳

🍱 BentoML: The Unified Serving Framework for AI/ML Systems

BentoML is a Python library for building online serving systems optimized for AI apps and model inference. It supports serving any model format/runtime and custom Python code, offering the key primitives for serving optimizations, task queues, batching, multi-model chains, distributed orchestration, and multi-GPU serving.

🎨 Examples: Learn by doing!

A collection of examples for BentoML, from deploying OpenAI-compatible LLM service, to building voice phone calling agents and RAG applications. Use these examples to learn how to use BentoML and build your own solutions.

🦾 OpenLLM: Self-hosting Large Language Models Made Easy

Run any open-source LLMs (Llama, Mistral, Qwen, Phi and more) or custom fine-tuned models as OpenAI-compatible APIs with a single command. It features a built-in chat UI, state-of-the-art inference performance, and a simplified workflow for production-grade cloud deployment.

☁️ BentoCloud: Unified Inference Platform for any model, on any cloud

BentoCloud is the easist way to build and deploy with BentoML, in our cloud or yours. It brings fast and scalable inference infrastructure into any cloud, allowing AI teams to move 10x faster in building AI applications with ML/AI models, while reducing compute cost - by maxmizing compute utilization, fast GPU autoscaling, minimimal coldstarts and full observability. Sign up today!.

Get in touch 💬

👉 Join our Slack community!

👀 Follow us on X @bentomlai and LinkedIn

📖 Read our blog

Pinned Loading

  1. BentoML BentoML Public

    The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

    Python 7.2k 797

  2. OpenLLM OpenLLM Public

    Run any open-source LLMs, such as Llama, Mistral, as OpenAI compatible API endpoint in the cloud.

    Python 10.3k 653

  3. BentoDiffusion BentoDiffusion Public

    BentoDiffusion: A collection of diffusion models served with BentoML

    Python 342 25

  4. BentoVLLM BentoVLLM Public

    Self-host LLMs with vLLM and BentoML

    Python 79 12

  5. comfy-pack comfy-pack Public

    A comprehensive toolkit for reliably locking, packing and deploying environments for ComfyUI workflows.

    Python 65 9

  6. BentoVoiceAgent BentoVoiceAgent Public

    Build Phone Calling Voice Agent fully powered by open source models.

    Python 9 1

Repositories

Showing 10 of 103 repositories
  • comfy-pack Public

    A comprehensive toolkit for reliably locking, packing and deploying environments for ComfyUI workflows.

    bentoml/comfy-pack’s past year of commit activity
    Python 65 Apache-2.0 9 0 0 Updated Dec 24, 2024
  • BentoML Public

    The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

    bentoml/BentoML’s past year of commit activity
    Python 7,241 Apache-2.0 797 153 10 Updated Dec 24, 2024
  • BentoBLIP Public

    how to build an image captioning application on top of a BLIP model with BentoML

    bentoml/BentoBLIP’s past year of commit activity
    Python 3 2 0 2 Updated Dec 24, 2024
  • OpenLLM Public

    Run any open-source LLMs, such as Llama, Mistral, as OpenAI compatible API endpoint in the cloud.

    bentoml/OpenLLM’s past year of commit activity
    Python 10,272 Apache-2.0 653 19 0 Updated Dec 24, 2024
  • BentoSGLang Public
    bentoml/BentoSGLang’s past year of commit activity
    Python 1 1 0 0 Updated Dec 23, 2024
  • yatai-image-builder Public

    🐳 Build OCI images for Bentos in k8s

    bentoml/yatai-image-builder’s past year of commit activity
    Go 15 10 4 7 Updated Dec 23, 2024
  • BentoCLIP Public

    building a CLIP application using BentoML

    bentoml/BentoCLIP’s past year of commit activity
    Python 8 2 0 2 Updated Dec 23, 2024
  • BentoSentenceTransformers Public

    how to build a sentence embedding application using BentoML

    bentoml/BentoSentenceTransformers’s past year of commit activity
    Python 6 2 0 1 Updated Dec 23, 2024
  • BentoWhisperX Public
    bentoml/BentoWhisperX’s past year of commit activity
    Python 11 5 0 8 Updated Dec 23, 2024
  • BentoXTTS Public

    how to build an text-to-speech application using BentoML

    bentoml/BentoXTTS’s past year of commit activity
    Python 5 2 1 3 Updated Dec 23, 2024