
[Feature]: Batching in LiteLLM for models that do not have native batching support. #7194

Open
markoff-dev opened this issue Dec 12, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@markoff-dev

The Feature

I would like LiteLLM to provide automatic batching at the library level, executing the requests from an uploaded file in the same way it already does for models with native batching support.

A similar request was made here: #361 (comment)

Question:
Are you considering adding similar functionality at the LiteLLM level?

Motivation, pitch

Problem:
Many LiteLLM usage scenarios require multiple model queries to process large datasets. However, some models do not support native batching, which leads to:

  • Increased network overhead (many individual requests to */chat/completions).
  • Longer overall response time when a large number of requests must be made.

At the moment, LiteLLM supports batching for the Azure OpenAI, OpenAI, and Vertex AI providers, but the feature is not available for models without native batch support (for example, OpenAI-compatible models that do not expose the /files and /batches endpoints).
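For illustration only (this is not an existing LiteLLM API): one way the library could emulate the batch flow for such providers is to read an OpenAI-batch-style JSONL file and fan the requests out concurrently over litellm.acompletion, which does exist today. The helper name, file layout assumption, and concurrency cap below are all hypothetical.

```python
# Sketch only: NOT an existing LiteLLM API. Illustrates what library-level
# batching for providers without /files + /batches could look like, built on
# litellm.acompletion.
import asyncio
import json

import litellm


async def run_batch_file(path: str, max_concurrency: int = 8) -> list:
    """Read OpenAI-batch-style JSONL requests and fan them out concurrently."""
    with open(path) as f:
        requests = [json.loads(line) for line in f if line.strip()]

    semaphore = asyncio.Semaphore(max_concurrency)  # cap in-flight requests

    async def one(req: dict):
        # Assumes each JSONL line follows the OpenAI batch layout:
        # {"custom_id": ..., "method": ..., "url": ..., "body": {...}}
        body = req["body"]
        async with semaphore:
            resp = await litellm.acompletion(
                model=body["model"],
                messages=body["messages"],
            )
        return {"custom_id": req.get("custom_id"), "response": resp}

    return await asyncio.gather(*(one(r) for r in requests))


# Example usage:
# results = asyncio.run(run_batch_file("requests.jsonl"))
```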

Are you a ML Ops Team?

No

Twitter / LinkedIn details

No response

@markoff-dev markoff-dev added the enhancement New feature or request label Dec 12, 2024
@markoff-dev
Author

@krrishdholakia maybe you can answer this question:

Are you considering adding similar functionality at the LiteLLM level?

Or who else can I contact about this?

@krrishdholakia
Contributor

Hey @markoff-dev, missed this. On the pain points:

  • Increased network overhead (many individual requests to */chat/completions).
  • Longer overall response time when a large number of requests must be made.

For providers that don't support it, how would you anticipate litellm solving this problem?

Here's how you can run batch completions on the proxy today - https://docs.litellm.ai/docs/proxy/user_keys#beta-batch-completions---pass-multiple-models
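For reference, a rough sketch of how that looks from a client, based on my reading of the linked docs (the proxy URL, API key, and model names below are placeholders; verify the exact format against the docs):

```python
# Rough usage sketch: the proxy's beta "batch completions" feature accepts a
# comma-separated model list and returns one response per model.
import openai

# Point the standard OpenAI client at a running LiteLLM proxy (placeholder URL/key).
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

responses = client.chat.completions.create(
    model="gpt-3.5-turbo,groq-llama",  # comma-separated proxy model names
    messages=[{"role": "user", "content": "Write a one-line poem."}],
)
print(responses)
```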
