Support budget/rate limit tiers for keys (#7429)
* feat(proxy/utils.py): get associated litellm budget from db in combined_view for key

allows user to create rate limit tiers and associate those to keys

* feat(proxy/_types.py): update the value of key-level tpm/rpm/model max budget metrics with the associated budget table values if set

allows rate limit tiers to be easily applied to keys

* docs(rate_limit_tiers.md): add doc on setting rate limit / budget tiers

make feature discoverable

* feat(key_management_endpoints.py): return litellm_budget_table value in key generate

make it easy for user to know associated budget on key creation

* fix(key_management_endpoints.py): document 'budget_id' param in `/key/generate`

* docs(key_management_endpoints.py): document budget_id usage

* refactor(budget_management_endpoints.py): refactor budget endpoints into separate file - makes it easier to run documentation testing against it

* docs(test_api_docs.py): add budget endpoints to ci/cd doc test + add missing param info to docs

* fix(customer_endpoints.py): use new pydantic obj name

* docs(user_management_heirarchy.md): add simple doc explaining teams/keys/org/users on litellm

* Litellm dev 12 26 2024 p2 (#7432)

* (Feat) Add logging for `POST v1/fine_tuning/jobs`  (#7426)

* init commit ft jobs logging

* add ft logging

* add logging for FineTuningJob

* simple FT Job create test

* (docs) - show all supported Azure OpenAI endpoints in overview  (#7428)

* azure batches

* update doc

* docs azure endpoints

* docs endpoints on azure

* docs azure batches api

* docs azure batches api

* fix(key_management_endpoints.py): fix key update to actually work

* test(test_key_management.py): add e2e test asserting ui key update call works

* fix: proxy/_types - fix linting errors

* test: update test

---------

Co-authored-by: Ishaan Jaff <[email protected]>

* fix: test

* fix(parallel_request_limiter.py): enforce tpm/rpm limits on key from tiers

* fix: fix linting errors

* test: fix test

* fix: remove unused import

* test: update test

* docs(customer_endpoints.py): document new model_max_budget param

* test: specify unique key alias

* docs(budget_management_endpoints.py): document new model_max_budget param

* test: fix test

* test: fix tests

---------

Co-authored-by: Ishaan Jaff <[email protected]>
krrishdholakia and ishaan-jaff authored Dec 27, 2024
1 parent 12c4e7e commit 539f166
Showing 25 changed files with 761 additions and 373 deletions.
4 changes: 2 additions & 2 deletions docs/my-website/docs/proxy/customers.md
@@ -2,11 +2,11 @@ import Image from '@theme/IdealImage';
 import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';

-# 🙋‍♂️ Customers
+# 🙋‍♂️ Customers / End-User Budgets

 Track spend, set budgets for your customers.

-## Tracking Customer Credit
+## Tracking Customer Spend

 ### 1. Make LLM API call w/ Customer ID

68 changes: 68 additions & 0 deletions docs/my-website/docs/proxy/rate_limit_tiers.md
@@ -0,0 +1,68 @@
# ✨ Budget / Rate Limit Tiers

Create tiers with different budgets and rate limits, making it easy to manage different users and their usage.

:::info

This is a LiteLLM Enterprise feature.

Get a 7 day free trial + get in touch [here](https://litellm.ai/#trial).

See pricing [here](https://litellm.ai/#pricing).

:::


## 1. Create a budget

```bash
curl -L -X POST 'http://0.0.0.0:4000/budget/new' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
"budget_id": "my-test-tier",
"rpm_limit": 0
}'
```

## 2. Assign budget to a key

```bash
curl -L -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
"budget_id": "my-test-tier"
}'
```

Expected Response:

```json
{
"key": "sk-...",
"budget_id": "my-test-tier",
"litellm_budget_table": {
"budget_id": "my-test-tier",
"rpm_limit": 0
}
}
```

## 3. Check if budget is enforced on key

```bash
# 👇 Use the KEY returned in step 2.
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-...' \
-d '{
"model": "<REPLACE_WITH_MODEL_NAME_FROM_CONFIG.YAML>",
"messages": [
{"role": "user", "content": "hi my email is ishaan"}
]
}'
```
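
The steps above can also be scripted. The following is a minimal sketch using only the Python standard library. The base URL and the `sk-1234` admin key are the placeholder values from the curl examples, and `build_request` is a hypothetical helper, not part of LiteLLM. The sketch only constructs the requests; sending them requires a running proxy. With `rpm_limit` set to `0`, the step 3 call would presumably be rejected by the rate limiter rather than answered.

```python
import json
import urllib.request

BASE_URL = "http://0.0.0.0:4000"  # placeholder taken from the curl examples


def build_request(path: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to the proxy."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Step 1: create the tier. Step 2: generate a key tied to that tier.
create_tier = build_request(
    "/budget/new", "sk-1234", {"budget_id": "my-test-tier", "rpm_limit": 0}
)
create_key = build_request("/key/generate", "sk-1234", {"budget_id": "my-test-tier"})

# To send one of them against a running proxy:
# with urllib.request.urlopen(create_tier) as resp:
#     print(json.load(resp))
```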


## [API Reference](https://litellm-api.up.railway.app/#/budget%20management)

13 changes: 13 additions & 0 deletions docs/my-website/docs/proxy/user_management_heirarchy.md
@@ -0,0 +1,13 @@
import Image from '@theme/IdealImage';


# User Management Hierarchy

<Image img={require('../../img/litellm_user_heirarchy.png')} style={{ width: '100%', maxWidth: '4000px' }} />

LiteLLM supports a hierarchy of users, teams, organizations, and budgets.

- Organizations can have multiple teams. [API Reference](https://litellm-api.up.railway.app/#/organization%20management)
- Teams can have multiple users. [API Reference](https://litellm-api.up.railway.app/#/team%20management)
- Users can have multiple keys. [API Reference](https://litellm-api.up.railway.app/#/budget%20management)
- Keys can belong to either a team or a user. [API Reference](https://litellm-api.up.railway.app/#/end-user%20management)
Binary file added docs/my-website/img/litellm_user_heirarchy.png
23 changes: 18 additions & 5 deletions docs/my-website/sidebars.js
@@ -51,7 +51,7 @@ const sidebars = {
         {
           type: "category",
           label: "Architecture",
-          items: ["proxy/architecture", "proxy/db_info", "router_architecture"],
+          items: ["proxy/architecture", "proxy/db_info", "router_architecture", "proxy/user_management_heirarchy"],
         },
         {
           type: "link",
@@ -99,8 +99,13 @@ const sidebars = {
         },
         {
           type: "category",
-          label: "Spend Tracking + Budgets",
-          items: ["proxy/cost_tracking", "proxy/users", "proxy/custom_pricing", "proxy/team_budgets", "proxy/billing", "proxy/customers"],
+          label: "Spend Tracking",
+          items: ["proxy/cost_tracking", "proxy/custom_pricing", "proxy/billing",],
+        },
+        {
+          type: "category",
+          label: "Budgets + Rate Limits",
+          items: ["proxy/users", "proxy/rate_limit_tiers", "proxy/team_budgets", "proxy/customers"],
         },
         {
           type: "link",
@@ -135,9 +140,17 @@ const sidebars = {
             "oidc"
           ]
         },
+        {
+          type: "category",
+          label: "Create Custom Plugins",
+          description: "Modify requests, responses, and more",
+          items: [
+            "proxy/call_hooks",
+            "proxy/rules",
+          ]
+        },
         "proxy/caching",
-        "proxy/call_hooks",
-        "proxy/rules",
+
       ]
     },
     {
8 changes: 6 additions & 2 deletions litellm/integrations/prometheus.py
@@ -633,8 +633,12 @@ def _set_virtual_key_rate_limit_metrics(
         )
         remaining_tokens_variable_name = f"litellm-key-remaining-tokens-{model_group}"

-        remaining_requests = metadata.get(remaining_requests_variable_name, sys.maxsize)
-        remaining_tokens = metadata.get(remaining_tokens_variable_name, sys.maxsize)
+        remaining_requests = (
+            metadata.get(remaining_requests_variable_name, sys.maxsize) or sys.maxsize
+        )
+        remaining_tokens = (
+            metadata.get(remaining_tokens_variable_name, sys.maxsize) or sys.maxsize
+        )

         self.litellm_remaining_api_key_requests_for_model.labels(
             user_api_key, user_api_key_alias, model_group
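
The `prometheus.py` change matters because `dict.get(key, default)` only falls back to the default when the key is missing, not when the key is present with a value of `None`. A small illustration follows; the `metadata` dict and the `"remaining"` key are made up for the example, and `remaining_or_max` is a hypothetical helper. Note that `or` also replaces a legitimate `0`, so an explicit `is None` check is shown as an alternative.

```python
import sys

metadata = {"remaining": None}  # key present, value explicitly None

# Plain .get(): the default is NOT used because the key exists.
plain = metadata.get("remaining", sys.maxsize)

# .get() plus `or`: None (and any other falsy value) falls back to sys.maxsize.
with_or = metadata.get("remaining", sys.maxsize) or sys.maxsize


def remaining_or_max(value):
    """Alternative that only replaces None, so a real 0 is preserved."""
    return sys.maxsize if value is None else value


# A remaining count of 0 behaves differently under the two approaches.
zero_with_or = {"remaining": 0}.get("remaining", sys.maxsize) or sys.maxsize
zero_explicit = remaining_or_max(0)
```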
48 changes: 34 additions & 14 deletions litellm/proxy/_types.py
@@ -12,6 +12,7 @@
 from litellm.types.router import RouterErrors, UpdateRouterConfig
 from litellm.types.utils import (
     EmbeddingResponse,
+    GenericBudgetConfigType,
     ImageResponse,
     LiteLLMPydanticObjectBase,
     ModelResponse,
@@ -614,37 +615,39 @@ class GenerateRequestBase(LiteLLMPydanticObjectBase):
     rpm_limit: Optional[int] = None
     budget_duration: Optional[str] = None
     allowed_cache_controls: Optional[list] = []
-    soft_budget: Optional[float] = None
     config: Optional[dict] = {}
     permissions: Optional[dict] = {}
     model_max_budget: Optional[dict] = (
         {}
     )  # {"gpt-4": 5.0, "gpt-3.5-turbo": 5.0}, defaults to {}

     model_config = ConfigDict(protected_namespaces=())
-    send_invite_email: Optional[bool] = None
     model_rpm_limit: Optional[dict] = None
     model_tpm_limit: Optional[dict] = None
     guardrails: Optional[List[str]] = None
     blocked: Optional[bool] = None
     aliases: Optional[dict] = {}


-class _GenerateKeyRequest(GenerateRequestBase):
+class KeyRequestBase(GenerateRequestBase):
     key: Optional[str] = None
-
-
-class GenerateKeyRequest(_GenerateKeyRequest):
     budget_id: Optional[str] = None
     tags: Optional[List[str]] = None
     enforced_params: Optional[List[str]] = None


-class GenerateKeyResponse(_GenerateKeyRequest):
+class GenerateKeyRequest(KeyRequestBase):
+    soft_budget: Optional[float] = None
+    send_invite_email: Optional[bool] = None
+
+
+class GenerateKeyResponse(KeyRequestBase):
     key: str  # type: ignore
     key_name: Optional[str] = None
     expires: Optional[datetime]
     user_id: Optional[str] = None
     token_id: Optional[str] = None
+    litellm_budget_table: Optional[Any] = None

     @model_validator(mode="before")
     @classmethod
@@ -669,7 +672,7 @@ def set_model_info(cls, values):
         return values


-class UpdateKeyRequest(GenerateKeyRequest):
+class UpdateKeyRequest(KeyRequestBase):
     # Note: the defaults of all Params here MUST BE NONE
     # else they will get overwritten
     key: str  # type: ignore
@@ -765,7 +768,7 @@ class DeleteUserRequest(LiteLLMPydanticObjectBase):
 AllowedModelRegion = Literal["eu", "us"]


-class BudgetNew(LiteLLMPydanticObjectBase):
+class BudgetNewRequest(LiteLLMPydanticObjectBase):
     budget_id: Optional[str] = Field(default=None, description="The unique budget id.")
     max_budget: Optional[float] = Field(
         default=None,
@@ -788,6 +791,10 @@ class BudgetNew(LiteLLMPydanticObjectBase):
         default=None,
         description="Max duration budget should be set for (e.g. '1hr', '1d', '28d')",
     )
+    model_max_budget: Optional[GenericBudgetConfigType] = Field(
+        default=None,
+        description="Max budget for each model (e.g. {'gpt-4o': {'max_budget': '0.0000001', 'budget_duration': '1d', 'tpm_limit': 1000, 'rpm_limit': 1000}})",
+    )


 class BudgetRequest(LiteLLMPydanticObjectBase):
@@ -805,11 +812,11 @@ class CustomerBase(LiteLLMPydanticObjectBase):
     allowed_model_region: Optional[AllowedModelRegion] = None
     default_model: Optional[str] = None
     budget_id: Optional[str] = None
-    litellm_budget_table: Optional[BudgetNew] = None
+    litellm_budget_table: Optional[BudgetNewRequest] = None
     blocked: bool = False


-class NewCustomerRequest(BudgetNew):
+class NewCustomerRequest(BudgetNewRequest):
     """
     Create a new customer, allocate a budget to them
     """
@@ -1426,6 +1433,19 @@ class LiteLLM_VerificationTokenView(LiteLLM_VerificationToken):
     # Time stamps
     last_refreshed_at: Optional[float] = None  # last time joint view was pulled from db

+    def __init__(self, **kwargs):
+        # Handle litellm_budget_table_* keys
+        for key, value in list(kwargs.items()):
+            if key.startswith("litellm_budget_table_") and value is not None:
+                # Extract the corresponding attribute name
+                attr_name = key.replace("litellm_budget_table_", "")
+                # Check if the value is None and set the corresponding attribute
+                if getattr(self, attr_name, None) is None:
+                    kwargs[attr_name] = value
+
+        # Initialize the superclass
+        super().__init__(**kwargs)
+


 class UserAPIKeyAuth(
     LiteLLM_VerificationTokenView
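
The `__init__` added to `LiteLLM_VerificationTokenView` copies `litellm_budget_table_*` values from the joined view onto the matching key-level fields. The same idea can be sketched on plain dicts. `apply_budget_table_defaults` is a hypothetical helper, not LiteLLM code, and it assumes the intent is that a value already set on the key wins over the tier value. It checks the incoming values directly instead of calling `getattr` on an instance that has not been initialized yet.

```python
from typing import Any, Dict

PREFIX = "litellm_budget_table_"


def apply_budget_table_defaults(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of row where unset key-level fields inherit tier values."""
    merged = dict(row)
    for name, value in row.items():
        if not name.startswith(PREFIX) or value is None:
            continue
        attr_name = name[len(PREFIX):]
        # Only fill in the key-level field when it has no value of its own.
        if merged.get(attr_name) is None:
            merged[attr_name] = value
    return merged


row = {
    "token": "hashed-token",
    "rpm_limit": None,  # not set on the key
    "tpm_limit": 500,  # set on the key
    "litellm_budget_table_rpm_limit": 0,  # from the tier
    "litellm_budget_table_tpm_limit": 1000,  # from the tier
}
merged = apply_budget_table_defaults(row)
```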
@@ -2194,9 +2214,9 @@ class ProviderBudgetResponseObject(LiteLLMPydanticObjectBase):
     Configuration for a single provider's budget settings
     """

-    budget_limit: float  # Budget limit in USD for the time period
-    time_period: str  # Time period for budget (e.g., '1d', '30d', '1mo')
-    spend: float = 0.0  # Current spend for this provider
+    budget_limit: Optional[float]  # Budget limit in USD for the time period
+    time_period: Optional[str]  # Time period for budget (e.g., '1d', '30d', '1mo')
+    spend: Optional[float] = 0.0  # Current spend for this provider
     budget_reset_at: Optional[str] = None  # When the current budget period resets


9 changes: 9 additions & 0 deletions litellm/proxy/auth/auth_utils.py
@@ -418,6 +418,12 @@ def get_key_model_rpm_limit(user_api_key_dict: UserAPIKeyAuth) -> Optional[dict]
     if user_api_key_dict.metadata:
         if "model_rpm_limit" in user_api_key_dict.metadata:
             return user_api_key_dict.metadata["model_rpm_limit"]
+    elif user_api_key_dict.model_max_budget:
+        model_rpm_limit: Dict[str, Any] = {}
+        for model, budget in user_api_key_dict.model_max_budget.items():
+            if "rpm_limit" in budget and budget["rpm_limit"] is not None:
+                model_rpm_limit[model] = budget["rpm_limit"]
+        return model_rpm_limit

     return None

@@ -426,6 +432,9 @@ def get_key_model_tpm_limit(user_api_key_dict: UserAPIKeyAuth) -> Optional[dict]
     if user_api_key_dict.metadata:
         if "model_tpm_limit" in user_api_key_dict.metadata:
             return user_api_key_dict.metadata["model_tpm_limit"]
+    elif user_api_key_dict.model_max_budget:
+        if "tpm_limit" in user_api_key_dict.model_max_budget:
+            return user_api_key_dict.model_max_budget["tpm_limit"]

     return None

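
A standalone sketch of the per-model lookup added to `get_key_model_rpm_limit`, written against plain dicts. `limits_from_model_max_budget` is a hypothetical helper, and the dict shape follows the `model_max_budget` example in the `BudgetNewRequest` field description. It treats `tpm_limit` the same way as `rpm_limit`, which is an assumption: the TPM branch in the diff reads a top-level `tpm_limit` key instead of iterating over models.

```python
from typing import Any, Dict, Optional


def limits_from_model_max_budget(
    model_max_budget: Optional[Dict[str, Dict[str, Any]]], field: str
) -> Optional[Dict[str, Any]]:
    """Collect {model: limit} for every model whose budget sets `field`."""
    if not model_max_budget:
        return None
    return {
        model: budget[field]
        for model, budget in model_max_budget.items()
        if budget.get(field) is not None
    }


model_max_budget = {
    "gpt-4o": {"budget_duration": "1d", "tpm_limit": 1000, "rpm_limit": 1000},
    "gpt-4": {"rpm_limit": 5, "tpm_limit": None},
}
rpm = limits_from_model_max_budget(model_max_budget, "rpm_limit")
tpm = limits_from_model_max_budget(model_max_budget, "tpm_limit")
```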