The Feature

Current configuration

I have implemented a custom guardrail:
```python
import json
from typing import Any, Dict, List, Literal, Optional, Union

import litellm
from litellm._logging import verbose_proxy_logger
from litellm.caching.caching import DualCache
from litellm.integrations.custom_guardrail import CustomGuardrail
from litellm.proxy._types import UserAPIKeyAuth
from litellm.proxy.guardrails.guardrail_helpers import should_proceed_based_on_metadata
from litellm.types.guardrails import GuardrailEventHooks


class myCustomGuardrail(CustomGuardrail):
    def __init__(self, **kwargs):
        # store kwargs as optional_params
        self.optional_params = kwargs
        super().__init__(**kwargs)

    async def async_post_call_success_hook(
        self,
        data: dict,
        user_api_key_dict: UserAPIKeyAuth,
        response,
    ):
        """
        Runs on the response from the LLM API call.

        It can be used to reject a response:
        if the response contains invalid JSON, we raise an exception.
        """
        if isinstance(response, litellm.ModelResponse):
            for choice in response.choices:
                if isinstance(choice, litellm.Choices):
                    if isinstance(choice.message.content, str):
                        try:
                            json.loads(choice.message.content)
                        except json.JSONDecodeError as e:
                            raise ValueError(f"Invalid JSON in response content: {e}")
```
And the following custom rule:
```python
import json

from litellm._logging import verbose_proxy_logger


def my_custom_rule(input):  # receives the model response content
    try:
        verbose_proxy_logger.debug("[TEST]: input %s", input)
        json.loads(input)
        return {"decision": True}
    except json.JSONDecodeError as e:
        return {
            "decision": False,
            "message": f"Invalid JSON in response content: {e}",
        }
```
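A minimal sketch of the proxy configuration follows (the original config isn't shown here; the file names `custom_guardrail.py` and `post_call_rules.py` and the model entry are assumptions, while the `guardrails` and `post_call_rules` keys follow LiteLLM's documented config shape):

```yaml
# config.yaml -- a sketch, not the issue's original configuration
model_list:
  - model_name: openai-default-model            # the alias used in the requests below
    litellm_params:
      model: openai/gpt-4o-mini                 # placeholder upstream model
      api_key: os.environ/OPENAI_API_KEY

guardrails:
  - guardrail_name: "check-json-guard"
    litellm_params:
      guardrail: custom_guardrail.myCustomGuardrail
      mode: "post_call"

litellm_settings:
  post_call_rules: post_call_rules.my_custom_rule
```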
Problem Statement

I am using LiteLLM as a proxy server with Docker Compose to handle both JSON and plain text responses. However, there are limitations in the retry and validation mechanisms:
Guardrail Failures Stop Retries:
If a guardrail raises an exception (e.g., a ValueError due to invalid JSON), the request is stopped and an error is returned. This prevents retries, even though a subsequent request to the LLM might return a valid response.

Inflexibility in post_call_rules:
post_call_rules always run if defined, which causes issues for requests that do not require JSON validation. There is no way to:
- Skip post_call_rules for specific requests, or
- Dynamically define post_call_rules in the request body.

Mandatory JSON Schema Validation:
The current approach relies on JSON schema validation, which is unnecessary for my use case. I only need to verify that the response is valid JSON, not validate it against a specific schema.
Proposed Solutions
One of the following solutions would address the issues:
Enable Retries on Guardrail Exceptions:
Add a mechanism to allow retries when a custom guardrail raises an exception (e.g., a ValueError), so that a subsequent request can attempt to generate a valid response. This could take the form of a specific exception type, or a configuration parameter on the guardrail that permits further retrying; a sketch of the latter follows.
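A minimal sketch of what such a guardrail-level parameter might look like (the `retry_on_failure` key below is hypothetical, not an existing LiteLLM option):

```yaml
guardrails:
  - guardrail_name: "check-json-guard"
    litellm_params:
      guardrail: custom_guardrail.myCustomGuardrail
      mode: "post_call"
      retry_on_failure: true   # hypothetical flag: re-run the request per the configured retry policy
```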
Dynamic Handling of post_call_rules:
- Allow requests to skip post_call_rules entirely (e.g., "skip_post_call_rules": true).
- Explicitly define post_call_rules to trigger validation only when specified in the request body.
Lightweight JSON Validation:
Add a built-in validator that checks whether a response is valid JSON, without requiring schema validation. Requests could include a specific parameter (e.g., "response_json_format_validator": true) to enable this validation. If the response is invalid, retries should follow the configured retry rules. A sketch of such a validator follows.
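To illustrate how small such a validator could be, here is a sketch (the function name is assumed; this is not existing LiteLLM code):

```python
import json


def is_valid_json_response(content: str) -> bool:
    """Check only that the content parses as JSON; no schema is involved."""
    try:
        json.loads(content)
        return True
    except json.JSONDecodeError:
        return False
```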
Problem Solutions and Implementation Examples
Solution 1: Enable Retries on Guardrail Exceptions
Example Request 1: Expects JSON Response

```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-12345' \
--data '{
    "model": "openai-default-model",
    "messages": [
        {
            "role": "user",
            "content": "Give me the example of JSON object with a cat name and its age. Please respond in JSON. JSON:"
        }
    ],
    "guardrails": ["check-json-guard"]
}'
```
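Example Request 2: Expects Plain Text Response

The original curl for this example is not shown above; a plausible version (the prompt text is assumed) simply omits the guardrail so a plain-text reply is not rejected:

```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-12345' \
--data '{
    "model": "openai-default-model",
    "messages": [
        {
            "role": "user",
            "content": "Tell me a short story about a cat."
        }
    ]
}'
```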
Solution 2.1: Allow skip post_call_rules

Example Request 1: JSON Expected, Custom Post Call Rule
```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-12345' \
--data '{
    "model": "openai-default-model",
    "messages": [
        {
            "role": "user",
            "content": "Give me the example of JSON object with a cat name and its age. Please respond in JSON. JSON:"
        }
    ]
}'
```
Example Request 2: Plain Text, Skip Post Call Rule
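The curl for this example is not shown above; a plausible version, using the "skip_post_call_rules" parameter proposed earlier (the prompt text is assumed):

```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-12345' \
--data '{
    "model": "openai-default-model",
    "messages": [
        {
            "role": "user",
            "content": "Tell me a short story about a cat."
        }
    ],
    "skip_post_call_rules": true
}'
```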
Solution 2.2: Dynamic Handling of post_call_rules

We remove post_call_rules from litellm_settings and send them as part of the request body.
Example Request 1: JSON Expected, Custom Post Call Rule
```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-12345' \
--data '{
    "model": "openai-default-model",
    "messages": [
        {
            "role": "user",
            "content": "Give me the example of JSON object with a cat name and its age. Please respond in JSON. JSON:"
        }
    ],
    "post_call_rules": ["post_call_rules.my_custom_rule"]
}'
```
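Example Request 2: Plain Text, No Post Call Rule

The request here would be the same as above with no post_call_rules key in the body, so no validation runs for this request.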
Solution 3: Lightweight JSON Validation

If validation fails, the retry policy from the configuration, and fallbacks if they are provided, should apply.
Example Request 1: JSON Response with Lightweight Validation
```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-12345' \
--data '{
    "model": "openai-default-model",
    "messages": [
        {
            "role": "user",
            "content": "Give me the example of JSON object with a cat name and its age. Please respond in JSON. JSON:"
        }
    ],
    "response_json_format_validator": true
}'
```
Example Request 2: Plain Text Response, No Validation
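Here the request body simply omits "response_json_format_validator" (or sets it to false), so the plain-text response passes through without validation.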
Questions

- Can LiteLLM implement retries after guardrail exceptions?
- Is it feasible to dynamically define or skip post_call_rules in the request body?
- Could a lightweight JSON validator be added to validate structure without schemas?
Motivation, pitch
Motivation
The motivation for this proposal stems from practical challenges encountered while using LiteLLM as a proxy server to manage both JSON and plain text responses in a real-world application. Specifically, the existing retry and validation mechanisms have limitations that hinder flexibility and reliability in dynamic scenarios.
Key challenges include:
Inflexible Guardrail Behavior:
Currently, guardrail failures stop retries, which is suboptimal for use cases where transient issues (e.g., invalid JSON in responses) could be resolved with subsequent retries. This limits the effectiveness of LiteLLM in handling real-time, high-availability workflows.
Static post_call_rules Configuration:
The inability to dynamically enable or disable post_call_rules based on the request context introduces unnecessary overhead for requests that do not require JSON validation. For instance, plain text responses are subject to the same rules as JSON responses, leading to redundant or invalid processing.
Excessive Overhead in JSON Schema Validation:
JSON schema validation is overkill for use cases where only basic JSON structure validation is required. This increases complexity and processing time, detracting from LiteLLM's lightweight nature.
Pitch
The proposed solutions directly address these challenges by introducing three complementary features:
Retries on Guardrail Exceptions:
Allowing retries for requests even when a guardrail exception is raised ensures robustness in handling transient errors, improving reliability in production workflows.
Dynamic post_call_rules Handling:
Introducing request-level parameters to skip or define post_call_rules dynamically enables greater flexibility and efficiency. This ensures that requests are processed only with the rules they require, reducing unnecessary overhead.
Lightweight JSON Validation:
A basic JSON structure validator provides a more efficient alternative to full schema validation, catering to simpler use cases without compromising functionality.
By implementing these changes, LiteLLM can provide a more flexible and reliable proxy solution, better aligned with diverse use cases ranging from simple text processing to complex JSON handling.
This proposal directly addresses the needs of developers working on systems requiring dynamic response validation (e.g., conversational AI platforms, dynamic API integrations). It reduces operational friction and improves LiteLLM's utility in production environments.
If applicable, this proposal could tie into other GitHub issues related to retry behavior, validation mechanisms, or guardrail enhancements (please link them if available).
Are you a ML Ops Team?
No
Twitter / LinkedIn details
https://www.linkedin.com/in/alexandrphilippov/