The Feature

Current configuration

I have implemented a custom guardrail:
```python
import json
from typing import Any, Dict, List, Literal, Optional, Union

import litellm
from litellm._logging import verbose_proxy_logger
from litellm.caching.caching import DualCache
from litellm.integrations.custom_guardrail import CustomGuardrail
from litellm.proxy._types import UserAPIKeyAuth
from litellm.proxy.guardrails.guardrail_helpers import should_proceed_based_on_metadata
from litellm.types.guardrails import GuardrailEventHooks


class myCustomGuardrail(CustomGuardrail):
    def __init__(self, **kwargs):
        # store kwargs as optional_params
        self.optional_params = kwargs
        super().__init__(**kwargs)

    async def async_post_call_success_hook(
        self,
        data: dict,
        user_api_key_dict: UserAPIKeyAuth,
        response,
    ):
        """
        Runs on the response from the LLM API call.

        It can be used to reject a response:
        if the response contains invalid JSON, we raise an exception.
        """
        if isinstance(response, litellm.ModelResponse):
            for choice in response.choices:
                if isinstance(choice, litellm.Choices):
                    if isinstance(choice.message.content, str):
                        try:
                            json.loads(choice.message.content)
                        except json.JSONDecodeError as e:
                            raise ValueError(f"Invalid JSON in response content: {e}")
```
And the following custom rule:
```python
import json

from litellm._logging import verbose_proxy_logger


def my_custom_rule(input):  # receives the model response content
    try:
        verbose_proxy_logger.debug("[TEST]: input %s", input)
        json.loads(input)
        return {"decision": True}
    except json.JSONDecodeError as e:
        return {
            "decision": False,
            "message": f"Invalid JSON in response content: {e}",
        }
```
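A minimal sketch of the proxy configuration follows (the original config isn't shown here; the file names `custom_guardrail.py` and `post_call_rules.py` and the model entry are assumptions, while the `guardrails` and `post_call_rules` keys follow LiteLLM's documented config shape):

```yaml
# config.yaml -- a sketch, not the issue's original configuration
model_list:
  - model_name: openai-default-model            # the alias used in the requests below
    litellm_params:
      model: openai/gpt-4o-mini                 # placeholder upstream model
      api_key: os.environ/OPENAI_API_KEY

guardrails:
  - guardrail_name: "check-json-guard"
    litellm_params:
      guardrail: custom_guardrail.myCustomGuardrail
      mode: "post_call"

litellm_settings:
  post_call_rules: post_call_rules.my_custom_rule
```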
Problem Statement

I am using LiteLLM as a proxy server with Docker Compose to handle both JSON and plain text responses. However, there are limitations in the retry and validation mechanisms:
Guardrail Failures Stop Retries:
If a guardrail raises an exception (e.g., a ValueError due to invalid JSON), the request is stopped and an error is returned. This prevents retries, even though a subsequent request to the LLM might return a valid response.

Inflexibility in post_call_rules:
post_call_rules always run if defined, which causes issues for requests that do not require JSON validation. There is no way to:
- Skip post_call_rules for specific requests, or
- Dynamically define post_call_rules in the request body.

Mandatory JSON Schema Validation:
The current approach relies on JSON schema validation, which is unnecessary for my use case. I only need to verify that the response is valid JSON, not validate it against a specific schema.
Proposed Solutions
One of the following solutions would address the issues:
Enable Retries on Guardrail Exceptions:
Add a mechanism to allow retries when a custom guardrail raises an exception (e.g., a ValueError), so that a subsequent request can attempt to generate a valid response. This could take the form of a specific exception type, or a configuration parameter on the guardrail that permits further retrying; a sketch of the latter follows.
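A minimal sketch of what such a guardrail-level parameter might look like (the `retry_on_failure` key below is hypothetical, not an existing LiteLLM option):

```yaml
guardrails:
  - guardrail_name: "check-json-guard"
    litellm_params:
      guardrail: custom_guardrail.myCustomGuardrail
      mode: "post_call"
      retry_on_failure: true   # hypothetical flag: re-run the request per the configured retry policy
```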
Dynamic Handling of post_call_rules:
- Allow requests to skip post_call_rules entirely (e.g., "skip_post_call_rules": true).
- Explicitly define post_call_rules to trigger validation only when specified in the request body.
Lightweight JSON Validation:
Add a built-in validator that checks whether a response is valid JSON, without requiring schema validation. Requests could include a specific parameter (e.g., "response_json_format_validator": true) to enable this validation. If the response is invalid, retries should follow the configured retry rules. A sketch of such a validator follows.
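To illustrate how small such a validator could be, here is a sketch (the function name is assumed; this is not existing LiteLLM code):

```python
import json


def is_valid_json_response(content: str) -> bool:
    """Check only that the content parses as JSON; no schema is involved."""
    try:
        json.loads(content)
        return True
    except json.JSONDecodeError:
        return False
```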
Problem Solutions and Implementation Examples
Solution 1: Enable Retries on Guardrail Exceptions
Example Request 1: Expects JSON Response

```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-12345' \
--data '{
    "model": "openai-default-model",
    "messages": [
        {
            "role": "user",
            "content": "Give me the example of JSON object with a cat name and its age. Please respond in JSON. JSON:"
        }
    ],
    "guardrails": ["check-json-guard"]
}'
```
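Example Request 2: Expects Plain Text Response

The original curl for this example is not shown above; a plausible version (the prompt text is assumed) simply omits the guardrail so a plain-text reply is not rejected:

```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-12345' \
--data '{
    "model": "openai-default-model",
    "messages": [
        {
            "role": "user",
            "content": "Tell me a short story about a cat."
        }
    ]
}'
```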
Solution 2.1: Allow skip post_call_rules

Example Request 1: JSON Expected, Custom Post Call Rule
```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-12345' \
--data '{
    "model": "openai-default-model",
    "messages": [
        {
            "role": "user",
            "content": "Give me the example of JSON object with a cat name and its age. Please respond in JSON. JSON:"
        }
    ]
}'
```
Example Request 2: Plain Text, Skip Post Call Rule
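The curl for this example is not shown above; a plausible version, using the "skip_post_call_rules" parameter proposed earlier (the prompt text is assumed):

```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-12345' \
--data '{
    "model": "openai-default-model",
    "messages": [
        {
            "role": "user",
            "content": "Tell me a short story about a cat."
        }
    ],
    "skip_post_call_rules": true
}'
```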
Solution 2.2: Dynamic Handling of post_call_rules

We remove post_call_rules from litellm_settings and send them as part of the request body.
Example Request 1: JSON Expected, Custom Post Call Rule
```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-12345' \
--data '{
    "model": "openai-default-model",
    "messages": [
        {
            "role": "user",
            "content": "Give me the example of JSON object with a cat name and its age. Please respond in JSON. JSON:"
        }
    ],
    "post_call_rules": ["post_call_rules.my_custom_rule"]
}'
```
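Example Request 2: Plain Text, No Post Call Rule

The request here would be the same as above with no post_call_rules key in the body, so no validation runs for this request.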
Solution 3: Lightweight JSON Validation

If validation fails, the retry policy from the configuration, and fallbacks if they are provided, should apply.
Example Request 1: JSON Response with Lightweight Validation
```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-12345' \
--data '{
    "model": "openai-default-model",
    "messages": [
        {
            "role": "user",
            "content": "Give me the example of JSON object with a cat name and its age. Please respond in JSON. JSON:"
        }
    ],
    "response_json_format_validator": true
}'
```
Example Request 2: Plain Text Response, No Validation
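Here the request body simply omits "response_json_format_validator" (or sets it to false), so the plain-text response passes through without validation.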
Questions

- Can LiteLLM implement retries after guardrail exceptions?
- Is it feasible to dynamically define or skip post_call_rules in the request body?
- Could a lightweight JSON validator be added to validate structure without schemas?
Motivation, pitch
Motivation
The motivation for this proposal stems from practical challenges encountered while using LiteLLM as a proxy server to manage both JSON and plain text responses in a real-world application. Specifically, the existing retry and validation mechanisms have limitations that hinder flexibility and reliability in dynamic scenarios.
Key challenges include:
Inflexible Guardrail Behavior:
Currently, guardrail failures stop retries, which is suboptimal for use cases where transient issues (e.g., invalid JSON in responses) could be resolved with subsequent retries. This limits the effectiveness of LiteLLM in handling real-time, high-availability workflows.
Static post_call_rules Configuration:
The inability to dynamically enable or disable post_call_rules based on the request context introduces unnecessary overhead for requests that do not require JSON validation. For instance, plain text responses are subject to the same rules as JSON responses, leading to redundant or invalid processing.
Excessive Overhead in JSON Schema Validation:
JSON schema validation is overkill for use cases where only basic JSON structure validation is required. This increases complexity and processing time, detracting from LiteLLM's lightweight nature.
Pitch
The proposed solutions directly address these challenges by introducing three complementary features:
Retries on Guardrail Exceptions:
Allowing retries for requests even when a guardrail exception is raised ensures robustness in handling transient errors, improving reliability in production workflows.
Dynamic post_call_rules Handling:
Introducing request-level parameters to skip or define post_call_rules dynamically enables greater flexibility and efficiency. This ensures that requests are processed only with the rules they require, reducing unnecessary overhead.
Lightweight JSON Validation:
A basic JSON structure validator provides a more efficient alternative to full schema validation, catering to simpler use cases without compromising functionality.
By implementing these changes, LiteLLM can provide a more flexible and reliable proxy solution, better aligned with diverse use cases ranging from simple text processing to complex JSON handling.
This proposal directly addresses the needs of developers working on systems requiring dynamic response validation (e.g., conversational AI platforms, dynamic API integrations). It reduces operational friction and improves LiteLLM's utility in production environments.
If applicable, this proposal could tie into other GitHub issues related to retry behavior, validation mechanisms, or guardrail enhancements (please link them if available).
Are you a ML Ops Team?
No
Twitter / LinkedIn details
https://www.linkedin.com/in/alexandrphilippov/