
[Feature]: Add Retry Logic for Guardrails or Allow Skip Post Call Rules or Add Response Format Validator with Type JSON #7320

Open
aleksandrphilippov opened this issue Dec 20, 2024 · 1 comment
Labels: enhancement (New feature or request)
The Feature

Current configuration

I have implemented a custom guardrail:

import json

import litellm
from litellm.integrations.custom_guardrail import CustomGuardrail
from litellm.proxy._types import UserAPIKeyAuth

class myCustomGuardrail(CustomGuardrail):
    def __init__(self, **kwargs):
        # store kwargs as optional_params
        self.optional_params = kwargs
        super().__init__(**kwargs)

    async def async_post_call_success_hook(
        self,
        data: dict,
        user_api_key_dict: UserAPIKeyAuth,
        response,
    ):
        """
        Runs on response from LLM API call

        It can be used to reject a response

        If a response contains invalid JSON -> we will raise an exception
        """
        if isinstance(response, litellm.ModelResponse):
            for choice in response.choices:
                if isinstance(choice, litellm.Choices):
                    if isinstance(choice.message.content, str):
                        try:
                            # parse-only check; the parsed value is not needed
                            json.loads(choice.message.content)
                        except json.JSONDecodeError as e:
                            raise ValueError(f"Invalid JSON in response content: {e}")

And the following custom rule:

import json
from litellm._logging import verbose_proxy_logger

def my_custom_rule(input):  # receives the model response
    try:
        verbose_proxy_logger.debug("[TEST]: input %s", input)
        json.loads(input)  # parse-only check; the parsed value is not needed
        return {"decision": True}
    except json.JSONDecodeError as e:
        return {
            "decision": False,
            "message": f"Invalid JSON in response content: {e}"
        }
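
For reference, calling the rule directly (outside the proxy) behaves as follows:

# Quick local check of the rule:
print(my_custom_rule('{"name": "Tom", "age": 3}'))
# -> {'decision': True}
print(my_custom_rule("Sure! Here is a cat named Tom."))
# -> {'decision': False, 'message': 'Invalid JSON in response content: ...'}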

Here is the configuration:

model_list:
  - model_name: openai-default-model
    litellm_params:
      model: gpt-4o
      api_key: os.environ/OPENAI_API_KEY

guardrails:
  - guardrail_name: "check-json-guard"
    litellm_params:
      guardrail: custom_guardrail.myCustomGuardrail
      mode: "post_call"

litellm_settings:
  json_logs: true
  num_retries: 3
  retry_after: 1
  request_timeout: 600
  disable_cooldowns: true
  failure_callback: ["sentry"]
  redact_user_api_key_info: true
  post_call_rules: post_call_rules.my_custom_rule

Problem Statement

I am using LiteLLM as a proxy server with Docker Compose to handle both JSON and plain text responses. However, there are limitations in the retry and validation mechanisms:

  1. Guardrail Failures Stop Retries:

    • If a guardrail raises an exception (e.g., ValueError due to invalid JSON), the request is stopped and an error is returned. This prevents retries, even though a retried request to the LLM might return a valid response.
  2. Inflexibility in post_call_rules:

    • post_call_rules always runs if defined, which causes issues for requests that do not require JSON validation.
    • There is no way to:
      • Skip post_call_rules for specific requests.
      • Or dynamically define post_call_rules in the request body.
  3. JSON Schema Validation Is Overkill:

    • The existing approach relies on JSON schema validation, which is unnecessary for my use case. I only need to verify that the response is valid JSON, not validate it against a specific schema.

Proposed Solutions

Any one of the following solutions would address these issues:

  1. Enable Retries on Guardrail Exceptions:

    • Add a mechanism that allows retries when a custom guardrail raises an exception (e.g., ValueError), so that subsequent attempts can produce a valid response. This could take the form of a dedicated exception type, or a guardrail configuration parameter that opts into retrying.
  2. Dynamic Handling of post_call_rules:

    • Introduce a parameter in the request body to:
      • Skip post_call_rules entirely (e.g., "skip_post_call_rules": true).
      • Explicitly define post_call_rules in the request body, so validation is triggered only when specified.
  3. Lightweight JSON Validation:

    • Add a built-in validator that checks whether a response is valid JSON, without requiring schema validation. Requests could include a specific parameter (e.g., "response_json_format_validator": true) to enable it. If the response is invalid, retries should follow the configured retry rules.

Implementation Examples

Solution 1: Enable Retries on Guardrail Exceptions

guardrails:
  - guardrail_name: "check-json-guard"
    litellm_params:
      guardrail: custom_guardrail.myCustomGuardrail
      mode: "post_call"
      allow_retries_on_guardrail_failure: true

litellm_settings:
  num_retries: 3
  retry_after: 1
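
A minimal sketch of the retry flow this flag could enable; allow_retries_on_guardrail_failure, GuardrailRaisedException, call_llm, and run_guardrails are hypothetical names used only to illustrate the idea, not existing LiteLLM APIs:

import asyncio

class GuardrailRaisedException(Exception):
    """Hypothetical marker exception for retryable guardrail failures."""

async def completion_with_guardrail_retries(
    call_llm,          # coroutine that performs the actual LLM call
    run_guardrails,    # coroutine that raises on guardrail failure
    num_retries: int = 3,
    retry_after: float = 1.0,
    allow_retries_on_guardrail_failure: bool = True,
):
    last_err = None
    for attempt in range(num_retries + 1):
        response = await call_llm()
        try:
            await run_guardrails(response)
            return response
        except GuardrailRaisedException as e:
            if not allow_retries_on_guardrail_failure:
                raise  # today's behavior: guardrail failure ends the request
            last_err = e
            await asyncio.sleep(retry_after)
    raise last_err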

Example Request 1: Expects JSON Response

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-12345' \
--data '{
  "model": "openai-default-model",
  "messages": [
    {
      "role": "user",
      "content": "Give me the example of JSON object with a cat name and its age. Please respond in JSON. JSON:"
    }
  ],
  "guardrails": ["check-json-guard"],
}'

Example Request 2: Expects Plain Text Response

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-12345' \
--data '{
  "model": "openai-default-model",
  "messages": [
    {
      "role": "user",
      "content": "How are you?"
    }
  ]
}'

Solution 2.1: Allow Skipping post_call_rules

Example Request 1: JSON Expected, Custom Post Call Rule

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-12345' \
--data '{
  "model": "openai-default-model",
  "messages": [
    {
      "role": "user",
      "content": "Give me the example of JSON object with a cat name and its age. Please respond in JSON. JSON:"
    }
  ]
}'

Example Request 2: Plain Text, Skip Post Call Rule

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-12345' \
--data '{
  "model": "openai-default-model",
  "messages": [
    {
      "role": "user",
      "content": "How are you?"
    }
  ],
  "skip_post_call_rules": true
}'
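
Proxy-side, honoring the flag could be a one-line guard before the rules run; skip_post_call_rules is the proposed (not yet existing) request parameter:

def should_run_post_call_rules(request_body: dict) -> bool:
    # Proposed opt-out: the default (False) preserves today's behavior,
    # where configured post_call_rules always run.
    return not request_body.get("skip_post_call_rules", False)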

Solution 2.2: Dynamic Handling of post_call_rules

Here, post_call_rules is removed from litellm_settings and sent as part of the request body instead.

Example Request 1: JSON Expected, Custom Post Call Rule

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-12345' \
--data '{
  "model": "openai-default-model",
  "messages": [
    {
      "role": "user",
      "content": "Give me the example of JSON object with a cat name and its age. Please respond in JSON. JSON:"
    }
  ],
  "post_call_rules": ["post_call_rules.my_custom_rule"]
}'

Example Request 2: Plain Text, No Post Call Rule

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-12345' \
--data '{
  "model": "openai-default-model",
  "messages": [
    {
      "role": "user",
      "content": "How are you?"
    }
  ]
}'
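
Resolving a dotted path like "post_call_rules.my_custom_rule" from the request body could reuse standard dynamic-import machinery. A sketch, assuming the module is importable from the proxy's working directory:

import importlib

def resolve_rule(dotted_path: str):
    """Turn "post_call_rules.my_custom_rule" into a callable."""
    module_path, _, func_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, func_name)

# for path in request_body.get("post_call_rules", []):
#     rule = resolve_rule(path)
#     result = rule(response_text)

In practice the proxy would likely restrict this to an allow-list of pre-registered rules, since importing arbitrary dotted paths supplied by clients is a security risk.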

Solution 3: Lightweight JSON Validation

If validation fails, the retry policy from the configuration, and fallbacks if provided, should apply.

Example Request 1: JSON Response with Lightweight Validation

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-12345' \
--data '{
  "model": "openai-default-model",
  "messages": [
    {
      "role": "user",
      "content": "Give me the example of JSON object with a cat name and its age. Please respond in JSON. JSON:"
    }
  ],
  "response_json_format_validator": true
}'

Example Request 2: Plain Text Response, No Validation

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-12345' \
--data '{
  "model": "openai-default-model",
  "messages": [
    {
      "role": "user",
      "content": "How are you?"
    }
  ]
}'
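
The validator itself is tiny; a sketch of what the proposed response_json_format_validator parameter could map to internally:

import json

def is_valid_json(text: str) -> bool:
    """Parse-only check: no schema, just well-formedness."""
    try:
        json.loads(text)
        return True
    except (json.JSONDecodeError, TypeError):
        return False

On failure, the proxy would raise a retryable error so the configured num_retries and fallbacks take over.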

Questions

  • Can LiteLLM implement retries after guardrail exceptions?
  • Is it feasible to dynamically define or skip post_call_rules in the request body?
  • Could a lightweight JSON validator be added to validate structure without schemas?

Motivation, pitch

Motivation

The motivation for this proposal stems from practical challenges encountered while using LiteLLM as a proxy server to manage both JSON and plain text responses in a real-world application. Specifically, the existing retry and validation mechanisms have limitations that hinder flexibility and reliability in dynamic scenarios.

Key challenges include:

  1. Inflexible Guardrail Behavior:

    • Currently, guardrail failures stop retries, which is suboptimal for use cases where transient issues (e.g., invalid JSON in responses) could be resolved with subsequent retries. This limits the effectiveness of LiteLLM in handling real-time, high-availability workflows.
  2. Static post_call_rules Configuration:

    • The inability to dynamically enable or disable post_call_rules based on the request context introduces unnecessary overhead for requests that do not require JSON validation. For instance, plain text responses are subject to the same rules as JSON responses, leading to redundant or invalid processing.
  3. Excessive Overhead in JSON Schema Validation:

    • JSON schema validation is overkill for use cases where only basic JSON structure validation is required. This increases complexity and processing time, detracting from LiteLLM's lightweight nature.

Pitch

The proposed solutions directly address these challenges by introducing three complementary features:

  1. Retries on Guardrail Exceptions:

    • Allowing retries for requests even when a guardrail exception is raised ensures robustness in handling transient errors, improving reliability in production workflows.
  2. Dynamic post_call_rules Handling:

    • Introducing request-level parameters to skip or define post_call_rules dynamically enables greater flexibility and efficiency. This ensures that requests are processed only with the rules they require, reducing unnecessary overhead.
  3. Lightweight JSON Validation:

    • A basic JSON structure validator provides a more efficient alternative to full schema validation, catering to simpler use cases without compromising functionality.

By implementing these changes, LiteLLM can provide a more flexible and reliable proxy solution, better aligned with diverse use cases ranging from simple text processing to complex JSON handling.

This proposal directly addresses the needs of developers working on systems requiring dynamic response validation (e.g., conversational AI platforms, dynamic API integrations). It reduces operational friction and improves LiteLLM's utility in production environments.

If applicable, this proposal could tie into other GitHub issues related to retry behavior, validation mechanisms, or guardrail enhancements (please link them if available).

Are you a ML Ops Team?

No

Twitter / LinkedIn details

https://www.linkedin.com/in/alexandrphilippov/

aleksandrphilippov (Author) commented:

@krrishdholakia Could you please check it? Feel free to ask any questions about it.
