Integration with Databricks serving endpoint #902

Open · 1 task done
azuretime opened this issue Dec 12, 2024 · 0 comments
Labels: enhancement (New feature or request), status: needs triage (New issues that have not yet been reviewed or categorized)

Comments


azuretime commented Dec 12, 2024

Did you check the docs?

  • I have read all the NeMo-Guardrails docs

Is your feature request related to a problem? Please describe.

Hi, the docs do not explain how to use NeMo-Guardrails with a Databricks serving endpoint. What is the correct way to use these endpoints with a Databricks access token?

Describe the solution you'd like

I have an existing RAG chain and want to use NeMo-Guardrails to filter input. Models in my notebook are loaded with Databricks(endpoint_name="endpoint_name", max_tokens=..., temperature=...) and ChatDatabricks(endpoint_name="endpoint_name", max_tokens=..., temperature=...). The models I use include Llama 3 and Databricks DBRX serving endpoints.

from langchain_community.chat_models import ChatDatabricks
from langchain_community.llms import Databricks
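
For reference, this is roughly how the models are instantiated in the notebook (the endpoint names and parameter values below are placeholders, not my real configuration):

chat_llm = ChatDatabricks(
    endpoint_name="my-chat-endpoint",  # placeholder endpoint name
    max_tokens=500,
    temperature=0.1,
)
llm = Databricks(
    endpoint_name="my-completion-endpoint",  # placeholder endpoint name
    max_tokens=500,
    temperature=0.1,
)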

I know it's possible to use RunnableRails or "guardrails | some_chain" (LangChain integration), but I want the self check input step to run before the retrieval step inside the chain. That is, if self check input decides the request should be blocked (Yes), the chain should reply with a default answer without retrieving any context. So how can I load the LLM from "endpoint_name" to run the input check inside the chain?

rails:
  input:
    flows:
      - self check input

Using the above rail, if the request passes the check, two tasks are performed:

Summary: 2 LLM call(s) took 5.13 seconds.

  1. Task self_check_input took 2.46 seconds.
  2. Task general took 2.67 seconds.

How can I get the intermediate result from task 1 (self_check_input) and skip the general task, so that I can decide whether to return my default answer (blocked) or start retrieving context (not blocked)?
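
A rough sketch of what I'm hoping is possible, assuming a recent NeMo-Guardrails version where the LLM can be passed directly to LLMRails and generation options can restrict execution to the input rails; the endpoint name, config path, and the DEFAULT_ANSWER / rag_chain names are placeholders for my setup, not confirmed API usage:

from langchain_community.chat_models import ChatDatabricks
from nemoguardrails import LLMRails, RailsConfig

# Placeholder endpoint name; the real one comes from my Databricks workspace.
chat_llm = ChatDatabricks(endpoint_name="my-llama3-endpoint", max_tokens=500, temperature=0.1)

# Config directory containing config.yml with the "self check input" flow and its prompt.
config = RailsConfig.from_path("./config")
rails = LLMRails(config, llm=chat_llm)

def guarded_answer(user_query: str) -> str:
    # Run only the input rails, so the "general" task is never triggered.
    result = rails.generate(
        messages=[{"role": "user", "content": user_query}],
        options={"rails": ["input"]},
    )
    # Assumption: when the input rail blocks the request, the returned message
    # differs from the original user message (e.g. the configured refusal text).
    blocked = result.response[0]["content"] != user_query
    if blocked:
        return DEFAULT_ANSWER            # placeholder default reply
    return rag_chain.invoke(user_query)  # placeholder existing RAG chain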

Describe alternatives you've considered

Also, is something like the method below possible?
In config.yml:

models:
  - type: main
    engine: "databricks_endpoint"
    model: "meta-llama-3-8b-instruct"
    headers:
      Authorization: "Bearer <your-access-token>"
    parameters:
      endpoint_url: https://..../serving-endpoints/meta-llama-3-8b-instruct/invocations
      task: "chat"
      model_kwargs:
        temperature: 0.1
        max_length: 500
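
If there is no built-in engine like this, would registering the LangChain Databricks class as a custom provider be the recommended route? A minimal sketch of what I mean, assuming register_llm_provider accepts the LangChain community class and that "databricks_endpoint" is an engine name I would make up myself:

from langchain_community.llms import Databricks
from nemoguardrails.llm.providers import register_llm_provider

# Make the LangChain Databricks LLM available under a custom engine name,
# so config.yml can reference engine: "databricks_endpoint".
register_llm_provider("databricks_endpoint", Databricks)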

Additional context

No response

@azuretime added the enhancement and status: needs triage labels on Dec 12, 2024