Integration with Databricks serving endpoint #902

Open · 1 task done
azuretime opened this issue Dec 12, 2024 · 0 comments
Labels: enhancement (New feature or request), status: needs triage (New issues that have not yet been reviewed or categorized)

Comments


azuretime commented Dec 12, 2024

Did you check the docs?

  • I have read all the NeMo-Guardrails docs

Is your feature request related to a problem? Please describe.

Hi, the docs do not explain how to use NeMo-Guardrails with a Databricks serving endpoint. What is the correct way to use these endpoints with a Databricks access token?

Describe the solution you'd like

I have an existing RAG chain and want to use NeMo-Guardrails to filter input. Models in my notebook are loaded with Databricks(endpoint_name="endpoint_name", max_tokens=..., temperature=...) and ChatDatabricks(endpoint_name="endpoint_name", max_tokens=..., temperature=...). The models I use include Llama 3 and Databricks DBRX serving endpoints.

from langchain_community.chat_models import ChatDatabricks
from langchain_community.llms import Databricks
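
For reference, this is roughly how the models are instantiated in the notebook (the endpoint names and parameter values below are placeholders, not my real configuration):

chat_llm = ChatDatabricks(
    endpoint_name="my-chat-endpoint",  # placeholder endpoint name
    max_tokens=500,
    temperature=0.1,
)
llm = Databricks(
    endpoint_name="my-completion-endpoint",  # placeholder endpoint name
    max_tokens=500,
    temperature=0.1,
)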

I know it's possible to use RunnableRails or "guardrails | some_chain" (LangChain integration), but I want the self check input step to run before the retrieval step inside the chain. That is, if self check input decides the request should be blocked (Yes), the chain should reply with a default answer without retrieving any context. So how can I load the LLM from "endpoint_name" to run the input check inside the chain?

rails:
  input:
    flows:
      - self check input

Using the above rail, if the request passes the check, two tasks are performed:

Summary: 2 LLM call(s) took 5.13 seconds.

  1. Task self_check_input took 2.46 seconds.
  2. Task general took 2.67 seconds.

How can I get the intermediate result from task 1 (self_check_input) and skip the general task, so that I can decide whether to return my default answer (blocked) or start retrieving context (not blocked)?
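
A rough sketch of what I'm hoping is possible, assuming a recent NeMo-Guardrails version where the LLM can be passed directly to LLMRails and generation options can restrict execution to the input rails; the endpoint name, config path, and the DEFAULT_ANSWER / rag_chain names are placeholders for my setup, not confirmed API usage:

from langchain_community.chat_models import ChatDatabricks
from nemoguardrails import LLMRails, RailsConfig

# Placeholder endpoint name; the real one comes from my Databricks workspace.
chat_llm = ChatDatabricks(endpoint_name="my-llama3-endpoint", max_tokens=500, temperature=0.1)

# Config directory containing config.yml with the "self check input" flow and its prompt.
config = RailsConfig.from_path("./config")
rails = LLMRails(config, llm=chat_llm)

def guarded_answer(user_query: str) -> str:
    # Run only the input rails, so the "general" task is never triggered.
    result = rails.generate(
        messages=[{"role": "user", "content": user_query}],
        options={"rails": ["input"]},
    )
    # Assumption: when the input rail blocks the request, the returned message
    # differs from the original user message (e.g. the configured refusal text).
    blocked = result.response[0]["content"] != user_query
    if blocked:
        return DEFAULT_ANSWER            # placeholder default reply
    return rag_chain.invoke(user_query)  # placeholder existing RAG chain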

Describe alternatives you've considered

Also, is something like the method below possible?
In config.yml:

models:
  - type: main
    engine: "databricks_endpoint"
    model: "meta-llama-3-8b-instruct"
    headers:
      Authorization: "Bearer <your-access-token>"
    parameters:
      endpoint_url: https://..../serving-endpoints/meta-llama-3-8b-instruct/invocations
      task: "chat"
      model_kwargs:
        temperature: 0.1
        max_length: 500
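
If there is no built-in engine like this, would registering the LangChain Databricks class as a custom provider be the recommended route? A minimal sketch of what I mean, assuming register_llm_provider accepts the LangChain community class and that "databricks_endpoint" is an engine name I would make up myself:

from langchain_community.llms import Databricks
from nemoguardrails.llm.providers import register_llm_provider

# Make the LangChain Databricks LLM available under a custom engine name,
# so config.yml can reference engine: "databricks_endpoint".
register_llm_provider("databricks_endpoint", Databricks)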

Additional context

No response

@azuretime added the enhancement and status: needs triage labels on Dec 12, 2024