Making Asynchronous API Requests - Colang 2.x in Firebase Environment #756

Open
grupocopa opened this issue Sep 16, 2024 · 9 comments

@grupocopa

grupocopa commented Sep 16, 2024

Context

I am encountering an issue while testing the NeMo Guardrails Framework Colang 2.x on a platform that utilizes a Firebase database. I have full control over events and thread IDs. My goal is to perform API requests to interact with my models within NeMo, similar to the capabilities in Colang 1.0, allowing asynchronous requests from different users with their respective userIds and threadIds.

Steps to Reproduce

1. System Configuration:

Operating System: Windows 10
NeMo Guardrails Version: 0.9.0
Colang Version: 2.x
Firebase: Used for managing events and threads
Virtual Machine: Compute Engine on Google Cloud Platform (GCP) for handling asynchronous requests

2. Functional Colang 1.0 Setup:

Utilizes asynchronous API and HTTP requests to communicate with actions.

flows.co:

define flow main

    user express start_automation_forms1
    bot express disparo_mensagem

    when user express positive
        bot ask how many leads
        when user respond positive
            bot ask losing sales because can't follow up
            when user respond positive
                execute desativa_bot(threadid=$threadid)
            else when user respond negative
                execute desativa_bot(threadid=$threadid)

actions.py:

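import httpx
from nemoguardrails.actions import action


@action()
async def desativa_bot(threadid):
    # Build the deactivation endpoint for this conversation thread.
    url = "https://example.net/desativaBot/" + threadid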

    async with httpx.AsyncClient() as client:
        try:
            response = await client.put(url)
            response.raise_for_status()  # Raises exception for HTTP error codes

            try:
                data = response.json()
            except ValueError:
                return {"error": "Invalid JSON response from API"}

            return data

        except httpx.RequestError as exc:
            return {"error": f"An error occurred while requesting {exc.request.url!r}."}
        except httpx.HTTPStatusError as exc:
            return {"error": f"Error response {exc.response.status_code} while requesting {exc.request.url!r}."}

Postman POST HTTP REQUEST:

{
    "config_id": "agent-c2",
    "document_id": "teste-v8.10",
    "thread_id": "teste-v8.10",
    "messages": [
        {
            "content": {
                "instancia": "testeMegav1",
                "threadid": "testeMegav1-phoneNumber"
            },
            "role": "context"
        },
        {
            "content": "olá",
            "role": "user"
        }
    ]
}
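For anyone reproducing this without Postman, the same request can be sketched in Python. This is a minimal sketch using only the standard library. The base URL `http://localhost:8000` is an assumption about where the server runs (the `/v1/chat/completions` path is the one that appears in the server log in this report), and `build_chat_request` and `post_chat` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: the guardrails server is reachable here; adjust to your deployment.
BASE_URL = "http://localhost:8000"


def build_chat_request(config_id, thread_id, context, user_text, document_id=None):
    """Build a JSON body shaped like the Postman request above (hypothetical helper)."""
    body = {
        "config_id": config_id,
        "thread_id": thread_id,
        "messages": [
            {"content": context, "role": "context"},
            {"content": user_text, "role": "user"},
        ],
    }
    if document_id is not None:
        body["document_id"] = document_id
    return body


def post_chat(body, base_url=BASE_URL, timeout=30):
    """POST the body to /v1/chat/completions and return the decoded JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Same values as the Postman request; nothing is sent until post_chat(body) is called.
body = build_chat_request(
    config_id="agent-c2",
    thread_id="teste-v8.10",
    context={"instancia": "testeMegav1", "threadid": "testeMegav1-phoneNumber"},
    user_text="olá",
    document_id="teste-v8.10",
)
```

Calling `post_chat(body)` twice in a row with the same `thread_id` should reproduce the second-interaction failure described below.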

3. Testing with Colang 2.x:

Initial tests using nemoguardrails chat CLI worked flawlessly.
When executing the same requests via Postman, an error occurs after the first interaction.

Entered verbose mode.
<google.cloud.firestore_v1.client.Client object at 0x0000025E63DA8670>
Fetching 5 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<?, ?it/s]
Processing event StartFlow {'flow_id': 'main'}
Processing event {'type': 'ContextUpdate', 'data': {'instancia': 'testeMegav1', 'threadid': 'testeMegav1-phoneNumber'}}
Processing event {'type': 'UtteranceUserActionFinished', 'final_transcript': 'olá'}
UtteranceBotActionFinished
{"action_uid": "d6c4...", "action_name": "UtteranceBotAction", "status": "success", "is_success": true, "return_value": null, "events": [], "final_script": "Oi, tudo bem com voc\u00ea? Voc\u00ea usa automa\u00e7\u00e3o no WhatsApp?"}
Total processing took 0.15 seconds. LLM Stats: 0 total calls, 0 total time, 0 total tokens, 0 total prompt tokens, 0 total completion tokens, [] as latencies
INFO:     127.0.0.1:56319 - "POST /v1/chat/completions HTTP/1.1" 200 OK
ERROR:nemoguardrails.server.api:Providing `assistant` messages as input is not supported for Colang 2.0 configurations.
Traceback (most recent call last):
  File "C:\Users\Caio Moreno\Desktop\nemo-v3-production\venv\lib\site-packages\nemoguardrails\server\api.py", line 337, in chat_completion
    res = await llm_rails.generate_async(
  File "C:\Users\Caio Moreno\Desktop\nemo-v3-production\venv\lib\site-packages\nemoguardrails\rails\llm\llmrails.py", line 629, in generate_async
    events = self._get_events_for_messages(messages, state)
  File "C:\Users\Caio Moreno\Desktop\nemo-v3-production\venv\lib\site-packages\nemoguardrails\rails\llm\llmrails.py", line 490, in _get_events_for_messages       
    raise ValueError(
ValueError: Providing `assistant` messages as input is not supported for Colang 2.0 configurations.

Questions:

  1. Is this issue still considered a bug in Colang 2.x?
  2. Is there a way to perform asynchronous API requests in Colang 2.x similar to Colang 1.0, allowing multiple users with their respective userIds and threadIds?
  3. Are there any additional configurations required to support asynchronous requests in the Colang 2.x environment?

Thank you very much for all the support and for this incredible tool, which is helping us scale our solutions :)

@Pouyanpi
Collaborator

Hi @grupocopa

The error is "Providing `assistant` messages as input is not supported for Colang 2.0 configurations." The cause is using thread_id (i.e., the DataStore) with Colang 2.x, which is not supported as of now. Try it without thread_id and it should work.
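To illustrate why the first request succeeds and the next one fails, here is a rough sketch. It assumes that, when a thread_id is given, the stored history (including the bot's earlier reply) ends up in the message list alongside the new user message; that is an interpretation of the explanation above, not verified against the server code. `unsupported_roles` is a hypothetical helper, not part of NeMo Guardrails; it only mirrors the condition named in the error message.

```python
def unsupported_roles(messages, rejected=("assistant",)):
    """Return the indexes of messages whose role is in `rejected`."""
    return [i for i, m in enumerate(messages) if m.get("role") in rejected]


# First call: only context and user messages are present.
first_call = [
    {"role": "context", "content": {"threadid": "testeMegav1-phoneNumber"}},
    {"role": "user", "content": "olá"},
]

# Second call on the same thread: assumed to carry the stored bot reply as well.
second_call = first_call + [
    {"role": "assistant", "content": "Oi, tudo bem com você?"},
    {"role": "user", "content": "sim"},
]

print(unsupported_roles(first_call))   # → []
print(unsupported_roles(second_call))  # → [2]
```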

@Pouyanpi Pouyanpi self-assigned this Sep 17, 2024
@Pouyanpi Pouyanpi added the enhancement New feature or request label Sep 17, 2024
@grupocopa
Author

Hey @Pouyanpi , thank you for your prompt response!

I have a question: Without a threadId, how can different users maintain the conversation flow?

In my case, I have a platform with numerous users, and I use asynchronous requests to deliver responses to them. How can I keep each user's conversation coherent and organized, and preserve the Colang flow state, without relying on a threadId?

Thank you for your assistance!

@grupocopa
Author

One more data point: I am trying to do the same thing with Colang 1.0, where we can have multiple users with asynchronous requests, but it's hard to maintain flow control and the consistency of bot and user intents for each response.

@Pouyanpi, do you have any hints, directions, or opinions on how to solve my case? Or can only Colang 2.x fix these problems?

Thank you very much!!

@Pouyanpi
Collaborator

> Hey @Pouyanpi , thank you for your prompt response!
>
> I have a question: Without a threadId, how can different users maintain the conversation flow?
>
> In my case, I have a platform with numerous users, and I use asynchronous requests to deliver responses to them. How can I keep each user's conversation coherent and organized, and preserve the Colang flow state, without relying on a threadId?
>
> Thank you for your assistance!

Hi @grupocopa, apologies for my delayed reply. Since you need bot responses (i.e., messages with the role "assistant"), it is not possible to use thread_id with a Colang 2.x configuration. I tagged this issue as an enhancement and will probably open a PR that might resolve it.
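Until thread_id is supported, one possible direction is to keep per-conversation state on the caller's side, keyed by your own user or thread identifier, and to serialize requests per conversation so concurrent messages from the same user cannot overwrite each other. The sketch below only shows that bookkeeping. Whether the server can accept and return a state object for Colang 2.x is not confirmed in this thread, so the state is treated as opaque; `ConversationStore` and `count_turn` are hypothetical names.

```python
import asyncio
from collections import defaultdict


class ConversationStore:
    """In-memory, per-conversation state with one lock per conversation key."""

    def __init__(self):
        self._states = {}
        self._locks = defaultdict(asyncio.Lock)

    async def update(self, key, step):
        """Run `step(old_state)` under the key's lock and store its result."""
        async with self._locks[key]:
            new_state = await step(self._states.get(key))
            self._states[key] = new_state
            return new_state


async def demo():
    store = ConversationStore()

    async def count_turn(state):
        # Stand-in for one backend round trip; yields control like real I/O would.
        await asyncio.sleep(0)
        return {"turns": (state or {"turns": 0})["turns"] + 1}

    # Two users with interleaved requests: each key keeps its own state.
    await asyncio.gather(
        store.update("userA-thread1", count_turn),
        store.update("userB-thread1", count_turn),
        store.update("userA-thread1", count_turn),
    )
    return store._states


states = asyncio.run(demo())
print(states["userA-thread1"])  # → {'turns': 2}
print(states["userB-thread1"])  # → {'turns': 1}
```

The per-key lock is the design choice that matters here: without it, the two `userA-thread1` requests would both read the same old state and one update would be lost. An in-memory dictionary does not survive a restart, so a real deployment would swap it for a durable store.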

@Pouyanpi
Collaborator

> it's hard to maintain the flow control and the consistency of Bot and User intents for each response.

Would you explain the issue you are facing a bit more?

@grupocopa
Author

Hey @Pouyanpi, how are you? Apologies for the delayed response. Our platform has been growing rapidly, with more and more clients joining, and we're working to integrate NeMo Guardrails into our system.

Regarding your previous inquiries, there are quite a few points to address, and I'm not sure if it's appropriate to go over all of them in this thread. But here we go:

1 - Resetting threadid when there is a bug or after server restart
Whenever the server restarts or when a bug occurs with the threadid, we need to manually delete all the thread IDs for the system to work again. This results in a repetitive manual task, and more importantly, we lose all user state information, which impacts the user experience and continuity of conversations.

2 - Bot occasionally stops responding to users and occasional failure of requests
Sometimes the bot suddenly stops responding to users without any clear reason. The only way we've found to fix this is by resetting the threadid. This might indicate an issue with session management or how the agent handles active threads. Some requests do not work as expected. Unfortunately, we don't get any clear error messages to help us debug the problem.

3 - Events to control the flow, tags to track user behavior, and a "status" assigned to users to control flows
We would like more robust control over conversation flow using events and tags to better understand how users are behaving throughout their customer journey. We believe the Event Channel feature introduced in Colang 2 addresses this problem effectively, but we need more guidance on fully implementing it on Colang 1.

We would also like the ability to assign a “status” to users, allowing us to create different flows based on their previous actions or history. This would greatly enhance the personalization of our interactions. We believe this can also be solved using the Event Channel feature.

4 - Accessing multiple KBs in the same agent via asynchronous requests
In our use case, we need the ability for the agent to access different knowledge bases (KBs), such as with a medical agent that provides user-specific information. We would appreciate a solution that allows us to handle asynchronous requests to multiple KBs within the same agent.

Point 3 of the documentation, "Using a custom EmbeddingSearchProvider", may be the way to have multiple knowledge bases fed from an external database, but I can't find clear examples of how to implement it.

5 - Collecting sensitive data without sending it to the LLM
We need a way to collect sensitive user data without sending it to the LLM. Are there any recommended configurations to achieve this securely within Guardrails?
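One generic approach to point 5, independent of any Guardrails configuration, is to mask sensitive values before the text reaches the model and keep the originals on your side. A minimal sketch, assuming the sensitive values can be found with regular expressions; the patterns and the `redact` helper are hypothetical and illustrative only.

```python
import re

# Illustrative patterns only; real deployments need patterns for their own data.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text):
    """Replace matches with placeholders; return the masked text and the found values."""
    found = {}

    for label, pattern in PATTERNS.items():
        def _swap(match, label=label):
            # Number placeholders in order of discovery across all patterns.
            key = f"<{label}_{len(found) + 1}>"
            found[key] = match.group(0)
            return key

        text = pattern.sub(_swap, text)
    return text, found


masked, vault = redact("Contact me at ana@example.com or +55 11 91234-5678.")
print(masked)  # → Contact me at <EMAIL_1> or <PHONE_2>.
```

Only `masked` would be forwarded to the model; `vault` stays in your own storage so an action can restore the real values afterwards.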

We are open to suggestions and would greatly appreciate any feedback or improvements you can offer to help us address these challenges. Again, thank you very much for all your support and assistance!

@grupocopa
Author

Hello @Pouyanpi, how are you doing? I hope all is well!

Is there any update on the release date for the asynchronous API request functionality in our systems? We currently have around 200 users eagerly awaiting this update. 😁

As part of the NVIDIA Inception program, we recently attended an in-person event to build more connections within the NeMo ecosystem and gather support.

Thank you very much for your attention!

@Pouyanpi
Collaborator

Hi @grupocopa, it might be available in the v0.12.0 release. Unfortunately, it won't be part of 0.11.0, which will be released soon. I'll keep you posted.

@grupocopa
Author

Hey @Pouyanpi, thank you!! We're quickly growing our customer base in the healthcare market, approaching ~600 users, and Colang 2.x has features we need to gain more control over the rails for our customers. Thank you very much for your attention!
