bug: Rails do not produce fast-checking results with gpt-4o but only with gpt-3.5-turbo-instruct #899

tb852 · 2024-12-06T14:03:34Z

Did you check docs and existing issues?

I have read all the NeMo-Guardrails docs
I have updated the package to the latest version before submitting this issue
(optional) I have used the develop branch
I have searched the existing issues of NeMo-Guardrails

Python version (python --version)

3.11.9

Operating system/version

Ubuntu 22.04

NeMo-Guardrails version (if you must use a specific version and not the latest

0.11.0

Describe the bug

NeMo guardrails don't work as they should with the most vanilla case for fact checking.
For instance when I provide the context to the model and I want to fact-check it, it doesn't answer when using gpt-4o model, however, the very same code works when using gpt-3.5-turbo-instruct. Does anyone else has those problems and when it the gpt-4o model predicted to be working for fact checking?

Output to the code below if I use gpt-4o model:
I'm sorry, I can't respond to that.

However, if I change only one line of code in config.yml:
model: gpt-3.5-turbo-instruct
The output is: `Employees are eligible for 20 vacation days per year, accrued monthly.

Steps To Reproduce

python script:

import nest_asyncio
from nemoguardrails import RailsConfig, LLMRails
import os
import openai
nest_asyncio.apply()
path="my_path"
os.chdir(path)
nemo_config = RailsConfig.from_path("my/path/to/config")
os.environ["OPENAI_API_KEY"] = my_api_key

rails = LLMRails(nemo_config)

response = rails.generate(messages=[{
    "role": "context",
    "content": {
        "relevant_chunks": """
            Employees are eligible for the following time off:
              * Vacation: 20 days per year, accrued monthly.
              * Sick leave: 15 days per year, accrued monthly.
              * Personal days: 5 days per year, accrued monthly.
              * Paid holidays: New Year's Day, Memorial Day, Independence Day, Thanksgiving Day, Christmas Day.
              * Bereavement leave: 3 days paid leave for immediate family members, 1 day for non-immediate family members. """
    }
},{
    "role": "user",
    "content": "How many vacation days do I have per year?"
}])
print(response['content'])

config.yml:

models:
 - type: main
   engine: openai
   model: gpt-4o

rails:
  output:
    flows:
      - self check facts

instructions:
  - type: general
    content: |
      Below is a conversation between a user and a bot.
      The bot is designed to answer users questions about their vacation information based on the provided content.
      If the answer to the question is not present in the content, the bot should respond with "I don't have enough
      information to answer this question."

prompts.yml:

prompts:
  - task: self_check_facts
    content: |-
      You are given a task to identify if the hypothesis is grounded and entailed to the evidence.
      You will only use the contents of the evidence and not rely on external knowledge.
      Answer with yes/no. "evidence": {{ evidence }} "hypothesis": {{ response }} "entails":`

rails.co:

define subflow self check facts
  $accuracy = execute self_check_facts
  if $accuracy < 0.5
    bot refuse to respond
    stop

Expected Behavior

The expected behaviour should be an answer similar to this one (like in case when I use gpt-3.5-turbo-instruct):
Employees are eligible for 20 vacation days per year, accrued monthly.

Actual Behavior

When I use gpt-4o, the output is: I'm sorry, I can't respond to that.

The text was updated successfully, but these errors were encountered:

tb852 added bug Something isn't working status: needs triage New issues that have not yet been reviewed or categorized. labels Dec 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bug: Rails do not produce fast-checking results with gpt-4o but only with gpt-3.5-turbo-instruct #899

bug: Rails do not produce fast-checking results with gpt-4o but only with gpt-3.5-turbo-instruct #899

tb852 commented Dec 6, 2024 •

edited

Loading

bug: Rails do not produce fast-checking results with gpt-4o but only with gpt-3.5-turbo-instruct #899

bug: Rails do not produce fast-checking results with gpt-4o but only with gpt-3.5-turbo-instruct #899

Comments

tb852 commented Dec 6, 2024 • edited Loading

Did you check docs and existing issues?

Python version (python --version)

Operating system/version

NeMo-Guardrails version (if you must use a specific version and not the latest

Describe the bug

Steps To Reproduce

Expected Behavior

Actual Behavior

tb852 commented Dec 6, 2024 •

edited

Loading