bug: Rails do not produce fact-checking results with gpt-4o but only with gpt-3.5-turbo-instruct #899

Open
3 of 4 tasks
tb852 opened this issue Dec 6, 2024 · 0 comments
Labels
bug (Something isn't working), status: needs triage (New issues that have not yet been reviewed or categorized)

tb852 commented Dec 6, 2024

Did you check docs and existing issues?

  • I have read all the NeMo-Guardrails docs
  • I have updated the package to the latest version before submitting this issue
  • (optional) I have used the develop branch
  • I have searched the existing issues of NeMo-Guardrails

Python version (python --version)

3.11.9

Operating system/version

Ubuntu 22.04

NeMo-Guardrails version (if you must use a specific version and not the latest)

0.11.0

Describe the bug

NeMo Guardrails does not work as expected in the most basic fact-checking setup.
When I provide context to the model and enable fact checking on the output, the bot refuses to answer with gpt-4o, yet the very same code works with gpt-3.5-turbo-instruct. Does anyone else have this problem, and when is gpt-4o expected to work for fact checking?

Output of the code below when I use the gpt-4o model:
I'm sorry, I can't respond to that.

However, if I change only one line in config.yml:
model: gpt-3.5-turbo-instruct
the output is: Employees are eligible for 20 vacation days per year, accrued monthly.

Steps To Reproduce

python script:

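import os

import nest_asyncio
from nemoguardrails import LLMRails, RailsConfig

# Allow rails.generate() to run inside an already-running event loop (e.g. a notebook).
nest_asyncio.apply()

path = "my_path"  # placeholder: working directory
os.chdir(path)
nemo_config = RailsConfig.from_path("my/path/to/config")  # placeholder: config folder
os.environ["OPENAI_API_KEY"] = my_api_key  # placeholder: variable holding the OpenAI key

rails = LLMRails(nemo_config)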

response = rails.generate(messages=[{
    "role": "context",
    "content": {
        "relevant_chunks": """
            Employees are eligible for the following time off:
              * Vacation: 20 days per year, accrued monthly.
              * Sick leave: 15 days per year, accrued monthly.
              * Personal days: 5 days per year, accrued monthly.
              * Paid holidays: New Year's Day, Memorial Day, Independence Day, Thanksgiving Day, Christmas Day.
              * Bereavement leave: 3 days paid leave for immediate family members, 1 day for non-immediate family members. """
    }
},{
    "role": "user",
    "content": "How many vacation days do I have per year?"
}])
print(response['content'])

config.yml:

models:
 - type: main
   engine: openai
   model: gpt-4o

rails:
  output:
    flows:
      - self check facts

instructions:
  - type: general
    content: |
      Below is a conversation between a user and a bot.
      The bot is designed to answer users questions about their vacation information based on the provided content.
      If the answer to the question is not present in the content, the bot should respond with "I don't have enough
      information to answer this question."

prompts.yml:

prompts:
  - task: self_check_facts
    content: |-
      You are given a task to identify if the hypothesis is grounded and entailed to the evidence.
      You will only use the contents of the evidence and not rely on external knowledge.
      Answer with yes/no. "evidence": {{ evidence }} "hypothesis": {{ response }} "entails":
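
To see what the model is actually asked, the template above can be filled in by hand. This is only a minimal sketch: plain string replacement stands in for the real template rendering, and `render_self_check_prompt` is a hypothetical helper, not part of NeMo-Guardrails.

```python
# Hypothetical helper: fills the self_check_facts template from prompts.yml
# using plain string replacement (a stand-in for the real template rendering).
TEMPLATE = (
    "You are given a task to identify if the hypothesis is grounded and entailed to the evidence.\n"
    "You will only use the contents of the evidence and not rely on external knowledge.\n"
    'Answer with yes/no. "evidence": {{ evidence }} "hypothesis": {{ response }} "entails":'
)


def render_self_check_prompt(evidence: str, response: str) -> str:
    """Return the prompt text with both placeholders substituted."""
    return (
        TEMPLATE
        .replace("{{ evidence }}", evidence.strip())
        .replace("{{ response }}", response.strip())
    )


prompt = render_self_check_prompt(
    evidence="* Vacation: 20 days per year, accrued monthly.",
    response="Employees are eligible for 20 vacation days per year, accrued monthly.",
)
print(prompt)
```

Comparing the printed prompt and the raw completion for both models is one way to check whether gpt-4o and gpt-3.5-turbo-instruct answer this prompt in the same yes/no form.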

rails.co:

define subflow self check facts
  $accuracy = execute self_check_facts
  if $accuracy < 0.5
    bot refuse to respond
    stop
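
The flow above refuses whenever `$accuracy` falls below 0.5, so the outcome hinges on how the model's free-text verdict is turned into a number. The sketch below is an assumption about that step, not the library's actual code: `verdict_to_accuracy` and `should_refuse` are hypothetical names, and the "leading yes" parsing rule is just one plausible choice. It illustrates how a verdict that is not a clear "yes" would end in "bot refuse to respond".

```python
def verdict_to_accuracy(llm_reply: str) -> float:
    """Map a free-text yes/no verdict to a score (assumed rule: leading 'yes' -> 1.0)."""
    normalized = llm_reply.strip().lower()
    return 1.0 if normalized.startswith("yes") else 0.0


def should_refuse(accuracy: float, threshold: float = 0.5) -> bool:
    """Mirror the `if $accuracy < 0.5` check from rails.co."""
    return accuracy < threshold


# Replies that differ only in phrasing can land on opposite sides of the threshold.
for reply in ["yes", " Yes.", "no", "The hypothesis is entailed."]:
    score = verdict_to_accuracy(reply)
    print(f"{reply!r}: accuracy={score}, refuse={should_refuse(score)}")
```

Under this assumed rule, a verbose reply such as "The hypothesis is entailed." scores 0.0 and triggers the refusal even though it agrees with the evidence, so logging the raw verdict for each model would help narrow down the difference.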

Expected Behavior

The expected behaviour is an answer similar to the one I get with gpt-3.5-turbo-instruct:
Employees are eligible for 20 vacation days per year, accrued monthly.

Actual Behavior

When I use gpt-4o, the output is: I'm sorry, I can't respond to that.

@tb852 tb852 added bug Something isn't working status: needs triage New issues that have not yet been reviewed or categorized. labels Dec 6, 2024