
Testset generation ValueError: invalid literal for int() with base 10: #966

Open
choshiho opened this issue May 17, 2024 · 5 comments
Labels: bug (Something isn't working)

choshiho commented May 17, 2024

[ ] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug

Calling testset = generator.generate_with_langchain_docs(...) terminates unexpectedly with:
if int(i) - 1 < len(current_nodes.nodes)
       ^^^^^^
ValueError: invalid literal for int() with base 10:
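The failure is that the model returns prose where ragas expects a numeric context index, so int() has nothing to parse. A hypothetical minimal illustration (not taken from the actual run):

# ragas asks the LLM for numeric context indices like ["1", "2"],
# but the model answers with free text, so int() raises:
relevant_context_indices = ["A: Adam bought 2 boxes of chocolate candy ..."]
int(relevant_context_indices[0])  # ValueError: invalid literal for int() with base 10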

Ragas version: 0.1.7
Python version: 3.11.7

Code to Reproduce
First, I deployed Qwen1.5-7B-Chat-GPTQ-Int8 with vLLM using the following command:

CUDA_VISIBLE_DEVICES=1 python -m vllm.entrypoints.openai.api_server --served-model-name Qwen1.5-7B-Chat-GPTQ-Int8 --model /home/zhifeng.zhao/.cache/modelscope/hub/qwen/Qwen1___5-7B-Chat-GPTQ-Int8 --max-model-len 18576

Then I ran the following code in a Jupyter notebook:

from langchain_openai import ChatOpenAI
from langchain_community.embeddings import SentenceTransformerEmbeddings

# LLM served by vLLM through its OpenAI-compatible endpoint
chat = ChatOpenAI(
    # streaming=True,
    verbose=True,
    openai_api_key='EMPTY',
    openai_api_base='http://localhost:8000/v1',
    model_name="Qwen1.5-7B-Chat-GPTQ-Int8",
    temperature=0.0,
    max_tokens=2048,  # maximum number of tokens to generate
    openai_proxy='',
)
# local bge-small-en-v1.5 embeddings
embedding_function = SentenceTransformerEmbeddings(model_name="/home/zhifeng.zhao/.cache/modelscope/hub/AI-ModelScope/bge-small-en-v1___5")

from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

langchain_llm = LangchainLLMWrapper(chat)
langchain_embeddings = LangchainEmbeddingsWrapper(embedding_function)

from ragas.testset.generator import TestsetGenerator

generator_llm = langchain_llm  # note: defined but unused; the generator below is wired directly
critic_llm = langchain_llm

# generator with custom llm and embeddings
generator = TestsetGenerator.from_langchain(
    generator_llm=chat,
    critic_llm=langchain_llm,
    embeddings=langchain_embeddings,
)

# default extractor
from ragas.testset.extractor import KeyphraseExtractor
from langchain.text_splitter import TokenTextSplitter
# default DocumentStore
from ragas.testset.docstore import InMemoryDocumentStore

# init the DocumentStore with your own llm and embeddings
splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=100)
keyphrase_extractor = KeyphraseExtractor(llm=langchain_llm)
docstore = InMemoryDocumentStore(  # note: built here but never passed to the generator above
    splitter=splitter,
    embeddings=langchain_embeddings,
    extractor=keyphrase_extractor,
)

from ragas.testset.prompts import (
    context_scoring_prompt,
    evolution_elimination_prompt,
    filter_question_prompt,
)
from langchain_community.document_loaders import DirectoryLoader
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

# remove demonstrations from examples
for prompt in [
    context_scoring_prompt,
    evolution_elimination_prompt,
    filter_question_prompt,
]:
    prompt.examples = []

from ragas.testset.filters import QuestionFilter, EvolutionFilter, NodeFilter

qa_filter = QuestionFilter(langchain_llm, filter_question_prompt)
node_filter = NodeFilter(langchain_llm, context_scoring_prompt=context_scoring_prompt)
evolution_filter = EvolutionFilter(langchain_llm, evolution_elimination_prompt)

distributions = {simple: 0.5, reasoning: 0.25, multi_context: 0.25}

# customise the filters
from ragas.testset.evolutions import ComplexEvolution

for evolution in distributions:
    if evolution.question_filter is None:
        evolution.question_filter = qa_filter
    if evolution.node_filter is None:
        evolution.node_filter = node_filter

    if isinstance(evolution, ComplexEvolution):
        if evolution.evolution_filter is None:
            evolution.evolution_filter = evolution_filter

loader = DirectoryLoader("/home/zhifeng.zhao/prompt-engineering-guide-papers", glob="*.pdf")
documents = loader.load()

for document in documents:
    document.metadata["filename"] = document.metadata["source"]

# keep only documents longer than 5000 words
documents = [doc for doc in documents if len(doc.page_content.split()) > 5000]

# generator = TestsetGenerator.with_openai(chunk_size=512)
testset = generator.generate_with_langchain_docs(
    documents[:10],
    test_size=10,
    raise_exceptions=False,
    with_debugging_logs=False,
    distributions=distributions,
)
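
If generation succeeds, the returned TestDataset can be sanity-checked as a DataFrame (a minimal sketch using ragas's to_pandas accessor):

# inspect the generated question / contexts / ground_truth rows
df = testset.to_pandas()
print(df.head())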

Error trace

Runner in Executor raised an exception
Traceback (most recent call last):
  File "/home/zhifeng.zhao/anaconda3/lib/python3.11/site-packages/ragas/executor.py", line 79, in _aresults
    r = await future
        ^^^^^^^^^^^^
  File "/home/zhifeng.zhao/anaconda3/lib/python3.11/asyncio/tasks.py", line 615, in _wait_for_one
    return f.result()  # May raise f.exception().
           ^^^^^^^^^^
  File "/home/zhifeng.zhao/anaconda3/lib/python3.11/site-packages/ragas/executor.py", line 38, in sema_coro
    return await coro
           ^^^^^^^^^^
  File "/home/zhifeng.zhao/anaconda3/lib/python3.11/site-packages/ragas/executor.py", line 112, in wrapped_callable_async
    return counter, await callable(*args, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhifeng.zhao/anaconda3/lib/python3.11/site-packages/ragas/testset/evolutions.py", line 144, in evolve
    return await self.generate_datarow(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhifeng.zhao/anaconda3/lib/python3.11/site-packages/ragas/testset/evolutions.py", line 210, in generate_datarow
    selected_nodes = [
                     ^
  File "/home/zhifeng.zhao/anaconda3/lib/python3.11/site-packages/ragas/testset/evolutions.py", line 213, in <listcomp>
    if int(i) - 1 < len(current_nodes.nodes)
       ^^^^^^
ValueError: invalid literal for int() with base 10: 'A: Adam bought 2 boxes of chocolate candy and 5 boxes of caramel candy. If each box has 4 pieces inside it, how much candy did he have total?'
[identical traceback repeated for three more samples; only the unparseable literal differs]
ValueError: invalid literal for int() with base 10: '1. In the context of the model PaLM-540B, self-consistency aids in error repair by ensuring that reasoning paths generated by the model remain coherent and consistent with the ground truth. This is d
ValueError: invalid literal for int() with base 10: 'A: Let’s think step by step. Adam bought 2 boxes of chocolate candy and 5 boxes of caramel candy. Each box of candy has 4 pieces inside it. So, Adam bought 10 pieces of candy. Therefore, the answer (
ValueError: invalid literal for int() with base 10: '2. Adam bought 2 boxes of chocolate candy and 5 boxes of caramel candy. If each box has 4 pieces inside it, how much candy did he have total? (GT : 28)'
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.

Expected behavior
TestsetGenerator.generate_with_langchain_docs() returns a TestDataset object with 10 elements.

Additional context
I had already edited /home/zhifeng.zhao/anaconda3/lib/python3.11/site-packages/ragas/testset/evolutions.py as suggested in issue #900:

selected_nodes = [
    current_nodes.nodes[int(i) - 1]
    for i in relevant_context_indices
    if int(i) - 1 < len(current_nodes.nodes)
]
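
That guard still evaluates int(i) inside the filter, so any non-numeric index raises before the bounds check runs. A minimal defensive sketch of the idea (my own workaround, not an official ragas fix; _safe_index is a hypothetical helper):

def _safe_index(i):
    """Return a 0-based node index, or None when `i` is not a plain integer."""
    s = str(i).strip()
    return int(s) - 1 if s.isdigit() else None

selected_nodes = [
    current_nodes.nodes[idx]
    for idx in (_safe_index(i) for i in relevant_context_indices)
    if idx is not None and idx < len(current_nodes.nodes)
]

This skips unparseable indices instead of crashing the whole run.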

jjmachan (Member) commented Jun 1, 2024

This seems like a model-specific issue with how we parse the outputs. It's something we are aware of and will be fixing in the coming weeks, but sadly it's not an easy fix.

The easier fix is to use a more capable model. I was curious why you aren't using models like GPT-4 or Claude for your use case?

choshiho (Author) commented Jun 4, 2024

Thank you for your reply! I want to find open-source large language models that can support our private-deployment scenarios. Does ragas have a list of supported open-source LLMs to choose from as a critic model, or can we pick one from the open-source LLM list and use it for test set generation?

jjmachan (Member) commented Jun 4, 2024

The recommendation is to try something as powerful as GPT-4, because at that scale models are much more steerable with prompts.

Something else you can try is our custom critic model: https://docs.ragas.io/en/stable/howtos/customisations/ragas_custom_model.html
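
For reference, wiring a locally served open-source model in as the critic follows the same ChatOpenAI + LangchainLLMWrapper pattern as your Qwen setup above (the endpoint and model name below are placeholders, not a recommendation):

from langchain_openai import ChatOpenAI
from ragas.llms import LangchainLLMWrapper

# placeholder endpoint/model: any OpenAI-compatible server hosting an open-source LLM
critic_chat = ChatOpenAI(
    openai_api_key="EMPTY",
    openai_api_base="http://localhost:8000/v1",
    model_name="your-open-source-model",
    temperature=0.0,
)
critic_llm = LangchainLLMWrapper(critic_chat)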

If you want help using it and setting it up, let me know @choshiho.

choshiho (Author) commented Jun 6, 2024


Thank you! It seems the official ragas critic model from https://docs.ragas.io/en/stable/howtos/customisations/ragas_custom_model.html can't handle Chinese. Is there an open-source Chinese critic model you can recommend?

jjmachan (Member) commented Jun 6, 2024

Unfortunately, there is no open-source Chinese critic model at present. The best option would be a proprietary model that handles Chinese well (GPT-4, Claude, etc.); using Azure OpenAI could also help.

Alternatively, we might be able to help you fine-tune a model, but that would be custom work we would have to charge for.

What do you think?
