AttributeError('StringIO' object has no attribute 'classifications') #1688

Open
timelesshc opened this issue Nov 19, 2024 · 11 comments
Labels: bug (Something isn't working), module-metrics (this is part of metrics module)

Comments

@timelesshc commented Nov 19, 2024

[ ] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
I'm using the latest ragas version and have been encountering the AttributeError('StringIO' object has no attribute 'classifications') error message when evaluating metrics.

I'm using the ChatGLM (Zhipu AI) APIs and wonder if there is a compatibility issue.

Ragas version: 0.2.5
Python version: 3.12

Code to Reproduce

from datasets import Dataset
import pandas as pd
from ragas import evaluate, RunConfig

from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_community.chat_models import ChatZhipuAI
from langchain_community.embeddings import ZhipuAIEmbeddings

from ragas.metrics import (
    Faithfulness,
    ResponseRelevancy,
    LLMContextRecall,
    LLMContextPrecisionWithReference,
)

def read_excel(file_path, sheet_name='Sheet1'):
    data = pd.read_excel(file_path, sheet_name=sheet_name)
    data.fillna('', inplace=True)
    data = [each_row._asdict() for each_row in data.itertuples(index=False)]
    return data

def write_xlsx(file_path, data, sheet_name='Sheet1'):
    data = pd.DataFrame(data)
    data.to_excel(file_path, sheet_name=sheet_name, index=False)

def get_dataset(data):
    questions = []
    answers = []
    contexts = []
    ground_truths = []

    for each in data:
        questions.append(each['query'])
        answers.append(each['answer'])
        contexts.append([each['context']])
        ground_truths.append(each['reference'])

    data = {
        "user_input": questions, 
        "response": answers, 
        "retrieved_contexts": contexts,
        "reference": ground_truths
    }
    dataset = Dataset.from_dict(data)
    return dataset

def setup_llm_and_embedder():
    llm = LangchainLLMWrapper(ChatZhipuAI(
        base_url="https://open.bigmodel.cn/api/paas/v4/",
        api_key="xxx",  # API Key
        model="glm-4-plus",
        max_tokens=8000,
    ))

    text_embedder = LangchainEmbeddingsWrapper(ZhipuAIEmbeddings(
        api_base="https://open.bigmodel.cn/api/paas/v4/",
        api_key="xxx",  # API Key
        model="embedding-2",
    ))
    return llm, text_embedder

if __name__ == "__main__":
    file = 'xxx'
    save_file = 'xxx'
    data = read_excel(file)
    dataset = get_dataset(data)
    llm, text_embedder = setup_llm_and_embedder()
    run_config = RunConfig(
        max_retries=5,
        max_wait=120,
        timeout=500,
        max_workers=8
    )
    result = evaluate(
        dataset=dataset,
        llm=llm,
        run_config=run_config,
        embeddings=text_embedder,
        metrics=[
            Faithfulness(llm=llm),
            ResponseRelevancy(llm=llm),
            LLMContextRecall(llm=llm),
            LLMContextPrecisionWithReference(llm=llm),
        ],
    )
    df = result.to_pandas()
    write_xlsx(save_file, df)

Error trace
Evaluating: 2%|█▍ | 14/792 [00:51<33:06, 2.55s/it]Exception raised in Job[10]: AttributeError('StringIO' object has no attribute 'classifications')
Evaluating: 5%|████▏ | 41/792 [02:50<1:32:51, 7.42s/it]Exception raised in Job[42]: AttributeError('StringIO' object has no attribute 'classifications')
Evaluating: 5%|████▎ | 42/792 [03:01<1:42:38, 8.21s/it]Exception raised in Job[46]: AttributeError('StringIO' object has no attribute 'classifications')
Evaluating: 7%|█████▋ | 54/792 [04:01<46:35, 3.79s/it]Exception raised in Job[54]: AttributeError('StringIO' object has no attribute 'classifications')
Evaluating: 7%|██████ | 58/792 [04:16<43:52, 3.59s/it]Exception raised in Job[50]: AttributeError('StringIO' object has no attribute 'classifications')
Evaluating: 8%|███████ | 67/792 [04:46<37:19, 3.09s/it]Exception raised in Job[62]: AttributeError('StringIO' object has no attribute 'classifications')

timelesshc added the bug label on Nov 19, 2024
dosubot (bot) added the module-metrics label on Nov 19, 2024
@cruiser1174

I'm also having this problem while evaluating ragas Faithfulness through the giskard.rag.evaluate function.

@Squire-tomsk commented Dec 5, 2024

It looks like this occurs because the fix_output_format_prompt object contains StringIO as its output_model instead of the type defined in the pydantic_object field of the RagasOutputParser class. I can't follow the logic of the FixOutputFormat class yet.
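
To make the suspected failure mode concrete, here is a minimal, self-contained sketch using my own stand-in types (not the actual ragas internals): if the retry/"fix output" path returns the raw string wrapper instead of re-parsing into the metric's output model, any downstream access to .classifications raises exactly this AttributeError.

from pydantic import BaseModel


class FaithfulnessVerdicts(BaseModel):
    # stand-in for the metric's real output model
    classifications: list


class StringWrapper(BaseModel):
    # stand-in for the internal "string output" wrapper type
    text: str


def broken_fix_output(raw: str) -> StringWrapper:
    # hypothetical fixer: it should re-parse `raw` into FaithfulnessVerdicts,
    # but instead hands back the raw-string wrapper unchanged
    return StringWrapper(text=raw)


result = broken_fix_output('{"classifications": ["faithful"]}')
result.classifications  # AttributeError: 'StringWrapper' object has no attribute 'classifications'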

@timelesshc (Author)

@jjmachan @shahules786
Any inputs on this issue? Thanks

@baptvit commented Dec 14, 2024

I'm also facing the same issue.

@lailanelkoussy commented Dec 16, 2024

I am also facing the same issue while evaluating ragas metrics through the giskard.rag.evaluate function.

@cruiser1174 commented Dec 16, 2024

I solved the particular problem I had while evaluating via giskard. When saving model results in an AgentAnswer object, I was consolidating all of the contexts into a single string, whereas they should be saved as a list of strings. Converting to a list of strings solved the problem for me.

Here is a relevant extract from my model class - see comments in all caps

    def wrap_rag_model(self, question: str, history=[]):
        messages = []
        for message in history:
            if message["role"] == "user":
                messages.append({"inputs": {"chat_input": message["content"]}})
            elif message["role"] == "assistant":
                messages[-1]["outputs"] = {"chat_output": message["content"]}

        # Generate a response using Azure OpenAI
        response = self.call(user_prompt=question, history=messages)

        # Ensure that documents is a list of strings
        documents = self.get_response_context(response) if self.get_response_context(response) else []
        documents = [str(d) for d in documents]
        # Instead of returning a simple string, we return the AgentAnswer object which
        # allows us to specify the retrieved context which is used by RAGAS metrics
        return AgentAnswer(
            message=self.get_response_text(response),
            documents=documents,  # HERE ENSURE DOCUMENTS IS A LIST OF STRINGS
        )

    def scan_rag_model(
        self,
        testset: QATestset,
        knowledgebase: KnowledgeBase,
        ragas_metrics: list,
    ):
        self.rag_scan = evaluate(
            self.wrap_rag_model,  # HERE THE FUNCTION MUST RETURN AN AgentAnswer object
            testset=testset,
            knowledge_base=knowledgebase,
            metrics=ragas_metrics,
            agent_description=self.description,
        )
        return self.rag_scan
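
A small, hedged helper in the same spirit (the names here are mine, not part of giskard's or ragas' API) that guards against the single-concatenated-string mistake on either path:

from typing import List, Union


def as_context_list(contexts: Union[str, List[object]]) -> List[str]:
    """Return retrieved contexts as a list of strings, never one big string."""
    if isinstance(contexts, str):
        return [contexts]              # a single string gets wrapped in a list
    return [str(c) for c in contexts]  # force every element to str


assert as_context_list("doc A\ndoc B") == ["doc A\ndoc B"]
assert as_context_list(["doc A", 42]) == ["doc A", "42"]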

@jjmachan (Member)

hey folks - taking a look at this now

@lailanelkoussy

> I solved the particular problem I had while evaluating via giskard. [...] Converting to a list of strings solved the problem for me.

I tried this in my case and it still did not work (I am using giskard)

@tim-hilde

The issue is still occurring as of 0.2.10: #1831

@andreped (Contributor) commented Jan 14, 2025

I'm seeing issues specifically with Faithfulness. The metric works perfectly fine for very simple List[str] contexts, but as the content inside the list gets increasingly complex, something goes wrong. It doesn't matter if I force-cast the content within the list to str; it still fails downstream.

It would be great if there were more verbose output on what exactly goes wrong, as it becomes quite an impossible task to debug...

EDIT: No wait, it seems to fail even with very simple List[str] contexts, and maybe it is another metric causing it... No idea what's wrong anymore...
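
For anyone else stuck at this point, one way to get more detail than the swallowed one-line messages (the "ragas" logger name is an assumption on my part) is to turn on debug logging and let evaluate() re-raise:

import logging

# surface ragas' internal log messages; the "ragas" logger name is assumed here
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("ragas").setLevel(logging.DEBUG)

# passing raise_exceptions=True to ragas.evaluate(...) also makes the first
# failing job re-raise with a full traceback instead of only printing
# "Exception raised in Job[...]"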

@andreped (Contributor) commented Jan 14, 2025

So after debugging this for way too long, I managed to get around the issue for one of my applications by computing the context_recall metric separately. I have no idea why that was the issue, but if I do something like the following, I am able to compute the metrics:

import ragas
from ragas.metrics import (
    answer_correctness,
    answer_relevancy,
    answer_similarity,
    context_precision,
    context_recall,
    faithfulness,
)

score = ragas.evaluate(
    dataset,
    metrics=[
        answer_correctness,
        faithfulness,
        answer_similarity,
        context_precision,
        answer_relevancy,
        # context_recall,   # <- DONT include this here
    ],
    llm=[...],
    embeddings=[...],
    raise_exceptions=True,
)
result = score.to_pandas()

# compute context recall separately
context_recall_score = ragas.evaluate(
    dataset,
    metrics=[context_recall],  # <- Include this metric here instead
    llm=[...],
    embeddings=[...],
    raise_exceptions=True,
)

# merge context recall score with result
result["context_recall"] = context_recall_score.to_pandas()["context_recall"]

And of course, there is no strict need to use evaluate() in the second step for this exact case, but others may hit issues with different combinations of metrics, and this could serve as a quick fix until the real issue has been resolved. Hopefully this gets resolved very soon!

Tested with ragas==0.2.10.


NOTE: This is by no means a proper fix, but rather a temporary workaround/hack.
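
If you want to skip the second evaluate() call entirely, a sketch along these lines should also work, assuming the 0.2.x SingleTurnSample / single_turn_ascore API (evaluator_llm below stands in for your wrapped LLM):

import asyncio

from ragas import SingleTurnSample
from ragas.metrics import LLMContextRecall


async def score_context_recall(evaluator_llm, rows):
    # rows: an iterable of dicts with user_input, response, retrieved_contexts, reference
    metric = LLMContextRecall(llm=evaluator_llm)
    scores = []
    for row in rows:
        sample = SingleTurnSample(
            user_input=row["user_input"],
            response=row["response"],
            retrieved_contexts=row["retrieved_contexts"],
            reference=row["reference"],
        )
        scores.append(await metric.single_turn_ascore(sample))
    return scores

# scores = asyncio.run(score_context_recall(llm, dataset))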
