
[R-254] Issue in Evaluation using local LLM #955

Open · sheetalkamthe55 opened this issue May 15, 2024 · 2 comments
Labels: linear (Created by Linear-GitHub Sync), question (Further information is requested)
sheetalkamthe55 commented May 15, 2024

- [ ] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question

I keep getting the following warning during evaluation:

```
WARNING:ragas.llms.output_parser:Failed to parse output. Returning None.
```

I added Langsmith tracing to inspect the requests and responses. For the given input prompt the model returns a blank response, so I believe this is a context-length issue. I tried different LLMs, but the error remains the same.

Code Examples

I hosted a Llama 2 model with llama-cpp-python's server. Below is the command I used:

```bash
python3 -m llama_cpp.server --model /tmp/llama_index/models/llama-13b.Q5_K_M.gguf --port 8009 --host 129.69.217.24 --chat_format llama-2
```
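For reference, the server's context window can be raised at startup; a minimal sketch, assuming llama-cpp-python's `--n_ctx` flag and an arbitrary 4096-token window:

```bash
# Same command with an explicitly larger context window (sketch; the
# 4096 value is an assumption, not from the original report).
python3 -m llama_cpp.server \
  --model /tmp/llama_index/models/llama-13b.Q5_K_M.gguf \
  --port 8009 --host 129.69.217.24 \
  --chat_format llama-2 \
  --n_ctx 4096
```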

Following is a sample testset I am using:
Ragas_dataset.csv

You can ignore the dataset part; I tried the same with

```python
fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval")
```

and the same issue persists.

Code:

```python
from datasets import load_dataset

# Load the CSV testset attached above.
dataset = load_dataset("csv", data_files="Ragas_dataset.csv")

from tqdm import tqdm
import pandas as pd
from datasets import Dataset

def create_ragas_dataset(eval_dataset):
    # Reshape each row into the column format ragas expects.
    rag_dataset = []
    for row in tqdm(eval_dataset):
        rag_dataset.append(
            {
                "question": row["question"],
                # "answer": result["answer"],  # would normally come from the RAG chain
                "answer": row["ground_truth"],
                "contexts": [row["contexts"]],
                "ground_truth": row["ground_truth"],
            }
        )
    rag_df = pd.DataFrame(rag_dataset)
    rag_eval_dataset = Dataset.from_pandas(rag_df)
    return rag_eval_dataset

basic_qa_ragas_dataset = create_ragas_dataset(dataset["train"].select(range(2)))

from langchain.chat_models import ChatOpenAI
from ragas.llms import LangchainLLMWrapper

# Point the OpenAI-compatible client at the local llama-cpp-python server.
inference_server_url = "http://localhost:8009/v1"

chat = ChatOpenAI(
    model="/tmp/llama_index/models/llama-13b.Q5_K_M.gguf",
    openai_api_key="no-key",
    openai_api_base=inference_server_url,
    max_tokens=5,  # caps every completion at 5 tokens
    temperature=0,
)

vllm = LangchainLLMWrapper(chat)

from ragas.metrics import (
    context_precision,
    faithfulness,
    context_recall,
)
from ragas.metrics.critique import harmfulness

# change the LLM
faithfulness.llm = vllm
context_precision.llm = vllm
context_recall.llm = vllm
harmfulness.llm = vllm

from ragas import evaluate

result = evaluate(
    basic_qa_ragas_dataset,
    metrics=[faithfulness],
)
result
```
(Screenshot of the evaluation output attached.)
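To narrow it down, one can call the wrapped model directly and inspect the raw completion that ragas has to parse; a minimal sketch (the prompt below is illustrative, not the actual ragas prompt):

```python
# Probe the raw completion (sketch): send one JSON-style instruction to the
# same ChatOpenAI client and print exactly what comes back.
probe = chat.invoke(
    'Answer in JSON with a single key "verdict" whose value is 1 or 0. '
    "Question: Is Berlin the capital of Germany? Answer: Yes."
)
print(repr(probe.content))  # an empty string would match the blank responses in the Langsmith trace
```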

Additional context

Please let me know if I should provide more information.

sheetalkamthe55 added the question (Further information is requested) label on May 15, 2024
jjmachan added the linear (Created by Linear-GitHub Sync) label on May 17, 2024
jjmachan changed the title from "Issue in Evaluation using local LLM" to "[R-254] Issue in Evaluation using local LLM" on May 17, 2024
jjmachan self-assigned this on May 17, 2024
pauljaspersahr commented May 29, 2024

Having the same issue with the LangChain Ollama LLM wrapper and llama3.
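For reference, a minimal sketch of that setup (assuming a local Ollama server with the stock llama3 tag pulled):

```python
# Sketch of the Ollama setup described above (the llama3 tag and local
# server are assumptions, not from the original report).
from langchain_community.chat_models import ChatOllama
from ragas.llms import LangchainLLMWrapper

ollama_chat = ChatOllama(model="llama3", temperature=0)
vllm = LangchainLLMWrapper(ollama_chat)  # same wrapper as in the original report
```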

visionKinger commented

Having the same issue with the LangChain Azure OpenAI wrapper.
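A minimal sketch of that variant (every Azure endpoint, deployment, and key value below is a placeholder):

```python
# Sketch of the Azure OpenAI variant mentioned above (all Azure values
# are placeholders, not from the original report).
from langchain_openai import AzureChatOpenAI
from ragas.llms import LangchainLLMWrapper

azure_chat = AzureChatOpenAI(
    azure_endpoint="https://<resource>.openai.azure.com/",
    azure_deployment="<deployment-name>",
    openai_api_version="2024-02-01",
    openai_api_key="<key>",
    temperature=0,
)
vllm = LangchainLLMWrapper(azure_chat)
```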
