🐢 Open-Source Evaluation & Testing for LLMs and ML models
Updated Jul 3, 2024 - Python
Test your prompts, agents, and RAGs. Use LLM evals to improve your app's quality and catch problems. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
AI Observability & Evaluation
UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, and embedding use cases), perform root cause analysis on failure cases, and give insights on how to resolve them.
Python SDK for running evaluations on LLM generated responses
Generate ideal question-answer pairs for testing RAG
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
A benchmark comparing Russian counterparts of ChatGPT: Saiga, YandexGPT, Gigachat
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
🎯 Your free LLM evaluation toolkit for assessing factual accuracy, context understanding, tone, and more, so you can see how well your LLM applications perform.
Code for "Prediction-Powered Ranking of Large Language Models", arXiv 2024.
TypeScript SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Generative agents: computational software agents, built with OpenAI LLMs, that simulate believable human behavior. Our main focus was to develop a game, “Werewolves of Miller’s Hollow”, that aims to replicate human-like behavior.
The prompt engineering, prompt management, and prompt evaluation tool for TypeScript, JavaScript, and Node.js.
The prompt engineering, prompt management, and prompt evaluation tool for Python
The prompt engineering, prompt management, and prompt evaluation tool for Kotlin.
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
The prompt engineering, prompt management, and prompt evaluation tool for Ruby.
The prompt engineering, prompt management, and prompt evaluation tool for C# and .NET