We offer a comprehensive set of notebooks that demonstrate how to use Vertex AI LLM Evaluation Services in conjunction with other Vertex AI services. We also provide notebooks that explain the theory behind the evaluation metrics.
Computation-Based Evaluation:
- Workflow for Evaluating LLM Performance in a Text Classification Task using Gemini and Vertex AI SDK
- LLM Evaluation Workflow for a Classification Task using a Tuned Model and Vertex AI SDK
- LLM Evaluation Workflow for a Classification Task using Gemini and Vertex AI Pipelines
- Complete LLM Model Evaluation Workflow for Classification using KFP Pipelines
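The computation-based workflows above rely on the Vertex AI SDK's evaluation API. The sketch below is a minimal, hypothetical illustration of that pattern, assuming the `vertexai.evaluation.EvalTask` interface and illustrative column names (`response`, `reference`); the project ID and region are placeholders, and running the evaluation requires a Google Cloud project with the Vertex AI API enabled.

```python
# Illustrative records pairing model responses with ground-truth labels
# for a small classification set (column names are an assumption here).
records = [
    {"response": "positive", "reference": "positive"},
    {"response": "negative", "reference": "negative"},
    {"response": "positive", "reference": "negative"},
]

# A computation-based metric: exact string match against the reference label.
metrics = ["exact_match"]

def run_eval(records, metrics, project, location="us-central1"):
    """Run a computation-based EvalTask (needs Vertex AI API access)."""
    import pandas as pd
    import vertexai
    from vertexai.evaluation import EvalTask

    vertexai.init(project=project, location=location)
    task = EvalTask(dataset=pd.DataFrame(records), metrics=metrics)
    return task.evaluate()
```

The notebooks walk through the same idea end to end, including tuned models and pipeline-orchestrated variants.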
Evaluation of RAG Systems:
- Evaluating Retrieval Augmented Generation (RAG) Systems
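RAG evaluation typically scores the retrieval step separately from the generation step. As a minimal, self-contained illustration (not code from the notebook), the retrieval side is often measured with recall@k, the fraction of relevant documents that appear among the top-k retrieved results:

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant documents found in the top-k retrieved list."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)

# Example: 2 of the 3 relevant documents appear in the top 4 results.
score = recall_at_k(["d1", "d5", "d2", "d7"], relevant=["d1", "d2", "d9"], k=4)
```

The notebook covers this and further RAG-specific criteria such as the faithfulness of generated answers to the retrieved context.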
Theory notebooks:
- Metrics for Classification
- Metrics for Summarization
- Metrics for Text Generation
- Metrics for Q&A
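For the classification case, the core metrics reduce to counting true/false positives and negatives. The following is a small self-contained sketch (not taken from the notebooks) of accuracy, precision, recall, and F1 for a binary task:

```python
from collections import Counter

def classification_metrics(predictions, references, positive="positive"):
    """Accuracy, precision, recall, and F1 for a binary classification task."""
    counts = Counter()
    for pred, ref in zip(predictions, references):
        if pred == positive and ref == positive:
            counts["tp"] += 1      # predicted positive, actually positive
        elif pred == positive:
            counts["fp"] += 1      # predicted positive, actually negative
        elif ref == positive:
            counts["fn"] += 1      # predicted negative, actually positive
        else:
            counts["tn"] += 1      # predicted negative, actually negative

    accuracy = (counts["tp"] + counts["tn"]) / len(references)
    pred_pos = counts["tp"] + counts["fp"]
    actual_pos = counts["tp"] + counts["fn"]
    precision = counts["tp"] / pred_pos if pred_pos else 0.0
    recall = counts["tp"] / actual_pos if actual_pos else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# With 1 TP, 1 FP, 1 FN, 1 TN, every metric works out to 0.5.
scores = classification_metrics(
    ["positive", "positive", "negative", "negative"],
    ["positive", "negative", "negative", "positive"],
)
```

The theory notebooks derive these quantities and extend the discussion to summarization, text generation, and Q&A metrics, where overlap- and similarity-based scores replace exact label matching.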
To run the walkthroughs and demonstrations in the notebooks, you'll need access to a Google Cloud project with the Vertex AI API enabled.
If you have any questions or find any problems, please report them through GitHub issues.