[R-259] Which is the best LLM for evaluation? #981
Comments
Hey @yadavshashank, this is a very interesting analysis, and it is a problem we have been thinking about too. Fundamentally, what matters is which of these LLMs best aligns automated scoring with your own manual scoring. We could address this by adding some form of UI component before automated scoring that lets developers do some manual checking and confirm that the scores align with their judgment. We will be working in this direction, but that will only come later. Would love to hop on a call and help/chat with you if you're open. My cal is here.
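In the meantime, the alignment check described above can be roughed out by hand. The following is only a minimal sketch, not part of Ragas: it assumes you have manually scored a small sample and collected each judge LLM's scores for the same samples. The names `judge_a` and `judge_b`, the helper functions, and all numbers are hypothetical placeholders.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(vx * vy)


def mean_abs_diff(xs, ys):
    """Average absolute gap between manual and automated scores."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def rank_judges(manual, judge_scores):
    """Sort judges by how closely their scores track the manual ones."""
    results = {
        name: (pearson(manual, scores), mean_abs_diff(manual, scores))
        for name, scores in judge_scores.items()
    }
    # Highest correlation with the manual scores comes first.
    return sorted(results.items(), key=lambda kv: kv[1][0], reverse=True)


# Hypothetical data: manual scores and two judges' scores for 5 samples.
manual = [0.9, 0.2, 0.7, 0.4, 1.0]
judge_scores = {
    "judge_a": [0.8, 0.3, 0.6, 0.5, 0.9],
    "judge_b": [0.5, 0.6, 0.4, 0.7, 0.5],
}

for name, (corr, gap) in rank_judges(manual, judge_scores):
    print(f"{name}: correlation={corr:.2f}, mean abs diff={gap:.2f}")
```

Correlation shows whether a judge ranks samples the way you do, while the mean absolute difference shows how far off its raw scores are. A judge can do well on one and poorly on the other, so it is worth looking at both.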
I checked the documentation and related resources and couldn't find an answer to my question.
Your Question
Do RAGAS prompts work equally well with other LLMs like Claude 3 Sonnet and Llama 3? If not, which model should I choose?
Also, is there a way to print and modify the prompts?
Additional context
Radar chart comparison of scores:
![ragas_radar_model_comp](https://private-user-images.githubusercontent.com/16662309/332429267-e18ae327-ae4e-43d2-9a0f-ad0c80183c8b.PNG?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkxMDEyNzQsIm5iZiI6MTcxOTEwMDk3NCwicGF0aCI6Ii8xNjY2MjMwOS8zMzI0MjkyNjctZTE4YWUzMjctYWU0ZS00M2QyLTlhMGYtYWQwYzgwMTgzYzhiLlBORz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIzVDAwMDI1NFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTBkYjQ0YzQ3YjVhZGJjNDg0ZTFjNDBmZjY3YjMzMTA5Y2UwNjNjMjk0MGU3Y2ZjZjhhZjA0YjM3NzhmODhjYjUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.xPalnvjH6L4CjlWSgpBtcPjKub3G0Reabu-u6Q9ibOg)
R-259