Performance Test Strategy #41
Comments
Possible training data with questions about Integreat content: https://huggingface.co/datasets/digitalfabrik/integreat-qa |
Tests based on 9f57f80 (llama3.1:8b, skip questions with no matching documents, chunking at h2 tags)
|
Does not always yield a result. It seems that in 1 of 4 cases the message is not classified as a question that requires an answer. |
Another interesting prompt:
{
  "answer": "I don't know. The provided context does not mention cinemas or movie showings in Munich.",
  "sources": [
    "/muenchen/en/culture-leisure-sport/general-information/",
    "/muenchen/en/culture-leisure-sport/be-creative/youth-theatre-workshop-in-the-bellevue-di-monaco/",
    "/muenchen/en/culture-leisure-sport/meet-people/meetings-in-the-neighbourhood/"
  ],
  "details": [
    {
      "source": "/muenchen/en/culture-leisure-sport/be-creative/youth-theatre-workshop-in-the-bellevue-di-monaco/",
      "score": 0.7928134202957153
    },
    {
      "source": "/muenchen/en/culture-leisure-sport/general-information/",
      "score": 0.855070948600769
    },
    {
      "source": "/muenchen/en/culture-leisure-sport/meet-people/meetings-in-the-neighbourhood/",
      "score": 1.0023198127746582
    }
  ],
  "status": "success"
} |
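To compare responses like this one across test runs, it may help to reduce each response to a few fields. The following is a minimal sketch, assuming the response shape shown above (`answer`, `details`, `status`). The helper name `summarize_response` is hypothetical, and the thread does not say whether a lower or a higher `score` means a closer match, so the ascending sort is only an assumption.

```python
import json


def summarize_response(raw):
    """Reduce one JSON response to the fields that matter for evaluation."""
    response = json.loads(raw)
    # Assumption: sort ascending by score; flip with reverse=True if higher is better.
    details = sorted(response.get("details", []), key=lambda d: d["score"])
    answer = response.get("answer", "")
    return {
        "status": response.get("status"),
        # True when the model declined to answer although sources were retrieved.
        "declined": answer.strip().lower().startswith("i don't know"),
        "ranked_sources": [(d["source"], round(d["score"], 3)) for d in details],
    }


if __name__ == "__main__":
    # Shortened copy of the response above (two of the three detail entries).
    raw = json.dumps({
        "answer": "I don't know. The provided context does not mention cinemas or movie showings in Munich.",
        "details": [
            {"source": "/muenchen/en/culture-leisure-sport/general-information/",
             "score": 0.855070948600769},
            {"source": "/muenchen/en/culture-leisure-sport/meet-people/meetings-in-the-neighbourhood/",
             "score": 1.0023198127746582},
        ],
        "status": "success",
    })
    print(summarize_response(raw))
```

Counting how often `declined` is true while `ranked_sources` is non-empty would separate retrieval problems from answer-generation problems.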
Another test question with frequent bad results:
|
Edit: see #61 (comment). Another observation: the chunking (and chunk encoding) might be problematic as well. |
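To inspect what chunking at h2 tags produces for a given page, a small standalone splitter can be useful. This is only a sketch using Python's standard `html.parser`; it is not the project's actual chunking code, and the names `H2Chunker` and `chunk_at_h2` are hypothetical.

```python
from html.parser import HTMLParser


class H2Chunker(HTMLParser):
    """Collect page text into chunks, starting a new chunk at every <h2>."""

    def __init__(self):
        super().__init__()
        # One list of text fragments per chunk; the first holds text before any <h2>.
        self.chunks = [[]]

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.chunks.append([])

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks[-1].append(text)


def chunk_at_h2(html):
    """Return the non-empty chunks of an HTML string as plain-text strings."""
    parser = H2Chunker()
    parser.feed(html)
    parser.close()
    return [" ".join(parts) for parts in parser.chunks if parts]


if __name__ == "__main__":
    # Made-up page content, for illustration only.
    page = "<p>Intro</p><h2>Cinema</h2><p>Films on Friday.</p><h2>Sport</h2><p>Football.</p>"
    for chunk in chunk_at_h2(page):
        print(len(chunk), chunk)
```

Printing chunk lengths next to the text makes very short or very long chunks easy to spot, which is one way to check whether the chunking itself contributes to bad answers.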
Evaluation of the answers can be found in https://nextcloud.tuerantuer.org/index.php/f/6552668 |
We want to do performance testing on our different modules:
Three of the above-mentioned components should be held fixed while we change one of them and test different approaches to it with our benchmark questions.
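The "hold everything fixed, vary one component" idea can be sketched as a small generator of test configurations. The component names and alternative values below are placeholders for illustration only (apart from `llama3.1:8b` and h2 chunking, which are mentioned in the comments); the function name `one_at_a_time` is hypothetical.

```python
def one_at_a_time(baseline, alternatives):
    """Yield (component, config) pairs that differ from the baseline in exactly one component."""
    for component, options in alternatives.items():
        for option in options:
            if option == baseline[component]:
                continue  # skip the baseline value itself
            config = dict(baseline)
            config[component] = option
            yield component, config


if __name__ == "__main__":
    # Placeholder component names and values, not the project's real module list.
    baseline = {
        "model": "llama3.1:8b",
        "chunking": "h2",
        "embedding": "embedding-a",
        "prompt": "prompt-v1",
    }
    alternatives = {
        "model": ["model-b"],
        "chunking": ["h3", "fixed-size"],
    }
    for component, config in one_at_a_time(baseline, alternatives):
        print(f"vary {component}: {config}")
```

Running every benchmark question against each generated configuration, plus the baseline, keeps any change in answer quality attributable to a single component.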
Benchmark questions in order of their priority and based on our user stories:
Extended benchmark questions based on persona "Iryna":
Extended benchmark questions not based on personas:
How should we judge the quality of answers?
Additional test languages: