Performance Test Strategy #41

svenseeberg · 2024-09-25T09:16:15Z

We want to do performance testing on our different modules:

Embedding model (already done?)
Chunking methods: Test chunking strategies #38
Prompt
LLM: Evaluate MiniLLM Performance #10
Translation model: Evaluate Translation model performance #50

All but one of the components listed above should be held fixed while we vary the remaining one and compare the different approaches using our benchmark questions.
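A minimal sketch of how such a one-component-at-a-time test plan could be generated. The component keys and most values (`embedding-a`, `prompt-v1`, `prompt-v2`, `h3`, `fixed-500-tokens`) are hypothetical placeholders; only `llama3.1:8b` and chunking at `h2` tags come from a test run reported in this thread.

```python
from typing import Dict, List

# Hypothetical baseline; placeholder values, not confirmed project settings.
BASELINE: Dict[str, str] = {
    "embedding_model": "embedding-a",
    "chunking": "h2",
    "prompt": "prompt-v1",
    "llm": "llama3.1:8b",
}

# Hypothetical alternatives to try for individual components.
ALTERNATIVES: Dict[str, List[str]] = {
    "chunking": ["h3", "fixed-500-tokens"],
    "prompt": ["prompt-v2"],
}


def one_factor_at_a_time(
    baseline: Dict[str, str], alternatives: Dict[str, List[str]]
) -> List[Dict[str, str]]:
    """Return the baseline plus variants that each differ in exactly one component."""
    configs = [dict(baseline)]
    for component, values in alternatives.items():
        for value in values:
            if value == baseline[component]:
                continue  # identical to the baseline, nothing to test
            variant = dict(baseline)
            variant[component] = value
            configs.append(variant)
    return configs


if __name__ == "__main__":
    for config in one_factor_at_a_time(BASELINE, ALTERNATIVES):
        print(config)
```

Each generated configuration would then be run against the same benchmark questions, so that differences in answer quality can be attributed to the single changed component.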

Benchmark questions in order of their priority and based on our user stories:

simple question: How can I learn German?
simple question with complicated words: I need to know the German language for a job. What do I need to do?
question with no answer in content: When was JFK assassinated?
complicated question (double question, more context, etc): How can a 17 years old person from Ukraine learn German?
malformed question (spelling / grammar mistakes): I are Ukraina. Need job.

Extended Benchmark questions based on Persona "Iryna"

I need a German course with parallel child care.
Can I get a mentor that helps me find a job?
When are the next German courses for A2 level?
Where can I get my university degree translated?
My son is 6 years old and has to go to school soon. What do I need to keep in mind?
What support options are available for newcomers in Munich?
Where can I find an overview of important German holidays and their significance?
Are there any specific regulations for Ukrainians in Germany?
What childcare options are available in Munich?
How do I enroll my son in an elementary school?
Are there any recreational activities or sports clubs for children?
What should I do if my son gets sick and needs a doctor?
Are there any Ukrainian communities or meetups in Munich?
How can I meet Germans to improve my language skills?
Where can I attend cultural events or festivals in Munich?
What language courses are suitable for improving my German?
Which language certificate is necessary to apply for German citizenship?
Can I find free or subsidized German courses?
Are there apps or platforms to help me learn German?
What visa or residence regulations apply to me?
How do I open a bank account in Germany?
What insurance policies are important, e.g., for me or my son?
How can I find a new apartment in Munich?
Which public transportation options can I use in Munich?
Where can I buy affordable groceries?
What emergency numbers are important in Germany?
What should I do if I need legal assistance?
Are there any counseling centers for women in difficult situations?

Extended Benchmark questions not based on Personas:

I'm new to Germany and I was born in Egypt. I've studied computer science and want to work as a software engineer in Germany. What do I have to do?
As a software engineer I normally can work in English. Do I really need a C1 level German certificate?

How should we judge the quality of answers?

Does the answer subjectively answer the question?
The answers to test questions should be based mostly on the content.
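As a rough, automatable proxy for "based mostly on the content", one could measure how many words of an answer also occur in the retrieved context. This is only a sketch under that assumption: lexical overlap does not replace the subjective judgment above, and the function name `content_overlap` is made up for illustration.

```python
import re


def _tokens(text: str) -> set:
    """Lowercased word tokens with at least three characters."""
    return set(re.findall(r"\w{3,}", text.lower()))


def content_overlap(answer: str, context: str) -> float:
    """Fraction of distinct answer tokens that also appear in the context."""
    answer_tokens = _tokens(answer)
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & _tokens(context)) / len(answer_tokens)


if __name__ == "__main__":
    print(content_overlap("German courses online", "Online German courses are offered"))
```

A low value would flag answers that may draw on the model's own knowledge rather than on the content, which a reviewer can then check by hand.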

Additional test languages:

Ukrainian, Farsi, Arabic

steffenkleinle · 2024-09-29T08:05:40Z

Possible training data with questions about Integreat content: https://huggingface.co/datasets/digitalfabrik/integreat-qa
The questions are relatively simple and well phrased, so they cover only a subset of the cases mentioned above.

svenseeberg · 2024-10-01T08:59:51Z

Tests based on 9f57f80 (llama3.1:8b, skip questions with no matching documents, chunking at h2 tags)

You can learn German through SPEAK's online language learning groups, where you can interact with others and meet daily for 90 minutes over 2 weeks. Alternatively, you can use various online resources such as Mein Deutschbuch, Deutsch-Uni Online (DUO), and the Basic language course from Deutsche Welle to study at your own pace.
To learn German for a job in Germany, you should take a vocational German course such as DeuFöV. This will help improve your speaking skills and prepare you for the workplace. You can find more information on how to apply for these courses through the Jobcenter or Employment Agency.
no answer
A 17-year-old person from Ukraine can learn German through SPEAK's online language courses, which include interactive groups and video conferencing. Additionally, they can use various free apps and websites such as Ankommen, Serlo ABC, Mein Deutschbuch, and Deutsch-Uni Online to learn German at their own pace. Some of these resources also offer placement tests and certification.
no answer

svenseeberg · 2024-10-01T09:41:07Z

I need to know the German language for a job. What do I need to do?

This question does not always yield a result. It seems that in roughly 1 of 4 runs the message is not classified as a question that requires an answer.
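To put a number on such flaky behaviour, the same question can be sent repeatedly and the share of runs that return an answer recorded. A minimal sketch, assuming a hypothetical callable `ask` that returns the answer text, or `None` when the message is skipped; the fake below only simulates a failure on every fourth call.

```python
from typing import Callable, Optional


def answer_rate(
    ask: Callable[[str], Optional[str]], question: str, runs: int = 20
) -> float:
    """Share of runs in which `ask` returned a non-empty answer."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    answered = sum(1 for _ in range(runs) if ask(question))
    return answered / runs


def make_fake_ask() -> Callable[[str], Optional[str]]:
    """Stand-in for the real chat back end: returns None on every fourth call."""
    calls = {"n": 0}

    def ask(question: str) -> Optional[str]:
        calls["n"] += 1
        return None if calls["n"] % 4 == 0 else "some answer"

    return ask


if __name__ == "__main__":
    question = "I need to know the German language for a job. What do I need to do?"
    print(answer_rate(make_fake_ask(), question, runs=20))
```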

svenseeberg · 2024-10-04T15:43:22Z

Another interesting prompt:

Is there a cinema in Munich that shows English movies?

{
  "answer": "I don't know. The provided context does not mention cinemas or movie showings in Munich.",
  "sources": [
    "/muenchen/en/culture-leisure-sport/general-information/",
    "/muenchen/en/culture-leisure-sport/be-creative/youth-theatre-workshop-in-the-bellevue-di-monaco/",
    "/muenchen/en/culture-leisure-sport/meet-people/meetings-in-the-neighbourhood/"
  ],
  "details": [
    {
      "source": "/muenchen/en/culture-leisure-sport/be-creative/youth-theatre-workshop-in-the-bellevue-di-monaco/",
      "score": 0.7928134202957153
    },
    {
      "source": "/muenchen/en/culture-leisure-sport/general-information/",
      "score": 0.855070948600769
    },
    {
      "source": "/muenchen/en/culture-leisure-sport/meet-people/meetings-in-the-neighbourhood/",
      "score": 1.0023198127746582
    }
  ],
  "status": "success"
}
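Responses of this shape can be summarized automatically when comparing runs. The sketch below uses only the fields visible in the response above (`answer`, `details`, `status`). Treating an answer that starts with "I don't know" as a declined answer is a heuristic assumption based on this single example, and the scores are sorted ascending without assuming whether lower or higher means a closer match.

```python
import json

# The response shown above, reduced to the fields this sketch uses.
EXAMPLE = '''{
  "answer": "I don't know. The provided context does not mention cinemas or movie showings in Munich.",
  "details": [
    {"source": "/muenchen/en/culture-leisure-sport/be-creative/youth-theatre-workshop-in-the-bellevue-di-monaco/",
     "score": 0.7928134202957153},
    {"source": "/muenchen/en/culture-leisure-sport/general-information/",
     "score": 0.855070948600769},
    {"source": "/muenchen/en/culture-leisure-sport/meet-people/meetings-in-the-neighbourhood/",
     "score": 1.0023198127746582}
  ],
  "status": "success"
}'''


def summarize(raw: str) -> dict:
    """Extract status, a 'declined' flag, and the sources ordered by ascending score."""
    data = json.loads(raw)
    details = sorted(data.get("details", []), key=lambda d: d["score"])
    answer = data.get("answer", "")
    return {
        "status": data.get("status"),
        # Heuristic: the model declined if the answer opens with "I don't know".
        "declined": answer.strip().lower().startswith("i don't know"),
        "sources_by_score": [(d["source"], d["score"]) for d in details],
    }


if __name__ == "__main__":
    summary = summarize(EXAMPLE)
    print(summary["status"], summary["declined"])
    for source, score in summary["sources_by_score"]:
        print(f"{score:.4f}  {source}")
```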

svenseeberg · 2024-10-07T09:16:26Z

Another test question with frequent bad results:

Hi I'm from Afghanistan and 17 years old. How can I learn German?

svenseeberg · 2024-10-16T11:14:58Z

We tried to get more consistent documents from Milvus (see #60) with flat indexes but still got varying results. The only possible conclusion: the embedding model is producing different vectors for the same query.

*edit: see #61 (comment)

Another observation: the chunking (and chunk encoding) might be problematic as well.
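The hypothesis that the embedding model returns different vectors for the same query can be checked in isolation, independent of Milvus. A minimal sketch, assuming a hypothetical callable `embed` that wraps whatever embedding model is in use and returns a sequence of floats:

```python
from typing import Callable, Sequence


def max_embedding_drift(
    embed: Callable[[str], Sequence[float]], query: str, runs: int = 5
) -> float:
    """Largest absolute per-dimension difference between the first embedding and later ones."""
    if runs < 2:
        raise ValueError("need at least two runs to compare")
    reference = list(embed(query))
    drift = 0.0
    for _ in range(runs - 1):
        vector = list(embed(query))
        if len(vector) != len(reference):
            raise ValueError("embedding dimension changed between calls")
        step = max((abs(a - b) for a, b in zip(reference, vector)), default=0.0)
        drift = max(drift, step)
    return drift


if __name__ == "__main__":
    # Deterministic stand-in for a real model, so the drift is 0.0 here.
    print(max_embedding_drift(lambda q: [0.1, 0.2, 0.3], "How can I learn German?"))
```

A drift of exactly 0.0 would point away from the embedding step and toward retrieval or chunking; small non-zero values can also come from floating-point nondeterminism, so a tolerance may be needed when interpreting the result.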

svenseeberg · 2024-11-18T08:53:46Z

svenseeberg · 2024-12-02T10:53:37Z

Evaluation of the answers can be found in https://nextcloud.tuerantuer.org/index.php/f/6552668

svenseeberg changed the title ~~Testing-Strategy~~ Performance Test Strategy Sep 25, 2024

svenseeberg added this to the v3 Basic Answer Retrieval milestone Sep 25, 2024

svenseeberg added the analysis Analyse/comparative study of features label Sep 25, 2024

This was referenced Oct 1, 2024

Use current page as context #39

Closed

Prepare training data set #3

Closed

svenseeberg added the component:chat Chat Back End label Oct 4, 2024

svenseeberg modified the milestones: v3 Basic Answer Retrieval, v3.2 Improve LLM perfomance Oct 9, 2024
