Performance Test Strategy #41

Open
4 of 5 tasks
svenseeberg opened this issue Sep 25, 2024 · 8 comments
Labels
analysis Analyse/comparative study of features component:chat Chat Back End

Comments

@svenseeberg
Member

svenseeberg commented Sep 25, 2024

We want to do performance testing on our different modules:

  1. Embedding model (already done?)
  2. Chunking methods: Test chunking strategies #38
  3. Prompt
  4. LLM: Evaluate MiniLLM Performance #10
  5. Evaluate Translation model performance #50

All components except one should be held fixed while we vary the remaining one and test different approaches against our benchmark questions.
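A minimal sketch of how such a one-component-at-a-time test plan could be enumerated. The component names mirror the list above; `llama3.1:8b` and h2 chunking are the settings mentioned later in this thread, and every other value is a hypothetical placeholder, not a decision we have made.

```python
from typing import Dict, Iterator, List

# Baseline configuration. Only "h2" and "llama3.1:8b" come from this thread;
# the remaining values are placeholders for illustration.
BASELINE: Dict[str, str] = {
    "embedding": "embedding-baseline",
    "chunking": "h2",
    "prompt": "prompt-v1",
    "llm": "llama3.1:8b",
    "translation": "translation-baseline",
}

# Hypothetical alternatives to try, keyed by the component they replace.
ALTERNATIVES: Dict[str, List[str]] = {
    "chunking": ["h3", "fixed-size"],
    "llm": ["llm-candidate"],
}


def one_factor_runs(
    baseline: Dict[str, str], alternatives: Dict[str, List[str]]
) -> Iterator[Dict[str, str]]:
    """Yield the baseline, then configs that differ from it in exactly one component."""
    yield dict(baseline)
    for component, values in alternatives.items():
        for value in values:
            config = dict(baseline)
            config[component] = value
            yield config
```

Each yielded configuration would then be run against the full set of benchmark questions, so that any change in answer quality can be attributed to the single component that differs from the baseline.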

Benchmark questions in order of their priority and based on our user stories:

  1. simple question: How can I learn German?
  2. simple question with complicated words: I need to know the German language for a job. What do I need to do?
  3. question with no answer in content: When was JFK assassinated?
  4. complicated question (double question, more context, etc): How can a 17 years old person from Ukraine learn German?
  5. malformed question (spelling / grammar mistakes): I are Ukraina. Need job.

Extended Benchmark questions based on Persona "Iryna":

  • I need a German course with parallel child care.
  • Can I get a mentor that helps me find a job?
  • When are the next German courses for A2 level?
  • Where can I get my university degree translated?
  • My son is 6 years old and has to go to school soon. What do I need to keep in mind?
  • What support options are available for newcomers in Munich?
  • Where can I find an overview of important German holidays and their significance?
  • Are there any specific regulations for Ukrainians in Germany?
  • What childcare options are available in Munich?
  • How do I enroll my son in an elementary school?
  • Are there any recreational activities or sports clubs for children?
  • What should I do if my son gets sick and needs a doctor?
  • Are there any Ukrainian communities or meetups in Munich?
  • How can I meet Germans to improve my language skills?
  • Where can I attend cultural events or festivals in Munich?
  • What language courses are suitable for improving my German?
  • Which language certificate is necessary to apply for German citizenship?
  • Can I find free or subsidized German courses?
  • Are there apps or platforms to help me learn German?
  • What visa or residence regulations apply to me?
  • How do I open a bank account in Germany?
  • What insurance policies are important, e.g., for me or my son?
  • How can I find a new apartment in Munich?
  • Which public transportation options can I use in Munich?
  • Where can I buy affordable groceries?
  • What emergency numbers are important in Germany?
  • What should I do if I need legal assistance?
  • Are there any counseling centers for women in difficult situations?

Extended Benchmark questions not based on Personas:

  • I'm new to Germany and I was born in Egypt. I've studied computer science and want to work as a software engineer in Germany. What do I have to do?
  • As a software engineer I normally can work in English. Do I really need a C1 level German certificate?

How should we judge the quality of answers?

  • Does the answer subjectively answer the question?
  • The answers to test questions should be based mostly on the content.
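The two criteria above could be recorded per question and summarized per test run. This is only a sketch: the `Rating` type and its field names are hypothetical, and both judgements are still made by a human reviewer.

```python
from typing import Dict, Iterable, NamedTuple


class Rating(NamedTuple):
    """One reviewer judgement for one benchmark question (hypothetical schema)."""
    question: str
    answers_question: bool  # subjective: does the answer address the question?
    based_on_content: bool  # is the answer based mostly on the content?


def summarize(ratings: Iterable[Rating]) -> Dict[str, float]:
    """Return the share of questions that pass each criterion."""
    rated = list(ratings)
    if not rated:
        raise ValueError("no ratings given")
    total = len(rated)
    return {
        "answers_question": sum(r.answers_question for r in rated) / total,
        "based_on_content": sum(r.based_on_content for r in rated) / total,
    }
```

Keeping the two criteria separate makes it visible when an answer sounds helpful but is not backed by the content.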

Additional test languages:

  • Ukrainian, Farsi, Arabic
@svenseeberg svenseeberg changed the title Testing-Strategy Performance Test Strategy Sep 25, 2024
@svenseeberg svenseeberg added this to the v3 Basic Answer Retrieval milestone Sep 25, 2024
@svenseeberg svenseeberg added the analysis Analyse/comparative study of features label Sep 25, 2024
@steffenkleinle
Member

Possible training data with questions about Integreat content: https://huggingface.co/datasets/digitalfabrik/integreat-qa
The questions are relatively simple and well phrased, so they cover only a subset of the cases mentioned above.

@svenseeberg
Member Author

svenseeberg commented Oct 1, 2024

Tests based on 9f57f80 (llama3.1:8b, skip questions with no matching documents, chunking at h2 tags)

  1. You can learn German through SPEAK's online language learning groups, where you can interact with others and meet daily for 90 minutes over 2 weeks. Alternatively, you can use various online resources such as Mein Deutschbuch, Deutsch-Uni Online (DUO), and the Basic language course from Deutsche Welle to study at your own pace.
  2. To learn German for a job in Germany, you should take a vocational German course such as DeuFöV. This will help improve your speaking skills and prepare you for the workplace. You can find more information on how to apply for these courses through the Jobcenter or Employment Agency.
  3. no answer
  4. A 17-year-old person from Ukraine can learn German through SPEAK's online language courses, which include interactive groups and video conferencing. Additionally, they can use various free apps and websites such as Ankommen, Serlo ABC, Mein Deutschbuch, and Deutsch-Uni Online to learn German at their own pace. Some of these resources also offer placement tests and certification.
  5. no answer

@svenseeberg
Member Author

svenseeberg commented Oct 1, 2024

I need to know the German language for a job. What do I need to do?

This question does not always yield a result. It seems that in about 1 of 4 cases the message is not classified as a question that requires an answer.
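To put a number on this instead of an impression, the same question could be sent repeatedly and the share of answered runs recorded. A rough sketch, assuming a hypothetical `ask` wrapper around the chat back end that returns the answer text, or `None` when the message was not treated as a question:

```python
from typing import Callable, Optional


def answer_rate(
    ask: Callable[[str], Optional[str]], question: str, runs: int = 20
) -> float:
    """Return the fraction of runs in which `ask` produced an answer.

    `ask` is a hypothetical callable, not an existing function of the back end.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    answered = sum(1 for _ in range(runs) if ask(question) is not None)
    return answered / runs
```

Tracking this rate per benchmark question would show whether a change to the prompt or the LLM makes the classification more or less stable.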

@svenseeberg
Member Author

svenseeberg commented Oct 4, 2024

Another interesting prompt:

Is there a cinema in Munich that shows English movies?

{
  "answer": "I don't know. The provided context does not mention cinemas or movie showings in Munich.",
  "sources": [
    "/muenchen/en/culture-leisure-sport/general-information/",
    "/muenchen/en/culture-leisure-sport/be-creative/youth-theatre-workshop-in-the-bellevue-di-monaco/",
    "/muenchen/en/culture-leisure-sport/meet-people/meetings-in-the-neighbourhood/"
  ],
  "details": [
    {
      "source": "/muenchen/en/culture-leisure-sport/be-creative/youth-theatre-workshop-in-the-bellevue-di-monaco/",
      "score": 0.7928134202957153
    },
    {
      "source": "/muenchen/en/culture-leisure-sport/general-information/",
      "score": 0.855070948600769
    },
    {
      "source": "/muenchen/en/culture-leisure-sport/meet-people/meetings-in-the-neighbourhood/",
      "score": 1.0023198127746582
    }
  ],
  "status": "success"
}
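For comparing many such responses, the fields shown above can be reduced to a short summary. This is a sketch based only on the keys visible in this one response (`answer`, `details`, `source`, `score`, `status`). Whether a lower score means a closer match is an assumption, and the "I don't know" check is a heuristic on the wording seen here, not something the back end guarantees.

```python
import json
from typing import Any, Dict, List


def summarize_response(raw: str) -> Dict[str, Any]:
    """Reduce a response like the one above to status, refusal flag and ranked sources."""
    data = json.loads(raw)
    details: List[Dict[str, Any]] = data.get("details", [])
    # Ascending sort; that a lower score is a better match is an assumption.
    ranked = sorted(details, key=lambda d: d["score"])
    answer = data.get("answer", "")
    return {
        "status": data.get("status"),
        # Heuristic based on the answer wording above, not a back-end contract.
        "declined": answer.strip().lower().startswith("i don't know"),
        "sources_by_score": [(d["source"], round(d["score"], 3)) for d in ranked],
    }
```

A response with `"status": "success"` and `declined` set to true, as in this case, is worth flagging separately: documents were retrieved, but none of them contained an answer.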

@svenseeberg svenseeberg added the component:chat Chat Back End label Oct 4, 2024
@svenseeberg
Member Author

Another test question with frequent bad results:

Hi I'm from Afghanistan and 17 years old. How can I learn German?

@svenseeberg
Member Author

svenseeberg commented Oct 16, 2024

We tried to get more consistent documents from Milvus (see #60) with flat indexes but still got varying results. The only possible conclusion: the embedding model is producing different vectors for the same query.

Edit: see #61 (comment)
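The hypothesis about the embedding model could be checked in isolation, without Milvus, by embedding the same query several times and measuring how far the vectors drift apart. A minimal sketch, assuming a hypothetical `embed` callable that wraps whatever embedding model is in use and returns a sequence of floats:

```python
import math
from typing import Callable, Sequence


def max_embedding_drift(
    embed: Callable[[str], Sequence[float]], query: str, runs: int = 5
) -> float:
    """Embed `query` repeatedly; return the largest Euclidean distance to the first vector."""
    if runs < 2:
        raise ValueError("need at least two runs")
    reference = list(embed(query))
    worst = 0.0
    for _ in range(runs - 1):
        vector = list(embed(query))
        if len(vector) != len(reference):
            raise ValueError("embedding dimension changed between runs")
        worst = max(worst, math.dist(reference, vector))
    return worst
```

A drift of exactly 0.0 would point away from the embedding model and towards retrieval or chunking as the source of the varying results. A small non-zero drift would need to be compared against the score gaps between retrieved documents before drawing conclusions.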

Another observation: the chunking (and chunk encoding) might be problematic as well.

@svenseeberg
Member Author

svenseeberg commented Nov 18, 2024

[3 images]

@svenseeberg
Member Author

Evaluation of the answers can be found in https://nextcloud.tuerantuer.org/index.php/f/6552668
