
Sampling MTEB #647

Closed · NehaB18 opened this issue May 8, 2024 · 2 comments

Comments

@NehaB18

NehaB18 commented May 8, 2024

Is there any way to run the evaluation on a sample of the datasets, for example 5% of all Retrieval tasks?

@KennethEnevoldsen
Contributor

Not at the moment @NehaB18. We are, however, working on speeding up the benchmark and have already landed a few drastic improvements (#572, #481).

For retrieval, there is an ongoing discussion over at #638.

Implementing a downsampling function for the retrieval tasks might be a reasonable way to speed things up.
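
A rough sketch of what such a downsampling helper could look like (hypothetical code, not an existing MTEB API; the helper name and the plain-dict query format are just for illustration):

import random

# Hypothetical helper: keep a random fraction of a retrieval task's queries.
# `queries` is assumed to be a plain {query_id: query_text} dict.
def downsample_queries(queries: dict, fraction: float = 0.05, seed: int = 42) -> dict:
    rng = random.Random(seed)
    n_keep = max(1, int(len(queries) * fraction))  # keep at least one query
    kept_ids = rng.sample(sorted(queries), k=n_keep)
    return {qid: queries[qid] for qid in kept_ids}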

If you simply want to run it on a selected subset of the retrieval tasks, you can do something like:

import random

import mteb

tasks = mteb.get_tasks(languages=["eng"], domains=["Legal"], task_types=["Retrieval"])
task_list = list(tasks)
random.shuffle(task_list)  # shuffle so the selection is a random sample
tasks_to_run = task_list[:10]  # select the first 10 tasks
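
To then run the evaluation on the selected subset, the usual MTEB pattern applies (the model below is just an example; any SentenceTransformer-compatible model works):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model
evaluation = mteb.MTEB(tasks=tasks_to_run)
evaluation.run(model, output_folder="results")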

@KennethEnevoldsen
Contributor

@NehaB18 I will move this over to discussions.

@embeddings-benchmark embeddings-benchmark locked and limited conversation to collaborators Jun 5, 2024
@KennethEnevoldsen KennethEnevoldsen converted this issue into discussion #884 Jun 5, 2024
