
Sampling MTEB #647

Closed · NehaB18 opened this issue May 8, 2024 · 2 comments

Comments

@NehaB18

NehaB18 commented May 8, 2024

Is there any way to run the evaluation on a sample of the datasets, for example 5% of all Retrieval tasks?

@KennethEnevoldsen
Contributor

Not at the moment @NehaB18. We are, however, working on speeding up the benchmark and have already landed a few drastic improvements (#572, #481).

For retrieval, there is an ongoing discussion over at #638.

Implementing a downsampling function for the retrieval tasks might be a reasonable way to speed things up.
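
A rough sketch of what such a downsampling helper could look like (hypothetical code, not an existing MTEB API; the helper name and the plain-dict query format are just for illustration):

import random

# Hypothetical helper: keep a random fraction of a retrieval task's queries.
# `queries` is assumed to be a plain {query_id: query_text} dict.
def downsample_queries(queries: dict, fraction: float = 0.05, seed: int = 42) -> dict:
    rng = random.Random(seed)
    n_keep = max(1, int(len(queries) * fraction))  # keep at least one query
    kept_ids = rng.sample(sorted(queries), k=n_keep)
    return {qid: queries[qid] for qid in kept_ids}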

If you simply want to run it on a selected subset of the retrieval tasks, you can do something like:

import random

import mteb

tasks = mteb.get_tasks(languages=["eng"], domains=["Legal"], task_types=["Retrieval"])
task_list = list(tasks)
random.shuffle(task_list)  # shuffle so the selection is a random sample
tasks_to_run = task_list[:10]  # select the first 10 tasks
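
To then run the evaluation on the selected subset, the usual MTEB pattern applies (the model below is just an example; any SentenceTransformer-compatible model works):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model
evaluation = mteb.MTEB(tasks=tasks_to_run)
evaluation.run(model, output_folder="results")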

@KennethEnevoldsen
Contributor

@NehaB18 I will move this over to discussions.

@embeddings-benchmark embeddings-benchmark locked and limited conversation to collaborators Jun 5, 2024
@KennethEnevoldsen KennethEnevoldsen converted this issue into discussion #884 Jun 5, 2024
