[v2] convert category from length descriptor to modality in task metadata #1767
Comments
Converting it to modality makes sense to me! s2p, p2p are much less specific than the actual lengths!
Yes! This would align MTEB and MIEB in a much better way. The change I see from this is:
I think
For now this is used in NV-Embed, but in a simple way: mteb/mteb/models/nvidia_models.py, lines 51 to 53 in c3b46b7
and I also created a somewhat more complicated version for jasper: mteb/mteb/models/jasper_models.py, lines 47 to 48 in c3b46b7
I can't see that it is used in nv-embed. Am I missing something?
I reviewed #1768 and I am not quite sure why s2s or s2p is required here. I read the model card for jasper but couldn't find any such case. I might be missing something, but queries and passages can then be disambiguated by the prompt. As I understand it, the p in s2p stands for paragraph, not passage.
Agree with Kenneth above, and
Not sure if this is a good idea. Currently it is already somewhat vaguely defined.
I believe the original intention was to tell us something about the length (s2p: sentence to paragraph), but we now have the descriptive statistics, which are a much better source.
However, in MIEB it is used as "t2i", text to image.
@Muennighoff would love to know what you think:
Here is a sample from the descriptive statistics:
@isaac-chung you have also been involved greatly in both parts.
(An alternative is to convert the annotation in MIEB into "s2i", meaning sentence to image.)
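To make the proposal concrete, here is a minimal sketch of what converting the category from a length descriptor to a modality could look like. The mapping table, the "t2t" code for text-to-text, and the function name are assumptions for illustration only; they are not taken from the mteb codebase.

```python
# Hypothetical mapping from length-style categories to modality-style ones.
# "t2t" (text to text) is an assumed code, chosen by analogy with MIEB's "t2i".
LEGACY_TO_MODALITY = {
    "s2s": "t2t",  # sentence to sentence
    "s2p": "t2t",  # sentence to paragraph
    "p2p": "t2t",  # paragraph to paragraph
}


def to_modality_category(category: str) -> str:
    """Map a length-style category to a modality-style one.

    Codes that are not in the legacy table (for example "t2i") are
    returned unchanged, since they already describe modalities.
    """
    return LEGACY_TO_MODALITY.get(category, category)
```

With such a mapping, all text-only tasks would collapse into a single category, and the length information would come from the descriptive statistics instead.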