Releases: embeddings-benchmark/mteb
1.29.9
1.29.8
1.29.8 (2025-01-17)
Fix
- fix: Added Misc Chinese models (#1819)
- Added moka and piccolo models to overview file
- Added Text2Vec models
- Added various Chinese embedding models
Co-authored-by: Isaac Chung <[email protected]> (9823529)
- fix: Added way more training dataset annotations (#1765)
- fix: Leaderboard: K instead of M. Fixes #1752
- format
- fixed existing annotations to refer to task name instead of hf dataset
- added annotation to nvidia
- added voyage
- added uae annotations
- Added stella annotations
- sentence trf models
- added salesforce and e5
- jina
- bge + model2vec
- added llm2vec annotations
- add jasper
- format
- format
- Updated annotations and moved jina models
- fix: add even more training dataset annotations (#1793)
- fix: update max tokens for OpenAI (#1772)
update max tokens
- ci: skip AfriSentiLID for now (#1785)
- skip AfriSentiLID for now
- skip relevant test case instead
Co-authored-by: Isaac Chung <[email protected]>
- 1.28.7
Automatically generated by python-semantic-release
- ci: fix model loading test (#1775)
- pass base branch into the make command as an arg
- test a file that has custom wrapper
- what about overview
- just dont check overview
- revert instance check
- explicitly omit overview and init
- remove test change
- try on a lot of models
- revert test model file
Co-authored-by: Isaac Chung <[email protected]>
- feat: Update task filtering, fixing bug which included cross-lingual tasks in overly many benchmarks (#1787)
- feat: Update task filtering, fixing bug on MTEB
- Updated task filtering adding exclusive_language_filter and hf_subset
- fix bug in MTEB where cross-lingual splits were included
- added missing language filtering to MTEB(europe, beta) and MTEB(indic, beta)
The following code outlines the problem:
import mteb
from mteb.benchmarks import MTEB_ENG_CLASSIC
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
# previously equivalent to:
task = mteb.get_task("STS22", languages=["eng"])
task.hf_subsets
# this kept every subset that contains English, including cross-lingual ones:
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
# however, it should be:
# ['en']
# with the changes it is:
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
task.hf_subsets
# ['en']
# equivalent to:
task = mteb.get_task("STS22", hf_subsets=["en"])
# which you can also obtain using exclusive_language_filter (though not if there were multiple English splits):
task = mteb.get_task("STS22", languages=["eng"], exclusive_language_filter=True)
- format
- remove "en-ext" from AmazonCounterfactualClassification
- fixed mteb(deu)
- fix: simplify in a few areas
- fix: Add gritlm
- 1.29.0
Automatically generated by python-semantic-release
- fix: Added more annotations!
- fix: Added C-MTEB (#1786)
Added C-MTEB
- 1.29.1
Automatically generated by python-semantic-release
- docs: Add contact to MMTEB benchmarks (#1796)
- Add myself to MMTEB benchmarks
- lint
- fix: loading pre 11 (#1798)
- fix loading pre 11
- add similarity
- lint
- run all task types
- 1.29.2
Automatically generated by python-semantic-release
- fix: allow to load no revision available (#1801)
- fix allow to load no revision available
- lint
- add require_model_meta to leaderboard
- lint
- 1.29.3
Automatically generated by python-semantic-release
Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Márton Kardos <[email protected]> (3b2d074)
- fix: bm25s (#1827)
Co-authored-by: sam021313 <[email protected]> (96420a2)
- fix: Added Chinese Stella models (#1824)
Added Chinese Stella models (74b495c)
1.29.7
1.29.7 (2025-01-16)
Ci
- ci: only return 1 model_name per file (#1818)
- only return 1 model_name per file
- fix args parse
- revert test change (d7a7791)
Fix
- fix: add bge-m3 ModelMeta (#1821)
add bge (4ac59bc)
Unknown
- Add model inf-retriever-v1 (#1744)
- feat(models): add infly/inf-retriever-v1 model metadata
- Add inf_models.py file with metadata for infly/inf-retriever-v1 model
- Update overview.py to include inf_models in model imports
- Reformat code
- Update inf-retriever-v1 ModelMeta
- Fill more information for inf-retriever-v1
- Add license information for inf-retriever-v1
Co-authored-by: Samuel Yang <[email protected]> (60c4980)
1.29.6
1.29.5
1.29.4
1.29.4 (2025-01-15)
Fix
- fix: Added ModelMeta for BGE, GTE Chinese and multilingual models (#1811)
- Added BGE Chinese and multilingual-gemma models
- Added GTE multilingual and Chinese models
- Fixed date format (3f5ee82)
- fix: Zero shot and aggregation on Leaderboard (#1810)
- Made join_revision filter out no_revision_available when other revisions have been run on the task
- Fixed zero-shot filtering
- Fixed aggregation of task types
- Ran linting (0acc166)
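The join_revision change in #1810 is described only in one line, so here is a minimal sketch of the rule as stated: results marked no_revision_available are dropped for a task once that task also has results from a real revision. This is not the mteb implementation. The function name drop_unversioned_duplicates, the dict layout, and the task and revision values are all hypothetical; only the marker string no_revision_available comes from the notes above.

```python
# Marker string taken from the release notes.
NO_REVISION = "no_revision_available"


def drop_unversioned_duplicates(results):
    """Drop unversioned results for tasks that also have a versioned result.

    `results` is a list of dicts with "task" and "revision" keys
    (a hypothetical layout chosen for this sketch).
    """
    # Tasks for which at least one result carries a real revision.
    tasks_with_revision = {
        r["task"] for r in results if r["revision"] != NO_REVISION
    }
    # Keep versioned results always; keep unversioned ones only when
    # nothing better exists for the same task.
    return [
        r
        for r in results
        if r["revision"] != NO_REVISION or r["task"] not in tasks_with_revision
    ]


# Made-up example data: TaskA has both kinds of result, TaskB only an unversioned one.
results = [
    {"task": "TaskA", "revision": "abc123"},
    {"task": "TaskA", "revision": NO_REVISION},
    {"task": "TaskB", "revision": NO_REVISION},
]
kept = drop_unversioned_duplicates(results)
print(kept)
```

With this rule, TaskB keeps its only result even though it has no revision, while the unversioned TaskA entry is dropped in favour of the versioned one.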
1.29.3
1.29.2
1.29.1
1.29.0
1.29.0 (2025-01-13)
Ci
- ci: fix model loading test (#1775)
- pass base branch into the make command as an arg
- test a file that has custom wrapper
- what about overview
- just dont check overview
- revert instance check
- explicitly omit overview and init
- remove test change
- try on a lot of models
- revert test model file
Co-authored-by: Isaac Chung <[email protected]> (9b117a8)
Feature
- feat: Update task filtering, fixing bug which included cross-lingual tasks in overly many benchmarks (#1787)
- feat: Update task filtering, fixing bug on MTEB
- Updated task filtering adding exclusive_language_filter and hf_subset
- fix bug in MTEB where cross-lingual splits were included
- added missing language filtering to MTEB(europe, beta) and MTEB(indic, beta)
The following code outlines the problem:
import mteb
from mteb.benchmarks import MTEB_ENG_CLASSIC
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
# previously equivalent to:
task = mteb.get_task("STS22", languages=["eng"])
task.hf_subsets
# this kept every subset that contains English, including cross-lingual ones:
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
# however, it should be:
# ['en']
# with the changes it is:
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
task.hf_subsets
# ['en']
# equivalent to:
task = mteb.get_task("STS22", hf_subsets=["en"])
# which you can also obtain using exclusive_language_filter (though not if there were multiple English splits):
task = mteb.get_task("STS22", languages=["eng"], exclusive_language_filter=True)
- format
- remove "en-ext" from AmazonCounterfactualClassification
- fixed mteb(deu)
- fix: simplify in a few areas (4a70e5d)
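The difference between the old language filter and exclusive_language_filter from #1787 can be illustrated without mteb installed. The sketch below is an assumption about the behaviour described in these notes, not the library's code: an inclusive filter keeps any subset that contains a requested language, while an exclusive filter keeps a subset only if all of its languages were requested. The helper filter_subsets and the subset-to-language mapping are hypothetical; the subset names are the STS22 ones listed above, plus an invented "de" subset to show a non-match.

```python
# Subset names follow the STS22 example in the notes; the language codes
# attached to each subset are assumptions made for this sketch.
HF_SUBSET_LANGUAGES = {
    "en": ["eng"],
    "de-en": ["deu", "eng"],
    "es-en": ["spa", "eng"],
    "pl-en": ["pol", "eng"],
    "zh-en": ["zho", "eng"],
    "de": ["deu"],  # invented subset that should never match "eng"
}


def filter_subsets(subset_languages, languages, exclusive=False):
    """Return the subset names that match the requested languages.

    exclusive=False: keep a subset if it contains any requested language.
    exclusive=True: keep a subset only if all of its languages are requested.
    """
    wanted = set(languages)
    selected = []
    for subset, langs in subset_languages.items():
        subset_langs = set(langs)
        if exclusive:
            keep = subset_langs <= wanted  # every language must be requested
        else:
            keep = bool(subset_langs & wanted)  # any overlap is enough
        if keep:
            selected.append(subset)
    return selected


inclusive = filter_subsets(HF_SUBSET_LANGUAGES, ["eng"])
exclusive = filter_subsets(HF_SUBSET_LANGUAGES, ["eng"], exclusive=True)
print(inclusive)  # ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
print(exclusive)  # ['en']
```

The inclusive result reproduces the list that leaked cross-lingual splits into MTEB_ENG_CLASSIC, and the exclusive result matches the corrected ['en'].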