Releases: embeddings-benchmark/mteb


1.28.7 (2025-01-13)

CI

  • ci: skip AfriSentiLID for now (#1785)

  • skip AfriSentiLID for now

  • skip relevant test case instead


Co-authored-by: Isaac Chung <[email protected]> (71dbd61)
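For context, skipping at the test level typically looks like the sketch below; the test name and reason string are assumptions, not the actual code from #1785.

```python
import pytest

# Minimal sketch of skipping a single test case rather than a whole suite.
@pytest.mark.skip(reason="AfriSentiLID skipped for now")  # reason assumed
def test_afri_senti_lid() -> None:
    ...
```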

Fix

  • fix: update max tokens for OpenAI (#1772)

update max tokens (0c5c3a5)
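The fix adjusts the token limit used when embedding with OpenAI models. A minimal sketch of client-side truncation to a token budget, assuming tiktoken's cl100k_base tokenizer and the commonly cited 8191-token input limit for the text-embedding-3 models:

```python
import tiktoken

MAX_TOKENS = 8191  # assumed input limit for the embedding model

def truncate_to_max_tokens(text: str, max_tokens: int = MAX_TOKENS) -> str:
    """Encode, clip to the token budget, and decode back to a string."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    return text if len(tokens) <= max_tokens else enc.decode(tokens[:max_tokens])
```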


1.28.6 (2025-01-11)

Fix

  • fix: added annotations for training data (#1742)

  • fix: Added annotations for arctic embed models

  • added google and bge

  • added cohere

  • Added e5

  • added bge based model2vec

  • annotated OpenAI

  • format and update annotations (3f093c8)
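For reference, the annotations in question record which datasets a model was trained on. An illustrative shape only; values are invented, and the exact field name and schema in mteb's ModelMeta may differ:

```python
# Hypothetical training-data annotation: task name -> splits trained on.
training_datasets = {
    "MSMARCO": ["train"],  # trained on the MS MARCO train split
    "NQ": ["train"],       # trained on the Natural Questions train split
}
```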


1.28.5 (2025-01-11)

Fix

  • fix: Leaderboard: K instead of M (#1761)

Fixes #1752 (972463e)
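The bug showed parameter counts with the wrong magnitude suffix. A minimal sketch of suffix selection; the function name is hypothetical:

```python
def format_n_parameters(n: int) -> str:
    """Render a raw parameter count with a K/M/B suffix."""
    if n >= 1_000_000_000:
        return f"{n / 1_000_000_000:.1f}B"
    if n >= 1_000_000:
        return f"{n / 1_000_000:.0f}M"
    if n >= 1_000:
        return f"{n / 1_000:.0f}K"
    return str(n)

assert format_n_parameters(118_000) == "118K"  # not a rounded "M" value
```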

Unknown

  • other: add script for leaderboard compare (#1758)

  • add script

  • remove changes

  • remove changes

  • add comment

  • lint

  • order like in benchmark object

  • round results (8bc80aa)


1.28.4 (2025-01-10)

Fix

  • fix: fixes implementation of similarity() (#1748)

  • fix(#1594): fixes implementation of similarity()

  • fix: add similarity to SentenceTransformerWrapper


Co-authored-by: sam021313 <[email protected]> (3fe9264)
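A similarity() method on an embedding wrapper conventionally computes pairwise cosine similarity. A minimal NumPy sketch of that convention, not the actual implementation from #1748:

```python
import numpy as np

def similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity: (n, d) x (m, d) -> (n, m)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T
```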


1.28.3 (2025-01-10)

Fix

  • fix: Fixed definition of zero-shot in ModelMeta (#1747)

  • Corrected zero_shot definition to be based on task names, not dataset path (407e205)
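Under the corrected definition, the overlap check keys on task names. A one-line sketch with hypothetical argument names:

```python
def is_zero_shot(training_task_names: set[str], benchmark_task_names: set[str]) -> bool:
    # Compare task *names*, not dataset paths, per this fix.
    return not (training_task_names & benchmark_task_names)
```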


1.28.2 (2025-01-10)

Fix

  • fix: Fixed task_type aggregation on leaderboard (#1746)

  • Fixed task_type aggregation in leaderboard

  • Fixed an error due to unnecessary indentation in get_score (76bb070)
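Task_type aggregation means averaging main scores within each task type before comparing across types. A hedged sketch, assuming a plain task-to-score mapping:

```python
from collections import defaultdict
from statistics import mean

def mean_per_task_type(
    scores: dict[str, float], task_types: dict[str, str]
) -> dict[str, float]:
    """Average main scores within each task type (e.g. all Retrieval
    tasks together), rather than averaging every task in one pool."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for task, score in scores.items():
        grouped[task_types[task]].append(score)
    return {t: mean(v) for t, v in grouped.items()}
```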


1.28.1 (2025-01-10)

Fix

  • fix: Leaderboard Speedup (#1745)

  • Added get_scores_fast

  • Made the leaderboard faster with a smarter dependency graph, event management, and caching

  • Changed print to logger.info (9eff8ca)
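Caching repeated score lookups is one of the speedups described. A toy sketch using functools.cache; the real signature of get_scores_fast in #1745 may differ:

```python
from functools import cache

@cache
def get_scores_fast(benchmark_name: str) -> tuple[float, ...]:
    print(f"loading scores for {benchmark_name}")  # runs once per benchmark
    return (0.5, 0.6)  # stands in for the real result loading

get_scores_fast("MTEB(eng)")  # loads
get_scores_fast("MTEB(eng)")  # cache hit, nothing reloaded
```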

Test

  • test: Add script to test model loading below n_parameters threshold (#1698)

  • add model loading test for models below 2B params

  • add failure message to include model name

  • use the real get_model_meta

  • use cache folder

  • teardown per function

  • fix directory removal

  • write to file

  • wip loading from before

  • wip

  • Rename model_loading_testing.py to model_loading.py

  • Delete tests/test_models/test_model_loading.py

  • checks for models below 2B

  • try not using cache folder

  • update script with scan_cache_dir and add args

  • add github CI: detect changed model files and run model loading test

  • install all model dependencies

  • dependency installations and move file location

  • should trigger a model load test in CI

  • find correct commit for diff

  • explicitly fetch base branch

  • add make command

  • try to run in python instead and add pytest

  • fix attribute error and add read mode

  • separate script calling

  • let pip install be cached and specify repo path

  • check ancestry

  • add cache and rebase

  • try to merge instead of rebase

  • try without merge base

  • check if file exists first

  • Apply suggestions from code review
    Co-authored-by: Kenneth Enevoldsen <[email protected]>

  • Update .github/workflows/model_loading.yml
    Co-authored-by: Kenneth Enevoldsen <[email protected]>

  • address review comments to run test once from CI and not pytest


Co-authored-by: Kenneth Enevoldsen <[email protected]> (8d033f3)
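The resulting check loads only models whose metadata reports fewer than 2B parameters. A sketch of the idea: mteb does expose get_model and get_model_meta, but treat the attribute handling here as an assumption:

```python
import mteb

N_PARAM_THRESHOLD = 2_000_000_000  # 2B, per the commit messages

def check_model_loads(model_name: str) -> None:
    meta = mteb.get_model_meta(model_name)
    if meta.n_parameters is None or meta.n_parameters >= N_PARAM_THRESHOLD:
        return  # skip large or unsized models
    model = mteb.get_model(model_name)  # should not raise
    assert model is not None, f"failed to load {model_name}"
```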

Unknown

  • Fixed result loading on leaderboard (#1739)

  • Only main_score gets loaded for the leaderboard, thereby avoiding OOM errors

  • Fixed plot failing because of missing embedding dimensions

  • Ran linting (752d2b8)
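Loading only main_score keeps one float per task in memory instead of the full per-split score dictionaries. A hypothetical sketch; the actual result-file schema may differ:

```python
import json
from pathlib import Path

def load_main_scores(results_dir: str) -> dict[str, float]:
    """Keep one float per result file instead of full score dictionaries."""
    scores: dict[str, float] = {}
    for path in Path(results_dir).glob("**/*.json"):
        data = json.loads(path.read_text())
        # Assumed layout: {"scores": {"test": [{"main_score": ...}, ...]}}
        splits = data.get("scores", {}).get("test") or [{}]
        main = splits[0].get("main_score")
        if main is not None:
            scores[path.stem] = main
    return scores
```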


1.28.0 (2025-01-09)

Feature

  • feat: Add nomic modern bert (#1684)

  • add nomic modern bert

  • use SentenceTransformerWrapper

  • use SentenceTransformerWrapper

  • try nomic wrapper

  • update

  • use all prompts

  • pass prompts

  • use fp16

  • lint

  • change to version

  • remove commented code (95f143a)
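The commits describe loading the model via SentenceTransformers in fp16 with per-task prompts. A sketch of that pattern; the prompt strings are assumptions and may not match the model's actual configuration:

```python
import torch
from sentence_transformers import SentenceTransformer

# fp16 weights plus named prompts; the prompt strings below are assumed.
model = SentenceTransformer(
    "nomic-ai/modernbert-embed-base",
    model_kwargs={"torch_dtype": torch.float16},  # "use fp16"
    prompts={"query": "search_query: ", "passage": "search_document: "},
)
emb = model.encode(["example query"], prompt_name="query")  # "pass prompts"
```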

Fix

  • fix: allow kwargs in init for RerankingWrapper (#1676)

  • allow kwargs in init

  • fix retrieval

  • convert corpus_in_pair to list (f5962c6)
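Accepting **kwargs in __init__ lets callers pass model-specific options through the wrapper. A minimal sketch; the real RerankingWrapper does more than this:

```python
class RerankingWrapperSketch:
    """Accept arbitrary keyword arguments instead of a fixed signature,
    so callers can pass model-specific options through the wrapper."""

    def __init__(self, model, **kwargs):
        self.model = model
        self.kwargs = kwargs  # forwarded on every encode call

    def encode(self, texts, **encode_kwargs):
        return self.model.encode(texts, **{**self.kwargs, **encode_kwargs})
```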


1.27.0 (2025-01-08)

Feature

  • feat: reduce logging for load_results()
  • redacts missing subsets to avoid printing 100+ subsets
  • reduce to logging.info
  • removed splits that are commonly never evaluated on, and thus also the errors about them being missing

The second part removed quite a few warnings (from 4930 to XX).

It also seems like the splits were accidentally included in some of the MMTEB benchmarks.

This will remove those splits from those benchmarks (which are all in beta). We will have to recompute the tables for the paper, though (we should do that anyway).

Another potential thing to consider:

  • SciFact is included in MTEB(Medical). I have removed the "train" split from it, as I think that was a mistake (checked other datasets in the benchmark).

Here is a count of the current top errors:

```python
{
    "MassiveScenarioClassification: Missing splits {'validation'}": 238,  # included in e.g. mteb(fra)
    "MassiveIntentClassification: Missing splits {'validation'}": 237,  # included in e.g. mteb(fra)
    "MassiveScenarioClassification: Missing subsets {'af', 'da', ...} for split test": 230,
    "AmazonReviewsClassification: Missing splits {'validation'}": 229,  # included in e.g. mteb(deu)
    "MassiveIntentClassification: Missing subsets {'af', 'da', ...} for split test": 228,
    "STS22: Missing subsets {'fr-pl', 'de-en', ...} for split test": 223,
    "AmazonReviewsClassification: Missing subsets {'es', 'ja', ...} for split test": 196,
    "MTOPDomainClassification: Missing splits {'validation'}": 195,  # included in mteb(fra)
    "MTOPIntentClassification: Missing splits {'validation'}": 194,  # included in mteb(fra)
    "AmazonCounterfactualClassification: Missing splits {'validation'}": 189,  # included in mteb(deu)
    "MTOPDomainClassification: Missing subsets {'es', 'th', ...} for split test": 165,
    "STS17: Missing subsets {'en-ar', 'es-es', ...} for split test": 164,
    "MTOPIntentClassification: Missing subsets {'es', 'th', ...} for split test": 164,
    "AmazonCounterfactualClassification: Missing subsets {'de', 'ja', ...} for split test": 148,
}
```

([`7e16fa2`](https://github.com/embeddings-benchmark/mteb/commit/7e16fa2565b2058e12303a1feedbd0d4dea96a41))
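Such a tally can be produced by collecting the warning strings emitted while loading results and counting duplicates; a small self-contained sketch:

```python
from collections import Counter

warnings = [
    "MassiveScenarioClassification: Missing splits {'validation'}",
    "MassiveScenarioClassification: Missing splits {'validation'}",
    "STS22: Missing subsets {'fr-pl', 'de-en', ...} for split test",
]
for message, n in Counter(warnings).most_common():
    print(f"{message}: {n}")
```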


1.26.6 (2025-01-08)

Fix

  • fix: Added zero shot tag to benchmark (#1710)

  • Added method for determining whether a model is zero shot

  • Added .items() where intended

  • Added filtering functions for zero shot models

  • Added a zero-shot filtering button and an error message for when the table is empty

  • Ran linting

  • Fixed docstring linting error

  • is_zero_shot returns None when no training data is specified

  • Added zero-shot emoji column to leaderboard

  • Added explanation for zero shot column

  • Added soft and hard zero-shot buttons

  • Added training data annotations to 24 models from HuggingFace Hub (8702815)
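The soft/hard distinction hinges on models with no training-data annotation. A hedged sketch of the hard variant, which excludes unannotated models (a soft filter would keep them); the field name is an assumption:

```python
def filter_zero_shot(models: list[dict], benchmark_tasks: set[str]) -> list[dict]:
    """'Hard' zero-shot filter: drop models trained on any benchmark
    task, and also drop models whose training data is unannotated."""
    kept = []
    for m in models:
        training = m.get("training_datasets")  # assumed annotation field
        if training is None:
            continue  # unknown training data: not provably zero-shot
        if not set(training) & benchmark_tasks:
            kept.append(m)
    return kept
```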