Releases: embeddings-benchmark/mteb
1.28.4 (2025-01-10)

Fix

- fix: fixes implementation of similarity() (#1748)
- fix(#1594): fixes implementation of similarity()
- fix: add similarity to SentenceTransformerWrapper

Co-authored-by: sam021313 <[email protected]> (`3fe9264`)
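The similarity fix above concerns computing pairwise similarity between embeddings. As a rough illustration of what a `similarity()` helper on a model wrapper typically does (a hypothetical sketch using cosine similarity; the actual mteb/SentenceTransformerWrapper implementation may differ):

```python
import numpy as np

def similarity(embeddings1, embeddings2):
    # Cosine similarity between two batches of embeddings.
    # Hypothetical sketch of a similarity() helper; not the actual
    # mteb or sentence-transformers code.
    a = np.asarray(embeddings1, dtype=float)
    b = np.asarray(embeddings2, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T  # shape: (len(embeddings1), len(embeddings2))
```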
1.28.1 (2025-01-10)

Fix

- fix: Leaderboard Speedup (#1745)
- Added get_scores_fast
- Made the leaderboard faster with a smarter dependency graph, event management, and caching
- Changed print to logger.info (`9eff8ca`)
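The caching part of the speedup can be illustrated with a small sketch. The name `get_scores_fast` comes from the release notes, but the body and the toy score table below are assumptions, not the actual leaderboard code:

```python
from functools import lru_cache

# Toy score table; values are made up for illustration only.
SCORES = {("model-a", "STS17"): 0.81, ("model-a", "SICK-R"): 0.77}

@lru_cache(maxsize=None)
def get_scores_fast(model_name, task_name):
    # Cached lookup: repeated leaderboard renders reuse the cached
    # result instead of recomputing or reloading scores each time.
    return SCORES.get((model_name, task_name))
```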
Test

- test: Add script to test model loading below n_parameters threshold (#1698)
- add model loading test for models below 2B params
- add failure message to include model name
- use the real get_model_meta
- use cache folder
- teardown per function
- fix directory removal
- write to file
- wip loading from before
- wip
- Rename model_loading_testing.py to model_loading.py
- Delete tests/test_models/test_model_loading.py
- checks for models below 2B
- try not using cache folder
- update script with scan_cache_dir and add args
- add github CI: detect changed model files and run model loading test
- install all model dependencies
- dependency installations and move file location
- should trigger a model load test in CI
- find correct commit for diff
- explicitly fetch base branch
- add make command
- try to run in python instead and add pytest
- fix attribute error and add read mode
- separate script calling
- let pip install be cached and specify repo path
- check ancestry
- add cache and rebase
- try to merge instead of rebase
- try without merge base
- check if file exists first
- Apply suggestions from code review
  Co-authored-by: Kenneth Enevoldsen <[email protected]>
- Update .github/workflows/model_loading.yml
  Co-authored-by: Kenneth Enevoldsen <[email protected]>
- address review comments to run test once from CI and not pytest
  Co-authored-by: Kenneth Enevoldsen <[email protected]> (`8d033f3`)
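The core of the loading test above is selecting only models small enough to load in CI. A minimal sketch of that filter, assuming hypothetical helper names and a dict-shaped model metadata (the real script uses get_model_meta and may differ):

```python
def below_param_threshold(n_parameters, threshold=2_000_000_000):
    # Hypothetical helper mirroring the "models below 2B params"
    # filter; models with unknown parameter counts are skipped.
    return n_parameters is not None and n_parameters < threshold

def models_to_test(model_metas, threshold=2_000_000_000):
    # Keep only models small enough to load within CI limits.
    return [
        m for m in model_metas
        if below_param_threshold(m.get("n_parameters"), threshold)
    ]
```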
1.27.0 (2025-01-08)

Feature

- feat: reduce logging for load_results()
- redacts missing subsets to avoid 100+ subsets being printed
- reduce to logging.info
- removed splits that are commonly never evaluated on, and thus also the errors for them being missing

The second part removed quite a few warnings (4930 to XX). It also seems like the splits were accidentally included in some of the MMTEB benchmarks. This will remove those splits from those benchmarks (which are all in beta). We will have to recompute the tables for the paper, though (we should do that anyway).

Other potential things to consider:

- SciFact is included in MTEB(Medical). I have removed the "train" split from it, as I think that was a mistake (checked the other datasets in the benchmark).
Here is a count of the current top errors:
```
{
    "MassiveScenarioClassification: Missing splits {'validation'}": 238,  # included in e.g. mteb(fra)
    "MassiveIntentClassification: Missing splits {'validation'}": 237,  # included in e.g. mteb(fra)
    "MassiveScenarioClassification: Missing subsets {'af', 'da', ...} for split test": 230,
    "AmazonReviewsClassification: Missing splits {'validation'}": 229,  # included in e.g. mteb(deu)
    "MassiveIntentClassification: Missing subsets {'af', 'da', ...} for split test": 228,
    "STS22: Missing subsets {'fr-pl', 'de-en', ...} for split test": 223,
    "AmazonReviewsClassification: Missing subsets {'es', 'ja', ...} for split test": 196,
    "MTOPDomainClassification: Missing splits {'validation'}": 195,  # included in mteb(fra)
    "MTOPIntentClassification: Missing splits {'validation'}": 194,  # included in mteb(fra)
    "AmazonCounterfactualClassification: Missing splits {'validation'}": 189,  # included in mteb(deu)
    "MTOPDomainClassification: Missing subsets {'es', 'th', ...} for split test": 165,
    "STS17: Missing subsets {'en-ar', 'es-es', ...} for split test": 164,
    "MTOPIntentClassification: Missing subsets {'es', 'th', ...} for split test": 164,
    "AmazonCounterfactualClassification: Missing subsets {'de', 'ja', ...} for split test": 148,
}
```
([`7e16fa2`](https://github.com/embeddings-benchmark/mteb/commit/7e16fa2565b2058e12303a1feedbd0d4dea96a41))
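The "redacts missing subsets" change can be sketched as truncating long subset lists before logging them at INFO level. Function and parameter names below are assumptions for illustration, not mteb's actual API:

```python
import logging

logger = logging.getLogger(__name__)

def format_missing_subsets(task_name, missing, max_shown=3):
    # Truncate long subset lists so a single message never prints
    # 100+ subset names, matching the intent described in the notes.
    shown = sorted(missing)[:max_shown]
    suffix = ", ..." if len(missing) > max_shown else ""
    body = ", ".join(repr(s) for s in shown)
    return f"{task_name}: Missing subsets {{{body}{suffix}}} for split test"

def log_missing_subsets(task_name, missing):
    # Demoted to logging.info, as in the release notes.
    logger.info(format_missing_subsets(task_name, missing))
```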
1.26.6 (2025-01-08)
Fix

- fix: Added zero shot tag to benchmark (#1710)
- Added method for determining whether a model is zero shot
- Added .items() where intended
- Added filtering functions for zero shot models
- Added zero-shot filtering button and error message when table is empty
- Ran linting
- Fixed docstring linting error
- is_zero_shot returns None when no training data is specified
- Added zero-shot emoji column to leaderboard
- Added explanation for zero shot column
- Added soft and hard zero-shot buttons
- Added training data annotations to 24 models from HuggingFace Hub (`8702815`)
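The zero-shot logic described above can be sketched as follows: a model is zero-shot with respect to a benchmark if none of the benchmark's tasks appear in its training data, and the answer is None when training data is not annotated. This is a simplified illustration under assumed names, not mteb's actual implementation:

```python
def is_zero_shot(training_datasets, benchmark_tasks):
    # None when the model's training data is unannotated (per the
    # release notes); otherwise True iff no benchmark task appears
    # in the model's training data. Signature is an assumption.
    if training_datasets is None:
        return None
    return not (set(training_datasets) & set(benchmark_tasks))
```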