CHANGELOG

v1.3.2 (2024-03-29)

Documentation

docs: Update links in README.md (#296) (76056b5)

Fix

fix: Added tasks from SEB (#287)
Added tasks from SEB
docs: fix link
fix: ran linting
fix typing for 3.8
fixed annotation for v3.8 (39cff49)

v1.3.1 (2024-03-26)

Fix

fix: updated version in transition to semantic release ci (238ab82)
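
The section headings in this changelog (Fix, Feature, Documentation, Ci, Refactor, Breaking, Unknown) appear to follow the prefix of each commit subject. The sketch below illustrates that grouping as it can be read off the entries here. It is not the release tool's actual implementation, and the names `SECTION_BY_TYPE` and `section_for` are hypothetical.

```python
import re

# Hypothetical mapping from commit-type prefix to the section headings
# seen in this changelog; the real release tooling may differ.
SECTION_BY_TYPE = {
    "feat": "Feature",
    "fix": "Fix",
    "docs": "Documentation",
    "ci": "Ci",
    "refactor": "Refactor",
}

# Matches subjects such as "fix: ...", "build(deps): ..." or "feat!: ...".
SUBJECT_RE = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<bang>!)?: ")


def section_for(message: str) -> str:
    """Return the changelog section a commit message would be filed under."""
    # A BREAKING CHANGE footer takes precedence over the prefix.
    if "BREAKING CHANGE:" in message:
        return "Breaking"
    subject = message.splitlines()[0] if message else ""
    match = SUBJECT_RE.match(subject)
    if match is None:
        # No lowercase "type: " prefix, e.g. "Release: 1.2.0" or a merge commit.
        return "Unknown"
    if match.group("bang"):
        return "Breaking"
    # Prefixes without a dedicated heading (dev, tests, build, ...) fall through.
    return SECTION_BY_TYPE.get(match.group("type"), "Unknown")
```

Under these assumptions, a subject such as `dev: add isort` lands in Unknown because `dev` has no heading of its own, which is consistent with where those entries sit in this file.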

v1.3.0 (2024-03-26)

Breaking

feat: Updating version

BREAKING CHANGE: Bump version (caee2e9)

Ci

ci: disable changelog (b7d3cde)
ci: moved release to the correct folder (b4fa85a)
ci: renamed test job and workflow (#282)

ci: Added tests (6675bb8)

Documentation

docs: typos in readme (#268) (aa9234c)
docs: add dataset schemas (#255)
docs: update AbsTaskClassification.py document schema for classification task
update AbsTaskBitextMining.py
update BornholmskBitextMining.py
update AbsTaskClustering.py and BlurbsClusteringP2P.py
update 8 files
update 9 files
update AbsTaskReranking.py
update BlurbsClusteringP2P.py
update CMTEBPairClassification.py
update GerDaLIRRetrieval.py
update 7 files
update AbsTaskBitextMining.py
update AbsTaskClassification.py (c3ce1ac)
docs: Add development installation instructions (#246)
docs: Add development installation instructions
removed unused requirements file

I don't believe this is necessary with the setup.py specifying the same dependencies

docs: Updated make file with new dependencies
ci: Update ci to use make commands

This ensures that the user runs exactly what the CI expects

ci: Avoid specifying tests folder as it causes issues with tests
ci: removed unnecessary args for test ci
Added dev install (0048878)

Feature

feat: bump version again (294ab91)
feat: bump version again (acf68c7)

Fix

fix: dead link in readme (ecbb776)
fix: Added sizes to the metadata (#276)
restructuring the readme
added mmteb
removed unnecessary method
Added docstring to metadata
Updated outdated examples
formatting documents
fix: Updated form to be parsed correctly
fix: Added sizes to the metadata

this allows for automatic metadata generation

Updated based on feedback
Apply suggestions from code review

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

updated based on feedback
Added suggestion from review
added correction based on review
reformatted empty fields to None

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> (cd4a012)

Refactor

refactor: add metadata basemodel (#260)
refactor: rename description to metadata dict
refactor: add TaskMetadata and first example
update 9 files
update TaskMetadata.py
update TaskMetadata.py
update TaskMetadata.py
update LICENSE, TaskMetadata.py and requirements.dev.txt
update 151 files
update 150 files
update 43 files and delete 1 file
update 106 files
update 45 files
update 6 files
update 14 files
Added model results to repo and updated CLI to create consistent folder structure. (#254)
Added model results to repo and updated CLI to create consistent folder structure.
ci: updated ci to use make install
Added missing pytest dependencies
Update README.md

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Restructuring the readme (#262)
restructuring the readme
removed double specification of versions and moved all setup to pyproject.toml
correctly use flat-layout for the package
build(deps): update TaskMetadata.py and pyproject.toml
update 221 files
build(deps): update pyproject.toml
build(deps): update pyproject.toml
build(deps): update pyproject.toml

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> (dd5d617)
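
The refactor entries above replace a free-form description dict with a typed `TaskMetadata` model, and a later fix reformats empty fields to None. As a rough illustration of that idea only, here is a minimal sketch that swaps in a standard-library dataclass for the base model. The class name `TaskMetadataSketch` and every field name in it are assumptions made for the example, not the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TaskMetadataSketch:
    """Illustrative stand-in for a typed task-metadata record.

    All field names are hypothetical and chosen for the example.
    """

    name: str
    description: str
    type: str
    eval_splits: List[str] = field(default_factory=lambda: ["test"])
    # Values that are not known yet are left as None rather than "".
    license: Optional[str] = None
    n_samples: Optional[Dict[str, int]] = None

    def __post_init__(self) -> None:
        # Validate eagerly so that malformed metadata fails at definition time.
        if not self.name:
            raise ValueError("name must be a non-empty string")
        if self.n_samples is not None:
            unknown = set(self.n_samples) - set(self.eval_splits)
            if unknown:
                raise ValueError(f"sizes given for unknown splits: {sorted(unknown)}")


# Usage: unknown fields stay None, sizes are checked against the splits.
example = TaskMetadataSketch(
    name="ExampleClassification",
    description="A made-up task used only for illustration",
    type="Classification",
    n_samples={"test": 1000},
)
```

Keeping sizes as structured data next to the splits is the kind of thing that makes the automatic metadata generation mentioned in the Fix section possible, since a script can read the values instead of parsing prose.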

Unknown

overwrite version (bc60c9d)
v1.3.0 (50b856c)
v1.3.0 (61c12d8)
Merge branch 'main' of https://github.com/embeddings-benchmark/mteb (7b0a766)
Ci-fix (#289)
added release pipeline
v1.3.0
ci: moved release to the correct folder (7f56c1a)
Merge branch 'main' of https://github.com/embeddings-benchmark/mteb (57f500f)
v1.3.0
added release pipeline
v1.3.0 (5e4d10e)
v1.3.0 (cdda2f2)
added release pipeline (69a440b)
tests: speed up tests (#283)

update Makefile and test_all_abstasks.py (2155bf6)

update TaskMetadata.py (#281) (acfd7d4)
Merge branch 'main' of https://github.com/embeddings-benchmark/mteb (c9d1a03)
Enable ruff ci (#279)
restructuring the readme
added mmteb
removed unnecessary method
Added docstring to metadata
Updated outdated examples
formatting documents
fix: Updated form to be parsed correctly
fix: Added sizes to the metadata

this allows for automatic metadata generation

Updated based on feedback
Apply suggestions from code review

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

updated based on feedback
Added suggestion from review
added correction based on review
reformatted empty fields to None
CI: Enable linter

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> (a16eb07)

Added MMTEB (#275)
restructuring the readme
added mmteb
removed unnecessary method
Added docstring to metadata
Updated outdated examples
formatting documents
fix: Updated form to be parsed correctly
Updated based on feedback
Apply suggestions from code review

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

updated based on feedback
Added suggestion from review
added correction based on review

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> (c0dc49a)

dev: add ruff as suggested extension (#274) (b08913f)
dev: add isort (#271)
dev: add isort
dev: add isort (845099d)
dev: run tests on pull request towards any branch (13f759a)
Merge branch 'main' of https://github.com/embeddings-benchmark/mteb (b42abe4)
replaced linter with ruff (#265)
restructuring the readme
removed double specification of versions and moved all setup to pyproject.toml
correctly use flat-layout for the package
replaced linter with ruff
rerun tests
ci: Added in newer workflow

some of them are disabled as they require other issues to be solved

Update Makefile

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> (023e881)

Restructuring the readme (#262)
restructuring the readme
removed double specification of versions and moved all setup to pyproject.toml
correctly use flat-layout for the package (769157b)
restructuring the readme (364be7f)
Added model results to repo and updated CLI to create consistent folder structure. (#254)
Added model results to repo and updated CLI to create consistent folder structure.
ci: updated ci to use make install
Added missing pytest dependencies
Update README.md

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> (8a758bc)

dev: add workspace defaults in VSCode (#253)
dev: add black as default formatter in vscode
Update .vscode/settings.json

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> (30e5b9e)

Add Danish Discourse dataset (#247)
misc.
update ddisco.py
chore: delete ddisco.py, ddisco.test.tsv and ddisco.train.tsv
Update mteb/tasks/Classification/DdiscoCohesionClassification.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

Update mteb/tasks/Classification/DdiscoCohesionClassification.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

Update mteb/tasks/Classification/DdiscoCohesionClassification.py

Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>

Update mteb/tasks/Classification/DdiscoCohesionClassification.py

Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>

Update mteb/tasks/Classification/DdiscoCohesionClassification.py

Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com> (d46d0f5)

Update structure of mteb/tasks to mteb/tasks/{type}/{language} (#245)
Fix structure of mteb/tasks Fixes #243
fix: Added missing init files (b1c78c1)
tests: do not run tests on collection (#249)

test: update test_all_abstasks.py (236614a)

Update README.md with correct DRESModel location (399edf4)
Fix typo (9610378)
Set dev version (716f59c)
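
One of the v1.3.0 entries above moves task modules to the layout `mteb/tasks/{type}/{language}`. The following sketch shows how a path under that layout could be assembled. The helper name `task_module_path`, the language code used in the usage line, and the assumption that the task file sits directly inside the language folder are all hypothetical.

```python
from pathlib import PurePosixPath


def task_module_path(task_type: str, language: str, task_name: str) -> PurePosixPath:
    """Build the path a task module would have under mteb/tasks/{type}/{language}."""
    for part in (task_type, language, task_name):
        # Reject empty components and anything that would add extra directories.
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return PurePosixPath("mteb", "tasks", task_type, language, f"{task_name}.py")


# Usage with made-up values:
path = task_module_path("Classification", "da", "ExampleTask")
```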

v1.2.0 (2024-03-07)

Unknown

Release: 1.2.0 (9e9dca8)
Rmv superfluous file (d772fed)
Remove duplicate & outdated code (12bcb83)
Adapt scripts (36b9234)
Add example (273ff4a)
Simplify retrieval (#233)
Simplify retrieval
Simplify
Make call method
Add splits
Rmv outdated test
Fix name & \n
Add qrels
Add revisions

Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>

Add hf hub org
Add test
Add missing revision
Rename test

Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>

log dres compat

Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com> (c9fccbc)

Fixed missing revision error on Norwegian Bitext Mining (#221)
Removed revision specification from Norwegian Bitext Mining task
Update to latest revision

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> (b249c67)

Remove HAGRID from french benchmark (#235)
add Masakhane dataset config
add trigram lang code for datasets that use it
create french script eval
fix French word
add some documentation
add script to process and upload alloprof on HF
build script for HF
adding dataset processing for mteb
add script to process and upload alloprof on HF
build script for HF
adding dataset processing for mteb
refactor a few things
remove whitespaces
4 pair classification (#10)
add Opusparcus dataset
multilingual usage
use eval_split of config files
change eval_split according to data

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>

add script to process and upload alloprof on HF
build script for HF
adding dataset processing for mteb
refactor a few things
remove whitespaces
Clustering with HAL S2S dataset (#11)

HAL S2S dataset creation and evaluation on clustering task.

adding BSARD dataset
add BSARD to benchmark
adding Hagrid dataset
DiaBLa and Flores Bitext Mining evaluation (#12)
Add DiaBLa dataset for bitext mining
Add DiaBLa dataset for bitext mining
deduplicate bitext task
add Flores
format files
add flores to evaluation script
remove prints
add revision

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>

add script to process and upload alloprof on HF
build script for HF
adding dataset processing for mteb
refactor a few things
remove whitespaces
adding dataset processing for mteb
adding BSARD dataset
add BSARD to benchmark
adding Hagrid dataset
fix change on langmapping
reset alphabetical order
add revision handling
Clustering: Add AlloProf dataset (#17)

AlloProf dataset for clustering task

handling of revision
change split + add revision handling
add script to process and upload alloprof on HF
build script for HF
adding dataset processing for mteb
refactor a few things
remove whitespaces
adding dataset processing for mteb
adding BSARD dataset
add BSARD to benchmark
adding Hagrid dataset
add script to process and upload alloprof on HF
adding dataset processing for mteb
refactor a few things
reset alphabetical order
add revision handling
handling of revision
change split + add revision handling
use eval variable
alphabetic order
Add MLSUM dataset for clustering task (#21)
Use Masakhane dataset for clustering task (#23)
16 add datasets to readmemd (#18)
run task table
run task table
Add MLSUM dataset for clustering task (#21)
Use Masakhane dataset for clustering task (#23)
run task table
refresh readme
refresh readme
run task table
refresh readme

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr> Co-authored-by: Marion Schaeffer <92590517+schmarion@users.noreply.github.com>

load only test split (#25)

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>

Update mteb/tasks/BitextMining/DiaBLaBitextMining.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update mteb/tasks/Clustering/HALClusteringS2S.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

renaming masakhane (#28)

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>

Syntec dataset addition (#26)
add script to process & load to HF
add script to enable download of data from HF
add syntec dataset files to gitignore
add syntecretrieval
add syntec retrieval
build dataloading script
remove datasets
correct typo

Co-authored-by: Sequeira Gabriel <gabriel.sequeira@outlook.fr>

30 add syntec reranking (#31)
change name to specify retrieval
add reranking tasks
create script to upload dataset for reranking task
create reranking task
add reranking tasks
add model name in description
SummEval translated to french (#32)
7 sts (#33)
take into account multilingual tasks
add stsbenchmark multilingual dataset
add STS tasks
take into account multilingual tasks
add stsbenchmark multilingual dataset
add STS tasks
add comma
Adding sick fr dataset to sts tasks (#34)
Adding sick fr dataset to sts tasks
modifying dataset in load function to have the right column names
Fix alloprof dataset (#36)
change revision to use
remove duplicate data
change main metric because dataset is hard (#37)
Fix alloprof dataset (#40)
change revision to use
remove duplicate data
change revision
handle queries train test split
change dataset creation method
change revision
handle queries train test split
change dataset creation method
Fix DiaBLa by inheriting CrossLingual class (#42)
Fix DiaBLa by inheriting CrossLingual class
remove remaining print
Fix DiaBLa integration
Update mteb/tasks/BitextMining/FloresBitextMining.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update README.md

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update README.md

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update mteb/tasks/Classification/MasakhaNEWSClassification.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update README.md

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update README.md
Update mteb/tasks/BitextMining/FloresBitextMining.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update mteb/evaluation/MTEB.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update mteb/abstasks/AbsTaskPairClassification.py

Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>

Update README.md
Update scripts/data/syntec/create_data_reranking.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update scripts/data/alloprof/create_data_reranking.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update scripts/run_mteb_french.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update scripts/run_mteb_french.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update mteb/evaluation/MTEB.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update mteb/evaluation/MTEB.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update mteb/tasks/Retrieval/HagridRetrieval.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update mteb/tasks/Clustering/MLSUMClusteringP2P.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update mteb/tasks/Clustering/MLSUMClusteringS2S.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update mteb/tasks/Clustering/MasakhaNEWSClusteringP2P.py
Update mteb/tasks/Clustering/MasakhaNEWSClusteringS2S.py
Update mteb/tasks/STS/SickFrSTS.py
Inherit OpusparcusPC init from MultilingualTask
remove unnecessary init
Remove train split from evaluation on MasakhaNEWSClassification (#52)

remove train split from evaluation

put script on HF dataset repos (#56)
put script on HF dataset repos
remove scripts
49 fix dictionary in syntecretrieval (#54)
add trust remote code arg
leave corpus as dict
remove trust remote code
add Tatoeba & BUCC BitextMining tasks (#57)

add bucc and tatoeba bitextmining tasks

46 add other languages to masakhaneweclusterings2s and p2p (#58)
add other language to clustering tasks
fix main score and S2S task
update run fr benchmark script
Update run_mteb_french.py
Update AbsTaskClustering.py
remove train and validation splits
remove Hagrid (#60)

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr> Co-authored-by: Marion Schaeffer <92590517+schmarion@users.noreply.github.com> Co-authored-by: mciancone@openstudio.fr <mciancone@openstudio.fr> Co-authored-by: Sequeira Gabriel <gabriel.sequeira@outlook.fr> Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com> Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> Co-authored-by: wissam-sib <36303760+wissam-sib@users.noreply.github.com> Co-authored-by: Wissam Siblini <wissam.siblini92@gmail.com> (d01d053)

Restore TRECCOVID import (9f8e897)
Extend MTEB with French datasets (#218)
add Masakhane dataset config
add trigram lang code for datasets that use it
create french script eval
fix French word
add some documentation
add script to process and upload alloprof on HF
build script for HF
adding dataset processing for mteb
add script to process and upload alloprof on HF
build script for HF
adding dataset processing for mteb
refactor a few things
remove whitespaces
4 pair classification (#10)
add Opusparcus dataset
multilingual usage
use eval_split of config files
change eval_split according to data

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>

add script to process and upload alloprof on HF
build script for HF
adding dataset processing for mteb
refactor a few things
remove whitespaces
Clustering with HAL S2S dataset (#11)

HAL S2S dataset creation and evaluation on clustering task.

adding BSARD dataset
add BSARD to benchmark
adding Hagrid dataset
DiaBLa and Flores Bitext Mining evaluation (#12)
Add DiaBLa dataset for bitext mining
Add DiaBLa dataset for bitext mining
deduplicate bitext task
add Flores
format files
add flores to evaluation script
remove prints
add revision

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>

add script to process and upload alloprof on HF
build script for HF
adding dataset processing for mteb
refactor a few things
remove whitespaces
adding dataset processing for mteb
adding BSARD dataset
add BSARD to benchmark
adding Hagrid dataset
fix change on langmapping
reset alphabetical order
add revision handling
Clustering: Add AlloProf dataset (#17)

AlloProf dataset for clustering task

handling of revision
change split + add revision handling
add script to process and upload alloprof on HF
build script for HF
adding dataset processing for mteb
refactor a few things
remove whitespaces
adding dataset processing for mteb
adding BSARD dataset
add BSARD to benchmark
adding Hagrid dataset
add script to process and upload alloprof on HF
adding dataset processing for mteb
refactor a few things
reset alphabetical order
add revision handling
handling of revision
change split + add revision handling
use eval variable
alphabetic order
Add MLSUM dataset for clustering task (#21)
Use Masakhane dataset for clustering task (#23)
16 add datasets to readmemd (#18)
run task table
run task table
Add MLSUM dataset for clustering task (#21)
Use Masakhane dataset for clustering task (#23)
run task table
refresh readme
refresh readme
run task table
refresh readme

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr> Co-authored-by: Marion Schaeffer <92590517+schmarion@users.noreply.github.com>

load only test split (#25)

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>

Update mteb/tasks/BitextMining/DiaBLaBitextMining.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update mteb/tasks/Clustering/HALClusteringS2S.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

renaming masakhane (#28)

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>

Syntec dataset addition (#26)
add script to process & load to HF
add script to enable download of data from HF
add syntec dataset files to gitignore
add syntecretrieval
add syntec retrieval
build dataloading script
remove datasets
correct typo

Co-authored-by: Sequeira Gabriel <gabriel.sequeira@outlook.fr>

30 add syntec reranking (#31)
change name to specify retrieval
add reranking tasks
create script to upload dataset for reranking task
create reranking task
add reranking tasks
add model name in description
SummEval translated to french (#32)
7 sts (#33)
take into account multilingual tasks
add stsbenchmark multilingual dataset
add STS tasks
take into account multilingual tasks
add stsbenchmark multilingual dataset
add STS tasks
add comma
Adding sick fr dataset to sts tasks (#34)
Adding sick fr dataset to sts tasks
modifying dataset in load function to have the right column names
Fix alloprof dataset (#36)
change revision to use
remove duplicate data
change main metric because dataset is hard (#37)
Fix alloprof dataset (#40)
change revision to use
remove duplicate data
change revision
handle queries train test split
change dataset creation method
change revision
handle queries train test split
change dataset creation method
Fix DiaBLa by inheriting CrossLingual class (#42)
Fix DiaBLa by inheriting CrossLingual class
remove remaining print
Fix DiaBLa integration
Update mteb/tasks/BitextMining/FloresBitextMining.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update README.md

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update README.md

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update mteb/tasks/Classification/MasakhaNEWSClassification.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update README.md

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update README.md
Update mteb/tasks/BitextMining/FloresBitextMining.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update mteb/evaluation/MTEB.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update mteb/abstasks/AbsTaskPairClassification.py

Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>

Update README.md
Update scripts/data/syntec/create_data_reranking.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update scripts/data/alloprof/create_data_reranking.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update scripts/run_mteb_french.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update scripts/run_mteb_french.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update mteb/evaluation/MTEB.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update mteb/evaluation/MTEB.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update mteb/tasks/Retrieval/HagridRetrieval.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update mteb/tasks/Clustering/MLSUMClusteringP2P.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update mteb/tasks/Clustering/MLSUMClusteringS2S.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update mteb/tasks/Clustering/MasakhaNEWSClusteringP2P.py
Update mteb/tasks/Clustering/MasakhaNEWSClusteringS2S.py
Update mteb/tasks/STS/SickFrSTS.py
Inherit OpusparcusPC init from MultilingualTask
remove unnecessary init
Remove train split from evaluation on MasakhaNEWSClassification (#52)

remove train split from evaluation

put script on HF dataset repos (#56)
put script on HF dataset repos
remove scripts
49 fix dictionary in syntecretrieval (#54)
add trust remote code arg
leave corpus as dict
remove trust remote code
add Tatoeba & BUCC BitextMining tasks (#57)

add bucc and tatoeba bitextmining tasks

46 add other languages to masakhaneweclusterings2s and p2p (#58)
add other language to clustering tasks
fix main score and S2S task
update run fr benchmark script
Update run_mteb_french.py
Update AbsTaskClustering.py
remove train and validation splits

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr> Co-authored-by: Marion Schaeffer <92590517+schmarion@users.noreply.github.com> Co-authored-by: mciancone@openstudio.fr <mciancone@openstudio.fr> Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com> Co-authored-by: mciancone <73994289+Sunalwing@users.noreply.github.com> Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> Co-authored-by: wissam-sib <36303760+wissam-sib@users.noreply.github.com> Co-authored-by: Wissam Siblini <wissam.siblini92@gmail.com> (3d8b8ec)

dev (c16eddc)
Dev (08c7317)
Add tasks for Spanish Embedding Evaluation (#227)
feat: add xmarket es dataset
refactor: use multilingual dataset
fix: update revision id
refactor: add constant for language
feat: add two clustering datasets

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>

feat: import classes

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>

refactor: flores dataset

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>

feat: add miracl reranking task for spanish
feat: use hf repo with all reranking langs
feat: update revision hash
refactor: use description for language
feat: add stses task
fix: get scores from label column
refactor: add revision to data loading
Added spanish passage retrieval
feat: mintaka and xpqa retrieval tasks

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>

feat: import classes

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>

fix: typo in data loading
fix: id

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>

refactor: try out multilingual task

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>

refactor: multilingual task import

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>

refactor: cmon man

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>

refactor: go back to monolingual tasks

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>

refactor: remove unused import

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>

refactor: loading logic

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>

feat: add miracl as retrieval task
fix: nested corpus
refactor: get lang from description
Update mteb/tasks/Retrieval/MIRACLRetrieval.py

Co-authored-by: Michael Günther <michael.guenther@jina.ai>

feat: allow multilingual reranking tasks
feat: make miraclreranking multilingual
refactor: rename miraclretrieval

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

style: add missing eof empty line
feat: make xmarket retrieval task multilingual
refactor: rename xmarket
refactor: turn spanish tasks multilingual (#11)
refactor: make xpqa retrieval multilingual
fix: formatting of xpqa dataset
refactor: make mintaka into multilingual task
refactor: make miracl retrieval multilingual
feat: add revision ids for hf datasets
refactor: remove patool
Update mteb/tasks/Reranking/__init__.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update mteb/tasks/STS/__init__.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Signed-off-by: jupyterjazz <saba.sturua@jina.ai> Co-authored-by: guenthermi <guenthermi50@gmail.com> Co-authored-by: jupyterjazz <saba.sturua@jina.ai> Co-authored-by: Markus Krimmel <markus.krimmel@jina.ai> Co-authored-by: Michael Günther <michael.guenther@jina.ai> Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> (52d5c9f)

v1.1.2 (2024-02-16)

Feature

feat: update revision id of wikicitiesclustering task (fb90c02)

Fix

fix: remove debugging print statement (d292d93)
fix: pass parallel_retrieval kwarg to use DenseRetrievalParallelExactSearch (19b8f66)

Unknown

Release: 1.1.2 (def3c91)
Add task list (#228)
Add task list
Update mteb/__init__.py
Update README.md (10bf6f8)
Update BeIRPLTask.py (#225)
Update BeIRPLTask.py
Update BeIRPLTask.py (a8922c1)
Allow multiple languages (2cc222e)
Add Korean Text Search Tasks to MTEB (#210)
add Ko-miracl, Ko-StrategyQA, Ko-mrtydi tasks
Update mteb/abstasks/AbsTaskRetrieval.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update AbsTaskRetrieval.py
Update mteb/abstasks/AbsTaskRetrieval.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Update scripts/run_mteb_korean.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> (dadf2da)

Add MultiLongDocRetrieval task to MTEB. (#224)
Update AbsTaskRetrieval.py.
Add Retrieval Task: MultiLongDocRetrieval
Update AbsTaskRetrieval.py and MLDR task
Update reference of MLDR (2f65179)
Fix name (2989f76)
only save top-k (#209)
Update AbsTaskRetrieval.py
Add json import; rename kwarg
Pass OF
Update mteb/abstasks/AbsTaskRetrieval.py
Update AbsTaskRetrieval.py
Update AbsTaskRetrieval.py
Update mteb/abstasks/AbsTaskRetrieval.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> (f58888d)
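
The "only save top-k" entry above suggests that, when retrieval results are written out, only the best-scoring documents per query are kept. The sketch below shows one way such trimming could work, assuming results are held as a mapping from query id to a mapping from document id to score. The function name `keep_top_k` and that data shape are assumptions for illustration, not the code from the commit.

```python
import heapq
import json
from typing import Dict


def keep_top_k(results: Dict[str, Dict[str, float]], k: int) -> Dict[str, Dict[str, float]]:
    """Keep only the k highest-scoring documents for each query."""
    if k < 0:
        raise ValueError("k must be non-negative")
    trimmed: Dict[str, Dict[str, float]] = {}
    for query_id, doc_scores in results.items():
        # nlargest returns (doc_id, score) pairs ordered from best to worst.
        best = heapq.nlargest(k, doc_scores.items(), key=lambda item: item[1])
        trimmed[query_id] = dict(best)
    return trimmed


# Usage with made-up scores: trim first, then serialize the smaller structure.
scores = {"q1": {"d1": 0.2, "d2": 0.9, "d3": 0.5}}
payload = json.dumps(keep_top_k(scores, 2))
```

Trimming before serialization keeps the saved file size proportional to the number of queries times k rather than to the full corpus.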

Add tasks for German Embedding Evaluation (#214)
chore: solve merge conflict
fix: gerdalir dataset
fix: lang from en to de
chore: solve merge conflict
chore: add ir datasets to requirements
refactor: limit queries to 10k
refactor: update description of task with limit
revert style changes
feat: add german stsbenchmarksts task
feat: update revision id
refactor: update revision id after changes in scores
add XMarket dataset
add xmarket to init file
feat: add revision id
add paws x dataset
Add ir_datasets as dependency
add GermanDPR dataset
fix loading
Update mteb/tasks/Retrieval/GermanDPRRetrieval.py

Co-authored-by: Saba Sturua <45267439+jupyterjazz@users.noreply.github.com>

feat: add miracl reranking task for german
refactor: cleanup task
prevent duplicate pos docs
fix: use test split in MIRACL (#13)

Fixes mismatch between description and HuggingFace dataset

refactor: remove WikiCLIR
fix: double import; xmarket name
add German tasks to run_mteb_german script
update revisions and style
update MIRACL to work with latest version
revert adding ir_datasets
support multilingual pair classification
remove print statement
Apply suggestions from code review

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

fix monolingual pair classification
remove lang for monolingual tasks

Co-authored-by: Isabelle Mohr <isabelle.mohr@jina.ai> Co-authored-by: Markus Krimmel <markus.krimmel@jina.ai> Co-authored-by: Saba Sturua <45267439+jupyterjazz@users.noreply.github.com> Co-authored-by: Markus Krimmel <montcyril@gmail.com> Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> (9aba9ee)

Simplify (1cd07db)
Refer to other works (8f28bcb)
Update mteb/tasks/Retrieval/GermanQuADRetrieval.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> (09a9cb0)

clean up (51c40fd)
WIP: implement requested changes (58baad2)
remove code for writing JSONL dataset (d23eac3)
add docstring, remove local qrels (af7ee50)
fix query id in qrel dataset, ready to merge (33c9dd4)
WIP: use HF dataset instead of local JSONL (db3fea1)
rename BeIRDETask (e56cf86)
Update scripts/run_mteb_german.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> (4b18a7e)

Update mteb/tasks/Retrieval/GermanRetrieval.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> (3fef61a)

add reference to GermanQuAD (ae268e0)
fix results folder path (dc7fc01)
copy from local (9c0880d)
Update mteb/abstasks/AbsTaskRetrieval.py (be1fcc1)
Pass OF (b0e6316)
Add json import; rename kwarg (d39c21c)
Update AbsTaskRetrieval.py (4eb8e02)
Added Norwegian Bokmål-Nynorsk bitext mining task (c3fb742)
Add STS revisions (38277ae)
Add RTR revisions (8da9487)
Add RRK revisions (2011cd8)
Add PCLF revisions (9b6f4b9)
Add CLST revisions (da73236)
Add CLF revisions (fd91a9c)
Update Revision (6b0fae5)
Fix SweFAQ linkage (2341c48)
Fix SummEval linkage (7252322)
Fix Dalaj linkage (fb9ccd8)
Fix medrxiv mislinkage (620defc)
Fix stripping (02e84b2)
add datasets for long document evaluation

Co-authored-by: Isabelle Mohr <retrospect@protonmail.com> (88beb46)

Do not enforce rich import (aa11fe7)
fix RerankingEvaluator's compute_metrics_individual (fd7bfac)
Fix SummEval import (859d38e)
Increment version (4d75ddf)

v1.1.1 (2023-09-20)

Fix

fix: msmarco-v2 uses dev.tsv, not dev1.tsv (6908d21)
fix: add missing task-langs attribute (#152) (bc22909)

Unknown

Release: 1.1.1 (d3aaf4f)
Merge branch 'main' into fixconversion (d292258)
Fix eval_lang (7836148)
Simplify code snippets (d434f52)
Simplify wording (3adb0b5)
Clarify multi-gpu usage (5a2da23)
Fix splits (93f6f85)
Improve Cust Model explanation (52c1fd8)
Add bs to Clustering test (4df0d2e)
Rely on auto-conversion to tensor in score function (d8512f7)
Rely on standard encode kwargs only (4c1660e)
Improve Cust Model explanation (23d758f)
Add bs to Clustering test (6e0c0d2)
Rely on auto-conversion to tensor in score function (7ec4c57)
Rely on standard encode kwargs only (2fad0f9)
Update README.md (d9aa70f)
Update README.md (2211f83)
Simplify assertion (f7fcbc1)
Default to false (d64f6c7)
Add multi gpu eval to readme (#140)

update readme (1b1c9d3)

Support Multi-node Evaluation (#132)
styling
USE_HF_DATASETS
Support DRPES
we use beir.datasets.data_loader_hf in case of non dist
distributed fixes
update run command
cleanup
.
sugg
ruff (0dd82a9)
Add Chinese tasks (C-MTEB) (#134)
add C_MTEB
add C_MTEB
rename MMarcoReranking
rename MMarcoReranking
Update mteb/tasks/Retrieval/CMTEBRetrieval.py
Update README.md
Allow custom encode functions

Co-authored-by: shitao <stxiao@bupt.edu.cn> Co-authored-by: Nouamane Tazi <nouamane98@gmail.com> Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> (071974a)

Add Polish tasks (PL-MTEB) (#137)
Add Polish tasks (PL-MTEB)
Add Polish datasets to README
Add newline

Co-authored-by: rposwiata <rposwiata@opi.org.pl> Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> (2779344)

Add BEIR-PL datasets to MTEB (#121)
Add BEIR-PL benchmark
Update README with BEIR-PL datasets
Update names
Add tasks to init to be visible during evaluation

Co-authored-by: Konrad Wojtasik <konrad.wojtasik@pwr.edu.pl> Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> (5972c02)

Replaced prints with logging (#133)
Make sure that main score is added to bitext mining tasks
Added scandinavian languages: da, no, sv
merge upstream main
fix: Replaced prints with logging statements
chore: removed accidental commits (d7ca378)
add logging (6412a6a)
Merge pull request #131 from embeddings-benchmark/nouamane/quick-fixes

Code cleanup (4fb97d0)

. (3ebb039)
add eval_splits arg (c407c4b)
quick fixes (6c5a3fa)
clean MTEB tasks (b276f1d)
clean args (9365755)
styling (dd02b48)
black (652d07c)
Set dev version (bf98c2c)

v1.1.0 (2023-07-31)

Unknown

Release: 1.1.0 (80d0344)
Bump version ID and update PyPI (#128)

Bump version ID and update PyPI after adding additional tasks. (4a4b54b)

Fix typo (33a3140)
Sort imports (ab2eef8)
Sort imports (3432374)
Raise error first (0b1bfd2)
Added support for Scandinavian Languages (#124)
Make sure that main score is added to bitext mining tasks
Added scandinavian languages: da, no, sv
Updated readme with scandinavian tasks
Changes n samples for the nordic lang CLF
Added scandinavian models to init
Added error logs to gitignore
fix import error
fix dataset columns
rename dataset columns
remove swefaq
fix: Added functionality to raise error
fix: Updated names
fix: Removed no as a language
Added missing data transformation
Fix spelling error (acb0f59)
Install beir (c50b8ab)
Update README.md (29ffedf)
ruff (6a58b5d)
Update README.md (5825536)
fix revision hash for TenKGnadClusteringP2P dataset

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> (eb622f8)

change dataset order for BlurbsClustering in README

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> (f6e49ba)

change dataset order for TenKGnadClustering in README

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> (2a2c47f)

fix descriptions for German clustering datasets (30a966c)
add German clustering tasks to README (62457e3)
update reference & category for TenKGnad datasets (2174a47)
add German clustering tasks (ab469be)
Allow abs path (b56528c)
Add @property annotation to description method of AbsTask (98b0443)
fix typo (37a986b)
fix extend lang pairs (865dffc)
Fix clustering eval, black, isort (bc43665)
Add 'auto' to sklearn clustering, add test, fix warning (15ce352)
Update MSMARCORetrieval.py (d913f56)
Revert to old split (1f3ff6e)
Add wheel instruction (62fad9b)
Dev version (d988e48)

v1.0.2 (2023-03-28)

Unknown

Release: 1.0.2 (e189bae)
Add comment

Co-authored-by: Nouamane Tazi <nouamane98@gmail.com> (3e72ee8)

Fix naming (33f2db9)
Cleaner logging & tqdm usage (542d871)
Add kwargs (e0b801d)
Produce embeddings in one go (e88bcf2)
Fix naming (6c62f18)
Make inputs always List[str] & call in one (bdeeedf)
Fix SummEval description (0c2b1be)
fix SummEval description

Unless I'm missing something, I think the SummEval description is incorrect: the dataset consists of summaries of news articles, not biomedical abstracts. (1ccc068)

Clarify script for running all of MTEB English (9f72434)
Update run_mteb_english.py (6ff57d3)
Update run_mteb_english.py (7803eea)
Point to English benchmarking script (57f3371)
Example script for benchmarking all of MTEB English (77e6b22)
Clarify MSMARCO split (bbeada8)
Allow re-merging (b0ce501)
Set dataset name; Sort imports (2a5a661)
Standardize CQA merging script (5d5a2fb)
Update merge_cqadupstack.py (b0304c1)
Update README.md (8c60c22)
Update README.md (6255449)
Remove validation split (875a98e)
Remove validation set (b3f9585)
Update ClassificationEvaluator.py (93b89b6)
Set dev version (8a0d6b1)

v1.0.1 (2022-11-29)

Unknown

Release: 1.0.1 (b9f423b)
Delete mteb_diagram.png (76dc363)
Deactivate beir (b263157)
Update BeIRTask.py (37b7b79)
Remove validation (6922840)
Fix typo (7247233)
Add files via upload (9d2bb67)
Increment version & use abslink (a792a65)

v1.0.0 (2022-10-17)

Unknown

Release: 1.0.0 (9c544a4)
Add paper (b73457a)
Fix formatting (c523d16)
print -> logging (4f3a559)
Do not ignore data scripts (891b455)
Reorganize scripts (e157bb0)
Add release instructions & dev suffix to version (164b9ae)

v0.9.1 (2022-10-13)

Unknown

Release: 0.9.1 (5c438cc)
Merge pull request #80 from embeddings-benchmark/Muennighoff-patch-5

Update STS22CrosslingualSTS.py (1459309)

Update installation (f96ee73)
Update SummEvalSummrization.py (d8f232d)
Update AmazonPolarityClassification.py (114b0e3)
Update STS22CrosslingualSTS.py (c8df727)
Temporarily change README installation instruction (e53e77c)
Fix res keyword (769ac67)
Update example to be visible for non-registered users (d4f75fc)
Merge pull request #79 from Muennighoff/feature/leaderboardexp

Add leaderboard instructions (4d2683a)

Move meta script (7a8398f)
dataset_version -> dataset_revision & logging (fe34f84)
Add leaderboard instructions (f325aca)
Merge pull request #78 from embeddings-benchmark/feature/add-mteb-ds-name

Add ds name to res dict (53b763a)

Update MTEB.py (ae86e2f)
Merge pull request #73 from Muennighoff/fix/cqadupstackbeir11

Fallback to old dataloader for cqadupstack (7791b41)

Merge pull request #77 from Muennighoff/fix/bcpc

Update init imports (865bf47)

Update init imports (39b7712)
Merge pull request #76 from Muennighoff/fix/bcpc

BC -> PC (82d3228)

Merge branch 'main' into fix/bcpc (f18c6df)
BC -> PC (7a430c2)
Merge pull request #75 from Muennighoff/feature/leaderboard

Add LB link (36dbd14)

Merge pull request #72 from Muennighoff/fix/revisions

Fix/revisions (4a8d3db)

Merge pull request #74 from Muennighoff/fix/mteblogo

Update logo files (d939de6)

Add LB link (6aeb7ed)
Update logo files (5bfb65a)
Fallback to old dataloader for cqadupstack (262930e)
Add revision (488f1f7)
Add revisions 2/2 (c8ba2b8)
Add revisions 1/2 (c75a503)
Merge pull request #69 from Muennighoff/feature/custombeirmodel

Feature/custombeirmodel (da9ae9a)

BeIRModel -> DRES (ff554bb)
Do not wrap 2x (255c416)
Adapt naming (3c8f672)
Add explanation of BeIRModel (3edad09)
Merge pull request #68 from Muennighoff/feature/beirmrr

Add MRR (7a0993d)

Allow custom BeIR model (cd5098b)
Add MRR (6dbb97c)
Merge pull request #67 from Muennighoff/fix/s2p

Fix categories (03ed576)

Fix categories (08088d7)
Update RedditClusteringP2P.py (77a1606)
Merge pull request #62 from Muennighoff/feature/hublinks

Feature/hublinks (4f04719)

Fix hub mistakes (02f9e6c)
Merge branch 'feature/hublinks' of https://github.com/Muennighoff/mteb into feature/hublinks (c98b9a6)
Add dataset stats (bbf2a82)
Add desc (46078aa)
Add desc (9ca92b0)
Update MSMARCOv2Retrieval.py (f43cd1a)
Merge pull request #63 from embeddings-benchmark/Muennighoff-patch-4

Add desc (f93abff)

Add desc (c972cc9)
Merge pull request #61 from embeddings-benchmark/fix/nolangs

Fix no langs (c15e1a7)

Merge branch 'main' into feature/hublinks (c3990d6)
Simplify (936eee2)
Add Hub links & descriptions (b8182bb)
Update MTEB.py (0be4a06)
Merge pull request #57 from embeddings-benchmark/Muennighoff-patch-2

Update README.md (1ebca84)

Merge pull request #59 from embeddings-benchmark/Muennighoff-patch-3

Update README.md (3f53c85)

Update README.md (8097f31)
Update README.md (5b260a4)
Merge pull request #56 from Muennighoff/feature/readmelinks

Add README Links & Images (f473dbd)

Center title (1341db7)
Center title (8b80471)
Beautify (1ab8764)
Merge pull request #49 from Muennighoff/fix/cqadupstack

Fix CQADupstack (3a4dd84)

Merge pull request #50 from Muennighoff/fix/redditp2p

New RedditP2P Script (7bc547e)

Merge pull request #52 from Muennighoff/fix/bucc

Default to 1-indexed gold (9aff7f2)

Merge pull request #54 from embeddings-benchmark/Muennighoff-patch-1

Update MSMARCORetrieval.py (3951c41)

Update MSMARCORetrieval.py (6922be0)
Default to 1-indexed gold (f29e1fb)
New RedditP2P Script (f73b179)
Fix split (e3ea40b)
Add CQADupStack subsets (a32c00b)
Fix CQADupstack (a26229f)
Merge pull request #46 from Muennighoff/fix/scidocs

Fix/scidocs (ea10703)

Update README name (afddfd3)
Merge pull request #45 from Muennighoff/feature/cachetestembs

Feature/cachetestembs (475420a)

Merge pull request #44 from Muennighoff/fix/silentskip

Fix/silentskip (f7d6fd1)

Merge pull request #43 from Muennighoff/main

Add flag to overwrite results (ece590f)

Merge pull request #33 from Muennighoff/fix/summeval

Fix SummEval NaN scores (48586e2)

Merge branch 'main' into main (e986cd1)
Merge pull request #42 from Muennighoff/feature/versioning

Feature/versioning (1aeaede)

Update mteb/evaluation/MTEB.py (23a473f)
Rename SciDocs (edc2917)
Return test cache in all clf evaluators (309a867)
Cache test embedding / exp for all clf evals (7dd867f)
Add testcache (08cb352)
Split into two lines (f756399)
Sort tasks (03658fa)
Log known tasks (86f9cd6)
Log tasks not found (9ab0a7a)
Add flag to overwrite (529541d)
Version mteb & ds (78b90e9)
Formatting (67f6070)
Add versioning (fa852de)
Merge pull request #41 from Muennighoff/fix/sts22 (064e47c)
Rmv superfluous imports (7e8ee18)
Make revision optional (90afba5)
Remove space (e0d22bc)
Modify script to invert scores (9b9f43a)
Add revision to CL (5f68fda)
Add revision kwarg (3448d1e)
Merge pull request #26 from AmrMKayid/return-results (8f3242c)
Merge pull request #38 from Muennighoff/fix/seeds (720c597)
Update docs (dd4a1f2)
Merge pull request #37 from embeddings-benchmark/mindref

Fix Mind Reference (1834041)

Seed cuda (d33d748)
Merge pull request #35 from embeddings-benchmark/bootstrap-logs (3ff35c5)
Update mteb/abstasks/AbsTaskClassification.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com> (9255249)

Remove superfluous import (124bebe)
Remove superfluous comments (bf5f912)
Add seed to task (acf8b1c)
Add missing super calls (b32195e)
Set evaluation seeds (e69d40b)
Set seeds (ef2985b)
Fix Mind Reference

Two other notes:

The renaming can create confusion, as a test set does exist; I assume we just don't have its labels
MIND uses AUC & MRR & NDCG scores, not MAP, see https://msnews.github.io/ (7ce4bb1)

Update mteb/evaluation/evaluators/SummarizationEvaluator.py

Co-authored-by: Nouamane Tazi <nouamane98@gmail.com> (f667749)

Merge pull request #36 from embeddings-benchmark/mindsmall-test (6fc710b)
rename validationsplit to test (9c4d5c6)
styling (c66610e)
add logs for classification bootstrap experiments (e4000e1)
Merge pull request #32 from Muennighoff/fixsplits (39d0926)
Add consistent brackets (2cdd283)
Remove debug leftovers (c674d0a)
Remove superfluous imports (68f7307)
Skip samples with no variance (d39be65)
Drop nans (20c22a9)
Fix BEIR splits (752d49f)
Fix splits (07bea18)
Merge branch 'main' into return-results (314e5d7)
Merge pull request #30 from embeddings-benchmark:selected_tasks

fix printing selected tasks for evaluation (f1cab40)

fix printing selected tasks for evaluation (ba0dd76)
Merge pull request #29 from cycycc/fix-sickr-hf-hub-name (cb87c7a)
fix sick-r huggingface hub name (2ea195a)
Update mteb/evaluation/MTEB.py

Co-authored-by: holidaydrien <adrien.morisot@gmail.com> (a4d952b)

Update mteb/evaluation/MTEB.py

Co-authored-by: holidaydrien <adrien.morisot@gmail.com> (c4acb76)

Returning Evaluation results (3d60490)
Merge pull request #18 from Muennighoff/evalfix (4dabbaf)
Merge pull request #19 from Muennighoff/patch-2 (9e56ad3)
Merge pull request #20 from Muennighoff/updatemainscores (a0fbd83)
Update to ndcg_at_10 (8d010d0)
Update main scores (c0e773a)
Update README.md (8b495b6)
Fix task splits (1755356)
Merge pull request #15 from Muennighoff/mainscorefix (4b5fe2b)
Fix monolingual mainscore (61647df)
Fix main score warning multilingual (831a218)
Merge pull request #14 from Muennighoff/patch-1 (6055ecc)
Fix task language example (115c280)
styling (2ff07d2)
update example (b581d00)
we can now select all tasks of a specific language (b36e58c)
update test (53d123e)
keep only langs defined in task's description when loading (efa189f)
better prints for multilingual and crosslingual evaluation (5b86950)
styling (8fd8fb0)
move scripts to respective folders (028ed3e)
Update gitignore (a3cee03)
update setup.py (89aaa43)
update setup (2645323)
update setup.cfg (bc5ec1d)
Create first pip version (210d012)
make default evaluation for classification 10 experiments each using 8 samples per label (b062405)
use seed from init arg (f58f8da)
styling (4d1bd09)
add error message when trying to load beir (a3d58f3)
add argument to specify error logs path (d6cef16)
make beir an optional package (5bcee12)
quick modifications (d774ce6)
add example (21fc624)
make beir optional dependency (fdd922a)
Smaller fixes in Classification task (c6eda26)
update available tasks (0923e50)
update available tasks (e192823)
add evaluation time to final scores (9a1ca7d)
quick fix loading beir task (8e46cc8)
add available tasks (b7a1987)
Merge pull request #11 from embeddings-benchmark/summarization (bdb2691)
add more scores to summarization evaluator (12ae05f)
add SummEval task (3ba3e65)
add Summarization abstract task (f2b0e53)
add specifying language for task example (cdf1f18)
fix bitext mining evaluation (073a254)
update README (3b30e9b)
update README (529ec6b)
add --available_tasks flag to CLI (de97d9a)
styling (324b94c)
fix missing params eval_splits in load_data (ecb9d12)
CLI quick fixes (693bffa)
Merge branch 'main' of https://github.com/embeddings-benchmark/mteb-draft into main (bba225d)
quick fixes (2c01099)
styling (75d0449)
fix eval_splits loading using beir (26ec6b9)
capture errors instead of failing (c6aafa4)
quick fixes (8a7e3ec)
update BeIRModel (e8b5ff9)
load data and free it after each task evaluation (aa467f2)
update reqs (6005c10)
fixing beir imports (5d74d42)
Merge pull request #10 from embeddings-benchmark/optimisation (2b6caf2)
add multiproc test (fe8b963)
update BitextMining main scores (3b0f912)
support distributed evaluation for IR 🥳 (5e91971)
remove "train" from eval_splits (6da5ed1)
gather all nodes outputs in CPU after distributed computation (5eb3661)
support DRPES for Parallel IR evaluation (36962e9)
quick fix (6e0e6bd)
set logistic regression default max_iter to 200 (8963b83)
add evaluators logs 📜 (e9d326f)
make style (ab8f13e)
add Makefile and better styling tools ✨ (156e828)
dataloading moved from init to run (c2b7901)
Merge pull request #8 from embeddings-benchmark/beir-integration

Beir integration (f7f2426)

Merge branch 'main' into beir-integration (af12b49)
Merge pull request #9 from embeddings-benchmark/display

Display (11e5758)

fixes (8902f59)
fixes (a394cc2)
fixes+black (b0527a8)
beautiful task display (0ff2db2)
rich library (27cd4cb)
datasets (895c23d)
fever (0724070)
quora (43b93e5)
dbpedia (50d6700)
climatefever (e506637)
cqadupstack (217009f)
arguana (019b2b7)
beir retrieval (e52171b)
only save if output_folder argument is specified (2e1eb24)
Update python-package.yml (6c32b6b)
all tests are passing now ✅ (6c41b75)
Create python-package.yml (3cce88f)
Merge pull request #6 from embeddings-benchmark/testing (06bd1df)
Merge branch 'main' into testing (5226907)
normalize STS scores (6f98396)
normalize score names (6db134f)
format @k scores (dcb77a0)
rename CrossLingual to Crosslingual (3af3b4f)
remove train split from evaluation splits (6317bb6)
bug fix (ba8c906)
calculate AP only in binary classification (bc293ca)
add kwargs and batch_size to evaluate funcs (7926d3c)
update main scores for some tasks (099a32b)
add limit argument to limit evaluation data (92e5d09)
add test for PairClassificationEvaluator (ecffd35)
use evaluators.PairClassificationEvaluator instead of sent-formers BinaryClassificationEvaluator (9ffdf2b)
reformatting (ca25e17)
add test for RerankingEvaluator (9646892)
reformat RerankingEvaluator (3ce99cf)
more docs (8d07d59)
tests folder (9cd9dc2)
add test_RetrievalEvaluator (5236588)
more docs (06ff3d1)
add AP score to ClassificationEvaluator (c950ce8)
add nDCG score to RerankingEvaluator (e4170c8)
Merge pull request #5 from embeddings-benchmark:update-reranking

Support multiple queries in Reranking tasks (cf51493)

quick fix (0d133e1)
use max cross similarity in case of multiple queries (3f80a70)
support multiple queries in Reranking tasks (47f871f)
bug fixes (a3dc4f6)
rename binary classification to pair classification (63374fe)
rename available_splits to eval_splits (04b9f55)
rename available_langs to eval_langs (99ad04c)
minor fixes (c2307ef)
Merge pull request #4 from embeddings-benchmark/packaging (297560d)
quick fix bug (fc8ea9f)
report stderr in AbsTaskClassification in case of bootstrapping (d3723a5)
add STS22CrosslingualSTS (87f92e5)
add MindSmallReranking (940642a)
precision recall f1 bitext evaluator (4f4a9e2)
korean to sts17 (4767300)
quick fix RetrievalEvaluator (afb574a)
update README (db6edde)
update example (d363a49)
remove useless import (4a8966f)
fix cmd.py arguments (2b17b4a)
add kwargs where needed (30f0efd)
add cli script (5a77900)
adopt pbr packaging (c2fc3c1)
quick fixes (f5d3287)
rename kNNClassification to Classification (44ceb4a)
add bootstrap parameters to AbsTaskKNNClassification (8ecf9d4)
add EmotionClassification (e03db39)
add TweetSentimentExtractionClassification (5c7ef5c)
add ToxicConversationsClassification (94bfb4f)
add AmazonCounterfactualClassification (4052ae1)
add ImdbClassification task (eb70842)
add AmazonPolarityClassification dataset (75684ed)
hack fix bug loading tasks twice (23cc372)
add AmazonReviewsClassification (5f4731c)
add create data script for amazon reviews multi (761b70e)
make shuffling reproducible in logReg-10-splits-5-intents (52e4743)
add logReg-10-splits-5-intents for kNNClassificationEvaluator (72c67e0)
quick fix batch size (a88d8bf)
quick fixes (d4e5549)
add batch size to kNNClassificationEvaluator (c5127d8)
Merge pull request #3 from embeddings-benchmark/cross-lingual

Cross lingual (c844875)

black (a6ce618)
bitext mining evaluator (db7a934)
bucc (96848c1)
tatoeba (4783ace)
bitext mining (e0ec3a5)
bitext mining (50b2f48)
add MTOP classification tasks (1ecec57)
crosslingual tasks (582aa15)
STS17 benchmark (0c38bf0)
add methods (49afe21)
formatting (cebaf56)
quick fix (2284d17)
Merge pull request #2 from embeddings-benchmark/knn-classification (6a5faec)
add MultilingualTask (5614a03)
fix loading for multilingual datasets (8658f68)
skip task if results alrdy exist (24f83c1)
add banking77 and massive scenario datasets (12d4d40)
add logRegClassificationEvaluator (804e3b0)
add kNNClassificationEvaluatorPytorch (31bf4d1)
cosine and euclidean distances in kNNClassificationEvaluator (4720480)
add requirements dev file (a446d3f)
update results json file format to account for multi langs (1fe472a)
load_dataset directly inside AbsTask (4475fee)
add default language as "en" for all tasks (faee9db)
WIP add kNN Classification and MassiveIntentClassification task (885c06d)
tasks can be provided as class now in task_list (3bcb767)
add bs param in clusteringevaluator (b4c83e0)
quick docs fixes (1a48f29)
fix line length (faa978f)
linting (49a4138)
add reqs (006756c)
redditp2p + sep2p (2ec9c44)
clustering tasks (b8d37a0)
scripts (2760969)
first commit (7fbd064)
loading scripts (24a4310)
Update README.md (c03618c)
init file (fd182b6)
Update README.md (97c6a99)
retrieval evaluator (39db013)
removed results folder (751e1fd)
reranking evaluator (b62a0f5)
added custom evaluators (1bf7c94)
STS datasets (9093bc1)
gitignore (8d309e4)
added STS (3a2f4b9)
reranking (0cf6e1a)
binary classification (3a15b96)
added verbosity level (1ee8f3d)
added file logging (36f7cf3)
added available tasks/categories/selected list (5cc63a5)
finegrained task selection (7c0087b)
added retrieval (22415e9)
fixed seed (50ada77)
typos (16dc4a9)
added clustering tasks (1106c15)
seeded benchmarks (518bc82)
evaluation schema (bdb79d0)
basic tasks schema (bacb9d0)
proof of concept (6886d1b)
Create README.md (26df27b)
Initial commit (7841bca)
