CHANGELOG

v1.3.2 (2024-03-29)

Documentation

  • docs: Update links in README.md (#296) (76056b5)

Fix

  • fix: Added tasks from SEB (#287)

  • Added tasks from SEB

  • docs: fix link

  • fix: ran linting

  • fix typing for 3.8

  • fixed annotation for v3.8 (39cff49)

v1.3.1 (2024-03-26)

Fix

  • fix: updated version in transition to semantic release ci (238ab82)

v1.3.0 (2024-03-26)

Breaking

  • feat: Updating version

BREAKING CHANGE: Bump version (caee2e9)

Ci

  • ci: disable changelog (b7d3cde)

  • ci: moved release to the correct folder (b4fa85a)

  • ci: renamed test job and workflow (#282)

ci: Added tests (6675bb8)

Documentation

  • docs: typos in readme (#268) (aa9234c)

  • docs: add dataset schemas (#255)

  • docs: update AbsTaskClassification.py document schema for classification task

  • update AbsTaskBitextMining.py

  • update BornholmskBitextMining.py

  • update AbsTaskClustering.py and BlurbsClusteringP2P.py

  • update 8 files

  • update 9 files

  • update AbsTaskReranking.py

  • update BlurbsClusteringP2P.py

  • update CMTEBPairClassification.py

  • update GerDaLIRRetrieval.py

  • update 7 files

  • update AbsTaskBitextMining.py

  • update AbsTaskClassification.py (c3ce1ac)

  • docs: Add development installation instructions (#246)

  • docs: Add development installation instructions

  • removed unused requirements file

I don't believe this is necessary with the setup.py specifying the same dependencies

  • docs: Updated make file with new dependencies

  • ci: Update ci to use make commands

This ensures that the user runs exactly what the CI expects

  • ci: Avoid specifying tests folder as it causes issues with tests

  • ci: removed unnecessary args for test ci

  • Added dev install (0048878)

Feature

  • feat: bump version again (294ab91)

  • feat: bump version again (acf68c7)

Fix

  • fix: dead link in readme (ecbb776)

  • fix: Added sizes to the metadata (#276)

  • restructuring the readme

  • added mmteb

  • removed unnecessary method

  • Added docstring to metadata

  • Updated outdated examples

  • formatting documents

  • fix: Updated form to be parsed correctly

  • fix: Added sizes to the metadata

this allows for automatic metadata generation

  • Updated based on feedback

  • Apply suggestions from code review

Co-authored-by: Niklas Muennighoff <[email protected]>

  • updated based on feedback

  • Added suggestion from review

  • added correction based on review

  • reformatted empty fields to None


Co-authored-by: Niklas Muennighoff <[email protected]> (cd4a012)

Refactor

  • refactor: add metadata basemodel (#260)

  • refactor: rename description to metadata dict

  • refactor: add TaskMetadata and first example

  • update 9 files

  • update TaskMetadata.py

  • update TaskMetadata.py

  • update TaskMetadata.py

  • update LICENSE, TaskMetadata.py and requirements.dev.txt

  • update 151 files

  • update 150 files

  • update 43 files and delete 1 file

  • update 106 files

  • update 45 files

  • update 6 files

  • update 14 files

  • Added model results to repo and updated CLI to create consistent folder structure. (#254)

  • Added model results to repo and updated CLI to create consistent folder structure.

  • ci: updated ci to use make install

  • Added missing pytest dependencies

  • Update README.md

Co-authored-by: Niklas Muennighoff <[email protected]>


Co-authored-by: Niklas Muennighoff <[email protected]>

  • Restructuring the readme (#262)

  • restructuring the readme

  • removed double specification of versions and moved all setup to pyproject.toml

  • correctly use flat-layout for the package

  • build(deps): update TaskMetadata.py and pyproject.toml

  • update 221 files

  • build(deps): update pyproject.toml

  • build(deps): update pyproject.toml

  • build(deps): update pyproject.toml


Co-authored-by: Kenneth Enevoldsen <[email protected]> Co-authored-by: Niklas Muennighoff <[email protected]> (dd5d617)

Unknown

update Makefile and test_all_abstasks.py (2155bf6)

  • update TaskMetadata.py (#281) (acfd7d4)

  • Merge branch 'main' of https://github.com/embeddings-benchmark/mteb (c9d1a03)

  • Enable ruff ci (#279)

  • restructuring the readme

  • added mmteb

  • removed unnecessary method

  • Added docstring to metadata

  • Updated outdated examples

  • formatting documents

  • fix: Updated form to be parsed correctly

  • fix: Added sizes to the metadata

this allows for automatic metadata generation

  • Updated based on feedback

  • Apply suggestions from code review

Co-authored-by: Niklas Muennighoff <[email protected]>

  • updated based on feedback

  • Added suggestion from review

  • added correction based on review

  • reformatted empty fields to None

  • CI: Enable linter


Co-authored-by: Niklas Muennighoff <[email protected]> (a16eb07)

  • Added MMTEB (#275)

  • restructuring the readme

  • added mmteb

  • removed unnecessary method

  • Added docstring to metadata

  • Updated outdated examples

  • formatting documents

  • fix: Updated form to be parsed correctly

  • Updated based on feedback

  • Apply suggestions from code review

Co-authored-by: Niklas Muennighoff <[email protected]>

  • updated based on feedback

  • Added suggestion from review

  • added correction based on review


Co-authored-by: Niklas Muennighoff <[email protected]> (c0dc49a)

  • dev: add ruff as suggested extension (#274) (b08913f)

  • dev: add isort (#271)

  • dev: add isort

  • dev: add isort (845099d)

  • dev: run tests on pull request towards any branch (13f759a)

  • Merge branch 'main' of https://github.com/embeddings-benchmark/mteb (b42abe4)

  • replaced linter with ruff (#265)

  • restructuring the readme

  • removed double specification of versions and moved all setup to pyproject.toml

  • correctly use flat-layout for the package

  • replaced linter with ruff

  • rerun tests

  • ci: Added in newer workflow

some of them are disabled as they require other issues to be solved

  • Update Makefile

Co-authored-by: Niklas Muennighoff <[email protected]>


Co-authored-by: Niklas Muennighoff <[email protected]> (023e881)

  • Restructuring the readme (#262)

  • restructuring the readme

  • removed double specification of versions and moved all setup to pyproject.toml

  • correctly use flat-layout for the package (769157b)

  • restructuring the readme (364be7f)

  • Added model results to repo and updated CLI to create consistent folder structure. (#254)

  • Added model results to repo and updated CLI to create consistent folder structure.

  • ci: updated ci to use make install

  • Added missing pytest dependencies

  • Update README.md

Co-authored-by: Niklas Muennighoff <[email protected]>


Co-authored-by: Niklas Muennighoff <[email protected]> (8a758bc)

  • dev: add workspace defaults in VSCode (#253)

  • dev: add black as default formatter in vscode

  • Update .vscode/settings.json


Co-authored-by: Kenneth Enevoldsen <[email protected]> (30e5b9e)

  • Add Danish Discourse dataset (#247)

  • misc.

  • update ddisco.py

  • chore: delete ddisco.py, ddisco.test.tsv and ddisco.train.tsv

  • Update mteb/tasks/Classification/DdiscoCohesionClassification.py

Co-authored-by: Kenneth Enevoldsen <[email protected]>

  • Update mteb/tasks/Classification/DdiscoCohesionClassification.py

Co-authored-by: Kenneth Enevoldsen <[email protected]>

  • Update mteb/tasks/Classification/DdiscoCohesionClassification.py

Co-authored-by: Imene Kerboua <[email protected]>

  • Update mteb/tasks/Classification/DdiscoCohesionClassification.py

Co-authored-by: Imene Kerboua <[email protected]>

  • Update mteb/tasks/Classification/DdiscoCohesionClassification.py

Co-authored-by: Imene Kerboua <[email protected]>


Co-authored-by: Kenneth Enevoldsen <[email protected]> Co-authored-by: Imene Kerboua <[email protected]> (d46d0f5)

  • Update structure of mteb/tasks to mteb/tasks/{type}/{language}  (#245)

  • Fix structure of mteb/tasks Fixes #243

  • fix: Added missing init files (b1c78c1)

  • tests: do not run tests on collection (#249)

test: update test_all_abstasks.py (236614a)

  • Update README.md with correct DRESModel location (399edf4)

  • Fix typo (9610378)

  • Set dev version (716f59c)

v1.2.0 (2024-03-07)

Unknown

  • Release: 1.2.0 (9e9dca8)

  • Rmv superfluous file (d772fed)

  • Remove duplicate & outdated code (12bcb83)

  • Adapt scripts (36b9234)

  • Add example (273ff4a)

  • Simplify retrieval (#233)

  • Simplify retrieval

  • Simplify

  • Make call method

  • Add splits

  • Rmv outdated test

  • Fix name & \n

  • Add qrels

  • Add revisions

Co-authored-by: Imene Kerboua <[email protected]>

  • Add hf hub org

  • Add test

  • Add missing revision

  • Rename test

Co-authored-by: Imene Kerboua <[email protected]>

  • log dres compat

Co-authored-by: Imene Kerboua <[email protected]> (c9fccbc)

  • Fixed missing revision error on Norwegian Bitext Mining (#221)

  • Removed revision specification from Norwegian Bitext Mining task

  • Update to latest revision


Co-authored-by: Niklas Muennighoff <[email protected]> (b249c67)

  • Remove HAGRID from french benchmark (#235)

  • add Masakhane dataset config

  • add trigram lang code for datasets that use it

  • create french script eval

  • fix French word

  • add some documentation

  • add script to process and upload alloprof on HF

  • build script for HF

  • adding dataset processing for mteb

  • add script to process and upload alloprof on HF

  • build script for HF

  • adding dataset processing for mteb

  • refactor a few things

  • remove whitespaces

  • 4 pair classification (#10)

  • add Opusparcus dataset

  • multilingual usage

  • use eval_split of config files

  • change eval_split according to data


Co-authored-by: Gabriel Sequeira <[email protected]>

  • add script to process and upload alloprof on HF

  • build script for HF

  • adding dataset processing for mteb

  • refactor a few things

  • remove whitespaces

  • Clustering with HAL S2S dataset (#11)

HAL S2S dataset creation and evaluation on clustering task.

  • adding BSARD dataset

  • add BSARD to benchmark

  • adding Hagrid dataset

  • DiaBLa and Flores Bitext Mining evaluation (#12)

  • Add DiaBLa dataset for bitext mining

  • Add DiaBLa dataset for bitext mining

  • deduplicate bitext task

  • add Flores

  • format files

  • add flores to evaluation script

  • remove prints

  • add revision


Co-authored-by: Gabriel Sequeira <[email protected]>

  • add script to process and upload alloprof on HF

  • build script for HF

  • adding dataset processing for mteb

  • refactor a few things

  • remove whitespaces

  • adding dataset processing for mteb

  • adding BSARD dataset

  • add BSARD to benchmark

  • adding Hagrid dataset

  • fix change on langmapping

  • reset alphabetical order

  • add revision handling

  • Clustering: Add AlloProf dataset (#17)

AlloProf dataset for clustering task

  • handling of revision

  • change split + add revision handling

  • add script to process and upload alloprof on HF

  • build script for HF

  • adding dataset processing for mteb

  • refactor a few things

  • remove whitespaces

  • adding dataset processing for mteb

  • adding BSARD dataset

  • add BSARD to benchmark

  • adding Hagrid dataset

  • add script to process and upload alloprof on HF

  • adding dataset processing for mteb

  • refactor a few things

  • reset alphabetical order

  • add revision handling

  • handling of revision

  • change split + add revision handling

  • use eval variable

  • alphabetic order

  • Add MLSUM dataset for clustering task (#21)

  • Use Masakhane dataset for clustering task (#23)

  • 16 add datasets to readmemd (#18)

  • run task table

  • run task table

  • Add MLSUM dataset for clustering task (#21)

  • Use Masakhane dataset for clustering task (#23)

  • run task table

  • refresh readme

  • refresh readme

  • run task table

  • refresh readme


Co-authored-by: Gabriel Sequeira <[email protected]> Co-authored-by: Marion Schaeffer <[email protected]>

  • load only test split (#25)

Co-authored-by: Gabriel Sequeira <[email protected]>

  • Update mteb/tasks/BitextMining/DiaBLaBitextMining.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update mteb/tasks/Clustering/HALClusteringS2S.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • renaming masakhane (#28)

Co-authored-by: Gabriel Sequeira <[email protected]>

  • Syntec dataset addition (#26)

  • add script to process & load to HF

  • add script to enable download of data from HF

  • add syntec dataset files to gitignore

  • add syntecretrieval

  • add syntec retrieval

  • build dataloading script

  • remove datasets

  • correct typo


Co-authored-by: Sequeira Gabriel <[email protected]>

  • 30 add syntec reranking (#31)

  • change name to specify retrieval

  • add reranking tasks

  • create script to upload dataset for reranking task

  • create reranking task

  • add reranking tasks

  • add model name in description

  • SummEval translated to french (#32)

  • 7 sts (#33)

  • take into account multilingual tasks

  • add stsbenchmark multilingual dataset

  • add STS tasks

  • take into account multilingual tasks

  • add stsbenchmark multilingual dataset

  • add STS tasks

  • add comma

  • Adding sick fr dataset to sts tasks (#34)

  • Adding sick fr dataset to sts tasks

  • modifying dataset in load function to have the right column names

  • Fix alloprof dataset (#36)

  • change revision to use

  • remove duplicate data

  • change main metric because dataset is hard (#37)

  • Fix alloprof dataset (#40)

  • change revision to use

  • remove duplicate data

  • change revision

  • handle queries train test split

  • change dataset creation method

  • change revision

  • handle queries train test split

  • change dataset creation method

  • Fix DiaBLa by inheriting CrossLingual class (#42)

  • Fix DiaBLa by inheriting CrossLingual class

  • remove remaining print

  • Fix DiaBLa integration

  • Update mteb/tasks/BitextMining/FloresBitextMining.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update README.md

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update README.md

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update mteb/tasks/Classification/MasakhaNEWSClassification.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update README.md

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update README.md

  • Update mteb/tasks/BitextMining/FloresBitextMining.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update mteb/evaluation/MTEB.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update mteb/abstasks/AbsTaskPairClassification.py

Co-authored-by: Imene Kerboua <[email protected]>

  • Update README.md

  • Update scripts/data/syntec/create_data_reranking.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update scripts/data/alloprof/create_data_reranking.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update scripts/run_mteb_french.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update scripts/run_mteb_french.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update mteb/evaluation/MTEB.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update mteb/evaluation/MTEB.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update mteb/tasks/Retrieval/HagridRetrieval.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update mteb/tasks/Clustering/MLSUMClusteringP2P.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update mteb/tasks/Clustering/MLSUMClusteringS2S.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update mteb/tasks/Clustering/MasakhaNEWSClusteringP2P.py

  • Update mteb/tasks/Clustering/MasakhaNEWSClusteringS2S.py

  • Update mteb/tasks/STS/SickFrSTS.py

  • Inherit OpusparcusPC init from MultilingualTask

  • remove unnecessary init

  • Remove train split from evaluation on MasakhaNEWSClassification (#52)

remove train split from evaluation

  • put script on HF dataset repos (#56)

  • put script on HF dataset repos

  • remove scripts

  • 49 fix dictionary in syntecretrieval (#54)

  • add trust remote code arg

  • leave corpus as dict

  • remove trust remote code

  • add Tatoeba & BUCC BitextMining tasks (#57)

add bucc and tatoeba bitextmining tasks

  • 46 add other languages to masakhaneweclusterings2s and p2p (#58)

  • add other language to clustering tasks

  • fix main score and S2S task

  • update run fr benchmark script

  • Update run_mteb_french.py

  • Update AbsTaskClustering.py

  • remove train and validation splits

  • remove Hagrid (#60)


Co-authored-by: Gabriel Sequeira <[email protected]> Co-authored-by: Marion Schaeffer <[email protected]> Co-authored-by: [email protected] <[email protected]> Co-authored-by: Sequeira Gabriel <[email protected]> Co-authored-by: Imene Kerboua <[email protected]> Co-authored-by: Niklas Muennighoff <[email protected]> Co-authored-by: wissam-sib <[email protected]> Co-authored-by: Wissam Siblini <[email protected]> (d01d053)

  • Restore TRECCOVID import (9f8e897)

  • Extend MTEB with French datasets (#218)

  • add Masakhane dataset config

  • add trigram lang code for datasets that use it

  • create french script eval

  • fix French word

  • add some documentation

  • add script to process and upload alloprof on HF

  • build script for HF

  • adding dataset processing for mteb

  • add script to process and upload alloprof on HF

  • build script for HF

  • adding dataset processing for mteb

  • refactor a few things

  • remove whitespaces

  • 4 pair classification (#10)

  • add Opusparcus dataset

  • multilingual usage

  • use eval_split of config files

  • change eval_split according to data


Co-authored-by: Gabriel Sequeira <[email protected]>

  • add script to process and upload alloprof on HF

  • build script for HF

  • adding dataset processing for mteb

  • refactor a few things

  • remove whitespaces

  • Clustering with HAL S2S dataset (#11)

HAL S2S dataset creation and evaluation on clustering task.

  • adding BSARD dataset

  • add BSARD to benchmark

  • adding Hagrid dataset

  • DiaBLa and Flores Bitext Mining evaluation (#12)

  • Add DiaBLa dataset for bitext mining

  • Add DiaBLa dataset for bitext mining

  • deduplicate bitext task

  • add Flores

  • format files

  • add flores to evaluation script

  • remove prints

  • add revision


Co-authored-by: Gabriel Sequeira <[email protected]>

  • add script to process and upload alloprof on HF

  • build script for HF

  • adding dataset processing for mteb

  • refactor a few things

  • remove whitespaces

  • adding dataset processing for mteb

  • adding BSARD dataset

  • add BSARD to benchmark

  • adding Hagrid dataset

  • fix change on langmapping

  • reset alphabetical order

  • add revision handling

  • Clustering: Add AlloProf dataset (#17)

AlloProf dataset for clustering task

  • handling of revision

  • change split + add revision handling

  • add script to process and upload alloprof on HF

  • build script for HF

  • adding dataset processing for mteb

  • refactor a few things

  • remove whitespaces

  • adding dataset processing for mteb

  • adding BSARD dataset

  • add BSARD to benchmark

  • adding Hagrid dataset

  • add script to process and upload alloprof on HF

  • adding dataset processing for mteb

  • refactor a few things

  • reset alphabetical order

  • add revision handling

  • handling of revision

  • change split + add revision handling

  • use eval variable

  • alphabetic order

  • Add MLSUM dataset for clustering task (#21)

  • Use Masakhane dataset for clustering task (#23)

  • 16 add datasets to readmemd (#18)

  • run task table

  • run task table

  • Add MLSUM dataset for clustering task (#21)

  • Use Masakhane dataset for clustering task (#23)

  • run task table

  • refresh readme

  • refresh readme

  • run task table

  • refresh readme


Co-authored-by: Gabriel Sequeira <[email protected]> Co-authored-by: Marion Schaeffer <[email protected]>

  • load only test split (#25)

Co-authored-by: Gabriel Sequeira <[email protected]>

  • Update mteb/tasks/BitextMining/DiaBLaBitextMining.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update mteb/tasks/Clustering/HALClusteringS2S.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • renaming masakhane (#28)

Co-authored-by: Gabriel Sequeira <[email protected]>

  • Syntec dataset addition (#26)

  • add script to process & load to HF

  • add script to enable download of data from HF

  • add syntec dataset files to gitignore

  • add syntecretrieval

  • add syntec retrieval

  • build dataloading script

  • remove datasets

  • correct typo


Co-authored-by: Sequeira Gabriel <[email protected]>

  • 30 add syntec reranking (#31)

  • change name to specify retrieval

  • add reranking tasks

  • create script to upload dataset for reranking task

  • create reranking task

  • add reranking tasks

  • add model name in description

  • SummEval translated to french (#32)

  • 7 sts (#33)

  • take into account multilingual tasks

  • add stsbenchmark multilingual dataset

  • add STS tasks

  • take into account multilingual tasks

  • add stsbenchmark multilingual dataset

  • add STS tasks

  • add comma

  • Adding sick fr dataset to sts tasks (#34)

  • Adding sick fr dataset to sts tasks

  • modifying dataset in load function to have the right column names

  • Fix alloprof dataset (#36)

  • change revision to use

  • remove duplicate data

  • change main metric because dataset is hard (#37)

  • Fix alloprof dataset (#40)

  • change revision to use

  • remove duplicate data

  • change revision

  • handle queries train test split

  • change dataset creation method

  • change revision

  • handle queries train test split

  • change dataset creation method

  • Fix DiaBLa by inheriting CrossLingual class (#42)

  • Fix DiaBLa by inheriting CrossLingual class

  • remove remaining print

  • Fix DiaBLa integration

  • Update mteb/tasks/BitextMining/FloresBitextMining.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update README.md

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update README.md

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update mteb/tasks/Classification/MasakhaNEWSClassification.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update README.md

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update README.md

  • Update mteb/tasks/BitextMining/FloresBitextMining.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update mteb/evaluation/MTEB.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update mteb/abstasks/AbsTaskPairClassification.py

Co-authored-by: Imene Kerboua <[email protected]>

  • Update README.md

  • Update scripts/data/syntec/create_data_reranking.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update scripts/data/alloprof/create_data_reranking.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update scripts/run_mteb_french.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update scripts/run_mteb_french.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update mteb/evaluation/MTEB.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update mteb/evaluation/MTEB.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update mteb/tasks/Retrieval/HagridRetrieval.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update mteb/tasks/Clustering/MLSUMClusteringP2P.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update mteb/tasks/Clustering/MLSUMClusteringS2S.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update mteb/tasks/Clustering/MasakhaNEWSClusteringP2P.py

  • Update mteb/tasks/Clustering/MasakhaNEWSClusteringS2S.py

  • Update mteb/tasks/STS/SickFrSTS.py

  • Inherit OpusparcusPC init from MultilingualTask

  • remove unnecessary init

  • Remove train split from evaluation on MasakhaNEWSClassification (#52)

remove train split from evaluation

  • put script on HF dataset repos (#56)

  • put script on HF dataset repos

  • remove scripts

  • 49 fix dictionary in syntecretrieval (#54)

  • add trust remote code arg

  • leave corpus as dict

  • remove trust remote code

  • add Tatoeba & BUCC BitextMining tasks (#57)

add bucc and tatoeba bitextmining tasks

  • 46 add other languages to masakhaneweclusterings2s and p2p (#58)

  • add other language to clustering tasks

  • fix main score and S2S task

  • update run fr benchmark script

  • Update run_mteb_french.py

  • Update AbsTaskClustering.py

  • remove train and validation splits


Co-authored-by: Gabriel Sequeira <[email protected]> Co-authored-by: Marion Schaeffer <[email protected]> Co-authored-by: [email protected] <[email protected]> Co-authored-by: Imene Kerboua <[email protected]> Co-authored-by: mciancone <[email protected]> Co-authored-by: Niklas Muennighoff <[email protected]> Co-authored-by: wissam-sib <[email protected]> Co-authored-by: Wissam Siblini <[email protected]> (3d8b8ec)

  • dev (c16eddc)

  • Dev (08c7317)

  • Add tasks for Spanish Embedding Evaluation (#227)

  • feat: add xmarket es dataset

  • refactor: use multilingual dataset

  • fix: update revision id

  • refactor: add constant for language

  • feat: add two clustering datasets

Signed-off-by: jupyterjazz <[email protected]>

  • feat: import classes

Signed-off-by: jupyterjazz <[email protected]>

  • refactor: flores dataset

Signed-off-by: jupyterjazz <[email protected]>

  • feat: add miracl reranking task for spanish

  • feat: use hf repo with all reranking langs

  • feat: update revision hash

  • refactor: use description for language

  • feat: add stses task

  • fix: get scores from label column

  • refactor: add revision to data loading

  • Added spanish passage retrieval

  • feat: mintaka and xpqa retrieval tasks

Signed-off-by: jupyterjazz <[email protected]>

  • feat: import classes

Signed-off-by: jupyterjazz <[email protected]>

  • fix: typo in data loading

  • fix: id

Signed-off-by: jupyterjazz <[email protected]>

  • refactor: try out multilingual task

Signed-off-by: jupyterjazz <[email protected]>

  • refactor: multilingual task import

Signed-off-by: jupyterjazz <[email protected]>

  • refactor: cmon man

Signed-off-by: jupyterjazz <[email protected]>

  • refactor: go back to monolingual tasks

Signed-off-by: jupyterjazz <[email protected]>

  • refactor: remove unused import

Signed-off-by: jupyterjazz <[email protected]>

  • refactor: loading logic

Signed-off-by: jupyterjazz <[email protected]>

  • feat: add miracl as retrieval task

  • fix: nested corpus

  • refactor: get lang from description

  • Update mteb/tasks/Retrieval/MIRACLRetrieval.py

Co-authored-by: Michael Günther <[email protected]>

  • feat: allow multilingual reranking tasks

  • feat: make miraclreranking multilingual

  • refactor: rename miraclretrieval

Co-authored-by: Niklas Muennighoff <[email protected]>

  • style: add missing eof empty line

  • feat: make xmarket retrieval task multilingual

  • refactor: rename xmarket

  • refactor: turn spanish tasks multilingual (#11)

  • refactor: make xpqa retrieval multilingual

  • fix: formatting of xpqa dataset

  • refactor: make mintaka into multilingual task

  • refactor: make miracl retrieval multilingual

  • feat: add revision ids for hf datasets

  • refactor: remove patool

  • Update mteb/tasks/Reranking/__init__.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update mteb/tasks/STS/__init__.py

Co-authored-by: Niklas Muennighoff <[email protected]>


Signed-off-by: jupyterjazz <[email protected]> Co-authored-by: guenthermi <[email protected]> Co-authored-by: jupyterjazz <[email protected]> Co-authored-by: Markus Krimmel <[email protected]> Co-authored-by: Michael Günther <[email protected]> Co-authored-by: Niklas Muennighoff <[email protected]> (52d5c9f)

v1.1.2 (2024-02-16)

Feature

  • feat: update revision id of wikicitiesclustering task (fb90c02)

Fix

  • fix: remove debugging print statement (d292d93)

  • fix: pass parallel_retrieval kwarg to use DenseRetrievalParallelExactSearch (19b8f66)

Unknown

  • Release: 1.1.2 (def3c91)

  • Add task list (#228)

  • Add task list

  • Update mteb/__init__.py

  • Update README.md (10bf6f8)

  • Update BeIRPLTask.py (#225)

  • Update BeIRPLTask.py

  • Update BeIRPLTask.py (a8922c1)

  • Allow multiple languages (2cc222e)

  • Add Korean Text Search Tasks to MTEB (#210)

  • add Ko-miracl, Ko-StrategyQA, Ko-mrtydi tasks

  • Update mteb/abstasks/AbsTaskRetrieval.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update AbsTaskRetrieval.py

  • Update mteb/abstasks/AbsTaskRetrieval.py

Co-authored-by: Niklas Muennighoff <[email protected]>

  • Update scripts/run_mteb_korean.py

Co-authored-by: Niklas Muennighoff <[email protected]>


Co-authored-by: Niklas Muennighoff <[email protected]> (dadf2da)

  • Add MultiLongDocRetrieval task to MTEB. (#224)

  • Update AbsTaskRetrieval.py.

  • Add Retrieval Task: MultiLongDocRetrieval

  • Update AbsTaskRetrieval.py and MLDR task

  • Update reference of MLDR (2f65179)

  • Fix name (2989f76)

  • only save top-k (#209)

  • Update AbsTaskRetrieval.py

  • Add json import; rename kwarg

  • Pass OF

  • Update mteb/abstasks/AbsTaskRetrieval.py

  • Update AbsTaskRetrieval.py

  • Update AbsTaskRetrieval.py

  • Update mteb/abstasks/AbsTaskRetrieval.py


Co-authored-by: Niklas Muennighoff <[email protected]> (f58888d)

  • Add tasks for German Embedding Evaluation (#214)

  • chore: solve merge conflict

  • fix: gerdalir dataset

  • fix: lang from en to de

  • chore: solve merge conflict

  • chore: add ir datasets to requirements

  • refactor: limit queries to 10k

  • refactor: update description of task with limit

  • revert style changes

  • feat: add german stsbenchmarksts task

  • feat: update revision id

  • refactor: update revision id after changes in scores

  • add XMarket dataset

  • add xmarket to init file

  • feat: add revision id

  • add paws x dataset

  • Add ir_datasets as dependency

  • add GermanDPR dataset

  • fix loading

  • Update mteb/tasks/Retrieval/GermanDPRRetrieval.py

Co-authored-by: Saba Sturua <[email protected]>

  • feat: add miracl reranking task for german

  • refactor: cleanup task

  • prevent duplicate pos docs

  • fix: use test split in MIRACL (#13)

Fixes mismatch between description and HuggingFace dataset

  • refactor: remove WikiCLIR

  • fix: double import; xmarket name

  • add German tasks to run_mteb_german script

  • update revisions and style

  • update MIRACL to work with latest version

  • revert adding ir_datasets

  • support multilingual pair classification

  • remove print statement

  • Apply suggestions from code review

Co-authored-by: Niklas Muennighoff <[email protected]>

  • fix monolingual pair classification

  • remove lang for monolingual tasks


Co-authored-by: Isabelle Mohr <[email protected]> Co-authored-by: Markus Krimmel <[email protected]> Co-authored-by: Saba Sturua <[email protected]> Co-authored-by: Markus Krimmel <[email protected]> Co-authored-by: Niklas Muennighoff <[email protected]> (9aba9ee)

  • Simplify (1cd07db)

  • Refer to other works (8f28bcb)

  • Update mteb/tasks/Retrieval/GermanQuADRetrieval.py

Co-authored-by: Niklas Muennighoff <[email protected]> (09a9cb0)

  • clean up (51c40fd)

  • WIP: implement requested changes (58baad2)

  • remove code for writing JSONL dataset (d23eac3)

  • add docstring, remove local qrels (af7ee50)

  • fix query id in qrel dataset, ready to merge (33c9dd4)

  • WIP: use HF dataset instead of local JSONL (db3fea1)

  • rename BeIRDETask (e56cf86)

  • Update scripts/run_mteb_german.py

Co-authored-by: Niklas Muennighoff <[email protected]> (4b18a7e)

  • Update mteb/tasks/Retrieval/GermanRetrieval.py

Co-authored-by: Niklas Muennighoff <[email protected]> (3fef61a)


Co-authored-by: Isabelle Mohr <[email protected]> (88beb46)

  • Do not enforce rich import (aa11fe7)

  • fix RerankingEvaluator's compute_metrics_individual (fd7bfac)

  • Fix SummEval import (859d38e)

  • Increment version (4d75ddf)

v1.1.1 (2023-09-20)

Fix

  • fix: msmarco-v2 uses dev.tsv, not dev1.tsv (6908d21)

  • fix: add missing task-langs attribute (#152) (bc22909)

Unknown

  • Release: 1.1.1 (d3aaf4f)

  • Merge branch 'main' into fixconversion (d292258)

  • Fix eval_lang (7836148)

  • Simplify code snippets (d434f52)

  • Simplify wording (3adb0b5)

  • Clarify multi-gpu usage (5a2da23)

  • Fix splits (93f6f85)

  • Improve Cust Model explanation (52c1fd8)

  • Add bs to Clustering test (4df0d2e)

  • Rely on auto-conversion to tensor in score function (d8512f7)

  • Rely on standard encode kwargs only (4c1660e)

  • Improve Cust Model explanation (23d758f)

  • Add bs to Clustering test (6e0c0d2)

  • Rely on auto-conversion to tensor in score function (7ec4c57)

  • Rely on standard encode kwargs only (2fad0f9)

  • Update README.md (d9aa70f)

  • Update README.md (2211f83)

  • Simplify assertion (f7fcbc1)

  • Default to false (d64f6c7)

  • Add multi gpu eval to readme (#140)

update readme (1b1c9d3)

  • Support Multi-node Evaluation (#132)

  • styling

  • USE_HF_DATASETS

  • Support DRPES

  • we use beir.datasets.data_loader_hf in case of non dist

  • distributed fixes

  • update run command

  • cleanup

  • .

  • sugg

  • ruff (0dd82a9)

  • Add Chinese tasks (C-MTEB) (#134)

  • add C_MTEB

  • add C_MTEB

  • rename MMarcoReranking

  • rename MMarcoReranking

  • Update mteb/tasks/Retrieval/CMTEBRetrieval.py

  • Update README.md

  • Allow custom encode functions


Co-authored-by: shitao <[email protected]>
Co-authored-by: Nouamane Tazi <[email protected]>
Co-authored-by: Niklas Muennighoff <[email protected]> (071974a)

  • Add Polish tasks (PL-MTEB) (#137)

  • Add Polish tasks (PL-MTEB)

  • Add Polish datasets to README

  • Add newline


Co-authored-by: rposwiata <[email protected]>
Co-authored-by: Niklas Muennighoff <[email protected]> (2779344)

  • Add BEIR-PL datasets to MTEB (#121)

  • Add BEIR-PL benchmark

  • Update README with BEIR-PL datasets

  • Update names

  • Add tasks to init to be visible during evaluation


Co-authored-by: Konrad Wojtasik <[email protected]>
Co-authored-by: Niklas Muennighoff <[email protected]> (5972c02)

  • Replaced prints with logging (#133)

  • Make sure that main score is added to bitext mining tasks

  • Added scandinavian languages: da, no, sv

  • merge upstream main

  • fix: Replaced prints with logging statements

  • chore: removed accidental commits (d7ca378)

  • add logging (6412a6a)

  • Merge pull request #131 from embeddings-benchmark/nouamane/quick-fixes

Code cleanup (4fb97d0)

v1.1.0 (2023-07-31)

Unknown

  • Release: 1.1.0 (80d0344)

  • Bump version ID and update PyPI (#128)

Bump version ID and update PyPI after adding additional tasks. (4a4b54b)

  • Fix typo (33a3140)

  • Sort imports (ab2eef8)

  • Sort imports (3432374)

  • Raise error first (0b1bfd2)

  • Added support for Scandinavian Languages (#124)

  • Make sure that main score is added to bitext mining tasks

  • Added scandinavian languages: da, no, sv

  • Updated readme with scandinavian tasks

  • Changes n samples for the nordic lang CLF

  • Added scandinavian models to init

  • Added error logs to gitignore

  • fix import error

  • fix dataset columns

  • rename dataset columns

  • remove swefaq

  • fix: Added functionality to raise error

  • fix: Updated names

  • fix: Removed no as a language

  • Added missing data transformation

  • Fix spelling error (acb0f59)

  • Install beir (c50b8ab)

  • Update README.md (29ffedf)

  • ruff (6a58b5d)

  • Update README.md (5825536)

  • fix revision hash for TenKGnadClusteringP2P dataset

Co-authored-by: Niklas Muennighoff <[email protected]> (eb622f8)

  • change dataset order for BlurbsClustering in README

Co-authored-by: Niklas Muennighoff <[email protected]> (f6e49ba)

  • change dataset order for TenKGnadClustering in README

Co-authored-by: Niklas Muennighoff <[email protected]> (2a2c47f)

  • fix descriptions for German clustering datasets (30a966c)

  • add German clustering tasks to README (62457e3)

  • update reference & category for TenKGnad datasets (2174a47)

  • add German clustering tasks (ab469be)

  • Allow abs path (b56528c)

  • Add @property annotation to description method of AbsTask (98b0443)

  • fix typo (37a986b)

  • fix extend lang pairs (865dffc)

  • Fix clustering eval, black, isort (bc43665)

  • Add 'auto' to sklearn clustering, add test, fix warning (15ce352)

  • Update MSMARCORetrieval.py (d913f56)

  • Revert to old split (1f3ff6e)

  • Add wheel instruction (62fad9b)

  • Dev version (d988e48)

v1.0.2 (2023-03-28)

Unknown

  • Release: 1.0.2 (e189bae)

  • Add comment

Co-authored-by: Nouamane Tazi <[email protected]> (3e72ee8)

  • Fix naming (33f2db9)

  • Cleaner logging & tqdm usage (542d871)

  • Add kwargs (e0b801d)

  • Produce embeddings in one go (e88bcf2)

  • Fix naming (6c62f18)

  • Make inputs always List[str] & call in one (bdeeedf)

  • Fix SummEval description (0c2b1be)

  • fix SummEval description

Unless I'm missing something, I think the SummEval description is incorrect: the dataset consists of summaries of news articles, not biomedical abstracts. (1ccc068)

  • Clarify script for running all of MTEB English (9f72434)

  • Update run_mteb_english.py (6ff57d3)

  • Update run_mteb_english.py (7803eea)

  • Point to English benchmarking script (57f3371)

  • Example script for benchmarking all of MTEB English (77e6b22)

  • Clarify MSMARCO split (bbeada8)

  • Allow re-merging (b0ce501)

  • Set dataset name; Sort imports (2a5a661)

  • Standardize CQA merging script (5d5a2fb)

  • Update merge_cqadupstack.py (b0304c1)

  • Update README.md (8c60c22)

  • Update README.md (6255449)

  • Remove validation split (875a98e)

  • Remove validation set (b3f9585)

  • Update ClassificationEvaluator.py (93b89b6)

  • Set dev version (8a0d6b1)

v1.0.1 (2022-11-29)

Unknown

v1.0.0 (2022-10-17)

Unknown

v0.9.1 (2022-10-13)

Unknown

  • Release: 0.9.1 (5c438cc)

  • Merge pull request #80 from embeddings-benchmark/Muennighoff-patch-5

Update STS22CrosslingualSTS.py (1459309)

  • Update installation (f96ee73)

  • Update SummEvalSummrization.py (d8f232d)

  • Update AmazonPolarityClassification.py (114b0e3)

  • Update STS22CrosslingualSTS.py (c8df727)

  • Temporarily change README installation instruction (e53e77c)

  • Fix res keyword (769ac67)

  • Update example to be visible for non-registered users (d4f75fc)

  • Merge pull request #79 from Muennighoff/feature/leaderboardexp

Add leaderboard instructions (4d2683a)

  • Move meta script (7a8398f)

  • dataset_version -> dataset_revision & logging (fe34f84)

  • Add leaderboard instructions (f325aca)

  • Merge pull request #78 from embeddings-benchmark/feature/add-mteb-ds-name

Add ds name to res dict (53b763a)

  • Update MTEB.py (ae86e2f)

  • Merge pull request #73 from Muennighoff/fix/cqadupstackbeir11

Fallback to old dataloader for cqadupstack (7791b41)

  • Merge pull request #77 from Muennighoff/fix/bcpc

Update init imports (865bf47)

  • Update init imports (39b7712)

  • Merge pull request #76 from Muennighoff/fix/bcpc

BC -> PC (82d3228)

  • Merge branch 'main' into fix/bcpc (f18c6df)

  • BC -> PC (7a430c2)

  • Merge pull request #75 from Muennighoff/feature/leaderboard

Add LB link (36dbd14)

  • Merge pull request #72 from Muennighoff/fix/revisions

Fix/revisions (4a8d3db)

  • Merge pull request #74 from Muennighoff/fix/mteblogo

Update logo files (d939de6)

  • Add LB link (6aeb7ed)

  • Update logo files (5bfb65a)

  • Fallback to old dataloader for cqadupstack (262930e)

  • Add revision (488f1f7)

  • Add revisions 2/2 (c8ba2b8)

  • Add revisions 1/2 (c75a503)

  • Merge pull request #69 from Muennighoff/feature/custombeirmodel

Feature/custombeirmodel (da9ae9a)

  • BeIRModel -> DRES (ff554bb)

  • Do not wrap 2x (255c416)

  • Adapt naming (3c8f672)

  • Add explanation of BeIRModel (3edad09)

  • Merge pull request #68 from Muennighoff/feature/beirmrr

Add MRR (7a0993d)

  • Allow custom BeIR model (cd5098b)

  • Add MRR (6dbb97c)

  • Merge pull request #67 from Muennighoff/fix/s2p

Fix categories (03ed576)

  • Fix categories (08088d7)

  • Update RedditClusteringP2P.py (77a1606)

  • Merge pull request #62 from Muennighoff/feature/hublinks

Feature/hublinks (4f04719)

Add desc (f93abff)

  • Add desc (c972cc9)

  • Merge pull request #61 from embeddings-benchmark/fix/nolangs

Fix no langs (c15e1a7)

  • Merge branch 'main' into feature/hublinks (c3990d6)

  • Simplify (936eee2)

  • Add Hub links & descriptions (b8182bb)

  • Update MTEB.py (0be4a06)

  • Merge pull request #57 from embeddings-benchmark/Muennighoff-patch-2

Update README.md (1ebca84)

  • Merge pull request #59 from embeddings-benchmark/Muennighoff-patch-3

Update README.md (3f53c85)

  • Update README.md (8097f31)

  • Update README.md (5b260a4)

  • Merge pull request #56 from Muennighoff/feature/readmelinks

Add README Links & Images (f473dbd)

  • Center title (1341db7)

  • Center title (8b80471)

  • Beautify (1ab8764)

  • Merge pull request #49 from Muennighoff/fix/cqadupstack

Fix CQADupstack (3a4dd84)

  • Merge pull request #50 from Muennighoff/fix/redditp2p

New RedditP2P Script (7bc547e)

  • Merge pull request #52 from Muennighoff/fix/bucc

Default to 1-indexed gold (9aff7f2)

  • Merge pull request #54 from embeddings-benchmark/Muennighoff-patch-1

Update MSMARCORetrieval.py (3951c41)

  • Update MSMARCORetrieval.py (6922be0)

  • Default to 1-indexed gold (f29e1fb)

  • New RedditP2P Script (f73b179)

  • Fix split (e3ea40b)

  • Add CQADupStack subsets (a32c00b)

  • Fix CQADupstack (a26229f)

  • Merge pull request #46 from Muennighoff/fix/scidocs

Fix/scidocs (ea10703)

  • Update README name (afddfd3)

  • Merge pull request #45 from Muennighoff/feature/cachetestembs

Feature/cachetestembs (475420a)

  • Merge pull request #44 from Muennighoff/fix/silentskip

Fix/silentskip (f7d6fd1)

  • Merge pull request #43 from Muennighoff/main

Add flag to overwrite results (ece590f)

  • Merge pull request #33 from Muennighoff/fix/summeval

Fix SummEval NaN scores (48586e2)

  • Merge branch 'main' into main (e986cd1)

  • Merge pull request #42 from Muennighoff/feature/versioning

Feature/versioning (1aeaede)

  • Update mteb/evaluation/MTEB.py (23a473f)

  • Rename SciDocs (edc2917)

  • Return test cache in all clf evaluators (309a867)

  • Cache test embedding / exp for all clf evals (7dd867f)

  • Add testcache (08cb352)

  • Split into two lines (f756399)

  • Sort tasks (03658fa)

  • Log known tasks (86f9cd6)

  • Log tasks not found (9ab0a7a)

  • Add flag to overwrite (529541d)

  • Version mteb & ds (78b90e9)

  • Formatting (67f6070)

  • Add versioning (fa852de)

  • Merge pull request #41 from Muennighoff/fix/sts22 (064e47c)

  • Rmv superfluous imports (7e8ee18)

  • Make revision optional (90afba5)

  • Remove space (e0d22bc)

  • Modify script to invert scores (9b9f43a)

  • Add revision to CL (5f68fda)

  • Add revision kwarg (3448d1e)

  • Merge pull request #26 from AmrMKayid/return-results (8f3242c)

  • Merge pull request #38 from Muennighoff/fix/seeds (720c597)

  • Update docs (dd4a1f2)

  • Merge pull request #37 from embeddings-benchmark/mindref

Fix Mind Reference (1834041)

  • Seed cuda (d33d748)

  • Merge pull request #35 from embeddings-benchmark/bootstrap-logs (3ff35c5)

  • Update mteb/abstasks/AbsTaskClassification.py

Co-authored-by: Niklas Muennighoff <[email protected]> (9255249)

Two other notes:

  • The renaming can create confusion, since a test set does exist; I assume we just don't have its labels
  • MIND uses AUC & MRR & NDCG scores, not MAP, see https://msnews.github.io/ (7ce4bb1)

  • Update mteb/evaluation/evaluators/SummarizationEvaluator.py

Co-authored-by: Nouamane Tazi <[email protected]> (f667749)

  • Merge pull request #36 from embeddings-benchmark/mindsmall-test (6fc710b)

  • rename validationsplit to test (9c4d5c6)

  • styling (c66610e)

  • add logs for classification bootstrap experiments (e4000e1)

  • Merge pull request #32 from Muennighoff/fixsplits (39d0926)

  • Add consistent brackets (2cdd283)

  • Remove debug leftovers (c674d0a)

  • Remove superfluous imports (68f7307)

  • Skip samples with no variance (d39be65)

  • Drop nans (20c22a9)

  • Fix BEIR splits (752d49f)

  • Fix splits (07bea18)

  • Merge branch 'main' into return-results (314e5d7)

  • Merge pull request #30 from embeddings-benchmark:selected_tasks

fix printing selected tasks for evaluation (f1cab40)

  • fix printing selected tasks for evaluation (ba0dd76)

  • Merge pull request #29 from cycycc/fix-sickr-hf-hub-name (cb87c7a)

  • fix sick-r huggingface hub name (2ea195a)

  • Update mteb/evaluation/MTEB.py

Co-authored-by: holidaydrien <[email protected]> (a4d952b)

  • Update mteb/evaluation/MTEB.py

Co-authored-by: holidaydrien <[email protected]> (c4acb76)

  • Returning Evaluation results (3d60490)

  • Merge pull request #18 from Muennighoff/evalfix (4dabbaf)

  • Merge pull request #19 from Muennighoff/patch-2 (9e56ad3)

  • Merge pull request #20 from Muennighoff/updatemainscores (a0fbd83)

  • Update to ndcg_at_10 (8d010d0)

  • Update main scores (c0e773a)

  • Update README.md (8b495b6)

  • Fix task splits (1755356)

  • Merge pull request #15 from Muennighoff/mainscorefix (4b5fe2b)

  • Fix monolingual mainscore (61647df)

  • Fix main score warning multilingual (831a218)

  • Merge pull request #14 from Muennighoff/patch-1 (6055ecc)

  • Fix task language example (115c280)

  • styling (2ff07d2)

  • update example (b581d00)

  • we can now select all tasks of a specific language (b36e58c)

  • update test (53d123e)

  • keep only langs defined in task's description when loading (efa189f)

  • better prints for multilingual and crosslingual evaluation (5b86950)

  • styling (8fd8fb0)

  • move scripts to respective folders (028ed3e)

  • Update gitignore (a3cee03)

  • update setup.py (89aaa43)

  • update setup (2645323)

  • update setup.cfg (bc5ec1d)

  • Create first pip version (210d012)

  • make default evaluation for classification 10 experiments each using 8 samples per label (b062405)

  • use seed from init arg (f58f8da)

  • styling (4d1bd09)

  • add error message when trying to load beir (a3d58f3)

  • add argument to specify error logs path (d6cef16)

  • make beir an optional package (5bcee12)

  • quick modifications (d774ce6)

  • add example (21fc624)

  • make beir optional dependency (fdd922a)

  • Smaller fixes in Classification task (c6eda26)

  • update available tasks (0923e50)

  • update available tasks (e192823)

  • add evaluation time to final scores (9a1ca7d)

  • quick fix loading beir task (8e46cc8)

  • add available tasks (b7a1987)

  • Merge pull request #11 from embeddings-benchmark/summarization (bdb2691)

  • add more scores to summarization evaluator (12ae05f)

  • add SummEval task (3ba3e65)

  • add Summarization abstract task (f2b0e53)

  • add specifying language for task example (cdf1f18)

  • fix bitext mining evaluation (073a254)

  • update README (3b30e9b)

  • update README (529ec6b)

  • add --available_tasks flag to CLI (de97d9a)

  • styling (324b94c)

  • fix missing params eval_splits in load_data (ecb9d12)

  • CLI quick fixes (693bffa)

  • Merge branch 'main' of https://github.com/embeddings-benchmark/mteb-draft into main (bba225d)

  • quick fixes (2c01099)

  • styling (75d0449)

  • fix eval_splits loading using beir (26ec6b9)

  • capture errors instead of failing (c6aafa4)

  • quick fixes (8a7e3ec)

  • update BeIRModel (e8b5ff9)

  • load data and free it after each task evaluation (aa467f2)

  • update reqs (6005c10)

  • fixing beir imports (5d74d42)

  • Merge pull request #10 from embeddings-benchmark/optimisation (2b6caf2)

  • add multiproc test (fe8b963)

  • update BitextMining main scores (3b0f912)

  • support distributed evaluation for IR 🥳 (5e91971)

  • remove "train" from eval_splits (6da5ed1)

  • gather all nodes outputs in CPU after distributed computation (5eb3661)

  • support DRPES for Parallel IR evaluation (36962e9)

  • quick fix (6e0e6bd)

  • set logistic regression default max_iter to 200 (8963b83)

  • add evaluators logs 📜 (e9d326f)

  • make style (ab8f13e)

  • add Makefile and better styling tools ✨ (156e828)

  • dataloading moved from init to run (c2b7901)

  • Merge pull request #8 from embeddings-benchmark/beir-integration

Beir integration (f7f2426)

  • Merge branch 'main' into beir-integration (af12b49)

  • Merge pull request #9 from embeddings-benchmark/display

Display (11e5758)

  • fixes (8902f59)

  • fixes (a394cc2)

  • fixes+black (b0527a8)

  • beautiful task display (0ff2db2)

  • rich library (27cd4cb)

  • datasets (895c23d)

  • fever (0724070)

  • quora (43b93e5)

  • dbpedia (50d6700)

  • climatefever (e506637)

  • cqadupstack (217009f)

  • arguana (019b2b7)

  • beir retrieval (e52171b)

  • only save if output_folder argument is specified (2e1eb24)

  • Update python-package.yml (6c32b6b)

  • all tests are passing now ✅ (6c41b75)

  • Create python-package.yml (3cce88f)

  • Merge pull request #6 from embeddings-benchmark/testing (06bd1df)

  • Merge branch 'main' into testing (5226907)

  • normalize STS scores (6f98396)

  • normalize score names (6db134f)

  • format @k scores (dcb77a0)

  • rename CrossLingual to Crosslingual (3af3b4f)

  • remove train split from evaluation splits (6317bb6)

  • bug fix (ba8c906)

  • calculate AP only in binary classification (bc293ca)

  • add kwargs and batch_size to evaluate funcs (7926d3c)

  • update main scores for some tasks (099a32b)

  • add limit argument to limit evaluation data (92e5d09)

  • add test for PairClassificationEvaluator (ecffd35)

  • use evaluators.PairClassificationEvaluator instead of sent-formers BinaryClassificationEvaluator (9ffdf2b)

  • reformatting (ca25e17)

  • add test for RerankingEvaluator (9646892)

  • reformat RerankingEvaluator (3ce99cf)

  • more docs (8d07d59)

  • tests folder (9cd9dc2)

  • add test_RetrievalEvaluator (5236588)

  • more docs (06ff3d1)

  • add AP score to ClassificationEvaluator (c950ce8)

  • add nDCG score to RerankingEvaluator (e4170c8)

  • Merge pull request #5 from embeddings-benchmark:update-reranking

Support multiple queries in Reranking tasks (cf51493)

  • quick fix (0d133e1)

  • use max cross similarity in case of multiple queries (3f80a70)

  • support multiple queries in Reranking tasks (47f871f)

  • bug fixes (a3dc4f6)

  • rename binary classification to pair classification (63374fe)

  • rename available_splits to eval_splits (04b9f55)

  • rename available_langs to eval_langs (99ad04c)

  • minor fixes (c2307ef)

  • Merge pull request #4 from embeddings-benchmark/packaging (297560d)

  • quick fix bug (fc8ea9f)

  • report stderr in AbsTaskClassification in case of bootstrapping (d3723a5)

  • add STS22CrosslingualSTS (87f92e5)

  • add MindSmallReranking (940642a)

  • precision recall f1 bitext evaluator (4f4a9e2)

  • korean to sts17 (4767300)

  • quick fix RetrievalEvaluator (afb574a)

  • update README (db6edde)

  • update example (d363a49)

  • remove useless import (4a8966f)

  • fix cmd.py arguments (2b17b4a)

  • add kwargs where needed (30f0efd)

  • add cli script (5a77900)

  • adopt pbr packaging (c2fc3c1)

  • quick fixes (f5d3287)

  • rename kNNClassification to Classification (44ceb4a)

  • add bootstrap parameters to AbsTaskKNNClassification (8ecf9d4)

  • add EmotionClassification (e03db39)

  • add TweetSentimentExtractionClassification (5c7ef5c)

  • add ToxicConversationsClassification (94bfb4f)

  • add AmazonCounterfactualClassification (4052ae1)

  • add ImdbClassification task (eb70842)

  • add AmazonPolarityClassification dataset (75684ed)

  • hack fix bug loading tasks twice (23cc372)

  • add AmazonReviewsClassification (5f4731c)

  • add create data script for amazon reviews multi (761b70e)

  • make shuffling reproducible in logReg-10-splits-5-intents (52e4743)

  • add logReg-10-splits-5-intents for kNNClassificationEvaluator (72c67e0)

  • quick fix batch size (a88d8bf)

  • quick fixes (d4e5549)

  • add batch size to kNNClassificationEvaluator (c5127d8)

  • Merge pull request #3 from embeddings-benchmark/cross-lingual

Cross lingual (c844875)