
Include Vespa Lexical Search as an option to BEIR benchmark #76

Open · wants to merge 32 commits into base: main
Conversation

@thigm85 commented Mar 1, 2022

@NThakur20 could you take a look at this PR? The idea is to make it easier to benchmark Vespa applications using the BEIR datasets and framework. We started with Lexical Search but will make it more general later.
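
For context, a minimal sketch of how such a benchmark could look using BEIR's standard evaluation flow. `GenericDataLoader`, `EvaluateRetrieval`, and `util.download_and_unzip` are existing BEIR APIs; the `VespaLexicalSearch` class and its import path are hypothetical stand-ins for whatever this PR actually adds:

```python
# Sketch: evaluate a Vespa lexical retriever on a BEIR dataset (scifact).
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.evaluation import EvaluateRetrieval

# Hypothetical import path; the class added by this PR may live elsewhere.
from beir.retrieval.search.lexical import VespaLexicalSearch

# Download and load the dataset with BEIR's standard loader.
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_path).load(split="test")

# Hypothetical retriever: feeds the corpus into a running Vespa application
# and issues lexical (BM25) queries against it.
model = VespaLexicalSearch(application_name="scifact")
retriever = EvaluateRetrieval(model)

# Retrieve and score with BEIR's standard metrics (nDCG@k, MAP@k, etc.).
results = retriever.retrieve(corpus, queries)
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
```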

thigm85 and others added 30 commits January 24, 2022 08:38
Introduce Vespa Lexical experiment into Beir
@thakur-nandan (Member) commented

Hi @thigm85, thank you for this PR!

In the coming days I will have a look at it. I'm happy to see Vespa being included within BEIR!

I have been busy moving the repository over the last few weeks. I will soon merge this into the development branch and release it with the next version of BEIR!

Kind Regards,
Nandan Thakur

@thakur-nandan self-assigned this Mar 30, 2022
@thakur-nandan (Member) left a comment

Hi @thigm85, I went through your PR. Thanks for all the code you added; it looks good.

I have a few requested updates:

  1. In setup.py, can you add pyvespa and tenacity as optional dependencies, similar to what I have done for tensorflow? Something like the following (see the setup.py sketch after this list):

     optional_packages = {
         "tf": ["tensorflow>=2.2.0", "tensorflow-text", "tensorflow-hub"],
         "vespa": ["pyvespa", "tenacity"]
     }
  2. Can you create a tests folder within the main directory of BEIR and place test_retrieval_lexical_vespa.py inside it? I currently do not have any unit tests implemented for other methods, so this will eventually unify all unit tests in one place.

  3. Could you move your example benchmark_lexical_vespa.py from examples/benchmarking to examples/retrieval/evaluation/lexical? That directory contains all the sample scripts for evaluating different lexical search methods, so it will be easier for users to find.

  4. Could you briefly mention at the top of benchmark_lexical_vespa.py a few steps on how to run Vespa lexical search? What must a user have in place to run Vespa search, and how do they download and run the Vespa application? You can have a look at evaluate_bm25.py for reference. (A sketch of such a header follows below.)
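
For item 1, a minimal sketch (assuming the standard setuptools extras mechanism) of how optional_packages would be wired into setup.py and installed by users:

```python
# setup.py (sketch): expose the optional dependency groups as pip "extras".
from setuptools import setup, find_packages

optional_packages = {
    "tf": ["tensorflow>=2.2.0", "tensorflow-text", "tensorflow-hub"],
    "vespa": ["pyvespa", "tenacity"],
}

setup(
    name="beir",
    packages=find_packages(),
    extras_require=optional_packages,  # enables: pip install beir[vespa]
)
```

Users who never touch Vespa are unaffected, while `pip install beir[vespa]` pulls in pyvespa and tenacity on demand.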
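For item 4, a hedged sketch of what such a header on benchmark_lexical_vespa.py could say; the Docker command follows Vespa's documented quick start, but the ports and container name are assumptions here:

```python
"""
Benchmark Vespa lexical search (BM25) on a BEIR dataset.

Prerequisites (assumed setup; adjust to your environment):
1. Docker installed, with enough memory allocated for a Vespa container.
2. A running Vespa container, e.g. per Vespa's quick-start guide:
   docker run --detach --name vespa --hostname vespa-container \
       --publish 8080:8080 --publish 19071:19071 vespaengine/vespa
3. The optional dependencies installed: pip install beir[vespa]

The script deploys the lexical search application to the container, feeds
the BEIR corpus, runs the queries, and reports BEIR's standard metrics.
"""
```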

Thank you! I will merge once the small updates mentioned above have been resolved.

Kind Regards,
Nandan Thakur
