Anserini Regression Experiments

Internally at Waterloo, tuna.cs.uwaterloo.ca is used for the development of Anserini and is set up to run the regression experiments described here. The regression script src/main/python/run_regression.py runs end-to-end regression experiments for various collections. Each experiment includes:

Building the index from scratch.
Running all retrieval runs described in the Anserini documentation.
Verifying results against effectiveness figures stored in src/main/resources/regression/.
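
The verification step can be pictured as a comparison of measured metrics against stored expected values. The sketch below is only an illustration of that idea, not the logic in run_regression.py: the function name verify, the tolerance parameter, the metric names, and the numbers are all hypothetical.

```python
# Hypothetical sketch of a regression check; not the code in run_regression.py.
def verify(expected, actual, tolerance=1e-4):
    """Return (metric, expected, actual) tuples for every metric that disagrees."""
    failures = []
    for metric, want in sorted(expected.items()):
        got = actual.get(metric)
        # A missing metric counts as a failure, as does a value outside tolerance.
        if got is None or abs(got - want) > tolerance:
            failures.append((metric, want, got))
    return failures


# Illustrative numbers only; these are not real Anserini results.
expected = {"map": 0.2500, "p30": 0.3100}
actual = {"map": 0.2500, "p30": 0.3000}

for metric, want, got in verify(expected, actual):
    print("MISMATCH %s: expected %.4f, got %s" % (metric, want, got))
```

An empty list from verify would mean every stored figure was reproduced within the assumed tolerance.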

We update a change log whenever effectiveness changes or new regressions are added.

Requirements

Python>=2.6 or Python>=3.5

pip install -r src/main/python/requirements.txt
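
As a quick sanity check before installing the requirements, the interpreter version can be tested from Python itself. This is a minimal sketch that reads the requirement above as "2.6 or later in the 2.x line, or 3.5 or later in the 3.x line"; that reading, and the helper name version_supported, are assumptions rather than something the regression script enforces.

```python
import sys


def version_supported(version_info):
    """Return True for Python 2.6+ (2.x line) or Python 3.5+ (3.x line)."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor >= 6
    if major == 3:
        return minor >= 5
    # Other major versions are not mentioned in the requirements.
    return False


if not version_supported(sys.version_info):
    print("Unsupported Python version: %d.%d" % (sys.version_info[0], sys.version_info[1]))
```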

Invocations

tl;dr - Copy and paste the following lines into a console on tuna to run the regressions without building indexes from scratch:

nohup python src/main/python/run_regression.py --collection disk12 >& log.disk12 &
nohup python src/main/python/run_regression.py --collection robust04 >& log.robust04 &
nohup python src/main/python/run_regression.py --collection robust05 >& log.robust05 &
nohup python src/main/python/run_regression.py --collection core17 >& log.core17 &
nohup python src/main/python/run_regression.py --collection core18 >& log.core18 &

nohup python src/main/python/run_regression.py --collection mb11 >& log.mb11 &
nohup python src/main/python/run_regression.py --collection mb13 >& log.mb13 &

nohup python src/main/python/run_regression.py --collection wt10g >& log.wt10g &
nohup python src/main/python/run_regression.py --collection gov2 >& log.gov2 &
nohup python src/main/python/run_regression.py --collection cw09b >& log.cw09b &
nohup python src/main/python/run_regression.py --collection cw12b13 >& log.cw12b13 &
nohup python src/main/python/run_regression.py --collection cw12 >& log.cw12 &

nohup python src/main/python/run_regression.py --collection car17 >& log.car17 &

Copy and paste the following lines into a console on tuna to run the regressions from the raw collection, which includes building indexes from scratch (the only difference is the additional --index option):

nohup python src/main/python/run_regression.py --collection disk12 --index >& log.disk12 &
nohup python src/main/python/run_regression.py --collection robust04 --index >& log.robust04 &
nohup python src/main/python/run_regression.py --collection robust05 --index >& log.robust05 &
nohup python src/main/python/run_regression.py --collection core17 --index >& log.core17 &
nohup python src/main/python/run_regression.py --collection core18 --index >& log.core18 &

nohup python src/main/python/run_regression.py --collection mb11 --index >& log.mb11 &
nohup python src/main/python/run_regression.py --collection mb13 --index >& log.mb13 &

nohup python src/main/python/run_regression.py --collection wt10g --index >& log.wt10g &
nohup python src/main/python/run_regression.py --collection gov2 --index >& log.gov2 &
nohup python src/main/python/run_regression.py --collection cw09b --index >& log.cw09b &
nohup python src/main/python/run_regression.py --collection cw12b13 --index >& log.cw12b13 &
nohup python src/main/python/run_regression.py --collection cw12 --index >& log.cw12 &

nohup python src/main/python/run_regression.py --collection car17 --index >& log.car17 &

Watch out: the full cw12 regression takes a couple of days to run and generates a 12TB index!
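
All of the invocations above follow a single pattern, so they can be generated rather than copied by hand. The following is a minimal sketch: the helper regression_command and the COLLECTIONS list are hypothetical names introduced here, while the script path, the --collection and --index options, and the log file naming are taken from the commands above.

```python
# Collection names as they appear in the invocations above.
COLLECTIONS = [
    "disk12", "robust04", "robust05", "core17", "core18",
    "mb11", "mb13",
    "wt10g", "gov2", "cw09b", "cw12b13", "cw12",
    "car17",
]


def regression_command(collection, index=False):
    """Build the shell command line for one regression (hypothetical helper)."""
    parts = ["nohup", "python", "src/main/python/run_regression.py",
             "--collection", collection]
    if index:
        # Only add --index when the indexes should be rebuilt from scratch.
        parts.append("--index")
    # Send stdout and stderr to log.<collection> and run in the background.
    parts.extend([">&", "log." + collection, "&"])
    return " ".join(parts)


if __name__ == "__main__":
    for name in COLLECTIONS:
        print(regression_command(name, index=False))
```

Passing index=True produces the second form. Given the size and running time of the full cw12 index, it may be sensible to leave cw12 out of the list when rebuilding indexes.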

Details of each specific regression:

disk12: Experiments on Disks 1 & 2
robust04: Experiments on Disks 4 & 5 (Robust04)
robust05: Experiments on AQUAINT (Robust05)
core17: Experiments on the New York Times (Core17)
core18: Experiments on the Washington Post (Core18)
wt10g: Experiments on Wt10g
gov2: Experiments on Gov2
cw09b: Experiments on ClueWeb09 (Category B)
cw12b13: Experiments on ClueWeb12-B13
cw12: Experiments on ClueWeb12
mb11: Experiments on Tweets2011 (MB11 & MB12)
mb13: Experiments on Tweets2013 (MB13 & MB14)
car17: Experiments on Car17

Additional Regressions

JDIQ 2018 Experiments
TREC 2018 runbook
