Enable recursive graph bisection? #2289

jpountz · 2023-12-05T09:11:14Z

Since Anserini is often used for search performance benchmarks, enabling recursive graph bisection would help. For reference, most if not all PISA performance benchmarks seem to enable recursive graph bisection.

Since Lucene 9.9, recursive graph bisection can be hooked into the merging process, which makes it easier to enable. For instance, if you plan on calling IndexWriter.forceMerge(1) before searching documents, you can enable recursive graph bisection in that final merge as follows:

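```java
import java.util.concurrent.ForkJoinPool;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.misc.index.BPIndexReorderer;
import org.apache.lucene.misc.index.BPReorderingMergePolicy;

IndexWriterConfig iwc = new IndexWriterConfig();

BPIndexReorderer reorderer = new BPIndexReorderer();
reorderer.setForkJoinPool(ForkJoinPool.commonPool()); // run reordering on multiple threads

BPReorderingMergePolicy mp = new BPReorderingMergePolicy(iwc.getMergePolicy(), reorderer);
mp.setMinNaturalMergeNumDocs(Integer.MAX_VALUE); // only run reordering on forced merges

iwc.setMergePolicy(mp); // wrap the default merge policy with the reordering one
```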

You can also enable it on background merges if you don't plan on doing a final force-merge, so that the bigger segments get reordered. Note: benchmarks on the Wikipedia dataset suggest that this approach adds an indexing-time overhead of roughly 30%.

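```java
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.misc.index.BPIndexReorderer;
import org.apache.lucene.misc.index.BPReorderingMergePolicy;

IndexWriterConfig iwc = new IndexWriterConfig();

BPIndexReorderer reorderer = new BPIndexReorderer();

BPReorderingMergePolicy mp = new BPReorderingMergePolicy(iwc.getMergePolicy(), reorderer);
mp.setMinNaturalMergeNumDocs(100_000); // only reorder segments that have more than 100k docs

iwc.setMergePolicy(mp);
```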
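To make the effect of setMinNaturalMergeNumDocs concrete, here is a stdlib-only Java sketch of the decision that the two configurations above rely on. It is a hypothetical, simplified model reconstructed from the comments in those snippets, not BPReorderingMergePolicy's actual code: it assumes forced merges are always reordered, that natural (background) merges are reordered once the merged segment reaches the threshold (the `>=` comparison is an assumption), and it ignores any other settings the real policy may have. The class and method names are made up for illustration.

```java
import java.util.List;

public class ReorderDecisionSketch {

    // Hypothetical, simplified model of when a merge gets reordered, based only on the
    // behavior described in this issue. NOT the real BPReorderingMergePolicy logic:
    // the real policy may consider other settings, and ">=" is an assumed comparison.
    static boolean shouldReorder(boolean forcedMerge, long mergedDocCount, int minNaturalMergeNumDocs) {
        if (forcedMerge) {
            return true; // forced merges, e.g. forceMerge(1), are always reordered
        }
        // Natural merges are only reordered when the resulting segment is large enough.
        return mergedDocCount >= minNaturalMergeNumDocs;
    }

    public static void main(String[] args) {
        List<Long> mergeSizes = List.of(50_000L, 250_000L, 2_500_000L);

        // Configuration 1: Integer.MAX_VALUE effectively disables natural-merge reordering.
        for (long size : mergeSizes) {
            System.out.println("forced-only config, natural merge of " + size + " docs -> "
                    + shouldReorder(false, size, Integer.MAX_VALUE));
        }
        System.out.println("forced-only config, forceMerge(1) -> "
                + shouldReorder(true, 2_500_000L, Integer.MAX_VALUE));

        // Configuration 2: reorder natural merges that produce segments of ~100k docs or more.
        for (long size : mergeSizes) {
            System.out.println("100k config, natural merge of " + size + " docs -> "
                    + shouldReorder(false, size, 100_000));
        }
    }
}
```

With Integer.MAX_VALUE no realistic natural merge qualifies, so only the final forceMerge(1) pays the reordering cost; with 100_000, small early merges stay cheap and only the larger segments are reordered, which is where the indexing-time overhead mentioned above comes from.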

These snippets use the default reordering configuration, which looks at all indexed fields, runs up to 20 iterations per level, and so on. Much of this is configurable; see the BPIndexReorderer and BPReorderingMergePolicy javadocs.
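For context on what the reorderer optimizes: recursive graph bisection is usually described as assigning doc IDs so as to reduce the sum of log gaps between consecutive postings, which clusters documents with similar terms together. The sketch below only scores a given doc-ID order with that log-gap cost; it does not perform the bisection itself. The class name, the input layout (each document as an array of distinct term ids) and the convention that the first posting's gap is docId + 1 are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LogGapCost {

    // Sum over terms of sum over postings of log2(gap), where gap is the difference between
    // consecutive doc IDs in the term's postings list (first posting: docId + 1).
    // order[newId] = index of the original document that receives doc ID newId.
    // Assumes each document's term array contains distinct term ids.
    static double logGapCost(int[][] docTerms, int[] order) {
        Map<Integer, List<Integer>> postings = new TreeMap<>();
        for (int newId = 0; newId < order.length; newId++) {
            for (int term : docTerms[order[newId]]) {
                // newId is visited in increasing order, so each postings list stays sorted
                postings.computeIfAbsent(term, t -> new ArrayList<>()).add(newId);
            }
        }
        double cost = 0;
        for (List<Integer> docIds : postings.values()) {
            int prev = -1;
            for (int docId : docIds) {
                cost += Math.log(docId - prev) / Math.log(2); // log2 of the gap
                prev = docId;
            }
        }
        return cost;
    }

    public static void main(String[] args) {
        // Documents 0 and 2 share term 0; documents 1 and 3 share term 1.
        int[][] docTerms = { {0}, {1}, {0}, {1} };
        int[] original = {0, 1, 2, 3}; // interleaved: similar documents are far apart
        int[] grouped = {0, 2, 1, 3};  // similar documents get consecutive doc IDs
        System.out.println("original order cost: " + logGapCost(docTerms, original));
        System.out.println("grouped order cost:  " + logGapCost(docTerms, grouped));
    }
}
```

On this toy input, giving the documents that share a term consecutive doc IDs lowers the cost from 3.0 to log2(3), about 1.58; BP searches for orderings like that by repeatedly splitting the documents into two halves and swapping documents between them.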

These classes are in the lucene-misc module, which I can't see in Anserini's current dependencies, so it would need to be added.

I'm happy to help on this, let me know if you have questions.
