Enable recursive graph bisection? #2289

Open
jpountz opened this issue Dec 5, 2023 · 0 comments

Since Anserini is often used for search performance benchmarks, enabling recursive graph bisection would help. For reference, most if not all PISA performance benchmarks seem to enable recursive graph bisection.

Since version 9.9, Lucene can hook recursive graph bisection into the merging process, which makes it easier to enable. For instance, if you plan on calling IndexWriter.forceMerge(1) before searching documents, you can do the following to enable recursive graph bisection in the final merge only:

IndexWriterConfig iwc = new IndexWriterConfig();

BPIndexReorderer reorderer = new BPIndexReorderer();
reorderer.setForkJoinPool(ForkJoinPool.commonPool()); // run reordering on multiple threads

BPReorderingMergePolicy mp = new BPReorderingMergePolicy(iwc.getMergePolicy(), reorderer);
mp.setMinNaturalMergeNumDocs(Integer.MAX_VALUE); // only run reordering on forced merges

iwc.setMergePolicy(mp);

You can also enable it on background merges if you don't plan on doing a final force-merge, in such a way that only the bigger segments get reordered. Note: benchmarks on the Wikipedia dataset suggest that this approach adds an index-time overhead on the order of 30%.

IndexWriterConfig iwc = new IndexWriterConfig();

BPIndexReorderer reorderer = new BPIndexReorderer();

BPReorderingMergePolicy mp = new BPReorderingMergePolicy(iwc.getMergePolicy(), reorderer);
mp.setMinNaturalMergeNumDocs(100_000); // only reorder segments that have more than 100k docs

iwc.setMergePolicy(mp);

This assumes a default configuration for index reordering, which looks at all indexed fields, runs up to 20 iterations per level, etc. Much of this is configurable; see the BPIndexReorderer javadocs and the BPReorderingMergePolicy javadocs.
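
As a rough sketch of what overriding those defaults could look like: the field name "contents", the iteration count, and the RAM budget below are illustrative assumptions, not recommendations, and the class and method names are hypothetical. It needs lucene-core and lucene-misc 9.9 or later on the classpath, and the setters should be checked against the javadocs for the Lucene version actually in use.

```java
import java.util.Set;
import java.util.concurrent.ForkJoinPool;

import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.misc.index.BPIndexReorderer;
import org.apache.lucene.misc.index.BPReorderingMergePolicy;

// Hypothetical helper class, for illustration only.
public final class ReorderingConfigSketch {

  static IndexWriterConfig newReorderingConfig() {
    IndexWriterConfig iwc = new IndexWriterConfig();

    BPIndexReorderer reorderer = new BPIndexReorderer();
    // Assumption: only the postings of a "contents" field drive the reordering.
    reorderer.setFields(Set.of("contents"));
    // Illustrative value: fewer iterations per level than the default of 20.
    reorderer.setMaxIters(10);
    // Illustrative value: memory the reorderer may use, in megabytes.
    reorderer.setRAMBudgetMB(512);
    // Run reordering on multiple threads.
    reorderer.setForkJoinPool(ForkJoinPool.commonPool());

    BPReorderingMergePolicy mp =
        new BPReorderingMergePolicy(iwc.getMergePolicy(), reorderer);
    // Only run reordering on forced merges.
    mp.setMinNaturalMergeNumDocs(Integer.MAX_VALUE);

    iwc.setMergePolicy(mp);
    return iwc;
  }

  private ReorderingConfigSketch() {}
}
```

Restricting the fields and lowering the iteration count would presumably trade some of the reordering benefit for less index-time overhead, but I haven't measured that for Anserini's collections.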

These classes are in the lucene-misc module, which I can't see in Anserini's current dependencies, so it would need to be added.

I'm happy to help on this, let me know if you have questions.
