Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate Graph Reordering into HNSW #9

Open
BlaiseMuhirwa opened this issue May 25, 2023 · 3 comments
Open

Integrate Graph Reordering into HNSW #9

BlaiseMuhirwa opened this issue May 25, 2023 · 3 comments

Comments

@BlaiseMuhirwa
Copy link

BlaiseMuhirwa commented May 25, 2023

Coleman et al. showed that reordering the nodes in every layer such that the neighbors of each node are laid out closer in memory improves query time performance by about 40%. The idea is that reordering provides a cache-efficient search mechanism that reduces the search overhead due to random accesses in HNSW.

They also showed that using hierarchy is not strictly necessary in certain settings. They replaced the hierarchy with "a process where we randomly sample 50 nodes and use the closest option as the initialization." They observed no statistically significant difference between the hierarchical search procedure and this random sampling process in terms of recall or query time over 10k items.

I can work on integrating these features.

@jianshu93
Copy link

jianshu93 commented May 25, 2023

Hello all,

Jean, we discussed this idea in our GSearch (https://www.biorxiv.org/content/10.1101/2022.10.21.513218v2) paper but have not implemented it yet. It would be nice to have this idea implemented if @BlaiseMuhirwa is interested. I feel like the cache ordering is useful while the random sampling is more complicated considering the fact that the hierarchy is already there.

Thanks,
Jianshu

@BlaiseMuhirwa
Copy link
Author

I would be interested in adding the graph reordering feature. On a separate note, it would also be useful to include benchmark numbers (recall@k, QPS) in the README on the ANN benchmark datasets.

@jean-pierreBoth
Copy link
Owner

jean-pierreBoth commented Jun 14, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants