Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving indexing and mapping performance with low cost #311

Open
leoisl opened this issue Nov 21, 2022 · 3 comments
Open

Improving indexing and mapping performance with low cost #311

leoisl opened this issue Nov 21, 2022 · 3 comments

Comments

@leoisl
Copy link
Collaborator

leoisl commented Nov 21, 2022

Currently our main indexing structure, which maps kmer hashes to their PRGs and locations, is a std::unordered_map. There are better alternatives, like boost::unordered_flat_map (a review here: https://bannalia.blogspot.com/2022/11/inside-boostunorderedflatmap.html?m=1) and robin hood hashing (unsure about this one, but nice to document, https://github.com/martinus/unordered_dense). There might be even more suited data structures making use of the fact that after built our index is immutable

@iqbal-lab
Copy link
Collaborator

(fwiw John Lees and Johanna in the office next to us has been using robin hood hashing and got Dan into it too)

@leoisl
Copy link
Collaborator Author

leoisl commented Feb 14, 2023

Might be worth trying Google's SparseHash (https://github.com/sparsehash/sparsehash). Lower memory usage but slower, could have some use cases (e.g. roundhound)

@leoisl
Copy link
Collaborator Author

leoisl commented Mar 10, 2023

Andreas has more experience than us on these data structures, and he recommends using this map: https://gitlab.ub.uni-bielefeld.de/gi/sans/-/blob/kc/src/tsl/sparse_map.h (best compromise for both speed and RAM). We might have hash maps that use even less RAM with a cost of being (much) slower, which is sth that we might consider for tools like RH

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants