Improving indexing and mapping performance with low cost #311

leoisl · 2022-11-21T09:49:54Z

Currently our main indexing structure, which maps kmer hashes to their PRGs and locations, is a std::unordered_map. There are better alternatives, such as boost::unordered_flat_map (reviewed here: https://bannalia.blogspot.com/2022/11/inside-boostunorderedflatmap.html?m=1) and robin hood hashing (unsure about this one, but worth documenting: https://github.com/martinus/unordered_dense). There might be even better-suited data structures that make use of the fact that our index is immutable once built.
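To make the "immutable once built" idea concrete, here is a minimal sketch of one possible structure: a sorted vector queried by binary search. It is not what the codebase does and not one of the libraries linked above. `Hit`, `FrozenIndex`, and their fields are hypothetical names made up for illustration, and the real index stores richer location records than this.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-kmer payload (PRG id + location).
struct Hit {
    std::uint32_t prg_id;
    std::uint32_t position;
};

// Immutable index: all entries are supplied up front, sorted once by kmer
// hash, and then only queried. There are no buckets or per-node allocations,
// so memory is one contiguous array of (hash, Hit) pairs.
class FrozenIndex {
public:
    using Entry = std::pair<std::uint64_t, Hit>;
    using Iter = std::vector<Entry>::const_iterator;

    explicit FrozenIndex(std::vector<Entry> entries) : entries_(std::move(entries)) {
        // stable_sort keeps insertion order among entries with the same hash.
        std::stable_sort(entries_.begin(), entries_.end(),
                         [](const Entry& a, const Entry& b) { return a.first < b.first; });
        entries_.shrink_to_fit();
    }

    // Returns the half-open range [first, last) of entries whose hash equals
    // kmer_hash; the range is empty when the hash is absent. O(log n) lookup.
    std::pair<Iter, Iter> find(std::uint64_t kmer_hash) const {
        Iter lo = std::lower_bound(
            entries_.begin(), entries_.end(), kmer_hash,
            [](const Entry& e, std::uint64_t h) { return e.first < h; });
        Iter hi = std::upper_bound(
            lo, entries_.end(), kmer_hash,
            [](std::uint64_t h, const Entry& e) { return h < e.first; });
        return {lo, hi};
    }

    std::size_t size() const { return entries_.size(); }

private:
    std::vector<Entry> entries_;
};
```

Lookups here are O(log n) rather than the expected O(1) of a hash map, so whether this beats a flat hash map on mapping speed would need benchmarking on real indexes; the main attraction is the compact, contiguous layout.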

iqbal-lab · 2022-11-21T10:13:37Z

(fwiw John Lees and Johanna in the office next to us have been using robin hood hashing and got Dan into it too)

leoisl · 2023-02-14T12:41:32Z

Might be worth trying Google's SparseHash (https://github.com/sparsehash/sparsehash). It uses less memory but is slower, so it could suit some use cases (e.g. roundhound)

leoisl · 2023-03-10T15:53:19Z

Andreas has more experience than us with these data structures, and he recommends using this map: https://gitlab.ub.uni-bielefeld.de/gi/sans/-/blob/kc/src/tsl/sparse_map.h (best compromise between speed and RAM). There may be hash maps that use even less RAM at the cost of being (much) slower, which is something we might consider for tools like RH
