Create low-allocation immutable Roaring bitmaps #264
That is our use case. We have many bitmaps in many boltdbs on disk. Bitmaps have about 100M bits. The whole collection has up to ~6B objects indexed, but none of the shards have more than 32-bit indices.
@joenall Thanks for chiming in. It is really important to document the use case to motivate a public solution.
We have a similar use case where we create a lot of immutable bitmaps in memory and only read from / iterate over them. It would be great if there were improvements that could be made for immutable bitmaps.
@kevinconaway Just checking... do you create them in this manner?

```go
newrb := roaring.New()
newrb.ReadFrom(buf)
```
No, for this use case we typically create a new bitmap:

```go
rb := roaring.New()
rb.Add(x)
rb.Add(y)
// ...
rb.RunOptimize()
```

We notice these occupying a fair amount of heap space / heap objects, so I was curious whether there is a more optimized in-memory layout for when we know that we won't modify the bitmap. Apologies if I interjected on an unrelated ticket here.
If you sum up …
We were facing this problem at Dgraph Labs too, so we built a library called sroar where the in-memory representation matches the on-disk representation (no need for marshalling / unmarshalling). Perhaps you will find this useful.
Regarding the last comment, I created an issue specifically for it at #298 |
Apologies for the confusion, the main concern here is the number of objects on the heap (the various container objects).
I realize that... hence my question. (To be clear, I am not arguing. I am just clarifying.)
Would it help to reuse CRoaring's frozen bitmap format?
For people reading this... The issue is that you might have, say, an array of 64-bit integers on disk. In theory, you can just memory-map the file, point at the array, and load the 64-bit integers. This works at the system level on modern hardware. But C/C++ programmers will insist that you only load 64-bit words if they are 64-bit aligned, because that is how systems used to work. (Some legacy or embedded systems still have this requirement, but I don't think that they run Go.) You can side-step this constraint by calling memcpy.

There is no good portable all-around solution in C/C++, I am afraid. The compiler can assume alignment. Visual Studio allows you to declare an unaligned pointer, but I don't know of an equivalent in other compilers.

The frozen bitmap format assumes that the pointer where the bitmap is found has some alignment (if not, you get a failure), and if the proper alignment is found, then it uses padding so that you can redirect pointers at the raw byte pointer... the assumption being that your system cannot load words unaligned (which is an incorrect assumption on modern hardware, but one that is still with us as undefined behaviour in C/C++). It has a few constraints, such as... you can't put the frozen serialized data just anywhere; it needs to be aligned. Here is the specification in C code:

In Go, we simply map the pointers directly... roaring/serialization_littleendian.go Line 92 in 4eb34b7
Thus, maybe ironically, Go allows us to be more low-level, and we do not need padding. It is possible that the frozen format could have benefits in Go, but the main motivation (loading alignment) for creating the format in C is not present in Go. We already do direct pointer assignment in Go while ignoring the alignment. In effect, loading a frozen instance in C is very much like the …
Right. I thought it was a format issue; that's why I suggested just recycling the old one instead of creating a new one. So it's just a matter of implementing a new load?
There are performance benefits to using aligned data, AFAIR, though. Reads and writes of aligned data should hit the bus less often. Similarly, they should invalidate the cache less often. I assumed that's why we store stuff aligned so that it matches the natural alignment of AVX2 registers for bitsets.
Yes. People often claim that alignment is important for performance. Back in 2012, I wrote a blog post on the topic... Data alignment for speed: myth or reality? If you are scanning an array, you might hit one extra cache line due to lack of alignment. So you can build relatively contrived examples where this will impact your overall performance... but my claim is that by the time it is a concern, you will have hit many more low-hanging fruit. It comes down to having simpler code.

But if I could wrap the byte arrays in C with memcpy calls, without going insane, then we could do away entirely with the frozen format. In C++, I could have a wrapper that calls memcpy... but in C, I'd probably need macros, and then it would get ugly.
But if you do …
Off-topic: there's a case where alignment to cache lines really is more important: when atomics are involved. Atomic operations invalidate a cache line on all processors, so you want to avoid having them invalidate more than one line, or invalidate unrelated data (so you usually try to fit related data in the same line, since it is going to be invalidated for the others anyway).
You do not. You memcpy the individual words, which brings them into registers from an mmap pointer. You do not have to memcpy the array. There is no difference between …
If you are loading individual words and you put each of them at a page boundary, every other page, then you could, theoretically, have to double the number of pages you must load. But I can also come up with pathological cases where alignment triggers massive cache-aliasing issues (where all addresses have the same least-significant bits) that destroy your cache. Pathological cases should not guide your generic design decisions.
My blog post does allude to this issue. In a multithreaded context, being aware of cache lines can pay off in specific instances.
Oh, I thought …
Coming back to this. I think an option would be to: …
It's a bit convoluted, and requires either a …
Is there still interest in this? Could I work on it? If we are immediately performing intersection/merging after …
@GiedriusS Yes. There is still much work to do.
We support frozen views, see roaring/serialization_littleendian.go Line 298 in a550de6
See also discussions above. It is deliberately not very well documented because it is not very well tested (in our project, although it is used in production by some). |
Some users have many bitmaps on disk. Loading them up temporarily for operations is expensive, due in part to the allocation of new instances in this process...
Related PR #263
Also related to issue #193
cc @guymolinari @jacksonrnewhouse