Use a memory limited hashset with `LocalVocab` #1570
base: master
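Actually, you can figure out whether the small buffer optimization applies: basically, check whether `content.data()` points into the string object itself, i.e. `&content <= content.data() < (&content + sizeof(content))` in terms of byte addresses. (But again, this is not super important.)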
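A minimal sketch of that check, assuming `content` is a `std::string`. The helper names `usesSmallBuffer` and `heapBytes` are hypothetical and not part of the PR, and counting `capacity() + 1` bytes for a heap-allocated buffer is an assumption about the implementation, not a guarantee of the standard.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Returns true if the character buffer of `s` lies inside the string object
// itself, i.e. the small buffer optimization is in effect for `s`.
inline bool usesSmallBuffer(const std::string& s) {
  // Compare byte addresses. `std::less` yields a total order on pointers,
  // even when they point into unrelated objects.
  const char* objBegin = reinterpret_cast<const char*>(&s);
  const char* objEnd = objBegin + sizeof(s);
  const char* data = s.data();
  std::less<const char*> less;
  return !less(data, objBegin) && less(data, objEnd);
}

// Bytes that `s` occupies in addition to `sizeof(std::string)`.
// ASSUMPTION: a heap buffer has `capacity() + 1` bytes (null terminator).
inline std::size_t heapBytes(const std::string& s) {
  return usesSmallBuffer(s) ? 0 : s.capacity() + 1;
}
```

With such a helper, short strings would contribute nothing beyond the size of the object itself to the memory estimate.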
This is a little wasteful, as we always have to obtain a global mutex.
I think for a follow-up (or preparation) PR, you could implement a wrapper around the `memoryLeftThreadsafe` object that stores a small (single-threaded) pool of memory and only goes to the global `wlock()` when that pool is exhausted. Currently, `increase(1), increase(1), increase(1)` needs a global synchronization for each of the three inserts, which seems wasteful. (But as you have abstracted away all the memory handling, that is easy to integrate.)
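The wrapper idea could look roughly like the following sketch. `GlobalMemoryLeft` is a hypothetical stand-in for the shared, mutex-protected budget (it is not the actual `memoryLeftThreadsafe` interface), and `LocalMemoryPool`, `reserve`, `release`, and the chunk size are likewise made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for the global, mutex-protected memory budget.
class GlobalMemoryLeft {
 public:
  explicit GlobalMemoryLeft(std::size_t bytes) : bytesLeft_{bytes} {}
  // Take `n` bytes from the budget, or throw if not enough is left.
  void decreaseOrThrow(std::size_t n) {
    std::lock_guard<std::mutex> lock{mutex_};
    ++numLockAcquisitions_;
    if (n > bytesLeft_) throw std::runtime_error{"memory limit exceeded"};
    bytesLeft_ -= n;
  }
  // Give `n` bytes back to the budget.
  void increase(std::size_t n) {
    std::lock_guard<std::mutex> lock{mutex_};
    ++numLockAcquisitions_;
    bytesLeft_ += n;
  }
  std::size_t bytesLeft() {
    std::lock_guard<std::mutex> lock{mutex_};
    return bytesLeft_;
  }
  // Only counts the calls to `decreaseOrThrow` and `increase`.
  std::size_t numLockAcquisitions() {
    std::lock_guard<std::mutex> lock{mutex_};
    return numLockAcquisitions_;
  }

 private:
  std::mutex mutex_;
  std::size_t bytesLeft_;
  std::size_t numLockAcquisitions_ = 0;
};

// Single-threaded wrapper that takes memory from the global budget in chunks
// and serves small requests from the local remainder without locking.
class LocalMemoryPool {
 public:
  LocalMemoryPool(GlobalMemoryLeft& global, std::size_t chunkSize)
      : global_{global}, chunkSize_{chunkSize} {}
  LocalMemoryPool(const LocalMemoryPool&) = delete;
  LocalMemoryPool& operator=(const LocalMemoryPool&) = delete;
  // Return the unused local remainder to the global budget.
  ~LocalMemoryPool() {
    if (localBytes_ > 0) global_.increase(localBytes_);
  }

  // Account for `n` more bytes in use; throws if the limit is exceeded.
  void reserve(std::size_t n) {
    if (n > localBytes_) {
      const std::size_t missing = n - localBytes_;
      try {
        // Refill with at least one whole chunk.
        const std::size_t request = std::max(missing, chunkSize_);
        global_.decreaseOrThrow(request);
        localBytes_ += request;
      } catch (const std::runtime_error&) {
        // No whole chunk left: retry with exactly what is needed (may throw).
        global_.decreaseOrThrow(missing);
        localBytes_ += missing;
      }
    }
    localBytes_ -= n;
  }

  // `n` bytes are no longer in use; keep them locally for later requests.
  void release(std::size_t n) { localBytes_ += n; }

 private:
  GlobalMemoryLeft& global_;
  std::size_t chunkSize_;
  std::size_t localBytes_ = 0;  // taken from global, not yet handed out
};
```

With a chunk size of 100, three consecutive `reserve(1)` calls touch the global mutex only once. The price is that each pool may hold back up to one chunk of memory that other threads cannot use until the pool is destroyed.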
Check warning on line 109 in src/util/HashSet.h
Codecov / codecov/patch
src/util/HashSet.h#L108-L109
Check warning on line 112 in src/util/HashSet.h
Codecov / codecov/patch
src/util/HashSet.h#L111-L112
Technically, we might violate the memory limit before this call.
And additionally, when this `updateSlotArray...` you keep holding too much memory. So you probably have to do something like "check if inserting would cause a rehash". This needs some closer inspection of Abseil's internal mechanisms (i.e. when the rehashing threshold is reached and how much larger the hash table is made afterwards), but it is doable.
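To illustrate the "would this insert rehash?" check, here is a sketch that uses `std::unordered_set` as a stand-in, because its threshold is exposed through `max_load_factor()` and `bucket_count()`. Abseil's thresholds are internal and would have to be looked up separately. The helper name `insertWouldRehash` is hypothetical.

```cpp
#include <cstddef>
#include <unordered_set>

// Returns true if inserting one more element may trigger a rehash.
// Relies on the standard's guarantee that an insert leaves iterators valid as
// long as the new size does not exceed `max_load_factor() * bucket_count()`.
// Implementations may still allocate in edge cases, e.g. on the first insert
// into an empty table, so treat the result as an estimate.
template <typename Set>
bool insertWouldRehash(const Set& set) {
  const float newSize = static_cast<float>(set.size() + 1);
  const float limit =
      set.max_load_factor() * static_cast<float>(set.bucket_count());
  return newSize > limit;
}
```

If the check returns `true`, the memory for the enlarged table could be requested from the memory limit before the insert is performed, rather than after it.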
And for the correct counting of all the space for slots etc., you can just add an allocator to your `node_hash_set` that doesn't throw, but just counts the bytes somewhere, so that you can guess before the insert and check exactly after the insert. Such an allocator can also be very useful for analyzing the memory usage of the `node_hash_set` when writing your code.
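A counting allocator along these lines might look as follows. This is a sketch only: it is shown with `std::unordered_set` to stay self-contained, the names `CountingAllocator` and `CountedStringSet` are hypothetical, and the plain `std::size_t` counter is not thread-safe.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>

// Allocator that imposes no limit of its own; it only keeps track of the
// number of bytes currently allocated through it in a shared counter.
template <typename T>
class CountingAllocator {
 public:
  using value_type = T;

  explicit CountingAllocator(std::shared_ptr<std::size_t> bytes)
      : bytes_{std::move(bytes)} {}
  // Rebinding constructor: all rebound copies share the same counter.
  template <typename U>
  CountingAllocator(const CountingAllocator<U>& other)
      : bytes_{other.bytes_} {}

  T* allocate(std::size_t n) {
    T* p = std::allocator<T>{}.allocate(n);
    *bytes_ += n * sizeof(T);
    return p;
  }
  void deallocate(T* p, std::size_t n) noexcept {
    std::allocator<T>{}.deallocate(p, n);
    *bytes_ -= n * sizeof(T);
  }

  template <typename U>
  bool operator==(const CountingAllocator<U>& other) const {
    return bytes_ == other.bytes_;
  }
  template <typename U>
  bool operator!=(const CountingAllocator<U>& other) const {
    return !(*this == other);
  }

 private:
  template <typename U>
  friend class CountingAllocator;
  std::shared_ptr<std::size_t> bytes_;
};

// Hash set of strings whose nodes and bucket array are counted.
using CountedStringSet =
    std::unordered_set<std::string, std::hash<std::string>,
                       std::equal_to<std::string>,
                       CountingAllocator<std::string>>;
```

Reading the counter before and after an insert gives the exact number of bytes the container allocated for it. Note that the heap buffers of the stored strings use the strings' own allocator and are not included in this count.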
What about the following (easy, but very effective) approach:
Same question here as above.
Our size getter should probably only account for the additionally allocated memory, because everything else can be handled by the internal memory tracking of the absl classes.