-
Notifications
You must be signed in to change notification settings - Fork 218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Client] Fix the BufferExhaustedException thrown when write a high volume of data to log table with arrow format #290
base: main
Are you sure you want to change the base?
Conversation
…lume of data to log table with arrow format
798dee0
to
a857d90
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have left some comments.
// pre-allocate a list of memory segments to hold by the arrow batch to avoid deadlock | ||
// in sender thread. | ||
int pageSize = memorySegmentPool.pageSize(); | ||
int memorySegmentSize = batchSize / pageSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int memorySegmentSize = batchSize / pageSize; | |
int memorySegmentSize = batchSize / pageSize + (batchSize%pageZise ==0 ? 0:1) |
if (currentSegmentIndex < memorySegments.size()) { | ||
return memorySegments.get(currentSegmentIndex++); | ||
} else { | ||
MemorySegment segment = segmentPool.nextSegment(waitingSegment); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are't the PreAllocatedManagedPagedOutputView be allocated all the memory in pre (which is memorySegments)? Why still need to get segment from segmentPool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PreAllocatedManagedPagedOutputView
only pre-allocate an approximate amount of memory in advance, which may not be enough to serialize the ArrowWriteBatch
.
@@ -521,14 +528,15 @@ private RecordAppendResult appendNewBatch( | |||
// If the table is kv table we need to create a kv batch, otherwise we create a log batch. | |||
WriteBatch batch; | |||
if (writeBatchType == WriteBatch.WriteBatchType.KV) { | |||
MemorySegment memorySegment = segments.get(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a check whether segments is only one element, in case that the allocateMemorySegments
change later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The WriterMemoryBuffer will be removed in the future, so there is no need to add a check here.(Always one memeorySegment).
Just curious about why pre-allocated segments can solve this exception... |
Purpose
Linked issue: #267
This pr is aims to fix the
BufferExhaustedException
thrown when write a high volume of data to log table with arrow format. Now, we will pre-allocated some memory forArrowLogWriteBatch
in writer thread instead of allocate memory in sender thread.Tests
API and Format
Documentation