Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Client] Fix the BufferExhaustedException thrown when write a high volume of data to log table with arrow format #290

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

swuferhong
Copy link
Collaborator

Purpose

Linked issue: #267

This pr is aims to fix the BufferExhaustedException thrown when write a high volume of data to log table with arrow format. Now, we will pre-allocated some memory for ArrowLogWriteBatch in writer thread instead of allocate memory in sender thread.

Tests

API and Format

Documentation

Copy link
Collaborator

@loserwang1024 loserwang1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have left some comments.

// pre-allocate a list of memory segments to hold by the arrow batch to avoid deadlock
// in sender thread.
int pageSize = memorySegmentPool.pageSize();
int memorySegmentSize = batchSize / pageSize;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int memorySegmentSize = batchSize / pageSize;
int memorySegmentSize = batchSize / pageSize + (batchSize%pageZise ==0 ? 0:1)

if (currentSegmentIndex < memorySegments.size()) {
return memorySegments.get(currentSegmentIndex++);
} else {
MemorySegment segment = segmentPool.nextSegment(waitingSegment);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are't the PreAllocatedManagedPagedOutputView be allocated all the memory in pre (which is memorySegments)? Why still need to get segment from segmentPool?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PreAllocatedManagedPagedOutputView only pre-allocate an approximate amount of memory in advance, which may not be enough to serialize the ArrowWriteBatch.

@@ -521,14 +528,15 @@ private RecordAppendResult appendNewBatch(
// If the table is kv table we need to create a kv batch, otherwise we create a log batch.
WriteBatch batch;
if (writeBatchType == WriteBatch.WriteBatchType.KV) {
MemorySegment memorySegment = segments.get(0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a check whether segments is only one element, in case that the allocateMemorySegments change later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The WriterMemoryBuffer will be removed in the future, so there is no need to add a check here.(Always one memeorySegment).

@luoyuxia
Copy link
Collaborator

luoyuxia commented Jan 2, 2025

Just curious about why pre-allocated segments can solve this exception...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants