
Scheduler becomes a bottleneck when tasks return large python objects #41

Open
spitz-dan-l (Contributor) opened this issue Jun 24, 2022 · 0 comments

Hey there Redun team!

This is a scaling issue we've run into on my team. When running a pipeline in which multiple tasks return large Python objects, the redun scheduler becomes very slow, since it unpickles, re-pickles, and stores each large object in the db/S3 one at a time.
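To make the pattern concrete, here's a minimal sketch of the kind of pipeline that hits this (the task names and sizes are illustrative, not our actual workload):

```python
import numpy as np
from redun import task

redun_namespace = "example"


@task()
def make_big_array(seed: int) -> np.ndarray:
    # ~800 MB of float64s; the scheduler pickles/unpickles this whole value.
    rng = np.random.default_rng(seed)
    return rng.random((10_000, 10_000))


@task()
def summarize(arr: np.ndarray) -> float:
    return float(arr.mean())


@task()
def main(n: int = 8) -> list:
    # n large intermediate results all flow through the scheduler,
    # which serializes and stores them one at a time.
    return [summarize(make_big_array(i)) for i in range(n)]
```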

Might it be possible to move this part of the telemetry-collection process onto the execution node, to run after the task has completed but before it terminates? The scheduler would then pass around references to the large objects rather than handling them itself.
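To sketch the idea (all names here are hypothetical, not redun's actual internals): the executor would serialize and upload the result itself, and hand the scheduler only a small content-addressed reference.

```python
import hashlib
import pickle


def store_value(value, s3_client, bucket: str) -> str:
    """Hypothetical executor-side helper: serialize on the execution node
    and return a content-addressed key the scheduler can record."""
    data = pickle.dumps(value)
    key = hashlib.sha256(data).hexdigest()
    s3_client.put_object(Bucket=bucket, Key=f"values/{key}", Body=data)
    # The scheduler only ever sees this short string, never the bytes.
    return key
```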

The current workaround we're using is to save these large objects to Files in S3 and return the File objects. (While we're at it, we pick serialization formats better suited to each object than pickle.) Another possible enhancement to redun would be to support this pattern more directly, so that users don't have to write as much code to serialize, pick an S3 key, upload, download, and deserialize every time.
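Roughly, the workaround looks like this, assuming redun's `File` class; the bucket name and the choice of parquet are illustrative:

```python
import pandas as pd
from redun import File, task


@task()
def make_big_table(seed: int) -> File:
    df = pd.DataFrame({"x": range(10_000_000)})
    # Write to S3 ourselves, with a format-specific serializer instead of pickle.
    out = File(f"s3://my-bucket/intermediates/table-{seed}.parquet")
    with out.open("wb") as fh:
        df.to_parquet(fh)
    # The scheduler only sees the small File handle, not the data.
    return out


@task()
def consume(table: File) -> int:
    with table.open("rb") as fh:
        df = pd.read_parquet(fh)
    return len(df)
```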

Happy to provide more details if there is interest.

Cheers!
