-
Notifications
You must be signed in to change notification settings - Fork 680
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Binary IDL Attribute Access for Map Task #5942
Binary IDL Attribute Access for Map Task #5942
Conversation
Signed-off-by: Future-Outlier <[email protected]>
case *core.Literal_Scalar: | ||
default: | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the right logic, we should stop the iteration if this is not a map, collection, or offloaded literal.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5942 +/- ##
=======================================
Coverage 36.84% 36.84%
=======================================
Files 1309 1309
Lines 130979 130994 +15
=======================================
+ Hits 48256 48268 +12
- Misses 78534 78536 +2
- Partials 4189 4190 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Signed-off-by: Future-Outlier <[email protected]>
In this PR (flyteorg/flyte#5942), I’m considering whether the propeller might crash if the resolved value is a collection containing over 1000 dataclass instances. The current approach would serialize it as a binary collection and pass it to the Flytekit array node task: Current Approach
Alternative ApproachTo reduce potential memory pressure, we could consider an alternative that serializes the collection directly as binary, then processes it as a Python value in Flytekit:
Alternative Code flytepropeller This alternative may reduce the load on the propeller by avoiding the need to handle a large nested structure directly, potentially improving stability and performance in cases with large collections. My ThoughtsI prefer the first one, which is this PR, since Byron's attribute access did similar things before. flyte/flytepropeller/pkg/controller/nodes/attr_path_resolver.go Lines 212 to 253 in 856e606
According to the performance testing by the msgpack's benchmark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this is correct. I'm a little worried about unions, but i think we should revisit that after the other issues are sorted if there is an issue there.
Tracking Issue
Flyte Issue #5318
Why are the changes needed?
This update ensures that the following example workflow functions as expected:
What changes are proposed in this pull request?
The primary change is to ensure that a
collection literal
is created when the resolved value is alist
, allowing map tasks to function properly with collections.Updated Workflow Lifecycle
Non-Map Task (List handling):
Python value -> binary IDL -> resolved value -> binary IDL -> list transformer -> Python value
Python value -> binary IDL -> resolved value -> binary IDL in literal collection -> list transformer -> expected Python type transformer -> Python value
Map Task:
The array node handler didn’t receive a collection literal[1], so it couldn’t determine the length of the literal value.
Python value -> binary IDL -> resolved value -> binary IDL in literal collection -> ArrayNodeMapTask (Flytekit) -> expected Python type transformer -> Python value
References
Testing
This patch has been verified using unit tests and remote execution in Flytekit.
Screenshots
Checklist